LLM Output Drift: Cross-Provider Validation & Mitigation for Financial Workflows
Published 10 Nov 2025 · arXiv · Raffi Khatchadourian
Overview
This paper examines output drift in Large Language Models (LLMs) used in financial workflows, finding that smaller models can achieve higher output consistency than larger ones. It challenges the assumption that larger models are always better for production deployment.
Key Insights
- Smaller Models' Consistency: Granite-3-8B and Qwen2.5-7B achieve 100% output consistency at T=0.0, while GPT-OSS-120B shows only 12.5% consistency.
  - Evidence: Consistency measured across 480 runs on regulated financial tasks.
  - Verifiable: Yes, through the study's methodology.
- Deterministic Test Harness: A finance-calibrated test harness using greedy decoding and fixed seeds produces repeatable outputs.
  - Evidence: Methodology detailed in the paper.
  - Verifiable: Yes, through replication of the study.
- Task Sensitivity: Structured tasks such as SQL generation remain stable, while retrieval-augmented generation (RAG) tasks drift by 25–75%.
  - Evidence: Results from cross-provider validation.
  - Verifiable: Yes, through the study's data.
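The consistency figures above can be recomputed from raw run outputs with an exact-match metric. A minimal sketch, assuming exact string equality as the matching criterion (the paper's precise comparison rule is not reproduced here):

```python
from collections import Counter

def consistency_rate(outputs):
    """Fraction of runs whose output exactly matches the modal output.

    A rate of 1.0 means every run produced byte-identical text, the
    bar described above for fully consistent models.
    """
    if not outputs:
        raise ValueError("need at least one run")
    _, modal_count = Counter(outputs).most_common(1)[0]
    return modal_count / len(outputs)

# Example: 7 of 8 runs agree -> 87.5% consistency
runs = ["42"] * 7 + ["41"]
print(f"{consistency_rate(runs):.1%}")  # 87.5%
```

Drift is then simply `1 - consistency_rate(outputs)`, computed per task type to surface differences such as the SQL-vs-RAG gap noted above.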
BFSI Relevance
- Why Relevant: The findings support reliable AI deployment in financial services, where auditability and compliance are crucial.
- Primary Sector: Financial Services
- Subsectors: Regulatory Reporting, Client Communications
- Actionable Implications:
  - Evaluate model size and architecture for consistency on financial tasks.
  - Implement deterministic testing frameworks for AI deployments.
  - Consider smaller models for tasks requiring high consistency.
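The deterministic-testing implication above can be sketched as a small gate: pin greedy decoding parameters, run the same prompt several times, and fail if the outputs disagree. A minimal sketch, assuming OpenAI-style parameter names (`temperature`, `top_p`, `seed`) and a caller-supplied `generate` callable; both are illustrative, not the paper's actual harness:

```python
from collections import Counter

def deterministic_params(model, prompt, seed=1234):
    """Greedy, repeatable decoding settings (OpenAI-style field names;
    hypothetical -- adapt to your provider's API)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,  # greedy decoding
        "top_p": 1.0,        # no nucleus truncation
        "seed": seed,        # pin residual sampling randomness
    }

def consistency_gate(generate, prompt, runs=8, threshold=1.0):
    """Run the same prompt `runs` times; pass only if the share of runs
    matching the modal output meets `threshold` (1.0 = all identical)."""
    outputs = [generate(prompt) for _ in range(runs)]
    modal_count = Counter(outputs).most_common(1)[0][1]
    rate = modal_count / runs
    return rate >= threshold, rate

# Stand-in model: perfectly repeatable, so the gate passes.
ok, rate = consistency_gate(lambda p: p.strip().lower(), "Net income?", runs=4)
```

In a real deployment the lambda would be replaced by a provider call built from `deterministic_params`, and the gate could run in CI before any prompt or model change reaches production.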