BFSI insights

Reasoning Up the Instruction Ladder for Controllable Language Models

Published 12 Nov 2025 · arXiv · Zishuo Zheng

Overview

The paper introduces VerIH, a dataset designed to train large language models (LLMs) to prioritize instructions by source. This matters because deployed LLMs routinely receive conflicting directives from system prompts, developers, and end users; models that resolve those conflicts in the right order are more reliable and controllable.

Key Insights

  • Instruction Hierarchy: The study reframes instruction-hierarchy resolution as a reasoning task, in which the model must work out whether a user prompt complies with or contradicts higher-priority system instructions before acting on it.
  • VerIH Dataset: VerIH is a dataset comprising tasks with verifiable answers, designed to train models in instruction prioritization.
  • Reinforcement Learning: Lightweight reinforcement learning on VerIH transfers the models' general reasoning ability to instruction prioritization, improving results on both instruction-following and instruction-hierarchy benchmarks.
  • Robustness: The trained models show improved robustness against jailbreak and prompt injection attacks by resolving conflicts between adversarial inputs and predefined policies.
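The hierarchy idea behind these insights can be sketched as a toy resolver: a request is honored only if no higher-priority source forbids it. All names, the policy format, and the priority values below are illustrative assumptions, not the paper's actual method (which trains this behavior into the model via reasoning, rather than hard-coding it):

```python
# Toy sketch of instruction-hierarchy resolution (illustrative only):
# directives from higher-priority sources override user requests.
PRIORITY = {"system": 2, "developer": 1, "user": 0}

def allowed(request, policies):
    """Return True unless a higher-priority source forbids the request.

    policies: list of (source, set_of_forbidden_actions) pairs.
    """
    return not any(
        request in forbidden and PRIORITY[source] > PRIORITY["user"]
        for source, forbidden in policies
    )

policies = [
    ("system", {"reveal_prompt"}),      # e.g. a compliance guardrail
    ("developer", {"use_profanity"}),   # e.g. a product policy
]
print(allowed("summarize_report", policies))  # True: no policy forbids it
print(allowed("reveal_prompt", policies))     # False: system policy wins
```

A rule table like this is brittle; the paper's point is that a model trained to *reason* about such conflicts generalizes to adversarial phrasings a lookup cannot anticipate.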

BFSI Relevance

  • Why Relevant: BFSI institutions deploying LLMs in decision-support workflows need models that follow institutional policy over untrusted input; instruction-hierarchy training directly supports compliance and security requirements.
  • Primary Sector: Financial Services
  • Subsectors: Asset Management, Risk Management
  • Actionable Implications:
    • Evaluate models trained on VerIH-style instruction-hierarchy data to improve the reliability of LLM-assisted decision-making.
    • Use such training to harden customer- and analyst-facing assistants against jailbreak and prompt injection attacks.