TeaRAG: A Token-Efficient Agentic Retrieval-Augmented Generation Framework
Published 7 Nov 2025 · arXiv · Chao Zhang
Overview
TeaRAG is a framework that makes agentic Retrieval-Augmented Generation (RAG) systems more efficient by reducing token usage in both retrieval and reasoning. It does so by compressing the retrieved content and by cutting the number of reasoning steps.
Key Insights
- Token Efficiency: TeaRAG reduces output tokens by 61% with Llama3-8B-Instruct and by 59% with Qwen2.5-14B-Instruct as the underlying models.
- Performance Improvement: The framework improves average Exact Match by 4% and 2% on the same two models, respectively.
- Methodology: Combines chunk-based semantic retrieval with graph retrieval, then applies Personalized PageRank to compress the retrieved content. Iterative Process-aware Direct Preference Optimization (IP-DPO) is used to reduce the number of reasoning steps.
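To make the Personalized PageRank step in the methodology more concrete, the following is a minimal sketch of how such a ranking could prune retrieved content. It is not the paper's implementation: the toy graph, the node names (`query_entity`, `chunk_a`, `triplet_1`, and so on), the damping factor `alpha`, and the helper functions are all hypothetical and chosen only for illustration. The idea is that random-walk mass restarts at the query-related seed nodes, so nodes closely connected to the query score higher and only the top-scoring ones need to be passed to the model.

```python
def personalized_pagerank(graph, seeds, alpha=0.85, iters=50):
    """Power-iteration Personalized PageRank.

    graph: dict mapping each node to a list of out-neighbours
           (every neighbour must also be a key of the dict).
    seeds: set of nodes that receive the restart probability.
    alpha: probability of following an edge instead of restarting
           (0.85 is a common default, assumed here).
    """
    nodes = list(graph)
    # Restart distribution: uniform over the seed nodes, zero elsewhere.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(restart)
    for _ in range(iters):
        # Every node first receives its share of the restart mass.
        nxt = {n: (1 - alpha) * restart[n] for n in nodes}
        for n in nodes:
            out = graph[n]
            if out:
                # Spread this node's score evenly over its out-neighbours.
                share = alpha * score[n] / len(out)
                for m in out:
                    nxt[m] += share
            else:
                # Dangling node: send its mass back to the seeds.
                for m in nodes:
                    nxt[m] += alpha * score[n] * restart[m]
        score = nxt
    return score


def top_k(scores, k):
    """Return the k highest-scoring nodes (ties broken by name)."""
    return sorted(scores, key=lambda n: (-scores[n], n))[:k]


# Hypothetical toy graph linking a query entity to chunks and triplets.
graph = {
    "query_entity": ["chunk_a", "triplet_1"],
    "chunk_a": ["query_entity", "triplet_1"],
    "triplet_1": ["query_entity", "chunk_a"],
    "chunk_b": ["triplet_2"],      # not connected to the query
    "triplet_2": ["chunk_b"],
}

scores = personalized_pagerank(graph, seeds={"query_entity"})
kept = top_k(scores, 3)
print(kept)
```

In this toy graph, `chunk_b` and `triplet_2` are unreachable from the seed and receive no score, so a top-3 cut drops them. A real system would derive the graph and seeds from the retrieval results rather than hard-coding them.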
BFSI Relevance
- Why Relevant: Efficient data retrieval and processing are critical in BFSI sectors for decision-making and customer service.
- Primary Sector: Financial Services
- Subsectors: Asset Management, Claims Processing
- Actionable Implications: BFSI professionals should explore integrating token-efficient frameworks like TeaRAG to enhance data processing capabilities while managing computational costs.