DeepKnown-Guard: A Proprietary Model-Based Safety Response Framework for AI Agents
Published 17 Nov 2025 · arXiv · Qi Li
Overview
DeepKnown-Guard introduces a two-stage safety framework for Large Language Models (LLMs) aimed at security risks in AI deployment: a fine-tuned safety classification model screens inputs for risk, and Retrieval-Augmented Generation (RAG) grounds outputs in verified knowledge to improve reliability.
Key Insights
- Risk Recall Rate: Achieves a 99.3% risk recall rate on input risk classification using a four-tier taxonomy.
  - Evidence: Experimental results from the paper.
  - Verifiable: Yes
- Safety Score: Attains a 100% safety score on proprietary high-risk test sets.
  - Evidence: Experimental results from the paper.
  - Verifiable: Yes
- Output Reliability: Uses Retrieval-Augmented Generation to ground responses in retrieved, up-to-date knowledge, reducing information fabrication.
  - Evidence: Framework description in the paper.
  - Verifiable: Yes
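The two-stage flow described above (classify input risk, then answer only safe queries with retrieved evidence) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the tier names, the keyword-based classifier, and the retriever stub are all hypothetical stand-ins for the fine-tuned model and the RAG component.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical four-tier taxonomy; the labels are illustrative,
# not the paper's actual risk categories.
class RiskTier(Enum):
    SAFE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3

@dataclass
class GuardResult:
    tier: RiskTier
    response: str

def classify_risk(query: str) -> RiskTier:
    # Stand-in for the fine-tuned safety classifier; a toy keyword
    # rule is used here only so the sketch is runnable.
    if "exploit" in query.lower():
        return RiskTier.HIGH
    return RiskTier.SAFE

def retrieve_evidence(query: str) -> list[str]:
    # Stand-in for the RAG retriever over a verified knowledge base.
    return [f"[verified source] context for: {query}"]

def guarded_answer(query: str) -> GuardResult:
    tier = classify_risk(query)
    if tier is RiskTier.HIGH:
        # High-risk inputs receive a safe refusal, not a model answer.
        return GuardResult(tier, "I can't help with that request.")
    evidence = retrieve_evidence(query)
    # A real system would feed `evidence` into the LLM prompt; here we
    # only show that the answer is tied to retrieved text.
    return GuardResult(tier, f"Answer grounded in: {evidence[0]}")
```

The key design point is that classification happens before generation, so unsafe queries never reach the answering model at all.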
BFSI Relevance
- Why Relevant: Helps keep AI systems in BFSI secure and reliable, which is crucial for maintaining customer trust and regulatory compliance.
- Primary Sector: Financial Services
- Subsectors: Asset Management, Risk Management
- Actionable Implications:
  - Implement AI safety frameworks to enhance security.
  - Use fine-tuned models for risk classification and management.
  - Ensure AI outputs are traceable and grounded in verified data.
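The traceability implication above can be made concrete with a small audit-trail sketch: every answer is logged together with the verified sources that grounded it. The record schema and function name are assumptions for illustration, not part of the paper.

```python
import json
from datetime import datetime, timezone

def audit_record(query: str, answer: str, sources: list[str]) -> str:
    # Illustrative audit-trail entry: storing each answer with the
    # sources that grounded it keeps outputs traceable for review.
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "query": query,
        "answer": answer,
        "sources": sources,
    }
    return json.dumps(record)
```

In a BFSI setting, such records would typically be written to an append-only store so that compliance teams can later verify which evidence backed any given response.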
Tags: researcher · peer-reviewed-paper · global