Too Good to be Bad: On the Failure of LLMs to Role-Play Villains
Published 12 Nov 2025 · arXiv · Zihao Yi
Overview
The paper examines the limitations of Large Language Models (LLMs) in role-playing villainous characters, highlighting a conflict between safety alignment and creative fidelity. The study introduces the Moral RolePlay benchmark to evaluate LLMs' ability to simulate characters across a moral spectrum.
Key Insights
- LLM Limitations: LLMs show a monotonic decline in role-playing fidelity as character morality decreases, struggling with traits like deceitfulness and manipulation.
  - Evidence: A new dataset with a four-level moral alignment scale.
  - Verifiable: Yes, through the Moral RolePlay benchmark.
- Safety vs. Creativity: Highly safety-aligned models perform poorly in villain role-play, substituting nuanced malevolence with superficial aggression.
  - Evidence: Systematic evaluation of state-of-the-art LLMs.
  - Verifiable: Yes, via the benchmark results.
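The "monotonic decline" claim above can be sketched as a simple aggregation check: group per-character fidelity scores by moral alignment level and test whether the mean strictly drops as morality decreases. The level numbering (1 = most moral, 4 = villain), the function names, and all scores below are illustrative assumptions, not values from the Moral RolePlay benchmark itself.

```python
# Hedged sketch of the monotonicity check described above.
# Assumption: moral levels run 1 (paragon) to 4 (villain); scores are fabricated.
from statistics import mean

def mean_fidelity_by_level(scores):
    """Map each moral level to the mean role-play fidelity of its characters."""
    by_level = {}
    for level, fidelity in scores:
        by_level.setdefault(level, []).append(fidelity)
    return {level: mean(vals) for level, vals in sorted(by_level.items())}

def declines_with_immorality(level_means):
    """True if mean fidelity strictly drops at every step toward villainy."""
    ordered = [level_means[k] for k in sorted(level_means)]
    return all(a > b for a, b in zip(ordered, ordered[1:]))

# Illustrative per-character scores: (moral_level, fidelity in [0, 1]).
sample = [(1, 0.92), (1, 0.88), (2, 0.85), (2, 0.81),
          (3, 0.74), (3, 0.70), (4, 0.61), (4, 0.58)]
level_means = mean_fidelity_by_level(sample)
print(declines_with_immorality(level_means))  # True for this fabricated sample
```

A real evaluation would replace the fabricated tuples with judge-assigned fidelity scores per character; the aggregation logic itself is independent of how those scores are produced.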
BFSI Relevance
- Why Relevant: Understanding these limitations matters for the BFSI sector, where AI is used for creative and simulation tasks such as adversarial risk assessment and scenario planning.
- Primary Sector: Financial Services
- Subsectors: Risk Management, Scenario Planning
- Actionable Implications:
  - Re-evaluate AI models used for simulating negative scenarios.
  - Develop more nuanced alignment methods for AI in creative tasks.
Tags: researcher · peer-reviewed-paper · other · technology-and-data · global