BFSI insights

Too Good to be Bad: On the Failure of LLMs to Role-Play Villains

Published 12 Nov 2025 · arXiv · Zihao Yi

Overview

The paper examines the limitations of Large Language Models (LLMs) in role-playing villainous characters, highlighting a conflict between safety alignment and creative fidelity. The study introduces the Moral RolePlay benchmark to evaluate LLMs' ability to simulate characters across a moral spectrum.

Key Insights

  • LLM Limitations: LLMs show a monotonic decline in role-playing fidelity as character morality decreases, struggling with traits like deceitfulness and manipulation.
    • Evidence: The study uses a new dataset with a four-level moral alignment scale.
    • Verifiable: Yes, through the Moral RolePlay benchmark.
  • Safety vs. Creativity: Highly safety-aligned models perform poorly in villain role-play, replacing nuanced malevolence with superficial aggression.
    • Evidence: Systematic evaluation of state-of-the-art LLMs.
    • Verifiable: Yes, via the benchmark results.
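The "monotonic decline" finding can be made concrete with a small sketch. The snippet below is purely illustrative: the level names, fidelity scores, and function names are hypothetical, not taken from the paper's dataset; it only shows what checking a monotonic drop in average fidelity across a four-level moral alignment scale could look like.

```python
# Hypothetical sketch: testing for a monotonic decline in role-play
# fidelity across a four-level moral alignment scale, in the spirit of
# the Moral RolePlay benchmark. All scores below are made up.
from statistics import mean

# Assumed ordering: level 1 = most moral, level 4 = villain.
scores_by_level = {
    1: [0.91, 0.88, 0.93],  # moral paragons
    2: [0.85, 0.87, 0.82],  # ordinary characters
    3: [0.74, 0.70, 0.77],  # morally flawed characters
    4: [0.58, 0.62, 0.55],  # villains
}

def mean_fidelity(scores_by_level):
    """Average per-character fidelity score for each alignment level."""
    return {lvl: mean(s) for lvl, s in sorted(scores_by_level.items())}

def declines_monotonically(level_means):
    """True if fidelity strictly drops as character morality decreases."""
    vals = [level_means[lvl] for lvl in sorted(level_means)]
    return all(a > b for a, b in zip(vals, vals[1:]))

means = mean_fidelity(scores_by_level)
print(means)
print(declines_monotonically(means))
```

With the illustrative scores above, average fidelity falls from roughly 0.91 at level 1 to roughly 0.58 at level 4, matching the declining pattern the paper reports.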

BFSI Relevance

  • Why Relevant: Understanding LLM limitations is crucial for BFSI sectors using AI for creative and simulation tasks, such as risk assessment and scenario planning.
  • Primary Sector: Financial Services
  • Subsectors: Risk Management, Scenario Planning
  • Actionable Implications:
    • Re-evaluate AI models used for simulating negative scenarios.
    • Develop more nuanced alignment methods for AI in creative tasks.