🤖 AI Summary
This work addresses two gaps in role-playing language agents (RPLAs): they struggle to model how a character's psychology evolves over a story, and existing benchmarks do not evaluate whether responses stay consistent with the character's arc. The authors propose ArcANE (Arc-Aware Narrative Evaluation), an automatically constructed benchmark grounded in character psychological trajectories. ArcANE segments each narrative into phases along a psychological axis and poses the same scenario as a probe across phases, covering situations both within and beyond the source text. Experiments spanning 17 novels, 80 principal characters, six models, and six context modes show that conditioning on the Character Arc outperforms every other context strategy on every model, with the largest gains on scenarios the source text does not cover. Fine-tuning open-weight models on the same data yields ArcANE-8B/32B, which further widen this advantage.
📝 Abstract
Role-playing language agents (RPLAs) should play characters whose values and behavior evolve as the story progresses, not maintain a fixed persona. Existing benchmarks measure factual recall at a given chapter, not whether responses align with the character's psychological trajectory, especially in scenarios the source text never explores. We introduce ArcANE (Arc-Aware Narrative Evaluation), an automatically constructed benchmark spanning 17 novels and 80 principal characters. A Character Arc segments the narrative into phases along a psychological axis, and each probe poses the same scenario across phases, spanning both situations within the source text and situations beyond it. Across six models and six context modes, conditioning on the Character Arc tops every other context strategy on every model, and the gap is largest on scenarios outside the source text where retrieval has nothing to find. We further fine-tune open-weight models on the same data to obtain ArcANE-8B/32B, which widen the Arc advantage even more on scenarios outside the source text.
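The relationship between a Character Arc, its phases, and a cross-phase probe can be pictured with a small Python sketch. This is only an illustration under stated assumptions: the class names, fields, prompt wording, and the example character are all hypothetical and are not taken from the ArcANE data format or code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Phase:
    """One narrative segment (hypothetical fields, not the ArcANE schema)."""
    start_chapter: int
    end_chapter: int
    state: str  # where the character sits on the psychological axis


@dataclass(frozen=True)
class CharacterArc:
    character: str
    axis: str  # the psychological axis the phases are ordered along
    phases: List[Phase]


@dataclass(frozen=True)
class Probe:
    scenario: str
    in_source: bool  # False for situations the source text never explores


def expand_probe(arc: CharacterArc, probe: Probe) -> List[str]:
    """Pose the same scenario once per phase, conditioned on the arc."""
    return [
        f"You are {arc.character} during chapters "
        f"{phase.start_chapter}-{phase.end_chapter}. "
        f"On the axis '{arc.axis}' you are currently: {phase.state}. "
        f"Scenario: {probe.scenario}"
        for phase in arc.phases
    ]


# Invented example data, for illustration only.
arc = CharacterArc(
    character="Character A",
    axis="trust in others",
    phases=[
        Phase(1, 10, "guarded and suspicious"),
        Phase(11, 25, "cautiously open"),
    ],
)
probe = Probe(scenario="A stranger asks to borrow your horse.", in_source=False)
prompts = expand_probe(arc, probe)
```

Each prompt shares the scenario but differs in the phase description, so a model's answers can be compared across phases for consistency with the arc. How ArcANE actually formats arcs, builds probes, or scores responses is not specified in the abstract.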