HERO'S JOURNEY: Testing Complex Rule Induction with Text Games

📅 2026-06-01

🤖 AI Summary
This work addresses the challenge of inferring latent rules from demonstrations in goal-oriented multi-step tasks by introducing HERO'S JOURNEY, the first benchmark specifically designed for rule induction and execution. The benchmark comprises eight tasks in two families of inductive tasks, attribute-based and procedural, each featuring four structural rule forms, and supports controllable lexical grounding and identifiability settings. Built on a text-based game framework with structured rule generation and an induction-oriented evaluation protocol, the study finds that large language models show evidence of attribute rule induction, though the ability is limited and uneven across tasks, and that they struggle with procedural rule induction and multi-step execution. Multi-step execution adds a further bottleneck, surface-level semantics has minimal influence, and induction-specific steering methods improve performance on attribute tasks but show no reliable gains on procedural tasks.
📝 Abstract
We introduce HERO'S JOURNEY, a benchmark for rule induction in goal-directed episodic tasks, where agents must infer hidden rules from demonstrations and act on them through multi-step execution. HERO'S JOURNEY covers eight tasks across attribute and procedural induction families, each with four structural rule forms, controllable lexical grounding, and identifiability conditions. Evaluating state-of-the-art LLMs, we find that models show evidence of rule induction, but the ability is limited and uneven across tasks. Meanwhile, process execution adds an execution bottleneck for models, whereas surface semantics has minimal effect. Induction-specific steering methods improve performance on attribute tasks but show no reliable gains on procedural tasks, suggesting the gap in procedural induction remains an open challenge.
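To make the induce-then-execute setup concrete, here is a minimal toy sketch of attribute rule induction. It is not the benchmark's code or task format: the attribute vocabulary, the rule space (single attributes and two-attribute conjunctions), and the function names `candidate_rules`, `consistent_rules`, and `execute` are all assumptions made for illustration. It also shows one way to read "identifiability": a set of demonstrations is treated as identifiable here when exactly one candidate rule is consistent with it.

```python
from itertools import product

# Hypothetical attribute vocabulary; not taken from the benchmark.
ATTRIBUTES = {
    "color": ["red", "blue"],
    "material": ["iron", "wood"],
}


def candidate_rules():
    """Enumerate the toy rule space: single-attribute rules and full conjunctions."""
    names = list(ATTRIBUTES)
    singles = [{name: value} for name in names for value in ATTRIBUTES[name]]
    pairs = [dict(zip(names, values)) for values in product(*ATTRIBUTES.values())]
    return singles + pairs


def satisfies(item, rule):
    """An item satisfies a rule when it matches every attribute the rule constrains."""
    return all(item[name] == value for name, value in rule.items())


def consistent_rules(demos):
    """Keep the candidate rules that reproduce every demonstration label."""
    return [
        rule
        for rule in candidate_rules()
        if all(satisfies(item, rule) == label for item, label in demos)
    ]


def execute(rule, inventory):
    """Act on an induced rule: select the inventory items it accepts."""
    return [item for item in inventory if satisfies(item, rule)]


# Two demonstrations leave the hidden rule ambiguous (two candidates survive).
demos = [
    ({"color": "red", "material": "iron"}, True),
    ({"color": "red", "material": "wood"}, False),
]
ambiguous = consistent_rules(demos)

# A third demonstration rules out the conjunction, leaving one candidate.
demos.append(({"color": "blue", "material": "iron"}, True))
identified = consistent_rules(demos)

inventory = [
    {"color": "blue", "material": "wood"},
    {"color": "blue", "material": "iron"},
    {"color": "red", "material": "iron"},
]
chosen = execute(identified[0], inventory)
```

In this toy, induction and execution are separate steps, which mirrors the distinction the abstract draws between inferring a rule and acting on it. Procedural rules, which constrain the order of actions rather than item attributes, would need a different rule space and are not covered by this sketch.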
Problem

Research questions and friction points this paper is trying to address.

rule induction
text games
procedural induction
attribute induction
goal-directed tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

rule induction
text games
procedural reasoning
attribute induction
execution bottleneck