Improving Interactive In-Context Learning from Natural Language Feedback

📅 2026-02-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge that large language models struggle to dynamically refine their reasoning using natural language feedback. The authors model interactive in-context learning as a trainable skill, transforming single-turn verifiable tasks into multi-turn pedagogical dialogues through an information-asymmetry mechanism. Crucially, they formalize feedback learning as a distinct training objective, a first systematic effort toward this end, enabling models to internalize human feedback and perform teacher-free self-correction. The approach demonstrates strong cross-domain generalization across diverse tasks, including mathematical reasoning, programming, puzzle solving, and maze navigation. Experimental results show that small-scale models trained with this method achieve multi-turn interactive performance comparable to baseline models an order of magnitude larger, while exhibiting significantly enhanced in-context plasticity.
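As a rough illustration of the dialogue-construction idea summarized above, the sketch below turns a single-turn verifiable task into a multi-turn teaching interaction. All names here (`build_interactive_trace` and the `student`, `teacher`, and `verify` callables) are hypothetical placeholders rather than the paper's actual interface, and it assumes the information asymmetry takes the form of the teacher, but not the student, seeing a reference answer.

```python
from typing import Callable


def build_interactive_trace(
    problem: str,
    reference_answer: str,
    student: Callable[[list], str],
    teacher: Callable[[str, str, str], str],
    verify: Callable[[str, str], bool],
    max_turns: int = 4,
) -> list:
    """Convert a single-turn verifiable task into a multi-turn teaching dialogue.

    The student only sees the conversation so far; the teacher also sees the
    reference answer (the assumed information asymmetry) and replies with
    natural-language feedback instead of revealing the solution.
    """
    dialogue = [{"role": "user", "content": problem}]
    for _ in range(max_turns):
        attempt = student(dialogue)                    # student attempts the task
        dialogue.append({"role": "assistant", "content": attempt})
        if verify(attempt, reference_answer):          # automatic verifier ends the episode
            break
        critique = teacher(problem, reference_answer, attempt)
        dialogue.append({"role": "user", "content": critique})  # feedback becomes the next turn
    return dialogue
```

The resulting dialogues could then serve as multi-turn training traces; the verifier and teacher models shown here are whatever the surrounding pipeline supplies.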

📝 Abstract
Adapting one's thought process based on corrective feedback is an essential ability in human learning, particularly in collaborative settings. In contrast, the current large language model training paradigm relies heavily on modeling vast, static corpora. While effective for knowledge acquisition, it overlooks the interactive feedback loops essential for models to adapt dynamically to their context. In this work, we propose a framework that treats this interactive in-context learning ability not as an emergent property, but as a distinct, trainable skill. We introduce a scalable method that transforms single-turn verifiable tasks into multi-turn didactic interactions driven by information asymmetry. We first show that current flagship models struggle to integrate corrective feedback on hard reasoning tasks. We then demonstrate that models trained with our approach dramatically improve the ability to interactively learn from language feedback. More specifically, the multi-turn performance of a smaller model nearly reaches that of a model an order of magnitude larger. We also observe robust out-of-distribution generalization: interactive training on math problems transfers to diverse domains like coding, puzzles and maze navigation. Our qualitative analysis suggests that this improvement is due to an enhanced in-context plasticity. Finally, we show that this paradigm offers a unified path to self-improvement. By training the model to predict the teacher's critiques, effectively modeling the feedback environment, we convert this external signal into an internal capability, allowing the model to self-correct even without a teacher.
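To make the self-improvement claim at the end of the abstract concrete, here is a minimal sketch of teacher-free self-correction: a model trained to predict the teacher's critiques can critique and revise its own answers at inference time. The `generate` callable, prompt wording, and stopping rule are assumptions for illustration, not the paper's implementation.

```python
def self_correct(problem: str, generate, max_rounds: int = 3) -> str:
    """Iteratively answer, self-critique, and revise without an external teacher."""
    answer = generate(f"Problem: {problem}\nAnswer:")
    for _ in range(max_rounds):
        # The model plays the teacher role it was trained to imitate.
        critique = generate(
            f"Problem: {problem}\nAttempt: {answer}\n"
            "Give a short critique of the attempt, or say 'correct'."
        )
        if "correct" in critique.lower():   # assumed stopping convention
            break
        answer = generate(
            f"Problem: {problem}\nPrevious attempt: {answer}\n"
            f"Feedback: {critique}\nRevised answer:"
        )
    return answer
```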
Problem

Research questions and friction points this paper is trying to address.

interactive learning
natural language feedback
in-context learning
language models
feedback integration
Innovation

Methods, ideas, or system contributions that make the work stand out.

interactive in-context learning
natural language feedback
information asymmetry
in-context plasticity
self-improvement