Corrective In-Context Learning: Evaluating Self-Correction in Large Language Models

📅 2025-03-20
🤖 AI Summary
This paper investigates how error-correction examples affect in-context learning (ICL) performance in large language models, proposing “Corrective In-Context Learning” (CICL). Methodologically, it interleaves varying proportions of erroneous predictions and their corresponding corrections within standard ICL prompts and runs controlled experiments across multiple text classification benchmarks. The key contribution is the systematic empirical finding that error-correction pairs impair task understanding: CICL consistently underperforms standard ICL, and performance degrades monotonically as the correction ratio increases. Moreover, example difficulty proves ineffective as a selection criterion. These results challenge the intuitive assumption that self-correction inherently benefits ICL, documenting the detrimental effects of self-correction mechanisms in ICL settings and offering cautionary guidance for ICL prompt engineering and design.
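The core manipulation described above — replacing a controlled fraction of standard (input, label) demonstrations with error-correction pairs — can be sketched as a prompt-construction routine. This is a minimal illustration, not the paper's verbatim template: the field names, the "Incorrect prediction / Correction" wording, and the dict schema are assumptions.

```python
import random

def build_cicl_prompt(examples, correction_ratio, seed=0):
    """Assemble a CICL-style prompt in which a fraction of demonstrations
    are shown as error-correction pairs (a model's wrong prediction plus
    the ground-truth label) instead of plain labeled examples.

    `examples` is a list of dicts with keys "text", "label", and
    "wrong_label" (an earlier incorrect model prediction). The template
    strings below are illustrative, not the paper's exact format.
    """
    rng = random.Random(seed)
    n_corrections = round(correction_ratio * len(examples))
    correction_idx = set(rng.sample(range(len(examples)), n_corrections))

    blocks = []
    for i, ex in enumerate(examples):
        if i in correction_idx:
            # Error-correction demonstration: wrong prediction, then fix.
            blocks.append(
                f"Input: {ex['text']}\n"
                f"Incorrect prediction: {ex['wrong_label']}\n"
                f"Correction: {ex['label']}"
            )
        else:
            # Standard ICL demonstration: input with its gold label.
            blocks.append(f"Input: {ex['text']}\nLabel: {ex['label']}")
    return "\n\n".join(blocks)
```

Sweeping `correction_ratio` from 0 (standard ICL) toward 1 reproduces the experimental axis along which the paper reports monotonic performance degradation.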

📝 Abstract
In-context learning (ICL) has transformed the use of large language models (LLMs) for NLP tasks, enabling few-shot learning by conditioning on labeled examples without finetuning. Despite its effectiveness, ICL is prone to errors, especially for challenging examples. With the goal of improving the performance of ICL, we propose corrective in-context learning (CICL), an approach that incorporates a model's incorrect predictions alongside ground truth corrections into the prompt, aiming to enhance classification accuracy through self-correction. However, contrary to our hypothesis, extensive experiments on text classification tasks demonstrate that CICL consistently underperforms standard ICL, with performance degrading as the proportion of corrections in the prompt increases. Our findings indicate that CICL introduces confusion by disrupting the model's task understanding, rather than refining its predictions. Additionally, we observe that presenting harder examples in standard ICL does not improve performance, suggesting that example difficulty alone may not be a reliable criterion for effective selection. By presenting these negative results, we provide important insights into the limitations of self-corrective mechanisms in LLMs and offer directions for future research.
Problem

Research questions and friction points this paper is trying to address.

Asks whether a model's own errors and their corrections, shown in the prompt, improve in-context learning for NLP tasks.
Proposes corrective in-context learning (CICL) as a route to higher classification accuracy.
Characterizes the limitations of self-correction mechanisms in large language models.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Incorporates a model's incorrect predictions and their ground-truth corrections into ICL prompts.
Aims to enhance classification accuracy through self-correction.
Systematically measures how performance degrades as the proportion of corrections increases.