🤖 AI Summary
This work addresses a critical problem in in-context learning (ICL): demonstration selection and ordering substantially affect large language model (LLM) performance. It proposes DemoShapley and Beta-DemoShapley, which apply Shapley-value theory to quantify each demonstration's marginal contribution across varying contexts; Beta-DemoShapley additionally uses the Beta distribution to weight subset cardinalities, yielding a cardinality-flexible framework that respects prompt-length constraints. The resulting methods improve ICL accuracy, enhance out-of-distribution (OOD) generalization, are robust to noisy demonstrations, and promote fairness in model predictions. Extensive experiments across multiple LLMs and benchmark datasets, including SuperGLUE, BIG-Bench, and OOD variants, demonstrate consistent gains and broad applicability.
📝 Abstract
Large language models (LLMs) using in-context learning (ICL) excel at many tasks without task-specific fine-tuning. However, demonstration selection and ordering greatly affect ICL effectiveness. To address this, we propose DemoShapley and Beta-DemoShapley, inspired by Data Shapley and Beta Shapley, to assess the influence of individual demonstrations. Unlike other influence-based methods that rely on a fixed number of demonstrations, DemoShapley captures how each example influences performance in different contexts. Beta-DemoShapley further enhances this framework by incorporating the Beta distribution, allowing users to assign higher weights to smaller cardinalities, which aligns with ICL's prompt-length and computational constraints. Our findings show that the proposed algorithms improve model performance by selecting high-quality demonstrations and enhance generalization to out-of-distribution tasks. They also identify noise-compromised data and promote fairness in LLMs, protecting model performance and ensuring robustness across various scenarios.
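To make the Shapley-style valuation concrete, here is a minimal sketch of the standard permutation-based Monte Carlo estimator that Data-Shapley-style methods build on. The `utility` callable is a hypothetical stand-in for the paper's evaluation step (e.g., validation accuracy of an LLM prompted with a given demonstration subset); the exact estimator and weighting used by DemoShapley/Beta-DemoShapley are not reproduced here.

```python
import random

def demo_shapley(demos, utility, n_perm=200, seed=0):
    """Permutation-based Monte Carlo estimate of each demonstration's
    Shapley value.

    `utility(subset)` is a caller-supplied (hypothetical) function, e.g.
    validation accuracy of an LLM prompted with the demonstrations in
    `subset`. Beta-DemoShapley would additionally reweight each marginal
    contribution by a Beta-distribution weight on the subset size, so
    that small subsets (short prompts) count more.
    """
    rng = random.Random(seed)
    n = len(demos)
    values = [0.0] * n
    for _ in range(n_perm):
        perm = list(range(n))
        rng.shuffle(perm)
        subset, prev_u = [], utility([])
        for idx in perm:
            subset.append(demos[idx])
            u = utility(subset)
            values[idx] += u - prev_u  # marginal contribution of demo idx
            prev_u = u
    return [v / n_perm for v in values]
```

For a purely additive utility the estimate is exact after a single permutation; in practice the utility is a noisy LLM evaluation, so many permutations (or truncation heuristics as in Data Shapley) are needed.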