🤖 AI Summary
This work investigates how the number of in-context examples affects predictive uncertainty and response trustworthiness in large language models (LLMs). Method: We propose a framework that systematically quantifies and decomposes uncertainty, particularly epistemic uncertainty (EU), in in-context learning (ICL), combining ICL experiments across varying shot counts with layer-wise tracking of internal confidence. Contribution/Results: Adding examples reduces total uncertainty and improves both task performance and response reliability on simple and complex tasks; on complex tasks, however, these benefits appear only after the noise introduced by long input sequences has been overcome. We also observe a hierarchical convergence pattern: internal model confidence progressively stabilizes across transformer layers, and injecting task-specific knowledge mitigates EU through targeted representation refinement. These findings offer mechanistic insight into uncertainty dynamics in ICL and can inform principled example selection.
📝 Abstract
Recent advances in handling long sequences have facilitated the exploration of long-context in-context learning (ICL). While much of the existing research emphasizes the performance improvements driven by additional in-context examples, their influence on the trustworthiness of generated responses remains underexplored. This paper addresses this gap by investigating how adding examples influences predictive uncertainty, an essential aspect of trustworthiness. We begin by systematically quantifying the uncertainty of ICL across varying shot counts and analyzing the impact of example quantity. Through uncertainty decomposition, we introduce a novel perspective on performance enhancement, with a focus on epistemic uncertainty (EU). Our results reveal that additional examples reduce total uncertainty in both simple and complex tasks by injecting task-specific knowledge, thereby diminishing EU and enhancing performance. For complex tasks, these advantages emerge only after the increased noise and uncertainty associated with longer inputs have been addressed. Finally, we explore how internal confidence evolves across layers, unveiling the mechanisms that drive the reduction in uncertainty.
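The abstract does not spell out how uncertainty is decomposed, so the following is only an illustrative sketch of one widely used entropy-based formulation, not necessarily the one used in the paper. It assumes that the model's answer distribution is collected under several different sampled demonstration sets: total uncertainty is the entropy of the averaged distribution, aleatoric uncertainty is the average of the per-set entropies, and EU is the difference between the two. The function names and the toy probabilities are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


def decompose_uncertainty(distributions):
    """Split total predictive uncertainty into aleatoric and epistemic parts.

    `distributions` holds one answer distribution per sampled demonstration
    set. This setup is an assumption made for illustration; it is not taken
    from the paper.
    """
    n = len(distributions)
    k = len(distributions[0])
    # Average the answer distributions over demonstration sets.
    mean = [sum(d[i] for d in distributions) / n for i in range(k)]
    total = entropy(mean)                                  # entropy of the mean
    aleatoric = sum(entropy(d) for d in distributions) / n  # mean of entropies
    epistemic = total - aleatoric                          # disagreement term
    return total, aleatoric, epistemic


# Toy, made-up distributions over two candidate answers.
# Few-shot: different demonstration sets lead to conflicting predictions.
few_shot = [[0.9, 0.1], [0.1, 0.9]]
# Many-shot: predictions agree closely across demonstration sets.
many_shot = [[0.85, 0.15], [0.80, 0.20]]

tu_few, au_few, eu_few = decompose_uncertainty(few_shot)
tu_many, au_many, eu_many = decompose_uncertainty(many_shot)
```

In this toy setup the few-shot case has the larger EU because its two distributions disagree, which mirrors the abstract's claim that additional examples diminish EU. The numbers are invented and carry no empirical weight.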