🤖 AI Summary
To address the high computational overhead of many-shot in-context learning (ICL) caused by long input sequences, this paper proposes cheat-sheet ICL: a method that distills large demonstration sets into a compact textual summary (a "cheat sheet") that serves as lightweight, static context at inference time. Unlike retrieval-augmented ICL, cheat-sheet ICL avoids test-time retrieval latency and uncertainty because the cheat sheet is precomputed and retrieval-free. On challenging reasoning tasks, cheat-sheet ICL matches or surpasses standard many-shot ICL in accuracy while reducing input token count by 50–80%, and performs comparably to state-of-the-art retrieval-based approaches. The core contribution is explicitly distilling many-shot ICL knowledge into a reusable, retrieval-free summary, achieving a favorable trade-off between inference efficiency and task performance.
📝 Abstract
Recent advances in large language models (LLMs) enable effective in-context learning (ICL) with many-shot examples, but at the cost of high computational demand due to longer inputs. To address this, we propose cheat-sheet ICL, which distills the information from many-shot ICL into a concise textual summary (cheat sheet) used as the context at inference time. Experiments on challenging reasoning tasks show that cheat-sheet ICL achieves comparable or better performance than many-shot ICL with far fewer tokens, and matches retrieval-based ICL without requiring test-time retrieval. These findings demonstrate that cheat-sheet ICL is a practical alternative for leveraging LLMs in downstream tasks.
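The two-stage idea described above (one-time distillation of demonstrations into a cheat sheet, then reuse of that fixed summary at inference) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `call_llm` is a hypothetical stand-in for any LLM API, and the prompt templates are assumptions, not the authors' exact wording.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call (hypothetical).
    Here it simply echoes the prompt's last line so the sketch runs."""
    return prompt.splitlines()[-1]


def build_cheat_sheet(demonstrations: list[tuple[str, str]]) -> str:
    """One-time distillation step: compress many-shot examples into a
    short reusable summary (the 'cheat sheet')."""
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in demonstrations)
    prompt = (
        "Summarize the key strategies shown in these worked examples "
        "as a concise cheat sheet:\n" + shots
    )
    return call_llm(prompt)


def answer_with_cheat_sheet(cheat_sheet: str, question: str) -> str:
    """Inference step: prepend the fixed cheat sheet instead of all
    demonstrations, so the prompt stays short and no test-time
    retrieval is needed."""
    prompt = f"Cheat sheet:\n{cheat_sheet}\n\nQ: {question}\nA:"
    return call_llm(prompt)
```

Because the cheat sheet is computed once per task rather than per query, the inference-time prompt length is independent of the number of demonstrations, which is the source of the token savings reported above.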