Benchmarking In-context Experiential Learning Through Repeated Product Recommendations

📅 2025-11-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing agent evaluation frameworks overlook experience-driven adaptive learning and reasoning in dynamic environments, particularly in multi-turn natural language dialogue for product recommendation. Method: We introduce BELA, the first benchmark for in-context experiential learning, integrating real-world Amazon product data, structured user personas, and a large language model–driven user simulator to systematically assess agents' active exploration, continual learning, and adaptive decision-making. Contribution/Results: Experiments reveal that state-of-the-art large language models fail to improve performance across dialogue turns, exposing fundamental limitations in contextual accumulation, preference-evolution modeling, and policy iteration. BELA establishes a reproducible, scalable paradigm for evaluating and advancing long-term agent adaptability in interactive, evolving settings.

📝 Abstract
To reliably navigate ever-shifting real-world environments, agents must grapple with incomplete knowledge and adapt their behavior through experience. However, current evaluations largely focus on tasks that leave no ambiguity, and do not measure agents' ability to adaptively learn and reason through the experiences they have accrued. We exemplify the need for this in-context experiential learning in a product recommendation context, where agents must navigate shifting customer preferences and product landscapes through natural language dialogue. We curate a benchmark for experiential learning and active exploration (BELA) that combines (1) rich real-world products from Amazon, (2) a diverse collection of user personas to represent heterogeneous yet latent preferences, and (3) an LLM user simulator powered by these personas to create rich interactive trajectories. We observe that current frontier models struggle to meaningfully improve across episodes, underscoring the need for agentic systems with strong in-context learning capabilities.
Problem

Research questions and friction points this paper is trying to address.

Benchmarks experiential learning in ambiguous real-world environments
Evaluates agents' adaptive learning through natural language dialogue
Assesses ability to navigate shifting customer preferences and products
Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmark for experiential learning and active exploration
Real-world products with diverse user personas simulation
LLM user simulator creating interactive dialogue trajectories
Gilbert Yang
Decision, Risk, and Operations Division, Columbia Business School
Yaqin Chen
School of Mathematics (Zhuhai), Sun Yat-sen University
Thomson Yen
Columbia University
Machine Learning · Quantum Computing
Hongseok Namkoong
Columbia University
AI · Sequential Decision-making