🤖 AI Summary
Problem: Existing agent evaluation frameworks overlook experience-driven adaptive learning and reasoning in dynamic environments, particularly in multi-turn natural language dialogue for product recommendation.
Method: We introduce BELA, the first benchmark for context-aware experiential learning, integrating real-world Amazon product data, structured user profiles, and a large language model–driven user simulator to systematically assess agents’ active exploration, continual learning, and adaptive decision-making.
Contribution/Results: Experiments reveal that state-of-the-art large language models fail to improve performance across interaction episodes, exposing fundamental limitations in contextual accumulation, preference evolution modeling, and policy iteration. BELA establishes a reproducible, scalable paradigm for evaluating and advancing long-term agent adaptability in interactive, evolving settings.
📝 Abstract
To reliably navigate ever-shifting real-world environments, agents must grapple with incomplete knowledge and adapt their behavior through experience. However, current evaluations largely focus on tasks that leave no ambiguity and do not measure agents' ability to adaptively learn and reason through the experiences they accrue. We illustrate the need for this in-context experiential learning in a product recommendation setting, where agents must navigate shifting customer preferences and product landscapes through natural language dialogue. We curate a benchmark for experiential learning and active exploration (BELA) that combines (1) rich real-world products from Amazon, (2) a diverse collection of user personas representing heterogeneous yet latent preferences, and (3) an LLM user simulator conditioned on each persona to generate rich interactive trajectories. We observe that current frontier models struggle to meaningfully improve across episodes, underscoring the need for agentic systems with strong in-context learning capabilities.
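To make the evaluation loop concrete, below is a minimal sketch of how a persona-conditioned user simulator might interact with a recommendation agent across episodes. The class names, the `llm` callable stub, the catalog entries, and the success criterion are all illustrative assumptions for exposition, not BELA's actual API or scoring protocol.

```python
import random
from dataclasses import dataclass


@dataclass
class Persona:
    """Latent user profile the agent must uncover through dialogue (assumed schema)."""
    description: str        # e.g. "budget-conscious hiker who prefers natural fabrics"
    target_product_id: str  # hidden ground-truth item the user would accept


class UserSimulator:
    """LLM-driven user: replies in character without revealing the persona outright."""

    def __init__(self, persona: Persona, llm):
        self.persona = persona
        self.llm = llm  # any callable prompt -> str (assumption, not BELA's interface)

    def respond(self, agent_message: str, history: list[str]) -> str:
        prompt = (
            f"You are a shopper with this hidden profile: {self.persona.description}\n"
            + "\n".join(history)
            + f"\nAgent: {agent_message}\nReply in character; do not state the profile."
        )
        return self.llm(prompt)


def run_episode(agent, sim: UserSimulator, catalog: dict[str, str],
                max_turns: int = 8) -> bool:
    """One dialogue episode; success = the agent recommends the target item."""
    history: list[str] = []
    for _ in range(max_turns):
        message, recommendation = agent.act(history, catalog)
        history.append(f"Agent: {message}")
        if recommendation == sim.persona.target_product_id:
            return True
        history.append(f"User: {sim.respond(message, history)}")
    return False


class RandomAgent:
    """Trivial baseline: asks a generic question and guesses a random item."""

    def act(self, history: list[str], catalog: dict[str, str]):
        return "What kind of product are you looking for?", random.choice(list(catalog))


# Experiential learning is evaluated *across* episodes: a capable agent carries its
# accumulated context forward and should succeed in fewer turns over time, whereas
# this memoryless baseline shows no such improvement.
catalog = {"B001": "trail shoes", "B002": "wool socks", "B003": "rain jacket"}
sim = UserSimulator(
    Persona("budget hiker who loves wool", "B002"),
    llm=lambda p: "I mostly hike on weekends and prefer natural fabrics.",
)
successes = [run_episode(RandomAgent(), sim, catalog) for _ in range(5)]
print(f"Success rate: {sum(successes)}/5")
```

The key design point this sketch illustrates is that the user's preferences live only in the persona given to the simulator, so the agent can surface them solely through dialogue, and improvement must come from context accumulated over repeated episodes rather than from any weight updates.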