Learning to Explore: An In-Context Learning Approach for Pure Exploration

📅 2025-06-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the pure-exploration problem, also known as active sequential hypothesis testing, where the goal is to identify the true hypothesis efficiently through adaptive data acquisition. It proposes ICPE (In-Context Pure Exploration), a framework that brings in-context learning, as used in large language models, to this domain for the first time. ICPE uses a Transformer to model the sequential decision-making process, is trained with a combination of supervised and reinforcement learning, and incorporates meta-level task embeddings to enable implicit structural transfer across tasks. ICPE requires no prior assumptions about hypothesis structure and no domain-specific knowledge, which supports adaptability and generalization. On deterministic, stochastic, and structured benchmarks, ICPE achieves performance comparable to optimal instance-dependent algorithms while improving data efficiency and robustness. These results suggest that classical active sequential hypothesis testing problems can be solved end-to-end with deep learning.

📝 Abstract
In this work, we study the active sequential hypothesis testing problem, also known as pure exploration, where the goal is to actively control a data collection process to efficiently identify the correct hypothesis underlying a decision problem. While relevant across multiple domains, devising adaptive exploration strategies remains challenging, particularly due to difficulties in encoding appropriate inductive biases. Existing Reinforcement Learning (RL)-based methods often underperform when relevant information structures are inadequately represented, whereas more complex methods, like Best Arm Identification (BAI) techniques, may be difficult to devise and typically rely on explicit modeling assumptions. To address these limitations, we introduce In-Context Pure Exploration (ICPE), an in-context learning approach that uses Transformers to learn exploration strategies directly from experience. ICPE combines supervised learning and reinforcement learning to identify and exploit latent structure across related tasks, without requiring prior assumptions. Numerical results across diverse synthetic and semi-synthetic benchmarks highlight ICPE's capability to achieve robust performance in deterministic, stochastic, and structured settings. These results demonstrate ICPE's ability to match optimal instance-dependent algorithms using only deep learning techniques, making it a practical and general approach to data-efficient exploration.
Problem

Research questions and friction points this paper is trying to address.

Develop adaptive exploration strategies for hypothesis testing
Overcome limitations of RL and BAI methods
Learn exploration strategies from experience without prior assumptions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Transformers for in-context learning
Combines supervised and reinforcement learning
Learns exploration strategies from experience
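
To make the pure-exploration setting concrete, here is a minimal sketch of a classical, hand-designed Best Arm Identification baseline (successive elimination with Hoeffding-style confidence bounds). It is not ICPE and is not taken from the paper; it only illustrates the loop of adaptive sampling, stopping, and recommending a hypothesis that ICPE is described as learning from experience. The function name, the Bernoulli reward model, and all parameters are assumptions chosen for illustration.

```python
import math
import random


def successive_elimination(means, delta=0.05, seed=0, max_rounds=100_000):
    """Guess the arm with the highest mean, aiming for error probability <= delta.

    `means` are hypothetical Bernoulli arm means, hidden from the learner and
    used only to simulate rewards. Returns (guessed_arm, total_samples).
    """
    rng = random.Random(seed)
    k = len(means)
    active = list(range(k))  # hypotheses (arms) not yet ruled out
    sums = [0.0] * k         # cumulative reward per arm
    total = 0

    for t in range(1, max_rounds + 1):
        # Data acquisition: sample every remaining candidate once,
        # so each active arm has exactly t samples after round t.
        for a in active:
            sums[a] += 1.0 if rng.random() < means[a] else 0.0
            total += 1

        # Hoeffding radius with a union bound over arms and rounds.
        radius = math.sqrt(math.log(4 * k * t * t / delta) / (2 * t))
        best = max(sums[a] / t for a in active)

        # Eliminate arms whose upper bound is below the leader's lower bound.
        active = [a for a in active if sums[a] / t + radius >= best - radius]

        # Stopping rule: a single plausible hypothesis remains.
        if len(active) == 1:
            break

    # Inference: recommend the empirically best surviving arm.
    guess = max(active, key=lambda a: sums[a] / t)
    return guess, total


if __name__ == "__main__":
    arm, n = successive_elimination([0.2, 0.5, 0.9])
    print(f"guessed arm {arm} after {n} samples")
```

A baseline like this relies on an explicit modeling assumption (bounded, independent rewards per arm) and ignores any structure shared across arms or tasks, which is the kind of limitation the abstract attributes to hand-designed methods.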