🤖 AI Summary
Large language models (LLMs) suffer from low search efficiency and insufficient path diversity in complex multi-step reasoning tasks. Method: This paper proposes a soft reasoning framework based on first-token embedding optimization. Instead of sampling discrete tokens, it injects continuous, controllable perturbations directly into the embedding space and couples them with a verifier-guided Bayesian optimization procedure that balances exploration and exploitation, yielding a model-agnostic inference paradigm. Crucially, it requires no chain-of-thought prompting or task-specific heuristics and integrates plug-and-play with arbitrary LLMs. Contribution/Results: Experiments demonstrate substantial improvements in both accuracy and answer coherence across diverse multi-step reasoning benchmarks, with minimal computational overhead. The framework points toward efficient, verifier-guided search over frozen LLMs without architectural modification or fine-tuning.
📝 Abstract
Large Language Models (LLMs) struggle with complex reasoning due to limited diversity and inefficient search. We propose Soft Reasoning, an embedding-based search framework that optimises the embedding of the first token to guide generation. It combines (1) embedding perturbation for controlled exploration and (2) Bayesian optimisation to refine embeddings via a verifier-guided objective, balancing exploration and exploitation. This approach improves reasoning accuracy and coherence while avoiding reliance on heuristic search. Experiments demonstrate superior correctness with minimal computation, making it a scalable, model-agnostic solution.
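The two components of the abstract, (1) perturbing the first-token embedding for exploration and (2) refining it against a verifier-guided objective, can be illustrated with a toy sketch. This is not the paper's implementation: `verifier_score` here is a synthetic stand-in (in the paper a verifier scores generated answers), and a simple annealed Gaussian search replaces the full Bayesian optimisation surrogate. All names (`soft_reasoning_search`, `DIM`, the hyperparameters) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # toy embedding dimension; real LLM embeddings are far larger

def verifier_score(emb):
    # Stand-in for the verifier-guided objective. In the real framework the
    # verifier scores the answer generated from this first-token embedding;
    # here we use a synthetic function with a known optimum for illustration.
    target = np.full(DIM, 0.5)
    return -np.sum((emb - target) ** 2)

def soft_reasoning_search(base_emb, n_rounds=20, n_candidates=8, sigma=0.5):
    """Search the continuous embedding space around base_emb."""
    best_emb, best_score = base_emb, verifier_score(base_emb)
    for t in range(n_rounds):
        # (1) Embedding perturbation: Gaussian noise around the incumbent.
        # Annealing the noise scale trades exploration for exploitation,
        # a crude stand-in for Bayesian optimisation's acquisition function.
        noise_scale = sigma * (1 - t / n_rounds)
        candidates = best_emb + rng.normal(0.0, noise_scale,
                                           size=(n_candidates, DIM))
        scores = np.array([verifier_score(c) for c in candidates])
        # (2) Verifier-guided refinement: keep the best-scoring embedding.
        i = int(scores.argmax())
        if scores[i] > best_score:
            best_emb, best_score = candidates[i], scores[i]
    return best_emb, best_score

emb, score = soft_reasoning_search(np.zeros(DIM))
```

The loop only ever replaces the incumbent with a strictly better candidate, so the verifier score is monotonically non-decreasing across rounds; the underlying LLM is never updated, matching the frozen-model, no-fine-tuning setting described above.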