AALC: Large Language Model Efficient Reasoning via Adaptive Accuracy-Length Control

📅 2025-06-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large reasoning models (LRMs) suffer from high latency, computational cost, and explanatory redundancy due to excessively long reasoning chains. Method: This paper proposes a reinforcement learning–based adaptive accuracy-length co-optimization framework. It introduces an accuracy-aware lightweight reward function and a dynamically scheduled smooth length penalty, enabling real-time trade-offs between correctness and output length without hard truncation. Crucially, validation-set accuracy serves as a primary reward signal, and length penalties are deferred until accuracy targets are met—guiding the model toward more compact and structurally optimal reasoning paths. Contribution/Results: Experiments on standard and out-of-distribution mathematical reasoning tasks show average reasoning length reductions exceeding 50%, with accuracy maintained or improved. The approach significantly enhances inference efficiency and establishes a novel paradigm for controllable, length-aware reasoning.
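The gating behavior described above (length penalties deferred until an accuracy target is met, then applied smoothly rather than via hard truncation) can be sketched as follows. This is an illustrative reconstruction, not the paper's exact formulation: the function name `aalc_reward`, the sigmoid penalty shape, the target threshold, and the penalty weight are all assumptions chosen to make the mechanism concrete.

```python
import math

def aalc_reward(correct: bool, length: int, max_length: int,
                val_accuracy: float, target_accuracy: float = 0.9) -> float:
    """Hypothetical AALC-style reward: correctness first, brevity second."""
    base = 1.0 if correct else 0.0
    # Defer any length penalty until validation accuracy reaches the target,
    # so the model is never pushed toward brevity at the expense of correctness.
    if val_accuracy < target_accuracy:
        return base
    # Smooth (sigmoid-shaped) penalty on relative response length, in place of
    # a hard truncation at a fixed token budget.
    rel = length / max_length
    penalty = 1.0 / (1.0 + math.exp(-10.0 * (rel - 0.5)))
    return base - 0.5 * penalty  # the 0.5 weight is an illustrative choice

# A short correct answer keeps nearly all of its reward; a long correct
# answer is smoothly discounted once the accuracy target has been met.
short_r = aalc_reward(True, 100, 1000, val_accuracy=0.95)
long_r = aalc_reward(True, 900, 1000, val_accuracy=0.95)
```

Under this sketch, before the accuracy target is reached the reward reduces to a plain correctness signal, which matches the paper's description of accuracy serving as the primary reward.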

📝 Abstract
Large reasoning models (LRMs) achieve impressive reasoning capabilities by generating lengthy chain-of-thoughts, but this "overthinking" incurs high latency and cost without commensurate accuracy gains. In this work, we introduce AALC, a lightweight, accuracy-aware length reward integrated into reinforcement learning that dynamically balances correctness and brevity during training. By incorporating validation accuracy into the reward and employing a smooth, dynamically scheduled length penalty, AALC delays length penalty until target performance is met. Through extensive experiments across standard and out-of-distribution math benchmarks, we show that our approach reduces response length by over 50% while maintaining or even improving the original accuracy. Furthermore, qualitative analysis reveals that our method curbs redundant reasoning patterns such as excessive subgoal setting and verification, leading to structurally refined outputs rather than naive truncation. We also identify that efficiency gains are accompanied by reduced interpretability: models trained with AALC omit some narrative framing and explanatory context. These findings highlight the potential of reward-based strategies to guide LRMs toward more efficient, generalizable reasoning paths.
Problem

Research questions and friction points this paper is trying to address.

Balancing reasoning accuracy and response length in large models
Reducing redundant reasoning patterns without accuracy loss
Managing the loss of interpretability that accompanies efficiency gains
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive accuracy-length control for efficiency
Lightweight accuracy-aware length reward
Dynamically scheduled length penalty