🤖 AI Summary
Existing ensemble methods for large language models struggle to balance adaptability and stability: routing commits to an expert prematurely, heuristic ensembling relies on brittle rules, and parameter merging introduces interference. This work proposes Dynamic Logit-Level Gating (DLLG), a framework that uses only trajectory-level correctness signals, without token-level labels or retraining of the expert models, to train a lightweight gating network that predicts token-level fusion weights at each generation step. This enables flexible yet stable logit-level integration. Experiments show that DLLG consistently outperforms strong routing, heuristic-ensembling, and parameter-merging baselines on diverse reasoning and code generation benchmarks, with the gains holding across model scales.
📝 Abstract
Leveraging multiple specialized LLMs can combine complementary strengths, but existing approaches trade adaptability for stability: routing commits prematurely, heuristic ensembling depends on fragile proxies, and parameter merging introduces interference. We propose DLLG (Dynamic Logit-Level Gating), an ensembling framework that learns token-level expert fusion from sparse response-level supervision. A lightweight gating module predicts step-wise fusion weights, linking trajectory-level correctness to generation without token-level labels or expert retraining. On diverse reasoning and code benchmarks and across model scales, DLLG consistently outperforms strong routing, heuristic-ensembling, and parameter-merging baselines, highlighting learned logit-level fusion as a robust and scalable paradigm for integrating specialized experts.
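To make the idea of step-wise logit-level fusion concrete, here is a minimal sketch of a single decoding step. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the gate architecture or its inputs, so the linear gate, the `features` vector, the softmax normalization, and the assumption that all experts share one vocabulary are hypothetical choices. All function names are made up for this example, and training the gate from response-level correctness is not shown.

```python
import math
from typing import List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gate_weights(features: Sequence[float],
                 gate_matrix: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical linear gate: one score per expert, normalized to sum to 1.

    `gate_matrix` has one row per expert; `features` stands in for whatever
    per-step context the real gating module consumes (an assumption here).
    """
    scores = [sum(w * f for w, f in zip(row, features)) for row in gate_matrix]
    return softmax(scores)


def fuse_logits(expert_logits: Sequence[Sequence[float]],
                weights: Sequence[float]) -> List[float]:
    """Weighted sum of the experts' next-token logits (shared vocabulary assumed)."""
    vocab_size = len(expert_logits[0])
    return [
        sum(w * logits[v] for w, logits in zip(weights, expert_logits))
        for v in range(vocab_size)
    ]


def fused_step(features: Sequence[float],
               gate_matrix: Sequence[Sequence[float]],
               expert_logits: Sequence[Sequence[float]]
               ) -> Tuple[List[float], List[float], int]:
    """One decoding step: predict fusion weights, fuse logits, pick a token greedily."""
    weights = gate_weights(features, gate_matrix)
    fused = fuse_logits(expert_logits, weights)
    next_token = max(range(len(fused)), key=fused.__getitem__)
    return weights, fused, next_token


if __name__ == "__main__":
    # Two experts, a three-token vocabulary, and a two-dimensional feature vector.
    features = [1.0, 0.0]
    gate_matrix = [[2.0, 0.0], [0.0, 2.0]]
    expert_logits = [[3.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
    weights, fused, token = fused_step(features, gate_matrix, expert_logits)
    print(weights, fused, token)
```

Because the weights are recomputed at every step, the mixture can lean on different experts at different points in one response, which is the contrast with routing, where a single expert is chosen up front.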