Latent Reasoning Guidance for Parallel Code Translation

📅 2026-06-03

🤖 AI Summary

Existing approaches to parallel code translation typically defer verification and repair until a full program has been generated, which incurs high computational overhead and delays feedback. This work proposes a test-time latent guidance mechanism in which a smaller process reward model (PRM) scores continuous latent prefixes before the main model decodes any code. This enables earlier intervention without retraining the main generative model. By selecting the highest-scoring of several alternative hidden-state trajectories, the method brings process rewards over latent states into code generation and raises the rate at which translated programs pass validation. On the 76-task ParaTrans benchmark, mean validation rate rises from 32.89% (unguided latent reasoning) to 42.1%, outperforming both fine-tuned and vanilla baselines, and the gains persist under the same three-iteration repair loop.
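The latent branch selection summarized above can be illustrated with a minimal Python sketch. It assumes a step-wise scheme: expand each kept latent prefix into several sampled continuations, score every new prefix with the PRM, and keep only the best few. This is one plausible reading of "score continuous latent prefixes", not the authors' confirmed procedure. `toy_latent_step`, `toy_prm_score`, `guided_latent_search` and their parameters are hypothetical. The toy step function and scorer stand in for the main model's hidden-state updates and for a trained PRM.

```python
import random
from typing import Callable, List

Vector = List[float]
Trajectory = List[Vector]  # a latent prefix: a sequence of continuous "thought" vectors


def toy_latent_step(prefix: Trajectory, rng: random.Random) -> Vector:
    """Hypothetical stand-in for the main model producing its next latent vector.

    A real latent-reasoning model would feed its last hidden state back in;
    here we simply add Gaussian noise to the most recent vector.
    """
    return [x + rng.gauss(0.0, 1.0) for x in prefix[-1]]


def toy_prm_score(prefix: Trajectory) -> float:
    """Hypothetical stand-in for a trained PRM (higher is better).

    Prefers prefixes whose latest vector lies close to the origin.
    """
    return -sum(x * x for x in prefix[-1])


def guided_latent_search(
    init: Vector,
    steps: int,
    branches: int,
    keep: int,
    step_fn: Callable[[Trajectory, random.Random], Vector],
    prm: Callable[[Trajectory], float],
    seed: int = 0,
) -> Trajectory:
    """Step-wise PRM-guided selection over latent prefixes.

    At every step, each kept prefix is extended with `branches` sampled
    continuations, every new prefix is scored by `prm`, and only the `keep`
    highest-scoring prefixes survive. The best surviving trajectory is what
    would be handed to the main model for final code decoding.
    """
    rng = random.Random(seed)
    frontier: List[Trajectory] = [[list(init)]]
    for _ in range(steps):
        candidates = [
            prefix + [step_fn(prefix, rng)]
            for prefix in frontier
            for _ in range(branches)
        ]
        candidates.sort(key=prm, reverse=True)  # PRM ranks partial trajectories
        frontier = candidates[:keep]
    return frontier[0]


best = guided_latent_search(
    [0.0, 0.0], steps=4, branches=3, keep=2,
    step_fn=toy_latent_step, prm=toy_prm_score,
)
print(len(best))  # 5: the initial vector plus 4 latent steps
```

Because only the latent prefixes are ranked and pruned, the main model's weights are untouched. The extra cost is the additional latent steps and PRM calls, paid before any code is decoded.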

📝 Abstract

Tackling complex coding tasks often requires autonomous agents and iterative repair pipelines. These increasingly rely on large amounts of test-time computation, often spending many decoding and repair steps before discovering whether a program compiles, runs, or validates. Executable parallel-code translation is an effective setting for earlier guidance because success is behavioral rather than textual. However, most guidance methods act only after complete programs or textual traces are decoded. This motivates the question: can latent reasoning provide an earlier intervention point, before the model commits to code? We study a test-time latent guidance method for this setting that trains a smaller Process Reward Model (PRM) over continuous latent prefixes and uses it to select among alternate hidden-state trajectories before final code decoding, separately from but compatible with post-decoding optimization. On a 76-task ParaTrans benchmark evaluation, latent PRM guidance improves mean validation rate from 32.89% with unguided latent reasoning to 42.1%, outperforming fine-tuned and vanilla baselines in the same setting. These gains persist under the same three-iteration repair loop. These results provide bounded evidence that useful alternative latent continuations exist and that PRM-scored latent branch selection can improve executable outcomes in this setting without retraining the main generative model.
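The abstract's "same three-iteration repair loop" can be sketched as follows. The sketch assumes that iteration 0 is the initial (possibly PRM-guided) generation, that up to three repair rounds follow, and that checks run in the order compile, run, validate, mirroring the abstract's wording. `repair_loop`, `toy_check`, `toy_repair` and the fault tokens are hypothetical placeholders for a real compiler, test harness and repair model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

STAGES = ("compile", "run", "validate")  # behavioral checks, in order


@dataclass
class Attempt:
    iteration: int               # 0 = initial generation, 1.. = repair rounds
    failed_stage: Optional[str]  # None means every stage passed


def repair_loop(
    generate: Callable[[], str],
    check: Callable[[str], Optional[str]],
    repair: Callable[[str, str], str],
    max_repairs: int = 3,
) -> Tuple[bool, List[Attempt]]:
    """Generate once, then repair up to `max_repairs` times until all checks pass."""
    program = generate()  # latent PRM guidance would live inside `generate`
    failed = check(program)
    history = [Attempt(0, failed)]
    for i in range(1, max_repairs + 1):
        if failed is None:
            break
        program = repair(program, failed)  # feed the failing stage back to the repairer
        failed = check(program)
        history.append(Attempt(i, failed))
    return failed is None, history


# Toy harness: a "program" is a whitespace-separated string of tokens,
# and each fault token makes its corresponding stage fail.
FAULTS = {"compile": "syntax_error", "run": "segfault", "validate": "wrong_output"}


def toy_check(program: str) -> Optional[str]:
    tokens = program.split()
    for stage in STAGES:
        if FAULTS[stage] in tokens:
            return stage  # report the first stage that fails
    return None


def toy_repair(program: str, stage: str) -> str:
    tokens = program.split()
    tokens.remove(FAULTS[stage])  # fix one occurrence of the reported fault
    return " ".join(tokens)


ok, hist = repair_loop(lambda: "syntax_error wrong_output", toy_check, toy_repair)
print(ok, [a.failed_stage for a in hist])  # True ['compile', 'validate', None]
```

In this design latent guidance changes only `generate`, while the repair loop itself stays the same. That separation is what lets pre-decoding guidance remain compatible with post-decoding optimization.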

Problem

Research questions and friction points this paper is trying to address.

parallel code translation

latent reasoning

test-time guidance

executable program generation

early intervention

Innovation

Methods, ideas, or system contributions that make the work stand out.

latent reasoning

Process Reward Model

parallel code translation