🤖 AI Summary
Continuous chain-of-thought (CoT) reasoning trains slowly because each latent thought token depends on the one before it, blocking parallelization. To address this, we propose Parallel Continuous Chain-of-Thought (PCCoT), the first method to bring Jacobi iteration to implicit reasoning: all latent thought tokens are updated in parallel over multiple rounds, explicitly breaking the sequential dependency for both training and inference. PCCoT recasts implicit reasoning as a fixed-point problem, replacing sequential autoregressive decoding of latent tokens with Jacobi-style iterative refinement. Empirically, it matches or improves task performance while cutting training and inference time by roughly 50%, and it shows better stability and robustness during training. The core contribution is a new paradigm for efficient implicit reasoning: by casting continuous CoT as a parallelizable fixed-point computation, PCCoT enables faster, more robust training and inference without sacrificing accuracy.
📝 Abstract
Continuous chain-of-thought has been shown to be effective in saving reasoning tokens for large language models. By reasoning with continuous latent thought tokens, continuous CoT is able to perform implicit reasoning in a compact manner. However, the sequential dependencies between latent thought tokens prevent parallel training, leading to long training times. In this paper, we propose Parallel Continuous Chain-of-Thought (PCCoT), which performs Jacobi iteration on the latent thought tokens, updating them iteratively in parallel rather than sequentially, and thus improves both the training and inference efficiency of continuous CoT. Experiments demonstrate that by choosing a proper number of iterations, we are able to achieve comparable or even better performance while saving nearly 50% of the training and inference time. Moreover, PCCoT shows better stability and robustness in the training process. Our code is available at https://github.com/whyNLP/PCCoT.
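The Jacobi scheme described above can be illustrated with a toy sketch: a sequential pass produces each latent token from the ones before it, while the Jacobi variant refreshes all tokens at once from the previous iteration's values, converging to the same fixed point. Here `next_token`, the weights `W`, and the `tanh` update are hypothetical stand-ins for the transformer, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 8, 4                              # number of latent thought tokens, hidden size
W = rng.normal(scale=0.3, size=(d, d))   # toy mixing weights (stand-in for the model)
x0 = rng.normal(size=d)                  # embedding of the input question

def next_token(prefix):
    """Toy stand-in for the model producing the next latent thought
    token from the question embedding and the preceding latent tokens."""
    return np.tanh(W @ (x0 + prefix.sum(axis=0)))

def sequential_cot():
    """Vanilla continuous CoT: T strictly sequential decoding steps."""
    z = np.zeros((T, d))
    for t in range(T):
        z[t] = next_token(z[:t])         # token t waits for tokens < t
    return z

def jacobi_cot(num_iters):
    """PCCoT-style Jacobi iteration: every latent token is recomputed
    from the PREVIOUS iteration's tokens, so each sweep is one
    parallelizable forward pass instead of T dependent ones."""
    z = np.zeros((T, d))
    for _ in range(num_iters):
        # all T updates read the same snapshot z, so they can run in parallel
        z = np.stack([next_token(z[:t]) for t in range(T)])
    return z
```

Because token t only depends on tokens before it, token t becomes exact after t+1 Jacobi sweeps, so `jacobi_cot(T)` reproduces `sequential_cot()` exactly; choosing fewer iterations (the paper's efficiency knob) trades a small approximation for fewer forward passes.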