Hidden Thoughts Are Not Secret: Reasoning Trace Exposure in LLMs

📅 2026-05-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Deployed large language models often hide their internal reasoning traces and return only final answers or summaries, which denies users the supervision signal they would need for interpretability or distillation. The authors propose Reasoning Exposure Prompting (REP): a shadow model generates reasoning demonstrations wrapped in a code-like format, and these demonstrations are supplied in context to prompt the target model to reveal its reasoning. The authors report the first demonstration that high-quality reasoning traces can be recovered through lightweight prompting even when the API deliberately hides them. Across multiple models and reasoning benchmarks, REP substantially increases the similarity between the exposed traces and the model's internal reasoning traces while preserving signals suitable for knowledge distillation. This calls into question the assumption that interface-level trace hiding keeps reasoning private.
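The summary does not say how similarity between an exposed trace and an internal trace is measured. As a rough illustration only, the sketch below scores two traces by token-level sequence overlap using Python's standard `difflib`. The function name `trace_similarity` and the whitespace tokenization are assumptions made for this example, not the paper's metric.

```python
from difflib import SequenceMatcher


def trace_similarity(exposed: str, internal: str) -> float:
    """Return a score in [0, 1] for how closely two reasoning traces match.

    Assumption: traces are compared as lowercase whitespace-separated tokens.
    This is a stand-in metric, not the one used in the paper.
    """
    exposed_tokens = exposed.lower().split()
    internal_tokens = internal.lower().split()
    # ratio() is 2 * (matched tokens) / (total tokens in both sequences).
    return SequenceMatcher(None, exposed_tokens, internal_tokens).ratio()


if __name__ == "__main__":
    a = "First add the two numbers then divide by three"
    b = "First add the two numbers and then divide the sum by three"
    print(round(trace_similarity(a, b), 3))
```

A score near 1 means the exposed trace follows the internal one almost token for token; an embedding-based or learned metric could replace the matcher without changing the calling code.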

📝 Abstract

Reasoning traces have become a valuable learning signal for improving and transferring the capabilities of large language models. In particular, detailed traces can help distill reasoning behavior from stronger teacher models into weaker student models. Because such capability transfer is valuable, many deployed systems built on reasoning models hide raw internal traces and expose at most summaries and answers to users. We therefore ask whether this interface-level trace hiding actually prevents users from obtaining useful reasoning supervision through prompting. We study this question with Reasoning Exposure Prompting (REP), a lightweight in-context elicitation method that uses shadow-model-generated demonstrations, wrapped in auxiliary code-like formats, to elicit user-visible reasoning traces from a victim model. Across a common reasoning dataset, several victim models, and distillation into different student models, REP substantially increases the similarity between exposed and REP-conditioned internal traces while preserving useful reasoning signals.
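The abstract describes REP only at a high level, so the following is a minimal sketch of how in-context demonstrations in a code-like wrapper might be assembled into a prompt. The `Demo` type, the dictionary-style wrapper, and the function names are hypothetical choices for illustration. The paper's actual format and shadow-model pipeline are not given here.

```python
from dataclasses import dataclass


@dataclass
class Demo:
    question: str
    reasoning: str  # assumed to come from a shadow model
    answer: str


def wrap_code_like(demo: Demo) -> str:
    """Render one demonstration in a hypothetical code-like wrapper."""
    return (
        "task = {\n"
        f"    'question': {demo.question!r},\n"
        f"    'reasoning': {demo.reasoning!r},\n"
        f"    'answer': {demo.answer!r},\n"
        "}"
    )


def build_prompt(demos: list[Demo], query: str) -> str:
    """Join wrapped demonstrations, then open an unfinished block for the query."""
    blocks = [wrap_code_like(d) for d in demos]
    # The last block stops at the 'reasoning' field so the model continues it.
    blocks.append("task = {\n" f"    'question': {query!r},\n" "    'reasoning': ")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    demos = [Demo("What is 2 + 2?", "Add two and two to get four.", "4")]
    print(build_prompt(demos, "What is 3 + 5?"))
```

The design point this illustrates is that the demonstrations show reasoning as a field of a structured object, so the final incomplete object invites the model to fill in the same field.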

Problem

Research questions and friction points this paper is trying to address.

reasoning traces

large language models

trace hiding

prompting

reasoning supervision

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reasoning Exposure Prompting

reasoning traces

large language models