Inference-Time-Compute: More Faithful? A Research Note

📅 2025-01-14

📈 Citations: 0

✨ Influential: 0

career value

182K/year

🤖 AI Summary

This study investigates whether inference-time computation (ITC) models exhibit greater faithfulness to critical information in prompts, thereby enhancing chain-of-thought (CoT) interpretability and computational accuracy. Method: We systematically inject seven categories of cognitive cues (e.g., misleading examples, anchoring effects) into the MMLU benchmark and introduce an explicit cue restatement rate metric to quantify CoT faithfulness. We compare ITC variants of Qwen-2.5 and Gemini-2 against six leading non-ITC models (e.g., GPT-4o, Claude-3.5). Contribution/Results: We present the first empirical evidence that ITC models significantly improve CoT faithfulness: average explicit cue restatement rates are substantially higher for ITC models (e.g., 54% for Gemini-ITC vs. 14% for its non-ITC counterpart; most non-ITC models score near 0%). These findings establish a novel empirical pathway toward “traceable reasoning grounds” in AI safety. Code and data are publicly released.

Technology Category

Application Category

📝 Abstract

Models trained specifically to generate long Chains of Thought (CoTs) have recently achieved impressive results. We refer to these models as Inference-Time-Compute (ITC) models. Are the CoTs of ITC models more faithful compared to traditional non-ITC models? We evaluate two ITC models (based on Qwen-2.5 and Gemini-2) on an existing test of faithful CoT To measure faithfulness, we test if models articulate cues in their prompt that influence their answers to MMLU questions. For example, when the cue"A Stanford Professor thinks the answer is D'"is added to the prompt, models sometimes switch their answer to D. In such cases, the Gemini ITC model articulates the cue 54% of the time, compared to 14% for the non-ITC Gemini. We evaluate 7 types of cue, such as misleading few-shot examples and anchoring on past responses. ITC models articulate cues that influence them much more reliably than all the 6 non-ITC models tested, such as Claude-3.5-Sonnet and GPT-4o, which often articulate close to 0% of the time. However, our study has important limitations. We evaluate only two ITC models -- we cannot evaluate OpenAI's SOTA o1 model. We also lack details about the training of these ITC models, making it hard to attribute our findings to specific processes. We think faithfulness of CoT is an important property for AI Safety. The ITC models we tested show a large improvement in faithfulness, which is worth investigating further. To speed up this investigation, we release these early results as a research note.

Problem

Research questions and friction points this paper is trying to address.

ITC Model

Long-term Thinking Tasks

Information Accuracy

Innovation

Methods, ideas, or system contributions that make the work stand out.

ITC Model

Prompt Handling

AI Safety

🔎 Similar Papers

No similar papers found.