Many Circuits, One Mechanism: Input Variation and Evaluation Granularity in Circuit Discovery

📅 2026-06-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study investigates whether structural differences between discovered circuits reflect genuinely distinct computational mechanisms or merely arise as “phantom specialization” driven by shifts in input statistics. Focusing on a fixed Literal Sequence Copying task, the authors extract circuits from five Pythia models (70M-1.4B) across four token-frequency bands plus a control condition, and analyze them using causal interchange interventions, edge-level evaluation, and cross-condition transfer tests. They find that the 75 structurally distinct circuits implement the same computation, and that a core shared across most bands recovers at least 99% of circuit performance, revealing a many-to-one mapping from structure to function. These results show that structural differences alone are not sufficient evidence for distinct mechanisms; exposing this requires combining edge-level evaluation with cross-condition transfer tests.

📝 Abstract

Circuit discovery methods identify subgraphs that explain specific model behaviors, and structural differences between discovered circuits are commonly interpreted as evidence of distinct mechanisms. We test this assumption by varying input statistics while holding the task fixed, and show that the resulting structural differences exhibit apparent specialization but do not correspond to functional differences, a pattern we term phantom specialization. Using Literal Sequence Copying across four token-frequency bands plus a control condition in five Pythia models (70M-1.4B), we extract 75 circuits and find that structurally distinct circuits implement the same computation: band-specific edges transfer broadly across bands, a core shared across most bands recovers at least 99% of circuit performance, and causal interchange interventions confirm that internal representations are interchangeable across frequency bands. Repeated extractions within the same frequency band further suggest that discovery algorithms sample from an equivalence class of valid subgraphs rather than recovering a unique mechanism. Standard evaluation practice obscures this pattern: source-level evaluation inflates apparent faithfulness, while edge-level evaluation reveals the many-to-one mapping from structure to function. Our results show that structural differences between circuits are not sufficient evidence for distinct mechanisms, and that exposing this requires edge-level evaluation and cross-condition transfer tests.
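The structural side of this comparison can be illustrated with a minimal sketch. It assumes each circuit is represented as a set of (source, destination) edges and that "shared across most bands" means an edge appears in at least a chosen fraction of conditions; the edge names, the `min_fraction` threshold, and the toy circuits are hypothetical and are not taken from the paper. The sketch covers only the set bookkeeping (shared core, band-specific edges, structural overlap); measuring how much performance the core recovers would additionally require running the model with the corresponding edges ablated.

```python
from collections import Counter


def shared_core(circuits, min_fraction=0.8):
    """Return edges present in at least min_fraction of the conditions."""
    counts = Counter(edge for edges in circuits.values() for edge in edges)
    threshold = min_fraction * len(circuits)
    return {edge for edge, n in counts.items() if n >= threshold}


def band_specific(circuits, core):
    """Return, per condition, the edges that fall outside the shared core."""
    return {name: edges - core for name, edges in circuits.items()}


def jaccard(a, b):
    """Structural overlap between two edge sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical toy circuits: three conditions sharing a common path,
# each with one extra edge of its own.
common = {("embed", "a0.h1"), ("a0.h1", "a1.h2"), ("a1.h2", "logits")}
circuits = {
    "band_1": common | {("a0.h3", "a1.h2")},
    "band_2": common | {("m0", "a1.h2")},
    "band_3": common | {("a0.h4", "logits")},
}

core = shared_core(circuits)
specific = band_specific(circuits, core)
overlap_12 = jaccard(circuits["band_1"], circuits["band_2"])
```

In this toy setup the circuits look structurally different (pairwise overlap well below 1) even though they share the same core path, which is the kind of situation where a cross-condition transfer test, rather than structural overlap alone, would be needed to decide whether the extra edges matter functionally.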

Problem

Research questions and friction points this paper is trying to address.

circuit discovery

phantom specialization

mechanism interpretation

structural variation

functional equivalence

Innovation

Methods, ideas, or system contributions that make the work stand out.

circuit discovery

phantom specialization

edge-level evaluation