Seed-Induced Uniqueness in Transformer Models: Subspace Alignment Governs Subliminal Transfer

📅 2025-11-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work identifies a “seed-induced uniqueness” phenomenon in Transformers: under identical random seeds, teacher models implicitly encode student-decodable hidden features without compromising primary task performance; conversely, differing seeds severely impair such implicit knowledge transfer. We find this stems from alignment of discriminative feature subspaces—not global representation similarity—prompting our subspace-level Centered Kernel Alignment (CKA) diagnostic method. Using synthetic corpora, residual probing, adversarial inversion, and projection regularization, we quantitatively detect and control inter-model information leakage. Experiments show same-seed students exhibit significantly higher leakage (τ ≈ 0.24) than cross-seed counterparts (τ ≈ 0.12–0.13), despite near-perfect global CKA (>0.9). Our proposed safety mechanisms suppress leakage with zero primary-task accuracy degradation. The core contribution is the first attribution of implicit knowledge transfer to subspace alignment, establishing an interpretable, intervenable framework for diagnosis and mitigation.

📝 Abstract
We analyze subliminal transfer in Transformer models, where a teacher embeds hidden traits that can be linearly decoded by a student without degrading main-task performance. Prior work often attributes transferability to global representational similarity, typically quantified with Centered Kernel Alignment (CKA). Using synthetic corpora with disentangled public and private labels, we distill students under matched and independent random initializations. We find that transfer strength hinges on alignment within a trait-discriminative subspace: same-seed students inherit this alignment and show higher leakage (τ ≈ 0.24), whereas different-seed students, despite global CKA > 0.9, exhibit substantially reduced excess accuracy (τ ≈ 0.12–0.13). We formalize this with a subspace-level CKA diagnostic and residualized probes, showing that leakage tracks alignment within the trait-discriminative subspace rather than global representational similarity. Security controls (projection penalty, adversarial reversal, right-for-the-wrong-reasons regularization) reduce leakage in same-base models without impairing public-task fidelity. These results establish seed-induced uniqueness as a resilience property and argue for subspace-aware diagnostics for secure multi-model deployments.
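The subspace-level CKA diagnostic contrasted with global CKA above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it uses linear CKA on centered features, and `trait_directions` is a hypothetical, crude difference-of-class-means stand-in for the trait-discriminative subspace.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between representation matrices (n_samples x dim):
    ||Xc^T Yc||_F^2 / (||Xc^T Xc||_F * ||Yc^T Yc||_F) on centered features."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    num = np.linalg.norm(Xc.T @ Yc, ord="fro") ** 2
    den = (np.linalg.norm(Xc.T @ Xc, ord="fro")
           * np.linalg.norm(Yc.T @ Yc, ord="fro"))
    return num / den

def trait_directions(X, y, k=1):
    """Hypothetical trait subspace: top-k singular directions of the
    class-conditional mean deviations (a rough LDA-style stand-in)."""
    means = np.stack([X[y == c].mean(axis=0) for c in np.unique(y)])
    means = means - means.mean(axis=0)
    _, _, Vt = np.linalg.svd(means, full_matrices=False)
    return Vt[:k].T  # (dim, k) orthonormal basis

def subspace_cka(X, Y, y, k=1):
    """Compare two models' representations only inside each model's
    own trait-discriminative subspace, rather than globally."""
    Bx = trait_directions(X, y, k)
    By = trait_directions(Y, y, k)
    return linear_cka(X @ Bx, Y @ By)
```

The point of the diagnostic is that `linear_cka` can be near 1 while `subspace_cka` differs sharply between same-seed and cross-seed pairs, since the global score is dominated by the many non-discriminative directions.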
Problem

Research questions and friction points this paper is trying to address.

Analyzing subliminal knowledge transfer between teacher and student Transformers
Investigating how random seeds affect trait leakage in aligned subspaces
Developing subspace-aware diagnostics for secure multi-model deployments
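The leakage scores τ reported in this work are excess decoding accuracies. A hedged sketch of one way to estimate such a score follows; the paper's exact probing protocol is not reproduced here, and `leakage_tau` with its ridge-regression probe is an illustrative assumption.

```python
import numpy as np

def leakage_tau(H, y, train_frac=0.8, lam=1e-3, seed=0):
    """Hypothetical leakage score: held-out accuracy of a ridge linear
    probe decoding the private binary trait y from student hidden
    states H (n_samples x dim), minus the 0.5 chance baseline."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(H))
    n_train = int(train_frac * len(H))
    tr, te = idx[:n_train], idx[n_train:]
    mu = H[tr].mean(axis=0)          # center with training statistics only
    Htr, Hte = H[tr] - mu, H[te] - mu
    t = 2.0 * y[tr] - 1.0            # map labels {0,1} -> {-1,+1}
    w = np.linalg.solve(Htr.T @ Htr + lam * np.eye(H.shape[1]), Htr.T @ t)
    acc = ((Hte @ w > 0).astype(int) == y[te]).mean()
    return acc - 0.5
```

Under this convention, features carrying the private trait yield τ well above 0, while trait-free features score near 0, matching the spirit of the same-seed (≈ 0.24) versus cross-seed (≈ 0.12–0.13) comparison.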
Innovation

Methods, ideas, or system contributions that make the work stand out.

Subspace alignment enables subliminal transfer in Transformers
Seed-induced uniqueness governs trait leakage resilience
Subspace-level diagnostics replace global similarity measures
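The projection-penalty control listed among the contributions can be illustrated as a soft regularizer on hidden-state energy inside the trait subspace. This is a minimal numpy sketch under the assumption that an orthonormal basis `B` for that subspace has already been estimated; both function names are hypothetical, not the paper's API.

```python
import numpy as np

def projection_penalty(H, B, weight=1.0):
    """Soft control: penalize the mean squared energy of hidden states H
    (n_samples x dim) inside the trait subspace spanned by the
    orthonormal basis B (dim x k). Added to the distillation loss."""
    coords = H @ B                        # coordinates in the subspace
    return weight * (coords ** 2).sum() / len(H)

def project_out(H, B):
    """Hard alternative: remove the trait-subspace component entirely,
    leaving the public-task directions untouched."""
    return H - (H @ B) @ B.T
```

The soft penalty trades off leakage against task loss via `weight`, while the hard projection zeroes the subspace outright; the paper's finding that such controls cost no public-task accuracy is consistent with the trait subspace being low-dimensional.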
Ayşe Selin Okatan
Dept. of Electrical Engineering and Computer Science, Embry-Riddle Aeronautical University, Daytona Beach, FL, USA
Mustafa İlhan Akbaş
Dept. of Electrical Engineering and Computer Science, Embry-Riddle Aeronautical University, Daytona Beach, FL, USA
Laxima Niure Kandel
Embry-Riddle Aeronautical University (Current); Stevens Institute of Technology (Former)
Hardware Fingerprinting · Anomaly Detection · GPS Spoofing · WiFi Localization
Berker Peköz
Dept. of Electrical Engineering and Computer Science, Embry-Riddle Aeronautical University, Daytona Beach, FL, USA