🤖 AI Summary
Deep learning conventionally requires identical network architectures for training and inference, constraining the deployment of efficient or novel architectures. This work challenges that assumption with the Network of Theseus (NoT), a framework that decouples the training architecture from the inference architecture. NoT progressively replaces modules of a teacher network with modules of a target architecture, aligning intermediate representations at each stage via representational similarity metrics. The method enables cross-paradigm architectural transfer (e.g., CNN to MLP, or GPT-2 to RNN) while largely preserving the original performance despite substantial structural divergence. It improves the accuracy-efficiency trade-off, yielding lightweight yet high-performing inference models. By freeing architectural design from training constraints, NoT expands the deployable architecture space and establishes a new paradigm for flexible, efficient architecture exploration.
📝 Abstract
A standard assumption in deep learning is that the inductive bias introduced by a neural network architecture must persist from training through inference: the architecture you train with is the architecture you deploy. This assumption prevents the community from selecting architectures that have desirable efficiency or design properties but are difficult to optimize. We challenge this assumption with Network of Theseus (NoT), a method for progressively converting a trained, or even untrained, guide network architecture part-by-part into an entirely different target network architecture while preserving the performance of the guide network. At each stage, components of the guide architecture are incrementally replaced with target-architecture modules and aligned via representational similarity metrics. This procedure largely preserves the functionality of the guide network even under substantial architectural changes, for example converting a convolutional network into a multilayer perceptron, or GPT-2 into a recurrent neural network. By decoupling optimization from deployment, NoT expands the space of viable inference-time architectures, opening opportunities for better accuracy-efficiency tradeoffs and enabling more directed exploration of the architectural design space.
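The swap-and-align loop described above can be sketched in a toy NumPy setting. Everything below is an illustrative assumption rather than the paper's implementation: the guide network is a stack of affine+ReLU modules, the "different target architecture" is a low-rank bottleneck module, and alignment uses a plain MSE loss on intermediate representations (the paper uses representational similarity metrics on real architectures).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "guide" network: a stack of affine+ReLU modules standing in for,
# e.g., convolutional blocks. Shapes and scales are illustrative.
def make_affine(d_in, d_out, scale=0.5):
    return {"W": scale * rng.standard_normal((d_in, d_out)),
            "b": np.zeros(d_out)}

def run_module(m, x):
    # Dispatch on module type: full affine vs. rank-r bottleneck (U @ V).
    pre = x @ m["W"] + m["b"] if "W" in m else x @ m["U"] @ m["V"] + m["b"]
    return np.maximum(pre, 0.0)

def run_net(modules, x):
    for m in modules:
        x = run_module(m, x)
    return x

def fit_bottleneck(x_in, y_ref, r=6, steps=2000, lr=0.005):
    """Train a structurally different (rank-r) replacement module so its
    output matches the guide module's representation y_ref (MSE alignment)."""
    d_in, d_out = x_in.shape[1], y_ref.shape[1]
    t = {"U": 0.1 * rng.standard_normal((d_in, r)),
         "V": 0.1 * rng.standard_normal((r, d_out)),
         "b": np.zeros(d_out)}
    n = len(x_in)
    for _ in range(steps):
        h = x_in @ t["U"]
        pre = h @ t["V"] + t["b"]
        y = np.maximum(pre, 0.0)
        g = 2.0 * (y - y_ref) * (pre > 0) / n   # dLoss/dpre through ReLU
        gV, gU, gb = h.T @ g, x_in.T @ (g @ t["V"].T), g.sum(0)
        t["U"] -= lr * gU
        t["V"] -= lr * gV
        t["b"] -= lr * gb
    return t

d = 16
guide = [make_affine(d, d) for _ in range(3)]
X = rng.standard_normal((256, d))
ref_out = run_net(guide, X)

# Theseus-style conversion: replace modules one at a time, aligning each
# replacement to the guide module's representation on the current inputs.
hybrid = list(guide)
for i in range(len(guide)):
    x_in = run_net(hybrid[:i], X)              # input seen by module i
    y_ref = run_module(guide[i], x_in)         # representation to match
    hybrid[i] = fit_bottleneck(x_in, y_ref)

drift_aligned = np.mean((run_net(hybrid, X) - ref_out) ** 2)

# Baseline: same target architecture swapped in without any alignment.
unaligned = [{"U": 0.1 * rng.standard_normal((d, 6)),
              "V": 0.1 * rng.standard_normal((6, d)),
              "b": np.zeros(d)} for _ in guide]
drift_random = np.mean((run_net(unaligned, X) - ref_out) ** 2)

print(f"aligned drift {drift_aligned:.4f} vs unaligned {drift_random:.4f}")
```

After the full swap, the aligned hybrid's outputs stay far closer to the guide network's than an unaligned swap does, which is the property the staged alignment is meant to preserve.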