Beyond Neural Incompatibility: Easing Cross-Scale Knowledge Transfer in Large Language Models through Latent Semantic Alignment

📅 2025-10-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address neural incompatibility, i.e., the architectural and parametric disparities that hinder cross-scale knowledge transfer among large language models (LLMs), this paper proposes a latent semantic alignment approach to parametric knowledge transfer (PKT). Unlike conventional methods that directly reuse layer-wise parameters, the approach employs intermediate-layer activations as the knowledge carrier and introduces a learnable semantic alignment module that establishes consistent mappings between the latent spaces of models of differing scales, achieving behavior-level alignment without parameter sharing. The paper identifies latent semantic alignment as the fundamental prerequisite for cross-scale knowledge transfer, circumventing the neural incompatibility bottleneck. The method yields significant improvements over state-of-the-art baselines on four benchmarks, empirically validating its efficacy and revealing the critical role of semantic alignment in determining transfer performance.
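The core idea, aligning a larger model's intermediate activations to a smaller model's latent space instead of copying parameters, can be sketched as a toy example. Everything here is a synthetic stand-in: the hidden sizes, the random activations, and the assumption that a closed-form linear least-squares map suffices (the paper's learnable alignment module is more general).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical hidden sizes: a larger "source" model and a smaller "target" model.
d_src, d_tgt, n_tokens = 64, 32, 500

# Stand-in intermediate-layer activations for the same token batch from each model.
# In practice these would be captured with forward hooks on real LLMs.
H_src = rng.normal(size=(n_tokens, d_src))

# Assume (for this sketch) the target activations are a noisy linear function of
# the source ones -- the premise an alignment module tries to exploit.
W_true = rng.normal(size=(d_src, d_tgt)) / np.sqrt(d_src)
H_tgt = H_src @ W_true + 0.01 * rng.normal(size=(n_tokens, d_tgt))

# Fit a linear semantic alignment map by least squares:
# W = argmin_W ||H_src @ W - H_tgt||_F
W, *_ = np.linalg.lstsq(H_src, H_tgt, rcond=None)

# Aligned source activations now live in the target model's latent space, where
# they could serve as the layer-wise medium of transfer instead of parameters.
H_aligned = H_src @ W
err = np.linalg.norm(H_aligned - H_tgt) / np.linalg.norm(H_tgt)
print(f"relative alignment error: {err:.4f}")
```

If the latent spaces are linearly related, the fitted map recovers the relationship and the relative error stays near the noise floor, illustrating why activations can carry knowledge across scales that raw parameters cannot.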

📝 Abstract
Large Language Models (LLMs) encode vast amounts of knowledge in their massive parameters, which can be located, traced, and analyzed. Despite advances in neural interpretability, it remains unclear how to transfer knowledge in a fine-grained manner, a task known as parametric knowledge transfer (PKT). A key problem is enabling effective and efficient knowledge transfer across LLMs of different scales, which is essential for greater flexibility and broader applicability in transferring knowledge between LLMs. Due to neural incompatibility, the architectural and parametric differences between LLMs of varying scales, existing methods that directly reuse layer parameters are severely limited. In this paper, we identify semantic alignment in latent space as the fundamental prerequisite for LLM cross-scale knowledge transfer. Instead of directly using layer parameters, our approach takes activations as the medium of layer-wise knowledge transfer. By leveraging the semantics of the latent space, our approach is simple, outperforms prior work, and better aligns model behaviors across varying scales. Evaluations on four benchmarks demonstrate the efficacy of our method. Further analysis reveals the key factors that ease cross-scale knowledge transfer and provides insights into the nature of latent semantic alignment.
Problem

Research questions and friction points this paper is trying to address.

Enabling fine-grained knowledge transfer across LLMs
Overcoming neural incompatibility in cross-scale model transfer
Using latent semantic alignment to improve transfer efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses latent semantic alignment for cross-scale transfer
Employs activations as medium for layer-wise knowledge
Aligns model behaviors across varying scales effectively