🤖 AI Summary
To address neural incompatibility, the architectural and parametric disparities that hinder parametric knowledge transfer (PKT) across large language models (LLMs) of different scales, this paper proposes an approach grounded in latent semantic alignment. Unlike conventional methods that directly reuse layer-wise parameters, the approach employs intermediate-layer activations as the knowledge carrier and introduces a learnable semantic alignment module that establishes consistent mappings between the latent spaces of models of differing scales, achieving behavior-level alignment without parameter sharing. This work is the first to identify latent semantic alignment as the fundamental prerequisite for cross-scale knowledge transfer, effectively circumventing the neural incompatibility bottleneck. The method yields significant improvements over state-of-the-art baselines on four benchmarks, empirically validating its efficacy and revealing the critical role of semantic alignment in determining transfer performance.
📝 Abstract
Large Language Models (LLMs) encode vast amounts of knowledge in their massive parameters, which can be located, traced, and analyzed. Despite advances in neural interpretability, it remains unclear how to transfer knowledge in a fine-grained manner, namely parametric knowledge transfer (PKT). A key problem is enabling effective and efficient knowledge transfer across LLMs of different scales, which is essential for greater flexibility and broader applicability in transferring knowledge between LLMs. Owing to neural incompatibility, i.e., the architectural and parametric differences between LLMs of varying scales, existing methods that directly reuse layer parameters are severely limited. In this paper, we identify semantic alignment in latent space as the fundamental prerequisite for cross-scale knowledge transfer between LLMs. Instead of directly using layer parameters, our approach takes activations as the medium of layer-wise knowledge transfer. By leveraging the semantics in latent space, our approach is simple, outperforms prior work, and better aligns model behaviors across varying scales. Evaluations on four benchmarks demonstrate the efficacy of our method. Further analysis reveals the key factors that ease cross-scale knowledge transfer and provides insights into the nature of latent semantic alignment.