🤖 AI Summary
This work addresses a critical limitation of existing knowledge distillation methods for time series: they align student outputs with teacher predictions while neglecting the teacher's internal reasoning process, which leaves student predictive distributions shifted away from the teacher's and hard to interpret. To overcome this, we propose Temporal Saliency Distillation, a novel approach that extracts, for the first time, the contribution of each time step to the final prediction (i.e., temporal saliency) directly from the teacher's logits and uses it to guide the student toward the same temporal reasoning. The method requires no architectural modifications or additional parameters, moving beyond the conventional paradigm of matching only logits or intermediate features. Experiments demonstrate that the proposed approach not only improves student performance but also brings the student's predictive distribution significantly closer to the teacher's, outperforming existing baselines in both interpretability and the safety of substituting the student for the teacher.
📝 Abstract
Knowledge distillation has proven effective for model compression by transferring knowledge from a larger network (the teacher) to a smaller network (the student). Current knowledge distillation for time series predominantly relies on logit- and feature-alignment techniques originally developed for computer vision. These methods do not explicitly account for the temporal structure of the data and fall short in two key respects. First, the mechanisms by which the transferred knowledge aids the student's learning process remain unclear, since logits and features are not interpretable. Second, these methods transfer only limited knowledge, primarily replicating the teacher's predictive accuracy. As a result, student models often produce predictive distributions that differ significantly from those of their teachers, hindering their safe substitution for the teacher. In this work, we propose transferring interpretable knowledge by extending conventional logit transfer to convey not just the teacher's right prediction but also its right reasoning. Specifically, we derive additional knowledge from the teacher's logits, termed temporal saliency, which captures the importance of each input time step to the teacher's prediction. By training the student with Temporal Saliency Distillation, we encourage it to base its predictions on the same input features as the teacher. Temporal Saliency Distillation requires no additional parameters or architecture-specific assumptions. We demonstrate that it effectively improves the performance of baseline methods while also achieving desirable properties beyond predictive accuracy. We hope our work establishes a new paradigm for interpretable knowledge distillation in time series analysis.
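The abstract does not spell out how temporal saliency is computed from the logits, so the sketch below is one plausible instantiation rather than the paper's exact method. It assumes a gradient-based attribution (the magnitude of the predicted-class logit's gradient with respect to each input time step, summed over channels) as the saliency, a standard temperature-scaled KL term for logit distillation, and a KL term aligning the student's saliency map with the teacher's. The function names (`temporal_saliency`, `tsd_loss`) and the hyperparameters (`alpha`, `beta`, `tau`) are illustrative assumptions, not taken from the paper.

```python
# Minimal PyTorch sketch of a temporal-saliency distillation loss.
# ASSUMPTION: saliency here is a gradient-based attribution of the
# predicted-class logit to each input time step; the paper may define
# temporal saliency differently.
import torch
import torch.nn.functional as F


def temporal_saliency(model: torch.nn.Module, x: torch.Tensor,
                      create_graph: bool = False) -> torch.Tensor:
    """Return a (batch, time) distribution over time steps, measuring each
    step's contribution to the model's predicted-class logit.

    x has shape (batch, time, channels).
    """
    x = x.clone().requires_grad_(True)
    logits = model(x)                                       # (batch, classes)
    # Sum the top logit over the batch; per-sample grads stay independent.
    top_logit = logits.gather(1, logits.argmax(dim=1, keepdim=True)).sum()
    grads, = torch.autograd.grad(top_logit, x, create_graph=create_graph)
    saliency = grads.abs().sum(dim=-1)                      # (batch, time)
    return F.softmax(saliency, dim=-1)                      # per-step weights


def tsd_loss(teacher, student, x, y, alpha=1.0, beta=1.0, tau=2.0):
    """Cross-entropy + temperature-scaled logit KD + saliency alignment.

    alpha, beta, tau are illustrative hyperparameters, not from the paper.
    """
    with torch.no_grad():
        t_logits = teacher(x)
    s_logits = student(x)

    ce = F.cross_entropy(s_logits, y)
    kd = F.kl_div(F.log_softmax(s_logits / tau, dim=-1),
                  F.softmax(t_logits / tau, dim=-1),
                  reduction="batchmean") * tau ** 2

    # Teacher saliency is detached: it serves as a fixed target.
    t_sal = temporal_saliency(teacher, x).detach()
    # Student saliency needs create_graph=True so the alignment loss can
    # backpropagate through the gradient computation (double backprop).
    s_sal = temporal_saliency(student, x, create_graph=True)
    sal = F.kl_div(s_sal.clamp_min(1e-8).log(), t_sal, reduction="batchmean")

    return ce + alpha * kd + beta * sal
```

Note the design choice consistent with the abstract's claims: the saliency is read off the models' own logits via autograd, so no extra parameters or architecture-specific hooks are needed, and detaching the teacher's saliency map makes it a stable supervision signal for the student's temporal reasoning.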