🤖 AI Summary
To address the performance limitations of shallow graph neural networks (GNNs) in multi-task learning, particularly in teacher-free settings where no pre-trained teacher model is available, this paper proposes a Teacher-Free Mutual Learning (TFML) framework. TFML trains multiple homogeneous shallow GNNs collaboratively, using an adaptive logit-weighting module to dynamically modulate the intensity of knowledge exchange between models and an entropy-regularized mechanism to improve prediction confidence and generalization. Unlike conventional knowledge distillation, TFML does not rely on a fixed teacher model; instead, it achieves bidirectional knowledge transfer through ensemble-based mutual supervision. Extensive experiments on three node classification and three graph classification benchmarks demonstrate that TFML significantly improves both the accuracy and robustness of shallow GNNs. Notably, it achieves superior performance in joint multi-task learning scenarios, outperforming standard baselines and teacher-dependent distillation methods.
📝 Abstract
Knowledge distillation (KD) techniques have emerged as a powerful tool for transferring expertise from complex teacher models to lightweight student models, which is particularly beneficial for deploying high-performance models on resource-constrained devices. This approach has been successfully applied to graph neural networks (GNNs), harnessing their expressive capabilities to generate node embeddings that capture both structural and feature information. In this study, we depart from the conventional KD approach by exploring the potential of collaborative learning among GNNs. We show that, in the absence of a pre-trained teacher model, relatively simple and shallow GNN architectures can synergistically learn efficient models that perform better at inference time, particularly when tackling multiple tasks. We propose a collaborative learning framework in which ensembles of student GNNs mutually teach each other throughout the training process. We introduce an adaptive logit weighting unit to facilitate efficient knowledge exchange among models and an entropy enhancement technique to improve mutual learning. These components enable the models to dynamically adapt their learning strategies during training, optimizing their performance on downstream tasks. Extensive experiments conducted on three datasets each for node and graph classification demonstrate the effectiveness of our approach.
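To make the mutual-learning objective concrete, below is a minimal NumPy sketch of one loss computation for an ensemble of peer students: each student combines a supervised cross-entropy term, a KL term pulling it toward its peers (mutual supervision), and an entropy term encouraging confident predictions. The confidence-based peer weighting (`softmax` over negative mean entropies) and the coefficients `alpha`/`beta` are illustrative assumptions standing in for the paper's adaptive logit weighting unit and entropy enhancement technique, whose exact forms are not specified here.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy(p, eps=1e-12):
    # Shannon entropy of each row of a probability matrix
    return -np.sum(p * np.log(p + eps), axis=-1)

def kl_div(p, q, eps=1e-12):
    # KL(p || q) per row
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def mutual_learning_losses(logits_list, labels, alpha=1.0, beta=0.1):
    """Per-student losses for one step of teacher-free mutual learning.

    Each student minimizes: cross-entropy on ground truth
      + alpha * peer-weighted KL toward the other students
      + beta  * its own prediction entropy (confidence regularizer).
    Peer weights are a softmax over peer confidences (negative mean
    entropy) -- a hypothetical stand-in for adaptive logit weighting.
    """
    probs = [softmax(l) for l in logits_list]
    idx = np.arange(len(labels))
    # peer confidence: lower mean entropy -> larger mutual-learning weight
    conf = np.array([-entropy(p).mean() for p in probs])
    losses = []
    for i, p_i in enumerate(probs):
        ce = -np.log(p_i[idx, labels] + 1e-12).mean()
        peers = [j for j in range(len(probs)) if j != i]
        w = softmax(conf[peers][None, :])[0]          # adaptive weights
        mutual = sum(w[k] * kl_div(probs[j], p_i).mean()
                     for k, j in enumerate(peers))
        ent = entropy(p_i).mean()                     # entropy regularizer
        losses.append(ce + alpha * mutual + beta * ent)
    return losses
```

In a full training loop each student would backpropagate its own loss, so knowledge flows bidirectionally rather than from a fixed teacher; the GNN encoders producing `logits_list` are omitted here for brevity.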