🤖 AI Summary
This work addresses the lack of theoretical grounding for the generalization performance of Graph Convolutional Networks (GCNs) with jumping connections under layer-wise graph sparsification. It characterizes, through learning-dynamics and generalization-error analysis, how jumping connections modulate the sparsification each layer can tolerate. The analysis centers on a sparse effective adjacency matrix ($A^*$) that preserves the edges supporting meaningful message passing, and it uncovers significant inter-layer heterogeneity in sensitivity to sparsification: in a two-hidden-layer GCN, the first layer's deviation from $A^*$ affects the generalization error more than the second layer's. The authors prove that when the sparsified matrices retain the essential edges of $A^*$, the learned model's generalization accuracy closely approaches the best achievable within a broad class of target functions. Experiments on deep-GCN benchmark datasets validate these theoretical findings. The core contribution is the first unified theoretical framework that jointly models jumping connections and layer-wise sparsification, filling a fundamental gap in generalization guarantees for their coupled deployment.
📝 Abstract
Jumping connections enable Graph Convolutional Networks (GCNs) to overcome over-smoothing, while graph sparsification reduces computational demands by selecting a sub-matrix of the graph adjacency matrix during neighborhood aggregation. Learning GCNs with graph sparsification has shown empirical success across various applications, but a theoretical understanding of the generalization guarantees remains limited, with existing analyses ignoring either graph sparsification or jumping connections. This paper presents the first learning-dynamics and generalization analysis of GCNs with jumping connections under graph sparsification. Our analysis demonstrates that the generalization accuracy of the learned model closely approximates the highest achievable accuracy within a broad class of target functions dependent on the proposed sparse effective adjacency matrix $A^*$. Thus, graph sparsification maintains generalization performance when $A^*$ preserves the essential edges that support meaningful message propagation. We further reveal that jumping connections induce different sparsification requirements across layers: in a two-hidden-layer GCN, generalization is affected more by the first layer's deviation of the sparsified matrix from $A^*$ than by the second layer's. To the best of our knowledge, this is the first theoretical characterization of the role of jumping connections in sparsification requirements. We validate our theoretical results on benchmark datasets for deep GCNs.
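To make the setting concrete, here is a minimal NumPy sketch of the architecture the abstract describes: a two-hidden-layer GCN where each layer aggregates with its own (possibly sparsified) adjacency matrix, and a jumping connection concatenates both layers' representations before the output. All names (`sparsify`, `gcn_forward`) and the random edge-dropping rule are illustrative assumptions; the paper's effective matrix $A^*$ is defined by which edges support meaningful message passing, not by uniform random sampling.

```python
import numpy as np

def sparsify(A, keep_ratio, rng):
    """Illustrative layer-wise sparsifier: randomly keep a fraction of edges.

    Stand-in only -- the paper's A^* preserves *essential* edges, whereas
    this drops edges uniformly at random. Self-loops are always kept.
    """
    mask = rng.random(A.shape) < keep_ratio
    mask = np.triu(mask, 1)
    mask = mask | mask.T          # keep the mask symmetric (undirected graph)
    np.fill_diagonal(mask, True)  # never drop self-loops
    return A * mask

def gcn_forward(X, A1, A2, W1, W2, W_out):
    """Two-hidden-layer GCN with a jumping connection.

    Each hidden layer uses its own sparsified adjacency matrix (A1, A2),
    matching the paper's layer-wise sparsification setting; the jumping
    connection concatenates both hidden representations.
    """
    H1 = np.maximum(A1 @ X @ W1, 0.0)     # layer 1: aggregate, then ReLU
    H2 = np.maximum(A2 @ H1 @ W2, 0.0)    # layer 2: aggregate, then ReLU
    H = np.concatenate([H1, H2], axis=1)  # jumping connection (concatenation)
    return H @ W_out
```

Under the paper's finding, one would sparsify `A1` more conservatively (closer to $A^*$) than `A2`, since first-layer deviations dominate the generalization error.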