🤖 AI Summary
Training Graph Neural Networks (GNNs) on large-scale multimodal graphs is computationally inefficient, and existing graph condensation methods fall short in this setting for two reasons: gradient conflicts caused by inter-modal semantic misalignment, and pathological amplification of gradient noise by the graph structure. This paper proposes a Structurally-Regularized Gradient Matching framework that jointly models both gradient conflict and structural noise amplification. Multimodal gradients are orthogonally projected to decouple modality-specific updates, while a Dirichlet-energy-based structural damping regularizer enforces explicit smoothness control over the gradient field, turning the graph topology into an optimization stability constraint. Experiments demonstrate significantly faster convergence and higher accuracy, strong generalization across diverse GNN architectures, seamless integration with downstream tasks such as neural architecture search, and excellent scalability in resource-constrained settings.
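The summary's "orthogonal projection" step for resolving inter-modal gradient conflicts can be illustrated with a minimal sketch. This is not the paper's implementation; it follows the common PCGrad-style recipe (assumed here): when two modality gradients point in conflicting directions (negative inner product), the conflicting component of one is projected out against the other.

```python
import numpy as np

def project_conflicting(g_a: np.ndarray, g_b: np.ndarray) -> np.ndarray:
    """Resolve a gradient conflict between two modalities.

    If g_a and g_b conflict (negative inner product), remove from g_a
    its component along g_b, leaving an update orthogonal to g_b.
    Otherwise g_a is returned unchanged.
    """
    dot = np.dot(g_a, g_b)
    if dot < 0:
        # Subtract the projection of g_a onto g_b.
        g_a = g_a - dot / (np.dot(g_b, g_b) + 1e-12) * g_b
    return g_a

# Conflicting gradients: the projected update no longer opposes g_b.
g_text = np.array([-1.0, 1.0])   # hypothetical text-modality gradient
g_image = np.array([1.0, 0.0])   # hypothetical image-modality gradient
g_text_proj = project_conflicting(g_text, g_image)  # -> [0., 1.]
```

After projection, the text-modality update is orthogonal to the image-modality gradient, so applying it cannot increase the image branch's loss to first order.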
📝 Abstract
In critical web applications such as e-commerce and recommendation systems, multimodal graphs integrating rich visual and textual attributes are increasingly central, yet their large scale introduces substantial computational burdens for training Graph Neural Networks (GNNs). While Graph Condensation (GC) offers a promising solution by synthesizing smaller datasets, existing methods falter in the multimodal setting. We identify a dual challenge causing this failure: (1) conflicting gradients arising from semantic misalignments between modalities, and (2) the GNN's message-passing architecture pathologically amplifying this gradient noise across the graph structure. To address this, we propose Structurally-Regularized Gradient Matching (SR-GM), a novel condensation framework tailored for multimodal graphs. SR-GM introduces two synergistic components: first, a gradient decoupling mechanism that resolves inter-modality conflicts at their source via orthogonal projection; and second, a structural damping regularizer that acts directly on the gradient field. By leveraging the graph's Dirichlet energy, this regularizer transforms the topology from a noise amplifier into a stabilizing force during optimization. Extensive experiments demonstrate that SR-GM significantly improves accuracy and accelerates convergence compared to baseline methods. Ablation studies confirm that addressing both gradient conflict and structural amplification in tandem is essential for achieving superior performance. Moreover, the condensed multimodal graphs exhibit strong cross-architecture generalization and promise to accelerate applications like Neural Architecture Search. This research provides a scalable methodology for multimodal graph-based learning in resource-constrained environments.
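The Dirichlet energy that the structural damping regularizer leverages has a standard form, sketched below. This is an illustrative computation of the quantity itself, not the paper's regularizer: for node signals $X$ on a graph with Laplacian $L = D - A$, the energy $\mathrm{tr}(X^\top L X) = \tfrac{1}{2}\sum_{i,j} A_{ij}\lVert x_i - x_j\rVert^2$ measures how non-smooth the signals are across edges, so penalizing it damps structure-amplified noise.

```python
import numpy as np

def dirichlet_energy(X: np.ndarray, A: np.ndarray) -> float:
    """Dirichlet energy tr(X^T L X) of node signals X over a graph.

    A is a symmetric (weighted) adjacency matrix; L = D - A is the
    unnormalized graph Laplacian. The energy is zero iff X is constant
    on each connected component, and grows as X varies across edges.
    """
    D = np.diag(A.sum(axis=1))   # degree matrix
    L = D - A                    # unnormalized Laplacian
    return float(np.trace(X.T @ L @ X))

# Two nodes joined by one edge: a constant signal has zero energy,
# a signal differing by 1 across the edge has energy 1.
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])
dirichlet_energy(np.array([[0.0], [1.0]]), A)  # -> 1.0
dirichlet_energy(np.array([[2.0], [2.0]]), A)  # -> 0.0
```

Used as a penalty on a gradient field rather than on node features, this term discourages sharp gradient disagreements between neighboring nodes, which is how the topology can act as a stabilizer instead of a noise amplifier.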