🤖 AI Summary
Graph Transformers have limited capacity to encode graph structure directly and depend heavily on auxiliary encodings to capture topological priors. Method: This paper proposes Motif Structural Encoding (MoSE), a structure-aware encoding scheme grounded in graph homomorphism counting, formalizing homomorphism counts as a general-purpose structural encoding framework. Theoretically, MoSE is shown to be more expressive than random-walk structural encoding (RWSE), and its expressive power is related to that of standard message-passing neural networks. It introduces no additional trainable parameters and is plug-and-play compatible with diverse Graph Transformer architectures. Results: Experiments on multiple graph benchmarks and molecular property prediction tasks (e.g., QM9, ZINC) show that MoSE consistently outperforms well-known positional and structural encodings, including Laplacian positional encoding and RWSE, and achieves state-of-the-art performance on a widely studied molecular property prediction dataset.
📝 Abstract
Graph Transformers are popular neural networks that extend the well-known Transformer architecture to the graph domain. These architectures operate by applying self-attention on graph nodes and incorporating graph structure through the use of positional encodings (e.g., Laplacian positional encoding) or structural encodings (e.g., random-walk structural encoding). The quality of such encodings is critical, since they provide the necessary $\textit{graph inductive biases}$ to condition the model on graph structure. In this work, we propose $\textit{motif structural encoding}$ (MoSE) as a flexible and powerful structural encoding framework based on counting graph homomorphisms. Theoretically, we compare the expressive power of MoSE to random-walk structural encoding and relate both encodings to the expressive power of standard message passing neural networks. Empirically, we observe that MoSE outperforms other well-known positional and structural encodings across a range of architectures, and it achieves state-of-the-art performance on a widely studied molecular property prediction dataset.
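To make the two encoding families concrete, here is a minimal sketch (not the paper's implementation) contrasting random-walk structural encoding, which records a node's return probabilities under powers of the random-walk matrix $D^{-1}A$, with a simple homomorphism-count-based encoding: the number of homomorphisms from a cycle $C_k$ into the graph rooted at node $v$ equals the closed-walk count $(A^k)_{vv}$. Cycles are only one motif family; MoSE as proposed is more general, and the toy graph below is purely illustrative.

```python
import numpy as np

# Toy graph: a triangle (0-1-2) with a pendant node 3 attached to node 2.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
n = 4
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

# RWSE: diagonal entries of powers of the random-walk matrix D^{-1} A,
# i.e., the probability of returning to the start node after k steps.
D_inv = np.diag(1.0 / A.sum(axis=1))
RW = D_inv @ A
rwse = np.stack(
    [np.linalg.matrix_power(RW, k).diagonal() for k in range(1, 5)], axis=1
)

# Homomorphism-count encoding (cycle motifs only, for illustration):
# rooted homomorphism counts of the cycle C_k equal closed-walk counts (A^k)_vv.
hom_cycles = np.stack(
    [np.linalg.matrix_power(A, k).diagonal() for k in range(1, 5)], axis=1
)

print(rwse)        # row v: node v's return probabilities at steps 1..4
print(hom_cycles)  # row v: node v's rooted cycle counts for C_1..C_4
```

Note that for $k=2$ the cycle counts recover node degrees, and the pendant node 3 has zero triangle ($C_3$) count, so the two feature matrices separate nodes that plain degree information alone would not.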