AI Summary
This study addresses the challenge of learning biologically meaningful sequence representations from protein-protein interaction networks to support relational reasoning and functional analogy tasks. To this end, we apply Event2Vec, an additive sequence model that enforces strictly compositional structure in its embeddings. Trained on random-walk sequences derived from the human STRING network, Event2Vec substantially outperforms the non-compositional DeepWalk baseline trained on the same walks: pathway coherence reaches 30.2× above random (vs 2.9× for DeepWalk), and mean functional analogy similarity reaches 0.966 (vs 0.650). It also captures the hierarchical organization of biological pathways more effectively, although geometric properties such as norm-degree anticorrelation are matched or exceeded by the baseline. Together, these results indicate that compositional representations are most useful for relational and compositional reasoning tasks in biological network analysis.
Abstract
In this work, we study whether enforcing strict compositional structure in sequence embeddings yields meaningful geometric organization when applied to protein-protein interaction networks. Using Event2Vec, an additive sequence embedding model, we train 64-dimensional representations on random walks from the human STRING interactome, and compare against a DeepWalk baseline based on Word2Vec, trained on the same walks. We find that compositional structure substantially improves pathway coherence (30.2$\times$ vs 2.9$\times$ above random), functional analogy accuracy (mean similarity 0.966 vs 0.650), and hierarchical pathway organization, while geometric properties such as norm--degree anticorrelation are shared with or exceeded by the non-compositional baseline. These results indicate that enforced compositionality specifically benefits relational and compositional reasoning tasks in biological networks.
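To make the notion of additive composition concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a walk is represented by the sum of its node vectors and that an analogy "a is to b as c is to d" is scored by the cosine similarity between `b - a + c` and `d`; the abstract does not specify how analogy similarity is computed, so that scoring rule is an assumption. The function names, the placeholder protein labels (`P1` to `P4`), and the random 64-dimensional vectors are all hypothetical and stand in for trained embeddings.

```python
import numpy as np


def compose(embeddings, walk):
    """Additive composition (assumed): a walk's vector is the sum of its node vectors."""
    return np.sum([embeddings[node] for node in walk], axis=0)


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def analogy_similarity(embeddings, a, b, c, d):
    """Score 'a is to b as c is to d' as cos(b - a + c, d) (assumed scoring rule)."""
    predicted = embeddings[b] - embeddings[a] + embeddings[c]
    return cosine(predicted, embeddings[d])


# Toy data: random 64-dimensional vectors for hypothetical proteins, not trained embeddings.
rng = np.random.default_rng(0)
toy = {name: rng.normal(size=64) for name in ["P1", "P2", "P3"]}

# Construct P4 so that the analogy P1:P2 :: P3:P4 holds exactly in this toy space.
toy["P4"] = toy["P2"] - toy["P1"] + toy["P3"]

walk_vec = compose(toy, ["P1", "P2", "P3"])
score = analogy_similarity(toy, "P1", "P2", "P3", "P4")
```

Because the composition here is a plain sum, the walk vector in this sketch does not depend on the order of the nodes, and the analogy score is close to 1 by construction. With trained embeddings, the score would instead measure how well the learned geometry supports the analogy.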