🤖 AI Summary
Problem: Modeling orthology relationships becomes challenging when gene evolutionary histories follow phylogenetic networks rather than trees, owing to reticulate events such as horizontal gene transfer or hybridization.
Method: We characterize the structural constraints that phylogenetic networks impose on orthology graphs, focusing on level-1 networks. Using modular decomposition, we establish a necessary and sufficient condition for an orthology graph to be explainable by a level-1 network: each of its primitive subgraphs must be a near-cograph, i.e., a graph that becomes a cograph after the removal of a single vertex. Building on this, we give a linear-time algorithm that recognizes level-1 explainable orthology graphs and constructs a level-1 network that explains them, if such a network exists.
Contribution/Results: We show that any orthology graph can be explained by a sufficiently complex level-k network, but that such networks lack biologically meaningful constraints. Level-1 networks, in contrast, provide a simpler explanation. This work extends tree-based models, under which orthology graphs must be cographs, and provides both a characterization and an efficient algorithm for inferring level-1 networks directly from orthology graphs.
📝 Abstract
Orthologous genes, which arise through speciation, play a key role in comparative genomics and functional inference. In particular, graph-based methods allow for the inference of orthology estimates without prior knowledge of the underlying gene or species trees. This results in orthology graphs, where each vertex represents a gene, and an edge exists between two vertices if the corresponding genes are estimated to be orthologs. Orthology graphs inferred under a tree-like evolutionary model must be cographs. However, real-world data often deviate from this property, either due to noise in the data, errors in inference methods or, simply, because evolution follows a network-like rather than a tree-like process. The latter, in particular, raises the question of whether and how orthology graphs can be derived from or, equivalently, are explained by phylogenetic networks. Here, we study the constraints imposed on orthology graphs when the underlying evolutionary history follows a phylogenetic network instead of a tree. We show that any orthology graph can be represented by a sufficiently complex level-k network. However, such networks lack biologically meaningful constraints. In contrast, level-1 networks provide a simpler explanation, and we establish characterizations for level-1 explainable orthology graphs, i.e., those derived from level-1 evolutionary histories. To this end, we employ modular decomposition, a classical technique for studying graph structures. Specifically, an arbitrary graph is level-1 explainable if and only if each primitive subgraph is a near-cograph (a graph in which the removal of a single vertex results in a cograph). Additionally, we present a linear-time algorithm to recognize level-1 explainable orthology graphs and to construct a level-1 network that explains them, if such a network exists.
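To make the near-cograph definition above concrete, here is a minimal Python sketch that tests whether a small graph is a cograph and whether it is a near-cograph. It is a brute-force illustration under stated assumptions, not the linear-time algorithm from the paper: it does not compute a modular decomposition, does not construct a network, and runs in polynomial rather than linear time. The function names (`from_edges`, `is_cograph`, `is_near_cograph`) and the adjacency-set representation are hypothetical choices made for this example. The cograph test relies on the standard fact that a graph is a cograph exactly when every induced subgraph with at least two vertices is disconnected or has a disconnected complement.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]  # vertex -> set of neighbours


def from_edges(vertices: Iterable[Hashable],
               edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Build an undirected simple graph as an adjacency-set dictionary."""
    g: Graph = {v: set() for v in vertices}
    for u, w in edges:
        g[u].add(w)
        g[w].add(u)
    return g


def induced(g: Graph, keep: Set[Hashable]) -> Graph:
    """Subgraph of g induced by the vertex set `keep`."""
    return {v: g[v] & keep for v in keep}


def complement(g: Graph) -> Graph:
    """Complement graph on the same vertex set (no self-loops)."""
    vs = set(g)
    return {v: vs - g[v] - {v} for v in g}


def components(g: Graph) -> List[Set[Hashable]]:
    """Connected components, found by iterative depth-first search."""
    seen: Set[Hashable] = set()
    comps: List[Set[Hashable]] = []
    for start in g:
        if start in seen:
            continue
        comp = {start}
        stack = [start]
        while stack:
            u = stack.pop()
            for w in g[u]:
                if w not in comp:
                    comp.add(w)
                    stack.append(w)
        seen |= comp
        comps.append(comp)
    return comps


def is_cograph(g: Graph) -> bool:
    """True if g is a cograph (equivalently, contains no induced P4)."""
    if len(g) <= 1:
        return True
    parts = components(g)
    if len(parts) == 1:
        # g is connected, so its complement must split for g to be a cograph.
        parts = components(complement(g))
        if len(parts) == 1:
            return False
    return all(is_cograph(induced(g, part)) for part in parts)


def is_near_cograph(g: Graph) -> bool:
    """True if deleting some single vertex of g leaves a cograph."""
    vs = set(g)
    return any(is_cograph(induced(g, vs - {v})) for v in g)


if __name__ == "__main__":
    # Path on four vertices (P4): not a cograph, but deleting an end vertex
    # leaves a three-vertex path, which is a cograph.
    p4 = from_edges("abcd", [("a", "b"), ("b", "c"), ("c", "d")])
    print(is_cograph(p4), is_near_cograph(p4))   # False True

    # Cycle on five vertices (C5): deleting any vertex leaves a P4.
    c5 = from_edges(range(5), [(i, (i + 1) % 5) for i in range(5)])
    print(is_cograph(c5), is_near_cograph(c5))   # False False
```

Under the characterization stated in the abstract, such a check would have to hold for each primitive subgraph of the orthology graph. The sketch only tests a single given graph and leaves identifying those subgraphs to a separate modular decomposition step.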