Structure-Informed Multiple Sequence Alignment: A Formal Model and Hardness Results

📅 2026-06-01
🤖 AI Summary
This study addresses the challenge of integrating structural information, such as contact maps, into multiple sequence alignment (MSA) within a formal model that is amenable to complexity analysis. To this end, we propose MSA-S, a structure-informed MSA model that abstracts sequences as strings and structural information as designated position-pairs. It combines a fixed pairwise string score (a fixed symbol-pair scoring rule with affine gap penalties) with a binary overlap score on the designated position-pairs, yielding a fixed-score, integer-valued optimization model that we analyze with tools from computational complexity theory. Specifically, we prove that the decision version, MSA-S-DEC, is NP-complete for a broad class of fixed pairwise string scoring schemes, and that NP-hardness persists even when every designated position-pair set is nonempty and the overlap threshold is strictly positive. For the scalarized optimization variant MSA-S-OPT(λ), with any fixed rational λ ≥ 1 and the canonical unit symbol-pair scoring scheme, we show that no polynomial-time approximation scheme (PTAS) exists, even for aligning just two sequences, unless P = NP.
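The abstraction of structure as position-pairs can be illustrated with a minimal sketch. The paper takes the designated position-pairs as given input; deriving them from 3D coordinates, as below, is an assumption based on a common contact-map convention (a distance cutoff such as 8 Å between representative atoms, plus a minimum sequence separation). The function name `contact_pairs` and its `cutoff` and `min_separation` parameters are hypothetical.

```python
import math
from typing import Sequence, Set, Tuple


def contact_pairs(coords: Sequence[Tuple[float, float, float]],
                  cutoff: float = 8.0,
                  min_separation: int = 3) -> Set[Tuple[int, int]]:
    """Return designated position-pairs (a, b), a < b, for residues in contact.

    ASSUMPTION (not from the paper): a pair counts as a contact when the
    Euclidean distance between the two residue coordinates is strictly below
    `cutoff` and the residues are at least `min_separation` positions apart.
    """
    pairs: Set[Tuple[int, int]] = set()
    for a in range(len(coords)):
        # Skip near neighbours in sequence; they are trivially close in space.
        for b in range(a + min_separation, len(coords)):
            if math.dist(coords[a], coords[b]) < cutoff:
                pairs.add((a, b))
    return pairs


# Toy five-residue chain that folds back on itself.
toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
       (7.6, 3.8, 0.0), (3.8, 3.8, 0.0)]
print(sorted(contact_pairs(toy)))  # [(0, 4), (1, 4)]
```

The resulting set of index pairs is exactly the kind of input the model treats abstractly: it no longer carries coordinates, only which positions are designated as paired.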
📝 Abstract
We formulate a structure-informed multiple sequence alignment problem, denoted MSA-S. The model abstracts biological sequences as strings and structural information as designated position-pairs. It augments a fixed pairwise string score, defined by a fixed non-gap symbol-pair scoring rule and fixed affine gap penalties, with a binary overlap score on designated position-pairs, which can be interpreted as a contact-map overlap score in structural applications. This yields a fixed-score, integer-valued optimization model suitable for complexity-theoretic analysis. Under this formulation, we show that the decision problem MSA-S-DEC is NP-complete for a broad class of fixed pairwise string scoring schemes. We also show that NP-hardness persists even under the restriction that every designated position-pair set is nonempty and the pair-overlap threshold is strictly positive. For the associated scalarized optimization problem MSA-S-OPT(λ) with any fixed rational constant λ ≥ 1, we further show that, under the canonical unit scheme for the non-gap symbol-pair scoring rule, MSA-S-OPT(λ) admits no polynomial-time approximation scheme (PTAS) even for two input strings (k = 2), unless P = NP. These results establish a formal complexity-theoretic baseline for structure-informed multiple sequence alignment.
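To make the scalarized objective concrete for the two-string case (k = 2), here is a minimal Python sketch that scores a pairwise alignment and finds the best one by exhaustive search. Several details are assumptions rather than the paper's definitions: match = 1 and mismatch = 0 stand in for the canonical unit scheme; each maximal gap run of length L costs gap_open + gap_extend × (L − 1), with hypothetical values 2 and 1; and the scalarization is taken to be string score plus λ times the number of conserved position-pairs. All function names are hypothetical.

```python
from itertools import groupby
from typing import Iterator, List, Optional, Set, Tuple

# An alignment column: (index into s or None, index into t or None).
Column = Tuple[Optional[int], Optional[int]]
Pairs = Set[Tuple[int, int]]  # designated position-pairs (a, b) with a < b


def alignments(m: int, n: int, i: int = 0, j: int = 0) -> Iterator[List[Column]]:
    """Enumerate every pairwise alignment of strings of lengths m and n."""
    if i == m and j == n:
        yield []
        return
    if i < m and j < n:  # align s[i] with t[j]
        for rest in alignments(m, n, i + 1, j + 1):
            yield [(i, j)] + rest
    if i < m:  # s[i] against a gap
        for rest in alignments(m, n, i + 1, j):
            yield [(i, None)] + rest
    if j < n:  # gap against t[j]
        for rest in alignments(m, n, i, j + 1):
            yield [(None, j)] + rest


def gap_penalty(is_gap: List[bool], gap_open: int, gap_extend: int) -> int:
    """ASSUMED affine rule: a maximal run of L gaps costs gap_open + gap_extend * (L - 1)."""
    return sum(gap_open + gap_extend * (len(list(run)) - 1)
               for gap, run in groupby(is_gap) if gap)


def string_score(s: str, t: str, aln: List[Column], match: int = 1, mismatch: int = 0,
                 gap_open: int = 2, gap_extend: int = 1) -> int:
    """Symbol-pair score minus affine gap penalties in each row (ASSUMED values)."""
    pair_score = sum(match if s[i] == t[j] else mismatch
                     for i, j in aln if i is not None and j is not None)
    gaps_s = gap_penalty([i is None for i, _ in aln], gap_open, gap_extend)
    gaps_t = gap_penalty([j is None for _, j in aln], gap_open, gap_extend)
    return pair_score - gaps_s - gaps_t


def overlap(aln: List[Column], pairs_s: Pairs, pairs_t: Pairs) -> int:
    """Binary overlap: count pairs (a, b) of s whose aligned partners form a pair of t."""
    mapping = {i: j for i, j in aln if i is not None and j is not None}
    return sum(1 for a, b in pairs_s
               if a in mapping and b in mapping and (mapping[a], mapping[b]) in pairs_t)


def best_alignment(s: str, t: str, pairs_s: Pairs, pairs_t: Pairs, lam: int = 1):
    """Exhaustively maximize string_score + lam * overlap (ASSUMED scalarization)."""
    best = None
    for aln in alignments(len(s), len(t)):
        value = string_score(s, t, aln) + lam * overlap(aln, pairs_s, pairs_t)
        if best is None or value > best[0]:  # keep the first optimum found
            best = (value, aln)
    return best


print(best_alignment("ACGU", "ACGU", {(0, 3)}, {(0, 3)}))
# (5, [(0, 0), (1, 1), (2, 2), (3, 3)])
```

Here the gap-free identity alignment scores four matches plus one conserved position-pair. The number of candidate alignments grows as the Delannoy numbers (13 for two strings of length 2, 63 for length 3), and because the overlap term couples non-adjacent columns, the usual column-by-column dynamic program for affine-gap alignment does not directly cover it. The sketch is therefore only a reference point for tiny instances, not the method analyzed in the paper.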
Problem

Research questions and open challenges this paper addresses.

multiple sequence alignment
structural information
contact map
NP-completeness
optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

structure-informed multiple sequence alignment
NP-completeness
contact-map overlap
computational complexity
approximation hardness