PepTriX: A Framework for Explainable Peptide Analysis through Protein Language Models

📅 2025-11-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Traditional peptide classification relies on handcrafted encodings of 1D sequences, which generalize poorly across tasks and datasets. Protein language models (PLMs) achieve strong predictive performance, but they are costly to fine-tune, yield opaque representations, and are often wrapped in frameworks built for a single type of classification task. To address these limitations, we propose PepTriX, a unified framework that combines PLM-derived 1D sequence embeddings with 3D structural features through a graph attention network, enhanced with contrastive training and cross-modal co-attention, so that it adapts automatically to diverse datasets. Our approach performs strongly across multiple peptide classification tasks, including toxicity and HIV inhibition prediction, while preserving biological interpretability. Evaluation by domain experts indicates that the model highlights the structural and biophysical motifs that drive its predictions, which improves the biological traceability of its outputs.

📝 Abstract

Peptide classification tasks, such as predicting toxicity and HIV inhibition, are fundamental to bioinformatics and drug discovery. Traditional approaches rely heavily on handcrafted encodings of one-dimensional (1D) peptide sequences, which can limit generalizability across tasks and datasets. Recently, protein language models (PLMs), such as ESM-2 and ESMFold, have demonstrated strong predictive performance. However, they face two critical challenges. First, fine-tuning is computationally costly. Second, their complex latent representations hinder interpretability for domain experts. Additionally, many frameworks have been developed for specific types of peptide classification and do not generalize beyond them. These limitations restrict the ability to connect model predictions to biologically relevant motifs and structural properties. To address them, we present PepTriX, a novel framework that integrates 1D sequence embeddings and three-dimensional (3D) structural features via a graph attention network enhanced with contrastive training and cross-modal co-attention. PepTriX automatically adapts to diverse datasets, producing task-specific peptide vectors while retaining biological plausibility. In an evaluation that included domain experts, we found that PepTriX performs well across multiple peptide classification tasks and provides interpretable insights into the structural and biophysical motifs that drive predictions. Thus, PepTriX offers both predictive robustness and interpretable validation, bridging the gap between performance-driven protein language models (PLMs) and domain-level understanding in peptide research.
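
The abstract mentions contrastive training between the sequence and structure views but does not state the exact objective. As a rough illustration only, the sketch below implements a generic InfoNCE-style loss in plain Python: each peptide's sequence embedding should be more similar to its own structure embedding than to those of other peptides in the batch. The function names, the cosine similarity, and the temperature of 0.1 are assumptions made for this example, not details taken from PepTriX.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(seq_embs, struct_embs, temperature=0.1):
    """Mean cross-entropy of matching sequence i to structure i.

    seq_embs[i] and struct_embs[i] are assumed to describe the same peptide;
    every other structure embedding in the batch acts as a negative.
    """
    n = len(seq_embs)
    total = 0.0
    for i in range(n):
        logits = [cosine(seq_embs[i], struct_embs[j]) / temperature
                  for j in range(n)]
        # Numerically stable log-sum-exp over the whole batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]
    return total / n


if __name__ == "__main__":
    # Toy embeddings (hypothetical values, for illustration only).
    sequences = [[1.0, 0.0], [0.0, 1.0]]
    structures = [[0.9, 0.1], [0.2, 0.8]]
    print(info_nce(sequences, structures))
```

The loss approaches zero when each pair is well aligned and grows when a sequence embedding sits closer to another peptide's structure embedding, which is the behavior a contrastive objective of this kind is meant to encourage.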

Problem

Research questions and friction points this paper is trying to address.

Overcoming computational costs and interpretability limitations in protein language models

Addressing generalization issues in peptide classification across diverse datasets

Connecting model predictions to biologically relevant structural motifs and properties

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates 1D sequence embeddings and 3D structural features

Uses a graph attention network enhanced with contrastive training and cross-modal co-attention

Automatically adapts to diverse datasets, producing task-specific peptide vectors that remain biologically plausible
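
To make the idea of attention over 3D structure more concrete, here is a minimal sketch in plain Python. It is not the PepTriX architecture: it builds a residue contact graph from coordinates and applies a simplified, parameter-free dot-product attention restricted to graph neighbors, whereas a real graph attention layer learns its projection and scoring weights. The 8.0 distance cutoff, the function names, and the toy inputs are all assumptions made for illustration.

```python
import math


def contact_graph(coords, cutoff=8.0):
    """Neighbor lists: residues i and j are linked if within `cutoff`.

    Each residue is its own neighbor (distance 0), so no list is empty.
    """
    n = len(coords)
    return [[j for j in range(n) if math.dist(coords[i], coords[j]) <= cutoff]
            for i in range(n)]


def softmax(scores):
    """Numerically stable softmax over a non-empty list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, neighbors):
    """One round of neighbor-restricted dot-product attention.

    Returns the updated per-residue features and, for each residue, a dict
    mapping neighbor index to attention weight.
    """
    updated, weights = [], []
    for i, nbrs in enumerate(neighbors):
        scores = [sum(a * b for a, b in zip(features[i], features[j]))
                  for j in nbrs]
        w = softmax(scores)
        dim = len(features[i])
        updated.append([sum(w[k] * features[j][d] for k, j in enumerate(nbrs))
                        for d in range(dim)])
        weights.append(dict(zip(nbrs, w)))
    return updated, weights


if __name__ == "__main__":
    # Hypothetical coordinates and per-residue features.
    coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    new_feats, attn = attend(feats, contact_graph(coords))
    print(attn)
```

Because the attention weights are explicit per-residue values, they can be inspected after the fact, which is one common way structure-aware models are related back to motifs that domain experts recognize.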

🔎 Similar Papers

ProtChatGPT: Towards Understanding Proteins with Large Language Models