BarcodeMamba+: Advancing State-Space Models for Fungal Biodiversity Research

📅 2025-12-17
🤖 AI Summary
Fungal DNA barcode classification faces three key challenges: label sparsity, long-tailed class distributions, and the difficulty of modeling hierarchical taxonomic structure. Together these lead conventional supervised methods to generalize poorly and to produce hierarchically inconsistent predictions. To address these, we propose BarcodeMamba+, a domain-specific state space model (SSM) for fungal barcoding trained under a pretraining and fine-tuning paradigm. During fine-tuning, our method adds three components. The first is hierarchical label smoothing, and the second is a class-weighted cross-entropy loss. The third is a multi-head hierarchical classifier adopted from MycoAI that predicts labels at six taxonomic levels: phylum, class, order, family, genus, and species. The evaluation uses a fungal classification benchmark whose taxonomic distribution is shifted from that of the training set. On it, our approach outperforms a range of existing methods at all taxonomic levels. The implementation is publicly available.
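A multi-head output layer places one classifier head per taxonomic rank on top of a shared encoder representation and combines the per-rank losses. The sketch below assumes the combined loss is a sum of per-head cross-entropies, optionally weighted per rank. The function names, the `level_weights` parameter and the plain-Python logits are illustrative assumptions, not the MycoAI or BarcodeMamba+ code.

```python
import math

# The six ranks named above; one output head is assumed per rank.
LEVELS = ["phylum", "class", "order", "family", "genus", "species"]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def multi_head_loss(head_logits, targets, level_weights=None):
    """Sum of per-rank cross-entropy losses (hypothetical helper).

    head_logits: dict mapping rank name -> list of logits from that head.
    targets: dict mapping rank name -> true class index at that rank.
    level_weights: optional dict of per-rank weights (assumed; default 1.0).
    """
    total = 0.0
    for level in LEVELS:
        w = 1.0 if level_weights is None else level_weights.get(level, 1.0)
        logp = log_softmax(head_logits[level])
        # Negative log-likelihood of the true class at this rank.
        total += -w * logp[targets[level]]
    return total
```

Suppose every head has two classes and all-zero logits. Then each head contributes log 2, so the total is 6·log 2 ≈ 4.159. Setting a rank's weight to 0 removes that head from training. This makes it easy to study how much each rank contributes.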

📝 Abstract
Accurate taxonomic classification from DNA barcodes is a cornerstone of global biodiversity monitoring. Fungi, however, present extreme challenges because labels are sparse and taxa follow a long-tailed distribution. Conventional supervised learning methods often falter in this domain. They struggle to generalize to unseen species and to capture the hierarchical nature of the data. To address these limitations, we introduce BarcodeMamba+, a foundation model for fungal barcode classification built on a powerful and efficient state-space model architecture. We employ a pretrain and fine-tune paradigm that makes use of partially labelled data. We demonstrate that this paradigm is substantially more effective than traditional fully supervised methods in this data-sparse setting. During fine-tuning, we systematically integrate and evaluate a suite of enhancements that target the challenges of fungal taxonomy. These are hierarchical label smoothing, a weighted loss function, and a multi-head output layer from MycoAI. Our experiments show that each of these components yields significant performance gains. We also evaluate on a challenging fungal classification benchmark whose taxonomic distribution is distinctly shifted from that of the broad training set. There, our final model outperforms a range of existing methods at all taxonomic levels. Our work provides a powerful new tool for genomics-based biodiversity research. It also establishes an effective, scalable training paradigm for this challenging domain. Our code is publicly available at https://github.com/bioscan-ml/BarcodeMamba.
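Hierarchical label smoothing can be read as moving the smoothing mass toward taxonomically related classes instead of spreading it uniformly. The following is a minimal sketch of one such variant. It assumes the mass `epsilon` is shared equally among the true species' siblings, meaning classes with the same parent genus. The paper's exact scheme may differ, and `hierarchical_smooth_targets` and `soft_cross_entropy` are hypothetical helpers.

```python
import math


def hierarchical_smooth_targets(true_idx, parent_of, epsilon=0.1):
    """Smoothed target distribution over classes at one rank (e.g. species).

    true_idx: index of the true class.
    parent_of: parent_of[i] is the parent (e.g. genus) of class i.
    epsilon: probability mass moved off the true class (assumed default).
    """
    n = len(parent_of)
    siblings = [
        i for i in range(n)
        if i != true_idx and parent_of[i] == parent_of[true_idx]
    ]
    target = [0.0] * n
    if not siblings:
        # No classes share the parent: fall back to a one-hot target.
        target[true_idx] = 1.0
        return target
    target[true_idx] = 1.0 - epsilon
    for i in siblings:
        # Spread the smoothing mass only over classes with the same parent.
        target[i] = epsilon / len(siblings)
    return target


def soft_cross_entropy(logits, target):
    """Cross-entropy between a soft target distribution and softmax(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(t * (x - log_z) for t, x in zip(target, logits))
```

Take `parent_of = [0, 0, 0, 1]`, which is three species in one genus and one in another, with `true_idx = 0`. The target is then approximately `[0.9, 0.05, 0.05, 0.0]`. As a result, confusing two species within a genus costs less than confusing species across genera.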
Problem

Research questions and friction points this paper is trying to address.

Classifying fungal DNA barcodes with sparse labels
Overcoming the long-tailed distribution of fungal taxa
Generalizing to species unseen during training
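For the long tail, one common choice of class weights is inverse class frequency. This sketch assumes weights of the form (1/count)^power, rescaled to mean 1 over the observed classes. It also assumes a weighted mean that divides by the sum of the per-example weights. PyTorch's `CrossEntropyLoss` uses that same convention when given `weight` and `reduction='mean'`. The weighting formula actually used in BarcodeMamba+ is not stated here, so treat `power`, the rescaling and both helper names as assumptions.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels, num_classes, power=1.0):
    """Class weights proportional to (1 / count) ** power, rescaled so the mean
    weight over classes present in `labels` is 1; absent classes get weight 0."""
    counts = Counter(labels)
    raw = [(1.0 / counts[c]) ** power if counts[c] > 0 else 0.0
           for c in range(num_classes)]
    seen = [w for w in raw if w > 0]
    mean = sum(seen) / len(seen)
    return [w / mean for w in raw]


def weighted_cross_entropy(batch_logits, batch_labels, class_weights):
    """Weighted mean of per-example cross-entropy, each example weighted by
    the weight of its true class."""
    num, den = 0.0, 0.0
    for logits, y in zip(batch_logits, batch_labels):
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        w = class_weights[y]
        num += w * (log_z - logits[y])  # w * (-log p_y)
        den += w
    if den == 0.0:
        raise ValueError("all examples in the batch have zero weight")
    return num / den
```

Suppose the labels are `[0, 0, 0, 1]`. The rare class then receives weight 1.5 and the common class 0.5. Mistakes on rare taxa therefore pull the gradient three times as hard as mistakes on common taxa. Setting `power` below 1 softens this effect.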
Innovation

Methods, ideas, or system contributions that make the work stand out.

State-space model architecture for fungal classification
Pretrain-finetune paradigm with partially labelled data
Hierarchical label smoothing and weighted-loss enhancements
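At its core, a state-space model maps an input sequence to an output through a latent state updated by a linear recurrence. The sketch shows a diagonal, time-invariant version with fixed parameters `a`, `b` and `c` acting on scalar inputs. Mamba-style models, which the BarcodeMamba name points to, also make these parameters input-dependent and compute the recurrence with a hardware-efficient parallel scan. This sketch therefore illustrates only the underlying recurrence, not the BarcodeMamba+ architecture, and `ssm_scan` is a hypothetical name.

```python
def ssm_scan(inputs, a, b, c):
    """Run a diagonal, time-invariant linear state-space model over a sequence.

    h_t = a * h_{t-1} + b * x_t   (element-wise over the N state dimensions)
    y_t = sum_i c_i * h_t[i]
    inputs: list of scalar inputs x_1..x_T.
    a, b, c: lists of length N; |a_i| < 1 keeps the recurrence stable.
    """
    n = len(a)
    h = [0.0] * n
    outputs = []
    for x in inputs:
        # Update every state dimension from its previous value and the input.
        h = [a[i] * h[i] + b[i] * x for i in range(n)]
        # Read out a scalar from the current state.
        outputs.append(sum(c[i] * h[i] for i in range(n)))
    return outputs
```

Take `a=[0.5]`, `b=[1.0]`, `c=[1.0]` and inputs `[1.0, 0.0, 0.0]`. The output decays as `[1.0, 0.5, 0.25]`. This shows how the state carries information forward through a sequence, at a cost linear in the sequence length. That linear cost is the property that makes SSMs attractive for long DNA barcodes.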
Tiancheng Gao
University of Guelph
Scott C. Lowe
Postdoctoral Research Fellow, Vector Institute
Machine Learning, Deep learning, Neuroinformatics, Self-supervision, Reasoning
Brendan Furneaux
University of Jyväskylä
Angel X. Chang
Simon Fraser University, Amii
Graham W. Taylor
University of Guelph, Vector Institute