IBB: Fast Burrows-Wheeler Transform Construction for Length-Diverse DNA Data

📅 2025-02-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Extreme length heterogeneity in DNA sequence collections makes BWT construction slow and memory-intensive. To address this, we propose IBB, an external-memory BWT construction algorithm tailored to variable-length sequences. Our key contributions are: (1) a right-aligned processing strategy that, unlike conventional algorithms, does not rely on uniform sequence lengths; (2) a tree-based data structure that maintains relative insert positions and ranks; and (3) fine-grained buckets that reduce the amount of input and output to external memory. In our experiments, IBB is 10–40% faster than state-of-the-art approaches on most datasets while maintaining competitive memory consumption. Faster BWT construction benefits FM-index applications such as read alignment and de novo assembly.

📝 Abstract
The Burrows-Wheeler transform (BWT) is integral to the FM-index, which is used extensively in text compression, indexing, pattern search, and bioinformatics problems such as de novo assembly and read alignment. Thus, efficient construction of the BWT in terms of time and memory usage is key to these applications. We present a novel external-memory algorithm called Improved-Bucket Burrows-Wheeler transform (IBB) for constructing the BWT of DNA datasets with highly diverse sequence lengths. IBB uses a right-aligned approach to efficiently handle sequences of varying lengths, a tree-based data structure to manage relative insert positions and ranks, and fine buckets to reduce the necessary amount of input and output to external memory. Our experiments demonstrate that IBB is 10% to 40% faster than existing state-of-the-art BWT construction algorithms on most datasets while maintaining competitive memory consumption.
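
For readers unfamiliar with the transform itself, the following is a minimal sketch of a textbook BWT and its inverse on a single short string. It is not the IBB algorithm: it sorts all rotations in memory, which is only practical for tiny inputs, and the function names `bwt` and `inverse_bwt` as well as the `$` end marker are assumptions chosen for illustration.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive BWT: sort all rotations of text + sentinel, take the last column.

    Illustration only; assumes the sentinel sorts before every other
    character (true for '$' versus the letters A, C, G, T).
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed: str, sentinel: str = "$") -> str:
    """Naive inverse: repeatedly prepend the BWT column and re-sort."""
    n = len(transformed)
    table = [""] * n
    for _ in range(n):
        table = sorted(transformed[i] + table[i] for i in range(n))
    # The original string is the row that ends with the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


if __name__ == "__main__":
    read = "GATTACA"
    transformed = bwt(read)
    print(transformed)
    print(inverse_bwt(transformed) == read)
```

The transform is a reversible permutation of the input characters, and the FM-index is built on top of it. Sorting every rotation in memory, as above, does not scale to large read collections, which is the setting external-memory construction methods such as IBB target.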
Problem

Research questions and friction points this paper is trying to address.

DNA sequence processing
efficiency improvement
memory reduction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Improved Bucket Burrows-Wheeler Transform
Tree-based Indexing
Memory Efficiency
Enno Adler
Paderborn University, Germany
Stefan Böttcher
Professor of Computer Science, Paderborn University
Databases
Rita Hartel
Paderborn University
Cederic Alexander Steininger
Paderborn University, Germany