Efficient Chromosome Parallelization for Precision Medicine Genomic Workflows

📅 2025-11-19
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large-scale genomic workflows in precision medicine face challenges including substantial inter-chromosomal memory demand variability, high peak memory usage due to static resource allocation, I/O intensity, and frequent task failures. To address these, we propose a chromosome-level adaptive parallelization framework: (1) a novel memory prediction model integrating symbolic regression with interpolation-based bias correction to accurately estimate per-chromosome RAM consumption; (2) task packing formulated as a constrained knapsack problem for dynamic, resource-aware scheduling; and (3) a static processing-order optimization strategy to jointly minimize peak memory. Evaluated on real whole-genome sequencing (WGS) pipelines and large-scale simulations, our approach reduces out-of-memory failures by 92%, accelerates end-to-end execution by 1.8–3.2×, and improves resource utilization by over 40%, significantly enhancing analytical stability and scalability.

Technology Category

Application Category

📝 Abstract
Large-scale genomic workflows used in precision medicine can process datasets spanning tens to hundreds of gigabytes per sample, leading to high memory spikes, intensive disk I/O, and task failures due to out-of-memory errors. Simple static resource allocation methods struggle to handle the variability in per-chromosome RAM demands, resulting in poor resource utilization and long runtimes. In this work, we propose multiple mechanisms for adaptive, RAM-efficient parallelization of chromosome-level bioinformatics workflows. First, we develop a symbolic regression model that estimates per-chromosome memory consumption for a given task and introduces an interpolating bias to conservatively minimize over-allocation. Second, we present a dynamic scheduler that adaptively predicts RAM usage with a polynomial regression model, treating task packing as a Knapsack problem to optimally batch jobs based on predicted memory requirements. Additionally, we present a static scheduler that optimizes chromosome processing order to minimize peak memory while preserving throughput. Our proposed methods, evaluated on simulations and real-world genomic pipelines, provide new mechanisms to reduce memory overruns and balance load across threads. We thereby achieve faster end-to-end execution, showcasing the potential to optimize large-scale genomic workflows.
Problem

Research questions and friction points this paper is trying to address.

Optimizing memory allocation for chromosome-level genomic workflows
Reducing task failures caused by excessive memory consumption
Improving resource utilization in precision medicine data processing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Symbolic regression model estimates chromosome memory consumption
Dynamic scheduler predicts RAM usage via polynomial regression
Static scheduler optimizes chromosome order to minimize memory
🔎 Similar Papers
No similar papers found.
D
Daniel Mas Montserrat
Galatea Bio, Stanford University
R
Ray Verma
New York University
M
Míriam Barrabés
Galatea Bio
F
Francisco M. de la Vega
Galatea Bio, Stanford University
Carlos D. Bustamante
Carlos D. Bustamante
Galatea Bio, Stanford University
Alexander G. Ioannidis
Alexander G. Ioannidis
Assistant Professor