Machine Learning Methods for Small Data and Upstream Bioprocessing Applications: A Comprehensive Review

📅 2025-06-14
🤖 AI Summary
Problem: Machine learning (ML) applications in upstream biopharmaceutical processes are hindered by process data that are scarce, costly to acquire, and mechanistically complex. Method: This paper systematically reviews ML methodologies suited to small-data scenarios in bioprocessing. It proposes a taxonomy of ML methods for bioprocess small-data regimes, organized along three dimensions: data augmentation, transfer learning, and physics-guided modeling. The techniques covered include meta-learning, Bayesian optimization, physics-informed neural networks (PINNs), few-shot transfer learning, and synthetic data generation. Contribution/Results: The review identifies and categorizes 12 applicable methods and assesses their effectiveness, as reported in published applications, on tasks such as cell culture titer prediction and key process parameter optimization. It reveals significant gaps in interpretability, cross-process generalizability, and experimental validation, and offers an actionable framework for method selection in biomanufacturing.

📝 Abstract
Data is crucial for machine learning (ML) applications, yet acquiring large datasets can be costly and time-consuming, especially in complex, resource-intensive fields like biopharmaceuticals. A key process in this industry is upstream bioprocessing, where living cells are cultivated and optimised to produce therapeutic proteins and biologics. The intricate nature of these processes, combined with high resource demands, often limits data collection, resulting in smaller datasets. This comprehensive review explores ML methods designed to address the challenges posed by small data and classifies them into a taxonomy to guide practical applications. Each method in the taxonomy is analysed in detail, with a discussion of its core concepts and an evaluation of its effectiveness in tackling small-data challenges, as demonstrated by application results in upstream bioprocessing and related domains. By analysing how these methods tackle small-data challenges from different perspectives, this review provides actionable insights, identifies current research gaps, and offers guidance for leveraging ML in data-constrained environments.
Problem

Research questions and friction points this paper is trying to address.

Addressing ML challenges with small datasets in bioprocessing
Classifying ML methods for small data in upstream bioprocessing
Evaluating ML effectiveness in data-constrained biopharmaceutical applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

ML methods for small data challenges
Taxonomy to guide practical applications
Effectiveness evaluation in bioprocessing
Authors
Johnny Peng
Complex Adaptive Systems Laboratory, The Data Science Institute, University of Technology Sydney, NSW 2007, Australia
Thanh Tung Khuat
Complex Adaptive Systems Laboratory, The Data Science Institute, University of Technology Sydney, NSW 2007, Australia
Katarzyna Musial
Professor in Network Science, University of Technology Sydney, Australia
Complex Networked Systems, Adaptive Systems, Complexity, Social Networks
Bogdan Gabrys
Prof. of Data Science, University of Technology Sydney
Computational Intelligence, Data Science, Complex Adaptive Systems, Machine Learning, Predictive Analytics