🤖 AI Summary
Current bioimage analysis tools such as CellProfiler make automated, reproducible feature extraction difficult, which hinders the scalable deployment of machine learning workflows. To address this, we introduce *cp_measure*, a modular, API-first Python library that extracts CellProfiler’s core measurement engine and refactors it into a programmable interface, so that it integrates directly with the scientific Python ecosystem. The library supports end-to-end phenotypic analysis of 2D/3D cellular imaging and spatial transcriptomics data, producing features that are fully reproducible and highly consistent with CellProfiler's (Pearson correlation >0.999). Empirical evaluation demonstrates its efficiency and scalability across multi-batch, multimodal biological image datasets. *cp_measure* improves the robustness and productivity of feature-driven computational biology modeling while preserving compatibility with established CellProfiler pipelines.
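One way to check that a reimplementation reproduces CellProfiler's features, and the kind of statistic quoted above, is a per-feature Pearson correlation computed over the same set of objects. The sketch below is a minimal NumPy illustration, not code from cp_measure or its evaluation. It assumes each feature table is a dict mapping a feature name to a per-object array, with objects in the same order in both tables; the function name `per_feature_pearson` and the feature names are hypothetical.

```python
import numpy as np


def per_feature_pearson(reference, candidate):
    """Pearson r for every feature name present in both per-object feature tables.

    reference, candidate: dicts mapping feature name -> 1-D array with one value
    per object; objects are assumed to be in the same order in both tables.
    A feature that is constant in either table has undefined r, and
    np.corrcoef returns NaN for it.
    """
    scores = {}
    # Only features present in both tables are compared.
    for name in sorted(set(reference) & set(candidate)):
        x = np.asarray(reference[name], dtype=float)
        y = np.asarray(candidate[name], dtype=float)
        if x.shape != y.shape:
            raise ValueError(f"feature {name!r}: tables have different object counts")
        # np.corrcoef returns the 2x2 correlation matrix; entry [0, 1] is r(x, y).
        scores[name] = float(np.corrcoef(x, y)[0, 1])
    return scores


# Usage: simulate a reimplementation whose outputs differ only by tiny numerical noise.
rng = np.random.default_rng(0)
reference = {
    "AreaShape_Area": rng.uniform(100, 500, size=50),
    "Intensity_MeanIntensity": rng.uniform(0, 1, size=50),
}
candidate = {
    name: values + rng.normal(0, 1e-6, size=values.shape)
    for name, values in reference.items()
}
scores = per_feature_pearson(reference, candidate)
print(all(r > 0.999 for r in scores.values()))  # prints True
```

In a real comparison the two tables would come from CellProfiler and from cp_measure run on identical images and masks and joined on object identity; the synthetic noise here only stands in for small numerical differences.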
📝 Abstract
Biological image analysis has traditionally focused on measuring specific visual properties of interest for cells or other entities. A complementary paradigm gaining increasing traction is image-based profiling: quantifying many distinct visual features to form comprehensive profiles that may reveal hidden patterns in cellular states, drug responses, and disease mechanisms. While current tools like CellProfiler can generate these feature sets, they pose significant barriers to automated and reproducible analyses, hindering machine learning workflows. Here we introduce cp_measure, a Python library that extracts CellProfiler's core measurement capabilities into a modular, API-first tool designed for programmatic feature extraction. We demonstrate that cp_measure features retain high fidelity with CellProfiler features while enabling seamless integration with the scientific Python ecosystem. Through applications to 3D astrocyte imaging and spatial transcriptomics, we showcase how cp_measure enables reproducible, automated image-based profiling pipelines that scale effectively for machine learning applications in computational biology.
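To make "programmatic feature extraction" concrete, here is a minimal NumPy sketch of the API-first pattern described above: a measurement is a plain function that takes a label image and an intensity image and returns named per-object features, which can then be stacked into a profile matrix with one row per object and one column per feature. The functions `measure_size_and_intensity` and `build_profile` and the CellProfiler-style feature names are hypothetical illustrations, not cp_measure's actual API. Because the code flattens its inputs, the same function accepts 2D images or 3D volumes.

```python
import numpy as np


def measure_size_and_intensity(labels, pixels):
    """Hypothetical per-object measurements from a label image and an intensity image.

    labels: non-negative integer array; 0 is background, 1..N are object ids.
    pixels: intensity array with the same shape as labels (2D or 3D).
    Returns a dict mapping feature name -> array with one value per object id 1..N.
    """
    labels = np.asarray(labels)
    pixels = np.asarray(pixels, dtype=float)
    if labels.shape != pixels.shape:
        raise ValueError("labels and pixels must have the same shape")

    flat_labels = labels.ravel()
    flat_pixels = pixels.ravel()
    n_objects = int(flat_labels.max()) if flat_labels.size else 0

    # Pixel (or voxel) count per label; index 0 is background and is dropped.
    area = np.bincount(flat_labels, minlength=n_objects + 1)[1:]
    # Summed intensity per label, used to compute the mean.
    total = np.bincount(flat_labels, weights=flat_pixels, minlength=n_objects + 1)[1:]
    # Mean intensity; ids with no pixels get NaN instead of a division by zero.
    mean = np.divide(total, area, out=np.full(n_objects, np.nan), where=area > 0)

    # Maximum intensity per label via an unbuffered in-place reduction.
    max_per_label = np.full(n_objects + 1, -np.inf)
    np.maximum.at(max_per_label, flat_labels, flat_pixels)
    max_intensity = max_per_label[1:]
    max_intensity[area == 0] = np.nan

    return {
        "AreaShape_Area": area,
        "Intensity_MeanIntensity": mean,
        "Intensity_MaxIntensity": max_intensity,
    }


def build_profile(features):
    """Stack named per-object features into an (objects x features) matrix."""
    names = sorted(features)
    matrix = np.column_stack([np.asarray(features[name], dtype=float) for name in names])
    return names, matrix


# Usage: two square objects on a 6x6 image whose pixel values are 0..35.
labels = np.zeros((6, 6), dtype=int)
labels[0:2, 0:2] = 1  # object 1: 4 pixels
labels[3:6, 3:6] = 2  # object 2: 9 pixels
pixels = np.arange(36, dtype=float).reshape(6, 6)

names, profile = build_profile(measure_size_and_intensity(labels, pixels))
print(names)  # ['AreaShape_Area', 'Intensity_MaxIntensity', 'Intensity_MeanIntensity']
print(profile)  # values: [[4, 7, 3.5], [9, 35, 28]]
```

Keeping each measurement a pure function of arrays, rather than a step inside a GUI pipeline, is what allows it to be called from scripts, notebooks, or parallel workers and composed into larger profiling pipelines.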