🤖 AI Summary
This work addresses the challenge of subcellular structure segmentation in label-free 3D brightfield microscopy images. We introduce the first 4-billion-parameter 3D brightfield foundation model, capable of accurately segmenting nuclei, mitochondria, and other organelles without fluorescent labeling or manual post-processing. Methodologically, we propose a novel hyperspherical representation learning paradigm, integrated with hardware-aligned sparse attention, depth-width residual HyperConnections, soft Mixture-of-Experts (MoE) gating, and anisotropic patch embedding, enabling geometrically faithful 3D tokenization. Evaluated on multiple confocal microscopy datasets, our model significantly outperforms state-of-the-art CNN- and Transformer-based baselines, achieving superior axial resolution and robustness across diverse cell types. The framework establishes a scalable, general-purpose foundation architecture for label-free, live 3D imaging analysis.
📝 Abstract
Label-free 3D brightfield microscopy offers a fast, noninvasive way to visualize cellular morphology, yet robust volumetric segmentation still typically depends on fluorescence or heavy post-processing. We address this gap by introducing Bright-4B, a 4-billion-parameter foundation model that learns on the unit hypersphere to segment subcellular structures directly from 3D brightfield volumes. Bright-4B combines a hardware-aligned Native Sparse Attention mechanism (capturing local, coarse, and selected global context), depth-width residual HyperConnections that stabilize representation flow, and a soft Mixture-of-Experts for adaptive capacity. A plug-and-play anisotropic patch embedding further respects the confocal point-spread function and axial thinning, enabling geometry-faithful 3D tokenization. The resulting model produces morphology-accurate segmentations of nuclei, mitochondria, and other organelles from brightfield stacks alone, without fluorescence, auxiliary channels, or handcrafted post-processing. Across multiple confocal datasets, Bright-4B preserves fine structural detail throughout depth and across cell types, outperforming contemporary CNN and Transformer baselines. All code, pretrained weights, and models for downstream fine-tuning will be released to advance large-scale, label-free 3D cell mapping.
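The abstract describes the soft Mixture-of-Experts only at a high level. As one illustration of the general idea, the sketch below shows a minimal soft-MoE layer in NumPy, where every token is sent to all experts and mixed by a differentiable softmax gate instead of a hard top-k router. This is a toy under stated assumptions, not Bright-4B's implementation; the expert/gate shapes and names here are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def soft_moe_layer(x, expert_weights, gate_weights):
    """Soft MoE sketch: each token is processed by ALL experts, and the
    outputs are blended with softmax gate weights, keeping routing fully
    differentiable (no hard expert assignment).

    x:              (n_tokens, d)        token embeddings
    expert_weights: (n_experts, d, d)    toy linear experts
    gate_weights:   (d, n_experts)       learned gating projection
    """
    gates = softmax(x @ gate_weights, axis=-1)                 # (n_tokens, n_experts)
    expert_out = np.einsum('td,edk->tek', x, expert_weights)   # per-expert outputs
    return np.einsum('te,tek->tk', gates, expert_out)          # gate-weighted mix

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 16))                # 6 tokens, width 16 (toy sizes)
ew = rng.standard_normal((4, 16, 16)) * 0.1     # 4 experts
gw = rng.standard_normal((16, 4))
out = soft_moe_layer(x, ew, gw)
print(out.shape)  # (6, 16)
```

Because every expert sees every token, capacity adapts smoothly per token; a production model would trade this density for sparser routing at scale.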
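The anisotropic patch embedding can be illustrated with a small sketch: tokenizing a (Z, Y, X) volume with a patch that is shallower along the axial (Z) direction than laterally, reflecting the coarser axial sampling of confocal stacks. This is a hedged NumPy toy, not the paper's embedding layer; the patch sizes (2×8×8) and function name are assumptions for illustration.

```python
import numpy as np

def anisotropic_patchify(vol, patch=(2, 8, 8)):
    """Split a 3D volume (Z, Y, X) into non-overlapping anisotropic patches.

    Hypothetical sketch: patch depth (2) is smaller than the lateral extent
    (8x8), so each token covers a geometry closer to isotropic physical
    space despite axial thinning. Returns (n_patches, prod(patch)),
    one flattened token per patch; a real embedding would then project
    each token with a learned linear layer.
    """
    pz, py, px = patch
    Z, Y, X = vol.shape
    assert Z % pz == 0 and Y % py == 0 and X % px == 0, "volume must tile evenly"
    tokens = (vol
              .reshape(Z // pz, pz, Y // py, py, X // px, px)
              .transpose(0, 2, 4, 1, 3, 5)   # group patch axes together
              .reshape(-1, pz * py * px))
    return tokens

vol = np.random.rand(16, 64, 64).astype(np.float32)  # toy brightfield stack
tokens = anisotropic_patchify(vol)
print(tokens.shape)  # (512, 128): 8*8*8 patches, 2*8*8 voxels each
```

In practice the same effect is often achieved with a strided 3D convolution whose kernel and stride differ along Z versus X/Y.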