🤖 AI Summary
This work addresses the challenge of directly applying 2D foundation models to 3D hippocampal segmentation in neonatal brain MRI by proposing a structured window decomposition–recomposition strategy. The method partitions the 3D volume into non-overlapping sub-cubes, leverages frozen 2D DINOv3 features for parallel decoding, and reconstructs a full 3D segmentation via a dense prediction head. This approach maintains constant decoder memory usage while effectively recovering 3D anatomical structures from frozen 2D representations, enabling generalizable extension to 3D medical imaging. Evaluated on the ALBERT dataset, the method achieves a Dice score of 0.65 for single-window hippocampal segmentation, demonstrating both its efficacy and anatomical consistency.
📝 Abstract
Precise volumetric delineation of hippocampal structures is essential for quantifying neurodevelopmental trajectories in pre-term and term infants, where subtle morphological variations may carry prognostic significance. While foundation encoders trained on large-scale visual data offer discriminative representations, their 2D formulation is poorly matched to the 3D organization of brain anatomy. We propose a volumetric segmentation strategy that reconciles this tension through a structured window-based disassembly-reassembly mechanism: the global MRI volume is decomposed into non-overlapping 3D windows (sub-cubes), each processed by a separate decoding arm built upon frozen high-fidelity features, and the window outputs are subsequently reassembled by a dense-prediction head before being matched against the ground truth. This architecture preserves a constant decoder memory footprint while constraining predictions to lie within an anatomically consistent geometry. Evaluated on the ALBERT dataset for hippocampal segmentation, the proposed approach achieves a Dice score of 0.65 for a single 3D window. The method demonstrates that volumetric anatomical structure can be recovered from frozen 2D foundation representations through structured compositional decoding, and offers a principled and generalizable extension of foundation models to 3D medical applications.
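The window decomposition and reassembly described above, together with the Dice metric used for evaluation, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `split_into_windows`, `merge_windows`, and `dice_score` are hypothetical, the volume is assumed to be exactly divisible by the window size, and the frozen 2D encoder, the per-window decoding arms, and the dense-prediction head are all omitted. The sketch only shows the geometric bookkeeping that lets per-window outputs be placed back into a full 3D volume.

```python
import numpy as np


def split_into_windows(volume, window):
    """Partition a 3D array into non-overlapping sub-cubes.

    Assumes (for illustration) that each volume dimension is an exact
    multiple of the corresponding window dimension.
    Returns (windows, grid): windows has shape (n_windows, wd, wh, ww),
    grid is the number of windows along each axis.
    """
    D, H, W = volume.shape
    wd, wh, ww = window
    if D % wd or H % wh or W % ww:
        raise ValueError("volume shape must be divisible by window shape")
    grid = (D // wd, H // wh, W // ww)
    # Split every axis into (window index, offset inside the window).
    x = volume.reshape(grid[0], wd, grid[1], wh, grid[2], ww)
    # Bring the three window indices to the front.
    x = x.transpose(0, 2, 4, 1, 3, 5)
    return x.reshape(-1, wd, wh, ww), grid


def merge_windows(windows, grid):
    """Inverse of split_into_windows: reassemble sub-cubes into a volume."""
    gd, gh, gw = grid
    _, wd, wh, ww = windows.shape
    x = windows.reshape(gd, gh, gw, wd, wh, ww)
    # Interleave window indices with in-window offsets again.
    x = x.transpose(0, 3, 1, 4, 2, 5)
    return x.reshape(gd * wd, gh * wh, gw * ww)


def dice_score(pred, target, eps=1e-8):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for binary masks."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)


if __name__ == "__main__":
    volume = np.arange(4 * 4 * 4).reshape(4, 4, 4)
    windows, grid = split_into_windows(volume, (2, 2, 2))
    print(windows.shape, grid)
    # A real pipeline would decode each window here; the identity is used
    # as a placeholder so the round trip can be checked.
    restored = merge_windows(windows, grid)
    print(np.array_equal(restored, volume))
```

Because each window has a fixed shape, whatever decoder processes one window sees the same input size regardless of the full volume's extent, which is consistent with the constant decoder memory footprint claimed in the abstract.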