🤖 AI Summary
This work addresses the challenge of estimating consumed food volume from 3D reconstructions of a meal before and after eating, and proposes a bite-aware volumetric estimation pipeline. The method segments the food and plate regions with SAM 3, generates a scale-ambiguous food mesh with Hunyuan3D/SAM 3D, and sets the metric scale from the plate diameter; MoGe-2 serves only as an auxiliary cue for an initial diameter estimate when direct plate measurement is uncertain. The plate geometry is then removed in Blender, and the remaining mesh is hole-filled, made watertight, and integrated to obtain volume. Chamfer distance is computed after rigid ICP alignment without scale correction. The authors argue that surface reconstruction, metric scale, mesh cleanup, watertight volume integration, and physical depletion consistency should be evaluated separately. In the MetaFood CVPR 2026 Continuous 3D Reconstruction While Eating Challenge, the method ranked first, reporting an average Chamfer distance of 8.31 across 34 meshes, a state-level volume mean absolute percentage error (MAPE) of 33.87% across 17 before/after pairs with zero monotonicity violations, and a consumed-volume MAPE of 53.74%.
📝 Abstract
Can a visually plausible food mesh be trusted to estimate the volume of consumed food? This paper investigates the question using selected paired before- and after-consumption states from the MetaFood CVPR 2026 Continuous 3D Reconstruction While Eating Challenge. The submitted workflow follows a curated reconstruction protocol: SAM 3 segments the food and plate regions; Hunyuan3D/SAM 3D generates a dimensionless food mesh; the plate diameter provides the metric scale; the plate geometry is removed in Blender; and the remaining mesh is hole-filled, made watertight, and integrated to estimate volume. MoGe-2 is used only as an auxiliary cue for initial dish-diameter estimation when direct plate measurement is uncertain; it is not the primary scale source for the reported challenge result. The submission ranks first, with an average Chamfer distance of 8.31 across 34 meshes, computed after rigid ICP alignment without scale correction. On 17 before/after pairs, it achieves a state-level volume MAPE of 33.87% with zero monotonicity violations, while the consumed-volume MAPE remains 53.74%. The results show that surface reconstruction, metric scale, controlled mesh cleanup, watertight volume integration, and physical depletion consistency should be evaluated separately for dietary assessment. Source code and evaluation scripts will be available at https://github.com/GCVCG/PerBite-CVPR-MetaFood-2026.
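The volume and consistency measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the unit-cube test mesh, and the choice of the signed-tetrahedron (divergence theorem) formula for volume integration are assumptions made for illustration. The sketch assumes a watertight, consistently oriented triangle mesh and a single uniform scale factor derived from the plate diameter.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Tri = Tuple[int, int, int]


def scale_factor(real_plate_diameter: float, mesh_plate_diameter: float) -> float:
    """Uniform factor mapping dimensionless mesh units to metric units (hypothetical helper)."""
    return real_plate_diameter / mesh_plate_diameter


def mesh_volume(vertices: Sequence[Vec3], faces: Sequence[Tri], scale: float = 1.0) -> float:
    """Volume of a closed, consistently oriented triangle mesh.

    Sums signed volumes of tetrahedra (origin, a, b, c). A uniform
    length scale s multiplies the volume by s**3.
    """
    total = 0.0
    for i, j, k in faces:
        ax, ay, az = vertices[i]
        bx, by, bz = vertices[j]
        cx, cy, cz = vertices[k]
        # Scalar triple product a . (b x c) equals 6x the signed tetrahedron volume.
        total += (
            ax * (by * cz - bz * cy)
            - ay * (bx * cz - bz * cx)
            + az * (bx * cy - by * cx)
        )
    return abs(total) / 6.0 * scale ** 3


def mape(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    errors = [abs(p - r) / abs(r) for p, r in zip(predicted, reference)]
    return 100.0 * sum(errors) / len(errors)


def consumed_volumes(before: Sequence[float], after: Sequence[float]) -> list:
    """Consumed volume per pair: before-state volume minus after-state volume."""
    return [b - a for b, a in zip(before, after)]


def monotonicity_violations(before: Sequence[float], after: Sequence[float]) -> int:
    """Count pairs where the after-eating volume exceeds the before-eating volume."""
    return sum(1 for b, a in zip(before, after) if a > b)


# Illustrative test mesh: a unit cube with outward-facing triangles.
CUBE_VERTICES = [
    (0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
    (0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1),
]
CUBE_FACES = [
    (0, 2, 1), (0, 3, 2),  # bottom
    (4, 5, 6), (4, 6, 7),  # top
    (0, 1, 5), (0, 5, 4),  # front
    (3, 7, 6), (3, 6, 2),  # back
    (0, 4, 7), (0, 7, 3),  # left
    (1, 2, 6), (1, 6, 5),  # right
]
```

Because the consumed volume is a difference of two estimates, errors in the before and after states do not have to cancel, which is one way a state-level MAPE can be lower than the consumed-volume MAPE, as in the reported 33.87% versus 53.74%.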