🤖 AI Summary
Existing 3D animal pose estimation methods rely on geometric priors, labor-intensive dense keypoint annotations, and frame-wise optimization, which compromises accuracy, generalizability, and scalability. This paper introduces the first end-to-end framework for joint 3D pose and appearance modeling that requires no geometric priors, no keypoints, and no manual annotations. The approach unifies shape carving with 3D Gaussian splatting and employs rotation-invariant visual embeddings in place of conventional 3D keypoints. Evaluated on multi-species datasets (mouse, rat, and zebra finch), it achieves high-fidelity reconstructions, pose representations better aligned with human perception, and strong cross-individual and cross-scene generalization. The method substantially improves efficiency and spatiotemporal resolution for large-scale, long-duration behavioral analysis, establishing a new paradigm for fine-grained behavioral quantification.
📝 Abstract
Accurate and scalable quantification of animal pose and appearance is crucial for studying behavior. Current 3D pose estimation approaches, such as keypoint- and mesh-based techniques, often face challenges including limited representational detail, labor-intensive annotation requirements, and expensive per-frame optimization. These limitations hinder the study of subtle movements and can make large-scale analyses impractical. We propose Pose Splatter, a novel framework leveraging shape carving and 3D Gaussian splatting to model the complete pose and appearance of laboratory animals without prior knowledge of animal geometry, per-frame optimization, or manual annotations. We also propose a novel rotation-invariant visual embedding technique for encoding pose and appearance, designed as a plug-in replacement for 3D keypoint data in downstream behavioral analyses. Experiments on datasets of mice, rats, and zebra finches show that Pose Splatter learns accurate 3D animal geometries. Notably, Pose Splatter represents subtle variations in pose, provides low-dimensional pose embeddings that human raters judge better than those of state-of-the-art methods, and generalizes to unseen data. By eliminating annotation and per-frame optimization bottlenecks, Pose Splatter enables the large-scale, longitudinal behavioral analysis needed to map genotype, neural activity, and micro-behavior at unprecedented resolution.
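To make the notion of rotation invariance concrete, here is a minimal toy sketch, not the paper's method: the paper's embedding operates on rendered visual appearance, whereas this example uses a raw 2D point cloud and exploits the fact that distances to the centroid are unchanged by rotation, so a normalized histogram of those radii is a rotation-invariant descriptor. The function name, bin count, and radius cap are illustrative choices.

```python
import numpy as np

def invariant_embedding(points, bins=16, max_radius=5.0):
    """Toy rotation-invariant descriptor of a 2D point cloud.

    Rotating the cloud about any fixed point also rotates the centroid,
    so each point's distance to the centroid is preserved. A histogram
    of those radii is therefore unchanged by rotation. (Illustrative
    only; not the embedding used by Pose Splatter.)
    """
    centered = points - points.mean(axis=0)          # remove translation
    radii = np.linalg.norm(centered, axis=1)         # rotation-invariant
    hist, _ = np.histogram(radii, bins=bins, range=(0.0, max_radius))
    return hist / len(points)                        # normalized counts
```

Rotating the input by any angle leaves the output (numerically) unchanged, which is the property that lets such a descriptor stand in for orientation-sensitive pose representations.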