🤖 AI Summary
This work proposes a behavior-aware anthropometric framework for 3D scene generation that addresses a common oversight in existing methods: human behavioral needs, such as spatial comfort and functionality, are rarely ensured. By integrating the behavioral reasoning capabilities of vision-language models (VLMs) with personalized anthropometric data, the approach translates behavior–object relationships into parametric layout constraints tailored to individual body dimensions. The resulting human-centered 3D layouts demonstrate strong geometric plausibility and are validated through a user perception study (N=16). Furthermore, real-scale experiments with both individuals (N=20) and groups (N=18) show significant improvements in task completion time, walking-trajectory efficiency, and the ergonomic fit of human–object interaction spaces.
📝 Abstract
Well-designed indoor scenes should prioritize how people can act within a space rather than merely what objects to place. However, existing 3D scene generation methods emphasize visual and semantic plausibility while giving insufficient attention to whether people can comfortably walk, sit, or manipulate objects. To bridge this gap, we present a Behavior-Aware Anthropometric Scene Generation framework. Our approach leverages vision-language models (VLMs) to analyze object–behavior relationships, translating spatial requirements into parametric layout constraints adapted to user-specific anthropometric data. We conducted comparative studies against state-of-the-art models using geometric metrics and a user perception study (N=16), and further ran in-depth human-scale studies (individuals, N=20; groups, N=18). The results showed improvements in task completion time, trajectory efficiency, and human–object manipulation space. This study contributes a framework that couples VLM-based interaction reasoning with anthropometric constraints, validated through both technical metrics and real-scale human usability studies.
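To make the idea of "parametric layout constraints adapted to user-specific anthropometric data" concrete, here is a minimal illustrative sketch. Everything in it — the `Anthropometry` fields, the `seating_constraints` function, and the numeric allowances — is a hypothetical example of how body dimensions could parameterize layout constraints, not the paper's actual implementation.

```python
from dataclasses import dataclass

# Hypothetical illustration only: field names, formulas, and margins below
# are assumptions for exposition, not the framework's actual parameters.

@dataclass
class Anthropometry:
    stature_cm: float            # standing height
    popliteal_height_cm: float   # floor to back of knee, seated
    shoulder_breadth_cm: float   # widest shoulder dimension

def seating_constraints(body: Anthropometry) -> dict:
    """Translate one user's body dimensions into parametric layout
    constraints for a 'sit at desk' behavior (illustrative heuristics)."""
    return {
        # Seat pan height roughly matches the user's popliteal height.
        "seat_height_cm": body.popliteal_height_cm,
        # Desk surface: seat height plus an assumed thigh-clearance allowance.
        "desk_height_cm": body.popliteal_height_cm + 25.0,
        # Walkable corridor behind the chair: shoulder breadth plus a margin.
        "rear_clearance_cm": body.shoulder_breadth_cm + 30.0,
    }

if __name__ == "__main__":
    user = Anthropometry(stature_cm=170.0, popliteal_height_cm=42.0,
                         shoulder_breadth_cm=45.0)
    print(seating_constraints(user))
```

In the paper's pipeline, constraints of this kind would be emitted per behavior–object pair identified by the VLM and then enforced during layout optimization; the sketch only shows the anthropometric-parameterization step.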