🤖 AI Summary
This work addresses the challenges posed by time-varying sensor poses in autonomous semi-trailer trucks—caused by fifth-wheel articulation and trailer deformation—and the unreliability of existing perception methods under low-parallax, texture-poor conditions. To this end, we propose dCAP, a dynamic visual calibration and perception framework that leverages a Transformer-based architecture with cross-view and temporal attention mechanisms to continuously estimate the 6-DoF relative pose between cameras mounted on the tractor and trailer. This dynamic estimation replaces static extrinsic calibration, thereby enhancing perception accuracy. We further introduce STT4AT, the first simulation benchmark to support time-varying geometric configurations of semi-trailer trucks, implemented in CARLA with synchronized multi-sensor suites. Experiments demonstrate that dCAP significantly outperforms static calibration approaches across diverse photorealistic scenarios. The dataset, toolchain, and code will be publicly released.
📝 Abstract
Autonomous trucking poses unique challenges due to articulated tractor-trailer geometry and time-varying sensor poses caused by the fifth-wheel joint and trailer flex. Existing perception and calibration methods assume static baselines or rely on high-parallax, texture-rich scenes, limiting their reliability in real-world settings. We propose dCAP (dynamic Calibration and Articulated Perception), a vision-based framework that continuously estimates the 6-DoF (degrees of freedom) relative pose between tractor and trailer cameras. dCAP employs a Transformer with cross-view and temporal attention to robustly aggregate spatial cues while maintaining temporal consistency, enabling accurate perception under rapid articulation and occlusion. Integrated with BEVFormer, dCAP improves 3D object detection by replacing static calibration with dynamically predicted extrinsics. To facilitate evaluation, we introduce STT4AT, a CARLA-based benchmark simulating semi-trailer trucks with synchronized multi-sensor suites and time-varying inter-rig geometry across diverse environments. Experiments demonstrate that dCAP achieves stable, accurate perception while addressing the limitations of static calibration in autonomous trucking. The dataset, development kit, and source code will be publicly released.
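To make the cross-view attention idea concrete, here is a minimal, dependency-free sketch of scaled dot-product attention in which tractor-camera feature tokens (queries) attend over trailer-camera tokens (keys/values). This is an illustrative toy, not the paper's implementation: the token features, dimensions, and the `cross_view_attention` helper are all hypothetical, and a real dCAP head would feed the fused features into a regression layer producing the 6-DoF extrinsics.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_view_attention(queries, keys, values):
    """Scaled dot-product attention: each tractor-view query token
    aggregates trailer-view value tokens, weighted by query-key
    similarity -- the core operation behind cross-view fusion."""
    d = len(queries[0])  # feature dimension
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, values))
                    for j in range(len(values[0]))])
    return out

# Toy features: 2 tractor-view tokens querying 3 trailer-view tokens (dim 4).
Q = [[0.1, 0.3, -0.2, 0.5], [0.4, -0.1, 0.2, 0.0]]
K = [[0.2, 0.1, 0.0, 0.3], [-0.3, 0.4, 0.1, 0.2], [0.0, 0.0, 0.5, -0.1]]
V = K  # reuse keys as values for this sketch
fused = cross_view_attention(Q, K, V)
# In the full model, a pose head would map features like `fused`
# to 3 translation + 3 rotation parameters (the 6-DoF extrinsics).
assert len(fused) == 2 and len(fused[0]) == 4
```

Temporal attention would apply the same mechanism along the time axis (queries from the current frame, keys/values from past frames), which is what lets the predicted extrinsics stay smooth under rapid articulation.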