TRIDE: A Text-assisted Radar-Image weather-aware fusion network for Depth Estimation

📅 2025-08-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the degradation of depth estimation accuracy under adverse weather conditions and the lack of semantic guidance in multimodal fusion for autonomous driving, this paper proposes a weather-aware radar-image-text trimodal depth estimation framework. The method introduces natural language descriptions into the radar–image fusion pipeline for the first time, incorporating a text feature extraction module and a cross-modal alignment mechanism. It further devises a novel weather-aware fusion strategy that adaptively modulates radar confidence weights based on meteorological conditions. Additionally, a large language model–driven text generation strategy is employed to enhance semantic representation. Evaluated on the nuScenes dataset, the proposed approach achieves significant improvements: mean absolute error (MAE) decreases by 12.87% and root mean square error (RMSE) by 9.08%, demonstrating the effectiveness and generalizability of text-assisted, weather-adaptive multimodal fusion.
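The cross-modal alignment mechanism mentioned above is not detailed here; a common way to encourage such alignment is to minimize one minus the cosine similarity between the text and image embeddings. The sketch below is a minimal, hypothetical illustration of that idea, not the paper's actual loss (function names and the two-vector interface are assumptions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def alignment_loss(text_emb, image_emb):
    """A generic alignment objective: 0 when the embeddings point
    in the same direction, larger as they diverge."""
    return 1.0 - cosine_similarity(text_emb, image_emb)
```

Minimizing such a loss pulls the text and image representations toward a shared direction in embedding space, which is the usual intent of "cross-modal alignment" objectives.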

📝 Abstract
Depth estimation, essential for autonomous driving, seeks to interpret the 3D environment surrounding vehicles. The development of radar sensors, known for their cost-efficiency and robustness, has spurred interest in radar-camera fusion-based solutions. However, existing algorithms fuse features from these modalities without accounting for weather conditions, despite radars being known to be more robust than cameras under adverse weather. Additionally, while Vision-Language models have seen rapid advancement, utilizing language descriptions alongside other modalities for depth estimation remains an open challenge. This paper first introduces a text-generation strategy along with feature extraction and fusion techniques that can assist monocular depth estimation pipelines, leading to improved accuracy across different algorithms on the KITTI dataset. Building on this, we propose TRIDE, a radar-camera fusion algorithm that enhances text feature extraction by incorporating radar point information. To address the impact of weather on sensor performance, we introduce a weather-aware fusion block that adaptively adjusts radar weighting based on current weather conditions. Our method, benchmarked on the nuScenes dataset, demonstrates performance gains over the state-of-the-art, achieving a 12.87% improvement in MAE and a 9.08% improvement in RMSE. Code: https://github.com/harborsarah/TRIDE
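The abstract describes a weather-aware fusion block that adaptively adjusts radar weighting by weather condition. A minimal sketch of that general idea is below, assuming a discrete weather label and a learned (here hand-set, purely illustrative) per-condition logit; TRIDE's actual block operates on learned features, so the names and values here are assumptions, not the paper's implementation:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical per-condition logits: radar is weighted more heavily
# in conditions where cameras degrade (rain, fog, night).
WEATHER_LOGITS = {"clear": 0.0, "rain": 1.5, "fog": 2.0, "night": 1.0}

def radar_confidence(condition):
    """Map a weather label to a radar confidence weight in (0, 1)."""
    return sigmoid(WEATHER_LOGITS.get(condition, 0.0))

def weather_aware_fuse(image_feat, radar_feat, condition):
    """Elementwise fusion: image features plus weather-gated radar features."""
    g = radar_confidence(condition)
    return [i + g * r for i, r in zip(image_feat, radar_feat)]
```

In a trained network the logits would come from a small gating subnetwork rather than a lookup table, but the effect is the same: radar contributes more to the fused representation under adverse weather.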
Problem

Research questions and friction points this paper is trying to address.

- Improving depth estimation for autonomous driving using radar-camera fusion
- Addressing the impact of weather on sensor performance in depth estimation
- Utilizing text descriptions to enhance radar-camera fusion accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

- A text-generation strategy that enhances depth estimation
- Radar-camera fusion with weather-aware adaptive weighting
- Text feature extraction enriched with radar point information