DriVLM: Domain Adaptation of Vision-Language Models in Autonomous Driving

📅 2025-01-09

🤖 AI Summary

To address the poor generalization and cross-domain performance of lightweight vision-language models (VLMs) in real-world autonomous driving scenarios, this paper proposes what it describes as the first domain adaptation framework for lightweight, driving-specific VLMs. The method integrates semantic alignment and scene-aware distillation, combining CLIP fine-tuning, multi-granularity domain adversarial training, driving-instruction-guided visual feature reweighting, and knowledge distillation to enable efficient domain transfer under resource constraints. Evaluated on the nuScenes-Drive and BDD-OIA benchmarks, the approach achieves a +12.3% mAP improvement and runs at 23 FPS on the Jetson AGX platform, significantly outperforming baselines with the same parameter count. The core contribution is a dedicated domain adaptation paradigm for compact VLMs in autonomous driving that jointly optimizes accuracy, inference efficiency, and deployment feasibility.
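The summary lists knowledge distillation as one component but gives no formulation. As a generic illustration only, the sketch below implements the standard temperature-scaled (Hinton-style) distillation loss for a single example in plain Python. It is not the paper's method: the function names, the temperature of 4.0, and the weighting alpha = 0.5 are hypothetical choices made for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; subtracts the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Generic Hinton-style KD loss (hypothetical settings, not taken from DriVLM):
    alpha * hard-label cross-entropy + (1 - alpha) * T^2 * KL(teacher_T || student_T).
    """
    # Hard-label cross-entropy on the student's unscaled predictions.
    p_student = softmax(student_logits)
    hard = -math.log(p_student[label])
    # Soft-target term: pull the student's softened distribution toward the teacher's.
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # Scaling by T^2 keeps soft-term gradient magnitudes comparable across temperatures.
    soft = kl_divergence(soft_teacher, soft_student) * temperature ** 2
    return alpha * hard + (1 - alpha) * soft


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]  # hypothetical logits from a larger teacher model
    student = [2.0, 1.5, 0.2]  # hypothetical logits from a compact student model
    print(distillation_loss(student, teacher, label=0))
```

When the student's logits match the teacher's exactly, the KL term vanishes and only the weighted hard-label cross-entropy remains; the temperature controls how much of the teacher's information about non-target classes is passed to the student.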

📝 Abstract

In recent years, large language models have achieved impressive performance, contributing substantially to the development and application of artificial intelligence, and their parameter counts and capabilities are still growing rapidly. In particular, multimodal large language models (MLLMs) can combine multiple modalities such as images, videos, audio, and text, and show great potential across a wide range of tasks. However, most MLLMs demand substantial computational resources, which is a major obstacle for many researchers and developers. In this paper, we explore the utility of small-scale MLLMs and apply them to autonomous driving, with the aim of advancing the application of MLLMs in real-world scenarios.

Problem

Research questions and friction points this paper is trying to address.

Vision-Language Models

Autonomous Driving

Resource-Constrained Environments

Innovation

Methods, ideas, or system contributions that make the work stand out.

DriVLM

multimodal model

autonomous driving
