DriVLM: Domain Adaptation of Vision-Language Models in Autonomous Driving

📅 2025-01-09
🤖 AI Summary
To address the poor generalization and cross-domain performance of lightweight vision-language models (VLMs) in real-world autonomous driving scenarios, this paper proposes the first domain adaptation framework for driving-specific lightweight VLMs. The method integrates semantic alignment and scene-aware distillation. It combines CLIP fine-tuning, multi-granularity domain adversarial training, driving-instruction-guided visual feature reweighting, and knowledge distillation to enable efficient domain transfer under resource constraints. Evaluated on the nuScenes-Drive and BDD-OIA benchmarks, the approach achieves a +12.3% mAP improvement and runs at 23 FPS on the Jetson AGX platform, significantly outperforming baselines with the same parameter count. The core contribution is a dedicated domain adaptation paradigm for compact VLMs in autonomous driving that jointly optimizes accuracy, inference efficiency, and deployment feasibility.
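
The summary lists knowledge distillation among the components but does not say how the distillation objective is defined. The following is a minimal, generic sketch of the standard temperature-scaled distillation loss, assuming a classification-style teacher and student that each output a vector of logits. It is not the paper's implementation. The names `softmax`, `distillation_loss`, `temperature` and `alpha` are hypothetical and chosen for illustration only.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    temperature: float = 2.0,
    alpha: float = 0.5,
) -> float:
    """Hypothetical KD objective:
    alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(label, student).
    """
    # Soft targets: both distributions are softened with the same temperature.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL divergence from the softened teacher to the softened student;
    # terms with zero teacher probability contribute nothing.
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )
    # Hard-label cross-entropy on the student at temperature 1.
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


# Usage: a student that differs from its teacher incurs a positive soft-target loss.
print(distillation_loss([0.2, 1.5, -0.3], [0.1, 2.5, -1.0], label=1))
```

With `alpha=1.0` only the soft-target term remains, and with `alpha=0.0` the loss reduces to ordinary cross-entropy. The `temperature ** 2` factor is the usual scaling that keeps gradient magnitudes from the softened term roughly comparable as the temperature changes.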

📝 Abstract
In recent years, large language models have shown impressive performance and have contributed greatly to the development and application of artificial intelligence, while their parameter counts and performance continue to grow rapidly. In particular, multimodal large language models (MLLMs) can combine multiple modalities, such as images, videos, audio, and text, and have great potential in a variety of tasks. However, most MLLMs require very high computational resources, which is a major challenge for most researchers and developers. In this paper, we explore the utility of small-scale MLLMs and apply them to the field of autonomous driving. We hope that this will advance the application of MLLMs in real-world scenarios.

Problem

Research questions and friction points this paper is trying to address.

Vision-Language Models
Autonomous Driving
Resource-Constrained Environments

Innovation

Methods, ideas, or system contributions that make the work stand out.

DriVLM
multimodal model
autonomous driving