🤖 AI Summary
To address the scarcity of paired prompt-instance data in autonomous driving, which limits language-guided perception, this paper introduces NuPrompt, the first object-centric language prompt benchmark for driving scenes in 3D, multi-view, and multi-frame space. It comprises 40,147 natural language prompts, each referring to an average of 7.4 object tracklets. The paper formulates a novel prompt-based driving task: given a language prompt, predict the trajectories of the described objects across views and frames. As a baseline for this task, it proposes PromptTrack, a simple end-to-end Transformer-based model. Experiments show that PromptTrack achieves strong performance on NuPrompt. The benchmark data and model code are both publicly released, with the aim of providing new insights for language-guided perception in autonomous driving.
📝 Abstract
A new trend in the computer vision community is to capture objects of interest according to flexible human commands expressed as natural language prompts. However, progress in using language prompts in driving scenarios has been held back by the scarcity of paired prompt-instance data. To address this challenge, we propose NuPrompt, the first object-centric language prompt set for driving scenes in 3D, multi-view, and multi-frame space. It extends the nuScenes dataset with a total of 40,147 language descriptions, each referring to an average of 7.4 object tracklets. Based on the object-text pairs from the new benchmark, we formulate a novel prompt-based driving task, i.e., using a language prompt to predict the trajectories of the described objects across views and frames. Furthermore, we provide a simple end-to-end baseline model based on the Transformer, named PromptTrack. Experiments show that PromptTrack achieves impressive performance on NuPrompt. We hope this work can provide new insights for the self-driving community. The data and code have been released at https://github.com/wudongming97/Prompt4Driving.