On-Policy Distillation of Language Models for Autonomous Vehicle Motion Planning

📅 2026-04-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of deploying large language models (LLMs) in resource-constrained vehicular systems by leveraging the GPT-Driver framework, which encodes driving scenarios into linguistic prompts and generates trajectories through chain-of-thought reasoning. The authors propose a strategy-aware generalized knowledge distillation (GKD) approach that trains a lightweight student model using dense token-level feedback from a teacher model, and compare it against a reinforcement learning baseline that relies on teacher log-probability rewards. Evaluated on the nuScenes dataset, GKD achieves performance close to that of the teacher model while reducing model size by a factor of five, significantly outperforming the reinforcement learning method and demonstrating its effectiveness in balancing deployment efficiency with performance retention.
📝 Abstract
Large language models (LLMs) have recently demonstrated strong potential for autonomous vehicle motion planning by reformulating trajectory prediction as a language generation problem. However, deploying capable LLMs in resource-constrained onboard systems remains a fundamental challenge. In this paper, we study how to effectively transfer motion planning knowledge from a large teacher LLM to a smaller, more deployable student model. We build on the GPT-Driver framework, which represents driving scenes as language prompts and generates waypoint trajectories with chain-of-thought reasoning, and investigate two student training paradigms: (i) on-policy generalized knowledge distillation (GKD), which trains the student on its own self-generated outputs using dense token-level feedback from the teacher, and (ii) a dense-feedback reinforcement learning (RL) baseline that uses the teacher's log-probabilities as per-token reward signals in a policy gradient framework. Experiments on the nuScenes benchmark show that GKD substantially outperforms the RL baseline and closely approaches teacher-level performance despite a 5$\times$ reduction in model size. These results highlight the practical value of on-policy distillation as a principled and effective approach to deploying LLM-based planners in autonomous driving systems.
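The two student-training paradigms contrasted above can be illustrated with a toy sketch. This is an interpretive reading, not the paper's implementation: the tiny linear "models", vocabulary size, reverse-KL divergence choice, and variable names are all assumptions made for illustration.

```python
# Toy sketch: (i) on-policy GKD with dense token-level teacher feedback,
# vs. (ii) an RL-style objective using teacher log-probs as per-token rewards.
# Linear heads stand in for the teacher and student language models.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab, hidden, seq_len = 32, 16, 8

teacher = torch.nn.Linear(hidden, vocab)  # stand-in for the large teacher LM
student = torch.nn.Linear(hidden, vocab)  # stand-in for the smaller student LM

# Pretend these are hidden states at each token position of a
# student-generated (on-policy) waypoint-token sequence.
states = torch.randn(seq_len, hidden)

t_logp = F.log_softmax(teacher(states), dim=-1).detach()  # teacher feedback, no grad
s_logp = F.log_softmax(student(states), dim=-1)

# (i) GKD-style loss: a token-level divergence between student and teacher
# distributions on the student's own samples; reverse KL(student || teacher)
# is one common choice, averaged over token positions.
gkd_loss = (s_logp.exp() * (s_logp - t_logp)).sum(-1).mean()

# (ii) RL-style baseline: the teacher's log-prob of each sampled token acts
# as a dense per-token reward in a REINFORCE-style policy-gradient objective.
actions = torch.randint(0, vocab, (seq_len,))       # student-sampled tokens
rewards = t_logp[torch.arange(seq_len), actions]    # teacher log-prob rewards
rl_loss = -(rewards * s_logp[torch.arange(seq_len), actions]).mean()

gkd_loss.backward()  # gradients flow only into the student
print(gkd_loss.item(), rl_loss.item())
```

The key structural difference this sketch highlights: the GKD loss matches the full teacher distribution at every token, while the RL objective only weights the log-probability of the sampled token by a scalar reward, a much sparser signal per position.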
Problem

Research questions and friction points this paper is trying to address.

language models
autonomous vehicle
motion planning
knowledge distillation
on-policy learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

on-policy distillation
language models
autonomous driving
knowledge distillation
motion planning