Before Parc Fermé: RL-Time Pruning for Efficient Embodied LLMs in Autonomous Driving

📅 2026-05-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of deploying embodied large language models (LLMs) in autonomous driving, where high memory consumption and generation latency hinder real-time operation. The authors propose Before Parc Fermé (BPF), a strategy that prunes the controller during reinforcement learning (RL) training, while it is still being optimized for closed-loop behavior, so that pruning decisions can account for task-specific supervision and closed-loop feedback. BPF comes in two variants: BPF-RL prunes iteratively at predefined intervals during RL, and BPF-SFT/RL first prunes part of the model during supervised fine-tuning (SFT) and then continues pruning during RL until the target pruning ratio is reached. BPF is implemented with the LLM-Pruner framework and evaluated on RobotxR1, an LLM-based autonomous-driving control pipeline, against post-training pruning (with and without RL recovery), SFT-stage pruning, and smaller dense models from the same family. On the larger RobotxR1 models, BPF-SFT/RL achieves a 1.69× better trade-off than selecting a smaller dense model, measured as removed parameters per lost percentage point of control adaptability. On the Jetson AGX Orin, the compact models improve decode throughput by up to 27%.

📝 Abstract

Embodied Large Language Models (LLMs) are increasingly used as reasoning modules in robotic control pipelines to improve human-robot interaction, but their memory footprint and generation latency make real-time deployment difficult. Pruning can reduce these costs, but for controllers that undergo multiple pre- and post-training phases, the crucial question is not only how much to prune, but when pruning should occur. In this work, we propose Before Parc Fermé (BPF), a pruning strategy performed during RL that compresses embodied LLM controllers while they are still being optimized for closed-loop behavior. This allows pruning decisions to account for the task-specific supervision and closed-loop feedback that shape the final controller. We propose two variants: BPF-RL, which performs iterative pruning during RL by removing part of the model at predefined training intervals, and BPF-SFT/RL, which first prunes part of the model structure during SFT and then further compresses it during RL using the same iterative strategy as BPF-RL until the target pruning ratio is reached. We evaluate BPF on RobotxR1, an LLM-based autonomous-driving control pipeline, using an established LLM pruning framework (LLM-Pruner), and compare it against post-training pruning, post-training pruning with RL recovery, SFT-stage pruning, and smaller dense models from the same family. Our results show that BPF provides the best trade-off between task performance and memory and throughput among the considered pruning strategies. When compressing the larger RobotxR1 models, BPF-SFT/RL achieves a $1.69\times$ better trade-off between model size and end-to-end performance than directly selecting a smaller dense model from the same family, measured as removed parameters per lost percentage point of control adaptability. On the Jetson AGX Orin mounted on the target robotic platform, the compact models improve decode throughput by up to $27\%$.
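
Two ideas in the abstract can be made concrete with a small sketch: pruning at predefined training intervals until a target ratio is reached, and the trade-off metric of removed parameters per lost percentage point. The code below is illustrative only and is not the authors' implementation. The function names, the even split of the pruning budget across events, and every number in the usage note are assumptions; the abstract does not specify how much is removed at each interval.

```python
def pruning_schedule(total_steps, interval, start_ratio, target_ratio):
    """Map each pruning step to the cumulative pruning ratio reached at that step.

    Assumption: the budget between start_ratio and target_ratio is split
    evenly across pruning events. A start_ratio above zero mimics the
    BPF-SFT/RL variant, where part of the model was already pruned during SFT.
    """
    if interval <= 0 or total_steps < interval:
        raise ValueError("need at least one pruning event")
    if not 0.0 <= start_ratio <= target_ratio < 1.0:
        raise ValueError("ratios must satisfy 0 <= start <= target < 1")

    # Pruning events happen every `interval` training steps.
    events = list(range(interval, total_steps + 1, interval))
    budget = target_ratio - start_ratio
    return {
        step: start_ratio + budget * (i + 1) / len(events)
        for i, step in enumerate(events)
    }


def removed_params_per_lost_point(base_params, small_params, base_score, small_score):
    """Removed parameters per lost percentage point of task performance.

    Higher is better: more parameters removed for each point given up.
    Returns infinity when the smaller model loses no performance.
    """
    removed = base_params - small_params
    lost = base_score - small_score
    if lost <= 0:
        return float("inf")
    return removed / lost


if __name__ == "__main__":
    # Hypothetical RL run: 1000 steps, prune every 250 steps, 20% target.
    for step, ratio in pruning_schedule(1000, 250, 0.0, 0.2).items():
        print(f"step {step}: cumulative pruning ratio {ratio:.2f}")

    # Hypothetical sizes (billions of parameters) and scores (percent).
    pruned = removed_params_per_lost_point(7.0, 5.0, 80.0, 78.0)
    dense = removed_params_per_lost_point(7.0, 3.0, 80.0, 72.0)
    print(f"pruned vs. smaller dense trade-off: {pruned / dense:.2f}x")
```

With these made-up inputs, the pruned model removes more parameters per lost point than the smaller dense model, which is the kind of comparison the reported $1.69\times$ figure summarizes. The numbers here are not taken from the paper.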

Problem

Research questions and friction points this paper is trying to address.

Embodied LLMs

pruning

autonomous driving

real-time deployment

reinforcement learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

RL-time pruning

embodied LLMs

model compression