Lite VLA: Efficient Vision-Language-Action Control on CPU-Bound Edge Robots

📅 2025-11-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses autonomous navigation for edge robots operating under GPS-denied conditions and severe computational constraints. Method: We propose a lightweight vision-language-action (VLA) end-to-end collaborative framework, featuring the first deployment of a compact vision-language model (VLM) on a pure-CPU embedded platform—enabling cloud-free, online multimodal inference and real-time motion control in tandem. Contributions/Results: (1) We introduce VLM compression and CPU-specific inference optimization tailored for resource-constrained edge devices; (2) we establish a closed-loop perception-decision-execution architecture supporting real-time scene understanding and navigation in dynamic environments; (3) the system achieves a balanced trade-off between high responsiveness (<200 ms end-to-end latency) and task accuracy (86.4% average success rate) on low-power hardware. Extensive experiments validate robustness and practicality in real-world indoor and outdoor GPS-denied scenarios.

Technology Category

Application Category

📝 Abstract
The deployment of artificial intelligence models at the edge is increasingly critical for autonomous robots operating in GPS-denied environments where local, resource-efficient reasoning is essential. This work demonstrates the feasibility of deploying small Vision-Language Models (VLMs) on mobile robots to achieve real-time scene understanding and reasoning under strict computational constraints. Unlike prior approaches that separate perception from mobility, the proposed framework enables simultaneous movement and reasoning in dynamic environments using only on-board hardware. The system integrates a compact VLM with multimodal perception to perform contextual interpretation directly on embedded hardware, eliminating reliance on cloud connectivity. Experimental validation highlights the balance between computational efficiency, task accuracy, and system responsiveness. Implementation on a mobile robot confirms one of the first successful deployments of small VLMs for concurrent reasoning and mobility at the edge. This work establishes a foundation for scalable, assured autonomy in applications such as service robotics, disaster response, and defense operations.
Problem

Research questions and friction points this paper is trying to address.

Deploying AI models on edge robots with limited computational resources
Achieving real-time scene understanding and reasoning under strict constraints
Enabling simultaneous movement and reasoning without cloud connectivity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deploys small VLMs on mobile robots
Integrates multimodal perception for contextual interpretation
Enables simultaneous reasoning and mobility on-board
🔎 Similar Papers
No similar papers found.
Justin Williams
Justin Williams
Peter Tong Department Chair and Vilas Distinguished Achievement Professor, University of Wisconsin
NeuroengineeringBiomedical EngineeringNeural InterfacesNeural Engineering
K
K. Gupta
Department of Cyber-Physical Systems, Clark Atlanta University, Atlanta, GA, USA
R
Roy George
Department of Cyber-Physical Systems, Clark Atlanta University, Atlanta, GA, USA
M
Mrinmoy Sarkar
Siemens Corporation, Princeton, NJ, USA