VANGUARD: Vehicle-Anchored Ground Sample Distance Estimation for UAVs in GPS-Denied Environments

📅 2026-03-04
🤖 AI Summary
This work addresses spatial scale hallucination in vision-language models (VLMs) operating in GPS-denied environments, where the loss of camera metadata and telemetry removes absolute scale information and makes spatial reasoning unreliable. To mitigate this, the authors propose VANGUARD, a lightweight, deterministic geometric perception tool that uses common small vehicles as environmental anchors. It detects vehicles with oriented bounding boxes, estimates their modal pixel length with kernel density estimation, and converts that length to ground sample distance (GSD) using a pre-calibrated reference length, giving large language model (LLM) agents a consistent metric scale. A composite confidence score lets the calling agent decide whether to trust each measurement or fall back to another strategy. On DOTA v1.5, the method achieves a median GSD error of 6.87% across 306 images. Combined with SAM for area estimation, it yields a median error of 19.7% on a 100-entry benchmark, with 2.6× lower category dependence and 4× fewer catastrophic failures than the best VLM baseline.
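The core scale-recovery step can be illustrated with a minimal, standard-library-only sketch. It assumes the oriented-box detector has already run, takes each box's longer side as the vehicle's pixel length, finds the mode of a Gaussian kernel density estimate by grid search, and divides a reference length by that mode to get metres per pixel. The 4.5 m reference length, Silverman's bandwidth rule, the grid-search mode finder, and the helper names (`obb_long_side`, `kde_mode`, `estimate_gsd`) are all illustrative assumptions, not the authors' implementation.

```python
import math
import statistics
from typing import List, Sequence, Tuple

# Placeholder reference length in metres. The paper converts the modal pixel
# length with a pre-calibrated value that is not reproduced here; 4.5 m is an
# assumption made only for this illustration.
REF_LENGTH_M = 4.5


def obb_long_side(corners: Sequence[Tuple[float, float]]) -> float:
    """Longer side (in pixels) of an oriented box given its 4 corners in order."""
    (x0, y0), (x1, y1), (x2, y2), _ = corners
    return max(math.hypot(x1 - x0, y1 - y0), math.hypot(x2 - x1, y2 - y1))


def silverman_bandwidth(samples: Sequence[float]) -> float:
    """Silverman's rule-of-thumb bandwidth for a 1-D Gaussian KDE."""
    sigma = statistics.stdev(samples) if len(samples) > 1 else 0.0
    if sigma == 0.0:
        return 1.0  # degenerate case: fall back to a unit bandwidth
    return 1.06 * sigma * len(samples) ** (-1 / 5)


def kde_mode(samples: Sequence[float], grid_points: int = 512) -> float:
    """Location of the highest Gaussian-KDE density, found by grid search."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        return lo  # all detections agree; the mode is that value
    h = silverman_bandwidth(samples)
    best_x, best_density = lo, -1.0
    for i in range(grid_points):
        x = lo + (hi - lo) * i / (grid_points - 1)
        # Unnormalised density: constant factors do not move the argmax.
        density = sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
        if density > best_density:
            best_x, best_density = x, density
    return best_x


def estimate_gsd(vehicle_lengths_px: Sequence[float],
                 ref_length_m: float = REF_LENGTH_M) -> float:
    """GSD in metres per pixel = reference length / modal vehicle pixel length."""
    if not vehicle_lengths_px:
        raise ValueError("no vehicle detections to anchor the scale")
    return ref_length_m / kde_mode(vehicle_lengths_px)


if __name__ == "__main__":
    # Six consistent small-vehicle lengths plus one outlier (e.g. a longer vehicle).
    lengths: List[float] = [29.0, 29.5, 30.0, 30.0, 30.5, 31.0, 60.0]
    print(f"mean-based GSD: {REF_LENGTH_M / statistics.mean(lengths):.3f} m/px")
    print(f"KDE-mode GSD:   {estimate_gsd(lengths):.3f} m/px")
```

With these made-up lengths, the mode-based estimate stays at about 0.15 m/px, while a mean-based estimate is dragged down to about 0.131 m/px by the single outlier; that robustness to mixed detections is the motivation for taking a modal rather than average length.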

📝 Abstract
Autonomous aerial robots operating in GPS-denied or communication-degraded environments frequently lose access to camera metadata and telemetry, leaving onboard perception systems unable to recover the absolute metric scale of the scene. As LLM/VLM-based planners are increasingly adopted as high-level agents for embodied systems, their ability to reason about physical dimensions becomes safety-critical, yet our experiments show that five state-of-the-art VLMs suffer from spatial scale hallucinations, with median area estimation errors exceeding 50%. We propose VANGUARD, a lightweight, deterministic Geometric Perception Skill designed as a callable tool that any LLM-based agent can invoke to recover Ground Sample Distance (GSD) from ubiquitous environmental anchors: small vehicles detected via oriented bounding boxes, whose modal pixel length is robustly estimated through kernel density estimation and converted to GSD using a pre-calibrated reference length. The tool returns both a GSD estimate and a composite confidence score, enabling the calling agent to autonomously decide whether to trust the measurement or fall back to alternative strategies. On the DOTA v1.5 benchmark, VANGUARD achieves 6.87% median GSD error on 306 images. Integrated with SAM-based segmentation for downstream area measurement, the pipeline yields 19.7% median error on a 100-entry benchmark, with 2.6× lower category dependence and 4× fewer catastrophic failures than the best VLM baseline. These results demonstrate that equipping agents with deterministic geometric tools is essential for safe autonomous spatial reasoning.
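The downstream hand-off described in the abstract (GSD plus segmentation mask in, area plus a trust decision out) can be sketched as follows, assuming a SAM mask has already been reduced to a pixel count. Area follows from pixel count × GSD², a hypothetical threshold (`MIN_CONFIDENCE`) stands in for the agent's decision to trust or fall back, and `median_relative_error` illustrates the kind of median percentage error the benchmark reports. The helper names, the 0.5 threshold, and the treatment of confidence as a plain input in [0, 1] are assumptions; SAM itself is not called, and the paper's composite score is not reproduced.

```python
import statistics
from typing import Optional, Sequence

# Hypothetical cut-off. The paper's composite confidence score is not
# specified here, so the score is simply taken as an input in [0, 1].
MIN_CONFIDENCE = 0.5


def mask_area_m2(mask_pixel_count: int, gsd_m_per_px: float) -> float:
    """Ground area of a segmentation mask: each pixel covers gsd^2 square metres."""
    return mask_pixel_count * gsd_m_per_px ** 2


def agent_area_estimate(mask_pixel_count: int, gsd_m_per_px: float,
                        confidence: float,
                        min_confidence: float = MIN_CONFIDENCE) -> Optional[float]:
    """Return an area if the GSD is trusted; None tells the agent to fall back."""
    if confidence < min_confidence:
        return None
    return mask_area_m2(mask_pixel_count, gsd_m_per_px)


def median_relative_error(estimates: Sequence[float],
                          ground_truth: Sequence[float]) -> float:
    """Median of |estimate - truth| / truth, as a percentage."""
    if len(estimates) != len(ground_truth) or not estimates:
        raise ValueError("need equally sized, non-empty sequences")
    errors = [abs(e - g) / g * 100.0 for e, g in zip(estimates, ground_truth)]
    return statistics.median(errors)


if __name__ == "__main__":
    # A 10,000-pixel mask at 0.15 m/px covers roughly 225 square metres.
    print(agent_area_estimate(10_000, 0.15, confidence=0.8))
    print(agent_area_estimate(10_000, 0.15, confidence=0.2))  # None -> fall back
    print(median_relative_error([110.0, 95.0, 100.0], [100.0, 100.0, 100.0]))
```

Because area scales with the square of GSD, a relative GSD error ε becomes a relative area error of (1 + ε)² - 1 ≈ 2ε before any segmentation error is added; a 5% GSD overestimate, for instance, inflates area by 10.25%.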
Problem

Research questions and friction points this paper is trying to address.

GPS-denied environments
absolute metric scale
spatial scale hallucination
Ground Sample Distance
autonomous aerial robots
Innovation

Methods, ideas, or system contributions that make the work stand out.

Ground Sample Distance (GSD)
Geometric Perception
GPS-denied navigation
Vehicle-anchored scaling
Deterministic perception tool
Yifei Chen
Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China.
Xupeng Chen
Research Scientist, TikTok | Ph.D. in Electrical Engineering, New York University
LLM, Multi-Modal, BCI, Computer Vision, Natural Language Processing
Feng Wang
Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China.
Niangang Jiao
Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China.
Jiayin Liu
Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China.