VANGUARD: Vehicle-Anchored Ground Sample Distance Estimation for UAVs in GPS-Denied Environments

📅 2026-03-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses spatial scale hallucination in LLM/VLM-based planners for aerial robots operating in GPS-denied environments, where the loss of camera metadata and telemetry removes absolute metric scale and makes spatial reasoning unreliable. The authors propose VANGUARD, a lightweight, deterministic geometric tool that uses small vehicles as environmental anchors. It detects vehicles with oriented bounding boxes, estimates their modal pixel length with kernel density estimation, and converts that length to ground sample distance (GSD) using a pre-calibrated reference length, giving large language model (LLM) agents a consistent metric scale. A composite confidence score lets the calling agent decide whether to trust the measurement or fall back to another strategy. On DOTA v1.5, the method achieves a median GSD error of 6.87% on 306 images. Combined with SAM-based segmentation for area estimation, it yields a median error of 19.7% on a 100-entry benchmark, with 2.6× lower category dependence and 4× fewer catastrophic failures than the best VLM baseline.
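
The anchor-to-scale step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the reference length of 4.5 m, the function names `kde_mode` and `estimate_gsd`, the grid search, and the normal-reference bandwidth rule are all assumptions made for the example, since the summary does not state the paper's calibrated value or KDE settings. The input is assumed to be the long-side lengths, in pixels, of the oriented bounding boxes of detected small vehicles.

```python
import math

# Hypothetical reference length of a small vehicle in meters.
# Illustrative only; the paper's pre-calibrated value is not given here.
REFERENCE_LENGTH_M = 4.5


def kde_mode(samples, bandwidth=None, grid_points=512):
    """Return the location of the highest peak of a Gaussian KDE."""
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    lo, hi = min(samples), max(samples)
    if lo == hi:
        return lo  # all samples coincide, so the mode is that value
    if bandwidth is None:
        mean = sum(samples) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))
        # Normal-reference rule of thumb (an assumed default).
        bandwidth = 1.06 * std * n ** (-0.2)
    best_x, best_density = lo, -1.0
    # Evaluate the unnormalized density on a regular grid and keep the peak.
    for i in range(grid_points):
        x = lo + (hi - lo) * i / (grid_points - 1)
        density = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        if density > best_density:
            best_x, best_density = x, density
    return best_x


def estimate_gsd(box_lengths_px, reference_length_m=REFERENCE_LENGTH_M):
    """GSD in meters per pixel from the modal vehicle length in pixels."""
    return reference_length_m / kde_mode(box_lengths_px)


if __name__ == "__main__":
    # Six plausible car lengths plus two outliers (a truck and a fragment).
    lengths = [30.0, 31.0, 29.0, 30.5, 29.5, 30.0, 62.0, 12.0]
    print(f"modal length: {kde_mode(lengths):.1f} px")
    print(f"GSD estimate: {estimate_gsd(lengths):.3f} m/px")
```

Taking the mode rather than the mean is the point of the design: a few trucks or partially visible vehicles shift the mean substantially, while the density peak stays close to the dominant vehicle size.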

📝 Abstract

Autonomous aerial robots operating in GPS-denied or communication-degraded environments frequently lose access to camera metadata and telemetry, leaving onboard perception systems unable to recover the absolute metric scale of the scene. As LLM/VLM-based planners are increasingly adopted as high-level agents for embodied systems, their ability to reason about physical dimensions becomes safety-critical, yet our experiments show that five state-of-the-art VLMs suffer from spatial scale hallucinations, with median area estimation errors exceeding 50%. We propose VANGUARD, a lightweight, deterministic Geometric Perception Skill designed as a callable tool that any LLM-based agent can invoke to recover Ground Sample Distance (GSD) from ubiquitous environmental anchors: small vehicles detected via oriented bounding boxes, whose modal pixel length is robustly estimated through kernel density estimation and converted to GSD using a pre-calibrated reference length. The tool returns both a GSD estimate and a composite confidence score, enabling the calling agent to autonomously decide whether to trust the measurement or fall back to alternative strategies. On the DOTA v1.5 benchmark, VANGUARD achieves 6.87% median GSD error on 306 images. Integrated with SAM-based segmentation for downstream area measurement, the pipeline yields 19.7% median error on a 100-entry benchmark, with 2.6x lower category dependence and 4x fewer catastrophic failures than the best VLM baseline, demonstrating that equipping agents with deterministic geometric tools is essential for safe autonomous spatial reasoning.
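
To make the tool-calling pattern concrete, the sketch below shows how a calling agent might gate a downstream area measurement on the returned confidence. It is an illustration under stated assumptions, not the paper's interface: the `GsdResult` type, the function names, the confidence range of 0 to 1, and the threshold of 0.5 are all hypothetical, and the abstract does not say how the composite score is computed. The one fixed relation is geometric: for a near-nadir image of flat ground, each pixel covers GSD × GSD square meters, so a segmentation mask of N pixels spans N × GSD² square meters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GsdResult:
    """Hypothetical return value of the GSD tool."""
    gsd_m_per_px: float  # meters per pixel
    confidence: float    # composite score, assumed to lie in [0, 1]


def area_from_mask(mask_pixel_count: int, gsd_m_per_px: float) -> float:
    """Ground area in square meters covered by a mask of the given size."""
    return mask_pixel_count * gsd_m_per_px ** 2


def measure_area(
    mask_pixel_count: int,
    result: GsdResult,
    min_confidence: float = 0.5,  # assumed threshold, chosen for illustration
) -> Optional[float]:
    """Return the area, or None to tell the agent to use a fallback."""
    if result.confidence < min_confidence:
        return None
    return area_from_mask(mask_pixel_count, result.gsd_m_per_px)


if __name__ == "__main__":
    trusted = GsdResult(gsd_m_per_px=0.15, confidence=0.9)
    doubtful = GsdResult(gsd_m_per_px=0.15, confidence=0.2)
    print(measure_area(10_000, trusted))
    print(measure_area(10_000, doubtful))
```

Because area scales with the square of GSD, a relative GSD error of e becomes an area error of roughly 2e for small e, which is one reason the area benchmark error is expected to exceed the GSD error even before segmentation mistakes are counted.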

Problem

Research questions and friction points this paper is trying to address.

GPS-denied environments

absolute metric scale

spatial scale hallucination

Ground Sample Distance

autonomous aerial robots

Innovation

Methods, ideas, or system contributions that make the work stand out.

Ground Sample Distance (GSD)

Geometric Perception

GPS-denied navigation