Language-in-the-Loop Culvert Inspection on the Erie Canal

📅 2025-09-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenges of manual inspection in aging, geometrically complex, and environmentally harsh canal culverts, such as those beneath the Erie Canal, where access is severely limited, this paper proposes VISION, an end-to-end language-in-the-loop autonomous inspection system. The method integrates open-vocabulary region-of-interest (ROI) proposal generation, stereo depth estimation, and constraint-aware viewpoint planning on a quadrupedal robot, enabling high-resolution image acquisition, vision-language reasoning, and closed-loop physical interaction without domain-specific fine-tuning. Its key contribution is the coupling of large language model driven semantic understanding with geometry-constrained repositioning: initial ROI proposals achieve 61.4% agreement with expert annotations, improving to 80% after re-imaging. The system supports onboard real-time decision-making and automated report generation, and an external evaluation by New York State Canal Corporation personnel confirms its technical efficacy and engineering practicality.

📝 Abstract
Culverts on canals such as the Erie Canal, built originally in 1825, require frequent inspections to ensure safe operation. Human inspection of culverts is challenging due to age, geometry, poor illumination, weather, and lack of easy access. We introduce VISION, an end-to-end, language-in-the-loop autonomy system that couples a web-scale vision-language model (VLM) with constrained viewpoint planning for autonomous inspection of culverts. Brief prompts to the VLM solicit open-vocabulary ROI proposals with rationales and confidences, stereo depth is fused to recover scale, and a planner -- aware of culvert constraints -- commands repositioning moves to capture targeted close-ups. Deployed on a quadruped in a culvert under the Erie Canal, VISION closes the see, decide, move, re-image loop on-board and produces high-resolution images for detailed reporting without domain-specific fine-tuning. In an external evaluation by New York Canal Corporation personnel, initial ROI proposals achieved 61.4% agreement with subject-matter experts, and final post-re-imaging assessments reached 80%, indicating that VISION converts tentative hypotheses into grounded, expert-aligned findings.
Problem

Research questions and friction points this paper is trying to address.

Autonomous inspection of aging canal culverts using language-guided vision systems
Overcoming human inspection challenges with constrained viewpoint planning
Converting AI-generated hypotheses into expert-aligned inspection findings
Innovation

Methods, ideas, or system contributions that make the work stand out.

VLM-based autonomous inspection system
Constrained viewpoint planning for culverts
On-board loop closing without fine-tuning
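The see, decide, move, re-image loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the stubbed VLM call, the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the standoff and ceiling constraints, the acceptance threshold, and the fixed confidence gain after re-imaging are all assumptions made for the sake of a runnable example.

```python
from dataclasses import dataclass

@dataclass
class ROIProposal:
    label: str        # open-vocabulary defect label, e.g. "spalling"
    bbox: tuple       # (u_min, v_min, u_max, v_max) in pixels
    confidence: float # VLM self-reported confidence in [0, 1]
    rationale: str    # brief VLM rationale for the proposal

def propose_rois(image_id):
    """Stand-in for the VLM prompt: return open-vocabulary ROI proposals
    with rationales and confidences (hard-coded here for illustration)."""
    return [ROIProposal("spalling", (120, 80, 220, 160), 0.55,
                        "rough, flaked concrete surface")]

def fuse_depth(bbox, depth_m, fx=500.0, fy=500.0, cx=320.0, cy=240.0):
    """Back-project the bbox center with stereo depth to recover a metric
    3D target point (assumed pinhole camera model)."""
    u = (bbox[0] + bbox[2]) / 2
    v = (bbox[1] + bbox[3]) / 2
    return ((u - cx) * depth_m / fx, (v - cy) * depth_m / fy, depth_m)

def plan_viewpoint(target_xyz, standoff_m=0.5, max_z_m=1.2):
    """Constraint-aware repositioning: approach the target to a standoff
    distance, clamped to the culvert's reachable workspace."""
    x, y, z = target_xyz
    return (x, y, max(0.0, min(z - standoff_m, max_z_m)))

def inspect(image_id, depth_m=2.0, accept_thresh=0.7, close_up_gain=0.3):
    """One pass of the see-decide-move-re-image loop."""
    findings = []
    for roi in propose_rois(image_id):          # see + decide
        target = fuse_depth(roi.bbox, depth_m)  # recover scale
        viewpoint = plan_viewpoint(target)      # move
        # Re-image from the closer viewpoint and re-query the VLM; here the
        # targeted close-up is modeled as a fixed confidence gain.
        roi.confidence = min(1.0, roi.confidence + close_up_gain)
        if roi.confidence >= accept_thresh:
            findings.append((roi.label, viewpoint, roi.confidence))
    return findings
```

A call like `inspect("frame_000")` would retain the proposal only after the close-up raises its confidence above the acceptance threshold, mirroring how re-imaging converts tentative hypotheses into grounded findings.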