🤖 AI Summary
This work proposes an end-to-end automated framework for mining challenging autonomous driving scenarios from the Argoverse 2 dataset. The approach combines large language model (LLM)-driven code generation with a multi-stage validation pipeline. A Claude Code agent powered by GLM 5.1 first generates scenario-mining scripts automatically. Candidate scripts are then screened on the training set, keeping those that reach a Timestamp Balanced Accuracy of 0.8 as few-shot examples. A separate Claude Code session next reviews the generated code for semantic correctness. Finally, the Qwen3-VL multimodal model verifies each result at the scene level to filter out false positives. The framework thus pairs LLM-based autonomous programming with multimodal validation, and results are reported on the Argoverse 2 test set.
📝 Abstract
We present our submission to the CVPR 2026 Argoverse 2 Scenario Mining Challenge. Our system uses a four-stage pipeline: (1) autonomous code generation via a Claude Code agent powered by GLM 5.1, (2) iterative training-set screening using a Timestamp Balanced Accuracy threshold of 0.8 to curate few-shot examples, (3) semantic code review by a separate Claude Code session, and (4) Qwen3-VL scene-level verification to filter false positives. We report results on the Argoverse 2 test set.
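To make the screening step in stage (2) concrete, here is a minimal sketch of thresholding candidates by a per-timestamp balanced accuracy. It is an illustration under stated assumptions, not the authors' implementation: the function names `balanced_accuracy` and `screen_examples`, the `(name, predictions, ground truth)` data layout, the inclusive comparison against 0.8, and the handling of scenarios with only one class present are all hypothetical. The challenge's official metric may be computed differently.

```python
from typing import List, Sequence, Tuple


def balanced_accuracy(pred: Sequence[bool], truth: Sequence[bool]) -> float:
    """Mean of true-positive rate and true-negative rate over timestamps.

    Assumption: each element is one timestamp's binary label. If only one
    class occurs in `truth`, the score falls back to the rate of that class.
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    if not truth:
        raise ValueError("at least one timestamp is required")

    positives = sum(1 for t in truth if t)
    negatives = len(truth) - positives
    true_pos = sum(1 for p, t in zip(pred, truth) if p and t)
    true_neg = sum(1 for p, t in zip(pred, truth) if not p and not t)

    rates = []
    if positives:
        rates.append(true_pos / positives)  # recall on positive timestamps
    if negatives:
        rates.append(true_neg / negatives)  # recall on negative timestamps
    return sum(rates) / len(rates)


Candidate = Tuple[str, Sequence[bool], Sequence[bool]]


def screen_examples(candidates: List[Candidate], threshold: float = 0.8) -> List[str]:
    """Keep the names of candidates whose score meets the threshold.

    Assumption: the comparison is inclusive (score >= threshold).
    """
    return [
        name
        for name, pred, truth in candidates
        if balanced_accuracy(pred, truth) >= threshold
    ]


if __name__ == "__main__":
    # Hypothetical candidates: (script name, predicted labels, ground truth).
    demo = [
        ("script_a", [True, True, False, False], [True, False, False, False]),
        ("script_b", [True, False, False, False], [True, True, False, False]),
    ]
    print(screen_examples(demo))
```

In this sketch, a candidate that passes the threshold would be retained as a few-shot example for later code-generation rounds, while one that fails would be dropped or regenerated.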