🤖 AI Summary
Existing VQA benchmarks inadequately evaluate vision-language models' (VLMs) logical reasoning and problem-solving capabilities in complex agricultural scenarios. To address this, we introduce AgriCoT, the first Chain-of-Thought (CoT)-enhanced visual question answering dataset for agriculture, comprising 4,535 samples. By incorporating human-annotated CoT rationales into agricultural VLM evaluation, AgriCoT enables fine-grained analysis of both multimodal understanding and stepwise reasoning. Zero-shot evaluation across 26 state-of-the-art VLMs reveals that while proprietary models achieve higher answer accuracy, they exhibit substantial deficiencies in reasoning coherence and causal logic. This work bridges a critical gap in explainable reasoning assessment for agriculture and advances VLM evaluation from an "answer correctness" paradigm toward a "reasoning validity" paradigm: what matters is not only what is answered, but how and why.
📝 Abstract
Recent advancements in Vision-Language Models (VLMs) have significantly transformed various industries. In agriculture, their dual-modal capabilities offer promising applications such as precision farming, crop monitoring, pest detection, and environmental sustainability. While several Visual Question Answering (VQA) datasets and benchmarks have been developed to evaluate VLM performance, they often fail to adequately assess the critical reasoning and problem-solving skills required in complex agricultural contexts. To address this gap, we introduce AgriCoT, a VQA dataset that incorporates Chain-of-Thought (CoT) reasoning, specifically designed to evaluate the reasoning capabilities of VLMs. With 4,535 carefully curated samples, AgriCoT offers a comprehensive and robust evaluation of VLM reasoning abilities, particularly in zero-shot scenarios, by focusing on their capacity for logical reasoning and effective problem-solving. Our evaluations, conducted with 26 representative VLMs spanning both proprietary and open-source models, reveal that while some proprietary models excel at answering questions, a significant gap remains in their reasoning capabilities. This underscores the importance of incorporating CoT for more precise and effective assessments. Our dataset is available at https://huggingface.co/datasets/wenyb/AgriCoT.