Agri-R1: Empowering Generalizable Agricultural Reasoning in Vision-Language Models with Reinforcement Learning

2026-01-08
arXiv.org
Citations: 0
Influential: 0
AI Summary
This study addresses three limitations of existing vision-language models in agricultural disease diagnosis: reliance on extensive annotations, poor interpretability, and weak generalization in open-ended scenarios. The authors propose a method that generates reasoning data automatically, without manual labeling, by combining vision-language synthesis with large language model filtering, and builds a high-quality training set from only 19% of the available samples. They further introduce a reward function that combines domain-specific lexicons with fuzzy matching, and use it to train structured reasoning with Group Relative Policy Optimization (GRPO). Evaluated on CDDMBench, their 3B-parameter model is competitive with 7B- to 13B-parameter baselines, achieving a 23.2% relative gain in disease recognition accuracy, a 33.3% gain in agricultural knowledge QA, and a 26.10-point improvement in cross-domain generalization over standard fine-tuning.

๐Ÿ“ Abstract
Agricultural disease diagnosis challenges VLMs, as conventional fine-tuning requires extensive labels, lacks interpretability, and generalizes poorly. While reasoning improves model robustness, existing methods rely on costly expert annotations and rarely address the open-ended, diverse nature of agricultural queries. To address these limitations, we propose \textbf{Agri-R1}, a reasoning-enhanced large model for agriculture. Our framework automates high-quality reasoning data generation via vision-language synthesis and LLM-based filtering, using only 19\% of available samples. Training employs Group Relative Policy Optimization (GRPO) with a novel proposed reward function that integrates domain-specific lexicons and fuzzy matching to assess both correctness and linguistic flexibility in open-ended responses. Evaluated on CDDMBench, our resulting 3B-parameter model achieves performance competitive with 7B- to 13B-parameter baselines, showing a +23.2\% relative gain in disease recognition accuracy, +33.3\% in agricultural knowledge QA, and a +26.10-point improvement in cross-domain generalization over standard fine-tuning. Ablation studies confirm that the synergy between structured reasoning data and GRPO-driven exploration underpins these gains, with benefits scaling as question complexity increases.
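
The abstract does not spell out the reward function, so the following is only a minimal sketch of how a lexicon-plus-fuzzy-matching reward could feed group-relative advantages in the GRPO style. Everything here is an assumption made for illustration: the `LEXICON` entries, the `reward` and `group_relative_advantages` helpers, the 0.8 threshold, and the choice of `difflib.SequenceMatcher` and population standard deviation. None of it is taken from the paper.

```python
from difflib import SequenceMatcher
from statistics import mean, pstdev

# Hypothetical mini-lexicon: canonical disease name -> accepted surface forms.
LEXICON = {
    "late blight": ["late blight", "phytophthora infestans"],
    "powdery mildew": ["powdery mildew"],
}


def fuzzy_ratio(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] from the standard library."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def reward(response: str, target: str, threshold: float = 0.8) -> float:
    """Score an open-ended response against a target disease name.

    Returns 1.0 on an exact lexicon hit. Otherwise returns the best fuzzy
    similarity between the target and any word window of the same length,
    or 0.0 when that similarity is below the (assumed) threshold.
    """
    text = response.lower()
    if any(form in text for form in LEXICON.get(target, [target])):
        return 1.0
    words = text.split()
    n = len(target.split())
    windows = [" ".join(words[i:i + n]) for i in range(max(1, len(words) - n + 1))]
    best = max((fuzzy_ratio(w, target) for w in windows), default=0.0)
    return best if best >= threshold else 0.0


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize rewards within one sampled group: (r - mean) / std.

    If every response in the group gets the same reward, there is no
    learning signal, so all advantages are zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        return [0.0] * len(rewards)
    return [(r - mu) / sigma for r in rewards]


# Four hypothetical responses sampled for the same image and question.
group = [
    "the lesions indicate late blight",
    "the plant is healthy",
    "likely phytophthora infestans infection",
    "this is rust",
]
rewards = [reward(r, "late blight") for r in group]
advantages = group_relative_advantages(rewards)
```

In this sketch a misspelled answer such as "late blite" still earns partial credit through the fuzzy branch, which is one way a reward could tolerate the linguistic flexibility of open-ended responses while the lexicon branch handles known synonyms.
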
Problem

Research questions and friction points this paper is trying to address.

agricultural disease diagnosis
vision-language models
open-ended reasoning
data annotation
cross-domain generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

reinforcement learning
vision-language models
reasoning data synthesis
Group Relative Policy Optimization
agricultural disease diagnosis