🤖 AI Summary
Chart reasoning faces challenges due to scarce high-quality annotated data, difficulty in modeling multi-subchart relationships, and sensitivity to numerical reasoning. Method: This paper proposes Chart-R1—a chart-domain vision-language model built on (1) a programmatic data synthesis pipeline that generates verifiable, progressively complex chart reasoning instances; and (2) a two-stage training paradigm: Stage I (Chart-COT) performs chain-of-thought supervised fine-tuning; Stage II (Chart-RFT) applies reinforcement learning with group relative policy optimization and a numerically sensitive reward mechanism to strengthen deep reasoning. Contribution/Results: Chart-R1 significantly outperforms existing chart-specific models on open benchmarks and on the newly introduced high-difficulty ChartRQA dataset, matching the performance of state-of-the-art multimodal LLMs—including GPT-4o and Claude-3.5—thereby establishing a scalable paradigm for data generation and training in complex chart understanding.
📝 Abstract
Recently, inspired by OpenAI-o1/o3 and DeepSeek-R1, R1-Style methods based on reinforcement learning fine-tuning have received widespread attention from the community. Previous R1-Style methods mainly focus on mathematical reasoning and code intelligence, so verifying their advantages on more general multimodal data is of great research significance. Charts are an important, information-rich multimodal data type that poses significant challenges for complex reasoning. In this work, we introduce Chart-R1, a chart-domain vision-language model trained with reinforcement learning fine-tuning to enable complex chart reasoning. To support Chart-R1, we first propose a novel programmatic data synthesis technique that generates high-quality step-by-step chart reasoning data covering single and multiple subcharts, compensating for the lack of reasoning data in the chart domain. We then develop a two-stage training strategy: Chart-COT with step-by-step chain-of-thought supervision, and Chart-RFT with numerically sensitive reinforcement fine-tuning. Chart-COT decomposes complex chart reasoning tasks into fine-grained, understandable subtasks through step-by-step supervision, laying a solid foundation for improving the reasoning capability of reinforcement learning. Chart-RFT uses the standard group relative policy optimization strategy, in which a relatively soft reward is adopted for numerical responses to emphasize numerical sensitivity in the chart domain. We conduct extensive experiments on open-source benchmarks and a self-built chart reasoning dataset (i.e., ChartRQA). Experimental results show that Chart-R1 has significant advantages over chart-domain methods and is even comparable to large-scale open/closed-source models (e.g., GPT-4o, Claude-3.5).
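The "relatively soft reward for numerical responses" mentioned above could take many forms; the abstract does not specify one. A minimal sketch, assuming the reward gives full credit for exact matches and partial credit that decays with the relative error of a numeric answer (the function name, tolerance, and decay shape are all illustrative assumptions, not the paper's actual formulation):

```python
def soft_numeric_reward(pred: str, target: str, tol: float = 0.05) -> float:
    """Hypothetical soft reward: 1.0 for an exact match; for numeric
    answers, partial credit that decays linearly with relative error,
    reaching 0 once the error exceeds the tolerance `tol`."""
    if pred.strip() == target.strip():
        return 1.0  # exact string match: full reward
    try:
        p, t = float(pred), float(target)
    except ValueError:
        return 0.0  # non-numeric mismatch: no credit
    if t == 0.0:
        return 1.0 if p == 0.0 else 0.0  # avoid division by zero
    rel_err = abs(p - t) / abs(t)
    return max(0.0, 1.0 - rel_err / tol)  # linear decay within tolerance
```

Under this sketch, a prediction of 41 against a target of 42 (about 2.4% relative error) would still earn partial reward, whereas a hard exact-match reward would score it zero; this is the kind of gradient signal a numerically sensitive chart task plausibly benefits from.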