CuraLight: Debate-Guided Data Curation for LLM-Centered Traffic Signal Control

📅 2026-04-07
🤖 AI Summary
This study addresses three limitations of existing reinforcement learning (RL) and large language model (LLM)-based traffic signal control methods: poor interpretability, scarce interaction data, and weak generalization across heterogeneous intersections. The authors propose CuraLight, an LLM-centered control framework in which RL agents explore the environment and generate high-quality trajectories for imitation fine-tuning. A structured multi-LLM debate then evaluates candidate signal timing actions and produces preference-aware supervision signals for fine-tuning the model. Combining RL-assisted exploration with multi-LLM debate lets the framework construct interpretable, high-quality control data automatically. Experiments in SUMO on real-world road networks from Jinan, Hangzhou, and Yizhuang show that the method outperforms state-of-the-art baselines, reducing average travel time by 5.34%, average queue length by 5.14%, and average waiting time by 7.02%.
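As a rough illustration of how a debate could yield preference-aware supervision, the sketch below averages scores from several judges over candidate signal actions and returns the best and worst candidates as a (chosen, rejected) pair. The summary does not specify the debate protocol, so the `Judge` type, the `debate_preference` function, the mean-score aggregation, and the action names are all assumptions made for illustration. In practice each judge would wrap an LLM call; here they are plain functions.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical type: a judge scores one candidate action for a given
# traffic state description. A real judge would wrap an LLM call.
Judge = Callable[[str, str], float]


def debate_preference(
    state: str, candidates: List[str], judges: List[Judge]
) -> Tuple[str, str, Dict[str, float]]:
    """Return (chosen, rejected, mean_scores) for one traffic state.

    Assumption: judges' scores are aggregated by a simple mean. The
    source only says the debate is "structured".
    """
    if len(candidates) < 2:
        raise ValueError("need at least two candidate actions to form a pair")
    if not judges:
        raise ValueError("need at least one judge")
    scores = {a: mean([j(state, a) for j in judges]) for a in candidates}
    # Highest mean score is preferred, lowest is rejected.
    ranked = sorted(candidates, key=lambda a: scores[a], reverse=True)
    return ranked[0], ranked[-1], scores


# Stub judges standing in for LLM debaters (illustrative values only).
def judge_a(state: str, action: str) -> float:
    return 0.9 if action == "NS_green" else 0.2


def judge_b(state: str, action: str) -> float:
    return 0.7 if action == "NS_green" else 0.4


chosen, rejected, scores = debate_preference(
    "north queue 12, east queue 3", ["NS_green", "EW_green"], [judge_a, judge_b]
)
```

The resulting (chosen, rejected) pair has the shape that preference-based fine-tuning methods usually consume, although the source does not say which training objective is used.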
📝 Abstract
Traffic signal control (TSC) is a core component of intelligent transportation systems (ITS), aiming to reduce congestion, emissions, and travel time. Recent approaches based on reinforcement learning (RL) and large language models (LLMs) have improved adaptivity, but still suffer from limited interpretability, insufficient interaction data, and weak generalization to heterogeneous intersections. This paper proposes CuraLight, an LLM-centered framework where an RL agent assists the fine-tuning of an LLM-based traffic signal controller. The RL agent explores traffic environments and generates high-quality interaction trajectories, which are converted into prompt-response pairs for imitation fine-tuning. A multi-LLM ensemble deliberation system further evaluates candidate signal timing actions through structured debate, providing preference-aware supervision signals for training. Experiments conducted in SUMO across heterogeneous real-world networks from Jinan, Hangzhou, and Yizhuang demonstrate that CuraLight consistently outperforms state-of-the-art baselines, reducing average travel time by 5.34 percent, average queue length by 5.14 percent, and average waiting time by 7.02 percent. The results highlight the effectiveness of combining RL-assisted exploration with deliberation-based data curation for scalable and interpretable traffic signal control.
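The conversion of RL trajectories into prompt-response pairs can be sketched as follows. This is a minimal example under stated assumptions: the `Step` fields, the prompt wording, the phase names, and the JSON Lines output are hypothetical and are not taken from the paper, which does not describe its prompt template here.

```python
import json
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Step:
    """One step of an RL trajectory (hypothetical schema)."""
    intersection_id: str
    queue_lengths: Dict[str, int]  # vehicles queued per approach
    action: str                    # signal phase chosen by the RL agent


def to_prompt_response(step: Step) -> Dict[str, str]:
    """Render one trajectory step as a prompt-response pair."""
    # Sort approaches so the prompt text is deterministic.
    lanes = ", ".join(f"{k}={v}" for k, v in sorted(step.queue_lengths.items()))
    prompt = (
        f"Intersection {step.intersection_id}. "
        f"Queue lengths: {lanes}. Choose the next signal phase."
    )
    # The RL agent's action serves as the imitation target.
    return {"prompt": prompt, "response": step.action}


def build_dataset(trajectory: List[Step]) -> str:
    """Serialize a trajectory as JSON Lines, one pair per line."""
    return "\n".join(json.dumps(to_prompt_response(s)) for s in trajectory)


trajectory = [
    Step("J1", {"north": 4, "east": 1}, "NS_green"),
    Step("J1", {"north": 0, "east": 6}, "EW_green"),
]
dataset = build_dataset(trajectory)
```

Each line of `dataset` is a standalone JSON object, a common input format for supervised fine-tuning tools.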
Problem

Research questions and friction points this paper is trying to address.

traffic signal control
large language models
data curation
generalization
interpretability
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-centered control
RL-assisted data curation
multi-LLM debate
imitation fine-tuning
interpretable traffic signal control
Qing Guo
School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China
Xinhang Li
Tsinghua University
Recommender System, Knowledge Graph, Transfer Learning
Junyu Chen
School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China
Zheng Guo
University of Michigan
program synthesis, AI for science
Shengzhe Xu
Eastern Institute of Technology, Ningbo, China
Lin Zhang
School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China; Beijing Big Data Center, Beijing, China
Lei Li
Associate Professor, School of Computer Science, Carnegie Mellon University
Machine Learning, Natural Language Processing, Machine Translation, LLM, AI Drug Discovery