Label Curation Using Agentic AI

📅 2026-01-30
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenges of high cost, low efficiency, significant annotator bias, and unreliable label quality in large-scale multimodal data annotation by proposing AURA, a novel framework that integrates multi-agent collaboration with confusion matrix–based annotator reliability modeling. Without access to ground-truth labels, AURA jointly infers latent true labels and annotator reliability through a probabilistic graphical model combined with the Expectation-Maximization (EM) algorithm, enabling high-quality automatic label aggregation without manual pre-validation. Experimental results demonstrate that AURA improves accuracy by up to 5.8% across four benchmark datasets, with gains as high as 50% in low-quality annotation scenarios, while also providing precise quantification of annotator trustworthiness.
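At the core of this summary is a confusion-matrix reliability model. As a point of reference only (this is not AURA's code; `true_label_posterior` and the toy matrices are illustrative assumptions), the sketch below shows the E-step intuition: the posterior over an item's latent true label, weighting each annotator's response by that annotator's assumed confusion matrix.

```python
# Hypothetical sketch (not AURA's implementation): posterior over the latent
# true label of a single item, given each annotator's response and an assumed
# per-annotator confusion matrix. Reliable annotators pull the posterior
# harder than noisy ones.
import numpy as np

def true_label_posterior(responses, confusions, prior):
    """responses: {annotator_id: observed class}; confusions: {annotator_id:
    K x K matrix whose row k is P(observed | true = k)}; prior: length-K class prior."""
    log_post = np.log(prior)
    for a, obs in responses.items():
        log_post += np.log(confusions[a][:, obs] + 1e-12)  # P(obs | true = k) for every k
    log_post -= log_post.max()                              # numerical stability
    post = np.exp(log_post)
    return post / post.sum()

# Toy usage: a reliable annotator says class 1, a near-random one says class 2;
# the posterior should concentrate on class 1.
K = 3
reliable = 0.9 * np.eye(K) + 0.05 * (1 - np.eye(K))
noisy = np.full((K, K), 1.0 / K)
print(true_label_posterior({0: 1, 1: 2}, {0: reliable, 1: noisy}, np.full(K, 1.0 / K)))
```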

πŸ“ Abstract
Data annotation is essential for supervised learning, yet producing accurate, unbiased, and scalable labels remains challenging as datasets grow in size and modality. Traditional human-centric pipelines are costly, slow, and prone to annotator variability, motivating reliability-aware automated annotation. We present AURA (Agentic AI for Unified Reliability Modeling and Annotation Aggregation), an agentic AI framework for large-scale, multi-modal data annotation. AURA coordinates multiple AI agents to generate and validate labels without requiring ground truth. At its core, AURA adapts a classical probabilistic model that jointly infers latent true labels and annotator reliability via confusion matrices, using Expectation-Maximization to reconcile conflicting annotations and aggregate noisy predictions. Across the four benchmark datasets evaluated, AURA achieves accuracy improvements of up to 5.8% over baseline. In more challenging settings with poor-quality annotators, the improvement is up to 50% over baseline. AURA also accurately estimates the reliability of annotators, allowing assessment of annotator quality even without any pre-validation steps.
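The abstract's classical probabilistic model with Expectation-Maximization is in the spirit of Dawid-Skene label aggregation. Below is a minimal, self-contained sketch of that generic aggregation step, assuming a dense integer label matrix with no missing annotations; the function name `em_aggregate` and its parameters are illustrative, and the multi-agent generation and validation layer described in the paper is not reproduced here.

```python
# Minimal Dawid-Skene-style EM sketch (a generic reference, not AURA itself).
# labels[i, a] = integer class that annotator a assigned to item i.
import numpy as np

def em_aggregate(labels, num_classes, num_iters=50, smoothing=1e-2):
    n_items, n_annotators = labels.shape

    # Initialize per-item class posteriors with a normalized majority vote.
    post = np.zeros((n_items, num_classes))
    for i in range(n_items):
        post[i] = np.bincount(labels[i], minlength=num_classes)
    post /= post.sum(axis=1, keepdims=True)

    for _ in range(num_iters):
        # M-step: class prior and one confusion matrix per annotator.
        prior = post.mean(axis=0)
        conf = np.full((n_annotators, num_classes, num_classes), smoothing)
        for a in range(n_annotators):
            for i in range(n_items):
                conf[a, :, labels[i, a]] += post[i]
            conf[a] /= conf[a].sum(axis=1, keepdims=True)

        # E-step: recompute posteriors over the latent true labels.
        log_post = np.tile(np.log(prior + 1e-12), (n_items, 1))
        for i in range(n_items):
            for a in range(n_annotators):
                log_post[i] += np.log(conf[a, :, labels[i, a]])
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)

    # Aggregated labels plus per-annotator reliability (confusion) estimates.
    return post.argmax(axis=1), conf
```

Majority vote only seeds the posteriors; the EM iterations then down-weight annotators whose estimated confusion matrices drift away from the diagonal, which is the kind of reliability estimate the abstract refers to.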
Problem

Research questions and friction points this paper is trying to address.

data annotation
label curation
annotator reliability
supervised learning
multi-modal data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Agentic AI
Label Curation
Reliability Modeling
Expectation-Maximization
Multi-modal Annotation
Subhodeep Ghosh
New Jersey Institute of Technology
Bayan Divaaniaazar
New Jersey Institute of Technology
Md Ishat-E-Rabban
New Jersey Institute of Technology
Spencer Clarke
New Jersey Institute of Technology
Senjuti Basu Roy
Panasonic Chair in Sustainability and Associate Professor, Department of Computer Science, NJIT
human-in-the-loop AI, (Responsible) data management for AI, Algorithm Design, Reinforcement Learning, Future of Work