EvoPool: Evolutionary Programmatic Annotation for Label-Efficient Specialized Supervision

📅 2026-05-31

🤖 AI Summary
This work addresses the tendency of large language models (LLMs) to underperform smaller supervised models in high-stakes, label-scarce professional domains. It proposes EvoPool, an evolutionary multi-agent framework inspired by Darwinian evolution. Three specialized agents iteratively generate executable labeling programs, and a deterministic gate retains only those that pass viability, diversity, and marginal-contribution checks. EvoAgg, a text-aware aggregator, then maps the pool's votes to soft training labels. On seven of eight specialized tasks, EvoPool outperforms the strongest LLM-based labeling baseline, with an average macro-F1 gain of 0.141 (up to 0.301 on ChemProt). On 100K examples, the program pool also runs 4,500 to 31,000 times faster than LLM annotation, at near-zero per-example cost.
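To make the selection step concrete, here is a minimal sketch of a three-check gate over candidate labeling programs. It is not the authors' implementation: the `ABSTAIN` convention, the thresholds `min_acc`, `max_agree`, and `min_gain`, the use of accuracy rather than macro-F1, and plain majority voting as the pool-level fitness are all assumptions chosen for illustration.

```python
from typing import Callable, Dict, List, Sequence

ABSTAIN = -1  # hypothetical convention: an annotator may decline to vote

Annotator = Callable[[str], int]


def votes(annotator: Annotator, texts: Sequence[str]) -> List[int]:
    """Run one annotator over the texts; a crash counts as an abstention."""
    out = []
    for t in texts:
        try:
            out.append(int(annotator(t)))
        except Exception:
            out.append(ABSTAIN)
    return out


def majority_accuracy(pool_votes: List[List[int]], gold: Sequence[int]) -> float:
    """Accuracy of a majority vote over non-abstaining annotators."""
    correct = 0
    for i, y in enumerate(gold):
        counts: Dict[int, int] = {}
        for v in pool_votes:
            if v[i] != ABSTAIN:
                counts[v[i]] = counts.get(v[i], 0) + 1
        # Ties resolve to the smallest label, so the result is deterministic.
        if counts and max(sorted(counts), key=counts.get) == y:
            correct += 1
    return correct / len(gold)


def gate(candidate: Annotator, pool_votes: List[List[int]],
         texts: Sequence[str], gold: Sequence[int],
         min_acc: float = 0.6, max_agree: float = 0.95,
         min_gain: float = 0.0) -> bool:
    """Admit the candidate only if it passes all three checks (assumed forms)."""
    cand = votes(candidate, texts)

    # Viability: it must fire at least once and be accurate where it fires.
    fired = [(c, y) for c, y in zip(cand, gold) if c != ABSTAIN]
    if not fired or sum(c == y for c, y in fired) / len(fired) < min_acc:
        return False

    # Diversity: reject near-duplicates of an annotator already in the pool.
    for v in pool_votes:
        if sum(a == b for a, b in zip(cand, v)) / len(cand) > max_agree:
            return False

    # Marginal contribution: the pool must strictly improve on validation.
    before = majority_accuracy(pool_votes, gold) if pool_votes else 0.0
    after = majority_accuracy(pool_votes + [cand], gold)
    if after - before <= min_gain:
        return False

    pool_votes.append(cand)
    return True
```

Calling `gate` once per proposed program, in a fixed order, gives a deterministic pool for a given validation set. Storing vote vectors rather than re-running programs keeps each check cheap.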
📝 Abstract
Large language models excel at general tasks but underperform smaller supervised models in specialized, high-stakes domains where training labels are costly. We address this regime with EvoPool, an evolutionary multi-agent framework inspired by Darwinian evolution. Three specialized agents iteratively propose executable annotator code, a small validation set provides a fitness signal, and a deterministic gate keeps only annotators that pass viability, diversity, and marginal-contribution checks across generations. Pool votes are mapped to soft training labels by EvoAgg, a text-aware aggregator combining semantic features with annotator-vote features. The authored pool runs at near-zero per-example cost and is 4500 to 31000x faster than LLM annotation on 100K examples. Across 7 of 8 LLM-weak specialized and complex tasks spanning biomedical relation extraction, legal-clause classification, complex reasoning, and dense multi-label biomedical classification, EvoPool beats the strongest LLM annotation baseline by an average +0.141 macro-F1, peaking at +0.301 on ChemProt and +0.265 on PubMed. Code is available at: https://github.com/tianyi0216/EvoPool
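The aggregation idea in the abstract, combining semantic features with annotator-vote features to produce soft labels, can be sketched as follows. This is an illustrative stand-in, not EvoAgg itself: the bucketed token counts in `text_features` replace a real semantic embedding, and the softmax-regression model, the `SoftAggregator` class, and its hyperparameters are assumptions.

```python
import math
from typing import List, Sequence

ABSTAIN = -1  # hypothetical convention for "no vote"


def vote_features(votes_row: Sequence[int], num_classes: int) -> List[float]:
    """One-hot encode each annotator's vote; an abstention is all zeros."""
    feats: List[float] = []
    for v in votes_row:
        feats.extend(1.0 if v == c else 0.0 for c in range(num_classes))
    return feats


def text_features(text: str, dim: int = 8) -> List[float]:
    """Toy stand-in for a semantic embedding: bucketed token counts."""
    feats = [0.0] * dim
    for tok in text.lower().split():
        feats[sum(map(ord, tok)) % dim] += 1.0
    return feats


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


class SoftAggregator:
    """Softmax regression over [text features | vote features | bias]."""

    def __init__(self, num_classes: int, lr: float = 0.1, epochs: int = 200):
        self.k, self.lr, self.epochs = num_classes, lr, epochs
        self.W: List[List[float]] = []

    def _x(self, text: str, votes_row: Sequence[int]) -> List[float]:
        return text_features(text) + vote_features(votes_row, self.k) + [1.0]

    def _proba(self, x: Sequence[float]) -> List[float]:
        return softmax([sum(w * xj for w, xj in zip(row, x)) for row in self.W])

    def fit(self, texts, votes, gold) -> "SoftAggregator":
        """Fit on the small labeled validation set with plain SGD."""
        X = [self._x(t, v) for t, v in zip(texts, votes)]
        self.W = [[0.0] * len(X[0]) for _ in range(self.k)]
        for _ in range(self.epochs):
            for x, y in zip(X, gold):
                p = self._proba(x)
                for c in range(self.k):
                    g = p[c] - (1.0 if c == y else 0.0)  # cross-entropy gradient
                    for j, xj in enumerate(x):
                        self.W[c][j] -= self.lr * g * xj
        return self

    def soft_label(self, text: str, votes_row: Sequence[int]) -> List[float]:
        """Return a probability vector usable as a soft training target."""
        return self._proba(self._x(text, votes_row))
```

Because the text features enter the model directly, two examples with identical votes can still receive different soft labels, which is the sense in which such an aggregator is text-aware.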
Problem

Research questions and friction points this paper is trying to address.

label-efficient learning
specialized supervision
programmatic annotation
costly annotations
domain-specific tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evolutionary Programmatic Annotation
Label-Efficient Supervision
Multi-Agent Framework
Soft Label Aggregation
Specialized Domain Learning