EvoPool: Evolutionary Programmatic Annotation for Label-Efficient Specialized Supervision

📅 2026-05-31

🤖 AI Summary

This work addresses the poor few-shot performance of large language models (LLMs) in high-stakes, label-scarce professional domains, where smaller supervised models often do better. It proposes EvoPool, a framework inspired by Darwinian evolution in which three specialized agents iteratively generate executable labeling programs (annotators). A small validation set supplies the fitness signal, and a deterministic selection gate keeps only annotators that pass viability, diversity, and marginal-contribution checks. A text-aware aggregator, EvoAgg, then converts the pool's votes into soft training labels. Across eight specialized tasks, EvoPool outperforms the strongest LLM-based labeling baseline on seven, with an average macro-F1 gain of 0.141 (up to 0.301 on ChemProt). Once authored, the annotator pool labels data 4,500 to 31,000 times faster than LLM annotation on 100K examples, at near-zero per-example cost.
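The selection gate can be illustrated with a minimal Python sketch. The summary does not give EvoPool's thresholds, vote rule, or agent interface, so everything here is an assumption. Annotators are plain functions returning a class id or a hypothetical `ABSTAIN` value. Viability is accuracy on the validation examples an annotator votes on, compared against `min_acc`. Diversity is vote agreement with each existing member, compared against `max_agree`. Marginal contribution is a gain in majority-vote macro-F1. `propose` is a hypothetical stand-in for the LLM agents.

```python
from typing import Callable, List

ABSTAIN = -1  # hypothetical convention: an annotator returns -1 when it declines to vote
Annotator = Callable[[str], int]


def macro_f1(gold: List[int], pred: List[int], labels: List[int]) -> float:
    """Unweighted mean of per-class F1; ABSTAIN predictions count as misses."""
    scores = []
    for c in labels:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def majority_vote(pool: List[Annotator], texts: List[str], labels: List[int]) -> List[int]:
    """Plain majority vote over non-abstaining annotators (ties go to the earlier label)."""
    preds = []
    for x in texts:
        votes = [v for v in (a(x) for a in pool) if v != ABSTAIN]
        preds.append(max(labels, key=votes.count) if votes else ABSTAIN)
    return preds


def passes_gate(cand: Annotator, pool: List[Annotator], texts: List[str],
                gold: List[int], labels: List[int],
                min_acc: float = 0.5, max_agree: float = 0.9, min_gain: float = 0.0) -> bool:
    votes = [cand(x) for x in texts]
    covered = [(v, g) for v, g in zip(votes, gold) if v != ABSTAIN]
    # 1) Viability: the program must vote somewhere and be accurate where it votes.
    if not covered or sum(v == g for v, g in covered) / len(covered) < min_acc:
        return False
    # 2) Diversity: reject near-copies of any current pool member.
    for member in pool:
        agree = sum(v == member(x) for v, x in zip(votes, texts)) / len(texts)
        if agree > max_agree:
            return False
    # 3) Marginal contribution: adding it must raise pool macro-F1 on the validation set.
    before = macro_f1(gold, majority_vote(pool, texts, labels), labels)
    after = macro_f1(gold, majority_vote(pool + [cand], texts, labels), labels)
    return after - before > min_gain


def evolve(propose: Callable[[List[Annotator]], List[Annotator]], texts: List[str],
           gold: List[int], labels: List[int], generations: int = 3) -> List[Annotator]:
    """`propose` stands in for the LLM agents that write new annotator programs."""
    pool: List[Annotator] = []
    for _ in range(generations):
        for cand in propose(pool):
            if passes_gate(cand, pool, texts, gold, labels):
                pool.append(cand)
    return pool
```

Because candidates are admitted one at a time, proposal order decides which of two near-duplicates survives. A real system might rank a whole generation before admitting anything.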

📝 Abstract

Large language models excel at general tasks but underperform smaller supervised models in specialized, high-stakes domains where training labels are costly. We address this regime with EvoPool, an evolutionary multi-agent framework inspired by Darwinian evolution. Three specialized agents iteratively propose executable annotator code, a small validation set provides a fitness signal, and a deterministic gate keeps only annotators that pass viability, diversity, and marginal-contribution checks across generations. Pool votes are mapped to soft training labels by EvoAgg, a text-aware aggregator combining semantic features with annotator-vote features. The authored pool runs at near-zero per-example cost and is 4,500 to 31,000x faster than LLM annotation on 100K examples. On 7 of 8 specialized and complex tasks where LLMs are weak, spanning biomedical relation extraction, legal-clause classification, complex reasoning, and dense multi-label biomedical classification, EvoPool beats the strongest LLM annotation baseline by an average of +0.141 macro-F1, peaking at +0.301 on ChemProt and +0.265 on PubMed. Code is available at https://github.com/tianyi0216/EvoPool
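The abstract does not describe EvoAgg's internals. The sketch below is a hedged illustration of what "combining semantic features with annotator-vote features" could look like. It concatenates two feature sets:

- a hashed bag-of-words vector, a crude hypothetical stand-in for a real text encoder;
- one-hot annotator votes.

It then fits a softmax regression on the small labeled validation set and returns class probabilities for unlabeled examples as soft labels. All function names and hyperparameters are illustrative assumptions, not the paper's method.

```python
import zlib

import numpy as np

ABSTAIN = -1  # hypothetical "no vote" value used by the annotators


def text_features(texts, dim=512):
    """Hashed bag-of-words: a crude, deterministic stand-in for a real sentence encoder."""
    feats = np.zeros((len(texts), dim))
    for i, t in enumerate(texts):
        for tok in t.lower().split():
            feats[i, zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    return feats


def vote_features(votes, num_classes):
    """One-hot encode an (n_examples, n_annotators) vote matrix; abstentions become all zeros."""
    votes = np.asarray(votes, dtype=int)
    n, m = votes.shape
    feats = np.zeros((n, m * num_classes))
    for j in range(m):
        for c in range(num_classes):
            feats[:, j * num_classes + c] = votes[:, j] == c
    return feats


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # shift logits for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fit_softmax(X, y, num_classes, lr=0.3, epochs=1000, l2=1e-3):
    """Multinomial logistic regression trained by full-batch gradient descent."""
    n, d = X.shape
    W, b = np.zeros((d, num_classes)), np.zeros(num_classes)
    Y = np.eye(num_classes)[y]
    for _ in range(epochs):
        G = (softmax(X @ W + b) - Y) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (X.T @ G + l2 * W)
        b -= lr * G.sum(axis=0)
    return W, b


def text_aware_aggregate(val_texts, val_votes, val_gold, unl_texts, unl_votes, num_classes):
    """Fit on the labeled validation set, then return soft labels for unlabeled examples."""
    def featurize(texts, votes):
        return np.hstack([text_features(texts), vote_features(votes, num_classes)])

    W, b = fit_softmax(featurize(val_texts, val_votes), np.asarray(val_gold), num_classes)
    return softmax(featurize(unl_texts, unl_votes) @ W + b)
```

The model sees the text as well as the votes. As a result, an example on which every annotator abstains can still receive an informative soft label, which plain majority voting cannot provide.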

Problem

Research questions and friction points this paper is trying to address.

label-efficient learning

specialized supervision

programmatic annotation

costly annotations

domain-specific tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Evolutionary Programmatic Annotation

Label-Efficient Supervision

Multi-Agent Framework