Better with Experience: Self-Evolving LLM Agents for Evidence-Grounded Health Community Notes

📅 2026-06-01

🤖 AI Summary

Current large language model (LLM)-augmented Community Notes systems for health content reset with every new post, so correction experience from earlier cases goes unused. To address this, the work proposes EvoNote, an agentic framework built around an evolving experience memory. Through fine-grained credit assignment, EvoNote grounds trajectory-level feedback in health-specific note qualities and distills it into action-level memories for claim analysis, evidence acquisition, and note writing. On MM-HealthCN, a 1.2K-instance multimodal benchmark of user-flagged health posts, a human-validated hierarchical utility judge prefers EvoNote-generated notes over the corresponding human-written notes in 89.6% of cases. On Needs More Ratings posts that lack a crowd helpfulness verdict, EvoNote produces helpful notes for 82.0% of cases. It also cuts the median time to produce a candidate correction from over 13 hours to under 2 minutes.

📝 Abstract

Large Language Model (LLM)-augmented Community Notes offer a scalable path for timely, evidence-grounded correction of health misinformation on social platforms. However, they still reset at every post, leaving useful correction experience from prior cases unused. We introduce EvoNote, an agentic framework that enables health Community Notes generation to self-evolve through an evolving experience memory of prior misinformation correction episodes. Its core is fine-grained credit assignment: EvoNote grounds trajectory-level feedback in health-specific note qualities and distills it into action-level memory for claim analysis, evidence acquisition, and note writing. We evaluate EvoNote on MM-HealthCN, a 1.2K-instance multimodal benchmark of user-flagged health posts with human-written Community Notes and crowd-derived helpfulness labels. Under a human-validated hierarchical utility judge, EvoNote-generated notes are preferred over corresponding human-written notes in 89.6% of cases; on a separate set of Needs More Ratings posts without a crowd helpfulness verdict, EvoNote produces helpful notes for 82.0% of cases. It also reduces the median time needed to produce a candidate correction from over 13 hours in the human-note pipeline to under 2 minutes. Analyses link these gains to stronger evidence use and reusable correction strategies, positioning self-evolving note generation as a promising paradigm for health misinformation governance.
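The abstract describes credit assignment only at a high level. The following Python sketch shows one way trajectory-level feedback could be spread onto the three stages (claim analysis, evidence acquisition, note writing) and stored as action-level memory. It is a minimal illustration under stated assumptions: the quality dimensions, their mapping to stages, the averaging rule, the 0.6 threshold, and all names (QUALITY_TO_STAGE, MemoryEntry, ExperienceMemory, assign_credit, distill) are hypothetical and are not taken from EvoNote.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# HYPOTHETICAL mapping from note-quality dimensions to the agent stage
# assumed most responsible for each quality (illustrative, not EvoNote's).
QUALITY_TO_STAGE: Dict[str, str] = {
    "claim_identification": "claim_analysis",
    "evidence_relevance": "evidence_acquisition",
    "source_credibility": "evidence_acquisition",
    "clarity": "note_writing",
    "neutral_tone": "note_writing",
}


@dataclass
class MemoryEntry:
    stage: str    # which action the lesson applies to
    topic: str    # health topic of the post that produced it
    lesson: str   # distilled, reusable (or cautionary) strategy
    credit: float  # stage-level credit derived from trajectory feedback


@dataclass
class ExperienceMemory:
    entries: List[MemoryEntry] = field(default_factory=list)

    def add(self, entry: MemoryEntry) -> None:
        self.entries.append(entry)

    def retrieve(self, stage: str, topic: str, k: int = 3) -> List[MemoryEntry]:
        # Keep only entries for this stage; rank same-topic entries first,
        # then by credit, highest first.
        candidates = [e for e in self.entries if e.stage == stage]
        candidates.sort(key=lambda e: (e.topic == topic, e.credit), reverse=True)
        return candidates[:k]


def assign_credit(quality_scores: Dict[str, float]) -> Dict[str, float]:
    """Spread trajectory-level quality scores onto stages by averaging
    the scores of the qualities mapped to each stage."""
    per_stage: Dict[str, List[float]] = {}
    for quality, score in quality_scores.items():
        stage = QUALITY_TO_STAGE.get(quality)
        if stage is not None:
            per_stage.setdefault(stage, []).append(score)
    return {stage: sum(s) / len(s) for stage, s in per_stage.items()}


def distill(
    memory: ExperienceMemory,
    topic: str,
    quality_scores: Dict[str, float],
    step_summaries: Dict[str, str],
    threshold: float = 0.6,
) -> None:
    """Turn one episode's feedback into one memory entry per credited stage."""
    for stage, credit in assign_credit(quality_scores).items():
        summary = step_summaries.get(stage, "")
        prefix = "Reuse" if credit >= threshold else "Avoid"
        memory.add(MemoryEntry(stage, topic, f"{prefix}: {summary}", credit))
```

For example, a trajectory that scores well on claim identification but poorly on evidence relevance yields a high-credit "Reuse" entry for claim analysis and a low-credit "Avoid" entry for evidence acquisition, which `retrieve()` surfaces the next time a post on the same topic reaches that stage. Averaging is only the simplest credit rule; a learned or judge-based attribution could replace it without changing the memory interface.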

Problem

Research questions and friction points this paper is trying to address.

health misinformation

Community Notes

experience reuse

evidence-grounded correction

self-evolving agents

Innovation

Methods, ideas, or system contributions that make the work stand out.

self-evolving agents

evidence-grounded correction

fine-grained credit assignment