🤖 AI Summary
Current large language model (LLM)-augmented Community Notes systems for health misinformation start from scratch on every new post, so correction experience from earlier cases is never reused, which limits both efficiency and quality. To address this, the work proposes EvoNote, an agentic framework built around an evolving experience memory. Through fine-grained credit assignment, EvoNote grounds trajectory-level feedback in health-specific note qualities and distills it into action-level memories for claim analysis, evidence acquisition, and note writing. On MM-HealthCN, a 1.2K-instance multimodal benchmark of user-flagged health posts, a human-validated hierarchical utility judge prefers EvoNote's notes over the corresponding human-written notes in 89.6% of cases. EvoNote also produces helpful notes for 82.0% of Needs More Ratings posts that lack a crowd helpfulness verdict, and it cuts the median time to a candidate correction from over 13 hours to under 2 minutes.
📝 Abstract
Large Language Model (LLM)-augmented Community Notes offer a scalable path for timely, evidence-grounded correction of health misinformation on social platforms. However, they still start from scratch on every post, leaving useful correction experience from prior cases unused. We introduce EvoNote, an agentic framework that lets health Community Notes generation self-evolve through an experience memory of prior misinformation correction episodes. Its core is fine-grained credit assignment: EvoNote grounds trajectory-level feedback in health-specific note qualities and distills it into action-level memory for claim analysis, evidence acquisition, and note writing. We evaluate EvoNote on MM-HealthCN, a 1.2K-instance multimodal benchmark of user-flagged health posts with human-written Community Notes and crowd-derived helpfulness labels. Under a human-validated hierarchical utility judge, EvoNote-generated notes are preferred over the corresponding human-written notes in 89.6% of cases; on a separate set of Needs More Ratings posts without a crowd helpfulness verdict, EvoNote produces helpful notes for 82.0% of cases. It also reduces the median time needed to produce a candidate correction from over 13 hours in the human-note pipeline to under 2 minutes. Analyses link these gains to stronger evidence use and reusable correction strategies, positioning self-evolving note generation as a promising paradigm for health misinformation governance.
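The abstract does not say how trajectory-level feedback is apportioned to individual actions, so the following is only a minimal Python sketch of the general idea under explicit assumptions. Everything named here is hypothetical and not taken from EvoNote: the quality names, the `QUALITY_TO_STAGE` mapping, the 0.5 baseline, the `REUSE`/`AVOID` tags, and the functions `assign_credit`, `distill`, and `ExperienceMemory.retrieve`. It scores one episode on several note qualities, routes each score to the pipeline stage assumed responsible for it, and stores each stage's action as a lesson weighted by that credit.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# HYPOTHETICAL mapping from health-specific note qualities to the stage
# assumed responsible for each one (illustrative, not from the paper).
QUALITY_TO_STAGE = {
    "claim_addressed": "claim_analysis",
    "source_quality": "evidence_acquisition",
    "factual_accuracy": "evidence_acquisition",
    "clarity": "note_writing",
    "neutral_tone": "note_writing",
}


@dataclass
class MemoryEntry:
    stage: str
    lesson: str
    credit: float


@dataclass
class ExperienceMemory:
    # Action-level memories grouped by pipeline stage.
    entries: dict = field(default_factory=lambda: defaultdict(list))

    def add(self, entry):
        self.entries[entry.stage].append(entry)

    def retrieve(self, stage, k=2):
        # Return the k lessons with the highest credit for this stage.
        ranked = sorted(self.entries[stage], key=lambda e: e.credit, reverse=True)
        return [e.lesson for e in ranked[:k]]


def assign_credit(quality_scores, baseline=0.5):
    """Spread trajectory-level quality scores (0..1) onto stages.

    A stage's credit is the mean of its mapped quality scores minus a
    baseline: positive means the stage scored above the baseline,
    negative means it scored below it. Unmapped qualities are ignored."""
    per_stage = defaultdict(list)
    for quality, score in quality_scores.items():
        stage = QUALITY_TO_STAGE.get(quality)
        if stage is not None:
            per_stage[stage].append(score)
    return {stage: sum(s) / len(s) - baseline for stage, s in per_stage.items()}


def distill(trajectory, quality_scores, memory, baseline=0.5):
    """Turn one correction episode into credit-weighted action-level memories."""
    credits = assign_credit(quality_scores, baseline)
    for stage, action in trajectory:
        if stage in credits:
            credit = credits[stage]
            tag = "REUSE" if credit > 0 else "AVOID"
            memory.add(MemoryEntry(stage, f"{tag}: {action}", credit))
    return credits


# Usage with one made-up episode.
memory = ExperienceMemory()
trajectory = [
    ("claim_analysis", "split the post into a checkable dosage claim"),
    ("evidence_acquisition", "cited a health-authority page"),
    ("note_writing", "wrote a long, hedged paragraph"),
]
scores = {"claim_addressed": 0.9, "source_quality": 0.8,
          "factual_accuracy": 1.0, "clarity": 0.3, "neutral_tone": 0.5}
credits = distill(trajectory, scores, memory)
print(memory.retrieve("note_writing"))  # ['AVOID: wrote a long, hedged paragraph']
```

The point of the per-stage split is that a single overall verdict would reward or penalize all three actions equally; routing each quality to a stage lets a well-sourced but poorly written note keep its evidence strategy as a reusable lesson while flagging the writing step to avoid.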