🤖 AI Summary
Few-shot continual event detection (FCED) confronts the dual challenges of data scarcity and catastrophic forgetting. Existing full-parameter fine-tuning approaches suffer from cross-task knowledge interference, while data augmentation, commonly adopted to alleviate scarcity, risks compromising semantic fidelity. To address these issues, we propose a semantic-aware mixture-of-experts (MoE) continual learning framework. It employs low-rank adaptation (LoRA) expert modules for lightweight, isolated storage of task-specific knowledge; introduces a label-description-driven semantic routing mechanism that dynamically assigns instances to dedicated experts; and integrates contrastive learning with knowledge distillation to enhance generalization and stability, without relying on data augmentation. Evaluated on multiple FCED benchmarks, the method significantly mitigates forgetting, achieves state-of-the-art performance, and remains robust even under extremely limited shot settings.
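The label-description-driven routing described above can be sketched as a nearest-neighbor lookup in embedding space: each expert is keyed by an embedding of its event-type label descriptions, and an instance is dispatched to the expert with the highest cosine similarity. The embeddings and the helper names below are illustrative assumptions, not the paper's implementation:

```python
import math

def cosine(u, v):
    # cosine similarity between two dense vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def route(instance_emb, expert_label_embs):
    """Return the index of the expert whose label-description
    embedding is most similar to the instance embedding."""
    scores = [cosine(instance_emb, e) for e in expert_label_embs]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy 3-d embeddings (hypothetical): expert 0 keyed by "attack"
# descriptions, expert 1 by "transaction" descriptions.
experts = [[1.0, 0.1, 0.0], [0.0, 0.9, 0.4]]
chosen = route([0.9, 0.2, 0.0], experts)  # → 0 (closest to expert 0)
```

In a real system the embeddings would come from the base encoder, and routing could be soft (a similarity-weighted mixture) rather than the hard argmax shown here.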
📝 Abstract
Few-shot Continual Event Detection (FCED) poses the dual challenges of learning from limited data and mitigating catastrophic forgetting across sequential tasks. Existing approaches often suffer from severe forgetting due to full fine-tuning of a shared base model, which leads to knowledge interference between tasks. Moreover, they frequently rely on data augmentation strategies that can introduce unnatural or semantically distorted inputs. To address these limitations, we propose LEAF, a novel and robust expert-based framework for FCED. LEAF integrates a specialized mixture-of-experts architecture into the base model, where each expert is parameterized with low-rank adaptation (LoRA) matrices. A semantic-aware expert selection mechanism dynamically routes instances to the most relevant experts, enabling expert specialization and reducing knowledge interference. To improve generalization in limited-data settings, LEAF incorporates a contrastive learning objective guided by label descriptions, which capture high-level semantic information about event types. Furthermore, to prevent overfitting on the memory buffer, our framework employs a knowledge distillation strategy that transfers knowledge from previous models to the current one. Extensive experiments on multiple FCED benchmarks demonstrate that LEAF consistently achieves state-of-the-art performance.
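The LoRA parameterization of each expert can be sketched as follows: the frozen base weight W (d × d) is shared across tasks, and each expert stores only a low-rank pair B (d × r) and A (r × d), so the per-expert footprint is 2·d·r parameters instead of d². This is a minimal sketch of the standard LoRA update, with toy numbers and a hypothetical scaling factor `alpha`; it is not LEAF's actual configuration:

```python
def matmul(X, Y):
    # naive matrix product, adequate for this toy sketch
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_weight(W, B, A, alpha=2.0):
    """Effective weight of one expert: W + (alpha / r) * B @ A.
    W stays frozen and shared; only the small B and A matrices
    are trained and stored per expert."""
    r = len(A)                      # LoRA rank = number of rows of A
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(wrow, drow)]
            for wrow, drow in zip(W, delta)]

# Toy 2x2 frozen weight with one rank-1 expert (hypothetical values)
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [0.5]]                  # d x r, here 2 x 1
A = [[0.2, 0.4]]                    # r x d, here 1 x 2
W_eff = lora_weight(W, B, A)
```

Because only B and A differ between experts, isolating task-specific knowledge per expert keeps the memory cost of adding a new task small, which is what makes the per-task expert design practical.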