AULLM++: Structural Reasoning with Large Language Models for Micro-Expression Recognition

📅 2026-03-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Micro-expression action unit (AU) detection is often hindered by background noise, coarse-grained feature representations, and the neglect of inter-AU dependencies, which collectively limit recognition performance. To address these challenges, this work proposes the AULLM++ framework, which uniquely integrates large language models with structured reasoning. Specifically, it employs a multi-granularity evidence-enhanced fusion projector to extract visual-semantic prompts, leverages a relation-aware AU graph neural network to model dependencies among AUs, and incorporates counterfactual consistency regularization to enhance robustness. Evaluated on standard micro-expression benchmarks, the proposed method achieves state-of-the-art performance and demonstrates strong cross-domain generalization capabilities.

📝 Abstract
Micro-expression Action Unit (AU) detection identifies localized AUs from subtle facial muscle activations, providing a foundation for decoding affective cues. Previous methods face three key limitations: (1) heavy reliance on low-density visual information, rendering discriminative evidence vulnerable to background noise; (2) coarse-grained feature processing that misaligns with the demand for fine-grained representations; and (3) neglect of inter-AU correlations, restricting the parsing of complex expression patterns. We propose AULLM++, a reasoning-oriented framework leveraging Large Language Models (LLMs), which injects visual features into textual prompts as actionable semantic premises to guide inference. It decomposes AU prediction into three stages: evidence construction, structure modeling, and deduction-based prediction. Specifically, a Multi-Granularity Evidence-Enhanced Fusion Projector (MGE-EFP) fuses mid-level texture cues with high-level semantics, distilling them into a compact Content Token (CT). Furthermore, inspired by micro- and macro-expression AU correspondence, we encode AU relationships as a sparse structural prior and learn interaction strengths via a Relation-Aware AU Graph Neural Network (R-AUGNN), producing an Instruction Token (IT). We then fuse CT and IT into a structured textual prompt and introduce Counterfactual Consistency Regularization (CCR) to construct counterfactual samples, enhancing the model's generalization. Extensive experiments demonstrate AULLM++ achieves state-of-the-art performance on standard benchmarks and exhibits superior cross-domain generalization.
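The structure-modeling stage described in the abstract (a sparse AU-correlation prior gating learned interaction strengths, pooled into an Instruction Token) could be sketched roughly as below. This is a minimal illustration, not the paper's R-AUGNN: the AU count, feature dimension, toy adjacency, and single attention-weighted message-passing step are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 12 AUs with 64-dim node features (both sizes assumed).
num_aus, dim = 12, 64
node_feats = rng.standard_normal((num_aus, dim))

# Sparse structural prior: a fixed binary mask of which AU pairs may
# interact (a toy symmetric example, not the paper's actual prior).
prior = np.zeros((num_aus, num_aus))
for i, j in [(0, 1), (1, 2), (3, 4), (5, 6), (6, 7)]:
    prior[i, j] = prior[j, i] = 1.0

# Learnable interaction strengths; the prior masks them so that only
# structurally plausible AU pairs exchange information.
raw_weights = rng.standard_normal((num_aus, num_aus))

def masked_softmax(logits, mask):
    # Row-wise softmax over allowed edges; rows with no neighbours
    # receive all-zero attention instead of NaNs.
    logits = np.where(mask > 0, logits, -1e9)
    logits = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(logits) * (mask > 0)
    return e / np.maximum(e.sum(axis=1, keepdims=True), 1e-9)

attn = masked_softmax(raw_weights, prior)
updated = node_feats + attn @ node_feats   # one relation-aware update step
instruction_token = updated.mean(axis=0)   # pool nodes into one IT-like vector

print(instruction_token.shape)  # (64,)
```

Isolated AUs (rows with no edges in the prior) receive zero attention mass and pass through unchanged, so the prior alone decides where interaction strengths can be learned.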
Problem

Research questions and friction points this paper is trying to address.

Micro-Expression Recognition
Action Unit Detection
Inter-AU Correlations
Fine-Grained Representation
Visual Noise Robustness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models
Micro-Expression Recognition
Action Unit Detection
Graph Neural Network
Counterfactual Consistency Regularization
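Counterfactual consistency regularization, listed above as one of the contributions, is only named at abstract level here. A generic sketch of such a penalty, assuming a simple mean-squared gap between multi-label AU probabilities on an original sample and a counterfactually perturbed variant (the perturbation scheme and loss form are assumptions, not the paper's exact CCR):

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical multi-label AU logits (12 AUs assumed) for an original
# sample and a counterfactual variant, e.g. with nuisance factors edited.
logits_orig = rng.standard_normal(12)
logits_cf = logits_orig + 0.1 * rng.standard_normal(12)

# Consistency regularizer: penalize divergence between the two
# prediction distributions so that AU evidence, not nuisance content,
# drives the output.
p_orig, p_cf = sigmoid(logits_orig), sigmoid(logits_cf)
ccr_loss = np.mean((p_orig - p_cf) ** 2)

print(float(ccr_loss) >= 0.0)  # True
```

In training, such a term would be added to the main AU-detection loss with a weighting coefficient; predictions that change under counterfactual edits are penalized.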
Zhishu Liu
Great Bay University, Dongguan, China
Kaishen Yuan
The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China
Bo Zhao
Great Bay University, Dongguan, China
Hui Ma
Great Bay University, Dongguan, China; Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Zitong Yu