MuSACo: Multimodal Subject-Specific Selection and Adaptation for Expression Recognition with Co-Training

📅 2025-08-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing personalized facial expression recognition methods often employ coarse-grained fusion of multi-source domains and multimodal information, leading to loss of subject-specific features and reduced inter-individual diversity. To address this, we propose a multimodal subject selection and adaptive co-training framework. First, we introduce a subject-relevance selection mechanism to dynamically identify highly correlated source domains. Second, we design a dual-objective co-optimization strategy comprising class-aware pseudo-label generation and class-agnostic consistency regularization. Third, we explicitly model individual differences via dominant-modality-guided pseudo-labeling, cross-modal feature alignment, and confident-sample fusion. Evaluated on the BioVid and StressID benchmarks, our method significantly outperforms state-of-the-art unsupervised domain adaptation (UDA) and multi-source domain adaptation (MSDA) approaches, particularly enhancing robustness and accuracy in digital health applications such as cross-subject stress and pain recognition.
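The class-aware branch described above can be illustrated with a minimal sketch. The function name, the choice of "video" as dominant modality, and the 0.9 threshold are illustrative assumptions, not details from the paper:

```python
import numpy as np

def pseudo_labels(prob_by_modality, dominant="video", threshold=0.9):
    """Hypothetical sketch: derive pseudo-labels for unlabeled target samples
    from the dominant modality's softmax outputs, keeping only samples whose
    top-class confidence clears the threshold (class-aware branch); the
    remaining low-confidence samples would feed a class-agnostic loss."""
    probs = prob_by_modality[dominant]   # shape (n_samples, n_classes)
    labels = probs.argmax(axis=1)        # predicted class per sample
    confidence = probs.max(axis=1)       # top-class probability
    keep = confidence >= threshold       # confident-sample mask
    return labels, keep

# Toy example: 3 target samples, 2 classes, from a hypothetical video branch.
probs = {"video": np.array([[0.95, 0.05],
                            [0.55, 0.45],
                            [0.08, 0.92]])}
labels, keep = pseudo_labels(probs, dominant="video", threshold=0.9)
print(labels)  # [0 0 1]
print(keep)    # [ True False  True]
```

The middle sample (0.55 confidence) is excluded from class-aware training, matching the paper's idea of routing less confident target samples to a class-agnostic objective instead.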

📝 Abstract
Personalized expression recognition (ER) involves adapting a machine learning model to subject-specific data for improved recognition of expressions with considerable interpersonal variability. Subject-specific ER can benefit significantly from multi-source domain adaptation (MSDA) methods, where each domain corresponds to a specific subject, to improve model accuracy and robustness. Despite promising results, state-of-the-art MSDA approaches often overlook multimodal information or blend sources into a single domain, limiting subject diversity and failing to explicitly capture unique subject-specific characteristics. To address these limitations, we introduce MuSACo, a multimodal subject-specific selection and adaptation method for ER based on co-training. It leverages complementary information across multiple modalities and multiple source domains for subject-specific adaptation. This makes MuSACo particularly relevant for affective computing applications in digital health, such as patient-specific assessment of stress or pain, where subject-level nuances are crucial. MuSACo selects source subjects relevant to the target and generates pseudo-labels using the dominant modality for class-aware learning, in conjunction with a class-agnostic loss to learn from less confident target samples. Finally, source features from each modality are aligned, while only confident target features are combined. Our experimental results on two challenging multimodal ER datasets, BioVid and StressID, show that MuSACo can outperform UDA (blending) and state-of-the-art MSDA methods.
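The subject-selection step, where source subjects relevant to the target are chosen, can be sketched as ranking source domains by feature similarity. The function, the use of cosine similarity over mean embeddings, and the subject names are assumptions for illustration only:

```python
import numpy as np

def select_sources(target_feats, source_feats_by_subject, k=2):
    """Hypothetical sketch: rank source subjects by cosine similarity between
    each subject's mean feature embedding and the target subject's mean
    embedding, and keep the top-k most relevant source domains."""
    t = target_feats.mean(axis=0)
    t = t / np.linalg.norm(t)
    scores = {}
    for subject, feats in source_feats_by_subject.items():
        s = feats.mean(axis=0)
        scores[subject] = float(t @ (s / np.linalg.norm(s)))
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy 2-D embeddings: s1 points the same way as the target, s2 the opposite
# way, s3 is roughly orthogonal.
target = np.array([[1.0, 0.1], [0.9, -0.1]])
sources = {
    "s1": np.array([[1.0, 0.0], [1.1, 0.1]]),
    "s2": np.array([[-1.0, 0.0], [-0.9, 0.1]]),
    "s3": np.array([[0.0, 1.0], [0.1, 0.9]]),
}
top = select_sources(target, sources, k=2)
print(top)  # ['s1', 's3']
```

The dissimilar subject s2 is excluded, which mirrors the goal of adapting only from source subjects correlated with the target rather than blending all sources into one domain.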
Problem

Research questions and friction points this paper is trying to address.

Adapting ER models to subject-specific data for interpersonal variability
Leveraging multimodal MSDA to capture unique subject characteristics
Improving accuracy in affective computing for digital health applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal co-training for subject-specific adaptation
Pseudo-labeling with dominant modality for class-aware learning
Cross-modal feature alignment with fusion of only confident target samples
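The last contribution, aligning per-modality features and fusing only confident target samples, can be sketched as follows. Per-dimension standardization and average fusion stand in for the paper's actual alignment and fusion operators, which are not specified here:

```python
import numpy as np

def align_and_fuse(feats_by_modality, confidence, threshold=0.9):
    """Hypothetical sketch: bring each modality's features onto a shared
    scale (zero mean, unit variance per dimension), average-fuse them, and
    keep only target samples whose pseudo-label confidence clears the
    threshold (confident-sample fusion)."""
    aligned = []
    for feats in feats_by_modality.values():
        mu, sigma = feats.mean(axis=0), feats.std(axis=0) + 1e-8
        aligned.append((feats - mu) / sigma)   # per-modality standardization
    fused = np.mean(aligned, axis=0)           # simple average fusion
    keep = confidence >= threshold             # confident-target mask
    return fused[keep], keep

# Toy example: 3 target samples, 2 feature dims, two hypothetical modalities.
feats = {
    "video":  np.array([[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]),
    "physio": np.array([[0.5, 1.5], [1.5, 0.5], [1.0, 1.0]]),
}
confidence = np.array([0.95, 0.50, 0.92])
fused, keep = align_and_fuse(feats, confidence, threshold=0.9)
print(keep)         # [ True False  True]
print(fused.shape)  # (2, 2)
```

Only the two confident samples survive fusion; the uncertain middle sample is left to the class-agnostic branch rather than contaminating the fused representation.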