Multimodal Sexism Identification and Characterization using Large Language Models and Gradient Boosting

📅 2026-06-04
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This study addresses the identification and characterization of sexism in multimodal content, namely internet memes and short-form videos. It proposes a late-fusion framework built on gradient-boosted regression models with hierarchical post-processing. For memes, the system combines visual, textual, demographic, biometric, and high-level semantic features extracted by large language models (LLMs); for videos, it examines frame-based visual, OCR-based textual, acoustic, and sensor-derived metadata features. The findings show that LLM-derived semantic features improve sexism identification on static memes, whereas video performance is highly sensitive to feature dimensionality and cross-modal noise, pointing to the need for more robust temporal modeling. Notably, compact feature selection performed best during development, yet the full, unfiltered feature set generalized better on the official video test data, underscoring a divergence between the optimal processing strategies for static and dynamic modalities.
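The late-fusion idea can be illustrated with a minimal, dependency-free sketch. It assumes each modality is a separate block of numeric feature vectors (here a hypothetical `visual` block and a hypothetical `llm_semantic` block) and that targets are soft sexism scores in [0, 1]. One least-squares gradient-boosted regressor built from decision stumps is trained per block, and their predictions are averaged. The helper names (`fit_stump`, `fit_gbr`, `predict_gbr`, `late_fusion`), the toy data, and the equal-weight averaging are all assumptions for illustration, not the authors' models or fusion rule; a real pipeline would more likely use an established gradient-boosting library.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Matrix = List[List[float]]
Stump = Tuple[int, float, float, float]  # (feature index, threshold, left value, right value)


def fit_stump(X: Matrix, r: Sequence[float]) -> Stump:
    """Least-squares regression stump: the single split on one feature that best fits r."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [ri for row, ri in zip(X, r) if row[j] <= thr]
            right = [ri for row, ri in zip(X, r) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    if best is None:  # every feature is constant: predict the mean residual everywhere
        m = sum(r) / len(r)
        return (0, float("inf"), m, m)
    return best[1:]


def fit_gbr(X: Matrix, y: Sequence[float], n_rounds: int = 50, lr: float = 0.1):
    """Gradient boosting with squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of squared loss
        j, thr, lm, rm = fit_stump(X, residuals)
        stumps.append((j, thr, lm, rm))
        pred = [p + lr * (lm if row[j] <= thr else rm) for p, row in zip(pred, X)]
    return base, lr, stumps


def predict_gbr(model, X: Matrix) -> List[float]:
    base, lr, stumps = model
    return [base + sum(lr * (lm if row[j] <= thr else rm) for j, thr, lm, rm in stumps)
            for row in X]


def late_fusion(blocks: Dict[str, Matrix], y: Sequence[float],
                weights: Optional[Dict[str, float]] = None,
                **gbr_kwargs) -> Callable[[Dict[str, Matrix]], List[float]]:
    """Train one booster per modality and fuse their outputs by weighted averaging."""
    models = {name: fit_gbr(Xb, y, **gbr_kwargs) for name, Xb in blocks.items()}
    w = weights or {name: 1.0 for name in blocks}
    total = sum(w[name] for name in models)

    def predict(new_blocks: Dict[str, Matrix]) -> List[float]:
        per_modality = {name: predict_gbr(m, new_blocks[name]) for name, m in models.items()}
        n = len(next(iter(per_modality.values())))
        fused = [sum(w[name] * per_modality[name][i] for name in models) / total
                 for i in range(n)]
        return [min(max(p, 0.0), 1.0) for p in fused]  # soft scores clipped to [0, 1]

    return predict


# Toy soft labels (e.g. the fraction of annotators who judged each meme sexist).
y = [0.0, 0.1, 0.2, 0.8, 0.9, 1.0]
blocks = {
    "visual": [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]],
    "llm_semantic": [[0.0, 1.0], [0.1, 0.9], [0.2, 0.7], [0.9, 0.2], [0.8, 0.1], [1.0, 0.0]],
}
predict = late_fusion(blocks, y, n_rounds=50, lr=0.2)
scores = predict(blocks)
print([round(s, 2) for s in scores])
```

Keeping one model per modality means a noisy block can be down-weighted through `weights` without retraining the others. The toy run only scores the training data, so it says nothing about generalization.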
📝 Abstract
We present the AILS-NTUA submission to the EXIST 2026 Lab at CLEF, addressing multimodal sexism identification and characterization in memes (Task 2) and short-form videos (Task 3). Our system follows a feature-engineered late-fusion pipeline built around gradient-boosted regression models and hierarchical post-processing. For memes, we combine visual, textual, demographic, biometric, and LLM-derived semantic indicators designed to capture high-level cues such as stereotyping, objectification, irony, and misogyny. For videos, we investigate the effect of feature selection, frame-based visual representations, OCR-based textual features, acoustic descriptors, and sensor-derived metadata. Development results show that focused LLM-derived semantic cues improve meme sexism identification, while video performance is highly sensitive to feature dimensionality and cross-modal noise. Although development results favor compact feature selection for videos, official test results show that this conclusion does not fully transfer to unseen data, where the unfiltered representation generalizes better. Overall, our findings highlight the usefulness of targeted semantic feature engineering for static memes and the need for more robust temporal modeling in noisy short-form video settings.
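The abstract does not spell out what the hierarchical post-processing does. One plausible reading, sketched here as an assumption rather than the authors' procedure, is to force the fine-grained outputs (source intention and sexism category) to agree with the top-level sexism score. The function `enforce_hierarchy`, its 0.5 gating `threshold`, and the label names (modeled loosely on earlier EXIST meme subtasks) are hypothetical and used only for illustration.

```python
from typing import Dict, Tuple


def _clip01(v: float) -> float:
    """Regression outputs can overshoot, so clamp them to the [0, 1] probability range."""
    return min(max(v, 0.0), 1.0)


def enforce_hierarchy(
    p_sexist: float,
    child_scores: Dict[str, float],
    threshold: float = 0.5,
    mono_label: bool = False,
) -> Tuple[float, Dict[str, float]]:
    """Make fine-grained soft scores consistent with the top-level sexism score."""
    p = _clip01(p_sexist)
    children = {label: _clip01(v) for label, v in child_scores.items()}
    if p < threshold:
        # Parent predicted non-sexist: no fine-grained label should be active.
        return p, {label: 0.0 for label in children}
    if mono_label:
        # Mutually exclusive labels (e.g. source intention): rescale so they sum to p.
        total = sum(children.values())
        if total > 0:
            children = {label: p * v / total for label, v in children.items()}
    else:
        # Multi-label categories: no category may be more likely than "sexist" itself.
        children = {label: min(v, p) for label, v in children.items()}
    return p, children


intent = enforce_hierarchy(0.8, {"DIRECT": 0.6, "JUDGEMENTAL": 0.2}, mono_label=True)
categories = enforce_hierarchy(0.7, {"OBJECTIFICATION": 0.9, "STEREOTYPING-DOMINANCE": 0.1})
gated = enforce_hierarchy(0.3, {"DIRECT": 0.4, "JUDGEMENTAL": 0.2}, mono_label=True)
print(categories)  # (0.7, {'OBJECTIFICATION': 0.7, 'STEREOTYPING-DOMINANCE': 0.1})
```

Hard gating zeroes every child whenever the parent falls below the threshold, trading some soft-score calibration for logical consistency; a gentler variant could instead scale the children by the parent score.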
Problem

Research questions and friction points this paper is trying to address.

multimodal sexism
memes
short-form videos
sexism identification
gender bias
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal sexism detection
LLM-derived semantic features
gradient boosting
late-fusion pipeline
feature engineering