MAPLE: Modality-Aware Post-training and Learning Ecosystem

📅 2026-02-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing reinforcement-learning post-training methods fail to differentiate among multimodal inputs, leading to high policy-gradient variance, slow convergence, and poor robustness to missing modalities or distribution shifts. To address this, this work introduces MAPLE-bench, the first task-level benchmark annotating the minimal modality combination each task requires, and MAPO, a modality-aware policy-optimization framework that integrates hierarchical batching, adaptive weighting, and a curriculum-scheduling mechanism based on signal-combination difficulty. Experiments on MAPLE-bench show that the approach reduces the accuracy gap between unimodal and multimodal settings by 30.24%, accelerates convergence by a factor of 3.18, and maintains stable performance across diverse modality-missing scenarios.
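The curriculum-scheduling idea described above can be sketched in a few lines. This is an illustrative sketch under assumptions, not the paper's implementation: the class name, the moving-average difficulty estimate, and the temperature parameter are all hypothetical choices standing in for whatever MAPO actually uses.

```python
import random

# Illustrative sketch (assumed, not MAPO's actual code): a curriculum
# scheduler that samples modality combinations in proportion to their
# estimated difficulty, so harder signal combinations get more updates.
class CurriculumScheduler:
    def __init__(self, combos, temperature=1.0):
        self.combos = list(combos)
        # Running success rate per combination; start at a neutral 0.5.
        self.acc = {c: 0.5 for c in self.combos}
        self.temperature = temperature

    def update(self, combo, reward, momentum=0.9):
        # Exponential moving average of per-combination success rate.
        self.acc[combo] = momentum * self.acc[combo] + (1 - momentum) * reward

    def weights(self):
        # Difficulty = 1 - success rate; temperature controls sharpness.
        raw = [(1.0 - self.acc[c]) ** (1.0 / self.temperature) for c in self.combos]
        total = sum(raw)
        return [w / total for w in raw]

    def sample(self, rng=random):
        # Draw the next training combination, biased toward hard ones.
        return rng.choices(self.combos, weights=self.weights(), k=1)[0]
```

As a combination's rolling success rate rises, its sampling weight falls, shifting training effort toward the signal combinations the policy still handles poorly.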

📝 Abstract
Multimodal language models now integrate text, audio, and video for unified reasoning. Yet existing RL post-training pipelines treat all input signals as equally relevant, ignoring which modalities each task actually requires. This modality-blind training inflates policy-gradient variance, slows convergence, and degrades robustness to real-world distribution shifts where signals may be missing, added, or reweighted. We introduce MAPLE, a complete modality-aware post-training and learning ecosystem comprising: (1) MAPLE-bench, the first benchmark explicitly annotating the minimal signal combinations required per task; (2) MAPO, a modality-aware policy optimization framework that stratifies batches by modality requirement to reduce gradient variance from heterogeneous group advantages; (3) adaptive weighting and curriculum scheduling that balance and prioritize harder signal combinations. Systematic analysis across loss aggregation, clipping, sampling, and curriculum design establishes MAPO's optimal training strategy. Adaptive weighting and curriculum-focused learning further boost performance across signal combinations. MAPLE narrows uni/multi-modal accuracy gaps by 30.24%, converges 3.18x faster, and maintains stability across all modality combinations under realistic reduced signal access. MAPLE constitutes a complete recipe for deployment-ready multimodal RL post-training.
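The batch-stratification idea in point (2) can be made concrete with a short sketch. This is an assumed illustration in the style of group-relative advantage estimation (GRPO), not MAPO's released code; the function name and reward representation are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Illustrative sketch (assumed, not MAPO's actual code): stratify a batch
# of rollouts by the task's required modality combination and normalize
# rewards within each stratum, so group-relative advantages are computed
# only among comparable samples rather than across heterogeneous groups.
def stratified_advantages(rollouts, eps=1e-8):
    """rollouts: list of (modality_combo, reward) pairs.
    Returns advantages aligned with the input order."""
    groups = defaultdict(list)
    for i, (combo, reward) in enumerate(rollouts):
        groups[combo].append((i, reward))

    advantages = [0.0] * len(rollouts)
    for combo, items in groups.items():
        rewards = [r for _, r in items]
        mu = mean(rewards)
        sigma = pstdev(rewards) if len(rewards) > 1 else 0.0
        for i, r in items:
            # Center (and scale, when the stratum has spread) per stratum.
            advantages[i] = (r - mu) / (sigma + eps) if sigma > 0 else (r - mu)
    return advantages
```

Because each sample's baseline comes only from rollouts with the same modality requirement, systematic reward differences between modality combinations no longer leak into the advantage estimates, which is the variance-reduction effect the abstract attributes to hierarchical batching.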
Problem

Research questions and friction points this paper is trying to address.

multimodal language models
modality-aware training
reinforcement learning post-training
distribution shift
gradient variance
Innovation

Methods, ideas, or system contributions that make the work stand out.

modality-aware
policy optimization
multimodal RL
curriculum learning
gradient variance reduction
Authors
Nikhil Verma
LG Electronics Toronto AI Lab, Toronto, Canada
Minjung Kim
LG Electronics CTO AI Lab, Seoul, Republic of Korea
JooYoung Yoo
LG Electronics CTO AI Lab, Seoul, Republic of Korea
Kyung-Min Jin
LG Electronics CTO AI Lab, Seoul, Republic of Korea
Manasa Bharadwaj
Staff AI Research Scientist, LG Electronics Toronto AI Lab
NLP, Conversational AI, Generative AI
Kevin Ferreira
LG Electronics Toronto AI Lab
Ko Keun Kim
AI Lab, LG Electronics
AI, Machine Learning, Neuroscience, Biomedical Engineering, Biological Signal Processing
Youngjoon Kim
LG Electronics CTO AI Lab, Seoul, Republic of Korea