AI Summary
This work addresses video retrieval and grounded article generation in multimodal retrieval-augmented generation by organizing the MAGMaR 2026 shared task, which comprises two subtasks: video retrieval and grounded text generation. For the first time, the task jointly evaluates system performance on both retrieval and generation within a unified framework and introduces human-annotated optimal outputs as a gold standard for evaluation. Participating systems leveraged techniques including multimodal retrieval, video-text alignment, and controllable text generation. In the retrieval subtask, all 17 submitted systems surpassed a baseline derived from last year's winning system. In the generation subtask, four teams submitted 16 systems, and every team produced at least one article that a human annotator labeled the best, demonstrating the effectiveness of the proposed approaches.
Abstract
This overview paper presents the results of the shared task for the second workshop on Multimodal Augmented Generation via Multimodal Retrieval (MAGMaR). In this shared task, participants submitted systems focused on either (i) video retrieval or (ii) grounded generation of articles given retrieved videos. Teams could submit to either task. For the retrieval task, we had 2 participating teams that submitted a total of 17 systems, all of which beat a baseline derived from the winner of last year's shared task. On the generation side, we had 4 teams submit 16 systems. All teams had at least one generated report that was labeled the best by a human annotator.