PRISMR: Overcoming Parse Collapse in Multimodal Listwise Ranking via Parameterized Representation Internalization

📅 2026-06-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses parse collapse in generative listwise ranking under multimodal long-context conditions, where the autoregressive decoder terminates early and returns an incomplete ranking. To mitigate it, the authors propose PRISMR, a framework that they describe as the first to introduce parameterized structural conditioning into multimodal listwise ranking. PRISMR uses a lightweight hypernetwork to encode candidate items in parallel and generate item-specific LoRA weights, which are combined into an instance-specific adapter, so that list-processing logic that would otherwise live only in the transient context is internalized in the model's parameters. The authors report that this substantially reduces parse collapse and improves ranking performance. PRISMR is evaluated on a newly curated large-scale multimodal review-ranking benchmark, where it transfers across domains and backbones.
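To make the failure mode concrete, the sketch below checks whether a generated ranking is complete. It is an illustration only, not the paper's evaluation code: the `[i] > [j]` output format, the function names `parse_ranking` and `collapse_rate`, and the rule that any omitted candidate counts as a collapse are all assumptions.

```python
import re


def parse_ranking(output: str, num_candidates: int) -> dict:
    """Extract identifiers such as [3] from generated text and report omissions.

    Assumes (hypothetically) that the model emits rankings like "[2] > [1] > [3]".
    """
    seen = []
    for match in re.findall(r"\[(\d+)\]", output):
        idx = int(match)
        # Keep only the first occurrence of each in-range identifier.
        if 1 <= idx <= num_candidates and idx not in seen:
            seen.append(idx)
    missing = [i for i in range(1, num_candidates + 1) if i not in seen]
    # Assumed criterion: a ranking that omits any candidate counts as collapsed.
    return {"ranking": seen, "missing": missing, "collapsed": bool(missing)}


def collapse_rate(outputs: list, num_candidates: int) -> float:
    """Fraction of generated outputs whose ranking omits at least one candidate."""
    results = [parse_ranking(text, num_candidates) for text in outputs]
    return sum(r["collapsed"] for r in results) / len(results)


if __name__ == "__main__":
    # A fluent but truncated output: candidates 3 and 4 are silently dropped.
    print(parse_ranking("[2] > [1]", num_candidates=4))
```

A check like this only detects the symptom. The abstract argues that the cause is limited context utilization rather than formatting, so a purely output-side fix is not expected to suffice.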

📝 Abstract

Generative listwise ranking with Large Multimodal Models (LMMs) aims to capture global list context in a single forward pass, but its effectiveness degrades in long-context multimodal scenarios. We identify a recurring failure mode, parse collapse, where the autoregressive decoder produces fluent yet incomplete rankings by silently omitting candidates and terminating early. This failure stems from limited context utilization rather than simple formatting mistakes, making prompt engineering and constrained decoding insufficient. We propose PRISMR (Parameterized Representation Internalization for Semantic Multimodal Ranking), a framework that replaces transient in-context list processing with parametric structural conditioning. PRISMR uses a lightweight hypernetwork to encode multimodal candidates in parallel and generate item-specific LoRA weights, which are synthesized into an instance-specific adapter for an LMM. This paradigm enables more robust internalization of list structure while preserving the base model. We further introduce a large-scale multimodal review-ranking benchmark for evaluation. Experiments demonstrate that PRISMR substantially reduces parse collapse, improves listwise ranking performance, and transfers effectively across domains and instruction-tuned backbones.
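The general idea of a hypernetwork that emits item-specific LoRA weights can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not PRISMR's implementation: the abstract does not specify the hypernetwork architecture or how item-level weights are synthesized, so the linear hypernetwork, the averaging of per-item updates, and every name here (`hypernetwork`, `instance_adapter`, `adapted_forward`, `w_a`, `w_b`) are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def hypernetwork(item_embedding, w_a, w_b, d_in, d_out, rank):
    """Hypothetical linear hypernetwork: one item embedding -> LoRA factors (A, B)."""
    flat_a = matmul([item_embedding], w_a)[0]  # length rank * d_in
    flat_b = matmul([item_embedding], w_b)[0]  # length d_out * rank
    a = [flat_a[r * d_in:(r + 1) * d_in] for r in range(rank)]   # rank x d_in
    b = [flat_b[o * rank:(o + 1) * rank] for o in range(d_out)]  # d_out x rank
    return a, b


def instance_adapter(item_embeddings, w_a, w_b, d_in, d_out, rank):
    """Combine per-item low-rank updates B_i A_i into one delta.

    Averaging is an assumed aggregation rule; the source does not state one.
    Each item is processed independently, so this loop could run in parallel.
    """
    delta = [[0.0] * d_in for _ in range(d_out)]
    for emb in item_embeddings:
        a, b = hypernetwork(emb, w_a, w_b, d_in, d_out, rank)
        update = matmul(b, a)  # d_out x d_in
        for o in range(d_out):
            for i in range(d_in):
                delta[o][i] += update[o][i] / len(item_embeddings)
    return delta


def adapted_forward(x, w_base, delta):
    """Compute y = (W + delta) x without modifying the frozen base weight W."""
    return [sum((w + d) * xi for w, d, xi in zip(w_row, d_row, x))
            for w_row, d_row in zip(w_base, delta)]


if __name__ == "__main__":
    d_emb, d_in, d_out, rank = 2, 3, 2, 1
    w_a = [[0.1] * (rank * d_in) for _ in range(d_emb)]   # toy hypernetwork weights
    w_b = [[0.2] * (d_out * rank) for _ in range(d_emb)]
    items = [[1.0, 0.0], [0.0, 1.0]]                      # two candidate embeddings
    w_base = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]           # frozen base layer
    delta = instance_adapter(items, w_a, w_b, d_in, d_out, rank)
    print(adapted_forward([1.0, 2.0, 3.0], w_base, delta))
```

Because the update is added at call time and `w_base` is never written to, the base model is preserved, which matches the property the abstract emphasizes. With all-zero hypernetwork weights the adapted layer reduces exactly to the base layer.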

Problem

Research questions and friction points this paper is trying to address.

parse collapse

multimodal listwise ranking

long-context

autoregressive decoding

incomplete ranking

Innovation

Methods, ideas, or system contributions that make the work stand out.

parse collapse

parameterized representation internalization

multimodal listwise ranking