AI Summary
This work addresses parse collapse in generative listwise ranking under multimodal long-context conditions, a failure mode in which autoregressive decoding terminates prematurely and produces incomplete rankings. To mitigate it, the authors propose PRISMR, a framework that introduces parameterized structural conditioning into multimodal listwise ranking for the first time. PRISMR uses a lightweight hypernetwork to encode candidate items in parallel and to generate item-specific LoRA weights, which instantiate an adapter that internalizes the otherwise transient list-processing logic directly in the model. This approach substantially reduces parse collapse rates and improves ranking performance. PRISMR is evaluated on a newly curated large-scale multimodal review-ranking benchmark, where it shows strong cross-domain and cross-backbone transfer.
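To make the failure mode concrete, the following is a minimal sketch of how an incomplete ranking could be detected after decoding. It is not the authors' evaluation code: the function name `check_ranking`, the bracketed output format (`[3] > [1] > [2]`), and the 1-based candidate identifiers are all assumptions made for illustration.

```python
import re


def check_ranking(output_text, num_candidates):
    """Report whether a decoded ranking covers every candidate exactly once.

    Assumes (hypothetically) that the model emits identifiers such as
    "[3] > [1] > [2]" and that candidates are numbered 1..num_candidates.
    """
    ids = [int(m) for m in re.findall(r"\[(\d+)\]", output_text)]
    expected = set(range(1, num_candidates + 1))

    seen = set()
    duplicates = set()
    for item_id in ids:
        if item_id in seen:
            duplicates.add(item_id)
        seen.add(item_id)

    missing = sorted(expected - seen)   # candidates silently omitted
    unknown = sorted(seen - expected)   # identifiers that were never offered
    return {
        "ranking": ids,
        "missing": missing,
        "duplicates": sorted(duplicates),
        "unknown": unknown,
        "complete": not missing and not duplicates and not unknown,
    }


if __name__ == "__main__":
    # A fluent but truncated output: two of the five candidates never appear.
    print(check_ranking("[4] > [1] > [5]", num_candidates=5))
```

Under these assumptions, a collapse rate would simply be the fraction of decoded lists for which `complete` is false.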
Abstract
Generative listwise ranking with Large Multimodal Models (LMMs) aims to capture global list context in a single forward pass, but
its effectiveness degrades in long-context multimodal scenarios. We identify a recurring failure mode, parse collapse, where the
autoregressive decoder produces fluent yet incomplete rankings by silently omitting candidates and terminating early. This
failure stems from limited context utilization rather than simple formatting mistakes, making prompt engineering and constrained
decoding insufficient. We propose PRISMR (Parameterized Representation Internalization for Semantic Multimodal Ranking), a
framework that replaces transient in-context list processing with parametric structural conditioning. PRISMR uses a lightweight
hypernetwork to encode multimodal candidates in parallel and generate item-specific LoRA weights, which are synthesized into an
instance-specific adapter for an LMM. This paradigm enables more robust internalization of list structure while preserving the
base model. We further introduce a large-scale multimodal review-ranking benchmark for evaluation. Experiments demonstrate that
PRISMR substantially reduces parse collapse, improves listwise ranking performance, and transfers effectively across domains and
instruction-tuned backbones.
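
The mechanism described above can be sketched in a few lines of NumPy. This is an illustrative sketch only, under stated assumptions: the abstract does not specify the hypernetwork architecture or how item-specific LoRA weights are synthesized, so the linear hypernetwork, the mean over per-item low-rank updates, and every name here (`make_hypernetwork`, `item_lora_factors`, `instance_adapter`, `adapted_forward`) are hypothetical.

```python
import numpy as np


def make_hypernetwork(embed_dim, d_in, d_out, rank, seed=0):
    """Hypothetical linear hypernetwork: two projections that map an item
    embedding to the flattened LoRA factors A (rank x d_in) and B (d_out x rank)."""
    rng = np.random.default_rng(seed)
    return {
        "to_A": rng.normal(0.0, 0.02, size=(embed_dim, rank * d_in)),
        "to_B": rng.normal(0.0, 0.02, size=(embed_dim, d_out * rank)),
        "shape": (d_in, d_out, rank),
    }


def item_lora_factors(hyper, item_embeddings):
    """Generate one (A_i, B_i) pair per candidate. All items are processed
    in a single batched matrix product, i.e. in parallel."""
    d_in, d_out, rank = hyper["shape"]
    n = item_embeddings.shape[0]
    A = (item_embeddings @ hyper["to_A"]).reshape(n, rank, d_in)
    B = (item_embeddings @ hyper["to_B"]).reshape(n, d_out, rank)
    return A, B


def instance_adapter(A, B):
    """Assumed synthesis rule: average the per-item updates B_i @ A_i
    into one instance-specific weight delta of shape (d_out, d_in)."""
    return np.einsum("nor,nri->oi", B, A) / A.shape[0]


def adapted_forward(W_base, delta_W, x, scale=1.0):
    """Apply the adapter additively; W_base itself is never modified."""
    return x @ (W_base + scale * delta_W).T


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    hyper = make_hypernetwork(embed_dim=8, d_in=6, d_out=5, rank=2)
    items = rng.normal(size=(4, 8))        # 4 candidate embeddings
    A, B = item_lora_factors(hyper, items)
    delta_W = instance_adapter(A, B)
    W_base = rng.normal(size=(5, 6))       # stand-in for a frozen LMM layer
    y = adapted_forward(W_base, delta_W, rng.normal(size=(3, 6)))
    print(delta_W.shape, y.shape)
```

Because the update is added at call time rather than written into `W_base`, the base model is preserved and a new adapter can be instantiated for each candidate list, which is the property the abstract emphasizes.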