FITRep: Attention-Guided Item Representation via MLLMs

📅 2025-11-26
📈 Citations: 0
Influential: 0

🤖 AI Summary
Near-duplicate items on online platforms are hard to tell apart visually and textually, which degrades recommendation quality. Existing approaches based on multimodal large language models (MLLMs) treat multimodal representations as black-box embeddings and neglect the structural relationships between primary and auxiliary elements, which leads to local structural collapse. To address this, we propose FITRep, the first attention-guided, white-box item representation framework. Drawing on Feature Integration Theory, FITRep combines three modules: (i) Concept Hierarchical Information Extraction (CHIE), which uses MLLMs to extract hierarchical semantic concepts; (ii) Structure-Preserving Dimensionality Reduction (SPDR), an adaptive UMAP-based method; and (iii) FAISS-Based Clustering (FBC) for fine-grained deduplication. Deployed in Meituan's advertising system, FITRep shows a 3.60% lift in CTR and a 4.25% increase in CPM in online A/B tests while mitigating structural collapse. The result is an interpretable, structure-aware approach to multimodal item representation.

📝 Abstract
Online platforms often suffer from degraded user experience because of near-duplicate items with similar visuals and text. While Multimodal Large Language Models (MLLMs) enable multimodal embedding, existing methods treat representations as black boxes and ignore structural relationships (e.g., primary vs. auxiliary elements), which leads to the local structural collapse problem. To address this, inspired by Feature Integration Theory (FIT), we propose FITRep, the first attention-guided, white-box item representation framework for fine-grained item deduplication. FITRep consists of: (1) Concept Hierarchical Information Extraction (CHIE), which uses MLLMs to extract hierarchical semantic concepts; (2) Structure-Preserving Dimensionality Reduction (SPDR), an adaptive UMAP-based method for efficient information compression; and (3) FAISS-Based Clustering (FBC), which uses FAISS to assign each item a unique cluster id. Deployed on Meituan's advertising system, FITRep achieves +3.60% CTR and +4.25% CPM gains in online A/B tests, demonstrating both effectiveness and real-world impact.
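
The final deduplication step, giving every item a cluster id so that near-duplicates share one, can be illustrated with a minimal sketch. This is not the paper's implementation: the brute-force cosine comparison below is a pure-Python stand-in for a FAISS neighbor search, and the function name `assign_cluster_ids`, the `threshold` value, and the toy vectors are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_cluster_ids(embeddings, threshold=0.95):
    """Give each item one cluster id; items linked by similarity >= threshold share it.

    Assumption: a brute-force O(n^2) scan replaces the approximate
    nearest-neighbor search that a library such as FAISS would provide.
    """
    n = len(embeddings)
    parent = list(range(n))  # union-find forest, one node per item

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine(embeddings[i], embeddings[j]) >= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[max(ri, rj)] = min(ri, rj)

    # Relabel union-find roots as consecutive ids in order of first appearance.
    root_to_id = {}
    return [root_to_id.setdefault(find(i), len(root_to_id)) for i in range(n)]


# Hypothetical low-dimensional item embeddings: two pairs of near-duplicates.
items = [
    [1.00, 0.00, 0.10],
    [0.99, 0.01, 0.10],
    [0.00, 1.00, 0.00],
    [0.00, 0.98, 0.05],
]
cluster_ids = assign_cluster_ids(items)
print(cluster_ids)
```

In this toy input the first two items end up in one cluster and the last two in another. At production scale the pairwise loop would be replaced by an index-based neighbor search, which is the role FAISS plays in the description above.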
Problem

Research questions and friction points this paper is trying to address.

Addresses user-experience degradation caused by near-duplicate items on online platforms
Mitigates local structural collapse in multimodal item representations
Enables fine-grained item deduplication through attention guidance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Attention-guided white-box item representation framework
Hierarchical semantic concept extraction using MLLMs
Structure-preserving dimensionality reduction with adaptive UMAP
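
One generic way to make "structure-preserving" concrete is to check how many of each item's nearest neighbors survive a reduction. The sketch below is an illustration only, not the SPDR method or a metric reported by the paper; the function names, the choice of `k`, and the toy points are hypothetical.

```python
import math


def knn(points, i, k):
    """Indices of the k nearest neighbors of points[i] by Euclidean distance."""
    ranked = sorted(
        (math.dist(points[i], p), j) for j, p in enumerate(points) if j != i
    )
    return {j for _, j in ranked[:k]}


def neighbor_preservation(high, low, k=2):
    """Average fraction of each item's k nearest neighbors kept after reduction."""
    n = len(high)
    kept = sum(len(knn(high, i, k) & knn(low, i, k)) for i in range(n))
    return kept / (n * k)


# Hypothetical 3-D embeddings forming two tight groups of three items each.
high = [
    [0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0],
    [5.0, 5.0, 0.0], [5.1, 5.0, 0.0], [5.0, 5.1, 0.0],
]
# A reduction that keeps the informative axes leaves every neighborhood intact.
low_good = [p[:2] for p in high]
# A hypothetical 1-D map that interleaves the two groups breaks neighborhoods.
low_bad = [[0.0], [5.0], [0.1], [5.1], [0.2], [5.2]]

score_good = neighbor_preservation(high, low_good)
score_bad = neighbor_preservation(high, low_bad)
print(score_good, score_bad)
```

A score of 1.0 means every local neighborhood is unchanged, while lower scores indicate that the reduction has mixed items from different groups, which is the kind of local damage a structure-preserving reduction aims to avoid.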