PinLanding: Content-First Keyword Landing Page Generation via Multi-Modal AI for Web-Scale Discovery

2025-03-01
AI Summary
To address the poor scalability, narrow topic coverage, and coarse content matching of keyword landing pages (KLPs) generated via manual curation or sparse search logs on online platforms, this paper proposes a content-driven paradigm for generating KLPs automatically. The method is a cascaded pipeline of three components: a vision-language model (VLM) extracts fine-grained visual attributes; a large language model (LLM) generates semantically coherent topic tags; and a CLIP-based dual encoder ensures precise cross-modal alignment. The stages are jointly optimized to improve end-to-end quality. Evaluated on Fashion200K, the approach achieves 99.7% Recall@10. Deployed online for 4.2 million shopping landing pages, it increases topic coverage fourfold and improves collection attribute precision by 14.29% over the search log-based approach, as measured by human evaluation. To our knowledge, this is the first KLP generation framework grounded directly in raw visual content rather than user behavioral signals.

Abstract
Online platforms like Pinterest hosting vast content collections traditionally rely on manual curation or user-generated search logs to create keyword landing pages (KLPs) -- topic-centered collection pages that serve as entry points for content discovery. While manual curation ensures quality, it doesn't scale to millions of collections, and search log approaches result in limited topic coverage and imprecise content matching. In this paper, we present PinLanding, a novel content-first architecture that transforms the way platforms create topical collections. Instead of deriving topics from user behavior, our system employs a multi-stage pipeline combining a vision-language model (VLM) for attribute extraction, a large language model (LLM) for topic generation, and a CLIP-based dual-encoder architecture for precise content matching. Our model achieves 99.7% Recall@10 on the Fashion200K benchmark, demonstrating strong attribute understanding capabilities. In a production deployment for search engine optimization with 4.2 million shopping landing pages, the system achieves a 4X increase in topic coverage and a 14.29% improvement in collection attribute precision over the traditional search log-based approach, as measured by human evaluation. The architecture can be generalized beyond search traffic to power various user experiences, including content discovery and recommendations, providing a scalable solution to transform unstructured content into curated topical collections across any content domain.
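The Recall@10 figure quoted above is a retrieval metric. As a rough illustration only, the sketch below assumes the common definition in which a query counts as a hit when at least one relevant item appears in its top K results; the evaluation protocol actually used on Fashion200K is not described on this page, and the function name, query strings, and item IDs are all hypothetical.

```python
def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of queries with at least one relevant item in the top-k results.

    ranked_ids:   dict mapping query -> list of item IDs, best match first
    relevant_ids: dict mapping query -> set of item IDs considered correct
    (Assumed definition; other variants of Recall@K exist.)
    """
    if not ranked_ids:
        return 0.0
    hits = 0
    for query, ranking in ranked_ids.items():
        top_k = ranking[:k]
        # A query is a hit if any of its top-k items is relevant.
        if any(item in relevant_ids[query] for item in top_k):
            hits += 1
    return hits / len(ranked_ids)


# Toy data: two attribute queries with made-up item IDs.
rankings = {
    "red dress": ["p3", "p1", "p9"],
    "blue jeans": ["p4", "p7", "p2"],
}
relevant = {
    "red dress": {"p1"},
    "blue jeans": {"p8"},
}

print(recall_at_k(rankings, relevant, k=2))  # 0.5: only "red dress" has a hit
```

Under this definition the score can only rise as K grows, so a Recall@10 value is not directly comparable to a Recall@1 value.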
Problem

Research questions and friction points this paper is trying to address.

Automates keyword landing page creation for web-scale content discovery.
Improves topic coverage and content matching precision over traditional methods.
Leverages multi-modal AI to transform unstructured content into curated collections.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-modal AI for content-first KLP generation
Vision-language model for attribute extraction
CLIP-based dual-encoder for precise content matching
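
The matching step in a dual-encoder setup can be sketched as follows. This is a minimal illustration, not the system's implementation: it assumes that a topic's text embedding and each item's image embedding live in a shared vector space and are compared by cosine similarity. The vectors here are hand-written toy values standing in for encoder outputs, and `match_items`, the item IDs, and the 0.8 threshold are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_items(topic_vec, item_vecs, threshold=0.8):
    """Return (item_id, score) pairs whose similarity to the topic meets
    the threshold, sorted from best to worst match."""
    scored = [(item_id, cosine(topic_vec, vec)) for item_id, vec in item_vecs.items()]
    kept = [(item_id, score) for item_id, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Toy embeddings (assumed): in practice these would come from the text
# and image towers of a trained dual encoder.
topic = [1.0, 0.0, 1.0]
items = {
    "pin_a": [0.9, 0.1, 1.0],
    "pin_b": [0.0, 1.0, 0.0],
    "pin_c": [1.0, 0.0, 0.5],
}

matches = match_items(topic, items)
print([item_id for item_id, _ in matches])  # ['pin_a', 'pin_c']
```

Because both sides are reduced to vectors ahead of time, item embeddings can be computed once and reused across many topics, which is one reason a dual-encoder design suits collections at this scale.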