AI Summary
To address the poor scalability, narrow topic coverage, and coarse content matching of keyword landing pages (KLPs) built through manual curation or from sparse search logs, this paper proposes PinLanding, a content-driven paradigm for generating KLPs automatically. The method is a multi-stage pipeline: a vision-language model (VLM) extracts fine-grained visual attributes, a large language model (LLM) generates semantically coherent topics, and a CLIP-based dual-encoder matches content to those topics precisely. The model achieves 99.7% Recall@10 on the Fashion200K benchmark. In a production deployment covering 4.2 million shopping landing pages, it increases topic coverage fourfold and improves collection attribute precision by 14.29% in human evaluation, relative to the traditional search-log-based approach. To our knowledge, this is the first KLP generation framework grounded directly in raw visual content rather than user behavioral signals.
Abstract
Online platforms like Pinterest that host vast content collections traditionally rely on manual curation or user-generated search logs to create keyword landing pages (KLPs), topic-centered collection pages that serve as entry points for content discovery. While manual curation ensures quality, it does not scale to millions of collections, and search-log approaches result in limited topic coverage and imprecise content matching. In this paper, we present PinLanding, a novel content-first architecture that transforms the way platforms create topical collections. Instead of deriving topics from user behavior, our system employs a multi-stage pipeline combining a vision-language model (VLM) for attribute extraction, a large language model (LLM) for topic generation, and a CLIP-based dual-encoder architecture for precise content matching. Our model achieves 99.7% Recall@10 on the Fashion200K benchmark, demonstrating strong attribute understanding capabilities. In a production deployment for search engine optimization with 4.2 million shopping landing pages, the system achieves a 4x increase in topic coverage and a 14.29% improvement in collection attribute precision over the traditional search-log-based approach, as measured by human evaluation. The architecture can be generalized beyond search traffic to power various user experiences, including content discovery and recommendations, providing a scalable solution for transforming unstructured content into curated topical collections across any content domain.
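To make the Recall@10 figure concrete, the following is a minimal sketch of how Recall@K is commonly computed for a dual-encoder retrieval setup. It is not the authors' evaluation code. The function names (`cosine`, `recall_at_k`), the toy embeddings, and the convention that a query counts as a hit when at least one relevant item appears in its top K results are all assumptions made for illustration; the abstract does not specify the exact protocol.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recall_at_k(query_embs, item_embs, relevant, k=10):
    """Fraction of queries with at least one relevant item among the top-k results.

    query_embs: list of query (e.g. topic text) embedding vectors
    item_embs:  list of item (e.g. image) embedding vectors
    relevant:   list of sets; relevant[i] holds the item indices relevant to query i
    """
    hits = 0
    for q_idx, q in enumerate(query_embs):
        # Rank all items by similarity to the query, most similar first.
        ranked = sorted(
            range(len(item_embs)),
            key=lambda i: cosine(q, item_embs[i]),
            reverse=True,
        )
        if relevant[q_idx] & set(ranked[:k]):
            hits += 1
    return hits / len(query_embs)


# Hypothetical toy data: two queries and three items in a 2-D embedding space.
queries = [[1.0, 0.0], [0.0, 1.0]]
items = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.7]]
relevant = [{0}, {2}]

recall_1 = recall_at_k(queries, items, relevant, k=1)
recall_2 = recall_at_k(queries, items, relevant, k=2)
```

In this toy example the second query's relevant item ranks second, so it is missed at K=1 and found at K=2. In a real system the embeddings would come from the two encoder towers, and ranking would typically use an approximate nearest-neighbor index rather than a full sort.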