What Matters in LLM-Based Feature Extractor for Recommender? A Systematic Analysis of Prompts, Models, and Adaptation

📅 2025-09-18
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing studies employ large language models (LLMs) as semantic feature extractors for sequential recommendation systems (SRS), but suffer from highly heterogeneous prompt design, architectural choices, and adaptation strategies—hindering fair attribution of design factors. To address this, we propose RecXplore, the first modular analytical framework that decouples LLM-driven sequential recommendation into four independently evaluable components: data processing, feature extraction, feature adaptation, and sequence modeling. This enables standardized ablation studies and systematic discovery of effective design patterns. Evaluated on four public benchmark datasets, RecXplore achieves end-to-end improvements of +18.7% in NDCG@5 and +12.7% in HR@5 over strong baselines by composing state-of-the-art modules. Our core contribution is establishing the first decomposable, reproducible, and comparable analytical paradigm for LLM-based feature extraction in sequential recommendation.

📝 Abstract
Using Large Language Models (LLMs) to generate semantic features has been demonstrated as a powerful paradigm for enhancing Sequential Recommender Systems (SRS). This typically involves three stages: processing item text, extracting features with LLMs, and adapting them for downstream models. However, existing methods vary widely in prompting, architecture, and adaptation strategies, making it difficult to fairly compare design choices and identify what truly drives performance. In this work, we propose RecXplore, a modular analytical framework that decomposes the LLM-as-feature-extractor pipeline into four modules: data processing, semantic feature extraction, feature adaptation, and sequential modeling. Instead of proposing new techniques, RecXplore revisits and organizes established methods, enabling systematic exploration of each module in isolation. Experiments on four public datasets show that simply combining the best designs from existing techniques without exhaustive search yields up to 18.7% relative improvement in NDCG@5 and 12.7% in HR@5 over strong baselines. These results underscore the utility of modular benchmarking for identifying effective design patterns and promoting standardized research in LLM-enhanced recommendation.
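The abstract's four-module decomposition (data processing, semantic feature extraction, feature adaptation, sequential modeling) can be sketched as a pipeline of swappable functions. This is a hypothetical illustration only: every function name is invented, and the toy character-hash "extractor" and mean-pooling "sequence model" are stand-ins for the LLM encoder and recommender backbone, not the paper's actual implementation.

```python
# Hypothetical sketch of RecXplore's four-module decomposition.
# All names and the toy stand-in implementations are illustrative
# assumptions, not the paper's code.
from typing import List

Vector = List[float]

def process_item_text(raw: str) -> str:
    """Module 1: normalize item text (e.g., a title) into a prompt-ready form."""
    return raw.strip().lower()

def extract_features(text: str, dim: int = 8) -> Vector:
    """Module 2: stand-in for an LLM encoder; hashes characters into a vector."""
    vec = [0.0] * dim
    for i, ch in enumerate(text):
        vec[i % dim] += ord(ch) / 1000.0
    return vec

def adapt_features(vec: Vector, scale: float = 0.5) -> Vector:
    """Module 3: adapt LLM features for the downstream model (here, a rescale)."""
    return [v * scale for v in vec]

def model_sequence(item_vecs: List[Vector]) -> Vector:
    """Module 4: sequence-model stand-in; mean-pools item representations."""
    dim = len(item_vecs[0])
    return [sum(v[d] for v in item_vecs) / len(item_vecs) for d in range(dim)]

def user_representation(item_texts: List[str]) -> Vector:
    """Compose the four modules into one user-sequence representation."""
    feats = [adapt_features(extract_features(process_item_text(t)))
             for t in item_texts]
    return model_sequence(feats)

vec = user_representation(["Wireless Mouse", "Mechanical Keyboard"])
print(len(vec))  # 8-dimensional sequence representation
```

Because each stage is an independent function, any one module (e.g., the extractor) can be ablated or replaced while the other three are held fixed, which is the kind of isolated comparison the framework is built for.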
Problem

Research questions and friction points this paper is trying to address.

- Systematically analyzes LLM-based feature extraction for recommenders
- Identifies key factors in prompts, models, and adaptation strategies
- Proposes a modular framework to evaluate design choices fairly
Innovation

Methods, ideas, or system contributions that make the work stand out.

- Modular framework for feature extraction
- Systematic analysis of existing methods
- Combining the best existing designs without exhaustive search
Kainan Shi
Xi'an Jiaotong University, Xi'an, China
Peilin Zhou
HKUST; Peking University
Ge Wang
Xi'an Jiaotong University, Xi'an, China
Han Ding
Xi'an Jiaotong University, Xi'an, China
Fei Wang
Xi'an Jiaotong University, Xi'an, China

Tags: sequential recommendation, natural language processing