AI Summary
User profiling faces challenges including scarcity of ground-truth labels, highly heterogeneous and noisy data, and limited reliability of large language models (LLMs), compounded by the absence of standardized benchmarks. To address these issues, we propose Conf-Profile, a confidence-driven two-stage unsupervised user profiling framework. Methodologically, it introduces the first confidence-guided unsupervised reinforcement learning paradigm: LLMs generate synthetic labels with calibrated confidence scores; pseudo-label optimization is achieved via confidence-weighted voting, dynamic calibration, and knowledge distillation; and confidence-aware sample selection, reward weighting, and policy updates enable difficulty-adaptive learning. Evaluated on Qwen3-8B, Conf-Profile achieves a +13.97 F1-score improvement over baselines, demonstrating significantly enhanced robustness and generalization. This work establishes a scalable, label-free paradigm for high-quality user profiling.
Abstract
User profiling, a core technique for user understanding, aims to infer structured attributes from user information. Large Language Models (LLMs) offer a promising avenue for user profiling, yet progress is hindered by the lack of comprehensive benchmarks. To bridge this gap, we propose ProfileBench, an industrial benchmark derived from a real-world video platform, encompassing heterogeneous user data and a well-structured profiling taxonomy. The profiling task nevertheless remains challenging: large-scale ground-truth labels are difficult to collect, and heterogeneous, noisy user information can compromise the reliability of LLMs. To achieve label-free and reliable user profiling, we propose Conf-Profile, a confidence-driven profile reasoning framework with a two-stage paradigm. In the first stage, we synthesize high-quality labels by prompting advanced LLMs with confidence hints, then apply confidence-weighted voting to improve accuracy and confidence calibration to obtain a balanced distribution. The resulting profile results, rationales, and confidence scores are aggregated and distilled into a lightweight LLM. In the second stage, we further enhance reasoning ability via confidence-guided unsupervised reinforcement learning, which exploits confidence for difficulty filtering, quasi-ground-truth voting, and reward weighting. Experimental results demonstrate that Conf-Profile delivers substantial performance gains through the two-stage training, improving F1 by 13.97 on Qwen3-8B.
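The abstract names confidence-weighted voting, difficulty filtering, and reward weighting but does not give their formulas. As a rough illustration only, the Python sketch below shows one plausible reading: each sampled answer votes with its self-reported confidence, the winner's share of the total confidence acts as an agreement score, and that score is reused to filter samples and scale a reward. The function names, the thresholds `low` and `high`, and the reward shape are all assumptions made for this example, not details taken from Conf-Profile.

```python
from collections import defaultdict


def confidence_weighted_vote(predictions):
    """Pick the label with the largest summed confidence.

    predictions: list of (label, confidence) pairs, confidence in [0, 1].
    Returns (winning_label, agreement), where agreement is the winner's
    share of the total confidence mass.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    mass = defaultdict(float)
    for label, confidence in predictions:
        mass[label] += confidence
    total = sum(mass.values())
    winner = max(mass, key=mass.get)
    agreement = mass[winner] / total if total > 0 else 0.0
    return winner, agreement


def keep_for_training(agreement, low=0.3, high=0.95):
    """Hypothetical difficulty filter: drop samples that look too easy
    (near-unanimous) or too unreliable (no clear winner)."""
    return low <= agreement <= high


def weighted_reward(predicted, quasi_label, agreement):
    """Hypothetical reward: a match with the voted quasi-label is rewarded
    in proportion to how much the vote itself can be trusted."""
    return agreement if predicted == quasi_label else 0.0


# Three sampled answers for one user attribute (made-up values).
samples = [("sports", 0.95), ("gaming", 0.4), ("gaming", 0.3)]
quasi_label, agreement = confidence_weighted_vote(samples)
use_sample = keep_for_training(agreement)
reward = weighted_reward("sports", quasi_label, agreement)
```

In this toy input, "gaming" receives more raw votes, but "sports" wins because its single vote carries more confidence than the two "gaming" votes combined, which is the behavior that distinguishes confidence-weighted voting from plain majority voting.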