ToolRec: Calibrated Preference Alignment for Query Recommendation in On-Device Assistants

📅 2026-06-07
🤖 AI Summary
This work addresses the limitations of existing alignment methods for LLM-based query recommendation in on-device assistants: they align poorly with user intent, particularly for queries that invoke system tools, and they inherit noise and preference bias from raw click logs. To overcome these challenges, the authors propose ToolRec, a framework that introduces SysToolKit, a repository of 708 system tools, together with a context-aware tool retrieval mechanism. They also devise a dual-level click-signal calibration strategy that accounts for user activity levels and up-weights clicks on tool-invoking queries, and they combine it with a sample-weighted Kahneman-Tversky Optimization (KTO) objective for preference alignment. In online A/B tests on OPPO's Xiaobu assistant, which has over 150 million monthly active users, the approach significantly improves click-through rate and total clicks while maintaining high query relevance.
📝 Abstract
Large Language Models (LLMs) have significantly advanced generative query recommendation. However, existing alignment methods primarily focus on standard chatbot scenarios and fall short in on-device intelligent assistants, where users predominantly expect the rapid invocation of system-level tools. Moreover, directly aligning LLMs with real-world click logs introduces severe noise, because user activity levels vary and raw clicks fail to emphasize execution-oriented queries. To address these challenges, we propose ToolRec, a calibrated preference alignment framework tailored for on-device query recommendation. To ground query recommendation in executable actions, we first construct SysToolKit, a comprehensive repository of 708 system tools, paired with a context-aware tool retrieval mechanism to ensure recommendation relevance. We then propose a dual-level calibration mechanism to refine raw click data: it mitigates user behavioral noise by calibrating signals based on user activity levels, and it up-weights click signals on system-level tool-invoking queries. Guided by these refined preference signals, we align the model using a sample-level weighted Kahneman-Tversky Optimization (KTO) objective. Extensive online A/B tests on our mobile assistant platform OPPO Xiaobu, which has over 150 million monthly active users, demonstrate that ToolRec significantly improves Click-Through Rate (CTR) and total click volume over strong baselines while maintaining high query relevance.
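The abstract does not give formulas for the calibration or the weighted objective, so the following is only a minimal sketch of how a dual-level click weight could feed a sample-weighted KTO loss. The per-sample loss follows the standard KTO form; everything else is an assumption made for illustration: the `Sample` fields, the `ACTIVITY_FACTOR` table and its values, the `TOOL_CLICK_BOOST` constant, the normalization by the weight sum, and the treatment of the reference point `z0` as a fixed number (in KTO it is a batch-level KL estimate).

```python
import math
from dataclasses import dataclass

# Hypothetical calibration constants; the source does not publish these values
# or the direction in which activity levels are calibrated.
ACTIVITY_FACTOR = {"low": 0.5, "medium": 1.0, "high": 1.5}
TOOL_CLICK_BOOST = 2.0


@dataclass
class Sample:
    policy_logp: float   # log pi_theta(y | x) for the recommended query
    ref_logp: float      # log pi_ref(y | x) under the frozen reference model
    clicked: bool        # True = desirable example, False = undesirable
    activity: str        # user activity bucket (hypothetical: low/medium/high)
    invokes_tool: bool   # whether the query invokes a system-level tool


def calibrated_weight(s: Sample) -> float:
    """Level 1: scale by user activity. Level 2: boost clicks on tool queries."""
    w = ACTIVITY_FACTOR[s.activity]
    if s.clicked and s.invokes_tool:
        w *= TOOL_CLICK_BOOST
    return w


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def weighted_kto_loss(samples, beta=0.1, z0=0.0, lambda_d=1.0, lambda_u=1.0):
    """Weighted mean of per-sample KTO losses.

    r = log pi_theta - log pi_ref is the implied reward; z0 is the reference
    point, passed in here as a constant for simplicity.
    """
    total, weight_sum = 0.0, 0.0
    for s in samples:
        r = s.policy_logp - s.ref_logp
        if s.clicked:
            loss = lambda_d * (1.0 - sigmoid(beta * (r - z0)))
        else:
            loss = lambda_u * (1.0 - sigmoid(beta * (z0 - r)))
        w = calibrated_weight(s)
        total += w * loss
        weight_sum += w
    return total / weight_sum


batch = [
    Sample(-2.0, -3.0, clicked=True, activity="high", invokes_tool=True),
    Sample(-4.0, -3.5, clicked=False, activity="low", invokes_tool=False),
]
print(weighted_kto_loss(batch))
```

Dividing by the weight sum keeps the loss scale comparable across batches with different weight totals; a plain weighted sum would be an equally plausible reading of "sample-level weighted".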
Problem

Research questions and friction points this paper is trying to address.

on-device assistants
query recommendation
preference alignment
click logs noise
system-level tools
Innovation

Methods, ideas, or system contributions that make the work stand out.

preference alignment
on-device assistant
query recommendation
click calibration
tool invocation