Tool-Aware Planning in Contact Center AI: Evaluating LLMs through Lineage-Guided Query Decomposition

📅 2026-02-16
🤖 AI Summary
This work addresses the difficulty large language models have in decomposing complex contact-center queries into executable, multi-step plans with a tool assigned to each step, particularly when a plan must coordinate structured tools (e.g., Text2SQL) and unstructured tools (e.g., RAG). The authors propose a tool-aware planning framework in which lineage-guided query decomposition produces dependency-aware steps that can be executed in parallel. They also introduce a dual-mode evaluation benchmark for this task: a metric-wise evaluator covering seven dimensions, including tool-prompt alignment and query adherence, and a one-shot evaluator. A systematic evaluation of 14 models shows that performance degrades on compound queries and on plans longer than four steps. Claude-3-7-Sonnet achieves the highest total metric score, 84.8%, while o3-mini attains the best one-shot match rate at the "A+" tier, 49.75%. Plan lineage yields mixed gains overall but benefits several top models and improves step executability for many.

📝 Abstract
We present a domain-grounded framework and benchmark for tool-aware plan generation in contact centers, where answering a query for business insights, our target use case, requires decomposing it into executable steps over structured tools (Text2SQL (T2S)/Snowflake) and unstructured tools (RAG/transcripts) with explicit depends_on for parallelism. Our contributions are threefold: (i) a reference-based plan evaluation framework operating in two modes: a metric-wise evaluator spanning seven dimensions (e.g., tool-prompt alignment, query adherence) and a one-shot evaluator; (ii) a data curation methodology that iteratively refines plans via an evaluator->optimizer loop to produce high-quality plan lineages (ordered plan revisions) while reducing manual effort; and (iii) a large-scale study of 14 LLMs across sizes and families for their ability to decompose queries into step-by-step, executable, and tool-assigned plans, evaluated under prompts with and without lineage. Empirically, LLMs struggle on compound queries and on plans exceeding 4 steps (typically 5-15); the best total metric score reaches 84.8% (Claude-3-7-Sonnet), while the strongest one-shot match rate at the "A+" tier (Extremely Good, Very Good) is only 49.75% (o3-mini). Plan lineage yields mixed gains overall but benefits several top models and improves step executability for many. Our results highlight persistent gaps in tool understanding, especially in tool-prompt alignment and tool-usage completeness, and show that shorter, simpler plans are markedly easier. The framework and findings provide a reproducible path for assessing and improving agentic planning with tools for answering data-analysis queries in contact-center settings.
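To make the role of depends_on concrete, here is a minimal sketch of how a tool-assigned plan could be grouped into waves of steps that are safe to run in parallel. Only the depends_on field and the T2S/RAG tool labels come from the abstract; the Step class, its other field names, the parallel_waves helper, and the example query are assumptions made for illustration, not the authors' schema or implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    # Hypothetical plan-step schema; only `depends_on` is named in the abstract.
    step_id: str
    tool: str      # "T2S" (structured) or "RAG" (unstructured)
    prompt: str
    depends_on: list[str] = field(default_factory=list)


def parallel_waves(steps):
    """Group steps into waves: each step's dependencies lie in earlier waves."""
    remaining = {s.step_id: set(s.depends_on) for s in steps}
    unknown = {d for deps in remaining.values() for d in deps} - set(remaining)
    if unknown:
        raise ValueError(f"depends_on references unknown steps: {sorted(unknown)}")

    done, waves = set(), []
    while remaining:
        # A step is ready once all of its dependencies have completed.
        ready = sorted(sid for sid, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("cyclic depends_on; plan is not executable")
        waves.append(ready)
        done.update(ready)
        for sid in ready:
            del remaining[sid]
    return waves


# Illustrative plan for an invented query about call escalations.
plan = [
    Step("s1", "T2S", "Count escalated calls per agent team for last quarter."),
    Step("s2", "RAG", "Summarize the reasons customers give for escalating."),
    Step("s3", "T2S", "Rank teams by escalation rate.", depends_on=["s1"]),
    Step("s4", "RAG", "Find transcript excerpts from the top-ranked team.",
         depends_on=["s3"]),
]

print(parallel_waves(plan))  # [['s1', 's2'], ['s3'], ['s4']]
```

In this sketch, s1 and s2 share no dependency and land in the same wave, which is the kind of parallelism that explicit depends_on annotations allow; a missing or cyclic reference is reported as an error, one simple way a plan can fail to be executable.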
Problem

Research questions and friction points this paper is trying to address.

tool-aware planning
query decomposition
contact center AI
LLM evaluation
executable plans
Innovation

Methods, ideas, or system contributions that make the work stand out.

tool-aware planning
plan lineage
query decomposition
LLM evaluation
contact center AI
Varun Nathan
Observe.AI, Bangalore, India
Shreyas Guha
Observe.AI, Bangalore, India
Ayush Kumar
University of Manitoba
Multidrug Resistance in Gram negative bacteria