🤖 AI Summary
To address the high computational cost, latency, and deployment challenges of large language models (LLMs) in call center applications, this paper proposes a production-oriented LLM framework that jointly optimizes cost and effectiveness. Methodologically, it integrates proprietary models, open-weight LLMs, and domain-adapted fine-tuned variants, augmented by prompt engineering, knowledge distillation, and inference acceleration techniques to enable lightweight deployment and real-time analytics. As a key innovation, the paper introduces a multi-dimensional model evaluation framework and an empirically grounded cost analysis model, which together address the high-overhead bottlenecks of traditional LLM deployments. Evaluated in a live environment processing over one million monthly calls, the system achieves >92% accuracy in call driver identification, reduces inference cost by 67%, and supports ≥1,000 concurrent requests. It significantly enhances automation and operational efficiency across key tasks, including topic modeling, call classification, trend detection, and FAQ generation.
📝 Abstract
Large Language Models have transformed the Contact Center industry, manifesting in enhanced self-service tools, streamlined administrative processes, and augmented agent productivity. This paper delineates our system that automates call driver generation, which serves as the foundation for tasks such as topic modeling, incoming call classification, trend detection, and FAQ generation, delivering actionable insights for contact center agents and administrators. We present a cost-efficient LLM system design, with 1) a comprehensive evaluation of proprietary, open-weight, and fine-tuned models, 2) cost-efficient deployment strategies, and 3) the corresponding cost analysis in production environments.