AI Summary
Current large language model (LLM) chatbots face significant compliance risks in organizational settings due to the complex interplay of multiple policies, yet existing evaluation methods largely overlook this challenge. This work proposes COPAL, the first automated alignment evaluation framework tailored to composite organizational policies. COPAL systematically uncovers alignment deficiencies by mining empirical interaction patterns, modeling explicit policy contracts, and automatically generating queries that trigger violations arising from policy combinations. Experimental evaluation across nine mainstream LLMs reveals an average error rate of 33.1% on composite policy adherence, highlighting a critical gap in the ability of current systems to handle such intricate compliance scenarios.
Abstract
Large language model chatbots are increasingly deployed in organizational settings such as healthcare, finance, and public services. Evaluating policy alignment is therefore critical to reliable chatbot deployment. By analyzing real-world user queries, we find that composed-policy violations are prevalent across chatbots but overlooked by existing benchmarks. This paper presents COPAL, an automated tool for evaluating composed-policy alignment in chatbots. COPAL efficiently generates queries that trigger composed-policy failures in chatbots, using empirically derived interaction patterns and explicit handling contracts. Queries generated by COPAL expose substantial query-handling failures: across 9 served models, composed-policy queries yield a 33.1% error rate on average, indicating that composed-policy alignment warrants further investigation.