🤖 AI Summary
This work addresses a critical security vulnerability in multi-agent AI orchestration systems, where compositional threats enable a single legitimate request to be decomposed into seemingly compliant subtasks that collectively violate policy, a risk undetectable by current mechanisms. The paper introduces Semantic Intent Fragmentation (SIF), a novel attack paradigm that induces orchestrators to generate policy-violating plans without injecting malicious content, thereby enabling privilege escalation or silent data exfiltration. Drawing on the OWASP, MITRE ATLAS, and NIST frameworks, the authors design a three-stage red-teaming evaluation pipeline integrating deterministic taint analysis, chain-of-thought auditing, and cross-model compliance judges. Evaluations across 14 enterprise scenarios reveal that GPT-20B-based orchestrators produce violating plans in 71% of cases even though every subtask passes individual checks. The proposed defense achieves 100% pre-execution interception of such attacks, exposing fundamental limitations of subtask-level security approaches.
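To make the blind spot concrete, here is a minimal, hypothetical sketch of a per-subtask compliance check of the kind the summary says current mechanisms rely on. The rule set, subtask strings, and decomposed plan are invented for illustration, not taken from the paper: each step alone looks compliant, so a checker that inspects steps in isolation approves the whole sequence.

```python
# Hypothetical per-subtask checker; rules and plan strings are illustrative.
# Policy (assumed): sensitive data ("customer_pii") must not reach an
# "external" destination.
FORBIDDEN_PAIRS = [("customer_pii", "external")]

def subtask_ok(subtask: str) -> bool:
    """Flag a step only if it *by itself* pairs a sensitive source
    with an external destination -- the subtask-level view."""
    return not any(src in subtask and dst in subtask
                   for src, dst in FORBIDDEN_PAIRS)

# A single legitimate-sounding request, as an orchestrator might decompose it:
plan = [
    "export customer_pii records to shared_folder",  # sensitive, but internal
    "compress shared_folder into archive.zip",       # no sensitive keyword
    "upload archive.zip to external file host",      # external, no keyword
]

# Every fragment clears the check, yet the composed plan exfiltrates PII.
print(all(subtask_ok(step) for step in plan))  # → True
```

The violation exists only in the composition: the sensitive read in step 1 and the external write in step 3 are connected through the intermediate artifact, which the per-step view never sees.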
📝 Abstract
We introduce Semantic Intent Fragmentation (SIF), an attack class against LLM orchestration systems in which a single, legitimately phrased request causes an orchestrator to decompose a task into subtasks that are individually benign but jointly violate security policy. Current safety mechanisms operate at the subtask level, so each step clears existing classifiers -- the violation emerges only in the composed plan. SIF exploits OWASP LLM06:2025 through four mechanisms: bulk scope escalation, silent data exfiltration, embedded trigger deployment, and quasi-identifier aggregation, requiring no injected content, no system modification, and no attacker interaction after the initial request. We construct a three-stage red-teaming pipeline grounded in the OWASP, MITRE ATLAS, and NIST frameworks to generate realistic enterprise scenarios. Across 14 scenarios spanning financial reporting, information security, and HR analytics, a GPT-20B orchestrator produces policy-violating plans in 71% of cases (10/14) while every subtask appears benign. Three independent signals validate this finding: deterministic taint analysis, chain-of-thought evaluation, and a cross-model compliance judge with 0% false positives. Stronger orchestrators increase SIF success rates. Plan-level information-flow tracking combined with compliance evaluation detects all attacks before execution, showing that the compositional safety gap is closable.
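The plan-level defense described above can be sketched as classic taint propagation over the composed plan. All names here (the `Subtask` structure, the source/sink sets, and the two-step example plan) are assumptions for illustration; the paper's actual tracker is not reproduced here. The idea: mark data from sensitive origins as tainted, propagate taint through each step's outputs, and reject the plan if tainted data reaches an untrusted sink -- even when no single step both reads the source and writes the sink.

```python
# Illustrative plan-level information-flow (taint) tracking.
# All identifiers and the example policy are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Subtask:
    name: str
    reads: set = field(default_factory=set)   # data the step consumes
    writes: set = field(default_factory=set)  # artifacts/destinations it produces

SENSITIVE_SOURCES = {"hr_salary_db"}   # assumed policy: sensitive origins
EXTERNAL_SINKS = {"external_email"}    # assumed policy: untrusted sinks

def plan_violates_policy(plan):
    """Propagate taint across the whole plan before execution.

    Returns True if data originating in a sensitive source can reach an
    external sink through any chain of intermediate artifacts.
    """
    tainted = set(SENSITIVE_SOURCES)
    for step in plan:
        if step.reads & tainted:
            tainted |= step.writes            # taint flows to the step's outputs
            if step.writes & EXTERNAL_SINKS:
                return True                   # tainted data reaches a sink
    return False

# Each step below is benign in isolation (no step both touches the salary DB
# and emails externally), yet the composed plan exfiltrates.
plan = [
    Subtask("summarize", reads={"hr_salary_db"}, writes={"tmp_report"}),
    Subtask("email",     reads={"tmp_report"},   writes={"external_email"}),
]
print(plan_violates_policy(plan))  # → True: caught before execution
```

Because the check runs over the full plan graph rather than per step, it intercepts the flow at planning time, which matches the paper's claim of pre-execution detection.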