🤖 AI Summary
This work addresses a critical security vulnerability in multi-agent AI orchestration systems, where compositional threats enable a single legitimate request to be decomposed into seemingly compliant subtasks that collectively violate policy, a risk undetectable by current mechanisms. The paper introduces Semantic Intent Fragmentation (SIF), a novel attack paradigm that induces orchestrators to generate policy-violating plans without injecting malicious content, thereby enabling privilege escalation or silent data exfiltration. Drawing on the OWASP, MITRE ATLAS, and NIST frameworks, the authors design a three-stage red-teaming evaluation pipeline integrating deterministic taint analysis, chain-of-thought auditing, and cross-model compliance judges. Evaluations across 14 enterprise scenarios reveal that GPT-20B-based orchestrators produce violating plans in 71% of cases even though every subtask passes individual checks. The proposed defense achieves 100% pre-execution interception of such attacks, exposing fundamental limitations of subtask-level security approaches.
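To make the blind spot concrete, here is a minimal, hypothetical sketch of a per-subtask compliance check of the kind the summary says current mechanisms rely on. The rule set, subtask strings, and decomposed plan are invented for illustration, not taken from the paper: each step alone looks compliant, so a checker that inspects steps in isolation approves the whole sequence.

```python
# Hypothetical per-subtask checker; rules and plan strings are illustrative.
# Policy (assumed): sensitive data ("customer_pii") must not reach an
# "external" destination.
FORBIDDEN_PAIRS = [("customer_pii", "external")]

def subtask_ok(subtask: str) -> bool:
    """Flag a step only if it *by itself* pairs a sensitive source
    with an external destination -- the subtask-level view."""
    return not any(src in subtask and dst in subtask
                   for src, dst in FORBIDDEN_PAIRS)

# A single legitimate-sounding request, as an orchestrator might decompose it:
plan = [
    "export customer_pii records to shared_folder",  # sensitive, but internal
    "compress shared_folder into archive.zip",       # no sensitive keyword
    "upload archive.zip to external file host",      # external, no keyword
]

# Every fragment clears the check, yet the composed plan exfiltrates PII.
print(all(subtask_ok(step) for step in plan))  # → True
```

The violation exists only in the composition: the sensitive read in step 1 and the external write in step 3 are connected through the intermediate artifact, which the per-step view never sees.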
📝 Abstract
We introduce Semantic Intent Fragmentation (SIF), an attack class against LLM orchestration systems in which a single, legitimately phrased request causes an orchestrator to decompose a task into subtasks that are individually benign but jointly violate security policy. Current safety mechanisms operate at the subtask level, so each step clears existing classifiers -- the violation emerges only in the composed plan. SIF exploits OWASP LLM06:2025 through four mechanisms: bulk scope escalation, silent data exfiltration, embedded trigger deployment, and quasi-identifier aggregation, requiring no injected content, no system modification, and no attacker interaction after the initial request. We construct a three-stage red-teaming pipeline grounded in the OWASP, MITRE ATLAS, and NIST frameworks to generate realistic enterprise scenarios. Across 14 scenarios spanning financial reporting, information security, and HR analytics, a GPT-20B orchestrator produces policy-violating plans in 71% of cases (10/14) while every subtask appears benign. Three independent signals validate this finding: deterministic taint analysis, chain-of-thought evaluation, and a cross-model compliance judge with 0% false positives. Stronger orchestrators increase SIF success rates. Plan-level information-flow tracking combined with compliance evaluation detects all attacks before execution, showing that the compositional safety gap is closable.
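The plan-level defense described above can be sketched as classic taint propagation over the composed plan. All names here (the `Subtask` structure, the source/sink sets, and the two-step example plan) are assumptions for illustration; the paper's actual tracker is not reproduced here. The idea: mark data from sensitive origins as tainted, propagate taint through each step's outputs, and reject the plan if tainted data reaches an untrusted sink -- even when no single step both reads the source and writes the sink.

```python
# Illustrative plan-level information-flow (taint) tracking.
# All identifiers and the example policy are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Subtask:
    name: str
    reads: set = field(default_factory=set)   # data the step consumes
    writes: set = field(default_factory=set)  # artifacts/destinations it produces

SENSITIVE_SOURCES = {"hr_salary_db"}   # assumed policy: sensitive origins
EXTERNAL_SINKS = {"external_email"}    # assumed policy: untrusted sinks

def plan_violates_policy(plan):
    """Propagate taint across the whole plan before execution.

    Returns True if data originating in a sensitive source can reach an
    external sink through any chain of intermediate artifacts.
    """
    tainted = set(SENSITIVE_SOURCES)
    for step in plan:
        if step.reads & tainted:
            tainted |= step.writes            # taint flows to the step's outputs
            if step.writes & EXTERNAL_SINKS:
                return True                   # tainted data reaches a sink
    return False

# Each step below is benign in isolation (no step both touches the salary DB
# and emails externally), yet the composed plan exfiltrates.
plan = [
    Subtask("summarize", reads={"hr_salary_db"}, writes={"tmp_report"}),
    Subtask("email",     reads={"tmp_report"},   writes={"external_email"}),
]
print(plan_violates_policy(plan))  # → True: caught before execution
```

Because the check runs over the full plan graph rather than per step, it intercepts the flow at planning time, which matches the paper's claim of pre-execution detection.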