Context-Aware Hierarchical Learning: A Two-Step Paradigm towards Safer LLMs

📅 2025-12-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work identifies a novel security vulnerability in large language models (LLMs)—Tool Completion Attack (TCA)—where adversaries manipulate tool-call sequences to induce LLMs to violate instruction intent. To address this, we propose Context-Aware Hierarchical Learning (CAHL), a method that dynamically models semantic associations among instruction segments and encodes role-sensitive hierarchical structures, augmented by a dynamic balancing optimization strategy. Based on CAHL, we construct Tool-Completion, the first dedicated benchmark for evaluating TCA robustness. Experiments demonstrate that CAHL significantly enhances zero-shot robustness against both TCA and conventional adversarial attacks—reducing attack success rates substantially—while preserving strong performance on general-purpose tasks and maintaining cross-task generalization. Our approach establishes a new paradigm for instruction-level safety in LLMs and provides a scalable, principled defense framework.

Technology Category

Application Category

📝 Abstract
Large Language Models (LLMs) have emerged as powerful tools for diverse applications. However, their uniform token processing paradigm introduces critical vulnerabilities in instruction handling, particularly when exposed to adversarial scenarios. In this work, we identify and propose a novel class of vulnerabilities, termed Tool-Completion Attack (TCA), which exploits function-calling mechanisms to subvert model behavior. To evaluate LLM robustness against such threats, we introduce the Tool-Completion benchmark, a comprehensive security assessment framework, which reveals that even state-of-the-art models remain susceptible to TCA, with surprisingly high attack success rates. To address these vulnerabilities, we introduce Context-Aware Hierarchical Learning (CAHL), a sophisticated mechanism that dynamically balances semantic comprehension with role-specific instruction constraints. CAHL leverages the contextual correlations between different instruction segments to establish a robust, context-aware instruction hierarchy. Extensive experiments demonstrate that CAHL significantly enhances LLM robustness against both conventional attacks and the proposed TCA, exhibiting strong generalization capabilities in zero-shot evaluations while still preserving model performance on generic tasks. Our code is available at https://github.com/S2AILab/CAHL.
Problem

Research questions and friction points this paper is trying to address.

Addresses vulnerabilities in LLMs from uniform token processing
Introduces Tool-Completion Attack exploiting function-calling mechanisms
Proposes Context-Aware Hierarchical Learning to enhance model robustness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces Context-Aware Hierarchical Learning for dynamic instruction balancing
Leverages contextual correlations to establish robust instruction hierarchies
Enhances LLM robustness against attacks while preserving general performance
🔎 Similar Papers
No similar papers found.
T
Tengyun Ma
Harbin Institute of Technology (Shenzhen)
J
Jiaqi Yao
Harbin Institute of Technology (Shenzhen)
Daojing He
Daojing He
School of Computer Science and Engineering, South China University of Technology
Network and Information security
S
Shihao Peng
Harbin Institute of Technology (Shenzhen)
Y
Yu Li
Zhejiang University
S
Shaohui Liu
Harbin Institute of Technology
Zhuotao Tian
Zhuotao Tian
Professor, Harbin Institute of Technology (Shenzhen)
Vision-language ModelMulti-modal PerceptionComputer Vision