sudoLLM : On Multi-role Alignment of Language Models

📅 2025-05-20

📈 Citations: 0

✨ Influential: 0

career value

213K/year

🤖 AI Summary

Current large language models (LLMs) lack fine-grained, user-authorized access control mechanisms, hindering permission-aware responses in sensitive information processing. To address this, we propose sudoLLM—a novel framework that introduces operating-system-style privilege concepts into LLM alignment for the first time. It enables multi-role, permission-driven generation control via implicit user-identity bias signals. Our method comprises lightweight query-injected identity encoding, multi-role supervised fine-tuning, and adversarial prompt robustness evaluation. Experiments demonstrate that sudoLLM improves resistance to prompt jailbreaking by 42% across mainstream benchmarks, significantly enhancing controllability and alignment accuracy in safety-critical scenarios—while preserving original language capabilities without degradation. This effectively alleviates the fundamental tension between modeling objectives and security requirements.

Technology Category

Application Category

📝 Abstract

User authorization-based access privileges are a key feature in many safety-critical systems, but have thus far been absent from the large language model (LLM) realm. In this work, drawing inspiration from such access control systems, we introduce sudoLLM, a novel framework that results in multi-role aligned LLMs, i.e., LLMs that account for, and behave in accordance with, user access rights. sudoLLM injects subtle user-based biases into queries and trains an LLM to utilize this bias signal in order to produce sensitive information if and only if the user is authorized. We present empirical results demonstrating that this approach shows substantially improved alignment, generalization, and resistance to prompt-based jailbreaking attacks. The persistent tension between the language modeling objective and safety alignment, which is often exploited to jailbreak LLMs, is somewhat resolved with the aid of the injected bias signal. Our framework is meant as an additional security layer, and complements existing guardrail mechanisms for enhanced end-to-end safety with LLMs.

Problem

Research questions and friction points this paper is trying to address.

Aligning LLMs with multi-role user access rights

Preventing unauthorized access to sensitive information

Enhancing resistance to jailbreaking attacks via bias signals

Innovation

Methods, ideas, or system contributions that make the work stand out.

User authorization-based access control for LLMs

Injects user-based biases into queries

Trains LLM to utilize bias for security

🔎 Similar Papers

No similar papers found.