Silent Tokens, Loud Effects: Padding in LLMs

📅 2025-09-23
🤖 AI Summary
This study systematically uncovers the nontrivial impact of padding tokens on batched inference in large language models (LLMs). Challenging the widespread assumption that padding is benign, the authors conduct a controlled empirical analysis across four dimensions (hidden-layer activations, generation quality, social bias, and safety alignment) using the Llama, Gemma, and Qwen model families. By injecting varying amounts of padding into input sequences, they demonstrate that even minimal padding significantly perturbs internal representations, degrades coherence and factual consistency (especially in smaller models), and induces unpredictable fluctuations in bias metrics. Critically, standard safety mechanisms, including refusal behavior and content filtering, are substantially weakened under padding. The work elevates padding from an implementation detail to a critical factor in model robustness, providing both a diagnostic benchmark and practical guidance for efficient, secure LLM deployment.

📝 Abstract
Padding tokens are widely used in large language models (LLMs) to equalize sequence lengths during batched inference. While they should be fully masked, implementation errors can cause them to influence computation, and the extent of this influence is not well understood. We systematically study this effect across three open-source model families (Llama, Gemma, Qwen), inserting controlled amounts of padding and evaluating outcomes along four axes: activations, generation quality, bias, and safety. Even small amounts of padding shift hidden representations, degrade quality in smaller models, alter bias in unpredictable ways, and weaken safety guardrails. These findings demonstrate that padding is not a harmless detail but a robustness risk that must be carefully handled in deployment.
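The failure mode the abstract describes can be illustrated with a toy example. The sketch below is a minimal pure-Python single-head attention, not the paper's code; the embeddings and mask values are made up for illustration. It shows the mechanism under study: when padded key positions are correctly masked out (score set to negative infinity before the softmax), the pad token contributes nothing to the output, whereas omitting the mask shifts every output vector.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values, pad_mask):
    """Scaled dot-product attention; pad_mask[j] is True for real tokens.

    Masked (padded) key positions receive a score of -inf, so the
    softmax assigns them exactly zero weight.
    """
    out = []
    for q in queries:
        scores = []
        for k, real in zip(keys, pad_mask):
            score = sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(len(q))
            scores.append(score if real else float("-inf"))
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(len(values[0]))])
    return out

# Toy sequence: two real tokens plus one pad token appended for batching.
toks = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # last vector: pad embedding
masked   = attention(toks, toks, toks, [True, True, False])
unmasked = attention(toks, toks, toks, [True, True, True])

# With a correct mask the pad token is invisible; without it, every
# output vector shifts -- the kind of perturbation the paper measures.
print(masked[0])
print(unmasked[0])
```

Here a single unmasked pad position changes the attention output for every real token, which is why the paper treats padding handling as a robustness concern rather than an implementation detail.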
Problem

Research questions and friction points this paper is trying to address.

Padding tokens influence LLM computation due to implementation errors
Padding shifts activations, degrades quality and alters model bias
Padding weakens safety guardrails and poses deployment robustness risks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematically studies padding effects on LLMs
Evaluates impacts on activations, quality, bias, safety
Reveals padding degrades model robustness and safety
Rom Himelstein
Department of Data and Decision Science, Technion - Israel Institute of Technology
Amit LeVi
Department of Computer Science, Technion - Israel Institute of Technology
Yonatan Belinkov
Technion
Natural Language Processing · Model Interpretability · Artificial Intelligence
Avi Mendelson
Electrical Engineering and Computer Science, Technion
Computer systems