🤖 AI Summary
This work identifies a security vulnerability in Kubernetes that stems from the CRI-API abstraction: the state of the container image pull process is invisible to Kubernetes, a "state blindness" that enables node-level denial-of-service (DoS) attacks. Attackers exploit this gap to exhaust CPU, I/O, and network resources for a potentially unlimited time, thereby blocking new image pulls. This is the first systematic discovery and empirical exploitation of this architectural flaw. The study proposes two mitigations: (1) a lightweight runtime interception mechanism as an immediate countermeasure, and (2) a more fundamental refactoring of the CRI architecture that decouples image pulling from the CRI-API. Validation employs CRI reverse engineering, fine-grained resource monitoring, controlled stress injection, and kernel-level I/O and network modeling. Experiments demonstrate that the attack drives the node to as much as 95% average CPU usage and persistently halts image pulls; the interim solution reduces attack success rate by 92%, while the architectural redesign eliminates the state blindness entirely. Together, these results provide both theoretical foundations and practical engineering guidance for securing Kubernetes container runtimes.
📝 Abstract
Kubernetes (K8s) has grown in popularity over the past few years to become the de facto standard for container orchestration in cloud-native environments. While topics such as containerization and access-control security have already received research attention, the Application Programming Interface (API) interactions between K8s and its runtime interfaces have not been studied thoroughly. In particular, the CRI-API is responsible for abstracting the container runtime, managing the creation and lifecycle of containers along with the download of their images. However, this separation of concerns and the abstraction of the container runtime leave K8s unaware of the status of the image download process, which obstructs monitoring of the resources allocated to that process. In this paper, we discuss how this lack of status information can be exploited to mount a Denial of Service (DoS) attack on a K8s cluster. We show that such attacks can generate up to 95% average CPU usage, prevent the download of new container images, and increase I/O and network usage for a potentially unlimited amount of time. Finally, we propose two possible mitigation strategies: one implemented as a stopgap solution, and another requiring more radical architectural changes in the relationship between K8s and the CRI-API.
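The status gap described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the paper's attack or the real CRI-API: `OpaqueImageService`, `SerialPuller`, and the image names are hypothetical, and the one-pull-at-a-time behavior is an assumption made here to keep the example short. The point it shows is that when a pull call reports only a final result, the caller cannot tell a slow download from a stalled one, and any pull queued behind it waits indefinitely.

```python
import threading


class OpaqueImageService:
    """Toy stand-in for a CRI-style image service (hypothetical, not the real API)."""

    def __init__(self):
        self.entered = threading.Event()   # set once a slow pull has started
        self.release = threading.Event()   # set to let the slow pull finish

    def pull_image(self, image: str) -> str:
        # The call blocks until the download ends and returns only a final
        # reference: no progress, byte count, or resource usage is exposed.
        if image.startswith("slow/"):
            self.entered.set()
            self.release.wait()
        return f"ref:{image}"


class SerialPuller:
    """Hypothetical orchestrator-side puller that handles one pull at a time."""

    def __init__(self, service):
        self._service = service
        self._lock = threading.Lock()
        self.completed = []

    def pull(self, image: str) -> str:
        with self._lock:
            ref = self._service.pull_image(image)
            self.completed.append(ref)
            return ref


def demo() -> dict:
    service = OpaqueImageService()
    puller = SerialPuller(service)

    slow = threading.Thread(target=puller.pull, args=("slow/huge-image",))
    slow.start()
    service.entered.wait()  # the slow pull now holds the puller's lock

    fast = threading.Thread(target=puller.pull, args=("app/small-image",))
    fast.start()
    fast.join(timeout=0.2)
    second_pull_blocked = fast.is_alive()  # still waiting behind the slow pull

    service.release.set()  # let the slow pull complete
    slow.join()
    fast.join()
    return {"second_pull_blocked": second_pull_blocked,
            "completed": list(puller.completed)}


if __name__ == "__main__":
    print(demo())
```

In this sketch the second pull only completes once the first is released; nothing available to the caller distinguishes a legitimate large download from one that is deliberately kept slow, which is the kind of missing status information the abstract refers to.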