🤖 AI Summary
Dockerfile build flakiness—non-deterministic failures arising from external dependency drift and environmental evolution despite unchanged Dockerfiles—severely undermines CI/CD reliability. This paper presents the first systematic, empirical investigation of flakiness causes, based on a nine-month analysis of 8,132 open-source projects, and establishes a comprehensive taxonomy. We propose FLAKIDOCK, a novel framework that integrates static analysis, dynamic execution tracing, vector-based similarity retrieval, and LLM-driven iterative repair—overcoming the limited generalizability of rule-based tools. Evaluation shows FLAKIDOCK achieves 73.55% repair accuracy, significantly outperforming state-of-the-art approaches. Our work introduces the first scalable, data-driven paradigm for diagnosing and repairing Docker build instability.
📝 Abstract
Dockerfile flakiness (unpredictable temporal build failures caused by external dependencies and evolving environments) undermines deployment reliability and increases debugging overhead. Unlike traditional Dockerfile issues, flakiness occurs without modifications to the Dockerfile itself, complicating its resolution. In this work, we present the first comprehensive study of Dockerfile flakiness, featuring a nine-month analysis of 8,132 Dockerized projects, revealing that around 10% exhibit flaky behavior. We propose a taxonomy categorizing common flakiness causes, including dependency errors and server connectivity issues. Existing tools fail to effectively address these challenges due to their reliance on pre-defined rules and limited generalizability. To overcome these limitations, we introduce FLAKIDOCK, a novel repair framework combining static and dynamic analysis, similarity retrieval, and an iterative feedback loop powered by Large Language Models (LLMs). Our evaluation demonstrates that FLAKIDOCK achieves a repair accuracy of 73.55%, significantly surpassing state-of-the-art tools and baselines.