FastFI: Enhancing API Call-Site Robustness in Microservice-Based Systems with Fault Injection

📅 2026-01-21
🤖 AI Summary
This work addresses the exponential growth of combinatorial failure space in microservice architectures, where traditional random fault injection is inefficient and lacks precise identification of critical failure modes or actionable hardening guidance. To overcome these limitations, the authors propose FastFI, a novel framework that integrates a customized SAT solver leveraging the monotonicity and low overlap properties of CNF formulas via depth-first search, dynamic fault injection, and microservice call-chain analysis. FastFI efficiently enumerates all valid combinatorial faults and provides actionable hardening recommendations based on API criticality assessment. Evaluated on four microservice benchmarks, FastFI reduces end-to-end fault injection time by 76.12% on average, accurately identifies high-impact APIs, and incurs manageable resource overhead.

📝 Abstract
Fault injection is a key technique for assessing software reliability, enabling proactive detection of system defects before they manifest in production. However, the increasing complexity of microservice architectures leads to exponential growth in the fault-injection space, rendering traditional random injection inefficient. Recent lineage-driven approaches mitigate this problem through heuristic pruning, but they face two limitations. First, combinatorial-fault discovery remains bottlenecked by general-purpose SAT solvers, which fail to exploit the monotone and low-overlap structure of derived CNF formulas and typically rely on a static upper bound on fault size. Second, existing techniques provide limited post-injection guidance beyond reporting detected faults. To address these challenges, we propose FastFI, a fault-injection-guided framework to enhance the robustness of API call sites in microservice-based systems. FastFI features a DFS-based solver with dynamic fault injection to discover all valid combinatorial faults, and it leverages fault-injection results to identify critical APIs whose call sites should be hardened for robustness. Experiments on four representative microservice benchmarks show that FastFI reduces end-to-end fault-injection time by an average of 76.12% compared to state-of-the-art baselines while maintaining acceptable resource overhead. Moreover, FastFI accurately identifies high-impact APIs and provides actionable guidance for call-site hardening.
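
To make the idea of solving a monotone CNF formula by depth-first search more concrete, the following is a minimal sketch and not FastFI's actual solver, whose details are not given here. It assumes each clause is a set of fault points with no negated literals, so a candidate combinatorial fault is a set of fault points that touches every clause, and the minimal candidates are the minimal hitting sets. The function name `minimal_fault_sets` and the API names in the example are hypothetical, and the dynamic fault-injection step is not modeled.

```python
from typing import FrozenSet, Iterable, List, Set


def minimal_fault_sets(clauses: Iterable[Iterable[str]]) -> List[FrozenSet[str]]:
    """Enumerate all minimal hitting sets of a monotone CNF by DFS.

    Assumption (not from the paper): each clause is a set of positive
    literals, one literal per fault point.
    """
    all_clauses = [frozenset(c) for c in clauses]
    found: Set[FrozenSet[str]] = set()

    def is_minimal(chosen: FrozenSet[str]) -> bool:
        # Every chosen fault must be the only chosen fault in some clause;
        # otherwise it could be dropped and the set would still hit all clauses.
        return all(
            any(c & chosen == {fault} for c in all_clauses) for fault in chosen
        )

    def dfs(chosen: FrozenSet[str], remaining: List[FrozenSet[str]]) -> None:
        if not remaining:
            if is_minimal(chosen):
                found.add(chosen)
            return
        # Branch on the smallest clause that is not yet hit.
        clause = min(remaining, key=len)
        for fault in sorted(clause):
            dfs(
                chosen | {fault},
                [c for c in remaining if fault not in c],
            )

    dfs(frozenset(), all_clauses)
    # Sort for a deterministic order: smaller sets first.
    return sorted(found, key=lambda s: (len(s), sorted(s)))


# Hypothetical example: two clauses sharing one fault point.
example = [{"cart.get", "db.read"}, {"db.read", "cache.get"}]
for fault_set in minimal_fault_sets(example):
    print(sorted(fault_set))
```

Because the formula is monotone, adding a fault can never un-satisfy a clause, so the search only ever needs to track which clauses remain unhit. No upper bound on fault-set size is fixed in advance; the recursion stops as soon as every clause is covered.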
Problem

Research questions and friction points this paper is trying to address.

fault injection
microservice systems
combinatorial faults
API call-site robustness
software reliability
Innovation

Methods, ideas, or system contributions that make the work stand out.

fault injection
microservice robustness
combinatorial fault discovery
DFS-based solver
API hardening
Yuzhen Tan
School of Computer Science, Wuhan University, China
Jian Wang
School of Computer Science, Wuhan University
Software engineering, Services computing, Microservice, AI Agent
Shuaiyu Xie
School of Computer Science, Wuhan University, China
Bing Li
School of Computer Science, Wuhan University; Zhongguancun Laboratory, China
Yunqing Yong
School of Computer Science, Wuhan University, China
Neng Zhang
Central China Normal University
Software engineering, Services computing, Knowledge mining
Shaolin Tan
Zhongguancun Laboratory, China