FastFI: Enhancing API Call-Site Robustness in Microservice-Based Systems with Fault Injection

📅 2026-01-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the exponential growth of the combinatorial fault space in microservice architectures, where traditional random fault injection is inefficient and neither pinpoints critical failure modes nor yields actionable hardening guidance. The authors propose FastFI, a framework that combines a customized DFS-based SAT solver, which exploits the monotone and low-overlap structure of the derived CNF formulas, with dynamic fault injection and microservice call-chain analysis. FastFI enumerates all valid combinatorial faults and derives hardening recommendations from an API criticality assessment. Evaluated on four microservice benchmarks, FastFI reduces end-to-end fault-injection time by 76.12% on average, accurately identifies high-impact APIs, and incurs manageable resource overhead.
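To make the solver idea concrete, here is a minimal sketch of depth-first enumeration over a monotone CNF formula. It is not FastFI's solver. It assumes that each clause is the set of calls on one successful execution path, and that a combinatorial fault is a minimal set of calls hitting every clause. The function name `minimal_fault_sets` and the clause encoding are hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, Iterable, List, Set


def minimal_fault_sets(clauses: Iterable[Iterable[str]]) -> List[List[str]]:
    """Enumerate all minimal hitting sets of a monotone CNF by DFS.

    Assumption (not from the paper): every clause contains only positive
    literals, so a solution is a set of variables that intersects each clause.
    """
    cnf: List[FrozenSet[str]] = [frozenset(c) for c in clauses]
    found: Set[FrozenSet[str]] = set()
    seen: Set[FrozenSet[str]] = set()  # partial sets already expanded

    def is_minimal(chosen: FrozenSet[str]) -> bool:
        # Each chosen variable must be the only hit in at least one clause;
        # otherwise it could be dropped and the set would still be a solution.
        return all(any(c & chosen == {v} for c in cnf) for v in chosen)

    def dfs(chosen: FrozenSet[str]) -> None:
        if chosen in seen:
            return
        seen.add(chosen)
        # Pick the first clause that no chosen variable hits yet.
        unhit = next((c for c in cnf if not (c & chosen)), None)
        if unhit is None:
            if is_minimal(chosen):
                found.add(chosen)
            return
        # Branch on every variable that could hit the open clause.
        for var in sorted(unhit):
            dfs(chosen | {var})

    dfs(frozenset())
    # Smallest fault sets first, then alphabetical, for stable output.
    return sorted((sorted(s) for s in found), key=lambda s: (len(s), s))


if __name__ == "__main__":
    # Two successful paths: one through A and B, one through B and C.
    print(minimal_fault_sets([{"A", "B"}, {"B", "C"}]))
```

Because the search only terminates a branch once every clause is hit, it needs no fixed upper bound on fault-set size. A production solver would add pruning beyond this sketch.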

📝 Abstract

Fault injection is a key technique for assessing software reliability, enabling proactive detection of system defects before they manifest in production. However, the increasing complexity of microservice architectures leads to exponential growth in the fault-injection space, rendering traditional random injection inefficient. Recent lineage-driven approaches mitigate this problem through heuristic pruning, but they face two limitations. First, combinatorial-fault discovery remains bottlenecked by general-purpose SAT solvers, which fail to exploit the monotone and low-overlap structure of derived CNF formulas and typically rely on a static upper bound on fault size. Second, existing techniques provide limited post-injection guidance beyond reporting detected faults. To address these challenges, we propose FastFI, a fault-injection-guided framework to enhance the robustness of API call sites in microservice-based systems. FastFI features a DFS-based solver with dynamic fault injection to discover all valid combinatorial faults, and it leverages fault-injection results to identify critical APIs whose call sites should be hardened for robustness. Experiments on four representative microservice benchmarks show that FastFI reduces end-to-end fault-injection time by an average of 76.12% compared to state-of-the-art baselines while maintaining acceptable resource overhead. Moreover, FastFI accurately identifies high-impact APIs and provides actionable guidance for call-site hardening.
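The abstract does not say how fault-injection results are turned into a ranking of critical APIs. As a purely illustrative sketch, one could score each API by how often it appears in the discovered fault sets, weighting smaller sets more heavily because fewer simultaneous failures are needed to trigger them. The scoring rule and the name `rank_apis` are assumptions, not FastFI's criticality metric.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def rank_apis(fault_sets: Iterable[Iterable[str]]) -> List[Tuple[str, float]]:
    """Rank APIs by a hypothetical criticality score.

    Each fault set contributes 1 / len(set) to every API it contains, so an
    API that breaks the system on its own scores higher than one that only
    matters in combination with others.
    """
    scores: Dict[str, float] = defaultdict(float)
    for fault_set in fault_sets:
        members = set(fault_set)
        if not members:
            continue  # an empty fault set names no API to harden
        for api in members:
            scores[api] += 1.0 / len(members)
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical result: B alone is fatal, A and C only together.
    for api, score in rank_apis([["B"], ["A", "C"]]):
        print(f"{api}: {score:.2f}")
```

Under this assumed rule, call sites of the top-ranked APIs would be the first candidates for retries, timeouts, or fallbacks.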

Problem

Research questions and friction points this paper is trying to address.

fault injection

microservice systems

combinatorial faults

API call-site robustness

software reliability

Innovation

Methods, ideas, or system contributions that make the work stand out.

fault injection

microservice robustness

combinatorial fault discovery