🤖 AI Summary
Test Case Fault Localization (TCFL) remains challenging, especially for non-deterministic failures or systems with high execution overhead, because existing approaches primarily target faults in the System Under Test (SUT) while neglecting defects in the test scripts themselves.
Method: This paper proposes the first black-box TCFL method that requires no test case execution. It leverages Large Language Models (LLMs) for test code fault localization, combining static log parsing, control-flow reconstruction, and a three-stage execution trace pruning algorithm. Given only a single failure log and its error message, and without access to the SUT source code, it estimates a minimized execution trace and ranks suspicious code blocks.
Contribution/Results: Evaluated on an industrial dataset, the method achieves a 90% F1-score on estimated traces and reduces LLM inference time by up to 34%. Its block-level Top-3 localization accuracy (Hit@3) reaches 81%, substantially improving the practicality and efficiency of LLM-based test debugging.
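To make the trace-estimation idea concrete, here is a minimal, hypothetical sketch of one ingredient the summary mentions, static log parsing: matching lines of a failure log back to the logging statements in a test script to recover which statements likely executed. The `log.info(...)` call pattern and `{}` placeholder convention are illustrative assumptions, not the paper's actual algorithm.

```python
import re

def estimate_trace(script_lines, log_lines):
    """Map each log message back to a logging statement in the test
    script, returning the 1-based line numbers of statements that were
    likely executed. A crude stand-in for static log parsing plus
    execution-trace estimation; the log.info() pattern is assumed."""
    # Extract literal message templates from logging calls in the script.
    templates = {}  # statement line number -> compiled regex
    for lineno, line in enumerate(script_lines, start=1):
        m = re.search(r'log\.info\("([^"]*)"\)', line)
        if m:
            # Treat {} placeholders as wildcards when matching log output.
            pattern = re.escape(m.group(1)).replace(r"\{\}", ".*")
            templates[lineno] = re.compile(f"^{pattern}$")

    trace = []
    for msg in log_lines:
        for lineno, rx in templates.items():
            if rx.match(msg):
                trace.append(lineno)
                break
    return trace
```

In the paper's setting this estimated trace is then pruned so that only statements plausibly involved in the failure remain before any LLM is consulted.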
📝 Abstract
Fault localization (FL) is a critical step in debugging, which typically relies on repeated executions to pinpoint faulty code regions. However, repeated executions can be impractical in the presence of non-deterministic failures or high execution costs. While recent efforts have leveraged Large Language Models (LLMs) to aid execution-free FL, these have primarily focused on identifying faults in the system under test (SUT) rather than in the often complex system test code. Yet the latter is also important as, in practice, many failures are triggered by faulty test code. To overcome these challenges, we introduce a fully static, LLM-driven approach for system test code fault localization (TCFL) that does not require executing the test case. Our method uses a single failure execution log to estimate the test's execution trace through three novel algorithms that identify only code statements likely involved in the failure. This pruned trace, combined with the error message, is used to prompt the LLM to rank potential faulty locations. Our black-box, system-level approach requires no access to the SUT source code and is applicable to large test scripts that assess full system behavior. We evaluate our technique at function, block, and line levels using an industrial dataset of faulty test cases not previously used in pre-training LLMs. Results show that our best estimated traces closely match actual traces, with an F1 score of around 90%. Additionally, pruning the complex system test code reduces the LLM's inference time by up to 34% without any loss in FL performance. Our results further suggest that block-level TCFL offers a practical balance, narrowing the search space while preserving useful context, achieving an 81% hit rate at top-3 (Hit@3).
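The final step the abstract describes, prompting an LLM with the pruned trace plus the error message and asking it to rank suspicious locations, can be sketched as follows. The prompt wording, block labels, and `build_fl_prompt` helper are assumptions for illustration; the paper's exact prompt format is not given here.

```python
def build_fl_prompt(pruned_trace, error_message, top_k=3):
    """Assemble a fault-localization prompt from a pruned execution
    trace and an error message. pruned_trace is a list of
    (block_id, code) pairs; the prompt layout is an assumption,
    not the paper's actual template."""
    trace_text = "\n".join(f"[{blk}] {code}" for blk, code in pruned_trace)
    return (
        "The following system test failed. Only statements likely "
        "involved in the failure are shown.\n\n"
        f"Pruned execution trace:\n{trace_text}\n\n"
        f"Error message:\n{error_message}\n\n"
        f"Rank the {top_k} most suspicious blocks, most suspicious first."
    )
```

The resulting string would be sent to an LLM, whose ranked answer is then scored against the known faulty block (e.g., Hit@3 counts a success when the true block appears among the top three).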