LSPRAG: LSP-Guided RAG for Language-Agnostic Real-Time Unit Test Generation

📅 2025-10-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current automated unit test generation approaches generalize poorly across programming languages and are hard to integrate into an IDE in real time; LLM-based methods additionally depend on the quality of the context supplied in the prompt. To address these limitations, the paper proposes LSPRAG, an LSP-guided Retrieval-Augmented Generation (RAG) framework that uses the Language Server Protocol (LSP) to retrieve precise symbol definitions and references during development, avoiding expensive, language-specific static analysis pipelines. This enables language-aware context retrieval and test generation directly within the editing environment, with minimal per-language engineering. The framework is evaluated on open-source projects in Java, Go, and Python. Compared with the best-performing baseline, line coverage increases by up to 213.31% (Java), 174.55% (Go), and 31.57% (Python).
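The coverage figures above exceed 100%, so they can only be read as relative increases over the baseline's coverage, not as added percentage points. A minimal sketch of that arithmetic follows; the 10% baseline is a hypothetical value chosen for illustration and is not a number reported in the paper.

```python
def relative_increase_pct(baseline: float, improved: float) -> float:
    """Relative increase of `improved` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline coverage must be positive")
    return (improved - baseline) / baseline * 100.0


def apply_relative_increase(baseline: float, increase_pct: float) -> float:
    """Coverage implied by a baseline and a relative increase in percent."""
    return baseline * (1.0 + increase_pct / 100.0)


# Hypothetical baseline line coverage of 10% (NOT taken from the paper).
hypothetical_baseline = 10.0

# A 213.31% relative increase multiplies the baseline by 3.1331.
implied = apply_relative_increase(hypothetical_baseline, 213.31)
print(f"{hypothetical_baseline:.2f}% -> {implied:.2f}%")
```

Under this assumed baseline, a 213.31% relative increase corresponds to roughly 31.33% line coverage, that is, a little more than triple the starting value.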

📝 Abstract
Automated unit test generation is essential for robust software development, yet existing approaches struggle to generalize across multiple programming languages and to operate in real time during development. While Large Language Models (LLMs) offer a promising solution, their ability to generate high-coverage test code depends on being prompted with a concise context for the focal method. Current solutions, such as Retrieval-Augmented Generation, either rely on imprecise similarity-based searches or demand the creation of costly, language-specific static analysis pipelines. To address this gap, we present LSPRAG, a framework for concise-context retrieval tailored for real-time, language-agnostic unit test generation. LSPRAG leverages off-the-shelf Language Server Protocol (LSP) back-ends to supply LLMs with precise symbol definitions and references in real time. By reusing mature LSP servers, LSPRAG provides an LLM with language-aware context retrieval, requiring minimal per-language engineering effort. We evaluated LSPRAG on open-source projects spanning Java, Go, and Python. Compared to the best performance of baselines, LSPRAG increased line coverage by up to 174.55% for Go, 213.31% for Java, and 31.57% for Python.
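To make the retrieval step concrete, the sketch below builds the two standard LSP requests that return symbol definitions and references, `textDocument/definition` and `textDocument/references`, and frames them with the `Content-Length` header used by the LSP base protocol. It only illustrates the protocol messages; how LSPRAG itself issues or orders these requests is not described here, and the helper names and the file URI are hypothetical.

```python
import json


def frame(payload: dict) -> bytes:
    """Encode a JSON-RPC payload with the LSP Content-Length header."""
    body = json.dumps(payload).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def definition_request(request_id: int, uri: str, line: int, character: int) -> dict:
    """Build a textDocument/definition request (positions are zero-based)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "textDocument/definition",
        "params": {
            "textDocument": {"uri": uri},
            "position": {"line": line, "character": character},
        },
    }


def references_request(request_id: int, uri: str, line: int, character: int,
                       include_declaration: bool = False) -> dict:
    """Build a textDocument/references request for the symbol at a position."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "textDocument/references",
        "params": {
            "textDocument": {"uri": uri},
            "position": {"line": line, "character": character},
            "context": {"includeDeclaration": include_declaration},
        },
    }


# Hypothetical file URI and cursor position, for illustration only.
uri = "file:///workspace/src/calc.py"
print(frame(definition_request(1, uri, line=12, character=8)).decode("utf-8"))
print(frame(references_request(2, uri, line=12, character=8)).decode("utf-8"))
```

Because these requests are defined by the protocol rather than by any one language, the same two message shapes can be sent to a Java, Go, or Python language server, which is the property the abstract relies on for low per-language effort.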

Problem

Research questions and friction points this paper is trying to address.

Generating language-agnostic real-time unit tests
Overcoming imprecise context retrieval in test generation
Reducing per-language engineering for automated testing

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Language Server Protocol for context retrieval
Provides language-agnostic unit test generation
Reuses mature LSP servers for minimal engineering
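
As an illustration of how retrieved definitions might be turned into a concise prompt, the sketch below deduplicates symbol definitions and packs them under a character budget before appending them to the focal method. This is an assumed design for illustration, not LSPRAG's actual prompt format; `SymbolContext`, `build_prompt`, and `max_chars` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SymbolContext:
    name: str        # symbol used inside the focal method
    definition: str  # source text of its definition, e.g. from an LSP lookup


def build_prompt(focal_method: str, symbols: list[SymbolContext],
                 language: str, max_chars: int = 4000) -> str:
    """Assemble a test-generation prompt from a focal method and its symbols."""
    seen: set[str] = set()
    sections: list[str] = []
    used = 0
    for sym in symbols:
        if sym.name in seen:
            continue  # keep only the first definition of each symbol
        seen.add(sym.name)
        block = f"# {sym.name}\n{sym.definition.strip()}"
        if used + len(block) > max_chars:
            break  # stop once the context budget would be exceeded
        sections.append(block)
        used += len(block)
    context = "\n\n".join(sections)
    return (
        f"Write {language} unit tests for the focal method below.\n\n"
        f"Focal method:\n{focal_method.strip()}\n\n"
        f"Definitions of symbols it uses:\n{context}\n"
    )


# Hypothetical focal method and retrieved definition.
prompt = build_prompt(
    "def total(xs):\n    return sum(add(x, 0) for x in xs)",
    [SymbolContext("add", "def add(a, b):\n    return a + b")],
    language="Python",
)
print(prompt)
```

The budget keeps the context short, in line with the emphasis on concise context; a real system would more likely count tokens than characters.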

Gwihwan Go
Tsinghua University
Quan Zhang
East China Normal University
Chijin Zhou
East China Normal University
System Security, Software Engineering, Program Analysis
Zhao Wei
Tencent
Yu Jiang
Tsinghua University