Enhancing Project-Specific Code Completion by Inferring Internal API Information

📅 2025-07-28

🤖 AI Summary

This work addresses the drop in accuracy of project-specific code completion when internal API information is missing, which happens when the file has no explicit import statements for the APIs it needs. We propose an import-agnostic method that automatically infers intra-project APIs. The approach builds on a retrieval-augmented generation (RAG) framework that combines large language models with project-level context. Key contributions include: (1) a fine-grained API knowledge base built from both usage examples and semantic descriptions; (2) a project-level, context-aware mechanism for expanding API representations; and (3) ProjBench, a large-scale benchmark of real-world projects designed to eliminate the bias introduced by leaked imports. Experiments on ProjBench and CrossCodeEval show improvements over existing methods of 22.72% in code exact match and 18.31% in identifier exact match. When the method is integrated with existing baselines, the gains reach 47.80% and 35.55%, respectively.
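
The knowledge-base idea can be illustrated with a small sketch. In this design, each internal API is stored together with a usage example and a semantic description, and the entries most relevant to the unfinished code are placed in the LLM prompt. The sketch makes simplifying assumptions. Retrieval uses plain token overlap (Jaccard similarity) rather than whatever retriever the paper uses. ApiEntry, retrieve and build_prompt are hypothetical names and are not the authors' code.

```python
import re
from dataclasses import dataclass


@dataclass
class ApiEntry:
    """One knowledge-base record for an internal (project-defined) API (hypothetical schema)."""
    name: str           # fully qualified name, e.g. "utils.config.load_config"
    signature: str      # declaration as written in the project
    usage_example: str  # a call site showing how the API is used
    description: str    # natural-language summary of what the API does


def tokenize(text: str) -> set[str]:
    """Lower-cased word pieces; splits snake_case, camelCase and dotted names."""
    pieces = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|[0-9]+", text)
    return {p.lower() for p in pieces}


def score(query: set[str], entry: ApiEntry) -> float:
    """Jaccard overlap between the completion context and the whole entry."""
    doc = tokenize(" ".join([entry.name, entry.signature,
                             entry.usage_example, entry.description]))
    union = query | doc
    return len(query & doc) / len(union) if union else 0.0


def retrieve(kb: list[ApiEntry], context: str, k: int = 2) -> list[ApiEntry]:
    """Top-k entries by score; ties are broken by name so results are deterministic."""
    query = tokenize(context)
    ranked = sorted(kb, key=lambda e: (-score(query, e), e.name))
    return ranked[:k]


def build_prompt(kb: list[ApiEntry], context: str, k: int = 2) -> str:
    """Prepend the retrieved API knowledge as comments before the unfinished code."""
    lines = []
    for e in retrieve(kb, context, k):
        lines.append(f"# API: {e.signature}")
        lines.append(f"# Purpose: {e.description}")
        lines.append(f"# Example: {e.usage_example}")
    return "\n".join(lines + [context])
```

Jaccard overlap on word pieces keeps the sketch dependency-free. An embedding or BM25 retriever would be a natural substitute.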

📝 Abstract

Project-specific code completion is a critical task that leverages context from a project to generate accurate code. State-of-the-art methods use retrieval-augmented generation (RAG) with large language models (LLMs) and project information for code completion. However, they often struggle to incorporate internal API information, which is crucial for accuracy, especially when APIs are not explicitly imported in the file. To address this, we propose a method to infer internal API information without relying on imports. Our method extends the representation of APIs by constructing usage examples and semantic descriptions, building a knowledge base for LLMs to generate relevant completions. We also introduce ProjBench, a benchmark that avoids leaked imports and consists of large-scale real-world projects. Experiments on ProjBench and CrossCodeEval show that our approach significantly outperforms existing methods, improving code exact match by 22.72% and identifier exact match by 18.31%. Additionally, integrating our method with existing baselines boosts code match by 47.80% and identifier match by 35.55%.
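
The gains are reported in code exact match and identifier exact match. The following simplified sketch shows what each metric compares. Code exact match is taken here as string equality after whitespace normalization. Identifier exact match is taken as equality of the ordered sequence of non-keyword identifiers. This is an approximation, and the function names are hypothetical. The benchmarks' own evaluation scripts may extract identifiers and normalize code differently. This regex version also does not skip string or numeric literals.

```python
import keyword
import re

# Rough identifier pattern; a real evaluator would typically use a parser.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def normalize(code: str) -> str:
    """Collapse runs of whitespace so that spacing differences do not count."""
    return " ".join(code.split())


def code_exact_match(pred: str, ref: str) -> bool:
    return normalize(pred) == normalize(ref)


def identifiers(code: str) -> list[str]:
    """Ordered identifiers with Python keywords removed.
    Simplification: names inside string literals are not filtered out."""
    return [t for t in IDENTIFIER.findall(code) if not keyword.iskeyword(t)]


def identifier_exact_match(pred: str, ref: str) -> bool:
    return identifiers(pred) == identifiers(ref)


def corpus_scores(pairs: list[tuple[str, str]]) -> tuple[float, float]:
    """Percentage of (prediction, reference) pairs that match exactly, per metric."""
    n = len(pairs)
    code_em = 100 * sum(code_exact_match(p, r) for p, r in pairs) / n
    id_em = 100 * sum(identifier_exact_match(p, r) for p, r in pairs) / n
    return code_em, id_em
```

A prediction such as "cfg  = load_config( path )" fails code exact match against "cfg = load_config(path)" because of the inner spaces. It still passes identifier exact match, since both contain the identifiers cfg, load_config and path in the same order.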

Problem

Research questions and friction points this paper is trying to address.

Improving code completion by inferring internal API information

Enhancing API representation with usage examples and descriptions

Addressing accuracy issues in project-specific code completion

Innovation

Methods, ideas, or system contributions that make the work stand out.

Infer internal API information without imports

Construct usage examples and semantic descriptions

Introduce ProjBench benchmark for evaluation
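
ProjBench is described as avoiding leaked imports. We read these as cases where the file prefix already imports the internal API that the model must complete. The paper's exact filtering procedure is not given here. The sketch shows one plausible check, assuming a sample leaks when the prefix imports the target API or any of its parent modules. It handles only single-line import statements, and imported_paths and import_leaks are hypothetical helpers.

```python
import re

# Matches single-line "import a.b [as c], d" and "from a.b import x [as y], z".
# Assumption: multi-line parenthesised imports and relative imports are not resolved.
IMPORT_RE = re.compile(r"^\s*(?:from\s+([\w.]+)\s+)?import\s+(.+?)\s*$")


def imported_paths(prefix: str) -> set[str]:
    """Dotted paths brought in by import statements in the code prefix."""
    paths = set()
    for line in prefix.splitlines():
        m = IMPORT_RE.match(line)
        if m is None:
            continue
        module, targets = m.group(1), m.group(2)
        for target in targets.strip("()").split(","):
            words = target.split()
            if not words:
                continue
            name = words[0]  # drop any "as alias" suffix
            paths.add(f"{module}.{name}" if module else name)
    return paths


def import_leaks(prefix: str, api: str) -> bool:
    """True if the prefix imports the API itself or one of its parent modules."""
    parts = api.split(".")
    candidates = {".".join(parts[:i]) for i in range(1, len(parts) + 1)}
    return not candidates.isdisjoint(imported_paths(prefix))
```

Under this assumption, a benchmark builder would keep only samples for which import_leaks(prefix, target_api) is false. The model then has to infer the internal API from project context rather than read it off the imports.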
