Enhancing Project-Specific Code Completion by Inferring Internal API Information

2025-07-28
AI Summary
This work addresses the accuracy degradation in project-specific code completion caused by missing internal API information due to absent explicit import statements. We propose an import-agnostic method to automatically infer intra-project APIs. Our approach is built upon a retrieval-augmented generation (RAG) framework that integrates large language models with context-aware modeling. Key contributions include: (1) a fine-grained API knowledge base jointly constructed from usage examples and semantic descriptions; (2) a project-level, context-aware API representation expansion mechanism; and (3) ProjBench, the first large-scale, realistic benchmark designed to eliminate import leakage bias. Experiments on ProjBench and CrossCodeEval show absolute improvements of +22.72% in code exact match and +18.31% in identifier exact match. When integrated with baseline models, gains reach +47.80% and +35.55%, respectively.

Abstract
Project-specific code completion is a critical task that leverages context from a project to generate accurate code. State-of-the-art methods use retrieval-augmented generation (RAG) with large language models (LLMs) and project information for code completion. However, they often struggle to incorporate internal API information, which is crucial for accuracy, especially when APIs are not explicitly imported in the file. To address this, we propose a method to infer internal API information without relying on imports. Our method extends the representation of APIs by constructing usage examples and semantic descriptions, building a knowledge base for LLMs to generate relevant completions. We also introduce ProjBench, a benchmark that avoids leaked imports and consists of large-scale real-world projects. Experiments on ProjBench and CrossCodeEval show that our approach significantly outperforms existing methods, improving code exact match by 22.72% and identifier exact match by 18.31%. Additionally, integrating our method with existing baselines boosts code match by 47.80% and identifier match by 35.55%.
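
The idea of representing each internal API by usage examples and a semantic description, then retrieving from that knowledge base without looking at imports, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `ApiEntry` schema, the sample APIs, and the prompt layout are hypothetical, and plain token-overlap (Jaccard) ranking stands in for whatever retriever the paper actually uses.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ApiEntry:
    """One internal API plus the text that extends its representation (hypothetical schema)."""
    name: str
    signature: str
    description: str
    usage_examples: list[str] = field(default_factory=list)

    def text(self) -> str:
        # Concatenate every field so retrieval can match on any of them.
        return " ".join([self.name, self.signature, self.description, *self.usage_examples])


def tokenize(text: str) -> set[str]:
    # Lowercase alphanumeric tokens; underscores act as separators.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(context: str, kb: list[ApiEntry], k: int = 2) -> list[ApiEntry]:
    """Rank APIs by lexical overlap with the unfinished code; no import statements are consulted."""
    query = tokenize(context)
    return sorted(kb, key=lambda e: jaccard(query, tokenize(e.text())), reverse=True)[:k]


def build_prompt(unfinished_code: str, apis: list[ApiEntry]) -> str:
    """Prepend the retrieved API information as comments (assumed prompt layout)."""
    hints = "\n".join(f"# {e.signature}: {e.description}" for e in apis)
    return f"# Possibly relevant internal APIs:\n{hints}\n{unfinished_code}"


# Hypothetical knowledge base for a toy project.
kb = [
    ApiEntry(
        "load_user_config",
        "load_user_config(path: str) -> dict",
        "Read the user config file and return its settings.",
        ["cfg = load_user_config('app.toml')"],
    ),
    ApiEntry(
        "render_chart",
        "render_chart(points: list) -> bytes",
        "Draw a line chart as PNG bytes.",
        ["png = render_chart(points)"],
    ),
]

context = "def start(path):\n    # read settings from the user config\n    cfg = "
top = retrieve(context, kb, k=1)
prompt = build_prompt(context, top)
print(prompt)
```

In this toy setting the unfinished file never imports `load_user_config`, yet the description and usage example give the retriever enough text to surface it. The resulting prompt would then be passed to an LLM for completion.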
Problem

Research questions and friction points this paper is trying to address.

Improving code completion by inferring internal API information
Enhancing API representation with usage examples and descriptions
Addressing accuracy issues in project-specific code completion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Infer internal API information without imports
Construct usage examples and semantic descriptions
Introduce ProjBench benchmark for evaluation
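
The two metrics reported above, code exact match and identifier exact match, can be sketched as follows. This is an assumed reading of the metrics, not the benchmark's official scorer: code exact match is taken here as equality after whitespace tokenization, and identifier exact match as equality of the ordered identifier sequences, extracted with a simple regex that skips Python keywords. The function names and sample pairs are hypothetical.

```python
import keyword
import re


def code_exact_match(pred: str, ref: str) -> bool:
    # Assumption: compare after splitting on whitespace, so only spacing between tokens is ignored.
    return pred.split() == ref.split()


def identifiers(code: str) -> list[str]:
    # Rough identifier extraction: name-like tokens that are not Python keywords.
    # Words inside string literals and comments are also picked up; this is a sketch.
    return [t for t in re.findall(r"[A-Za-z_]\w*", code) if not keyword.iskeyword(t)]


def identifier_exact_match(pred: str, ref: str) -> bool:
    return identifiers(pred) == identifiers(ref)


def score(pairs: list[tuple[str, str]]) -> dict[str, float]:
    """Return both metrics as percentages over (prediction, reference) pairs."""
    n = len(pairs)
    return {
        "code_em": 100 * sum(code_exact_match(p, r) for p, r in pairs) / n,
        "id_em": 100 * sum(identifier_exact_match(p, r) for p, r in pairs) / n,
    }


# Hypothetical predictions against the same reference line.
reference = "cfg = load_user_config(path)"
pairs = [
    ("cfg = load_user_config(path)", reference),    # identical
    ("cfg = load_user_config( path )", reference),  # same identifiers, different spacing
    ("cfg = read_config(path)", reference),         # wrong internal API
]
result = score(pairs)
print(result)
```

The second pair shows why the two numbers can diverge: the prediction names the right internal API, so the identifier sequence matches, but the stricter code-level comparison fails.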
Authors
Le Deng
State Key Laboratory of Blockchain and Data Security, Zhejiang University, Hangzhou, 310027, China
Xiaoxue Ren
Zhejiang University
Software Engineering
Chao Ni
Zhejiang University
AI4SE, Software Analytics, Software Maintenance
Ming Liang
Ant Group, China
David Lo
School of Computing and Information Systems, Singapore Management University, Singapore 188065
Zhongxin Liu
Zhejiang University
Software Engineering, Large Language Models