ExploraCoder: Advancing code generation for multiple unseen APIs via planning and chained exploration

📅 2024-12-06
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Large language models (LLMs) struggle to generalize to unseen APIs—such as private or rapidly evolving libraries—because exhaustive retraining is prohibitively costly. This work proposes a training-free, closed-loop planning-and-execution framework: it first decomposes a programming task into subgoals, then generates exploratory API calls via zero-shot reasoning; real-world execution feedback drives iterative exploration and chain-wise aggregation of feedback trajectories, augmented by a self-debugging mechanism that validates and corrects intermediate outputs. The core contribution is the chain-of-API-exploration mechanism, which enables zero-shot generalization to unknown APIs. Evaluated on the Torchdata-Github benchmark and a newly constructed benchmark with more complex API interactions, the method achieves absolute pass@10 improvements of 11.24% over RAG-based baselines and 14.07% over pretrained baselines, significantly enhancing robustness and accuracy in code generation for unseen APIs.

📝 Abstract
Through training on publicly available source code libraries, large language models (LLMs) can invoke multiple encapsulated APIs to solve complex programming problems. However, existing models inherently cannot generalize to APIs that are unseen in their training corpora. As libraries continuously evolve, it becomes impractical to exhaustively retrain LLMs with new API knowledge. This limitation hampers LLMs from solving problems that require newly introduced or privately maintained libraries. Human programmers often explore unfamiliar APIs by writing experimental code before invoking them for a more complex problem. Inspired by this behavior, we propose ExploraCoder, a training-free framework that empowers LLMs to invoke multiple unseen APIs in a code solution by (1) planning a complex problem into several API invocation subtasks, and (2) exploring correct API usage through a novel chain-of-API-exploration. Concretely, ExploraCoder guides the LLM to iteratively generate several experimental API invocations for each simple subtask, where promising execution experiences are exploited by subsequent subtasks. This forms a chained exploration trace that ultimately guides the LLM in generating the final solution. We evaluate ExploraCoder on the Torchdata-Github benchmark as well as a newly constructed benchmark that involves more complex API interactions. Experimental results demonstrate that ExploraCoder significantly improves performance for models lacking prior API knowledge, achieving an absolute increase of 11.24% over naive RAG approaches and 14.07% over pretraining methods in pass@10. Moreover, the integration of a self-debug mechanism further boosts ExploraCoder's performance on more challenging tasks. Comprehensive ablation and case studies provide further insights into the effectiveness of ExploraCoder.
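The workflow the abstract describes — plan a task into subtasks, generate experimental API invocations per subtask, keep the snippets that execute successfully, and carry that experience into later subtasks — can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `plan_subtasks`, `propose_invocation`, and `run_snippet` are hypothetical stand-ins for the LLM planner/generator and the execution sandbox.

```python
# Hedged sketch of a plan-then-explore loop in the spirit of ExploraCoder.
# Every component below is a hypothetical stand-in: a real system would
# prompt an LLM for planning/generation and run code in a sandbox.

def plan_subtasks(task: str) -> list[str]:
    # Stand-in planner: split the task description into subtasks.
    return [s.strip() for s in task.split(";") if s.strip()]

def propose_invocation(subtask: str, trace: list[str]) -> str:
    # Stand-in generator: a real system would condition the LLM on the
    # subtask and the successful experience collected so far (trace).
    return f"result_{len(trace)} = api_call({subtask!r})"

def run_snippet(snippet: str) -> tuple[bool, str]:
    # Stand-in executor: pretend every experimental snippet succeeds.
    return True, f"executed: {snippet}"

def chained_exploration(task: str, max_retries: int = 3) -> list[str]:
    """Explore each subtask in order, carrying successful snippets forward."""
    trace: list[str] = []
    for subtask in plan_subtasks(task):
        for _ in range(max_retries):
            snippet = propose_invocation(subtask, trace)
            ok, _feedback = run_snippet(snippet)
            if ok:
                trace.append(snippet)  # exploit this experience downstream
                break
            # On failure, a self-debug step would revise the snippet here
            # using the execution feedback before retrying.
    return trace
```

The returned trace plays the role of the chained exploration experience that would finally be handed to the LLM to synthesize the full solution.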
Problem

Research questions and friction points this paper is trying to address.

Enabling LLMs to use unseen APIs without retraining
Improving code generation for evolving libraries
Solving programming tasks requiring new private APIs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Planning complex problems into API subtasks
Chained exploration for correct API usage
Training-free framework for unseen APIs
Yunkun Wang (Zhejiang University) · AI4SE, Code Generation
Yue Zhang (Alibaba Group)
Zhen Qin (Zhejiang University)
Chen Zhi (Zhejiang University)
Binhua Li (Alibaba Group)
Fei Huang (Alibaba Group)
Yongbin Li (Alibaba Group)
Shuiguang Deng (Zhejiang University)