A Retrieval-Augmented Generation Approach to Extracting Algorithmic Logic from Neural Networks

📅 2025-12-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of reusing neural network modules across open-source PyTorch codebases, this paper introduces NN-RAG, a retrieval-augmented generation (RAG) system that builds a searchable, executable knowledge base of modules. The method combines scope-aware dependency resolution and import-preserving reconstruction with validator-gated promotion, so that every extracted module is scope-closed, compilable, and runnable, and so that architectural patterns can be migrated from one repository to another. It also applies multi-level de-duplication (exact, lexical, and structural) and supports dataset registration. Applied to 19 major repositories, NN-RAG extracts 1,289 candidate modules, of which 941 (73.0%) pass validation, with over 80% reported as structurally unique; it supplies approximately 72% of the novel network structures in the LEMUR dataset. The authors state that, to their knowledge, no other open-source system offers this kind of cross-repository reuse at scale.

📝 Abstract
Reusing existing neural-network components is central to research efficiency, yet discovering, extracting, and validating such modules across thousands of open-source repositories remains difficult. We introduce NN-RAG, a retrieval-augmented generation system that converts large, heterogeneous PyTorch codebases into a searchable and executable library of validated neural modules. Unlike conventional code search or clone-detection tools, NN-RAG performs scope-aware dependency resolution, import-preserving reconstruction, and validator-gated promotion -- ensuring that every retrieved block is scope-closed, compilable, and runnable. Applied to 19 major repositories, the pipeline extracted 1,289 candidate blocks, validated 941 (73.0%), and demonstrated that over 80% are structurally unique. Through multi-level de-duplication (exact, lexical, structural), we find that NN-RAG contributes the overwhelming majority of unique architectures to the LEMUR dataset, supplying approximately 72% of all novel network structures. Beyond quantity, NN-RAG uniquely enables cross-repository migration of architectural patterns, automatically identifying reusable modules in one project and regenerating them, dependency-complete, in another context. To our knowledge, no other open-source system provides this capability at scale. The framework's neutral specifications further allow optional integration with language models for synthesis or dataset registration without redistributing third-party code. Overall, NN-RAG transforms fragmented vision code into a reproducible, provenance-tracked substrate for algorithmic discovery, offering a first open-source solution that both quantifies and expands the diversity of executable neural architectures across repositories.
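
The abstract names three de-duplication levels (exact, lexical, structural) but does not say how each is computed. The sketch below is one plausible reading, not the authors' implementation: exact compares raw text, lexical compares the token stream with comments and layout ignored, and structural compares the syntax tree with identifiers renamed. The function names (`exact_key`, `lexical_key`, `structural_key`, `duplicate_level`) are hypothetical.

```python
import ast
import hashlib
import io
import tokenize


def _digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def exact_key(source: str) -> str:
    """Exact level (assumed): hash of the raw source text."""
    return _digest(source)


def lexical_key(source: str) -> str:
    """Lexical level (assumed): hash of the tokens, ignoring comments,
    blank lines, and the width of indentation."""
    parts = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.COMMENT, tokenize.NL, tokenize.ENDMARKER):
            continue
        if tok.type in (tokenize.INDENT, tokenize.DEDENT, tokenize.NEWLINE):
            parts.append(tokenize.tok_name[tok.type])  # keep layout kind only
        else:
            parts.append(tok.string)
    return _digest(" ".join(parts))


class _Normalizer(ast.NodeTransformer):
    """Rename variables, arguments, functions, and classes to a placeholder."""

    def visit_Name(self, node):
        node.id = "_"
        return node

    def visit_arg(self, node):
        self.generic_visit(node)
        node.arg = "_"
        return node

    def _visit_named_def(self, node):
        self.generic_visit(node)
        node.name = "_"
        return node

    visit_FunctionDef = _visit_named_def
    visit_AsyncFunctionDef = _visit_named_def
    visit_ClassDef = _visit_named_def


def structural_key(source: str) -> str:
    """Structural level (assumed): hash of the identifier-normalized AST."""
    tree = _Normalizer().visit(ast.parse(source))
    return _digest(ast.dump(tree))


def duplicate_level(a: str, b: str):
    """Return the strictest level at which two blocks coincide, or None."""
    for level, key in (("exact", exact_key),
                       ("lexical", lexical_key),
                       ("structural", structural_key)):
        if key(a) == key(b):
            return level
    return None
```

Attribute names such as `nn.Conv2d` are deliberately left untouched in this sketch, so two blocks that differ only in the layer type they call still count as structurally distinct.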

Problem

Research questions and friction points this paper is trying to address.

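Discovering and extracting reusable neural modules across large, heterogeneous PyTorch codebases is difficult, which limits research efficiency.
Conventional code search and clone-detection tools do not ensure that a retrieved block is scope-closed, compilable, and runnable.
Architectural patterns are hard to migrate between repositories, which limits the diversity of executable neural architectures.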
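
To make the "compilable and runnable" requirement concrete, here is a minimal sketch of a validation gate. It is an illustration under stated assumptions, not the paper's validator: it only checks that a block compiles and that its top-level code executes in a fresh namespace, whereas a real validator for PyTorch modules would presumably also instantiate the module and run a forward pass. The names `validate_block` and `promote` are hypothetical.

```python
def validate_block(source: str, name: str = "<candidate>"):
    """Return (ok, reason). A block passes only if it compiles and its
    top-level code runs in a fresh namespace with nothing pre-imported."""
    try:
        code = compile(source, name, "exec")
    except SyntaxError as exc:
        return False, f"compile error: {exc.msg}"
    namespace = {"__name__": "__candidate__"}
    try:
        # Assumption: candidates are trusted or sandboxed; exec runs their code.
        exec(code, namespace)
    except Exception as exc:
        return False, f"runtime error: {type(exc).__name__}: {exc}"
    return True, "ok"


def promote(candidates: dict):
    """Gate promotion on validation: split candidates into the sources that
    passed and the rejection reasons for those that did not."""
    promoted, rejected = {}, {}
    for key, source in candidates.items():
        ok, reason = validate_block(source)
        if ok:
            promoted[key] = source
        else:
            rejected[key] = reason
    return promoted, rejected
```

A block that refers to a name it never imports (for example, a class deriving from `nn.Module` with no `nn` import) fails here with a `NameError`, which is the simplest observable symptom of a block that is not scope-closed.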

Innovation

Methods, ideas, or system contributions that make the work stand out.

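Retrieval-augmented generation system that turns PyTorch codebases into a searchable, executable library of neural modules
Scope-aware dependency resolution, import-preserving reconstruction, and validator-gated promotion
Cross-repository migration of architectural patterns, with modules regenerated dependency-complete in a new context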