Towards the Automated Extraction and Refactoring of NoSQL Schemas from Application Code

📅 2025-05-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
The absence of explicit logical schemas in NoSQL applications hinders maintenance and query optimization. Method: This paper proposes a model-driven reverse engineering approach based on static code analysis. It defines an object-oriented language metamodel and a unified schema metamodel (uSchema), and integrates control-flow-driven data access modeling with structural inference to automatically extract logical schemas for NoSQL applications. The approach further supports join-pattern identification and automated refactoring of field redundancy. Contribution/Results: To our knowledge, this is the first domain-specific reverse engineering pipeline for NoSQL, enabling platform-independent model transformations. End-to-end evaluation on MongoDB applications demonstrates high schema extraction accuracy, generation of executable refactored code, elimination of expensive join operations, and significant improvements in both query performance and maintainability.

📝 Abstract
In this paper, we present a static code analysis strategy to extract logical schemas from NoSQL applications. Our solution is based on a model-driven reverse engineering process composed of a chain of platform-independent model transformations. The extracted schema conforms to the uSchema unified metamodel, which can represent both NoSQL and relational schemas. To support this process, we define a metamodel capable of representing the core elements of object-oriented languages. Application code is first injected into a code model, from which a control flow model is derived. This, in turn, enables the generation of a model representing both data access operations and the structure of stored data. From these models, the uSchema logical schema is inferred. Additionally, the extracted information can be used to identify refactoring opportunities. We illustrate this capability through the detection of join-like query patterns and the automated application of field duplication strategies to eliminate expensive joins. All stages of the process are described in detail, and the approach is validated through a round-trip experiment in which an application using a MongoDB store is automatically generated from a predefined schema. The inferred schema is then compared to the original to assess the accuracy of the extraction process.
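The final step of the chain described in the abstract, going from a model of data access operations to a logical schema, can be pictured with a small sketch. This is not the authors' transformation or the uSchema metamodel: the `AccessOp` record, the `infer_schema` function, and the sample collections are hypothetical names chosen for illustration, and the static analysis that would produce the operations is assumed to have already run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessOp:
    """One data access operation found by a (hypothetical) static analysis."""
    collection: str   # name of the store collection the code touches
    kind: str         # "insert", "find", or "update"
    fields: tuple     # ((field_name, type_name), ...) used by the operation


def infer_schema(ops):
    """Union the fields seen per collection into one entity type each."""
    schema = {}
    for op in ops:
        entity = schema.setdefault(op.collection, {})
        for name, type_name in op.fields:
            # Keep every type observed for a field; more than one hints at variation.
            entity.setdefault(name, set()).add(type_name)
    return schema


# Illustrative input: operations as they might be recovered from application code.
ops = [
    AccessOp("users", "insert", (("_id", "ObjectId"), ("name", "String"))),
    AccessOp("users", "update", (("email", "String"),)),
    AccessOp("orders", "insert",
             (("_id", "ObjectId"), ("user_id", "ObjectId"), ("total", "Number"))),
]
schema = infer_schema(ops)
```

Here each collection becomes one entity type whose attributes are the union of all fields the code reads or writes. A real pipeline would also have to recover references, aggregates, and structural variations, which this sketch leaves out.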
Problem

Research questions and friction points this paper is trying to address.

Automated extraction of NoSQL schemas from application code
Model-driven reverse engineering for schema refactoring
Detection and elimination of expensive join operations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Static code analysis extracts NoSQL schemas
Model-driven reverse engineering with transformations
Automated refactoring detects join-like patterns
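The join-like pattern mentioned above can be illustrated as a read whose filter is fed by the result of an earlier read. The sketch below is an assumption-laden toy, not the paper's detection algorithm: `Find`, `detect_joins`, and `suggest_duplication` are hypothetical names, and the reads are assumed to be listed in program order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Find:
    """A read operation in program order (hypothetical model element)."""
    collection: str
    result_var: str      # variable that receives the fetched document
    filter_source: str   # "var.field" feeding the filter, or "" if none
    fields_read: tuple   # fields of the result that the code uses afterwards


def detect_joins(finds):
    """Return (source, target, key, fields) for each read filtered by a prior result."""
    by_var = {}
    joins = []
    for f in finds:
        if "." in f.filter_source:
            var, key = f.filter_source.split(".", 1)
            if var in by_var:
                joins.append((by_var[var].collection, f.collection, key, f.fields_read))
        by_var[f.result_var] = f
    return joins


def suggest_duplication(joins):
    """Propose copying the fields read from the target into the source collection."""
    return [f"copy {tgt}.{fld} into {src}"
            for src, tgt, _key, flds in joins
            for fld in flds]


# Illustrative input: fetch an order, then fetch its user only to read the name.
finds = [
    Find("orders", "order", "", ("user_id", "total")),
    Find("users", "user", "order.user_id", ("name",)),
]
joins = detect_joins(finds)
suggestions = suggest_duplication(joins)
```

With this input the sketch reports one join from `orders` to `users` through `user_id` and suggests duplicating `users.name` into `orders`, so the second read could be dropped. Keeping the duplicated field consistent on updates is the cost of that trade, and it is not modeled here.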
Carlos J. Fernández-Candel
Faculty of Computer Science, University of Murcia, Murcia, Spain
Anthony Cleve
Professor, Namur Digital Institute (NADI), University of Namur
information systems, database engineering, software engineering, software evolution
Jesús J. García-Molina
Faculty of Computer Science, University of Murcia, Murcia, Spain