🤖 AI Summary
Problem: The absence of explicit logical schemas in NoSQL applications hinders maintenance and query optimization. Method: This paper proposes a model-driven reverse engineering approach based on static code analysis. It defines an object-oriented language metamodel and a unified schema metamodel (uSchema), and integrates control-flow-driven data access modeling with structural inference to automatically extract logical schemas for NoSQL applications. The approach further supports join-pattern identification and automated refactoring of field redundancy. Contribution/Results: To our knowledge, this is the first domain-specific reverse engineering pipeline for NoSQL, enabling platform-independent model transformations. End-to-end evaluation on MongoDB applications demonstrates high schema extraction accuracy, generation of executable refactored code, elimination of expensive join operations, and significant improvements in both query performance and maintainability.
📝 Abstract
In this paper, we present a static code analysis strategy to extract logical schemas from NoSQL applications. Our solution is based on a model-driven reverse engineering process composed of a chain of platform-independent model transformations. The extracted schema conforms to the uSchema unified metamodel, which can represent both NoSQL and relational schemas. To support this process, we define a metamodel capable of representing the core elements of object-oriented languages. Application code is first injected into a code model, from which a control flow model is derived. This, in turn, enables the generation of a model representing both data access operations and the structure of stored data. From these models, the uSchema logical schema is inferred. Additionally, the extracted information can be used to identify refactoring opportunities. We illustrate this capability through the detection of join-like query patterns and the automated application of field duplication strategies to eliminate expensive joins. All stages of the process are described in detail, and the approach is validated through a round-trip experiment in which an application using a MongoDB store is automatically generated from a predefined schema. The inferred schema is then compared to the original to assess the accuracy of the extraction process.
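
The last two stages described in the abstract (inferring a schema from data access operations, then spotting join-like query patterns) can be illustrated with a minimal Python sketch. This is not the paper's implementation. It assumes the data access operations have already been extracted by static analysis and are given as hand-written records, and it returns plain dictionaries rather than uSchema models. The `Access` record, its fields, and both function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Access:
    """One database operation found in application code (hypothetical record)."""
    op: str                                      # "insert" or "find"
    collection: str                              # collection the operation targets
    fields: dict = field(default_factory=dict)   # field name -> type name
    filter_from: Optional[tuple] = None          # (collection, field) feeding the filter
    used: tuple = ()                             # result fields the code reads afterwards


def infer_schema(accesses):
    """Union the fields seen per collection into one entity type each."""
    schema = {}
    for a in accesses:
        schema.setdefault(a.collection, {}).update(a.fields)
    return schema


def find_join_candidates(accesses):
    """Flag finds whose filter value comes from a field of another collection."""
    joins = []
    for a in accesses:
        if a.op == "find" and a.filter_from and a.filter_from[0] != a.collection:
            source, ref_field = a.filter_from
            joins.append({
                "source": source,            # collection read first
                "via": ref_field,            # reference field used as the join key
                "target": a.collection,      # collection read second
                "duplicate": list(a.used),   # fields that could be copied into source
            })
    return joins


# Assumed input: operations an analyzer might report for a small order-handling app.
accesses = [
    Access("insert", "users", {"_id": "ObjectId", "name": "string"}),
    Access("insert", "orders",
           {"_id": "ObjectId", "user_id": "ObjectId", "total": "double"}),
    Access("find", "orders", {"user_id": "ObjectId"}),
    Access("find", "users", {"_id": "ObjectId"},
           filter_from=("orders", "user_id"), used=("name",)),
]

schema = infer_schema(accesses)
joins = find_join_candidates(accesses)
```

In this sketch, the second `find` is reported as join-like because its filter value originates from `orders.user_id`. The `duplicate` list names the `users` fields that a field duplication refactoring could copy into `orders` to avoid the second query.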