🤖 AI Summary
Verifying functional equivalence in automated COBOL-to-Java translation is difficult, and manual validation is costly and unreliable. To address this, the paper proposes the first cross-language semantic equivalence verification framework that integrates symbolic execution with external dependency simulation. The framework models COBOL programs, uses AST-driven test generation to convert COBOL unit tests into JUnit tests automatically, and relies on mocking to handle external calls. It also introduces an LLM-based feedback refinement loop that iteratively improves the underlying translation model. Experimental results show that the method automatically detects and localizes semantic discrepancies, increasing the trustworthiness of AI-generated Java code. The framework has been deployed in IBM Watsonx Code Assistant for Z (WCA4Z) to support enterprise-scale modernization of legacy COBOL systems on IBM Z platforms.
📝 Abstract
Recent advances in Large Language Model (LLM)-based Generative AI techniques have made it feasible to translate enterprise-level code from legacy languages such as COBOL to modern languages such as Java or Python. While the results of LLM-based automatic transformation are encouraging, the resulting code cannot be trusted to preserve the behavior of the original code, making manual validation of Java code translated from COBOL a necessary but time-consuming and labor-intensive process. In this paper, we share our experience of developing a testing framework for IBM Watsonx Code Assistant for Z (WCA4Z) [5], an industrial tool designed for COBOL-to-Java translation. The framework automates testing the functional equivalence of the translated Java code against the original COBOL programs in an industry context. It uses symbolic execution to generate unit tests for COBOL, mocks external calls, and transforms the generated tests into JUnit tests that validate semantic equivalence with the translated Java. The results not only help identify and repair detected discrepancies but also provide feedback to improve the AI model.
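The replay step described in the abstract can be illustrated with a minimal sketch in plain Java. This is not the WCA4Z implementation: the class `EquivalenceSketch`, the `RateService` interface, the `computeTotal` method, and the test values are all hypothetical, and a hand-written lambda stands in for a mocking library. The sketch assumes each COBOL unit test has been reduced to its inputs, the value returned by the mocked external call, and the output the COBOL program produced.

```java
import java.util.ArrayList;
import java.util.List;

public class EquivalenceSketch {

    /** Hypothetical external dependency, e.g. a called subprogram or a database lookup. */
    public interface RateService {
        int lookupRate(String customerType);
    }

    /** One COBOL unit test: inputs, the mocked external result, and the output COBOL produced. */
    public static final class CobolTestCase {
        final String customerType;
        final int amount;
        final int mockedRate;
        final int expectedTotal;

        public CobolTestCase(String customerType, int amount, int mockedRate, int expectedTotal) {
            this.customerType = customerType;
            this.amount = amount;
            this.mockedRate = mockedRate;
            this.expectedTotal = expectedTotal;
        }
    }

    /** Hypothetical translated Java code under test; integer division truncates. */
    public static int computeTotal(String customerType, int amount, RateService service) {
        int rate = service.lookupRate(customerType);
        return amount + (amount * rate) / 100;
    }

    /** Replays each recorded case against the Java code and describes every mismatch. */
    public static List<String> findDiscrepancies(List<CobolTestCase> cases) {
        List<String> mismatches = new ArrayList<>();
        for (CobolTestCase tc : cases) {
            // The mock returns exactly what the external call returned in the COBOL run.
            RateService mock = type -> tc.mockedRate;
            int actual = computeTotal(tc.customerType, tc.amount, mock);
            if (actual != tc.expectedTotal) {
                mismatches.add(tc.customerType + ": COBOL=" + tc.expectedTotal + ", Java=" + actual);
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        List<CobolTestCase> cases = List.of(
            new CobolTestCase("GOLD", 200, 10, 220),
            // Assumed scenario: the COBOL side rounded 9.95 up to 10, the Java side truncates to 9.
            new CobolTestCase("BASIC", 199, 5, 209));
        findDiscrepancies(cases).forEach(System.out::println);
    }
}
```

In this sketch the second case is reported as a discrepancy because the assumed COBOL output reflects rounding while the Java code truncates. Fixing the external call's return value per test case is what lets the comparison isolate the translated logic from its environment.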