🤖 AI Summary
To address the challenges in evaluating LLM-driven COBOL-to-Java code translation quality (model opacity, hard-to-quantify evaluation metrics, and the high cost of manual assessment), this paper proposes a multidimensional automated evaluation framework. The framework integrates static analysis checkers with LLM-as-a-Judge (LaaJ) techniques, embedded in a continuous integration pipeline with automated reporting to enable large-scale, reproducible benchmarking. Its key innovation is the combination of classical program analysis with large language model–based judgment, yielding an interpretable and scalable quality-scoring system. Experimental results show a substantial reduction in reliance on manual review alongside improved reliability and engineering efficiency for IBM watsonx Code Assistant for Z in legacy-system modernization.
📝 Abstract
We present an automated evaluation system for assessing COBOL-to-Java code translation within IBM's watsonx Code Assistant for Z (WCA4Z). The system addresses key challenges in evaluating LLM-based translators, including model opacity and the complexity of translation quality assessment. Our approach combines analytic checkers with LLM-as-a-judge (LaaJ) techniques to deliver scalable, multi-faceted evaluations. The system supports continuous integration workflows, enables large-scale benchmarking, and reduces reliance on manual review. We describe the system architecture, evaluation strategies, and reporting mechanisms that provide actionable insights for developers and project managers, facilitating the evolution of high-quality, modernized codebases.
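The combined scoring approach described above can be sketched in simplified form. This is a minimal illustration, not the actual WCA4Z implementation: the checker logic, the judge stub, the weights, and all function names (`compile_check`, `judge_stub`, `evaluate`) are assumptions made for illustration. In the real system the analytic checkers would run compilers and static analyzers, and the judge would prompt an LLM with a rubric and parse its verdict.

```python
from dataclasses import dataclass

@dataclass
class EvalResult:
    checker_score: float  # 0-1, aggregated from analytic checkers
    judge_score: float    # 0-1, from the LLM-as-a-Judge rubric
    combined: float       # weighted overall quality score

def compile_check(java_source: str) -> float:
    """Toy stand-in for an analytic checker: awards partial credit for
    balanced braces and the presence of a class declaration."""
    score = 0.0
    if java_source.count("{") == java_source.count("}"):
        score += 0.5
    if "class " in java_source:
        score += 0.5
    return score

def judge_stub(cobol_source: str, java_source: str) -> float:
    """Placeholder for an LLM-as-a-Judge call; the real system would
    send both sources to a model and parse a numeric rating."""
    return 0.8  # fixed stand-in value for illustration

def evaluate(cobol_source: str, java_source: str,
             w_checker: float = 0.6, w_judge: float = 0.4) -> EvalResult:
    """Combine analytic and judge scores into one interpretable number,
    keeping the per-dimension scores for reporting."""
    c = compile_check(java_source)
    j = judge_stub(cobol_source, java_source)
    return EvalResult(c, j, w_checker * c + w_judge * j)
```

Keeping the per-dimension scores alongside the weighted total mirrors the paper's emphasis on interpretability: a report can show developers *why* a translation scored low (failed checkers vs. a poor judge rating), not just the aggregate.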