AI Summary
This work addresses a common problem: Python scripts written by remote sensing scientists often do not scale to large satellite datasets. To bridge this gap, the authors present GRAIL, an agentic system that automatically translates existing Python geospatial workflows into Apache Spark programs without requiring users to learn a new framework. Rather than fine-tuning a model, the system makes the Scala-based RDPro library easier for large language models to use through structured documentation, API alias functions, and repair-oriented error logs. Built on LangGraph, it implements a staged pipeline that generates code section by section and repairs only the section that fails. Demonstrations on real-world geospatial workflows show that the translated code is correct and scales to large remote sensing datasets while preserving the semantics of the original workflow.
Abstract
Domain scientists increasingly develop Python scripts to analyze satellite imagery, but these scripts do not scale to large datasets. This paper demonstrates GRAIL, an agentic translation system that converts Python geospatial workflows into executable Spark-based programs without requiring scientists to learn a new framework. Rather than fine-tuning a specialized LLM, GRAIL adapts RDPro, a Scala library for satellite data analysis, to make it LLM-ready using structured documentation, API alias functions, and repair-oriented error logs. Translation is structured as a LangGraph pipeline that decomposes code generation into explicit sections with guided inputs and outputs, enabling targeted repair without regenerating the full program. We demonstrate GRAIL on real-world geospatial workflows and showcase the correctness and scalability of the translated code.
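The sectioned generate-and-repair idea can be illustrated with a minimal sketch in plain Python. This is not GRAIL's implementation and it does not use LangGraph: the section names (`load`, `transform`, `write`), the `TranslationState` class, the `translate` function, and the stub generator and checker are all hypothetical stand-ins chosen for illustration. The sketch only shows how a failing section might be regenerated with its error log while sections that already passed are left untouched.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Hypothetical section names; the abstract does not list GRAIL's actual sections.
SECTIONS = ["load", "transform", "write"]


@dataclass
class TranslationState:
    source: str                                              # original Python workflow
    sections: Dict[str, str] = field(default_factory=dict)   # generated code per section
    attempts: Dict[str, int] = field(default_factory=dict)   # generation count per section


# (section, source, error_log) -> generated code; stands in for an LLM call.
Generator = Callable[[str, str, Optional[str]], str]
# (section, code) -> error log, or None when the section passes.
Checker = Callable[[str, str], Optional[str]]


def translate(source: str, generate: Generator, check: Checker,
              max_repairs: int = 3) -> TranslationState:
    """Generate each section in order, repairing only the section that fails."""
    state = TranslationState(source=source)
    for name in SECTIONS:
        error: Optional[str] = None
        for _ in range(max_repairs + 1):
            # On a retry, the previous error log is passed back to the generator.
            state.sections[name] = generate(name, source, error)
            state.attempts[name] = state.attempts.get(name, 0) + 1
            error = check(name, state.sections[name])
            if error is None:
                break
        else:
            raise RuntimeError(f"section {name!r} still failing: {error}")
    return state


def stub_generate(section: str, source: str, error_log: Optional[str]) -> str:
    # Stub: the first "transform" draft is deliberately broken to trigger a repair.
    if section == "transform" and error_log is None:
        return "// transform: BROKEN"
    return f"// {section}: ok"


def stub_check(section: str, code: str) -> Optional[str]:
    # Stub: a real checker would compile or run the generated section.
    return "compile error: BROKEN token" if "BROKEN" in code else None


result = translate("print('ndvi')", stub_generate, stub_check)
```

In this run only the `transform` section is generated twice; `load` and `write` are each generated once, which is the behavior the abstract describes as targeted repair without regenerating the full program.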