GRAIL: AI translation for scientists application workflow on satellite data

2026-05-23

AI Summary
This work addresses the challenge that Python scripts authored by remote sensing scientists often lack scalability for large-scale satellite data processing. To bridge this gap, the authors propose an intelligent agent system that automatically translates existing Python geospatial workflows into efficient Apache Spark programs without requiring users to learn new frameworks. The system innovatively enhances the Scala-based RDPro library's compatibility with large language models through structured API wrappers, function alias mapping, and an error-log-driven repair mechanism. Built upon LangGraph, it implements a staged pipeline for code generation and localized correction. Experiments on real-world geospatial workflows demonstrate that the approach correctly and efficiently processes massive remote sensing datasets, substantially improving scalability while preserving the original workflow semantics.
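The function alias mapping mentioned above can be pictured with a small Python sketch. The source does not describe how GRAIL implements aliases (they may well live inside the Scala library itself), so this is a stand-in that rewrites call names in generated code. Every name in the alias table is hypothetical and is not a real RDPro function.

```python
import re

# Hypothetical alias table: names an LLM might emit -> canonical names.
# None of these are real RDPro functions; they only illustrate the idea.
ALIASES = {
    "reproject": "warpToCrs",
    "clip": "cropToExtent",
    "ndvi": "normalizedDifference",
}


def resolve_aliases(code: str, aliases: dict[str, str] = ALIASES) -> str:
    """Rewrite aliased call names in generated code to canonical names."""
    if not aliases:
        return code
    # Match whole identifiers that are followed by "(" so that only
    # function calls are rewritten, not variables with the same name.
    names = "|".join(map(re.escape, aliases))
    pattern = re.compile(r"\b(" + names + r")\b(?=\s*\()")
    return pattern.sub(lambda m: aliases[m.group(1)], code)


generated = "val out = clip(reproject(raster, 4326), extent)"
resolved = resolve_aliases(generated)
print(resolved)
```

Accepting the names a model tends to guess, rather than retraining the model to use the canonical ones, is consistent with the summary's point that the library is adapted to the LLM and not the other way around.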
Abstract
Domain scientists increasingly develop Python scripts to analyze satellite imagery, but these scripts do not scale to large datasets. This paper demonstrates GRAIL, an agentic translation system that converts Python geospatial workflows into executable Spark-based programs without requiring scientists to learn a new framework. Rather than fine-tuning a specialized LLM, GRAIL adapts RDPro, a Scala library for satellite data analysis, to make it LLM-ready using structured documentation, API alias functions, and repair-oriented error logs. Translation is structured as a LangGraph pipeline that decomposes code generation into explicit sections with guided inputs and outputs, enabling targeted repair without regenerating the full program. We demonstrate GRAIL on real-world geospatial workflows and showcase the correctness and scalability of the translated code.
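The idea of sectioned generation with targeted repair can be sketched in plain Python. This is not GRAIL's implementation and does not use LangGraph; the section names, the `generate` and `check` callables, and the stub behavior are all assumptions made for illustration. The point is only that a failure reported for one section triggers regeneration of that section alone.

```python
from dataclasses import dataclass, field

# Hypothetical section names; the source does not list GRAIL's sections.
SECTIONS = ["load", "transform", "write"]


@dataclass
class PipelineState:
    code: dict = field(default_factory=dict)      # section -> generated code
    attempts: dict = field(default_factory=dict)  # section -> generation count


def run_pipeline(generate, check, max_repairs=2):
    """Generate every section once, then repair only the failing section."""
    state = PipelineState()
    for name in SECTIONS:
        state.code[name] = generate(name, None)
        state.attempts[name] = 1
    for _ in range(max_repairs):
        failure = check(state.code)  # None, or (section, error_log)
        if failure is None:
            break
        name, error_log = failure
        # Regenerate the failing section only, feeding back the error log.
        state.code[name] = generate(name, error_log)
        state.attempts[name] += 1
    return state


# Stubs standing in for the LLM call and the compile/run step.
def fake_generate(section, error_log):
    if section == "transform" and error_log is None:
        return "BROKEN"
    return f"// {section} ok"


def fake_check(code):
    for name, text in code.items():
        if text == "BROKEN":
            return name, f"compile error in section '{name}'"
    return None


state = run_pipeline(fake_generate, fake_check)
print(state.attempts)
```

In this stubbed run only the `transform` section is generated a second time; `load` and `write` are left untouched, which is the property the abstract describes as repair without regenerating the full program.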
Problem

Research questions and friction points this paper is trying to address.

satellite data
scalability
geospatial workflows
large-scale data
Python scripts
Innovation

Methods, ideas, or system contributions that make the work stand out.

agentic translation
LLM-ready library adaptation
LangGraph pipeline
Spark-based geospatial workflow
repair-oriented code generation