AI-for-Science Low-code Platform with Bayesian Adversarial Multi-Agent Framework

📅 2026-03-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes a low-code platform for AI for Science to address key challenges in scientific code generation, including the unreliability of large language models, severe error propagation in multi-agent workflows, and the lack of well-defined evaluation metrics for scientific tasks. The platform employs a three-agent Bayesian adversarial framework that jointly optimizes task management, code generation, and evaluation by dynamically updating prompt distributions and generating adversarial test cases. This co-optimization of tests and code reduces dependence on model reliability. Integrating multidimensional evaluation—encompassing functional correctness, structural alignment, and static analysis—with a human-in-the-loop interface, the system effectively suppresses error propagation and generates more robust and reliable scientific code, outperforming existing methods on interdisciplinary Earth science tasks.

📝 Abstract
Large Language Models (LLMs) demonstrate potential for automating scientific code generation but face challenges in reliability, error propagation in multi-agent workflows, and evaluation in domains with ill-defined success metrics. We present a Bayesian adversarial multi-agent framework specifically designed for AI for Science (AI4S) tasks in the form of a Low-code Platform (LCP). Three LLM-based agents are coordinated under the Bayesian framework: a Task Manager that structures user inputs into actionable plans and adaptive test cases, a Code Generator that produces candidate solutions, and an Evaluator providing comprehensive feedback. The framework employs an adversarial loop where the Task Manager iteratively refines test cases to challenge the Code Generator, while prompt distributions are dynamically updated using Bayesian principles by integrating code quality metrics: functional correctness, structural alignment, and static analysis. This co-optimization of tests and code reduces dependence on LLM reliability and addresses evaluation uncertainty inherent to scientific tasks. LCP also streamlines human-AI collaboration by translating non-expert prompts into domain-specific requirements, bypassing the need for manual prompt engineering by practitioners without coding backgrounds. Benchmark evaluations demonstrate LCP's effectiveness in generating robust code while minimizing error propagation. The proposed platform is also tested on an Earth Science cross-disciplinary task and demonstrates strong reliability, outperforming competing models.
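The abstract's core loop can be illustrated with a small sketch: maintain a distribution over candidate prompt templates, score each generated solution on the three metrics the paper names (functional correctness, structural alignment, static analysis), and use the scores as Bayesian evidence for the template that produced the code. All template names, metric weights, and class names below are illustrative assumptions, not the authors' implementation.

```python
import random

# Hypothetical templates the Task Manager might choose among, and
# assumed weights for combining the three quality metrics into one reward.
TEMPLATES = ["concise", "step-by-step", "test-first"]
WEIGHTS = {"functional": 0.5, "structural": 0.3, "static": 0.2}

class BayesianPromptSelector:
    """Sketch of a Bayesian update over prompt templates (assumed design)."""

    def __init__(self, templates, prior=1.0):
        # Dirichlet-style pseudo-counts: a uniform prior over templates.
        self.counts = {t: prior for t in templates}

    def posterior(self):
        # Normalize pseudo-counts into a probability distribution.
        total = sum(self.counts.values())
        return {t: c / total for t, c in self.counts.items()}

    def sample(self, rng=random):
        # Draw the next template in proportion to its posterior mass.
        templates = list(self.counts)
        probs = [self.posterior()[t] for t in templates]
        return rng.choices(templates, weights=probs, k=1)[0]

    def update(self, template, metrics):
        # Combine the three metrics (each in [0, 1]) into one reward and
        # add it as evidence for the template that produced the code.
        reward = sum(WEIGHTS[m] * metrics[m] for m in WEIGHTS)
        self.counts[template] += reward

selector = BayesianPromptSelector(TEMPLATES)
# Suppose code from the "test-first" template survived adversarial tests twice.
selector.update("test-first", {"functional": 1.0, "structural": 0.8, "static": 0.9})
selector.update("test-first", {"functional": 1.0, "structural": 0.9, "static": 1.0})
post = selector.posterior()
print(max(post, key=post.get))  # → test-first
```

In this toy version the adversarial pressure would come from the Task Manager regenerating harder test cases between updates, so templates only accumulate evidence when their code keeps passing progressively tougher checks.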
Problem

Research questions and friction points this paper is trying to address.

AI for Science
code generation reliability
error propagation
evaluation uncertainty
multi-agent workflows
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian adversarial framework
multi-agent LLM
low-code platform
AI for Science
code generation reliability
Zihang Zeng
Artificial Intelligence Innovation and Incubation Institute, Fudan University; Shanghai Innovation Institute
Jiaquan Zhang
Artificial Intelligence Innovation and Incubation Institute, Fudan University; Shanghai Innovation Institute
Pengze Li
Fudan University
Yuan Qi
Artificial Intelligence Innovation and Incubation Institute, Fudan University; Shanghai Academy of AI for Science
Xi Chen
Professor, Institute of Atmospheric Physics, Chinese Academy of Sciences
computational fluid dynamics · geophysical fluid dynamics · dynamical core · numerical weather prediction