AI Summary
This work addresses the tendency of existing large language model-driven autonomous agents to produce models that look mathematically plausible but are fundamentally flawed, because the agents lack domain grounding and adversarial validation. To overcome this limitation, the authors propose Sci-Mind, a cognitively inspired autonomous modeling framework that retrieves executable code snippets and modeling paradigm descriptors from experiential memory, then runs an adversarial cognitive debate between a theoretical perspective (mathematical coherence) and an empirical perspective (data feasibility). Before code generation, a self-validation step checks the resulting modeling blueprint against formal predicate constraints to ensure consistency throughout the pipeline. Evaluated on the MM-Bench and EngiBench benchmarks, the approach significantly outperforms leading autonomous agents in both modeling rigor and code executability.
Abstract
Real-world mathematical modeling is inherently an experiential and collaborative endeavor. Domain experts rarely solve complex problems from scratch; instead, they draw upon analogies from historical cases and subject their hypotheses to rigorous peer scrutiny. However, autonomous agents powered by Large Language Models predominantly rely on isolated reasoning paradigms, frequently generating plausible but fundamentally flawed models due to a lack of domain grounding and adversarial verification. To address these limitations, we propose Sci-Mind, a novel framework that mirrors the human scientific discovery process. Sci-Mind integrates Experiential Memory Recall to retrieve executable code snippets and modeling paradigm descriptors, grounding abstract reasoning in historical solutions. Subsequently, it employs an Adversarial Cognitive Dialectic where a Theorist optimizing mathematical coherence and a Pragmatist enforcing data feasibility debate through competing objectives to prune elegant but infeasible formulations. A Self-Validating Execution Strategy further ensures blueprint consistency through formal predicates before code generation, achieving fully autonomous execution. Extensive experiments on the MM-Bench and EngiBench benchmarks demonstrate that Sci-Mind significantly outperforms leading autonomous agents in both modeling rigorousness and code executability.
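The idea of checking a modeling blueprint against formal predicates before any code is generated can be illustrated with a minimal sketch. This is not the Sci-Mind implementation: the `Blueprint` fields, the two predicates, and the `failed_predicates` helper are all hypothetical names chosen for illustration, and the abstract does not specify what the actual predicates are.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Blueprint:
    """Hypothetical modeling blueprint; the fields are illustrative assumptions."""
    variables: set[str]      # quantities the proposed model needs
    data_columns: set[str]   # quantities actually available in the data
    objective: str           # plain-text statement of the modeling objective


Predicate = Callable[[Blueprint], bool]

# Hypothetical predicates: each returns True when the blueprint is consistent.
PREDICATES: dict[str, Predicate] = {
    "has_objective": lambda bp: bool(bp.objective.strip()),
    "variables_backed_by_data": lambda bp: bp.variables <= bp.data_columns,
}


def failed_predicates(bp: Blueprint) -> list[str]:
    """Return the names of all predicates the blueprint violates."""
    return [name for name, check in PREDICATES.items() if not check(bp)]


feasible = Blueprint(
    variables={"demand", "price"},
    data_columns={"demand", "price", "date"},
    objective="minimize cost",
)
infeasible = Blueprint(
    variables={"demand", "latent_utility"},  # not observable in the data
    data_columns={"demand", "price"},
    objective="maximize utility",
)

for name, bp in [("feasible", feasible), ("infeasible", infeasible)]:
    failures = failed_predicates(bp)
    # Code generation would proceed only when no predicate fails.
    print(name, "-> generate code" if not failures else f"-> rejected: {failures}")
```

In this sketch, a formulation that depends on an unobserved quantity is rejected before code generation, which loosely mirrors how an elegant but infeasible model could be pruned early.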