🤖 AI Summary
Existing methods for automated web application generation produce only front-end code, so they cannot deliver complete applications that work reliably end to end. This paper introduces TDDev, the first multi-agent large language model framework that integrates test-driven development (TDD) for fully automated, end-to-end full-stack application generation from a natural language description or a design image. TDDev generates executable applications comprising front-end code, back-end code, database schemas, and interactive logic. Its core contribution is a closed-loop pipeline: multimodal perception → test case derivation → collaborative code generation → simulated user interaction. The framework automatically infers test cases that cover both functionality and UI, then iteratively refines the interdependent multi-file code until those tests pass. In experiments on diverse application scenarios, TDDev achieves a 14.4% improvement in overall accuracy over state-of-the-art baselines, without requiring manual intervention.
📝 Abstract
Developing full-stack web applications is complex and time-intensive, demanding proficiency across diverse technologies and frameworks. Although recent advances in multimodal large language models (MLLMs) enable automated webpage generation from visual inputs, current solutions remain limited to front-end tasks and fail to deliver fully functional applications. In this work, we introduce TDDev, the first test-driven development (TDD)-enabled LLM-agent framework for end-to-end full-stack web application generation. Given a natural language description or design image, TDDev automatically derives executable test cases, generates front-end and back-end code, simulates user interactions, and iteratively refines the implementation until all requirements are satisfied. Our framework addresses key challenges in full-stack automation, including underspecified user requirements, complex interdependencies among multiple files, and the need for both functional correctness and visual fidelity. Through extensive experiments on diverse application scenarios, TDDev achieves a 14.4% improvement in overall accuracy compared to state-of-the-art baselines, demonstrating its effectiveness in producing reliable, high-quality web applications without requiring manual intervention.
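The derive-tests, generate, simulate, refine loop described above can be illustrated with a minimal sketch. This is not TDDev's implementation: every name in it (`AcceptanceCheck`, `refine_until_green`, `toy_generate`, the file paths) is hypothetical, the LLM agents are replaced by a deterministic stub, and simulated user interaction is reduced to a predicate over the generated files.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical types for illustration only; none of these names come from the paper.
Codebase = Dict[str, str]  # file path -> file contents


@dataclass
class AcceptanceCheck:
    """One derived test case; `check` stands in for a simulated user interaction."""
    name: str
    check: Callable[[Codebase], bool]


@dataclass
class RefineResult:
    codebase: Codebase
    passed: bool
    rounds: int
    failures: List[str] = field(default_factory=list)


def refine_until_green(
    checks: List[AcceptanceCheck],
    generate: Callable[[Codebase, List[str]], Codebase],
    max_rounds: int = 5,
) -> RefineResult:
    """Regenerate code until every check passes or the round budget is spent."""
    codebase: Codebase = {}
    failures = [c.name for c in checks]  # nothing is built yet, so all checks are pending
    for round_no in range(1, max_rounds + 1):
        # The generator sees the current files plus the names of the failing checks.
        codebase = generate(codebase, failures)
        failures = [c.name for c in checks if not c.check(codebase)]
        if not failures:
            return RefineResult(codebase, True, round_no)
    return RefineResult(codebase, False, max_rounds, failures)


def toy_generate(codebase: Codebase, failures: List[str]) -> Codebase:
    """Stub for a code-generation agent: addresses one failing check per round."""
    updated = dict(codebase)
    if "login_form_renders" in failures:
        updated["frontend/login.html"] = "<form id='login'></form>"
    elif "login_endpoint_exists" in failures:
        updated["backend/routes.py"] = "POST /login"
    return updated


checks = [
    AcceptanceCheck(
        "login_form_renders",
        lambda cb: "id='login'" in cb.get("frontend/login.html", ""),
    ),
    AcceptanceCheck(
        "login_endpoint_exists",
        lambda cb: "/login" in cb.get("backend/routes.py", ""),
    ),
]

result = refine_until_green(checks, toy_generate)
print(result.passed, result.rounds)
```

In this toy run the stub repairs one failing check per round, so the loop needs two rounds to pass both checks. A real system would replace `toy_generate` with model calls and the predicates with browser-driven interaction tests. The round budget matters in practice because a generator that never satisfies a check would otherwise loop forever.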