🤖 AI Summary
Recipe image generation in food computing suffers from a lack of real-world, multimodally aligned data, in particular triple-aligned annotations linking recipe goals, stepwise instructions, and corresponding images.
Method: This paper introduces RecipeGen, the first real-world benchmark for goal-step-image recipe generation. Curated through expert annotation and scraping of multiple culinary platforms, it covers diverse ingredients, multi-step cooking procedures, varied cooking styles, and a broad range of food categories, enabling fine-grained alignment across goals, procedural steps, and images.
Contribution/Results: We propose the first end-to-end verifiable evaluation protocol supporting cross-modal understanding and generation. The fully open-sourced dataset (available on GitHub) establishes a standardized benchmark for recipe generation, multi-step visual reasoning, and food AI, advancing research and applications in culinary intelligence.
📝 Abstract
Recipe image generation is an important challenge in food computing, with applications ranging from culinary education to interactive recipe platforms. However, there is currently no real-world dataset that comprehensively connects recipe goals, sequential steps, and corresponding images. To address this, we introduce RecipeGen, the first real-world goal-step-image benchmark for recipe generation, featuring diverse ingredients, varied recipe steps, multiple cooking styles, and a broad collection of food categories. The dataset is available at https://github.com/zhangdaxia22/RecipeGen.