🤖 AI Summary
Existing multimodal large language models often generate interior layouts that are physically unrealizable or aesthetically inconsistent, because they struggle to satisfy spatial feasibility and design preferences at the same time. This work proposes a reinforcement alignment framework that decouples hard constraints (spatial feasibility) from soft preferences (aesthetics) and evaluates aesthetic quality only within the space of feasible solutions. To obtain stable preference signals, the approach uses group-relative reinforcement learning, combining programmatic constraint checking, a dual-branch reward mechanism, and fine-tuning of a multimodal large language model. Evaluated across multiple benchmarks, the method improves both the constructability and the aesthetic coherence of generated designs.
📝 Abstract
Interior design is a requirements-to-visual-plan generation process that must simultaneously satisfy verifiable spatial feasibility and comparative aesthetic preferences. While recent multimodal large language models (MLLMs) offer a unified foundation for interpreting user intent and producing design rationales, our empirical analysis reveals a persistent contradiction in real-world deployment: MLLMs often produce layouts that are unbuildable and aesthetically inconsistent. These findings indicate that simply adding in-domain text is insufficient; effective interior design requires an alignment mechanism that separates hard constraints from soft preferences and coordinates them during optimization. To address this, we propose Design-MLLM, a reinforcement alignment framework that optimizes a feasibility-first preference objective via a dual-branch, aesthetic-oriented reward. Specifically, Design-MLLM (i) explicitly evaluates spatial feasibility using programmatic constraint checks, (ii) assesses aesthetic preference only among feasible candidates to avoid visually appealing but unexecutable shortcuts, and (iii) performs group-relative optimization to obtain stable preference signals. Through this process, Design-MLLM learns a controllable policy that consistently selects and generates solutions that are both executable and aesthetically coherent, rather than occasionally producing visually appealing but infeasible designs. Extensive experiments on various benchmark datasets demonstrate the advantages of Design-MLLM.
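The three steps (i) to (iii) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the layout representation, the constraint set, the reward values, or the normalization, so everything named here is an assumption made for illustration. Layouts are taken to be lists of axis-aligned rectangles (`Box`), the only hard constraints checked are "inside the room" and "no overlap", infeasible layouts get a fixed penalty (`infeasible_penalty`), the aesthetic scorer is passed in as a placeholder callable, and advantages use the mean/standard-deviation normalization common in group-relative policy optimization.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Box:
    """Axis-aligned furniture footprint (hypothetical layout representation)."""
    x: float  # left edge
    y: float  # bottom edge
    w: float  # width
    h: float  # height


def is_feasible(layout: Sequence[Box], room_w: float, room_h: float) -> bool:
    """Feasibility branch: programmatic hard-constraint check.

    Assumed constraints: every box lies inside the room and no two
    boxes overlap (boxes that merely touch are allowed).
    """
    for b in layout:
        if b.x < 0 or b.y < 0 or b.x + b.w > room_w or b.y + b.h > room_h:
            return False
    for i, a in enumerate(layout):
        for b in layout[i + 1:]:
            overlap_x = a.x < b.x + b.w and b.x < a.x + a.w
            overlap_y = a.y < b.y + b.h and b.y < a.y + a.h
            if overlap_x and overlap_y:
                return False
    return True


def gated_reward(
    layout: Sequence[Box],
    room_w: float,
    room_h: float,
    aesthetic_score: Callable[[Sequence[Box]], float],
    infeasible_penalty: float = -1.0,  # assumed value, not from the paper
) -> float:
    """Feasibility-first reward: aesthetics are scored only for feasible layouts."""
    if not is_feasible(layout, room_w, room_h):
        return infeasible_penalty
    return aesthetic_score(layout)


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against the group of candidates for one prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

In use, a group of candidate layouts sampled for the same request would each be passed through `gated_reward`, and the resulting list through `group_relative_advantages`. Because an infeasible candidate never reaches the aesthetic scorer, a visually appealing but unbuildable layout cannot outrank a feasible one as long as the penalty sits below the scorer's range.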