HouseMind: Tokenization Allows Multimodal Large Language Models to Understand, Generate and Edit Architectural Floor Plans
Published in: CVPR 2026
Architectural floor plan design demands joint reasoning over geometry, semantics, and spatial hierarchy, which remains a major challenge for current AI systems. HouseMind is a multimodal large language model that unifies floor plan understanding, generation, and editing in a single framework. It introduces discrete room-instance tokens, learned via a VQ-VAE, to bridge layout geometry and symbolic reasoning, which makes operations on floor plans controllable and interpretable. Experiments show that HouseMind achieves strong geometric validity and controllability while remaining efficient and locally deployable.
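The room-instance tokenization mentioned above rests on the nearest-neighbour lookup at the core of any VQ-VAE: a continuous per-room embedding is replaced by the index of its closest codebook vector, and that index becomes a discrete token the language model can read and emit. The sketch below illustrates only this generic quantization step and is not HouseMind's implementation. The 2-D embeddings, the codebook contents, and the `<room_k>` token names are hypothetical, and the encoder that would turn room geometry into embeddings is omitted.

```python
from typing import List, Sequence


def quantize(embeddings: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each continuous embedding to the index of its nearest codebook
    vector under squared Euclidean distance (generic VQ-VAE lookup)."""
    ids: List[int] = []
    for z in embeddings:
        best_id, best_dist = 0, float("inf")
        for k, e in enumerate(codebook):
            dist = sum((zi - ei) ** 2 for zi, ei in zip(z, e))
            if dist < best_dist:
                best_id, best_dist = k, dist
        ids.append(best_id)
    return ids


def dequantize(ids: Sequence[int],
               codebook: Sequence[Sequence[float]]) -> List[List[float]]:
    """Recover the quantized embeddings by looking the ids up in the codebook."""
    return [list(codebook[i]) for i in ids]


def to_tokens(ids: Sequence[int]) -> List[str]:
    """Render ids as discrete tokens (hypothetical naming scheme)."""
    return [f"<room_{i}>" for i in ids]


if __name__ == "__main__":
    # Hypothetical 3-entry codebook and three per-room embeddings.
    codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
    rooms = [[0.9, 1.2], [3.5, 0.2], [0.1, -0.1]]
    ids = quantize(rooms, codebook)
    print(ids)             # [1, 2, 0]
    print(to_tokens(ids))  # ['<room_1>', '<room_2>', '<room_0>']
```

Because each room collapses to a single vocabulary item, an edit such as swapping one room's token is a symbolic operation for the language model, while `dequantize` (followed by a decoder, not shown) maps the tokens back toward geometry.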
Recommended Citation: Qin, S.Z., Weber, R.E., Lu, X.Z., 2026. Tokenization Allows Multimodal Large Language Models to Understand, Generate and Edit Architectural Floor Plans. CVPR 2026. https://arxiv.org/abs/2603.11640

