Unlocking UML Class Diagram Understanding in Vision Language Models

📅 2026-05-12

🤖 AI Summary
This study addresses the limited ability of existing vision-language models to understand specialized diagrams such as UML class diagrams, an area that has received little research attention. To bridge this gap, the authors introduce a visual question answering benchmark designed for UML class diagrams and construct a large-scale training dataset of 16,000 image-question-answer triples. Using this dataset, they fine-tune a vision-language model with LoRA (Low-Rank Adaptation). The LoRA-based fine-tune easily outperforms Qwen 3.5 27B, a recent VLM that performs well on many other benchmarks, which supports the value of domain-specific data and parameter-efficient fine-tuning for specialized visual reasoning tasks.
📝 Abstract
Although Vision Language Models (VLMs) have made tremendous progress across a wide range of use cases, they still lag behind when answering questions about diagrams rather than photos. Progress has been made on bar charts, line charts, and similar diagrams, but there is still little research on other diagram types, e.g., those from the computer science domain. Our work presents a benchmark for visual question answering on UML class diagrams that is both challenging and manageable. We further construct a large-scale training dataset of 16,000 image-question-answer triples and show that a LoRA-based fine-tune easily outperforms Qwen 3.5 27B, a recent VLM that performs well on many other benchmarks.
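To illustrate why a LoRA-based fine-tune counts as lightweight, the following is a minimal sketch of the general LoRA idea: the pretrained weight W stays frozen and only two small matrices A and B are trained, so the effective weight becomes W + (alpha / r) * B A. The names `LoraShape` and `lora_forward`, the 4096 x 4096 layer size, and the rank of 16 are all illustrative assumptions; the abstract does not state the authors' actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraShape:
    """Shape of one adapted weight matrix (all values are illustrative)."""
    d_out: int  # output dimension of the frozen weight W
    d_in: int   # input dimension of the frozen weight W
    rank: int   # LoRA rank r, chosen much smaller than d_out and d_in

    def full_params(self) -> int:
        # Full fine-tuning updates every entry of W (d_out x d_in).
        return self.d_out * self.d_in

    def lora_params(self) -> int:
        # LoRA trains only B (d_out x r) and A (r x d_in); W stays frozen.
        return self.rank * (self.d_out + self.d_in)


def lora_forward(W, A, B, x, alpha, rank):
    """Compute (W + (alpha / rank) * B @ A) @ x using plain nested lists."""
    scale = alpha / rank
    # Low-rank path: project x down with A, then back up with B.
    Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    BAx = [sum(b * v for b, v in zip(row, Ax)) for row in B]
    # Frozen path: the ordinary pretrained projection.
    Wx = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    return [wx + scale * delta for wx, delta in zip(Wx, BAx)]


# Hypothetical layer size and rank, chosen only to show the ratio.
shape = LoraShape(d_out=4096, d_in=4096, rank=16)
print(shape.full_params(), shape.lora_params())
```

Under these assumed numbers the trainable parameter count for one layer drops by a factor of 128, which is the kind of saving that makes adapting a large VLM to a 16,000-example dataset practical.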
Problem

Research questions and friction points this paper is trying to address.

Vision Language Models
UML Class Diagrams
Visual Question Answering
Diagram Understanding
Computer Science Diagrams
Innovation

Methods, ideas, or system contributions that make the work stand out.

UML class diagrams
Vision Language Models
Visual Question Answering
LoRA fine-tuning
Domain-specific benchmark
🔎 Similar Papers