🤖 AI Summary
This pilot study addresses the shortage of high-quality multilingual artwork descriptions for blind and low-vision (BLV) audiences in museum settings, where privacy and intellectual-property constraints may favour small on-premise models. It builds a parallel corpus of BLV-oriented art descriptions in German, Romanian, and Serbian and uses it for curator-guided description with a small local vision-language model (Qwen2.5-VL-3B-Instruct). The work compares language-specific LoRA adapters with a single multilingual adapter under a fixed backbone and training budget. In this pilot setup, language-specific adapters show more stable controllability and visually grounded description quality for Romanian and Serbian, while the multilingual adapter remains competitive for German. The authors present these findings as deployment-oriented evidence for compact on-premise vision-language models, and note that larger BLV user studies and broader language coverage are needed before drawing general conclusions.
📝 Abstract
Blind and low-vision (BLV) audiences remain underserved by visual art descriptions, particularly across languages and in museum settings where privacy and intellectual-property constraints may favour small on-premise vision-language models (VLMs). This pilot study investigates curator-guided multilingual art description with Qwen2.5-VL-3B-Instruct for German, Romanian, and Serbian. We construct a parallel BLV-oriented caption corpus from artwork images and metadata, and compare language-specific LoRA adapters with a single multilingual adapter under a fixed backbone and training budget. Evaluation combines automatic lexical and embedding-based metrics with an LLM-as-Judge protocol calibrated against a small Romanian BLV pilot study. Under our pilot setup, language-specific adapters show more stable controllability and visually grounded description quality for Romanian and Serbian, while multilingual adaptation remains competitive in German. We frame these findings as deployment-oriented evidence for small on-premise VLMs, and highlight the need for larger BLV user studies and broader language coverage before drawing general conclusions about multilingual accessibility.
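To make the adapter comparison concrete, the following is a minimal sketch of the general LoRA idea: a frozen backbone weight `W` receives a low-rank update `(alpha / r) * B @ A`, and either one adapter per language or a single shared adapter is selected at inference time. It is not the authors' training code. The matrix sizes, rank, `alpha`, language codes, and the names `LoRAAdapter`, `forward`, and `pick_adapter` are all illustrative assumptions, and the study's actual ranks, target modules, and training setup are not stated here.

```python
import random

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

class LoRAAdapter:
    """Low-rank update delta_W = (alpha / r) * B @ A for one frozen weight matrix."""

    def __init__(self, d_in, d_out, r=4, alpha=8.0, seed=0):
        rng = random.Random(seed)
        self.scale = alpha / r
        # A starts with small random values and B with zeros,
        # so an untrained adapter leaves the backbone output unchanged.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]

    def delta(self, x):
        return [self.scale * v for v in matvec(self.B, matvec(self.A, x))]

def forward(W, adapter, x):
    """Frozen backbone output plus the adapter's low-rank correction."""
    return [b + d for b, d in zip(matvec(W, x), adapter.delta(x))]

# Toy stand-in for one frozen backbone layer (sizes are assumptions).
LANGS = ["de", "ro", "sr"]
D_IN, D_OUT = 6, 3
_rng = random.Random(42)
W = [[_rng.gauss(0.0, 1.0) for _ in range(D_IN)] for _ in range(D_OUT)]

# Arm 1: one adapter per language. Arm 2: a single adapter shared by all languages.
language_specific = {lang: LoRAAdapter(D_IN, D_OUT, seed=i) for i, lang in enumerate(LANGS)}
multilingual = LoRAAdapter(D_IN, D_OUT, seed=99)

def pick_adapter(lang, mode):
    """Select the adapter for a request; mode is 'per_language' or 'multilingual'."""
    return language_specific[lang] if mode == "per_language" else multilingual
```

In both arms the backbone `W` is never modified; only the small `A` and `B` matrices would be trained. The per-language arm therefore stores three sets of adapter weights, while the multilingual arm stores one set that must serve all three languages.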