A Pilot Study on Curator-Guided Multilingual Art Description for Blind and Low-Vision Audiences with Small Vision-Language Models

📅 2026-05-29
📈 Citations: 0
Influential: 0

🤖 AI Summary
This study addresses the lack of high-quality, privacy-preserving multilingual artwork descriptions for blind and low-vision (BLV) audiences in museum settings. Using a small on-premise vision-language model (Qwen2.5-VL-3B-Instruct) with curator guidance, it constructs a parallel BLV-oriented corpus of art descriptions in German, Romanian, and Serbian. It then compares language-specific LoRA adapters with a single multilingual LoRA adapter under a fixed backbone and training budget. In this pilot setup, language-specific adapters give more stable controllability and visually grounded description quality for Romanian and Serbian, while the multilingual adapter remains competitive for German. The authors present these results as deployment-oriented evidence for compact on-premise VLMs, and note that larger BLV user studies and broader language coverage are needed before general conclusions can be drawn.
📝 Abstract
Blind and low-vision (BLV) audiences remain underserved by visual art descriptions, particularly across languages and in museum settings where privacy and intellectual-property constraints may favour small on-premise vision-language models (VLMs). This pilot study investigates curator-guided multilingual art description with Qwen2.5-VL-3B-Instruct for German, Romanian, and Serbian. We construct a parallel BLV-oriented caption corpus from artwork images and metadata, and compare language-specific LoRA adapters with a single multilingual adapter under a fixed backbone and training budget. Evaluation combines automatic lexical and embedding-based metrics with an LLM-as-Judge protocol calibrated against a small Romanian BLV pilot study. Under our pilot setup, language-specific adapters show more stable controllability and visually grounded description quality for Romanian and Serbian, while multilingual adaptation remains competitive in German. We frame these findings as deployment-oriented evidence for small on-premise VLMs, and highlight the need for larger BLV user studies and broader language coverage before drawing general conclusions about multilingual accessibility.
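To make the adapter comparison concrete, the following is a minimal, dependency-free sketch of the standard LoRA update on a single linear layer: the backbone weight W stays frozen and each adapter only learns a low-rank pair (A, B), giving y = W x + (alpha / r) · B (A x). It is an illustration only, not the authors' training code. The layer sizes, rank, scaling factor, initial values, and adapter names ("de", "ro", "sr", "multi") are all assumptions, since the abstract does not report these details.

```python
# Illustrative LoRA forward pass in plain Python (no external libraries).
# All sizes, names, and values are hypothetical; the paper's actual ranks,
# target modules, and hyperparameters are not given in this summary.

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def lora_forward(w, a, b, x, alpha, r):
    """Compute y = W x + (alpha / r) * B (A x); W is frozen, A and B are trainable."""
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    scale = alpha / r
    return [y + scale * d for y, d in zip(base, delta)]

d_out, d_in, r, alpha = 4, 3, 2, 4

# Frozen backbone weight, shared by every adapter.
w = [[float(i + j) for j in range(d_in)] for i in range(d_out)]
x = [1.0, 2.0, 3.0]

def new_adapter():
    a = [[0.1] * d_in for _ in range(r)]   # A: r x d_in, small non-zero init
    b = [[0.0] * r for _ in range(d_out)]  # B: d_out x r, zero init
    return a, b

# Three language-specific adapters and one multilingual adapter,
# all attached to the same frozen backbone.
adapters = {name: new_adapter() for name in ("de", "ro", "sr", "multi")}

base_out = matvec(w, x)

# With B initialised to zero, an untrained adapter leaves the output unchanged.
a_ro, b_ro = adapters["ro"]
untrained_out = lora_forward(w, a_ro, b_ro, x, alpha, r)

# Stand-in for a training update: change one entry of B for the Romanian adapter.
b_ro[0][0] = 1.0
trained_out = lora_forward(w, a_ro, b_ro, x, alpha, r)
```

In this framing, the language-specific setting corresponds to training and serving one (A, B) pair per language, while the multilingual setting trains a single pair on all three languages. The backbone weight is the same object in both cases, which is what makes a comparison under a fixed backbone possible.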
Problem

Research questions and friction points this paper is trying to address.

blind and low-vision
multilingual art description
vision-language models
accessibility
museum settings
Innovation

Methods, ideas, or system contributions that make the work stand out.

small vision-language models
multilingual art description
LoRA adapters
blind and low-vision accessibility
curator-guided generation