NaiLIA: Multimodal Nail Design Retrieval Based on Dense Intent Descriptions and Palette Queries

📅 2026-03-05
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work proposes NaiLIA, an approach to integrating dense textual intent descriptions with palette-based queries for precise nail art image retrieval, a task where existing vision-language models struggle. NaiLIA is the first method to jointly model fine-grained text descriptions and continuous color palettes, enhancing cross-modal understanding through a multimodal semantic alignment mechanism. To handle weakly annotated data, it introduces a confidence-based relaxed loss function. Evaluated on a newly curated large-scale benchmark of 10,625 multicultural nail art images with dense annotations authored by over 200 annotators, NaiLIA significantly outperforms current state-of-the-art methods, demonstrating its effectiveness on complex intent-driven nail art retrieval.

πŸ“ Abstract
We focus on the task of retrieving nail design images based on dense intent descriptions, which represent multi-layered user intent for nail designs. This is challenging because such descriptions specify unconstrained painted elements and pre-manufactured embellishments as well as visual characteristics, themes, and overall impressions. In addition to these descriptions, we assume that users provide palette queries by specifying zero or more colors via a color picker, enabling the expression of subtle and continuous color nuances. Existing vision-language foundation models often struggle to incorporate such descriptions and palettes. To address this, we propose NaiLIA, a multimodal retrieval method for nail design images, which comprehensively aligns with dense intent descriptions and palette queries during retrieval. Our approach introduces a relaxed loss based on confidence scores for unlabeled images that can align with the descriptions. To evaluate NaiLIA, we constructed a benchmark consisting of 10,625 images collected from people with diverse cultural backgrounds. The images were annotated with long and dense intent descriptions given by over 200 annotators. Experimental results demonstrate that NaiLIA outperforms standard methods.
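The abstract mentions a relaxed loss based on confidence scores for unlabeled images that may align with a description, but does not give its formulation. As an illustration only, one common way to realize such a relaxation is to replace the one-hot target of a standard contrastive (InfoNCE-style) loss with a soft target in which plausible unlabeled images receive confidence-weighted mass, so they are not pushed away as hard negatives. All names and the exact weighting scheme below are assumptions, not the paper's method:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def relaxed_contrastive_loss(sims, pos_idx, confidence, temperature=0.07):
    """Cross-entropy against a confidence-relaxed soft target (illustrative sketch).

    sims: similarity scores between one text query and each image in the batch.
    pos_idx: index of the labeled positive image.
    confidence: per-image scores in [0, 1] for unlabeled images; nonzero values
        relax the one-hot target so images that may align with the description
        are not treated as hard negatives. (Hypothetical formulation, not the
        paper's exact loss.)
    """
    # Soft target: the labeled positive gets weight 1; unlabeled candidates
    # contribute their confidence. Normalize to a probability distribution.
    target = [confidence[i] for i in range(len(sims))]
    target[pos_idx] = 1.0
    z = sum(target)
    target = [t / z for t in target]

    # Temperature-scaled softmax over similarities, then cross-entropy.
    probs = softmax([s / temperature for s in sims])
    return -sum(t * math.log(p) for t, p in zip(target, probs))
```

With all confidence scores set to zero, this reduces to the standard contrastive cross-entropy against the single labeled positive.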
Problem

Research questions and friction points this paper is trying to address.

nail design retrieval
dense intent descriptions
palette queries
multimodal retrieval
vision-language alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal retrieval
dense intent description
palette query
relaxed loss
nail design
Kanon Amemiya, Keio University
Daichi Yashima, Keio University
Kei Katsumata, Keio University
Takumi Komatsu, Keio University
Ryosuke Korekata, Keio University
Seitaro Otsuki, Keio University
Komei Sugiura, Professor, Keio University (Multimodal AI, Robot Learning, Embodied AI, Machine Learning)