Multimodal Dataset Normalization and Perceptual Validation for Music-Taste Correspondences

📅 2026-04-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses a key limitation of existing music–flavor crossmodal research: perceptual data have been small-scale and costly to collect. The authors tackle this bottleneck with two complementary experiments. First, they show that crossmodal structures identified in small-scale, human-annotated data generalize to a large corpus of synthetically labeled audio. Second, they evaluate how well chemically derived computational flavor profiles align with human perception via an online auditory experiment. Results show that crossmodal structure remains stable across supervision regimes and that the computational flavor representations align significantly with human ratings (permutation p<0.0001, Mantel r=0.45, Procrustes m²=0.51). The work provides the first evidence that synthetically labeled data can preserve genuine perceptual structure, establishes a reproducible framework for computational flavor modeling and validation, and releases both the dataset and code.

📝 Abstract
Collecting large, aligned cross-modal datasets for music–flavor research is difficult because perceptual experiments are costly and small by design. We address this bottleneck through two complementary experiments. The first tests whether audio–flavor correlations, feature-importance rankings, and latent-factor structure transfer from an experimental soundtrack collection (257 tracks with human annotations) to a large FMA-derived corpus (~49,300 segments with synthetic labels). The second validates computational flavor targets, derived from food chemistry via a reproducible pipeline, against human perception in an online listener study (49 participants, 20 tracks). Results from both experiments converge: the quantitative transfer analysis confirms that cross-modal structure is preserved across supervision regimes, and the perceptual evaluation shows significant alignment between computational targets and listener ratings (permutation p < 0.0001, Mantel r = 0.45, Procrustes m² = 0.51). Together, these findings support the conclusion that sonic seasoning effects are present in the synthetic FMA annotations. We release the datasets and companion code to support reproducible cross-modal AI research.
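The validation statistics in the abstract (a permutation-tested Mantel correlation between distance matrices, plus a Procrustes disparity m²) can be sketched as follows. This is a minimal illustration, not the paper's released code: the data are random stand-ins for per-track computational flavor targets and mean listener ratings (20 tracks, 4 hypothetical flavor dimensions), the Mantel test is a simple one-sided row/column permutation test, and the disparity comes from SciPy's `scipy.spatial.procrustes`.

```python
import numpy as np
from scipy.spatial import procrustes
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
# Hypothetical stand-ins: computational flavor targets vs. a noisy
# "perceptual" copy playing the role of averaged listener ratings.
targets = rng.normal(size=(20, 4))
ratings = targets + 0.5 * rng.normal(size=(20, 4))

def mantel(X, Y, n_perm=2000, seed=0):
    """Pearson correlation between two distance matrices, with a
    one-sided row/column permutation test for significance."""
    dx, dy = squareform(pdist(X)), squareform(pdist(Y))
    iu = np.triu_indices_from(dx, k=1)          # upper-triangle entries
    r_obs = np.corrcoef(dx[iu], dy[iu])[0, 1]
    perm_rng = np.random.default_rng(seed)
    n, count = dx.shape[0], 0
    for _ in range(n_perm):
        p = perm_rng.permutation(n)             # relabel one matrix
        r = np.corrcoef(dx[p][:, p][iu], dy[iu])[0, 1]
        count += r >= r_obs
    return r_obs, (count + 1) / (n_perm + 1)

r, p = mantel(targets, ratings)
_, _, m2 = procrustes(targets, ratings)  # disparity: lower = closer fit
print(f"Mantel r={r:.2f}, p={p:.4f}, Procrustes m^2={m2:.2f}")
```

Note the opposite polarities of the two measures: a higher Mantel r indicates stronger agreement between the distance structures, while Procrustes m² is a disparity (0 would be a perfect fit after translation, rotation, and scaling).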
Problem

Research questions and friction points this paper is trying to address.

multimodal dataset
music-taste correspondence
perceptual validation
cross-modal alignment
sonic seasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-modal transfer
computational flavor targets
sonic seasoning
multimodal dataset normalization
perceptual validation