Multimodal Dataset Normalization and Perceptual Validation for Music-Taste Correspondences

📅 2026-04-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses a key limitation of existing music–flavor crossmodal research: perceptual data have been small-scale and costly to collect. The authors tackle this bottleneck with two complementary experiments. First, they show that crossmodal structures identified in small-scale, human-annotated data generalize to a large corpus of synthetically labeled audio. Second, they evaluate how well chemically derived computational flavor profiles align with human perception via an online auditory experiment. Results show that crossmodal structure remains stable across supervision regimes and that the computational flavor representations align significantly with human ratings (permutation p<0.0001, Mantel r=0.45, Procrustes m²=0.51). The work provides the first evidence that synthetically labeled data can preserve genuine perceptual structure, establishes a reproducible framework for computational flavor modeling and validation, and releases both the dataset and code.

📝 Abstract
Collecting large, aligned cross-modal datasets for music–flavor research is difficult because perceptual experiments are costly and small by design. We address this bottleneck through two complementary experiments. The first tests whether audio–flavor correlations, feature-importance rankings, and latent-factor structure transfer from an experimental soundtrack collection (257 tracks with human annotations) to a large FMA-derived corpus (~49,300 segments with synthetic labels). The second validates computational flavor targets, derived from food chemistry via a reproducible pipeline, against human perception in an online listener study (49 participants, 20 tracks). Results from both experiments converge: the quantitative transfer analysis confirms that cross-modal structure is preserved across supervision regimes, and the perceptual evaluation shows significant alignment between computational targets and listener ratings (permutation p < 0.0001, Mantel r = 0.45, Procrustes m² = 0.51). Together, these findings support the conclusion that sonic seasoning effects are present in the synthetic FMA annotations. We release the datasets and companion code to support reproducible cross-modal AI research.
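The validation statistics in the abstract (a permutation-tested Mantel correlation between distance matrices, plus a Procrustes disparity m²) can be sketched as follows. This is a minimal illustration, not the paper's released code: the data are random stand-ins for per-track computational flavor targets and mean listener ratings (20 tracks, 4 hypothetical flavor dimensions), the Mantel test is a simple one-sided row/column permutation test, and the disparity comes from SciPy's `scipy.spatial.procrustes`.

```python
import numpy as np
from scipy.spatial import procrustes
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
# Hypothetical stand-ins: computational flavor targets vs. a noisy
# "perceptual" copy playing the role of averaged listener ratings.
targets = rng.normal(size=(20, 4))
ratings = targets + 0.5 * rng.normal(size=(20, 4))

def mantel(X, Y, n_perm=2000, seed=0):
    """Pearson correlation between two distance matrices, with a
    one-sided row/column permutation test for significance."""
    dx, dy = squareform(pdist(X)), squareform(pdist(Y))
    iu = np.triu_indices_from(dx, k=1)          # upper-triangle entries
    r_obs = np.corrcoef(dx[iu], dy[iu])[0, 1]
    perm_rng = np.random.default_rng(seed)
    n, count = dx.shape[0], 0
    for _ in range(n_perm):
        p = perm_rng.permutation(n)             # relabel one matrix
        r = np.corrcoef(dx[p][:, p][iu], dy[iu])[0, 1]
        count += r >= r_obs
    return r_obs, (count + 1) / (n_perm + 1)

r, p = mantel(targets, ratings)
_, _, m2 = procrustes(targets, ratings)  # disparity: lower = closer fit
print(f"Mantel r={r:.2f}, p={p:.4f}, Procrustes m^2={m2:.2f}")
```

Note the opposite polarities of the two measures: a higher Mantel r indicates stronger agreement between the distance structures, while Procrustes m² is a disparity (0 would be a perfect fit after translation, rotation, and scaling).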
Problem

Research questions and friction points this paper is trying to address.

multimodal dataset
music-taste correspondence
perceptual validation
cross-modal alignment
sonic seasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-modal transfer
computational flavor targets
sonic seasoning
multimodal dataset normalization
perceptual validation