Exploring Molecular Odor Taxonomies for Structure-based Odor Predictions using Machine Learning

📅 2025-08-11
🤖 AI Summary
This study addresses the challenging problem of mapping molecular structures to human odor perception, aiming to systematically model the olfactory space and its complex relationship with chemical structure. We propose a dual-pathway odor classification framework: an expert-driven approach that constructs a multilevel taxonomy comprising 777 odor descriptors, and a data-driven approach that automatically learns hierarchical odor categories via semantic-aware similarity analysis, co-occurrence mining, and hierarchical clustering. The datasets and classification frameworks for both pathways are released as open source to foster community collaboration. Experimental results demonstrate that either taxonomy improves odor prediction performance across multiple machine learning models and consistently outperforms random groupings of descriptors. Error analysis further reveals the complexity of odor–structure relationships and semantic inconsistencies within existing classification schemes. Collectively, this work establishes an interpretable paradigm for computational olfaction modeling.

📝 Abstract
One of the key challenges in predicting odor from molecular structure is unarguably our limited understanding of the odor space and the complexity of the underlying structure-odor relationships. Here, we show that the predictive performance of machine learning models for structure-based odor predictions can be improved using both an expert and a data-driven odor taxonomy. The expert taxonomy is based on semantic and perceptual similarities, while the data-driven taxonomy is based on clustering co-occurrence patterns of odor descriptors directly from the prepared dataset. Both taxonomies improve the predictions of different machine learning models and outperform random groupings of descriptors that do not reflect existing relations between odor descriptors. We assess the quality of both taxonomies through their predictive performance across different odor classes and perform an in-depth error analysis that highlights the complexity of odor-structure relationships and identifies potential inconsistencies within the taxonomies, using pear odorants used in perfumery as a showcase. The data-driven taxonomy allows us to critically evaluate our expert taxonomy and better understand the molecular odor space. Both taxonomies as well as a full dataset are made available to the community, providing a stepping stone for a future community-driven exploration of the molecular basis of smell. In addition, we provide a detailed multi-layer expert taxonomy including a total of 777 different descriptors from the Pyrfume repository.
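To make the idea of a data-driven taxonomy concrete, the following is a minimal sketch of clustering odor descriptors by how often they co-occur on the same molecules. It is not the authors' pipeline: the toy molecule data, the Jaccard distance, the average-linkage rule, and the `threshold` parameter are all assumptions chosen for illustration, and the function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: each molecule maps to its set of odor descriptors.
MOLECULES = {
    "mol_a": {"fruity", "pear", "sweet"},
    "mol_b": {"fruity", "pear", "green"},
    "mol_c": {"fruity", "sweet"},
    "mol_d": {"woody", "earthy"},
    "mol_e": {"woody", "earthy", "smoky"},
    "mol_f": {"smoky", "woody"},
}


def jaccard_distance(a, b, data):
    """1 - (molecules carrying both descriptors) / (molecules carrying either)."""
    has_a = {m for m, d in data.items() if a in d}
    has_b = {m for m, d in data.items() if b in d}
    union = has_a | has_b
    return 1.0 - len(has_a & has_b) / len(union) if union else 1.0


def cluster_descriptors(data, threshold=0.8):
    """Average-linkage agglomerative clustering of descriptors.

    Merging stops once the closest pair of clusters is farther apart
    than `threshold` (an assumed cut-off, not a value from the paper).
    """
    descriptors = sorted(set().union(*data.values()))
    dist = {
        frozenset(pair): jaccard_distance(*pair, data)
        for pair in combinations(descriptors, 2)
    }
    clusters = [frozenset([d]) for d in descriptors]

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        total = sum(dist[frozenset((x, y))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: linkage(*p))
        if linkage(c1, c2) > threshold:
            break
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    for group in cluster_descriptors(MOLECULES):
        print(group)
```

On this toy data the descriptors separate into a fruity group and a woody group, because no molecule carries descriptors from both. Recording the order of merges instead of cutting at a single threshold would yield a multi-level hierarchy rather than flat groups.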
Problem

Research questions and friction points this paper is trying to address.

Improving odor prediction from molecular structure using machine learning
Evaluating expert and data-driven odor taxonomies for better predictions
Understanding odor-structure relationships through predictive performance analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Expert and data-driven odor taxonomies improve predictions
Data-driven taxonomy built by clustering odor descriptor co-occurrence patterns
Public dataset and taxonomies support smell research
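The abstract compares both taxonomies against random groupings of descriptors. One plausible way to build such a control is sketched below: descriptors are shuffled across groups while each group keeps its original size. Whether the authors preserved group sizes is not stated here, so that choice, the toy taxonomy, and the function name `random_grouping` are assumptions for illustration only.

```python
import random


def random_grouping(taxonomy, seed=0):
    """Shuffle descriptors across groups while keeping each group's size.

    `taxonomy` maps a group name to a list of descriptors. The result has
    the same group names and sizes, but membership no longer reflects any
    relation between descriptors.
    """
    rng = random.Random(seed)  # local generator so the result is reproducible
    descriptors = [d for group in taxonomy.values() for d in group]
    rng.shuffle(descriptors)

    shuffled, start = {}, 0
    for name, group in taxonomy.items():
        shuffled[name] = descriptors[start:start + len(group)]
        start += len(group)
    return shuffled


if __name__ == "__main__":
    # Hypothetical toy taxonomy, not taken from the paper.
    toy_taxonomy = {
        "fruity": ["pear", "apple", "banana"],
        "woody": ["cedar", "pine"],
        "green": ["grassy", "leafy"],
    }
    print(random_grouping(toy_taxonomy, seed=42))
```

Training the same model on the real and the shuffled groupings then gives a like-for-like comparison, since the number and size of the classes are identical and only the meaning of the groups differs.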
Akshay Sajan
Department of Computer Science, VU Bioinformatics Group, Vrije Universiteit Amsterdam, De Boelelaan 1105, 1081 HV Amsterdam, The Netherlands
Stijn Sluis
Department of Computer Science, VU Bioinformatics Group, Vrije Universiteit Amsterdam, De Boelelaan 1105, 1081 HV Amsterdam, The Netherlands
Reza Haydarlou
Department of Computer Science, VU Bioinformatics Group, Vrije Universiteit Amsterdam, De Boelelaan 1105, 1081 HV Amsterdam, The Netherlands
Sanne Abeln
Professor of AI Technology for Life, Utrecht University
AI for the Life Sciences, Protein Bioinformatics, Genomic Alterations, Neurodegenerative Disease
Pasquale Lisena
EURECOM
Knowledge Graphs, Semantic Web, Digital Humanities
Raphael Troncy
EURECOM, Campus SophiaTech, 450 Route des Chappes, 06410 Biot, France
Caro Verbeek
Faculty of Humanities, Art and Culture, History, Antiquity, De Boelelaan 1105, 1081 HV Amsterdam, The Netherlands
Inger Leemans
Vrije Universiteit Amsterdam
Cultural history, Digital humanities, History of Emotions, Early modern history, Olfaction
Halima Mouhib
Department of Computer Science, VU Bioinformatics Group, Vrije Universiteit Amsterdam, De Boelelaan 1105, 1081 HV Amsterdam, The Netherlands