AI Summary
This study addresses the challenge of mapping molecular structure to human odor perception by systematically modeling the odor space and its complex relationship with chemical structure. We propose two complementary odor taxonomies: an expert-driven, multilevel taxonomy comprising 777 odor descriptors, and a data-driven taxonomy that learns hierarchical odor categories through semantic-aware similarity analysis, co-occurrence mining, and hierarchical clustering. Both taxonomies and the full dataset are openly released to foster community collaboration. Experiments show that either taxonomy improves odor prediction across multiple machine learning models and outperforms random groupings of descriptors. Error analysis further highlights the complexity of odor-structure relationships and points to potential inconsistencies within the taxonomies. Together, these results provide an interpretable framework for computational olfaction modeling.
Abstract
One of the key challenges in predicting odor from molecular structure is our limited understanding of the odor space and the complexity of the underlying structure-odor relationships. Here, we show that the predictive performance of machine learning models for structure-based odor prediction can be improved using both an expert and a data-driven odor taxonomy. The expert taxonomy is based on semantic and perceptual similarities, while the data-driven taxonomy is derived by clustering co-occurrence patterns of odor descriptors directly from the prepared dataset. Both taxonomies improve the predictions of different machine learning models and outperform random groupings of descriptors that do not reflect existing relations between odor descriptors. We assess the quality of both taxonomies through their predictive performance across different odor classes and perform an in-depth error analysis, using pear odorants from perfumery as a showcase, that highlights the complexity of odor-structure relationships and identifies potential inconsistencies within the taxonomies. The data-driven taxonomy allows us to critically evaluate our expert taxonomy and better understand the molecular odor space. Both taxonomies and the full dataset are made available to the community, providing a stepping stone for future community-driven exploration of the molecular basis of smell. In addition, we provide a detailed multi-layer expert taxonomy covering a total of 777 different descriptors from the Pyrfume repository.
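To make the idea of a data-driven taxonomy more concrete, the following is a minimal sketch of grouping odor descriptors by how often they co-occur on the same molecules. It is an illustration under stated assumptions, not the authors' pipeline: the toy annotations are invented, and the choice of Jaccard distance with average-linkage agglomerative clustering, as well as the names `jaccard_distance` and `cluster_descriptors`, are hypothetical stand-ins for whatever similarity measure and clustering settings the paper actually uses.

```python
from itertools import combinations

# Toy annotations (invented for illustration, NOT from the released dataset):
# each molecule is represented by the set of odor descriptors assigned to it.
molecules = [
    {"fruity", "pear", "sweet"},
    {"fruity", "pear", "green"},
    {"fruity", "apple", "sweet"},
    {"woody", "earthy"},
    {"woody", "earthy", "smoky"},
    {"smoky", "woody"},
]


def jaccard_distance(a, b, annotations):
    """Return 1 - (molecules carrying both descriptors / molecules carrying either)."""
    with_a = {i for i, descs in enumerate(annotations) if a in descs}
    with_b = {i for i, descs in enumerate(annotations) if b in descs}
    union = with_a | with_b
    return 1.0 - len(with_a & with_b) / len(union) if union else 1.0


def cluster_descriptors(annotations, n_clusters):
    """Average-linkage agglomerative clustering of descriptors (assumed setup)."""
    descriptors = sorted(set().union(*annotations))
    # Pairwise co-occurrence distances between all descriptors.
    dist = {
        frozenset(pair): jaccard_distance(pair[0], pair[1], annotations)
        for pair in combinations(descriptors, 2)
    }

    def linkage(c1, c2):
        # Mean distance over all cross-cluster descriptor pairs.
        values = [dist[frozenset((x, y))] for x in c1 for y in c2]
        return sum(values) / len(values)

    # Start with one cluster per descriptor and merge the closest pair each round.
    clusters = [[d] for d in descriptors]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(clusters)


if __name__ == "__main__":
    for group in cluster_descriptors(molecules, n_clusters=2):
        print(group)
```

On this toy input the two resulting groups separate the fruity-type descriptors from the woody-type ones, because no molecule carries descriptors from both sets. Recording the merge order instead of stopping at a fixed number of clusters would yield a hierarchy rather than a flat grouping, which is closer in spirit to the multi-level taxonomy described above.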