🤖 AI Summary
This study addresses olive oil adulteration, a form of food fraud, by proposing a hyperspectral classification framework based on Bayesian Additive Regression Trees (BART). Unlike conventional approaches, the method avoids manual feature engineering: it relies on BART’s built-in variable selection mechanism to identify discriminative wavelengths automatically and to model their nonlinear interactions, combining high accuracy with interpretability. With PCA-based dimensionality reduction and hyperparameter tuning, the model attains 97.2% classification accuracy on the test set. BART’s variable importance measure further identifies the three most discriminative wavelengths (1160.71 nm, 1328.57 nm, and 1389.29 nm), and a model using only these three spectral features classifies the dataset perfectly (100% accuracy). To our knowledge, this is the first systematic application of BART to hyperspectral discrimination of olive oil purity, offering a rapid, non-destructive, and interpretable approach to food authentication.
📝 Abstract
Feature engineering plays a critical role in handling hyperspectral data and is essential for identifying key wavelengths in food fraud detection. This study employs Bayesian Additive Regression Trees (BART), a flexible machine learning approach, to discriminate and classify olive oil samples according to their level of purity. We use BART’s built-in variable selection mechanism to identify the most representative spectral features and to capture the complex interactions among variables. We use a network representation to illustrate our findings, which highlight the competitiveness of the proposed methodology. When principal component analysis is used for dimensionality reduction, BART outperforms state-of-the-art models, achieving a classification accuracy of 96.8% under default settings and 97.2% after hyperparameter tuning. When a variable selection procedure within BART is used, the model achieves perfect classification on this dataset, improving on the previous best results in both accuracy and interpretability. Three key wavelengths, 1160.71 nm, 1328.57 nm, and 1389.29 nm, play a central role in discriminating the olive oil samples, which illustrates the applicability of our methodology to food quality assessment. Further analysis reveals that these variables do not act independently but interact synergistically, yielding accurate classification and improved detection speed.
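In practice, restricting a model to the three reported wavelengths amounts to selecting the matching band columns of the spectral matrix before fitting a classifier. The following is a minimal Python sketch of that selection step only; it does not implement BART. The band grid (225 evenly spaced bands from 900 nm to 1700 nm), the toy spectra, and the function names are assumptions made for illustration and are not taken from the study.

```python
# Hypothetical band grid: 225 evenly spaced bands from 900 nm to 1700 nm.
# ASSUMPTION for illustration only; the study's actual grid is not given here.
N_BANDS = 225
WAVELENGTHS_NM = [900.0 + i * 800.0 / (N_BANDS - 1) for i in range(N_BANDS)]

# The three key wavelengths reported in the abstract.
KEY_WAVELENGTHS_NM = [1160.71, 1328.57, 1389.29]


def nearest_band_indices(grid, targets):
    """For each target wavelength, return the index of the closest band in grid."""
    return [min(range(len(grid)), key=lambda i: abs(grid[i] - t)) for t in targets]


def select_bands(spectra, indices):
    """Keep only the chosen band columns of each spectrum (one row per sample)."""
    return [[row[i] for i in indices] for row in spectra]


key_idx = nearest_band_indices(WAVELENGTHS_NM, KEY_WAVELENGTHS_NM)

# Two made-up spectra whose values are simple functions of the band index,
# so the effect of the column selection is easy to see.
toy_spectra = [
    [float(i) for i in range(N_BANDS)],
    [2.0 * i for i in range(N_BANDS)],
]
reduced = select_bands(toy_spectra, key_idx)  # shape: 2 samples x 3 features
```

The reduced matrix would then be passed to whichever classifier is being evaluated; with real data, the grid should come from the instrument's band metadata rather than being generated as above.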