Bayesian Additive Regression Trees (BART) in Food Authenticity: A Classification Approach to Food Fraud Detection

📅 2025-10-16
🤖 AI Summary
This study addresses olive oil adulteration, a form of food fraud, by proposing a hyperspectral classification framework based on Bayesian Additive Regression Trees (BART). Unlike conventional approaches, the method removes the need for manual feature engineering: it relies on BART’s built-in variable selection mechanism to identify discriminative wavelengths automatically and to model their nonlinear interactions, giving both high accuracy and strong interpretability. Combined with PCA-based dimensionality reduction and systematic hyperparameter optimization, the model attains 97.2% classification accuracy on the test set. BART’s variable importance metric also identifies the three most discriminative wavelengths (1160.71 nm, 1328.57 nm, and 1389.29 nm), and using only these three spectral features yields perfect (100%) classification accuracy. To our knowledge, this is the first systematic application of BART to hyperspectral discrimination of olive oil purity, offering a rapid, non-destructive, and interpretable approach to food authentication.

📝 Abstract
Feature engineering plays a critical role in handling hyperspectral data and is essential for identifying key wavelengths in food fraud detection. This study employs Bayesian Additive Regression Trees (BART), a flexible machine learning approach, to discriminate and classify samples of olive oil based on their level of purity. Leveraging its built-in variable selection mechanism, we employ BART to effectively identify the most representative spectral features and to capture the complex interactions among variables. We use a network representation to illustrate our findings, highlighting the competitiveness of our proposed methodology. Results demonstrate that when principal component analysis is used for dimensionality reduction, BART outperforms state-of-the-art models, achieving a classification accuracy of 96.8% under default settings, which further improves to 97.2% after hyperparameter tuning. If we leverage a variable selection procedure within BART, the model achieves perfect classification performance on this dataset, improving upon previous optimal results both in terms of accuracy and interpretability. Our results demonstrate that three key wavelengths, 1160.71 nm, 1328.57 nm, and 1389.29 nm, play a central role in discriminating the olive oil samples, thus highlighting an application of our methodology in the context of food quality. Further analysis reveals that these variables do not function independently but rather interact synergistically to achieve accurate classification and improved detection speed.
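For readers unfamiliar with BART, the sum-of-trees model behind the abstract can be written compactly. The form below is the standard binary-classification formulation from the general BART literature, not a formula quoted from this paper; the probit link and the number of trees m are assumptions, and a problem with several purity levels may call for a multiclass extension.

```latex
P(Y = 1 \mid x) \;=\; \Phi\!\left( \sum_{j=1}^{m} g(x;\, T_j, M_j) \right)
```

Here x is the spectrum of a sample (one value per wavelength), T_j is the structure of the j-th tree, M_j is its set of leaf values, g returns the leaf value that x falls into, and \Phi is the standard normal CDF. A wavelength that never appears in a splitting rule of any T_j cannot influence the prediction, which is the intuition behind using BART for variable selection.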
Problem

Research questions and friction points this paper is trying to address.

Classifying olive oil purity levels using BART for fraud detection
Identifying key spectral wavelengths to improve detection accuracy
Capturing complex variable interactions to enhance classification performance
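One common way to rank variables with BART is the inclusion proportion: the share of all splitting rules, across trees and posterior draws, that use a given variable. The sketch below illustrates that idea only. It assumes the fitted model can export, for each posterior draw, the list of split variables in each tree; the function names, the data layout, and the band labels are hypothetical, and the paper's exact selection procedure is not described in this summary.

```python
from collections import Counter


def inclusion_proportions(draws):
    """Share of splitting rules that use each variable.

    draws: list of posterior draws; each draw is a list of trees;
           each tree is a list of the variable names it splits on.
    """
    counts = Counter()
    for trees in draws:
        for split_vars in trees:
            counts.update(split_vars)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {var: n / total for var, n in counts.items()}


def top_k(proportions, k):
    """Names of the k variables with the largest proportions (ties broken by name)."""
    return sorted(proportions, key=lambda v: (-proportions[v], v))[:k]


# Toy input with made-up band labels; these are not the paper's data.
draws = [
    [["band_a", "band_b"], ["band_c"], ["band_a"]],
    [["band_b", "band_c"], ["band_a", "band_d"]],
]
props = inclusion_proportions(draws)
selected = top_k(props, 3)
```

In a real analysis the selected wavelengths would then be used to refit the classifier, which is the kind of reduced model the summary reports for the three key wavelengths.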
Innovation

Methods, ideas, or system contributions that make the work stand out.

BART uses built-in variable selection for feature identification
Network representation visualizes complex variable interactions
Key wavelengths enable perfect classification accuracy
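A network view of variable interactions can be approximated by counting how often two variables split within the same tree and treating those counts as edge weights. The sketch below is a crude proxy under that assumption; the paper's actual network construction is not described in this summary, and the function name, data layout, and band labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(draws):
    """Count, for each pair of variables, the trees in which both appear.

    draws: list of posterior draws; each draw is a list of trees;
           each tree is a list of the variable names it splits on.
    Returns a Counter mapping (var_1, var_2) tuples, sorted by name, to counts.
    """
    edges = Counter()
    for trees in draws:
        for split_vars in trees:
            # Each distinct pair is counted once per tree, however many splits it has.
            for pair in combinations(sorted(set(split_vars)), 2):
                edges[pair] += 1
    return edges


# Toy input with made-up band labels; these are not the paper's data.
toy_draws = [
    [["band_a", "band_b"], ["band_c"], ["band_a", "band_b", "band_c"]],
    [["band_b", "band_c"], ["band_a", "band_b"]],
]
edges = cooccurrence_edges(toy_draws)
```

The resulting weighted edge list can be passed to any graph library for plotting. Heavier edges suggest pairs of wavelengths that the trees tend to use together, which is consistent with, but does not prove, an interaction effect.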
Mengxiang Zhu
School of Mathematics and Statistics, University College Dublin, Belfield, Dublin 4, Ireland, D04V1W8
Riccardo Rastelli
Assistant Professor, University College Dublin
Statistics, Network analysis