🤖 AI Summary
Problem: Field-scale digital soil mapping typically has to work with small training samples and high feature-to-sample ratios. Under these conditions, classical methods (e.g., Random Forest, PLSR) have remained the default choice, and the suitability of modern tabular neural networks has not been established.
Method: We systematically evaluated nine state-of-the-art tabular neural networks—including TabPFN, FT-Transformer, and RealMLP—across 31 real-world soil datasets.
Contribution/Results: Modern tabular neural networks consistently outperformed classical approaches on the majority of tasks; among them, TabPFN delivered the strongest overall performance and remained robust across varying conditions. The study provides empirical evidence that deep learning is effective for soil property modeling under small-sample regimes, challenging the long-standing dominance of classical algorithms. We recommend modern tabular neural networks for field-scale soil modeling and propose TabPFN as the new default choice when training data are scarce.
📝 Abstract
In the field of pedometrics, tabular machine learning is the predominant method for predicting soil properties from remote and proximal soil sensing data, forming a central component of digital soil mapping. At the field scale, this predictive soil modeling (PSM) task is typically constrained by small training sample sizes and high feature-to-sample ratios in soil spectroscopy. Traditionally, these conditions have proven challenging for conventional deep learning methods. Classical machine learning algorithms, particularly tree-based models like Random Forest and linear models such as Partial Least Squares Regression, have long been the default choice for field-scale PSM. Recent advances in artificial neural networks (ANNs) for tabular data challenge this view, yet their suitability for field-scale PSM has not been established. We introduce a comprehensive benchmark that evaluates state-of-the-art ANN architectures, including the latest multilayer perceptron (MLP)-based models (TabM, RealMLP), attention-based transformer variants (FT-Transformer, ExcelFormer, T2G-Former, AMFormer), retrieval-augmented approaches (TabR, ModernNCA), and an in-context learning foundation model (TabPFN). Our evaluation encompasses 31 field- and farm-scale datasets containing 30 to 460 samples and three critical soil properties: soil organic matter or soil organic carbon, pH, and clay content. Our results reveal that modern ANNs consistently outperform classical methods on the majority of tasks, demonstrating that deep learning has matured sufficiently to overcome the long-standing dominance of classical machine learning for PSM. Notably, TabPFN delivers the strongest overall performance, showing robustness across varying conditions. We therefore recommend the adoption of modern ANNs for field-scale PSM and propose TabPFN as the new default choice in the toolkit of every pedometrician.
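To make the small-sample, high feature-to-sample setting concrete, the following is a minimal sketch of how candidate models might be compared on one such dataset. It rests on several assumptions that are not taken from the abstract: the evaluation protocol (shuffled k-fold cross-validation scored by RMSE), the synthetic data (30 samples, 100 features), and the two stand-in models `MeanBaseline` and `OneFeatureOLS`, which are hypothetical placeholders rather than any of the benchmarked methods.

```python
import math
import random
from statistics import mean


def kfold_indices(n_samples, n_splits, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_splits] for i in range(n_splits)]
    for k in range(n_splits):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, folds[k]


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


class MeanBaseline:
    """Hypothetical floor model: always predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = mean(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class OneFeatureOLS:
    """Hypothetical stand-in model: least-squares line on one feature column."""

    def __init__(self, col=0):
        self.col = col

    def fit(self, X, y):
        x = [row[self.col] for row in X]
        mx, my = mean(x), mean(y)
        sxx = sum((a - mx) ** 2 for a in x)
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        self.slope_ = sxy / sxx if sxx else 0.0
        self.intercept_ = my - self.slope_ * mx
        return self

    def predict(self, X):
        return [self.intercept_ + self.slope_ * row[self.col] for row in X]


def cross_val_rmse(make_model, X, y, n_splits=5, seed=0):
    """Mean test RMSE over k folds; make_model builds a fresh model per fold."""
    scores = []
    for train, test in kfold_indices(len(X), n_splits, seed):
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        pred = model.predict([X[i] for i in test])
        scores.append(rmse([y[i] for i in test], pred))
    return mean(scores)


def make_synthetic(n_samples=30, n_features=100, seed=0):
    """Synthetic data (assumption): more features than samples, one informative column."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
    y = [2.0 * row[0] + rng.gauss(0, 0.1) for row in X]
    return X, y


X, y = make_synthetic()
results = {
    "mean_baseline": cross_val_rmse(MeanBaseline, X, y),
    "one_feature_ols": cross_val_rmse(OneFeatureOLS, X, y),
}
# Report models from best (lowest RMSE) to worst.
for name, score in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name}: RMSE = {score:.3f}")
```

Any estimator exposing `fit` and `predict` could be passed to `cross_val_rmse` in place of the stand-ins, for example a Random Forest or PLSR implementation. With only 30 samples each test fold holds six points, so per-fold scores are noisy; repeating the procedure over several seeds and averaging would likely give a steadier comparison.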