🤖 AI Summary
This study investigates whether generalized additive models (GAMs) and neural networks (NNs) are competitive or complementary for predictive modeling on real-world tabular data, addressing the longstanding trade-off between interpretability and performance.
Method: Following the PRISMA framework, we conduct an empirical meta-analysis across 430 tabular datasets drawn from 143 published comparisons. Performance is evaluated using RMSE, R², and AUC; mixed-effects modelling quantifies the influence of application area, study year, sample size, number of predictors, and neural network complexity as moderating factors.
Contribution/Results: We find no consistent evidence of an overall performance advantage for either GAMs or NNs. NNs show only marginal gains in large, high-dimensional settings, and this advantage has narrowed over time. In contrast, GAMs are robust in low-sample regimes and retain intrinsic interpretability. To our knowledge, this is the first large-scale meta-analytic study to empirically delineate the performance boundaries and contextual applicability of GAMs versus NNs. Our results provide an evidence-based benchmark for the interpretability–performance trade-off and advocate a complementary methodology for tabular data modelling.
📝 Abstract
Neural networks have become a popular tool in predictive modelling, more commonly associated with machine learning and artificial intelligence than with statistics. Generalised Additive Models (GAMs) are flexible non-linear statistical models that retain interpretability. Both are state-of-the-art in their own right, with their respective advantages and disadvantages. This paper analyses how these two model classes have performed on real-world tabular data. Following PRISMA guidelines, we conducted a systematic review of papers that performed empirical comparisons of GAMs and neural networks. The search identified 143 eligible papers covering 430 datasets. Key attributes at both the paper and dataset level were extracted and reported. Beyond summarising comparisons, we analyse reported performance metrics using mixed-effects modelling to investigate characteristics that could explain and quantify observed differences, including application area, study year, sample size, number of predictors, and neural network complexity. Across datasets, no consistent evidence of superiority was found for either GAMs or neural networks on the most frequently reported metrics (RMSE, $R^2$, and AUC). Neural networks tended to outperform in larger datasets and in those with more predictors, but this advantage narrowed over time. Conversely, GAMs remained competitive, particularly in smaller data settings, while retaining interpretability. Reporting of dataset characteristics and neural network complexity was incomplete in much of the literature, limiting transparency and reproducibility. This review highlights that GAMs and neural networks should be viewed as complementary approaches rather than competitors. For many tabular applications, the performance trade-off is modest, and interpretability may favour GAMs.
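The core quantitative step, pooling per-dataset performance differences while allowing for between-dataset heterogeneity, can be sketched with a simple random-effects (DerSimonian–Laird) estimator. This is a minimal illustration on made-up numbers, not the authors' exact mixed-effects specification, and the effect sizes below are hypothetical:

```python
import numpy as np

# Hypothetical per-dataset effect sizes: difference in AUC (NN minus GAM)
# and their within-dataset sampling variances. Illustrative values only.
effects = np.array([0.02, -0.01, 0.00, 0.03, -0.02, 0.01])
variances = np.array([0.0004, 0.0003, 0.0005, 0.0006, 0.0004, 0.0003])

def dersimonian_laird(y, v):
    """Random-effects pooled estimate via the DerSimonian-Laird method."""
    w = 1.0 / v                             # fixed-effect weights
    y_fixed = np.sum(w * y) / np.sum(w)     # fixed-effect pooled mean
    q = np.sum(w * (y - y_fixed) ** 2)      # Cochran's Q heterogeneity statistic
    df = len(y) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)           # between-dataset variance estimate
    w_star = 1.0 / (v + tau2)               # random-effects weights
    pooled = np.sum(w_star * y) / np.sum(w_star)
    se = np.sqrt(1.0 / np.sum(w_star))
    return pooled, se, tau2

pooled, se, tau2 = dersimonian_laird(effects, variances)
lo, hi = pooled - 1.96 * se, pooled + 1.96 * se
# With these illustrative inputs the 95% CI spans zero, mirroring the
# review's finding of no overall superiority for either model class.
print(f"pooled = {pooled:.4f}, 95% CI = [{lo:.4f}, {hi:.4f}]")
```

In the paper itself, moderators such as sample size and number of predictors enter a mixed-effects model rather than a plain pooled mean; this sketch only shows the random-effects pooling idea that underlies that analysis.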