Avoiding Overfitting in Variable-Order Markov Models: a Cross-Validation Approach

📅 2025-01-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Dynamic systems such as transportation networks and corporate ownership networks often exhibit higher-order Markov dependencies. However, commonly used higher-order Markov methods that rely on Kullback-Leibler divergence frequently mistake random fluctuations for genuine higher-order structure, which leads to overfitting and degrades multi-step prediction. To address this, the paper proposes DIVOP (Detection of Informative Variable-Order Paths), an algorithm that uses cross-validation to separate true higher-order dependencies from noise and yields sparser representations of the underlying dynamics. On synthetic and real-world datasets, DIVOP outperforms two state-of-the-art algorithms in both precision and recall. Applied to global corporate ownership data, it finds that tax havens appear in 82% of all significant higher-order dependencies, underscoring their outsized influence in corporate networks.

📝 Abstract
Higher-order Markov chain models are widely used to represent agent transitions in dynamic systems, such as passengers in transport networks. They capture transitions in complex systems by considering not only the current state but also the path of previously visited states. For example, the likelihood of train passengers traveling from Paris (current state) to Rome could increase significantly if their journey originated in Italy (prior state). Although this approach provides a more faithful representation of the system than first-order models, we find that commonly used methods, relying on Kullback-Leibler divergence, frequently overfit the data, mistaking fluctuations for higher-order dependencies and undermining forecasts and resource allocation. Here, we introduce DIVOP (Detection of Informative Variable-Order Paths), an algorithm that employs cross-validation to robustly distinguish meaningful higher-order dependencies from noise. In both synthetic and real-world datasets, DIVOP outperforms two state-of-the-art algorithms by achieving higher precision, recall, and sparser representations of the underlying dynamics. When applied to global corporate ownership data, DIVOP reveals that tax havens appear in 82% of all significant higher-order dependencies, underscoring their outsized influence in corporate networks. By mitigating overfitting, DIVOP enables more reliable multi-step predictions and decision-making, paving the way toward deeper insights into the hidden structures that drive modern interconnected systems.
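The general idea of scoring a higher-order model on held-out paths can be illustrated with a small sketch. This is not the DIVOP algorithm, whose details are not given on this page. It is a toy comparison of a first-order and a second-order model by k-fold cross-validated log-likelihood. The function names, the additive smoothing constant `alpha`, the fold assignment, and the example cities (Milan and Berlin as origins) are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def fit_counts(paths, order):
    """Count next-state occurrences for each context of up to `order` previous states."""
    counts = defaultdict(Counter)
    for path in paths:
        for i in range(1, len(path)):
            context = tuple(path[max(0, i - order):i])
            counts[context][path[i]] += 1
    return counts


def mean_log_likelihood(paths, counts, order, n_states, alpha=1.0):
    """Average log-probability per transition, using additive smoothing (assumed choice)."""
    total, n = 0.0, 0
    for path in paths:
        for i in range(1, len(path)):
            context = tuple(path[max(0, i - order):i])
            seen = counts.get(context, Counter())
            prob = (seen[path[i]] + alpha) / (sum(seen.values()) + alpha * n_states)
            total += math.log(prob)
            n += 1
    return total / n


def cross_validated_scores(paths, orders=(1, 2), k=5, alpha=1.0):
    """Mean held-out log-likelihood per model order; fold of path i is i % k."""
    n_states = len({state for path in paths for state in path})
    scores = {}
    for order in orders:
        fold_scores = []
        for fold in range(k):
            train = [p for i, p in enumerate(paths) if i % k != fold]
            test = [p for i, p in enumerate(paths) if i % k == fold]
            counts = fit_counts(train, order)
            fold_scores.append(mean_log_likelihood(test, counts, order, n_states, alpha))
        scores[order] = sum(fold_scores) / k
    return scores


def select_order(scores):
    """Pick the best-scoring order, preferring the lower order on exact ties."""
    best = max(scores.values())
    return min(order for order, score in scores.items() if score == best)


# Hypothetical data: the destination after Paris depends on the origin.
dependent_paths = [("Milan", "Paris", "Rome"), ("Berlin", "Paris", "Berlin")] * 20

# Hypothetical data: the destination after Paris is independent of the origin.
independent_paths = [
    ("Milan", "Paris", "Rome"),
    ("Milan", "Paris", "Berlin"),
    ("Berlin", "Paris", "Rome"),
    ("Berlin", "Paris", "Berlin"),
] * 10

print(select_order(cross_validated_scores(dependent_paths)))
print(select_order(cross_validated_scores(independent_paths)))
```

In this toy setup the second-order model scores higher on held-out paths only when the origin actually changes where travelers go after Paris. When it does not, splitting the same observations across more contexts yields less reliable estimates, and the first-order model scores higher.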
Problem

Research questions and friction points this paper is trying to address.

Markov Models
Overfitting Prevention
Dynamic Systems Analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

DIVOP
Higher-order Markov Chain
Complex System Analysis
Valeria Secchini
CORPTAX, Institute of Economic Studies, Faculty of Social Sciences, Charles University, Prague, Czech Republic
Javier Garcia-Bernardo
Assistant Professor, Utrecht University
Complex Systems, Computational Social Science, Network Science, Applied Data Science, Inequality
Petr Janský
CORPTAX, Institute of Economic Studies, Faculty of Social Sciences, Charles University, Prague, Czech Republic