🤖 AI Summary
Retail enterprises suffer from departmental silos that fragment their data, and their demand-forecasting models are often hard to interpret; together, these problems hinder causal identification. To address this, we propose an integrated analytical framework that combines interpretable machine learning with causal inference, incorporating SHAP-based variance decomposition, double/debiased machine learning (DML), causal graph modeling, and counterfactual reasoning. We introduce a novel validation criterion: "interpretability–causal effect sign consistency." Empirically, we demonstrate for the first time in retail settings that DML incorporating multiple confounders recovers the correct direction of causal effects, and that intrinsically interpretable models (e.g., tree ensembles) produce more stable SHAP values. Evaluated on real-world operational data, our approach significantly improves the sign accuracy of estimated causal effects, enabling high-fidelity sales attribution and robust, evidence-based operational decision-making.
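The summary names a sign-consistency criterion without defining it. One plausible reading, sketched here purely as a hypothetical illustration, is that the direction in which a feature's SHAP values move with the feature should agree with the sign of its estimated causal effect. The sketch assumes a linear predictive model with independent features, for which exact SHAP values are `coef_j * (x_j - mean(x_j))`, and uses simulated sales data where the true effect (+1.0) is known. All function names (`linear_shap`, `shap_direction`, `sign_consistent`, `ols_coef`) are illustrative, not taken from the paper.

```python
import numpy as np


def linear_shap(X, coef, background):
    """Exact SHAP values of a linear model, assuming independent features."""
    return (X - background.mean(axis=0)) * coef


def ols_coef(X, y):
    """Ordinary least squares slopes (intercept fitted, then dropped)."""
    A = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(A, y, rcond=None)[0][1:]


def shap_direction(feature, shap_col):
    """Sign of the least-squares slope of a feature's SHAP values on its values."""
    slope = np.polyfit(feature, shap_col, deg=1)[0]
    return int(np.sign(slope))


def sign_consistent(feature, shap_col, causal_estimate):
    """Hypothetical check: does the SHAP direction match the causal effect sign?"""
    return shap_direction(feature, shap_col) == int(np.sign(causal_estimate))


# Simulated data: x1, x2 are confounders driving both the treatment t and sales y.
rng = np.random.default_rng(0)
n = 5_000
x1, x2 = rng.normal(size=n), rng.normal(size=n)
t = x1 + x2 + 0.5 * rng.normal(size=n)
y = 1.0 * t - 2.0 * x1 - 2.0 * x2 + rng.normal(size=n)
true_effect = 1.0  # known only because the data are simulated

# Model A omits the confounders; Model B includes both.
X_a = t[:, None]
shap_a = linear_shap(X_a, ols_coef(X_a, y), X_a)
X_b = np.column_stack([t, x1, x2])
shap_b = linear_shap(X_b, ols_coef(X_b, y), X_b)

consistent_a = sign_consistent(t, shap_a[:, 0], true_effect)  # False
consistent_b = sign_consistent(t, shap_b[:, 0], true_effect)  # True
```

In this toy setup, the model that omits the confounders attributes a negative contribution to the treatment even though its true effect is positive, which is the kind of mismatch such a criterion would flag.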
📝 Abstract
Most major retailers today have multiple divisions, each focused on a different aspect of the business, such as marketing, supply chain, online customer experience, store customer experience, employee productivity, and vendor fulfillment. They also regularly collect data on all of these aspects in the form of dashboards and weekly, monthly, and quarterly reports. Although several machine learning and statistical techniques are already used to analyze and predict key metrics, such models typically lack interpretability. Moreover, these techniques do not allow causal links to be validated or discovered. In this paper, we aim to provide a recipe for applying model interpretability and causal inference to derive sales insights. We review the existing literature on causal inference and interpretability in the context of e-commerce and retail problems, and apply these methods to a real-world dataset. We find that an inherently explainable model has a lower variance of SHAP values, and we show that including multiple confounders through a double machine learning approach allows us to recover the correct sign of the causal effect.
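To illustrate why including multiple confounders can flip the sign of an estimated effect, here is a minimal sketch of the cross-fitted partialling-out form of double machine learning on simulated data. It is not the paper's pipeline: the data-generating process, the variable roles (seasonality, store traffic, promotion intensity), and the helper names `fit_predict_ols` and `dml_partialling_out` are all hypothetical. Practical DML usually fits the nuisance models with flexible learners such as random forests or gradient boosting; plain least squares is swapped in here only to keep the example self-contained with NumPy.

```python
import numpy as np


def fit_predict_ols(X_train, y_train, X_test):
    """Fit ordinary least squares with an intercept and predict on X_test."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ coef


def dml_partialling_out(y, t, X, n_folds=2, seed=0):
    """Cross-fitted partialling-out estimate of the effect of t on y given controls X."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    y_res = np.empty(n)
    t_res = np.empty(n)
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        # Nuisance models are fit on the other folds and evaluated out-of-fold.
        y_res[test_idx] = y[test_idx] - fit_predict_ols(X[train_idx], y[train_idx], X[test_idx])
        t_res[test_idx] = t[test_idx] - fit_predict_ols(X[train_idx], t[train_idx], X[test_idx])
    # Final stage: regress outcome residuals on treatment residuals (no intercept).
    return float(t_res @ y_res / (t_res @ t_res))


# Hypothetical retail data: two confounders raise promotions but lower sales.
rng = np.random.default_rng(42)
n = 20_000
x1 = rng.normal(size=n)  # e.g. seasonality (assumed)
x2 = rng.normal(size=n)  # e.g. store traffic (assumed)
t = x1 + x2 + 0.5 * rng.normal(size=n)  # treatment, e.g. promotion intensity
y = 1.0 * t - 2.0 * x1 - 2.0 * x2 + rng.normal(size=n)  # sales; true effect = +1.0
X = np.column_stack([x1, x2])

theta_none = dml_partialling_out(y, t, X[:, :0])  # no controls: roughly -0.78
theta_one = dml_partialling_out(y, t, X[:, :1])   # one confounder: roughly -0.6
theta_both = dml_partialling_out(y, t, X)         # both confounders: close to +1.0
```

With no controls or only one confounder, the estimate has the wrong (negative) sign; only after partialling out both confounders does it land near the true effect of +1.0. Cross-fitting (estimating nuisance models on held-out folds) is what distinguishes this from a plain Frisch-Waugh-Lovell regression and matters most when the nuisance learners are flexible enough to overfit.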