🤖 AI Summary
In retail demand modeling, discrete choice models (DCMs) face a fundamental trade-off between interpretability and flexibility: traditional parametric models (e.g., the multinomial logit) cannot capture irrational behavior or preference heterogeneity, while black-box machine learning models lack transparency. To address this, we propose the Binary Choice Forest (BCF), the first tree-based ensemble method formally proven equivalent to DCMs under mild regularity conditions. BCF combines a splitting criterion grounded in choice theory, probability-consistent leaf estimation, and a novel mechanism for recovering preference rankings. It supports modeling of sequential search, representation of heterogeneous behavior, quantification of product importance, and seamless incorporation of price and customer features. On synthetic and real-world transaction datasets, BCF consistently outperforms state-of-the-art parametric benchmarks in predictive accuracy while explicitly revealing both individual- and population-level preference structures, achieving "interpretable flexibility."
📝 Abstract
We show the equivalence of discrete choice models and the class of binary choice forests, which are random forests built from binary choice trees. This suggests that standard machine learning techniques based on random forests can be used to estimate discrete choice models with interpretable output. This is supported by our theoretical results, which show that random forests can consistently predict the choice probabilities of any discrete choice model and that the splitting criterion can recover preference rank lists. The framework has unique advantages: it can capture behavioral patterns such as irrationality or sequential search; it handles nonstandard training-data formats that result from aggregation; it can measure product importance by how frequently a random customer's decision depends on the presence of that product; and it can incorporate price information and customer features. Our numerical results show that random forests estimating customer choices represented by binary choice forests can outperform the best parametric models on synthetic and real datasets.
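The estimation idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm; it is an assumed setup using scikit-learn's off-the-shelf `RandomForestClassifier`: each transaction is an assortment encoded as a 0/1 availability vector, the label is the chosen alternative (including a no-purchase option), `predict_proba` then serves as the estimated choice probabilities, and `feature_importances_` gives a rough proxy for product importance. The simulated "rational" customer who always buys the lowest-indexed available product is a hypothetical example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_products = 3

# Hypothetical rational customer: prefers product 0 > 1 > 2 > no-purchase,
# and always buys the most-preferred product present in the assortment.
def choose(assortment):
    for j in range(n_products):
        if assortment[j] == 1:
            return j
    return n_products  # no-purchase alternative

# Training data: random assortments (0/1 availability vectors) and the
# resulting choices, mimicking observed transaction records.
X = rng.integers(0, 2, size=(2000, n_products))
y = np.array([choose(x) for x in X])

# A random forest over availability features; its predicted class
# probabilities play the role of the choice probabilities of a DCM.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Estimated choice probabilities when only products 1 and 2 are offered:
probs = forest.predict_proba([[0, 1, 1]])[0]

# Split-based feature importances as a crude product-importance measure.
importance = forest.feature_importances_
```

For the assortment {1, 2}, the simulated customer deterministically picks product 1, so the forest's estimated probability for class 1 should be close to one. Real transaction data would replace the simulated `choose` rule.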