Torch-Choice: A PyTorch Package for Large-Scale Choice Modelling with Python

📅 2023-04-04
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Large-scale discrete choice modeling is hampered by heterogeneous data formats, inflexible model specification, slow estimation, and limited support for regularization. This paper introduces torch-choice, an open-source PyTorch library for choice modeling. It implements two classical models, the multinomial logit and the nested logit, and supports regularization during estimation. Models can be specified with R-style formula strings or Python dictionaries, data are managed through a memory-efficient ChoiceDataset structure that can be built from databases in various formats, and estimation can optionally run on GPUs. The paper compares the computational efficiency of torch-choice with that of mlogit in R along three dimensions (number of observations, number of covariates, and size of the item set) and demonstrates scalability on large-scale datasets.
📝 Abstract
The $\texttt{torch-choice}$ is an open-source library for flexible, fast choice modeling with Python and PyTorch. $\texttt{torch-choice}$ provides a $\texttt{ChoiceDataset}$ data structure to manage databases flexibly and memory-efficiently. The paper demonstrates constructing a $\texttt{ChoiceDataset}$ from databases of various formats and functionalities of $\texttt{ChoiceDataset}$. The package implements two widely used models, namely the multinomial logit and nested logit models, and supports regularization during model estimation. The package incorporates the option to take advantage of GPUs for estimation, allowing it to scale to massive datasets while being computationally efficient. Models can be initialized using either R-style formula strings or Python dictionaries. We conclude with a comparison of the computational efficiencies of $\texttt{torch-choice}$ and $\texttt{mlogit}$ in R as (1) the number of observations increases, (2) the number of covariates increases, and (3) the expansion of item sets. Finally, we demonstrate the scalability of $\texttt{torch-choice}$ on large-scale datasets.
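To make the abstract's description concrete, here is a minimal sketch of regularized multinomial logit estimation in plain PyTorch. It does not use the torch-choice API: the synthetic data, the variable names (`X`, `choices`, `beta`, `l2_weight`), and the choice of the Adam optimizer are all assumptions made for illustration. It only shows the general idea of maximizing a penalized log-likelihood with automatic differentiation, on a GPU when one is available.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical synthetic data: N choice occasions, J items, K item-level covariates.
N, J, K = 2000, 4, 3
device = "cuda" if torch.cuda.is_available() else "cpu"

X = torch.randn(N, J, K, device=device)                  # covariates per occasion and item
true_beta = torch.tensor([1.0, -0.5, 0.25], device=device)
utilities = X @ true_beta                                # shape (N, J)
# Draw one chosen item per occasion from the softmax of the utilities.
choices = torch.multinomial(F.softmax(utilities, dim=1), num_samples=1).squeeze(1)

beta = torch.zeros(K, device=device, requires_grad=True)  # coefficients to estimate
optimizer = torch.optim.Adam([beta], lr=0.05)
l2_weight = 1e-3                                          # assumed L2 penalty strength


def penalized_nll(coef):
    """Mean negative log-likelihood of the observed choices plus an L2 penalty."""
    logits = X @ coef                                     # utilities under current coefficients
    return F.cross_entropy(logits, choices) + l2_weight * coef.pow(2).sum()


initial_loss = penalized_nll(beta).item()
for _ in range(300):
    optimizer.zero_grad()
    loss = penalized_nll(beta)
    loss.backward()                                       # gradients via autograd
    optimizer.step()
final_loss = penalized_nll(beta).item()

print("estimated beta:", beta.detach().cpu().tolist())
```

Because the loss is an ordinary differentiable tensor expression, swapping the L2 term for another penalty or moving the tensors to a different device requires no change to the optimization loop.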
Problem

Research questions and friction points this paper is trying to address.

Develops a Python library for large-scale choice modeling
Enables flexible and memory-efficient data management for choice models
Supports GPU acceleration for scalable and efficient model estimation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Flexible ChoiceDataset for efficient data management
Supports multinomial and nested logit models
GPU acceleration for large-scale datasets
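
As an illustration of how the two supported models relate, the sketch below computes choice probabilities under the textbook nested logit formula using only the Python standard library. The function name `nested_logit_probs`, the dictionary-based inputs, and the example utilities are hypothetical, and the parameterization shown here is a common one that may differ from the one the package uses. When every nest parameter equals 1, the formula reduces to the multinomial logit.

```python
import math


def nested_logit_probs(utilities, nests, lambdas):
    """Textbook nested logit choice probabilities.

    utilities: dict mapping item -> deterministic utility V_i
    nests:     dict mapping nest -> list of items in that nest
    lambdas:   dict mapping nest -> nest parameter lambda_k (typically in (0, 1])
    """
    # Inclusive value of each nest: I_k = log(sum_j exp(V_j / lambda_k)).
    inclusive = {
        k: math.log(sum(math.exp(utilities[i] / lambdas[k]) for i in items))
        for k, items in nests.items()
    }
    denom = sum(math.exp(lambdas[k] * inclusive[k]) for k in nests)

    probs = {}
    for k, items in nests.items():
        p_nest = math.exp(lambdas[k] * inclusive[k]) / denom        # P(nest k)
        for i in items:
            p_within = math.exp(utilities[i] / lambdas[k] - inclusive[k])  # P(i | nest k)
            probs[i] = p_nest * p_within
    return probs


# Hypothetical example: three travel modes grouped into two nests.
V = {"car": 1.0, "bus": 0.5, "train": 0.2}
nests = {"private": ["car"], "public": ["bus", "train"]}

nested = nested_logit_probs(V, nests, {"private": 1.0, "public": 0.5})
plain = nested_logit_probs(V, nests, {"private": 1.0, "public": 1.0})
print(nested)
print(plain)
```

With the nest parameter of the "public" nest below 1, bus and train are treated as closer substitutes for each other than for car, which is the correlation pattern the nested logit is meant to capture.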