🤖 AI Summary
Traditional demand estimation struggles to capture product attributes that are hard to quantify, such as visual design, which leads to biased estimates of substitution patterns.
Method: We propose a method that infers substitution patterns from unstructured multimodal data (product images and text). Embeddings are extracted from images and textual descriptions with pre-trained models (CLIP and BERT) and incorporated into a random coefficients logit model, which removes the need to specify product attributes by hand. We evaluate the approach on choice experiment data using counterfactual predictions of consumers' second choices.
Results: In the choice experiment, our approach outperforms standard attribute-based models in counterfactual predictions of consumers' second choices. Applied across 40 Amazon product categories, text and image data consistently help identify close substitutes within each category. The method offers a scalable way to estimate demand when product attributes are unobserved or hard to quantify.
📝 Abstract
We propose a demand estimation method that leverages unstructured text and image data to infer substitution patterns. Using pre-trained deep learning models, we extract embeddings from product images and textual descriptions and incorporate them into a random coefficients logit model. This approach enables researchers to estimate demand even when they lack data on product attributes or when consumers value hard-to-quantify attributes, such as visual design or functional benefits. Using data from a choice experiment, we show that our approach outperforms standard attribute-based models in counterfactual predictions of consumers' second choices. We also apply it across 40 product categories on Amazon and consistently find that text and image data help identify close substitutes within each category.
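To make the pipeline described above more concrete, the following is a minimal sketch of how embedding-derived characteristics could enter a random coefficients logit and how second-choice probabilities follow from it. It is an illustration under stated assumptions, not the paper's implementation: random vectors stand in for the image and text embeddings, the dimension reduction by principal components is an assumed preprocessing step, there is no outside option, and the names `pca_reduce`, `choice_probs`, and `second_choice_probs` are hypothetical.

```python
import numpy as np

def pca_reduce(embeddings, k):
    """Project (J, D) embeddings onto their top-k principal components."""
    centered = embeddings - embeddings.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T  # shape (J, k)

def choice_probs(delta, x, sigma, draws):
    """Per-draw logit choice probabilities, shape (R, J).

    Utility for draw r and product j: delta[j] + sum_k sigma[k] * draws[r, k] * x[j, k].
    """
    u = delta[None, :] + (draws * sigma[None, :]) @ x.T
    u -= u.max(axis=1, keepdims=True)  # stabilize the softmax
    p = np.exp(u)
    return p / p.sum(axis=1, keepdims=True)

def second_choice_probs(p_draws, first):
    """P(second choice = j | first choice = `first`), integrated over draws."""
    p_first = p_draws[:, first]
    rest = p_draws.copy()
    rest[:, first] = 0.0
    # Per-draw choice probabilities once `first` is removed from the choice set.
    cond = rest / rest.sum(axis=1, keepdims=True)
    # Weight each draw by how likely it is to have picked `first`.
    joint = (p_first[:, None] * cond).mean(axis=0)
    return joint / p_first.mean()

# Toy data: random vectors stand in for pre-trained image/text embeddings (assumption).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(5, 16))   # 5 products, 16-dim embeddings
x = pca_reduce(embeddings, k=2)         # low-dimensional characteristics
delta = np.zeros(5)                     # mean utilities (assumed equal)
sigma = np.array([1.0, 0.5])            # std. dev. of random coefficients (assumed)
draws = rng.standard_normal((1000, 2))  # simulated taste draws

p = choice_probs(delta, x, sigma, draws)
shares = p.mean(axis=0)                 # simulated first-choice shares
second = second_choice_probs(p, first=0)
```

With `sigma` set to zero the model collapses to plain logit, and second choices are simply proportional to first-choice shares. Positive `sigma` lets products that are close in embedding space attract more of each other's diverted demand, which is the channel through which text and image data can inform substitution patterns.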