Using LLMs to Directly Guess Conditional Expectations Can Improve Efficiency in Causal Estimation

📅 2025-10-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the low accuracy of causal-effect estimation under high-dimensional confounding, this paper proposes a method integrating large language models (LLMs) with double machine learning (DML). Specifically, the LLM applies its semantic understanding and generative capabilities to historical auction texts to directly predict conditional expectation functions; the resulting guesses serve as additional predictors, alongside pretrained embeddings, within the DML framework. This approach avoids the information loss inherent in conventional high-dimensional feature engineering and embedding compression, thereby mitigating the curse of dimensionality. Empirical evaluation on a small online jewelry auction dataset demonstrates that, compared with methods relying solely on pretrained embeddings, the proposed approach improves both the accuracy and the stability of average treatment effect (ATE) estimation. The results support the feasibility and advantages of systematically integrating LLM-derived prior knowledge with structured causal inference frameworks.

📝 Abstract
We propose a simple yet effective use of LLM-powered AI tools to improve causal estimation. In double machine learning, the accuracy of causal estimates of the effect of a treatment on an outcome in the presence of a high-dimensional confounder depends on the performance of estimators of conditional expectation functions. We show that predictions made by generative models trained on historical data can be used to improve the performance of these estimators relative to approaches that solely rely on adjusting for embeddings extracted from these models. We argue that the historical knowledge and reasoning capacities associated with these generative models can help overcome curse-of-dimensionality problems in causal inference problems. We consider a case study using a small dataset of online jewelry auctions, and demonstrate that inclusion of LLM-generated guesses as predictors can improve efficiency in estimation.
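The partially linear DML estimator the abstract refers to can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a continuous (or binary) treatment, uses random forests as the nuisance learners, and takes a covariate matrix `X` that may include an extra column holding the LLM's direct guess of the conditional expectation of the outcome.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

def dml_ate(X, d, y, n_splits=5, seed=0):
    """Cross-fitted partially linear DML estimate of a treatment effect.

    X : (n, p) covariates; may include a column with the LLM's direct
        guess of E[y | text] as an additional predictor (hypothetical
        augmentation, per the paper's proposal).
    d : (n,) treatment, y : (n,) outcome.
    """
    y_res = np.zeros(len(y))
    d_res = np.zeros(len(d))
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train, test in kf.split(X):
        # Cross-fitting: nuisance functions are fit on held-out folds.
        m_y = RandomForestRegressor(random_state=seed).fit(X[train], y[train])
        m_d = RandomForestRegressor(random_state=seed).fit(X[train], d[train])
        y_res[test] = y[test] - m_y.predict(X[test])  # outcome residual
        d_res[test] = d[test] - m_d.predict(X[test])  # treatment residual
    # Final stage: residual-on-residual regression (Robinson-style).
    return float(d_res @ y_res / (d_res @ d_res))
```

Better nuisance predictions shrink the residual noise, which is the mechanism by which an accurate LLM guess of the conditional expectation can tighten the final-stage estimate.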
Problem

Research questions and friction points this paper is trying to address.

Improving causal estimation efficiency using LLM-generated conditional expectations
Overcoming dimensionality curse in causal inference with generative models
Enhancing double machine learning through LLM predictions as additional predictors
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs directly guess conditional expectations for efficiency
Generative models improve estimators beyond embedding adjustments
Historical knowledge overcomes dimensionality curse in inference
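The core move in the Innovation list above, using LLM guesses as predictors rather than relying only on embedding adjustment, amounts to widening the design matrix. A minimal sketch, where `X_embed` and `llm_guesses` are hypothetical inputs (pretrained text embeddings and the LLM's elicited guesses of the conditional expectation, respectively):

```python
import numpy as np

def augment_with_llm_guess(X_embed, llm_guesses):
    """Append LLM-guessed conditional expectations as extra columns.

    X_embed     : (n, p) pretrained embeddings of the auction texts.
    llm_guesses : (n,) array of the LLM's direct guesses of E[y | text],
                  e.g. elicited by prompting the model with each listing
                  (a hypothetical pipeline, not the paper's exact one).
    """
    guesses = np.asarray(llm_guesses, dtype=float).reshape(len(X_embed), -1)
    return np.column_stack([X_embed, guesses])
```

The augmented matrix is then passed to the DML nuisance learners, so a strong guess column can carry predictive signal that compressed embeddings would otherwise lose.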
Chris Engh
Yale University
P. M. Aronow
Professor, Yale University