Discovery of Maximally Consistent Causal Orders with Large Language Models

📅 2024-12-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional causal discovery methods rely on strong, untestable assumptions, while LLM-based causal knowledge extraction suffers from hallucination and struggles to distinguish direct from indirect causal relationships. Method: We propose a novel ranking-based causal discovery paradigm that shifts the objective from learning a DAG to inferring a robust acyclic tournament, i.e., a total causal ordering. Our framework leverages LLMs to compute pairwise causal consistency scores and optimizes for maximum consistency via a tailored ranking algorithm. To ensure scalability and correctness, we introduce a semi-complete directed graph representation and an efficient enumeration-and-pruning solver. Contribution/Results: Evaluated on standard benchmarks and real-world epidemiological data, our method significantly outperforms conventional DAG-learning approaches, recovering highly consistent causal orderings. It demonstrates the feasibility, robustness, and practical utility of ranking-based causal discovery, offering a more reliable and interpretable alternative to structure-learning paradigms.

📝 Abstract
Causal discovery is essential for understanding complex systems, as it aims to uncover causal relationships from observational data in the form of a causal directed acyclic graph (DAG). However, traditional methods often rely on strong, untestable assumptions, which makes them unreliable in real applications. Large Language Models (LLMs) present a promising alternative for extracting causal knowledge from text-based metadata, which consolidates domain expertise. However, LLMs are prone to unreliability and hallucinations, necessitating strategies that account for their limitations. One such strategy involves leveraging a consistency measure to evaluate reliability. Additionally, most text metadata does not clearly distinguish direct causal relationships from indirect ones, further complicating the discovery of a causal DAG. As a result, focusing on causal orderings, rather than causal DAGs, emerges as a more practical and robust approach. We propose a novel method to derive a class of acyclic tournaments (representing plausible causal orders) that maximizes a consistency score derived from an LLM. Our approach begins by computing pairwise consistency scores between variables, yielding a semi-complete directed graph that aggregates these scores. From this structure, we identify optimal acyclic tournaments, prioritizing those that maximize consistency across all configurations. We tested our method on well-established benchmarks as well as on real-world datasets from epidemiology and public health. Our results demonstrate the effectiveness of our approach in recovering a class of causal orders.
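The pipeline the abstract describes can be illustrated with a minimal toy sketch. The pairwise scores below are hypothetical placeholders, not values from the paper; the actual method obtains them from an LLM and uses an enumeration-and-pruning solver rather than brute force, but at toy scale exhaustive search over total orders conveys the idea:

```python
from itertools import permutations

# Hypothetical pairwise consistency scores: scores[(i, j)] is the (assumed)
# LLM-derived confidence that variable i causally precedes variable j.
variables = ["smoking", "tar", "cancer"]
scores = {
    ("smoking", "tar"): 0.9, ("tar", "smoking"): 0.1,
    ("tar", "cancer"): 0.8, ("cancer", "tar"): 0.2,
    ("smoking", "cancer"): 0.7, ("cancer", "smoking"): 0.3,
}

def order_consistency(order):
    """Total consistency of an acyclic tournament (total order):
    sum of scores[(i, j)] over every pair with i placed before j."""
    return sum(scores[(order[i], order[j])]
               for i in range(len(order))
               for j in range(i + 1, len(order)))

# Enumerate all acyclic tournaments and keep the most consistent one.
best = max(permutations(variables), key=order_consistency)
# → ('smoking', 'tar', 'cancer'), with consistency 0.9 + 0.7 + 0.8 = 2.4
```

Because every permutation of the variables induces exactly one acyclic tournament, maximizing over permutations is equivalent to searching the tournament class; the paper's semi-complete digraph and pruning make this tractable beyond a handful of variables.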
Problem

Research questions and friction points this paper is trying to address.

Maximizing consistency in causal orders
Leveraging LLMs for causal discovery
Distinguishing direct from indirect causal relationships
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages Large Language Models
Maximizes consistency score
Identifies optimal acyclic tournaments
Federico Baldo
INSERM, previously University of Bologna
Causal Inference · Machine Learning · Combinatorial Optimization · Artificial Intelligence
Simon Ferreira
Sorbonne Université
Causality
Charles K. Assaad
Sorbonne Université, INSERM, Institut Pierre Louis d’Epidémiologie et de Santé Publique, F75012, Paris, France