Fairness-Driven LLM-based Causal Discovery with Active Learning and Dynamic Scoring

📅 2025-03-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the high computational cost, poor scalability, and insufficient fairness guarantees of causal discovery (CD) on large-scale data, this paper proposes an LLM-driven, fairness-aware CD framework. Methodologically, it integrates metadata reasoning, breadth-first querying, and active learning to reduce the number of LLM queries from O(n²) pairwise variable queries to O(n), where n is the number of variables; introduces a dynamic scoring mechanism that jointly leverages mutual information, partial correlation, and LLM confidence for adaptive edge identification; and enables quantification of both direct and indirect causal effects involving sensitive attributes. According to the authors, fairness analyses on the inferred graphs, compared against those on graphs produced by baseline methods, show that accurate causal graph construction matters for detecting bias in machine learning systems.
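The query-count reduction can be illustrated with a small sketch. This is not the paper's implementation: `MockLLM`, `TRUE_EDGES`, and `bfs_discover` are hypothetical names, and the mock answers from a hard-coded toy graph where a real system would prompt an LLM with variable metadata. The sketch only shows why asking one "what does this variable cause?" question per visited node scales linearly.

```python
from collections import deque

# Hypothetical toy graph standing in for what an LLM would infer from metadata.
TRUE_EDGES = {
    "age": ["education", "income"],
    "sex": ["income"],
    "education": ["income"],
    "income": [],
}


class MockLLM:
    """Stand-in for an LLM oracle; counts how many queries it receives."""

    def __init__(self, edges):
        self.edges = edges
        self.queries = 0

    def roots(self, variables):
        # One query: "which variables are caused by none of the others?"
        self.queries += 1
        caused = {c for children in self.edges.values() for c in children}
        return [v for v in variables if v not in caused]

    def children(self, node):
        # One query per node: "which variables does `node` directly cause?"
        self.queries += 1
        return list(self.edges.get(node, []))


def bfs_discover(variables, llm):
    """Expand the graph breadth-first, issuing one query per visited node."""
    edges = set()
    queue = deque(llm.roots(variables))
    visited = set(queue)
    while queue:
        node = queue.popleft()
        for child in llm.children(node):
            edges.add((node, child))
            if child not in visited:
                visited.add(child)
                queue.append(child)
    return edges, llm.queries


if __name__ == "__main__":
    variables = list(TRUE_EDGES)
    edges, used = bfs_discover(variables, MockLLM(TRUE_EDGES))
    n = len(variables)
    print(sorted(edges))
    print("BFS queries:", used, "pairwise queries:", n * (n - 1) // 2)
```

On this four-variable toy graph the BFS issues n + 1 = 5 queries against 6 pairwise queries; at n = 100 the same scheme would need at most 101 queries against 4950. The sketch omits cycle checks and any handling of inconsistent LLM answers.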

📝 Abstract

Causal discovery (CD) plays a pivotal role in numerous scientific fields by clarifying the causal relationships that underlie observed phenomena. Although advances in CD algorithms have strengthened bias and fairness analyses in machine learning, their application is hindered by the high computational demands and complexity of large-scale data. This paper introduces a framework that leverages Large Language Models (LLMs) for CD, using a metadata-based approach akin to the reasoning processes of human experts. By shifting from pairwise queries to a more scalable breadth-first search (BFS) strategy, the number of required queries is reduced from quadratic to linear in the number of variables, addressing the scalability concerns inherent in previous approaches. The method pairs Active Learning (AL) with a Dynamic Scoring Mechanism that prioritizes queries by their potential information gain, combining mutual information, partial correlation, and LLM confidence scores to refine the causal graph more efficiently and accurately. The study thus provides a more scalable and efficient solution for leveraging LLMs in fairness-driven CD and examines how the different parameters affect performance. We perform fairness analyses on the inferred causal graphs, identifying direct and indirect effects of sensitive attributes on outcomes. A comparison of these analyses against those from graphs produced by baseline methods highlights the importance of accurate causal graph construction in understanding bias and ensuring fairness in machine learning systems.

Problem

Research questions and friction points this paper is trying to address.

Reduces computational complexity in causal discovery using LLMs and BFS.

Improves fairness analysis via accurate causal graph construction.

Combines active learning and dynamic scoring for efficient query prioritization.
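The fairness analysis mentioned in the list above separates direct from indirect effects of a sensitive attribute. As an illustration only, the sketch below assumes a linear structural equation model over a DAG, where the effect along a path is the product of its edge coefficients; the listing does not say which estimator the paper uses, and `path_effects` and the coefficients are hypothetical.

```python
def path_effects(weights, source, target):
    """Return (direct, indirect) effect of `source` on `target`.

    `weights` maps (cause, effect) edges to linear coefficients.
    Assumes the graph is acyclic, so the recursion terminates.
    """
    children = {}
    for (u, v), w in weights.items():
        children.setdefault(u, []).append((v, w))

    def total(node):
        # Sum over all directed paths from `node` to `target`
        # of the product of edge coefficients along each path.
        if node == target:
            return 1.0
        return sum(w * total(v) for v, w in children.get(node, []))

    direct = weights.get((source, target), 0.0)
    return direct, total(source) - direct


if __name__ == "__main__":
    # Hypothetical coefficients for a three-variable graph.
    graph = {
        ("sex", "income"): 0.5,
        ("sex", "education"): 0.4,
        ("education", "income"): 0.5,
    }
    direct, indirect = path_effects(graph, "sex", "income")
    print("direct:", direct, "indirect:", round(indirect, 3))
```

Here the direct effect is the `sex -> income` coefficient and the indirect effect is the contribution routed through `education`. If a discovery method misses or invents an edge, both numbers change, which is why the quality of the inferred graph matters for the fairness conclusions.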

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based causal discovery with metadata

BFS query strategy reduces query count from quadratic to linear in the number of variables

Active Learning with Dynamic Scoring
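One simple way to prioritize queries by potential information gain is uncertainty sampling, sketched below. This swaps in a standard active-learning heuristic as a stand-in and is not the paper's selection rule: it assumes each candidate edge carries an estimated probability of existing and spends a fixed query budget on the edges whose probability is closest to 0.5. The names `binary_entropy` and `select_queries` are hypothetical.

```python
import heapq
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable; 0 at p = 0 or p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_queries(candidates, budget):
    """Pick the `budget` most uncertain candidate edges.

    `candidates` maps (cause, effect) pairs to an estimated probability
    that the edge exists, for example a dynamic score scaled to [0, 1].
    """
    return heapq.nlargest(
        budget, candidates, key=lambda edge: binary_entropy(candidates[edge])
    )


if __name__ == "__main__":
    candidates = {
        ("age", "income"): 0.95,       # nearly certain, little to learn
        ("sex", "education"): 0.50,    # maximally uncertain
        ("education", "income"): 0.70,
    }
    print(select_queries(candidates, budget=2))
```

With a budget of two, the sketch selects `("sex", "education")` first and `("education", "income")` second, leaving the nearly certain edge unqueried.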
