LLM-Forest: Ensemble Learning of LLMs with Graph-Augmented Prompts for Data Imputation

📅 2024-10-28
📈 Citations: 1
Influential: 0
🤖 AI Summary
To address missing data imputation in healthcare and finance, this paper proposes a fine-tuning-free ensemble framework that leverages multiple large language models (LLMs). Methodologically, it constructs bipartite information graphs at two granularities (feature and value) to identify high-quality neighboring samples, uses those neighbors in graph-augmented few-shot prompts so that each LLM sees structured neighborhood context, and combines the outputs of multiple LLMs through confidence-weighted voting to produce the final imputation. Its key contribution is the “LLM forest” ensemble paradigm, which unifies graph-based modeling, few-shot prompt engineering, and confidence-weighted decision fusion to mitigate hallucination. Evaluated on nine real-world datasets, the method significantly outperforms state-of-the-art approaches, achieving an average 18.7% reduction in mean absolute error (MAE), while demonstrating both high accuracy and strong robustness.
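
The confidence-weighted voting step mentioned above can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `weighted_vote` and `weighted_mean` are hypothetical, and the sketch assumes each LLM "tree" returns an imputed value together with a numeric confidence score.

```python
from collections import defaultdict


def weighted_vote(predictions):
    """Pick a categorical value by summing confidences per candidate.

    predictions: list of (value, confidence) pairs, one per LLM "tree".
    The input format is an assumption made for this sketch.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    scores = defaultdict(float)
    for value, confidence in predictions:
        scores[value] += confidence
    # Ties go to the candidate seen first (dicts preserve insertion order).
    return max(scores, key=scores.get)


def weighted_mean(predictions):
    """Combine numeric imputations as a confidence-weighted average."""
    total = sum(confidence for _, confidence in predictions)
    if total == 0:
        raise ValueError("confidences must not all be zero")
    return sum(value * confidence for value, confidence in predictions) / total


# Three hypothetical trees vote on a categorical cell and a numeric cell.
categorical = weighted_vote([("high", 0.9), ("normal", 0.5), ("normal", 0.3)])
numeric = weighted_mean([(10.0, 1.0), (20.0, 3.0)])
```

In this toy run, a single confident tree outweighs two less confident ones for the categorical cell, which is the behavior that distinguishes weighted voting from a plain majority vote.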

📝 Abstract
Missing data imputation is a critical challenge in various domains, such as healthcare and finance, where data completeness is vital for accurate analysis. Large language models (LLMs), trained on vast corpora, have shown strong potential in data generation, making them a promising tool for data imputation. However, challenges persist in designing effective prompts for a fine-tuning-free process and in mitigating the risk of LLM hallucinations. To address these issues, we propose a novel framework, LLM-Forest, which introduces a "forest" of few-shot learning LLM "trees" with confidence-based weighted voting, inspired by ensemble learning (Random Forest). This framework is established on a new concept of bipartite information graphs to identify high-quality relevant neighboring entries at both feature and value granularity. Extensive experiments on 9 real-world datasets demonstrate the effectiveness and efficiency of LLM-Forest.
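
One way to picture neighbor selection on a bipartite graph is the following minimal sketch. It is an illustration under stated assumptions, not the paper's construction: rows sit on one side of the graph, (feature, value) pairs on the other, and neighbors are ranked by how many value nodes they share with the target row. The names `build_bipartite_graph` and `top_neighbors`, the ranking rule, and the toy records are all hypothetical.

```python
from collections import defaultdict


def build_bipartite_graph(rows):
    """Map each (feature, value) node to the set of row indices linked to it."""
    graph = defaultdict(set)
    for idx, row in enumerate(rows):
        for feature, value in row.items():
            if value is not None:  # a missing cell contributes no edge
                graph[(feature, value)].add(idx)
    return graph


def top_neighbors(rows, target_idx, k=2):
    """Rank other rows by the number of (feature, value) nodes they share."""
    graph = build_bipartite_graph(rows)
    shared = defaultdict(int)
    for feature, value in rows[target_idx].items():
        if value is None:
            continue
        for other in graph[(feature, value)]:
            if other != target_idx:
                shared[other] += 1
    # Most shared nodes first; break ties by row index for determinism.
    ranked = sorted(shared, key=lambda i: (-shared[i], i))
    return ranked[:k]


# Toy records (made up for illustration); row 0 has a missing "bp" cell.
rows = [
    {"age": "60-69", "smoker": "yes", "bp": None},
    {"age": "60-69", "smoker": "yes", "bp": "high"},
    {"age": "60-69", "smoker": "no", "bp": "normal"},
    {"age": "30-39", "smoker": "no", "bp": "normal"},
]
neighbors = top_neighbors(rows, target_idx=0, k=2)
```

The selected rows could then serve as the few-shot examples in a prompt that asks an LLM to fill in the missing cell of the target row.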
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Missing Data Imputation
Medical and Financial Domains
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-Forest
Data Imputation
Ensemble Voting