LLM-Forest: Ensemble Learning of LLMs with Graph-Augmented Prompts for Data Imputation

📅 2024-10-28

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

To address missing data imputation in healthcare and finance, this paper proposes a fine-tuning-free ensemble framework leveraging multiple large language models (LLMs). Methodologically, it constructs a feature-value bipartite information graph at dual granularity to identify high-quality neighboring samples, integrates graph-augmented few-shot prompting for structured neighborhood awareness, and employs multi-LLM collaborative reasoning with confidence-weighted voting for final imputation. Its key contribution is the first introduction of an “LLM forest” ensemble paradigm—unifying graph-based modeling, few-shot prompt engineering, and confidence-weighted decision fusion—to effectively mitigate hallucination. Evaluated on nine real-world datasets, the method significantly outperforms state-of-the-art approaches, achieving an average 18.7% reduction in mean absolute error (MAE), while demonstrating both high accuracy and strong robustness.

📝 Abstract

Missing data imputation is a critical challenge in various domains, such as healthcare and finance, where data completeness is vital for accurate analysis. Large language models (LLMs), trained on vast corpora, have shown strong potential in data generation, making them a promising tool for data imputation. However, challenges persist in designing effective prompts for a finetuning-free process and in mitigating the risk of LLM hallucinations. To address these issues, we propose a novel framework, LLM-Forest, which introduces a "forest" of few-shot learning LLM "trees" with confidence-based weighted voting, inspired by ensemble learning (Random Forest). This framework is established on a new concept of bipartite information graphs to identify high-quality relevant neighboring entries with both feature and value granularity. Extensive experiments on 9 real-world datasets demonstrate the effectiveness and efficiency of LLM-Forest.
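The confidence-based weighted voting mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how confidences are obtained or combined, so this assumes each LLM "tree" returns a candidate value with a confidence score, and that scores for identical candidates are simply summed. The function name `weighted_vote` and the input format are hypothetical, not taken from the paper.

```python
from collections import defaultdict


def weighted_vote(predictions):
    """Pick the candidate value with the highest total confidence.

    predictions: list of (value, confidence) pairs, one per LLM "tree".
    Assumption: confidences are non-negative numbers and are summed
    per candidate; the paper may aggregate them differently.
    """
    if not predictions:
        raise ValueError("need at least one prediction")

    scores = defaultdict(float)
    for value, confidence in predictions:
        scores[value] += confidence  # accumulate support for this candidate

    # Ties resolve to the candidate that was seen first.
    return max(scores, key=scores.get)


# Three hypothetical trees impute the same missing categorical cell.
tree_outputs = [("yes", 0.9), ("no", 0.5), ("no", 0.3)]
imputed = weighted_vote(tree_outputs)
```

Here a single high-confidence tree (0.9) outweighs two lower-confidence trees that agree with each other (0.5 + 0.3), which a plain majority vote would not allow. This sketch covers categorical values only; a numeric feature would need a different rule, such as a confidence-weighted average.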

Problem

Research questions and friction points this paper is trying to address.

Large Language Models

Missing Data Imputation

Medical and Financial Domains

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-Forest

Data Imputation

Ensemble Voting

🔎 Similar Papers

A Survey of Large Language Models on Generative Graph Analytics: Query, Learning, and Applications