LLM Meeting Decision Trees on Tabular Data

📅 2025-05-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address two key bottlenecks in applying large language models (LLMs) to tabular data—(i) poor generalizability and privacy risks of existing serialization methods, and (ii) high fine-tuning costs and context-length limitations of in-context learning—we propose DeLTa, the first framework that synergistically integrates LLMs with decision trees via logical rules as an intermediary. DeLTa eliminates table-to-text serialization, avoids LLM fine-tuning or in-context learning, and achieves end-to-end error correction by leveraging LLMs to rewrite and calibrate decision tree rules. Its core innovations are a rule-level error calibration mechanism and a zero-serialization modeling paradigm. Evaluated across diverse tabular benchmarks, DeLTa achieves state-of-the-art performance, significantly improving accuracy, robustness, and full-data learning capability while inherently mitigating privacy leakage risks.

📝 Abstract
Tabular data play a vital role in diverse real-world fields, including healthcare and finance. With the recent success of Large Language Models (LLMs), early explorations have extended LLMs to the domain of tabular data. Most of these LLM-based methods first serialize tabular data into natural-language descriptions, and then fine-tune LLMs on, or directly infer over, the serialized data. However, these methods suffer from two key inherent issues: (i) data perspective: existing data serialization methods lack universal applicability for structured tabular data, and may pose privacy risks through direct textual exposure; and (ii) model perspective: LLM fine-tuning methods struggle with tabular data, and the scalability of in-context learning is bottlenecked by input-length constraints (making it suitable only for few-shot learning). This work explores a novel direction of integrating LLMs with tabular data through logical decision tree rules as intermediaries, and proposes DeLTa, a decision tree enhancer with LLM-derived rules for tabular prediction. The proposed DeLTa avoids tabular data serialization and can be applied to the full-data learning setting without LLM fine-tuning. Specifically, we leverage the reasoning ability of LLMs to redesign an improved rule given a set of decision tree rules. Furthermore, we provide a calibration method for the original decision tree via the new LLM-generated rule, which approximates an error-correction vector to steer the original decision tree predictions in the direction of reducing errors. Finally, extensive experiments on diverse tabular benchmarks show that our method achieves state-of-the-art performance.
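The pipeline the abstract describes — extract a decision tree's logical rules, have an LLM rewrite them, then calibrate the tree's predictions with an error-correction signal — can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the LLM call is replaced by a hypothetical stand-in rule, and the blending weight `alpha` and the convex-combination calibration are assumptions for demonstration only.

```python
# Hypothetical DeLTa-style sketch: (1) serialize a decision tree's rules
# (the text that would be sent to an LLM for rewriting), (2) stand in for
# the LLM-rewritten rule with a simple threshold rule, (3) steer the
# tree's predicted probabilities toward that rule's vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Step 1: extract the tree's logical rules as text (LLM input in DeLTa).
rules = export_text(tree, feature_names=[f"f{i}" for i in range(4)])

# Step 2: hypothetical "improved rule" an LLM might return after
# reasoning over the extracted rules (assumption, not the paper's output).
def improved_rule(x):
    return int(x[0] > 0.0)

# Step 3: calibration -- blend tree probabilities with the rule's vote,
# approximating an error-correction step (alpha is an assumed weight).
alpha = 0.3
proba = tree.predict_proba(X)
rule_votes = np.eye(2)[[improved_rule(x) for x in X]]
calibrated = (1 - alpha) * proba + alpha * rule_votes
preds = calibrated.argmax(axis=1)
```

Because both the tree's probability rows and the one-hot rule votes sum to one, the convex combination remains a valid probability distribution per sample; the actual paper computes the correction vector differently, so this only conveys the shape of the idea.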
Problem

Research questions and friction points this paper is trying to address.

LLMs lack universal data serialization for tabular data
LLM fine-tuning struggles with tabular data scalability
Privacy risks arise from direct textual exposure
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates LLMs with decision tree rules
Avoids tabular data serialization privacy risks
Uses LLM to enhance decision tree accuracy
Hangting Ye
Jilin University
Machine Learning · Data Mining
Jinmeng Li
School of Artificial Intelligence, Jilin University
He Zhao
CSIRO’s Data61
Dandan Guo
School of Artificial Intelligence, Jilin University
Yi Chang
School of Artificial Intelligence, Jilin University