"Oh LLM, I'm Asking Thee, Please Give Me a Decision Tree": Zero-Shot Decision Tree Induction and Embedding with Large Language Models

📅 2024-09-27
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of constructing interpretable models for few-shot tabular data, this paper proposes Zero-shot Decision Trees (ZDT): a framework that generates structurally sound and semantically coherent decision trees, without any training data or fine-tuning, by leveraging the implicit world knowledge encoded in large language models (LLMs) via prompt engineering, and subsequently derives embeddings from the generated trees. The method combines symbolic parsing of the LLM-produced tree structures with tree-based embedding extraction. To the authors' knowledge, this is the first work to achieve purely zero-shot decision tree induction and embedding learning. Experiments on multiple small-scale tabular benchmarks show that ZDT can match or even surpass conventional data-driven decision trees, and that its embeddings outperform those derived from XGBoost and other tree-based models on average in downstream tasks, establishing a knowledge-driven, interpretable baseline for low-resource scenarios.

📝 Abstract
Large language models (LLMs) provide powerful means to leverage prior knowledge for predictive modeling when data is limited. In this work, we demonstrate how LLMs can use their compressed world knowledge to generate intrinsically interpretable machine learning models, i.e., decision trees, without any training data. We find that these zero-shot decision trees can even surpass data-driven trees on some small-sized tabular datasets and that embeddings derived from these trees perform better than data-driven tree-based embeddings on average. Our decision tree induction and embedding approaches can therefore serve as new knowledge-driven baselines for data-driven machine learning methods in the low-data regime. Furthermore, they offer ways to harness the rich world knowledge within LLMs for tabular machine learning tasks. Our code and results are available at https://github.com/ml-lab-htw/llm-trees.
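The abstract describes a two-step pipeline: prompt an LLM with only feature and target names (no training data) to elicit a decision tree, then parse the textual tree for downstream use. The sketch below is an illustrative reconstruction, not the authors' released code: the prompt wording, the sklearn-`export_text`-style output format, and the parser are all assumptions.

```python
# Hedged sketch of zero-shot decision tree induction: build a data-free
# prompt, then parse a hypothetical LLM reply (in an sklearn-export_text-like
# indented format) into a nested dict. Prompt text and format are assumptions.

def build_prompt(features, target):
    """Zero-shot prompt: relies only on feature/target names, no data."""
    return (
        "Using only your world knowledge, output a decision tree that "
        f"predicts '{target}' from the features: {', '.join(features)}. "
        "Format each split as '|--- feature <= threshold' and each leaf "
        "as '|--- class: <label>', indenting children with '|   '."
    )

def parse_tree(text):
    """Parse an indented tree dump into nested dicts (leaves are labels)."""
    items = []
    for line in text.strip().splitlines():
        depth = line.count("|   ")            # each '|   ' is one level
        content = line.split("--- ", 1)[1].strip()
        items.append((depth, content))
    tree, _ = _parse(items, 0, 0)
    return tree

def _parse(items, i, depth):
    node = {}
    while i < len(items) and items[i][0] == depth:
        label = items[i][1]
        if label.startswith("class:"):        # leaf: return its class label
            return label.split(":", 1)[1].strip(), i + 1
        node[label], i = _parse(items, i + 1, depth + 1)
    return node, i

if __name__ == "__main__":
    # A hypothetical LLM reply for the Iris task (not a real model output).
    reply = """|--- petal_width <= 0.8
|   |--- class: setosa
|--- petal_width > 0.8
|   |--- petal_length <= 4.9
|   |   |--- class: versicolor
|   |--- petal_length > 4.9
|   |   |--- class: virginica"""
    print(parse_tree(reply))
```

The nested-dict form makes it easy to extract the topological information (paths, split features, depths) that the paper's tree embeddings would build on.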
Problem

Research questions and friction points this paper is trying to address.

Zero-shot decision tree induction using LLMs without training data
Comparing zero-shot trees with data-driven trees on tabular datasets
Enhancing tabular ML tasks via LLM knowledge-driven embeddings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Zero-shot decision tree induction using LLMs
Knowledge-driven embeddings outperform data-driven ones
LLMs generate interpretable models without training data
Ricardo Knauer
AI Researcher and Trainer, Hochschule für Technik und Wirtschaft Berlin
Interpretable AI, Small Data, Healthcare, Movement Science, Physiotherapy
Mario Koddenbrock
PhD Student at HTW Berlin
Machine Learning, Computer Vision
Raphael Wallsberger
KI-Werkstatt, University of Applied Sciences Berlin, Germany
N.M. Brisson
Julius Wolff Institute, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Germany; Berlin Movement Diagnostics (BeMoveD), Charité - Universitätsmedizin Berlin, Germany
Georg N. Duda
Julius Wolff Institute, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Germany; Berlin Movement Diagnostics (BeMoveD), Charité - Universitätsmedizin Berlin, Germany
Deborah Falla
Centre of Precision Rehabilitation for Spinal Pain, School of Sport, Exercise and Rehabilitation Sciences, University of Birmingham, United Kingdom
David W. Evans
Centre of Precision Rehabilitation for Spinal Pain, School of Sport, Exercise and Rehabilitation Sciences, University of Birmingham, United Kingdom
Erik Rodner
University of Applied Sciences (HTW Berlin)
computer vision, machine learning, time series analysis