🤖 AI Summary
Few-shot tabular learning suffers from poor generalization under label scarcity, unreliable text-based feature engineering, and high inference latency when LLMs must be invoked at test time. To address these issues, this paper proposes a *training-time implicit knowledge distillation framework*, enabling the first directed transfer of latent priors from large language models (LLMs) to tabular models. The method comprises four key components: (i) implicit-space knowledge distillation, (ii) feature-value weighted fusion, (iii) LLM-tabular joint representation alignment, and (iv) semi-supervised optimization, which supports both unsupervised pretraining and unlabeled-data augmentation while eliminating test-time LLM dependency entirely. Across multiple few-shot tabular benchmarks, the approach achieves state-of-the-art performance; under extreme settings (≤5 labeled samples per class) it is markedly more robust than text-prompting and test-time knowledge-extraction baselines.
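To make component (ii) concrete, here is a minimal sketch of what feature-value weighted fusion could look like: per-feature-value embeddings are combined into a single row representation using softmax-normalized relevance scores. The function name, the use of softmax, and the idea that the scores come from LLM priors are illustrative assumptions, not the paper's exact formulation.

```python
import math

def weighted_fusion(value_embeddings, llm_scores):
    """Fuse per-feature-value embeddings (a list of equal-length vectors)
    into one row representation, weighted by a softmax over LLM-derived
    relevance scores (hypothetical sketch, not the paper's exact method)."""
    # Numerically stable softmax over the scores.
    m = max(llm_scores)
    exps = [math.exp(s - m) for s in llm_scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum across features, dimension by dimension.
    dim = len(value_embeddings[0])
    return [sum(w * vec[d] for w, vec in zip(weights, value_embeddings))
            for d in range(dim)]

# Two features with 2-dim embeddings; equal scores reduce to a plain average.
fused = weighted_fusion([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0])
print(fused)  # [2.0, 3.0]
```

With unequal scores, features the LLM deems more relevant dominate the fused representation, which is the intuition behind knowledge-guided fusion under label scarcity.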
📝 Abstract
Few-shot tabular learning, in which machine learning models are trained with a limited amount of labeled data, provides a cost-effective approach to addressing real-world challenges. The advent of Large Language Models (LLMs) has sparked interest in leveraging their pre-trained knowledge for few-shot tabular learning. Despite promising results, existing approaches either rely on test-time knowledge extraction, which introduces undesirable latency, or on text-level knowledge, which leads to unreliable feature engineering. To overcome these limitations, we propose Latte, a training-time knowledge extraction framework that transfers the latent prior knowledge within LLMs to optimize a more generalized downstream model. Latte enables general knowledge-guided downstream tabular learning, facilitating the weighted fusion of information across different feature values while reducing the risk of overfitting to limited labeled data. Furthermore, Latte is compatible with existing unsupervised pre-training paradigms and effectively utilizes available unlabeled samples to overcome the performance limitations imposed by an extremely small labeled dataset. Extensive experiments on various few-shot tabular learning benchmarks demonstrate the superior performance of Latte, establishing it as a state-of-the-art approach in this domain.
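The key distinction the abstract draws, training-time versus test-time knowledge extraction, can be sketched as a distillation loss that pulls a tabular encoder's representation toward a precomputed LLM embedding during training, so no LLM call is needed at inference. This is an illustrative cosine-distance objective under assumed shapes, not Latte's exact loss.

```python
import math

def distill_loss(tab_repr, llm_repr):
    """Illustrative training-time distillation objective (assumption, not
    Latte's exact loss): 1 - cosine similarity between the tabular model's
    representation and a precomputed LLM embedding of the same sample."""
    dot = sum(a * b for a, b in zip(tab_repr, llm_repr))
    norm_t = math.sqrt(sum(a * a for a in tab_repr))
    norm_l = math.sqrt(sum(b * b for b in llm_repr))
    return 1.0 - dot / (norm_t * norm_l)

# Perfectly aligned representations incur zero loss; orthogonal ones incur 1.
print(distill_loss([1.0, 0.0], [1.0, 0.0]))  # 0.0
print(distill_loss([1.0, 0.0], [0.0, 1.0]))  # 1.0
```

Because the LLM embeddings can be computed once offline, the per-sample inference cost is just the tabular model's forward pass, which is how a training-time framework avoids the latency of test-time LLM invocation.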