🤖 AI Summary
Few-shot tabular learning suffers from poor generalization under label scarcity, unreliable text-based feature engineering, and high inference latency when LLMs must be invoked at test time. To address these issues, this paper proposes a *training-time implicit knowledge distillation framework*, enabling the first directed transfer of latent priors from large language models (LLMs) to tabular models. The method comprises four key components: (i) implicit-space knowledge distillation, (ii) feature-value weighted fusion, (iii) LLM-tabular joint representation alignment, and (iv) semi-supervised optimization, which supports both unsupervised pretraining and unlabeled-data augmentation while eliminating test-time LLM dependency entirely. Across multiple few-shot tabular benchmarks, the approach achieves state-of-the-art performance; under extreme settings (≤5 labeled samples per class) it is markedly more robust than text-prompting and test-time knowledge-extraction baselines.
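To make component (ii) concrete, here is a minimal sketch of what feature-value weighted fusion could look like: per-feature-value embeddings are combined into a single row representation using softmax-normalized relevance scores. The function name, the use of softmax, and the idea that the scores come from LLM priors are illustrative assumptions, not the paper's exact formulation.

```python
import math

def weighted_fusion(value_embeddings, llm_scores):
    """Fuse per-feature-value embeddings (a list of equal-length vectors)
    into one row representation, weighted by a softmax over LLM-derived
    relevance scores (hypothetical sketch, not the paper's exact method)."""
    # Numerically stable softmax over the scores.
    m = max(llm_scores)
    exps = [math.exp(s - m) for s in llm_scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum across features, dimension by dimension.
    dim = len(value_embeddings[0])
    return [sum(w * vec[d] for w, vec in zip(weights, value_embeddings))
            for d in range(dim)]

# Two features with 2-dim embeddings; equal scores reduce to a plain average.
fused = weighted_fusion([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0])
print(fused)  # [2.0, 3.0]
```

With unequal scores, features the LLM deems more relevant dominate the fused representation, which is the intuition behind knowledge-guided fusion under label scarcity.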
📝 Abstract
Few-shot tabular learning, in which machine learning models are trained with a limited amount of labeled data, provides a cost-effective approach to addressing real-world challenges. The advent of Large Language Models (LLMs) has sparked interest in leveraging their pre-trained knowledge for few-shot tabular learning. Despite promising results, existing approaches either rely on test-time knowledge extraction, which introduces undesirable latency, or on text-level knowledge, which leads to unreliable feature engineering. To overcome these limitations, we propose Latte, a training-time knowledge extraction framework that transfers the latent prior knowledge within LLMs to optimize a more generalized downstream model. Latte enables general knowledge-guided downstream tabular learning, facilitating the weighted fusion of information across different feature values while reducing the risk of overfitting to limited labeled data. Furthermore, Latte is compatible with existing unsupervised pre-training paradigms and effectively utilizes available unlabeled samples to overcome the performance limitations imposed by an extremely small labeled dataset. Extensive experiments on various few-shot tabular learning benchmarks demonstrate the superior performance of Latte, establishing it as a state-of-the-art approach in this domain.
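The key distinction the abstract draws, training-time versus test-time knowledge extraction, can be sketched as a distillation loss that pulls a tabular encoder's representation toward a precomputed LLM embedding during training, so no LLM call is needed at inference. This is an illustrative cosine-distance objective under assumed shapes, not Latte's exact loss.

```python
import math

def distill_loss(tab_repr, llm_repr):
    """Illustrative training-time distillation objective (assumption, not
    Latte's exact loss): 1 - cosine similarity between the tabular model's
    representation and a precomputed LLM embedding of the same sample."""
    dot = sum(a * b for a, b in zip(tab_repr, llm_repr))
    norm_t = math.sqrt(sum(a * a for a in tab_repr))
    norm_l = math.sqrt(sum(b * b for b in llm_repr))
    return 1.0 - dot / (norm_t * norm_l)

# Perfectly aligned representations incur zero loss; orthogonal ones incur 1.
print(distill_loss([1.0, 0.0], [1.0, 0.0]))  # 0.0
print(distill_loss([1.0, 0.0], [0.0, 1.0]))  # 1.0
```

Because the LLM embeddings can be computed once offline, the per-sample inference cost is just the tabular model's forward pass, which is how a training-time framework avoids the latency of test-time LLM invocation.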