🤖 AI Summary
TabPFN v2 outperforms tree-based models on tabular benchmarks but suffers from quadratic computational complexity in sequence length, rendering it infeasible for contexts exceeding 10K tokens; existing compression techniques (e.g., KNN sampling) require task-specific preprocessing. To address this, we propose LCTabPFN—a training-free, preprocessing-free long-context adaptation method for TabPFN. Its core is a lightweight tiled-block attention mechanism that computes exact self-attention within blocks while maintaining linear-time scalability and full GPU compatibility. On the TabArena benchmark, LCTabPFN achieves substantial gains in long-context regimes, enabling TabPFN to scale to ~10K samples without accuracy degradation—marking the first such capability for this architecture. Crucially, it surpasses state-of-the-art tree models across all evaluated long-context settings, establishing a new performance frontier for transformer-based tabular learning.
📝 Abstract
TabPFN v2 achieves better results than tree-based models on several tabular benchmarks, which is notable since tree-based models are usually the strongest choice for tabular data. However, it cannot handle more than 10K context tokens because transformers have quadratic computation and memory costs.
Unlike existing approaches that rely on context compression, such as selecting representative samples via K-nearest neighbors (KNN), we introduce a extbf{tiled-block} strategy to compute attention within the TabPFN framework. This design is compatible with standard GPU setups and, to the best of our knowledge, is the first to enable TabPFN to extbf{process long contexts without any pre-processing}. We demonstrate the effectiveness of our approach on the standard TabArena benchmark.