Two in context learning tasks with complex functions

📅 2025-02-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates the function-approximation capability and in-context learning (ICL) generalization of small, attention-only Transformers (no feed-forward networks). Can such minimal architectures approximate arbitrary polynomials and continuous functions, and generalize beyond their training tasks? Method: a lightweight architecture trained via function sampling and supervised ICL on diverse mathematical tasks, including polynomials of arbitrary degree and locating the zeros of complex functions, with shared training/test distributions. Theory: attention-only models are proven to uniformly approximate any polynomial, and hence any continuous function, on compact domains. Experiments: the model significantly outperforms GPT-4 on unseen polynomial families and zero-finding tasks. Contribution: the first rigorous universal-approximation guarantees for small-scale pure-attention models; the results reveal strong in-distribution generalization alongside a fundamental limitation: the models cannot abstract function-class structure, so generalization is strictly confined to the support of the training distribution.
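The function-sampling setup described in the summary, drawing a random polynomial and presenting (x, f(x)) pairs in context followed by a query point, can be sketched as below. The function and parameter names, the coefficient distribution, and the sampling ranges are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def sample_polynomial_icl_prompt(degree=3, n_context=20,
                                 x_range=(-1.0, 1.0), seed=None):
    """Sample one supervised ICL task: context pairs (x_i, f(x_i)) plus a query.

    Hypothetical sketch; the paper's sampling scheme may differ.
    """
    rng = np.random.default_rng(seed)
    coeffs = rng.normal(size=degree + 1)            # random polynomial f
    xs = rng.uniform(*x_range, size=n_context + 1)  # context inputs + query
    ys = np.polyval(coeffs, xs)                     # targets f(x_i)
    # First n_context pairs are the in-context examples; the last x is the
    # query whose value the model must predict.
    context = list(zip(xs[:-1], ys[:-1]))
    query_x, target_y = xs[-1], ys[-1]
    return context, query_x, target_y

context, qx, qy = sample_polynomial_icl_prompt(degree=4, n_context=16, seed=0)
```

At train time such prompts would be serialized into the model's input sequence, with the loss taken on the prediction for the query point; test tasks are drawn from the same distribution, matching the shared training/test distributions noted above.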

📝 Abstract
We examine two in-context learning (ICL) tasks with mathematical functions in several train and test settings for transformer models. Our study generalizes work on linear functions by showing that small transformers, even models with attention layers only, can approximate arbitrary polynomial functions, and hence continuous functions, under certain conditions. Our models can also approximate previously unseen classes of polynomial functions, as well as the zeros of complex functions. They perform far better on these tasks than LLMs like GPT-4 and exhibit complex reasoning when provided with suitable training data and methods. Our models also have important limitations: they fail to generalize outside their training distributions and so do not learn the class forms of functions. We explain why this is so.
Problem

Research questions and friction points this paper is trying to address.

Analyzing transformer models in ICL tasks
Approximating polynomial and continuous functions
Limitations in generalizing beyond training data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformers approximate polynomial functions
Models outperform GPT-4 on these tasks
Limited generalization outside training data