TabICL: A Tabular Foundation Model for In-Context Learning on Large Data
Source
- Raw Markdown: paper_tabicl-2025.md
- PDF: paper_tabicl-2025.pdf
- Preprint: https://arxiv.org/abs/2502.05564
- Official code: https://github.com/soda-inria/tabicl
- Official checkpoint: https://huggingface.co/jingang/TabICL/blob/main/tabicl-classifier-v1-20250208.ckpt
Core Claim
TabICL shows that in-context learning can scale beyond small tabular classification tasks by first compressing each row into a fixed-dimensional embedding, then running dataset-level in-context prediction over those row embeddings.
Key Contributions
- Introduces a two-stage architecture: distribution-aware column-wise embedding followed by row-wise interaction builds fixed-dimensional row embeddings, after which a transformer performs dataset-wise in-context learning.
- Pretrains on synthetic structural-causal-model datasets, including a tree-based synthetic data generator, with curriculum learning up to 60K samples (a minimal SCM-style generator is sketched after this list).
- Uses hierarchical classification to extend inference beyond the 10-class limit used during pretraining (see the second sketch after this list).
- Reports faster inference than TabPFNv2 and stronger large-dataset results on TALENT classification datasets.
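The paper's pretraining data mixes SCM-based and tree-based generators; the sketch below only illustrates the SCM idea (a random DAG of nonlinear mechanisms with a quantile-binned target node) and all of its parameters are invented here, so treat it as a toy rather than the paper's pipeline.

```python
import numpy as np

def sample_scm_dataset(n_samples=1024, n_features=8, n_classes=3, seed=0):
    """Draw one synthetic classification table from a random SCM.

    Illustrative sketch, not the paper's generator: nodes are topologically
    ordered so the graph is a DAG, each node is a random nonlinearity of a
    weighted sum of its parents plus noise, and the target is the last node
    discretized by quantile binning.
    """
    rng = np.random.default_rng(seed)
    n_nodes = n_features + 1                      # last node becomes the label
    values = np.zeros((n_samples, n_nodes))
    activations = [np.tanh, np.sin, lambda z: np.maximum(z, 0.0)]
    for j in range(n_nodes):
        noise = rng.normal(0.0, 1.0, size=n_samples)
        if j == 0:
            values[:, j] = noise                  # root node: pure noise
        else:
            parents = rng.choice(j, size=min(j, 3), replace=False)
            weights = rng.normal(0.0, 1.0, size=len(parents))
            act = activations[rng.integers(len(activations))]
            values[:, j] = act(values[:, parents] @ weights) + 0.1 * noise
    X = values[:, :n_features]
    # Discretize the final node into roughly balanced classes.
    edges = np.quantile(values[:, -1], np.linspace(0, 1, n_classes + 1)[1:-1])
    y = np.digitize(values[:, -1], edges)
    return X, y

X, y = sample_scm_dataset()
print(X.shape, np.bincount(y))                    # (1024, 8) plus class counts
```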
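The hierarchical-classification idea can also be sketched generically. The paper's exact grouping scheme is not reproduced here; this sketch assumes a simple contiguous partition of the class labels and any scikit-learn-style classifier factory `make_clf` (a hypothetical name) that is capped at `max_classes` classes per call.

```python
import numpy as np

def hierarchical_predict_proba(make_clf, X_train, y_train, X_test, max_classes=10):
    """Extend a <=10-class in-context classifier to more classes.

    Illustrative sketch only: classes are partitioned into contiguous groups
    of at most `max_classes`; one model routes test rows to a group, per-group
    models refine within it, and probabilities combine via the chain rule
    P(c) = P(group) * P(c | group).
    """
    classes = np.unique(y_train)
    groups = [classes[i:i + max_classes] for i in range(0, len(classes), max_classes)]
    group_of = {c: g for g, cs in enumerate(groups) for c in cs}

    # Top level: predict which group each test row belongs to.
    top = make_clf()
    top.fit(X_train, np.array([group_of[c] for c in y_train]))
    p_group = top.predict_proba(X_test)           # (n_test, n_groups)

    proba = np.zeros((len(X_test), len(classes)))
    for g, cs in enumerate(groups):
        if len(cs) == 1:                          # degenerate one-class group
            proba[:, np.searchsorted(classes, cs[0])] = p_group[:, g]
            continue
        mask = np.isin(y_train, cs)
        sub = make_clf()
        sub.fit(X_train[mask], y_train[mask])
        p_sub = sub.predict_proba(X_test)         # columns follow sub.classes_
        for k, c in enumerate(sub.classes_):
            proba[:, np.searchsorted(classes, c)] = p_group[:, g] * p_sub[:, k]
    return proba
```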
Benchmarked Models
- TabICL-v1: the ICML 2025 classification model evaluated in the paper. The official code is released at https://github.com/soda-inria/tabicl, and the released checkpoint is tabicl-classifier-v1-20250208.ckpt.
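A hedged usage sketch, assuming the pip-installable `tabicl` package exposes a scikit-learn-compatible `TabICLClassifier` as the repo advertises; constructor arguments and checkpoint handling should be checked against the README before relying on this.

```python
# Assumption: `tabicl` provides a scikit-learn-style estimator; the dataset
# here is just a stand-in to make the example self-contained.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from tabicl import TabICLClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = TabICLClassifier()     # presumably loads the released checkpoint
clf.fit(X_train, y_train)    # ICL-style "fit": stores context, no gradient steps
print(clf.predict(X_test)[:10])
```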
Method Notes
TabICL is a tabular-data foundation model rather than a time-series model. Its context is a supervised training table, not a temporal history, and its predictions are class probabilities for test rows rather than future observations.
The model uses cell values as primitive units. A Set Transformer-style column encoder builds distribution-aware cell embeddings, a row-wise transformer aggregates feature interactions into row embeddings, and a final transformer combines row embeddings with labels for in-context classification.
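A shape-level sketch of that pipeline, using vanilla PyTorch attention and invented dimensions rather than the paper's modules; the real model's Set-Transformer column encoder, RoPE, and train/test attention masking are all omitted here.

```python
import torch
import torch.nn as nn

class TabICLSketch(nn.Module):
    """Shape-level sketch of the three-component pipeline (illustrative
    dimensions and vanilla attention, not the paper's exact modules)."""

    def __init__(self, d=64, n_classes=3):
        super().__init__()
        self.cell_embed = nn.Linear(1, d)                       # per-cell lift
        col = nn.TransformerEncoderLayer(d, 4, 4 * d, batch_first=True)
        self.col_encoder = nn.TransformerEncoder(col, 2)        # over a column's rows
        row = nn.TransformerEncoderLayer(d, 4, 4 * d, batch_first=True)
        self.row_encoder = nn.TransformerEncoder(row, 2)        # over a row's cells
        icl = nn.TransformerEncoderLayer(d, 4, 4 * d, batch_first=True)
        self.icl = nn.TransformerEncoder(icl, 2)                # over the table's rows
        self.label_embed = nn.Embedding(n_classes, d)
        self.head = nn.Linear(d, n_classes)

    def forward(self, X, y_train):
        n_train = y_train.shape[0]
        cells = self.cell_embed(X.unsqueeze(-1))            # (rows, cols, d)
        # 1) Column-wise: each column attends over its own rows, so cell
        #    embeddings can reflect that column's distribution.
        cells = self.col_encoder(cells.transpose(0, 1)).transpose(0, 1)
        # 2) Row-wise: attend across a row's cells, mean-pool to a row vector.
        rows = self.row_encoder(cells).mean(dim=1)          # (rows, d)
        # 3) Dataset-wise ICL: add label embeddings to training rows only,
        #    then let test rows attend to the labeled context (no masking here).
        rows = rows.clone()
        rows[:n_train] = rows[:n_train] + self.label_embed(y_train)
        out = self.icl(rows.unsqueeze(0)).squeeze(0)        # (rows, d)
        return self.head(out[n_train:])                     # test-row logits

X = torch.randn(100, 8)                 # 80 train rows + 20 test rows
y_train = torch.randint(0, 3, (80,))
logits = TabICLSketch()(X, y_train)
print(logits.shape)                     # torch.Size([20, 3])
```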
Evidence And Results
The paper evaluates on 200 TALENT classification datasets and focuses on 171 datasets with at most 10 classes for native comparison with TabPFNv2. It reports that TabICL is statistically comparable to TabPFNv2 overall, faster in training-plus-inference time, and stronger on the 53 datasets with more than 10K samples.
For scalability, the paper reports inference on tables up to 500K samples and 500 features with CPU and disk offloading, and practical 100K-sample inference with about 5GB GPU memory and 32GB RAM.
Limitations
TabICL is classification-only in the paper, breaks exact column permutation invariance because rotary positional embeddings (RoPE) in the row-wise stage impose an ordering on features, and relies on the TALENT benchmark's preprocessing and evaluation assumptions. It does not model temporal next-state dynamics or action-conditioned consequences.
Links Into The Wiki
- Tabular Foundation Models
- Time-Series Foundation Models
- Synthetic Data For Time Series
- Time-Series Benchmark Hygiene
- TabPFN-3
Open Questions
- Can TabICL-style row compression transfer to passive dynamics models or action-conditioned world models where context contains ordered trajectories?
- Which synthetic data generators best preserve useful priors when moving from static tabular data to multivariate time series?