TabICL: A Tabular Foundation Model for In-Context Learning on Large Data
Source
- Raw Markdown: paper_tabicl-2025.md
- PDF: paper_tabicl-2025.pdf
- Preprint: https://arxiv.org/abs/2502.05564
- Official code: https://github.com/soda-inria/tabicl
- Official checkpoint: https://huggingface.co/jingang/TabICL/blob/main/tabicl-classifier-v1-20250208.ckpt
Core Claim
TabICL shows that in-context learning can scale beyond small tabular classification tasks by first compressing each row into a fixed-dimensional embedding, then running dataset-level in-context prediction over those row embeddings.
Key Contributions
- Introduces a two-stage architecture: distribution-aware column-wise embedding followed by row-wise interaction builds fixed-dimensional row embeddings, after which a transformer performs dataset-wise in-context learning.
- Pretrains on synthetic structural-causal-model datasets, including a tree-based synthetic data generator, with curriculum learning up to 60K samples (a minimal SCM-style generator is sketched after this list).
- Uses hierarchical classification to extend inference beyond the 10-class limit used during pretraining (see the second sketch after this list).
- Reports faster inference than TabPFNv2 and stronger large-dataset results on TALENT classification datasets.
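The paper's pretraining data mixes SCM-based and tree-based generators; the sketch below only illustrates the SCM idea (a random DAG of nonlinear mechanisms with a quantile-binned target node) and all of its parameters are invented here, so treat it as a toy rather than the paper's pipeline.

```python
import numpy as np

def sample_scm_dataset(n_samples=1024, n_features=8, n_classes=3, seed=0):
    """Draw one synthetic classification table from a random SCM.

    Illustrative sketch, not the paper's generator: nodes are topologically
    ordered so the graph is a DAG, each node is a random nonlinearity of a
    weighted sum of its parents plus noise, and the target is the last node
    discretized by quantile binning.
    """
    rng = np.random.default_rng(seed)
    n_nodes = n_features + 1                      # last node becomes the label
    values = np.zeros((n_samples, n_nodes))
    activations = [np.tanh, np.sin, lambda z: np.maximum(z, 0.0)]
    for j in range(n_nodes):
        noise = rng.normal(0.0, 1.0, size=n_samples)
        if j == 0:
            values[:, j] = noise                  # root node: pure noise
        else:
            parents = rng.choice(j, size=min(j, 3), replace=False)
            weights = rng.normal(0.0, 1.0, size=len(parents))
            act = activations[rng.integers(len(activations))]
            values[:, j] = act(values[:, parents] @ weights) + 0.1 * noise
    X = values[:, :n_features]
    # Discretize the final node into roughly balanced classes.
    edges = np.quantile(values[:, -1], np.linspace(0, 1, n_classes + 1)[1:-1])
    y = np.digitize(values[:, -1], edges)
    return X, y

X, y = sample_scm_dataset()
print(X.shape, np.bincount(y))                    # (1024, 8) plus class counts
```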
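The hierarchical-classification idea can also be sketched generically. The paper's exact grouping scheme is not reproduced here; this sketch assumes a simple contiguous partition of the class labels and any scikit-learn-style classifier factory `make_clf` (a hypothetical name) that is capped at `max_classes` classes per call.

```python
import numpy as np

def hierarchical_predict_proba(make_clf, X_train, y_train, X_test, max_classes=10):
    """Extend a <=10-class in-context classifier to more classes.

    Illustrative sketch only: classes are partitioned into contiguous groups
    of at most `max_classes`; one model routes test rows to a group, per-group
    models refine within it, and probabilities combine via the chain rule
    P(c) = P(group) * P(c | group).
    """
    classes = np.unique(y_train)
    groups = [classes[i:i + max_classes] for i in range(0, len(classes), max_classes)]
    group_of = {c: g for g, cs in enumerate(groups) for c in cs}

    # Top level: predict which group each test row belongs to.
    top = make_clf()
    top.fit(X_train, np.array([group_of[c] for c in y_train]))
    p_group = top.predict_proba(X_test)           # (n_test, n_groups)

    proba = np.zeros((len(X_test), len(classes)))
    for g, cs in enumerate(groups):
        if len(cs) == 1:                          # degenerate one-class group
            proba[:, np.searchsorted(classes, cs[0])] = p_group[:, g]
            continue
        mask = np.isin(y_train, cs)
        sub = make_clf()
        sub.fit(X_train[mask], y_train[mask])
        p_sub = sub.predict_proba(X_test)         # columns follow sub.classes_
        for k, c in enumerate(sub.classes_):
            proba[:, np.searchsorted(classes, c)] = p_group[:, g] * p_sub[:, k]
    return proba
```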
Benchmarked Models
- TabICL-v1: the ICML 2025 classification model evaluated in the paper. The official code is released at https://github.com/soda-inria/tabicl, and the released checkpoint is tabicl-classifier-v1-20250208.ckpt.
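A hedged usage sketch, assuming the pip-installable `tabicl` package exposes a scikit-learn-compatible `TabICLClassifier` as the repo advertises; constructor arguments and checkpoint handling should be checked against the README before relying on this.

```python
# Assumption: `tabicl` provides a scikit-learn-style estimator; the dataset
# here is just a stand-in to make the example self-contained.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from tabicl import TabICLClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = TabICLClassifier()     # presumably loads the released checkpoint
clf.fit(X_train, y_train)    # ICL-style "fit": stores context, no gradient steps
print(clf.predict(X_test)[:10])
```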
Method Notes
TabICL is a tabular-data foundation model rather than a time-series model. Its context is a supervised training table, not a temporal history, and its predictions are class probabilities for test rows rather than future observations.
The model uses cell values as primitive units. A Set Transformer-style column encoder builds distribution-aware cell embeddings, a row-wise transformer aggregates feature interactions into row embeddings, and a final transformer combines row embeddings with labels for in-context classification.
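A shape-level sketch of that pipeline, using vanilla PyTorch attention and invented dimensions rather than the paper's modules; the real model's Set-Transformer column encoder, RoPE, and train/test attention masking are all omitted here.

```python
import torch
import torch.nn as nn

class TabICLSketch(nn.Module):
    """Shape-level sketch of the three-component pipeline (illustrative
    dimensions and vanilla attention, not the paper's exact modules)."""

    def __init__(self, d=64, n_classes=3):
        super().__init__()
        self.cell_embed = nn.Linear(1, d)                       # per-cell lift
        col = nn.TransformerEncoderLayer(d, 4, 4 * d, batch_first=True)
        self.col_encoder = nn.TransformerEncoder(col, 2)        # over a column's rows
        row = nn.TransformerEncoderLayer(d, 4, 4 * d, batch_first=True)
        self.row_encoder = nn.TransformerEncoder(row, 2)        # over a row's cells
        icl = nn.TransformerEncoderLayer(d, 4, 4 * d, batch_first=True)
        self.icl = nn.TransformerEncoder(icl, 2)                # over the table's rows
        self.label_embed = nn.Embedding(n_classes, d)
        self.head = nn.Linear(d, n_classes)

    def forward(self, X, y_train):
        n_train = y_train.shape[0]
        cells = self.cell_embed(X.unsqueeze(-1))            # (rows, cols, d)
        # 1) Column-wise: each column attends over its own rows, so cell
        #    embeddings can reflect that column's distribution.
        cells = self.col_encoder(cells.transpose(0, 1)).transpose(0, 1)
        # 2) Row-wise: attend across a row's cells, mean-pool to a row vector.
        rows = self.row_encoder(cells).mean(dim=1)          # (rows, d)
        # 3) Dataset-wise ICL: add label embeddings to training rows only,
        #    then let test rows attend to the labeled context (no masking here).
        rows = rows.clone()
        rows[:n_train] = rows[:n_train] + self.label_embed(y_train)
        out = self.icl(rows.unsqueeze(0)).squeeze(0)        # (rows, d)
        return self.head(out[n_train:])                     # test-row logits

X = torch.randn(100, 8)                 # 80 train rows + 20 test rows
y_train = torch.randint(0, 3, (80,))
logits = TabICLSketch()(X, y_train)
print(logits.shape)                     # torch.Size([20, 3])
```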
Evidence And Results
The paper evaluates on 200 TALENT classification datasets and focuses on 171 datasets with at most 10 classes for native comparison with TabPFNv2. It reports that TabICL is statistically comparable to TabPFNv2 overall, faster in training-plus-inference time, and stronger on the 53 datasets with more than 10K samples.
For scalability, the paper reports inference on tables up to 500K samples and 500 features with CPU and disk offloading, and practical 100K-sample inference with about 5GB GPU memory and 32GB RAM.
Limitations
TabICL is classification-only in the paper, breaks exact column permutation invariance because rotary positional embeddings (RoPE) in the row-wise stage impose an ordering on features, and relies on the TALENT benchmark's preprocessing and evaluation assumptions. It does not model temporal next-state dynamics or action-conditioned consequences.
Links Into The Wiki
- Tabular Foundation Models
- Time-Series Foundation Models
- Synthetic Data For Time Series
- Time-Series Benchmark Hygiene
- TabPFN-3
Open Questions
- Can TabICL-style row compression transfer to passive dynamics models or action-conditioned world models where context contains ordered trajectories?
- Which synthetic data generators best preserve useful priors when moving from static tabular data to multivariate time series?