TabPFN-3: Technical Report

Source

Core Claim

TabPFN-3 extends the TabPFN line from small tabular datasets toward larger static supervised tables, with a redesigned in-context architecture, synthetic-only pretraining prior, many-class support, inference optimizations, and downstream extensions for time-series forecasting, relational data, and tabular-text data.

Benchmarked Model Entries

  • TabPFN-3: open-weight base tabular foundation model for classification and regression.
  • TabPFN-3-Plus: API and enterprise variant with native tabular-text support.
  • TabPFN-3-Plus (Thinking): API variant using test-time compute scaling for stronger tabular predictions.
  • TabPFN-TS-3: specialized TabPFN-3 checkpoint for time-series forecasting benchmarks.

Key Contributions

  • Scales the TabPFN family to datasets with up to 1M training rows and 200 features, while also supporting other regimes, such as datasets with many features or many classes.
  • Uses a three-stage architecture: feature distribution embedding, row-wise feature aggregation, and row-embedding in-context learning (a minimal sketch follows this list).
  • Adds an attention-based many-class decoder, stronger preprocessing, improved synthetic structural-causal pretraining prior, row-chunked inference, and reduced KV caching.
  • Reports that a forward pass of TabPFN-3 sets a new TabArena performance standard, while TabPFN-3-Plus (Thinking) further improves results through test-time compute.
  • Extends the TabPFN ecosystem through TabPFN-TS-3 for time-series forecasting, RelBenchV1 relational-data evaluation, TabSTAR tabular-text evaluation, and faster SHAP-value computation through KV caching.
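
The report summary above names the three stages but not their implementation. The sketch below is one plausible PyTorch reading of that pipeline, assuming per-column distribution statistics for stage one, attention over features for stage two, and attention from query rows to labeled context rows for stage three. All module names, shapes, and layer choices are assumptions for illustration, not the released TabPFN-3 architecture, and the plain linear head stands in for the attention-based many-class decoder, which is not sketched.

```python
# Hedged sketch of a three-stage tabular in-context learner; all design
# choices here are assumptions, not the released TabPFN-3 architecture.
import torch
import torch.nn as nn


class ThreeStageTabularICL(nn.Module):
    def __init__(self, d_model=64, n_heads=4, n_classes=10):
        super().__init__()
        # Stage 1: embed each cell from its value plus simple per-column
        # distribution statistics (assumed here: column mean and std).
        self.cell_embed = nn.Linear(3, d_model)
        # Stage 2: aggregate a row's feature embeddings into one row embedding.
        self.feature_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.row_query = nn.Parameter(torch.randn(1, 1, d_model))
        # Stage 3: in-context learning over row embeddings; label embeddings are
        # added to training rows so query rows can attend to labeled context.
        self.label_embed = nn.Embedding(n_classes, d_model)
        self.icl_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, n_classes)  # placeholder for the many-class decoder

    def embed_rows(self, x, col_mean, col_std):
        # x: (rows, features); build (value, column mean, column std) per cell.
        cells = torch.stack([x, col_mean.expand_as(x), col_std.expand_as(x)], dim=-1)
        h = self.cell_embed(cells)                       # (rows, features, d)
        q = self.row_query.expand(h.shape[0], -1, -1)    # one learned query per row
        row, _ = self.feature_attn(q, h, h)              # (rows, 1, d)
        return row.squeeze(1)                            # (rows, d)

    def forward(self, x_train, y_train, x_test):
        col_mean, col_std = x_train.mean(0), x_train.std(0) + 1e-6
        train = self.embed_rows(x_train, col_mean, col_std) + self.label_embed(y_train)
        test = self.embed_rows(x_test, col_mean, col_std)
        # Query rows attend to the labeled training context; no gradient step.
        ctx, _ = self.icl_attn(test.unsqueeze(0), train.unsqueeze(0), train.unsqueeze(0))
        return self.head(ctx.squeeze(0))                 # (test_rows, n_classes)


model = ThreeStageTabularICL()
logits = model(torch.randn(128, 8), torch.randint(0, 10, (128,)), torch.randn(16, 8))
print(logits.shape)  # torch.Size([16, 10])
```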

Method Notes

The core model remains a static tabular-data foundation model. Its context is a supervised table of rows and labels, not an ordered temporal history, and ordinary feature columns should not be treated as events, exogenous variables, control inputs, or interventions by default.
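
A hedged usage sketch of that static supervised interface follows. Earlier TabPFN releases expose a scikit-learn-style classifier through the tabpfn package; whether TabPFN-3 keeps this exact package and class name is an assumption. The point of the sketch is the interface shape: "fit" stores a labeled table as context, and prediction is a forward pass over (context table, query rows), with no temporal ordering implied.

```python
# Hedged usage sketch assuming a scikit-learn-style interface like earlier
# TabPFN releases; the availability of TabPFN-3 under this API is an assumption.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier  # assumed package; rows are not an event sequence

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = TabPFNClassifier()           # context is a static supervised table
clf.fit(X_train, y_train)          # stores the context; no gradient-based training
proba = clf.predict_proba(X_test)  # in-context prediction for the query rows
print(proba.shape)
```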

TabPFN-TS-3 is the time-series-specific extension to track in this wiki. It matters because it shows that the TabPFN synthetic-prior and in-context-learning machinery can be specialized for passive forecasting, but those results should not be merged with static TabArena results or with action-conditioned world-model claims.
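
To make the distinction concrete, the sketch below shows one generic way a forecasting task can be cast as a static supervised table, in the spirit of the TabPFN-TS line: calendar features become columns, observed values become labels, and future timestamps become unlabeled query rows. The exact featurization used by TabPFN-TS-3 is not described here and everything in the sketch is an assumption for illustration.

```python
# Hedged sketch: casting a univariate series as a static supervised table.
# The featurization below is illustrative, not the TabPFN-TS-3 recipe.
import numpy as np
import pandas as pd


def series_to_table(values, timestamps, horizon=24):
    """Build (calendar-feature rows, targets) plus query rows for the horizon."""
    idx = pd.DatetimeIndex(timestamps)
    feats = pd.DataFrame({
        "hour": idx.hour, "dayofweek": idx.dayofweek, "month": idx.month,
        "t": np.arange(len(idx)),  # running index as a simple trend feature
    })
    future_idx = pd.date_range(idx[-1], periods=horizon + 1, freq=idx.freq or "h")[1:]
    future = pd.DataFrame({
        "hour": future_idx.hour, "dayofweek": future_idx.dayofweek,
        "month": future_idx.month, "t": np.arange(len(idx), len(idx) + horizon),
    })
    return feats, np.asarray(values), future


# The (X_context, y_context, X_query) triple can then go to any tabular
# in-context regressor; the model sees feature rows and labels, not an
# ordered temporal history.
ts = pd.date_range("2024-01-01", periods=200, freq="h")
X_ctx, y_ctx, X_query = series_to_table(np.sin(np.arange(200) / 24), ts)
print(X_ctx.shape, y_ctx.shape, X_query.shape)
```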

Evidence And Results

  • The report states that TabPFN-3 outperforms other TabArena models in a single forward pass and Pareto-dominates them on the speed/performance frontier under the reported protocol.
  • The report says TabPFN-3 beats 8-hour-tuned gradient-boosted-tree baselines on datasets up to 1M training rows and 200 features.
  • TabPFN-3-Plus (Thinking) is reported to beat non-TabPFN models by more than 200 Elo on standard TabArena and 420 Elo on the largest-data subset.
  • The report narrative places TabPFN-TS-3 second on fev-bench, behind Chronos-2.
  • The report claims up to 20x speedup over TabPFN-2.5 and 1M-row inference on a single H100 through reduced KV caching and row chunking (a generic sketch of the chunking idea follows this list).
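
The sketch below illustrates the general row-chunking idea referenced above: keys and values for the labeled context are computed once and cached, then query rows are scored in fixed-size chunks so peak memory stays bounded by the chunk size rather than the full query set. This is a generic illustration of the technique under assumed shapes, not the reported TabPFN-3 implementation.

```python
# Generic row-chunked attention over a cached context; an illustration of the
# idea only, not the TabPFN-3 inference code.
import torch


def chunked_icl_attention(q_rows, k_ctx, v_ctx, chunk_size=1024):
    """q_rows: (n_query, d); k_ctx, v_ctx: (n_context, d), computed once and reused."""
    scale = k_ctx.shape[-1] ** -0.5
    outputs = []
    for start in range(0, q_rows.shape[0], chunk_size):
        q = q_rows[start:start + chunk_size]               # one chunk of query rows
        attn = torch.softmax(q @ k_ctx.T * scale, dim=-1)  # (chunk, n_context)
        outputs.append(attn @ v_ctx)                       # reuse the cached K/V
    return torch.cat(outputs, dim=0)


# Context K/V are built a single time; only the query chunks stream through.
k_ctx, v_ctx = torch.randn(10_000, 64), torch.randn(10_000, 64)
out = chunked_icl_attention(torch.randn(4_096, 64), k_ctx, v_ctx)
print(out.shape)  # torch.Size([4096, 64])
```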

Limitations

  • This is a Prior Labs technical report rather than a peer-reviewed paper.
  • Several headline results depend on API or enterprise variants, especially TabPFN-3-Plus and TabPFN-3-Plus (Thinking), not only the open-weight base checkpoint.
  • The TABPFN-3.0 License v1.0 is permissive for research and internal evaluation, but production commercial use requires separate licensing.
  • The time-series result is from a specialized TabPFN-TS-3 checkpoint, so it should not be used as evidence that the base TabPFN-3 interface directly models temporal histories.
  • The model is not an action-conditioned world model: it predicts labels or forecasts from observed context, but it does not expose actions, control inputs, interventions, or counterfactual rollout channels as first-class interfaces.

Open Questions

  • Which parts of the TabPFN-3 synthetic structural-causal prior transfer to multivariate time-series forecasting without leaking benchmark-specific templates?
  • How should the wiki compare open-weight TabPFN-3, API TabPFN-3-Plus, Thinking mode, and TabPFN-TS-3 without collapsing different availability and adaptation modes?
  • Can PFN-style tabular context learning become action-conditioned if future work makes interventions, operator actions, or control inputs explicit in the context?