---
bibliography:
- bib.bib
---

```{=latex}
\newcommand{\field}[2]{%
  \par\noindent
  {\small\color{black!55}\textsc{#1}}\quad #2\par\addvspace{2pt}}
```
```{=latex}
\newenvironment{redblock}{\par\begingroup\color{red}}{\endgroup\par}
```
```{=latex}
\newcommand{\ourmodeltwofive}{\mbox{TabPFN-2.5}\xspace}
```
```{=latex}
\newcommand{\ourmodel}{\mbox{TabPFN-3}\xspace}
```
```{=latex}
\newcommand{\ourmodelenhanced}{\mbox{TabPFN-3-Plus} (Thinking)\xspace}
```
```{=latex}
\newcommand{\ourmodelplus}{\mbox{TabPFN-3-Plus}\xspace}
```
```{=latex}
\newcommand{\rtzero}{$\text{RT}_{\text{zero}}$}
```
```{=latex}
\newcommand{\err}{\operatorname{err}}
```
```{=latex}
\newcommand{\besterr}{\operatorname{best\_err}}
```
```{=latex}
\vspace*{-1cm}
```
```{=latex}
\begin{tcolorbox}

\vspace{-0.35em}\hspace*{-0.2cm}\includegraphics[width=0.2\linewidth]{figures/prior-logo.png}

{\LARGE\bfseries TabPFN-3: Technical Report\par}

\vspace{0.35em}
{\large \hyperref[app:contributors]{Prior Labs Team}}  (see Appendix \ref{app:contributors} for the list of contributors)


\vspace{1.5em}

Tabular data underpins most high-value prediction problems in science and industry, and TabPFN has driven the foundation model revolution for this modality. Designed with feedback from our users, \ourmodel builds on this foundation to scale state-of-the-art performance to datasets with 1M training rows and substantially reduce training and inference time.
Pretrained exclusively on synthetic data from our prior, \ourmodel dramatically pushes the frontier of tabular prediction and brings substantial gains on time series, relational, and tabular-text data.

\vspace{-1.0em}
\paragraph{A new performance standard.}
On the standard tabular benchmark TabArena, a forward pass of \ourmodel outperforms all other models, including tuned and ensembled baselines, by a significant margin, and Pareto-dominates the speed/performance frontier. \ourmodel also scales to more diverse datasets: it ranks first on datasets with many classes, and beats 8-hour-tuned gradient-boosted-tree baselines on datasets with up to 1M training rows and 200 features.


\vspace{-1.0em}
\paragraph{Thinking mode.}
\ourmodel introduces test-time compute scaling to tabular foundation models. Our API offering \ourmodelenhanced exploits this to beat all non-TabPFN models by over 200 Elo on the standard TabArena benchmark (rising to 420 Elo on the largest data subset) and to outperform AutoGluon 1.5 extreme in less than a tenth of its runtime, without using LLMs, real data, internet search, or any other model besides TabPFN.

\vspace{-1.0em}
\paragraph{Broader capabilities.}
TabPFN-3 extends the capabilities of our models, enabling SOTA prediction on many-class datasets, relational data (new SOTA foundation model on RelBenchV1), and tabular-text datasets (SOTA on TabSTAR via \ourmodelplus).
It also directly improves existing integrations of TabPFN:
a specialized \ourmodel checkpoint,
TabPFN-TS-3, ranks 2$^{\text{nd}}$ on the time-series benchmark fev-bench,
and SHAP-value computation through \texttt{shapiq} is up to $120\times$ faster with KV caching.

\vspace{-1.0em}
\paragraph{An enterprise-ready model.} \ourmodel achieves this performance while being up to 20x faster than \ourmodeltwofive. In addition, a reduced KV cache and row-chunking enable scaling to 1M rows on a single H100 with fast inference.


\vspace{.5em}

We release TabPFN-3 under the \texttt{TABPFN-3.0 License v1.0}, permissive for research and internal evaluation. \ourmodelenhanced is available via API and enterprise licensing including on-prem and VPC environments (AWS SageMaker, Azure AI Foundry).

\vspace{0.5em}

  \field{Date}{May 12, 2026}
   \field{License}{\texttt{TABPFN-3.0 License v1.0} (see Section \ref{sec:license} for details)}
  \field{Docs}{\url{https://docs.priorlabs.ai}}

\end{tcolorbox}
```
```{=latex}
\centering
```
![**Performance on the TabArena benchmark [@erickson2025tabarena], largest data subset (10k-100k samples)**. TabPFN-3 outperforms all other models in a forward pass. `\ourmodelenhanced `{=latex}is dramatically better still, outperforming AutoGluon 1.5 extreme [@autogluon_tabular], a complex ensemble of models tuned for 4 hours, while being 10x faster.](figures/tabarena_v3/Medium/tuning-impact-elo.png "fig:"){#fig:tabarena-hero-plot width="\\linewidth"} `\vspace{-0.5cm}`{=latex}

```{=latex}
\newpage
```
`\setcounter{tocdepth}{2}`{=latex} `\tableofcontents`{=latex} `\newpage`{=latex}

```{=latex}
\centering
```
```{=latex}
\vspace*{-6.5cm}
```
```{=latex}
\centering
```
![**Tuning trajectories on TabArena-medium** (10k--100k rows) for a curated set of the strongest models on TabArena. See Appendix `\ref{sec:tabarena_leaderboard_tables}`{=latex} for the full results.](figures/tabarena_v3/Medium/tuning_trajectories/pareto_n_configs_imp_total.png){#fig:tabarena_tuning_medium width="\\linewidth"}

```{=latex}
\hfill
```
```{=latex}
\centering
```
![**Pairwise win rates on TabArena-medium** (10k--100k rows) for a curated set of the strongest models on TabArena. See Appendix `\ref{sec:tabarena_leaderboard_tables}`{=latex} for the full results.](figures/tabarena_v3/Medium/winrate_matrix.png){#fig:tabarena_winrate_medium width="\\linewidth"}

Introduction
============

Tabular data sits at the core of operational decision-making across science and industry, including clinical risk prediction [@henry2015targeted; @johnson2016mimic; @ophir2020deep], credit scoring [@abdou2011credit; @khandani2010consumer; @lessmann2015benchmarking], predictive maintenance [@carvalho2019systematic; @dalzochio2020machine], and scientific measurement [@baldi2014searching; @dunn2020benchmarking]. While gradient-boosted trees were the reliable default for decades [@shwartz2022tabular; @grinsztajn2022tree; @salinas2024tabrepo], tabular foundation models have displaced them as the strongest predictors on standard small-to-medium-sized benchmarks over the last year [@erickson2025tabarena].

Earlier TabPFN releases established and extended this paradigm. TabPFN v1 [@hollmann2022tabpfnv1] showed that a transformer pretrained on synthetic tasks could approximate Bayesian inference in a single forward pass, though only on a thousand rows of clean numerical data. TabPFN v2 [@Hollmann2025tabpfnv2] scaled this to 10,000-row datasets with categorical features, missing values, and outliers, becoming the first tabular foundation model to outperform tuned gradient-boosted trees on standard benchmarks. TabPFN-2.5 [@TabPFN-2.5] extended the strong performance to 100,000 rows and 2,000 features and matched four-hour-tuned ensembles in a single forward pass. Across these releases, an active research ecosystem of extensions grew on top of the core model -- domains include time-series forecasting [@hoo2024tabpfn_ts], causal inference [@robertson_dopfn; @balazadeh_causalpfn; @feuerriegel_causalfm], Bayesian optimization [@Yu2025GITBO], graph learning [@Hayler2025GraphsTablesZeroShot; @eremeev2025turningtabularfoundationmodels], interpretability [@rundel2024interpretable; @ye2026closer], reinforcement learning [@Schiff2025TabPFNRL] -- with over 200 published applications (see Appendix `\ref{app:use_cases}`{=latex}) and more than three million PyPI downloads.

TabPFN-3 is shaped by feedback from users and the entire ecosystem. To remove common bottlenecks, we scaled beyond a hundred thousand rows to one million rows, cut the memory and latency of inference at scale, added support for many-class classification, and honed our calibrated predictive distributions in a single forward pass. Furthermore, we carefully designed the TabPFN-3 model and training process to lift performance on both core tabular prediction and the many downstream extensions built on top of the open-source model, in particular time-series forecasting, multi-table relational data, and interpretability.

The remainder of this report describes the architecture, prior, and inference-time optimizations of TabPFN-3 (Section `\ref{sec:methods}`{=latex}); evaluates its performance on public and internal benchmarks across classification, regression, many-class, time-series, and relational data (Section `\ref{sec:results}`{=latex}); surveys the adoption and ecosystem the model is built for (Section `\ref{sec:usecases_extensions}`{=latex}); and details licensing and availability (Section `\ref{sec:license}`{=latex}). Appendices provide architectural hyperparameters, prior visualizations, additional internal benchmarks, more detailed benchmark results, and an extensive list of published TabPFN use cases. For installation and usage, see <https://docs.priorlabs.ai/>.

```{=latex}
\centering
```
```{=latex}
\begin{subtable}[b]{0.47\textwidth}
    \centering
    \small
    \begin{tabular}{lrrrr}
      \toprule
      \multirow{2}{*}{Model} & \multirow{2}{*}{Rows} & \multirow{2}{*}{Features}
        & \multicolumn{2}{c}{Parameters} \\
      \cmidrule(lr){4-5}
      & & & Clf. & Reg. \\
      \midrule
      TabPFN-v1  & $1{,}000$       & $100$      & $26$\,M & ---       \\
      TabPFN-v2  & $10{,}000$      & $500$      & $7$\,M  & $11$\,M \\
      TabPFN-2.5 & $100{,}000$     & $2{,}000$  & $11$\,M & $10$\,M \\
      TabPFN-2.6 & $100{,}000$     & $2{,}000$  & $11$\,M & $13$\,M \\
      \midrule
      \multirow{3}{*}{TabPFN-3}
        & $1{,}000{,}000$ & $200$      & \multirow{3}{*}{$53$\,M} & \multirow{3}{*}{$58$\,M} \\
        & $100{,}000$     & $2{,}000$  & & \\
        & $1{,}000$       & $20{,}000$ & & \\
      \bottomrule
    \end{tabular}
    \vspace{1.8cm}
    \caption{\textbf{Overview of TabPFN releases}, the maximal numbers of rows and features at which each yields state-of-the-art performance, and their parameter counts.
    TabPFN-v1 supports classification datasets only.}
    \label{tab:tabpfn-variants}
  \end{subtable}
```
```{=latex}
\hfill
```
```{=latex}
\centering
```
![**TabPFN-3 delivers a significant improvement over TabPFN-2.5**. We report per-dataset scores on TabArena. The normalization procedure is described in Section `\ref{app:methodology}`{=latex}.](figures/hero_plots/latest_results_per_split_all__TabPFN-3_default_vs_v2p5.png){#fig:tabpfn3-vs-v2p5 width="0.8\\linewidth"}

TabPFN-3 {#sec:methods}
========

TabPFN-3 comes with a new architecture (Section `\ref{sec:arch_overview}`{=latex}), including an attention-based many-class decoder (Section `\ref{sec:many-class-decoder}`{=latex}), an improved preprocessing pipeline (Section `\ref{sec:preprocessing}`{=latex}), inference-time optimizations that enable scaling to one million rows on a single GPU (Section `\ref{sec:inference-optimization}`{=latex}), and an improved synthetic SCM prior used for pre-training (Section `\ref{sec:synthetic-prior}`{=latex}). We also introduce the API and enterprise offerings `\ourmodelplus`{=latex}, which handles text in tables natively, and `\ourmodelenhanced`{=latex}, which applies test-time compute for dramatically improved performance (Section `\ref{sec:tabpfn3plus}`{=latex}).

Architecture {#sec:arch_overview}
------------

An overview of TabPFN-3's full architecture is shown in `\Cref{fig:arch_diagram}`{=latex}. TabPFN-3 introduces a substantially redesigned architecture that scales in-context learning to datasets with one million rows.\
TabPFN v1 [@hollmann2022tabpfnv1] used a transformer architecture to perform in-context learning (ICL) on embeddings of entire rows. TabPFN-2.x (v2, v2.5, v2.6) [@Hollmann2025tabpfnv2; @TabPFN-2.5] used a transformer architecture that alternates row-wise and feature-wise attention layers; this improves performance, but becomes prohibitively expensive as the dataset size grows. TabPFN-3 returns to TabPFN v1's ICL for embeddings of entire rows. It builds on the two-stage row-compression design introduced by @qu2025tabicl [@qu2026tabiclv2] in the TabICL architecture, which uses a column-wise feature embedding layer followed by row-wise feature aggregation to obtain the row representation that is used in a TabPFN v1-like ICL layer.

Before entering the two compression stages, we group features, similar to TabPFN-2.x, while adopting TabICLv2's group assignment [@qu2026tabiclv2], which creates triplets by grouping each feature with two cyclically shifted neighbors. Each triplet is mapped to the hidden dimension of the model by a learned linear projection (cell embedding), and target-aware embeddings are added to the cell embeddings of training rows [@qu2026tabiclv2].

The resulting grouped feature embeddings are processed by the following three stages (a minimal shape-level sketch follows the list):

-   **Stage 1: Feature distribution embedding (column-wise).** Each feature column is embedded independently using a transformer with an efficient inducing-point attention mechanism. This avoids the quadratic cost of full cross-row attention while still capturing column-level statistics at arbitrary dataset scales.

-   **Stage 2: Feature aggregation (row-wise).** For each data point, a set of learned [cls]{.smallcaps} tokens and the feature embeddings of that row attend to one another via non-causal attention, allowing cross-feature information to be distilled into a fixed number of vectors. Concatenating the [cls]{.smallcaps} tokens' hidden states yields a single, fixed-dimensional embedding per row, decoupling the subsequent in-context learning stage from the number of input features.

-   **Stage 3: In-context learning.** The row embeddings for the training and test sets are jointly passed to a transformer that performs in-context learning: training-row embeddings attend to one another to capture relationships within the training set, while test-row embeddings attend to training-row embeddings to produce predictions. Because each data point is now a single vector, this stage operates on a sequence proportional only to the number of rows, enabling efficient scaling to large datasets.
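To make the tensor flow concrete, the following is a minimal, hedged PyTorch sketch of the three stages. All dimensions are illustrative, and the stock full-attention encoder layers are stand-ins for the model's inducing-point and QASSMax attention; the sketch only shows how the feature axis is collapsed before the ICL stage.

```python
import torch
import torch.nn as nn

N, C, d, n_cls = 1024, 12, 64, 4   # rows, feature groups, width, CLS tokens

cells = torch.randn(N, C, d)       # grouped cell embeddings (illustrative)

# Stage 1 (column-wise): each of the C columns is processed independently,
# attending across its N rows (stand-in for inducing-point attention).
stage1 = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
cols = stage1(cells.permute(1, 0, 2))                     # (C, N, d)

# Stage 2 (row-wise): CLS tokens attend together with each row's feature
# embeddings; their concatenated states give one fixed-size vector per row.
cls = torch.randn(1, n_cls, d).expand(N, -1, -1)
row_seq = torch.cat([cls, cols.permute(1, 0, 2)], dim=1)  # (N, n_cls + C, d)
stage2 = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
rows = stage2(row_seq)[:, :n_cls].reshape(N, n_cls * d)   # (N, 4d)

# Stage 3 (ICL) operates on `rows`, whose sequence length depends only on
# the number of rows N, independent of the number of features.
print(rows.shape)                  # torch.Size([1024, 256])
```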

In Stages 1 and 3, and in the many-class decoder (introduced below), every attention layer applies the query-aware scalable softmax (QASSMax) [@qu2026tabiclv2], itself inspired by SSMax [@ssmax_vanilla], which rescales attention queries as a function of input length, improving length generalization of in-context learning to large training sets. Detailed architectural hyperparameters are provided in Appendix `\ref{app:architecture-hyperparams}`{=latex}.
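The core idea can be sketched in a few lines, assuming the SSMax-style formulation of scaling queries by the logarithm of the context length; the exact query-aware functional form and learned scales in QASSMax may differ.

```python
import math
import torch
import torch.nn.functional as F

def length_scaled_attention(q, k, v, s: float = 1.0):
    # q, k, v: (..., seq, head_dim). `s` stands in for a learned per-head
    # scale; the value here is a placeholder, not a trained parameter.
    n = k.shape[-2]                  # context length, e.g. training rows
    q = q * (s * math.log(n))        # keep softmax sharp as n grows
    return F.scaled_dot_product_attention(q, k, v)

out = length_scaled_attention(torch.randn(4, 100, 16),
                              torch.randn(4, 5000, 16),
                              torch.randn(4, 5000, 16))   # (4, 100, 16)
```

Without such rescaling, a fixed query's attention spreads over ever more keys as the training set grows and the softmax flattens; the $\log n$ factor counteracts this.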

```{=latex}
\centering
```
```{=latex}
\adjustbox{width=0.75\linewidth}{
    \begin{tikzpicture}[
    >=Stealth,
    font=\small,
    cell/.style={
      rectangle, draw=teal!80, thick, fill=teal!20,
      minimum width=0.55cm, minimum height=0.38cm,
      anchor=center, inner sep=1pt
    },
    cellEmpty/.style={
      rectangle, draw=teal!80, thick, fill=white,
      minimum width=0.55cm, minimum height=0.38cm,
      anchor=center, inner sep=1pt
    },
    cellRed/.style={
      rectangle, draw=red!80, thick, fill=red!20,
      minimum width=0.55cm, minimum height=0.38cm,
      anchor=center, inner sep=1pt
    },
    nanCell/.style={
      rectangle, draw=red!80, thick, fill=red!15,
      minimum width=0.30cm, minimum height=0.38cm,
      anchor=center, inner sep=1pt, font=\tiny
    },
    block/.style={
      rectangle, draw=black, thick, fill=white,
      minimum width=6.0cm, minimum height=0.95cm,
      align=center, inner sep=4pt
    },
    smallblock/.style={
      rectangle, draw=black, thick, fill=white,
      minimum width=2.2cm, minimum height=0.50cm,
      align=center, font=\scriptsize, inner sep=2pt
    },
    decblock/.style={
      rectangle, draw=violet!60!black, thick, fill=violet!8,
      minimum width=2.6cm, minimum height=0.55cm,
      align=center, font=\footnotesize, inner sep=3pt
    },
    sidelabel/.style={
      align=left, font=\scriptsize\itshape, text=black!60,
      anchor=west, inner sep=2pt
    },
    novelty/.style={
      rectangle, draw=red!80, thick, fill=red!15,
      rounded corners=2pt, align=left,
      inner sep=4pt, font=\scriptsize, anchor=west
    },
    arrow/.style={-Stealth, thick, black},
    bluearrow/.style={-Stealth, thick, black},
    purplearrow/.style={-Stealth, thick, black},
    thinarrow/.style={-Stealth, thin, black},
    plus/.style={
      circle, draw=black, thick, inner sep=0.5pt,
      minimum size=0.40cm, font=\footnotesize, fill=white
    },
    embedbar/.style={
      rectangle, draw=teal!80, thick, fill=teal!20,
      minimum width=0.26cm, minimum height=0.55cm,
      anchor=center, inner sep=0pt
    },
    embedbarY/.style={
      rectangle, draw=orange!70!black, thick, fill=orange!25,
      minimum width=0.26cm, minimum height=0.55cm,
      anchor=center, inner sep=0pt
    },
    indvec/.style={
      rectangle, draw=blue!70, thick, fill=cyan!25,
      minimum width=0.26cm, minimum height=0.50cm,
      anchor=center, inner sep=0pt
    },
    tensorbox/.style={
      rectangle, draw=teal!80, very thick, fill=teal!8,
      minimum width=2.6cm, minimum height=0.60cm,
      align=center, font=\footnotesize
    },
    tensorboxV/.style={
      rectangle, draw=violet!60!black, very thick, fill=violet!6,
      minimum width=2.6cm, minimum height=0.60cm,
      align=center, font=\footnotesize
    }
  ]

  \definecolor{cA}{RGB}{ 70, 165, 195}
  \definecolor{cB}{RGB}{110, 175,  95}
  \definecolor{cX}{RGB}{225, 145,  60}
  \definecolor{cD}{RGB}{145, 105, 175}
  \colorlet{cA2}{cA!65!blue}
  \colorlet{cB2}{cB!65!teal}
  \colorlet{cX2}{cX!70!yellow}
  \colorlet{cD2}{cD!65!magenta}

  \pgfmathsetmacro{\orthXshift}{4.7}

  \def\stageW{6.8cm}


  \matrix (rawtbl) [
    matrix of nodes,
    nodes={cell, minimum height=0.32cm},
    row 3/.style={nodes={minimum height=0.15cm, inner sep=0pt}},
    column sep=-\pgflinewidth, row sep=-\pgflinewidth,
    nodes in empty cells,
  ] at (0, 7.1)
  {
    {\tiny $x_{11}$} & {\tiny $x_{12}$} & {\tiny $\cdots$} & {\tiny $x_{1m}$} \\
    {\tiny $x_{21}$} & {\tiny \textcolor{red!80}{Inf}} & {\tiny $\cdots$} & {\tiny \textcolor{red!80}{NaN}} \\
    {\tiny \raisebox{2pt}{\scalebox{1}[0.6]{$\vdots$}}} & {\tiny \raisebox{2pt}{\scalebox{1}[0.6]{$\vdots$}}} & {\tiny \raisebox{2pt}{\scalebox{1}[0.6]{$\ddots$}}} & {\tiny \raisebox{2pt}{\scalebox{1}[0.6]{$\vdots$}}} \\
  };
  \node[anchor=west, font=\scriptsize] at (rawtbl.east) {Input $X \in \mathbb{R}^{N \times C}$};

  \coordinate (mainSplit) at (0, 6.30);
  \draw[thick, solid] (rawtbl.south) -- (mainSplit);

  \tikzset{
    preprocstep/.style={
      align=center, font=\scriptsize, inner ysep=1pt, inner xsep=2pt,
      draw=groupGrayEdge, fill=white, rounded corners=2pt,
      minimum width=4.7cm, minimum height=0.30cm
    }
  }

  \coordinate (grayPanelTop) at (0, 5.65);

  \node[preprocstep] (stdblk) at (0, 5.30)
  {Mean-Imputation~\&~Standardize};

  \node[preprocstep, below=2pt of stdblk] (group)
  {Feature Grouping \tiny circular shifts $(0,1,3)$};

  \node[preprocstep, below=2pt of group] (linear)
  {Cell Embedding, Linear$(\,3{+}3 \to d\,)$};

  \node[anchor=south, font=\footnotesize\bfseries, text=black,
    draw=groupGrayEdge, thick, rounded corners=2pt,
    fill=groupGray, inner xsep=3pt, inner ysep=1pt] (grayTitle)
    at (grayPanelTop) {Feature Grouping \& Cell Embedding};

  \begin{scope}[on background layer]
    \node[fit=(grayPanelTop)(stdblk)(group)(linear), minimum width=\stageW,
      fill=groupGray, draw=groupGrayEdge, thick,
      rounded corners=4pt, inner xsep=10pt, inner ysep=4pt] (grayGroup) {};
  \end{scope}

  \draw[arrow] (mainSplit) -- (grayTitle.north);

  \node[nanCell, fill=red!10] (nm1) at (4.0, 6.30) {0};
  \node[nanCell, fill=red!25] (nm2) at (4.3, 6.30) {1};
  \node[nanCell, fill=red!10] (nm3) at (4.6, 6.30) {0};
  \node[anchor=south, font=\scriptsize, text=red!70!black]
    at (4.3, 6.55) {NaN/Inf mask};

  \draw[arrow] (mainSplit) -- (nm1.west);
  \draw[arrow] (nm2.south) |- (linear.east);

  \coordinate (addPos) at (0, 3.75);
  \node[plus, fill=orange!25, draw=orange!70!black] (taeplus) at (addPos) {$+$};

  \begin{scope}[shift={(\orthXshift, 3.75)}]
    \foreach \i/\col in {0/cA, 1/cB, 2/cX, 3/cD} {
      \filldraw[draw=black!40, line width=0.15pt, fill=\col]
      (-0.42, {0.18 - \i*0.09}) rectangle (0.42, {0.10 - \i*0.09});
    }
    \node[draw=blue!60!black, thick, rounded corners=2pt,
    minimum width=1.05cm, minimum height=0.50cm, inner sep=0pt]
    (taeTable) at (0, 0) {};
    \node[anchor=east, font=\scriptsize, text=black] at (1.8, 0) {$\leftarrow\!y_i \in \mathcal{Y}$};
    \node[anchor=south, font=\tiny, text=black, align=center,
    inner sep=0.5pt]
    at (0, 0.25) {Label Embedding (trainable orth.)};
  \end{scope}

  \draw[bluearrow] (taeTable.west) -- (taeplus.east);
  \draw[thick, black] (linear.south) -- (taeplus.north);

  \begin{scope}[yshift=0.6cm]
    \begin{scope}[yshift=1.1cm]
      \coordinate (colCenter) at (0, 0.45);

      \node[anchor=south, font=\footnotesize\bfseries, text=black,
        draw=groupCyanEdge, thick, rounded corners=2pt,
      fill=groupCyan, inner xsep=3pt, inner ysep=1pt] (cyanTitle)
      at ($(colCenter) + (0, 0.95)$)
      {Feature Embedding \tiny{(col-wise)}};
      \node[draw=groupCyanEdge, thick, fill=white, rounded corners=2pt,
        font=\scriptsize\bfseries, text=black,
        inner sep=2pt, anchor=north east]
      at (3.35, 1.49) {$\times\,3$};

      \foreach \g in {2, 1} {
        \pgfmathsetmacro{\dx}{\g * 0.16}
        \pgfmathsetmacro{\dy}{\g * 0.10}
        \pgfmathsetmacro{\sc}{0.90 - \g * 0.13}
        \pgfmathsetmacro{\tealpct}{int(40 * \sc)}
        \pgfmathsetmacro{\cyanpct}{int(40 * \sc)}
        \pgfmathsetmacro{\tealdraw}{int(70 * \sc)}
        \pgfmathsetmacro{\bluedraw}{int(60 * \sc)}
        \foreach \r in {0, 1, 2, 3, 4} {
          \pgfmathsetmacro{\yc}{0.95 - \r * 0.24 + \dy}
          \pgfmathsetmacro{\xc}{-1.50 + \dx}
          \filldraw[draw=teal!\tealdraw, line width=0.3pt, fill=teal!\tealpct,
          rounded corners=1pt]
          (\xc - 0.13, \yc - 0.10) rectangle (\xc + 0.13, \yc + 0.10);
        }
        \foreach \i in {0, 1, 2} {
          \pgfmathsetmacro{\yi}{0.71 - \i * 0.26 + \dy}
          \pgfmathsetmacro{\xi}{0.00 + \dx}
          \filldraw[draw=blue!\bluedraw, line width=0.3pt, fill=cyan!\cyanpct,
          rounded corners=1pt]
          (\xi - 0.13, \yi - 0.11) rectangle (\xi + 0.13, \yi + 0.11);
        }
        \foreach \r in {0, 1, 2, 3, 4} {
          \pgfmathsetmacro{\yc}{0.95 - \r * 0.24 + \dy}
          \pgfmathsetmacro{\xc}{1.45 + \dx}
          \filldraw[draw=teal!\tealdraw, line width=0.3pt, fill=teal!\tealpct,
          rounded corners=1pt]
          (\xc - 0.13, \yc - 0.10) rectangle (\xc + 0.13, \yc + 0.10);
        }
      }

      \foreach \r in {0, 1, 2, 3, 4} {
        \pgfmathsetmacro{\yc}{0.95 - \r * 0.24}
        \filldraw[draw=teal!80, line width=0.5pt, fill=teal!45, rounded corners=1pt]
        (-1.63, \yc - 0.10) rectangle (-1.37, \yc + 0.10);
      }
      \node[anchor=south, font=\tiny, text=black]
      at (-1.50, 1.10) {$N$ rows};

      \foreach \i in {0, 1, 2} {
        \pgfmathsetmacro{\yi}{0.71 - \i * 0.26}
        \filldraw[draw=blue!70, line width=0.5pt, fill=cyan!40, rounded corners=1pt]
        (-0.13, \yi - 0.11) rectangle (0.13, \yi + 0.11);
      }
      \coordinate(inducingSouth) at (0, 0.19);

      \node[anchor=south, font=\tiny, text=black, align=center]
      at (0.00, 0.94) {$K$ inducing};

      \foreach \r in {0, 1, 2, 3, 4} {
        \pgfmathsetmacro{\yc}{0.95 - \r * 0.24}
        \filldraw[draw=teal!80, line width=0.5pt, fill=teal!45, rounded corners=1pt]
        (1.32, \yc - 0.10) rectangle (1.58, \yc + 0.10);
      }
      \node[anchor=south, font=\tiny, text=black]
      at (1.45, 1.10) (colEmbOutRows) {$N$ rows};

      \coordinate (bcAnchor) at (1.60, -0.15);
      \begin{scope}[semithick, black!60]
        \draw ($(bcAnchor) + (-0.08, -0.05)$) -- ($(bcAnchor) + (-0.104, -0.012)$);
        \draw ($(bcAnchor) + (-0.08, -0.05)$) -- ($(bcAnchor) + (0.24, 0.15)$);
        \draw ($(bcAnchor) + (0.24, 0.15)$) -- ($(bcAnchor) + (0.216, 0.188)$);
        \draw ($(bcAnchor) + (0.24, 0.15)$) -- ($(bcAnchor) + (0.40, 0.25)$);
        \draw ($(bcAnchor) + (0.40, 0.25)$) -- ($(bcAnchor) + (0.376, 0.288)$);
        \node[anchor=south, font=\tiny, text=black, rotate=32] at ($(bcAnchor) + (0.3, -0.15)$)  {chunk};
      \end{scope}

      \coordinate (bcolAnchor) at (-1.30, -0.15);
      \begin{scope}[semithick, black!60]
        \draw ($(bcolAnchor) + (-0.08, -0.05)$) -- ($(bcolAnchor) + (-0.104, -0.012)$);
        \draw ($(bcolAnchor) + (-0.08, -0.05)$) -- ($(bcolAnchor) + (0.24, 0.15)$);
        \draw ($(bcolAnchor) + (0.24, 0.15)$) -- ($(bcolAnchor) + (0.40, 0.25)$);
        \draw ($(bcolAnchor) + (0.40, 0.25)$) -- ($(bcolAnchor) + (0.376, 0.288)$);
        \node[anchor=south, font=\tiny, text=black, rotate=32] at ($(bcolAnchor) + (0.3, -0.15)$)  {columns};
      \end{scope}

      \coordinate (funnelA) at (-0.75, 0.45);
      \begin{scope}[every path/.style={
        -, thin, draw=black, opacity=0.55, line cap=round}]
        \foreach \rowy in {0.95, 0.71, 0.47, 0.23, -0.01} {
          \draw (-1.35, \rowy) -- (funnelA);
        }
      \end{scope}
      \begin{scope}[every path/.style={
        -Stealth, thin, draw=black, opacity=0.75, line cap=round}]
        \foreach \indy in {0.71, 0.45, 0.19} {
          \draw (funnelA) -- (-0.15, \indy);
        }
      \end{scope}

      \coordinate (funnelB) at (0.75, 0.45);
      \begin{scope}[every path/.style={
        -, thin, draw=black, opacity=0.55, line cap=round}]
        \foreach \indy in {0.71, 0.45, 0.19} {
          \draw (0.15, \indy) -- (funnelB);
        }
      \end{scope}
      \begin{scope}[every path/.style={
        -Stealth, thin, draw=black, opacity=0.75, line cap=round}]
        \foreach \rowy in {0.95, 0.71, 0.47, 0.23, -0.01} {
          \draw (funnelB) -- (1.30, \rowy);
        }
      \end{scope}

      \node[anchor=west, font=\scriptsize, text=black, align=left,
      text width=2.9cm]
      at (3.70, 0.45)
      {applied per column\\in parallel
        \tikz[baseline=-0.5ex]{\draw[dash pattern={on 2pt off 1pt},black,thin](0,0)--(9pt,0);}
      \\ or chunked~\tikz[baseline=-0.5ex]{\draw[dotted,black,thick](0,0)--(8pt,0);}};

      \node[fit={(-2.05, -0.25) (2.05, 1.40)}, inner sep=0pt] (tfcol) {};
    \end{scope}

    \begin{pgfonlayer}{midground}
      \draw[arrow] (taeplus.south) -- (taeplus.south |- cyanTitle.north);
    \end{pgfonlayer}


    \begin{scope}[yshift=-0.4cm]
      \coordinate (featRow) at (0, -1.60);

      \foreach \g in {2, 1} {
        \pgfmathsetmacro{\dx}{\g * 0.16}
        \pgfmathsetmacro{\dy}{\g * 0.10}
        \pgfmathsetmacro{\sc}{0.45 - \g * 0.13}
        \pgfmathsetmacro{\bluepct}{int(100 * \sc)}
        \pgfmathsetmacro{\tealpct}{int(65 * \sc)}
        \pgfmathsetmacro{\bluedraw}{int(120 * \sc)}
        \pgfmathsetmacro{\tealdraw}{int(120 * \sc)}
        \foreach \k in {0, 1, 2, 3} {
          \pgfmathsetmacro{\xc}{-2.30 + \k * 0.43 + \dx}
          \filldraw[draw=blue!\bluedraw, line width=0.3pt, fill=blue!\bluepct,
          rounded corners=0.5pt]
          (\xc, -1.85 + \dy) rectangle (\xc + 0.36, -1.35 + \dy);
        }
        \foreach \k in {0, 1, 2, 3, 4, 5} {
          \pgfmathsetmacro{\xf}{-0.40 + \k * 0.43 + \dx}
          \filldraw[draw=teal!\tealdraw, line width=0.3pt, fill=teal!\tealpct,
          rounded corners=0.5pt]
          (\xf, -1.85 + \dy) rectangle (\xf + 0.36, -1.35 + \dy);
        }
      }

      \foreach \k in {0, 1, 2, 3} {
        \pgfmathsetmacro{\xc}{-2.30 + \k * 0.43}
        \filldraw[draw=blue!70!black, line width=0.4pt, fill=blue!40,
        rounded corners=0.5pt]
        (\xc, -1.85) rectangle (\xc + 0.36, -1.35);
        \node[font=\tiny\bfseries, text=white] at (\xc + 0.18, -1.60) {C};
      }
      \foreach \k in {0, 1, 2, 3, 4, 5} {
        \pgfmathsetmacro{\xf}{-0.40 + \k * 0.43}
        \filldraw[draw=teal!80, line width=0.3pt, fill=teal!25,
        rounded corners=0.5pt]
        (\xf, -1.85) rectangle (\xf + 0.36, -1.35);
      }

      \coordinate (brAnchor) at (2.15, -1.88);
      \begin{scope}[semithick, black!60]
        \draw ($(brAnchor) + (-0.08, -0.05)$) -- ($(brAnchor) + (-0.104, -0.012)$);
        \draw ($(brAnchor) + (-0.08, -0.05)$) -- ($(brAnchor) + (0.24, 0.15)$);
        \draw ($(brAnchor) + (0.24, 0.15)$) -- ($(brAnchor) + (0.216, 0.188)$);
        \draw ($(brAnchor) + (0.24, 0.15)$) -- ($(brAnchor) + (0.40, 0.25)$);
        \draw ($(brAnchor) + (0.40, 0.25)$) -- ($(brAnchor) + (0.376, 0.288)$);
        \node[anchor=south, font=\tiny, text=black, rotate=32] at ($(brAnchor) + (0.3, -0.15)$)  {chunk};
      \end{scope}

      \draw[decorate, decoration={brace, mirror, amplitude=3pt}, black]
      (-0.42, -1.87) -- (2.13, -1.87);
      \node[anchor=north, font=\scriptsize, text=black]
      at (0.85, -1.82) {\tiny{columns}};
    \end{scope}

    \begin{scope}[yshift=0.15cm]
      \node[anchor=south, font=\footnotesize\bfseries, text=black,
        draw=groupGreenEdge, thick, rounded corners=2pt,
      fill=groupGreen, inner xsep=3pt, inner ysep=1pt] (greenTitle)
      at (0, -1.05) {Feature Aggregation \tiny{(row-wise)}};
      \node[draw=groupGreenEdge, thick, fill=white, rounded corners=2pt,
        font=\scriptsize\bfseries, text=black,
        inner sep=2pt, anchor=north east]
      at (3.35, -0.96) {$\times\,3$};

      \coordinate (bowAnchor) at (-0.30, -1.15);
      \begin{scope}[every path/.style={
            -Stealth, thick, draw=black, opacity=0.55,
        line cap=round}]
        \foreach \k in {0, 1, 2, 3} {
          \pgfmathsetmacro{\xc}{-2.12 + \k * 0.43}
          \draw (bowAnchor) to [out=180,in=70 - 4 * \k] (\xc, -1.75);
        }
        \foreach \k in {0, 1, 2, 3, 4, 5} {
          \pgfmathsetmacro{\xf}{-0.22 + \k * 0.43}
          \draw (bowAnchor)to [out=0,in=100 + 4 * \k] (\xf, -1.75);
        }

      \end{scope}

      \node[anchor=west, font=\scriptsize, text=black, align=left,
      text width=2.7cm]
      at (3.70, -1.80)
      {applied per row\\
      in parallel \\ or chunked\\ (recomputing column embeddings)};

      \node[fit={(-2.55, -1.05) (2.55, -2.50)}, inner sep=0pt] (tfrow) {};


      \pgfmathsetmacro{\yc}{-3.05}
      \filldraw[draw=teal!85!black, line width=0.4pt, fill=teal!30, rounded corners=0.5pt] (-0.88, {\yc - 0.21}) rectangle (-0.88 + 0.43 * 3 + 0.36 + 0.1, {\yc + 0.21});
      \foreach \k in {0,1,2,3} {
        \pgfmathsetmacro{\xc}{-0.83 + \k * 0.43}
        \filldraw[draw=blue!70!black, line width=0.4pt, fill=blue!40,
        rounded corners=0.5pt]
        (\xc, {\yc - 0.14}) rectangle (\xc + 0.36, {\yc + 0.14});
        \node[font=\tiny\bfseries, text=white]
        at (\xc + 0.18, \yc) {C};
      }
      \node[anchor=west, font=\scriptsize, text=black, align=left]
      at (0.95, \yc)
      {4 CLS per row\\flatten $\to 4d$};

      \draw[arrow] (-1.475, -1.85) |- (-0.88, \yc);

      \coordinate (addPos) at (0, -3.60);
      \node[plus, fill=orange!25, draw=orange!70!black] (iclplus) at (addPos) {$+$};

      \draw[thick, black] (0, \yc -0.2) -- (iclplus.north);

      \begin{scope}[shift={(\orthXshift, -3.60)}]
        \foreach \i/\col in {0/cA2, 1/cB2, 2/cX2, 3/cD2} {
          \filldraw[draw=black!40, line width=0.15pt, fill=\col]
          (-0.42, {0.18 - \i*0.09}) rectangle (0.42, {0.10 - \i*0.09});
        }
        \node[draw=blue!70!black, thick, rounded corners=2pt,
        minimum width=1.05cm, minimum height=0.50cm, inner sep=0pt]
        (iclTable) at (0, 0) {};
        \node[anchor=east, font=\scriptsize, text=black] at (1.8, 0) {$\leftarrow\!y_i \in \mathcal{Y}$};
        \node[anchor=south, font=\tiny, text=black, align=center]
        at (0, 0.28) {Label Embedding (trainable orth.)};
      \end{scope}

      \draw[bluearrow] (iclTable.west) -- (iclplus.east);
      \begin{scope}[yshift=0.4cm]
        \def\iclTitleY{-4.65}      %
        \def\iclTrainYC{-5.35}     %
        \def\iclTestYC{-6.05}      %
        \def\iclBarHH{0.20}        %
        \def\iclBarHW{0.40}        %
        \def\iclBotY{-6.35}        %

        \foreach \i/\xc in {1/-1.50, 2/-0.50, 3/0.50, 4/1.50} {
          \filldraw[draw=groupOrangeEdge!75, line width=0.4pt,
            fill=groupOrangeEdge!22, rounded corners=0.5pt]
            (\xc - \iclBarHW, \iclTrainYC - \iclBarHH)
            rectangle
            (\xc + \iclBarHW, \iclTrainYC + \iclBarHH);
          \node[font=\scriptsize, text=black, inner sep=0pt]
            at (\xc, \iclTrainYC)
            {$h^{\mathrm{train}}_{\i}$};
          \coordinate (trBar\i N) at (\xc, \iclTrainYC + \iclBarHH);
          \coordinate (trBar\i S) at (\xc, \iclTrainYC - \iclBarHH);
        }
        \foreach \i/\xc in {1/-0.50, 2/0.50} {
          \filldraw[draw=groupOrangeEdge!90, line width=0.4pt,
            fill=groupOrangeEdge!42, rounded corners=0.5pt]
            (\xc - \iclBarHW, \iclTestYC - \iclBarHH)
            rectangle
            (\xc + \iclBarHW, \iclTestYC + \iclBarHH);
          \node[font=\scriptsize, text=black, inner sep=0pt]
            at (\xc, \iclTestYC)
            {$h^{\mathrm{test}}_{\i}$};
          \coordinate (teBar\i N) at (\xc, \iclTestYC + \iclBarHH);
          \coordinate (teBar\i S) at (\xc, \iclTestYC - \iclBarHH);
        }

        \coordinate (iclBowAnchor) at (0, -4.85);
        \begin{scope}[every path/.style={
          -Stealth, thick, draw=black, opacity=0.55, line cap=round}]
          \foreach \k/\xc in {0/-1.50, 1/-0.50} {
            \pgfmathsetmacro{\inAng}{70 - 12 * \k}
            \draw (iclBowAnchor) to[out=180, in=\inAng]
              (\xc, {\iclTrainYC + \iclBarHH});
          }
          \foreach \k/\xc in {0/0.50, 1/1.50} {
            \pgfmathsetmacro{\inAng}{110 + 12 * \k}
            \draw (iclBowAnchor) to[out=0, in=\inAng]
              (\xc, {\iclTrainYC + \iclBarHH});
          }
        \end{scope}
        \node[anchor=west, font=\scriptsize, text=black, align=left]
          at (3.70, -5.10)
          {train$\,\leftrightarrow\,$train\\\textit{multi-head self-attn}};

        \begin{scope}[every path/.style={
          -Stealth, thin, black, line cap=round}]
          \foreach \teX in {-0.50, 0.50} {
            \foreach \trX in {-1.50, -0.50, 0.50, 1.50} {
              \draw[opacity=0.65]
                (\teX, \iclTestYC + \iclBarHH)
                -- (\trX, \iclTrainYC - \iclBarHH);
            }
          }
        \end{scope}
        \node[anchor=west, font=\scriptsize, text=black, align=left]
          at (3.70, -6.00)
          {test$\,\rightarrow\,$train\\\textit{multi-query cross-attn}};

        \node[anchor=south, font=\footnotesize\bfseries, text=black,
          draw=groupOrangeEdge, thick, rounded corners=2pt,
          fill=groupOrange, inner xsep=3pt, inner ysep=1pt] (orangeTitle)
          at (0, \iclTitleY)
          {In-Context Learning};
        \node[draw=groupOrangeEdge, thick, fill=white, rounded corners=2pt,
          font=\scriptsize\bfseries, text=black, inner sep=2pt,
          anchor=north east]
          at (3.35, -4.56) {$\times\,24$};

        \node[fit={(-2.00, \iclBotY) (2.00, \iclTitleY)},
          inner sep=0pt] (tficl) {};

        \begin{pgfonlayer}{midground}
          \draw[arrow] (iclplus.south) -- (orangeTitle.north);
        \end{pgfonlayer}


        \coordinate (decTop) at (0, -7.40);

        \def\trainx{-1.55}
        \def\testx{1.20}
        \def\rowdy{0.44}
        \def\rowytop{-7.60}

        \node[anchor=south, font=\footnotesize\bfseries, text=black,
          draw=groupPurpleEdge, thick, rounded corners=2pt,
        fill=groupPurple, inner xsep=3pt, inner ysep=1pt] (purpleTitle)
        at (0, -7.40) {Many-Class Decoder};

        \begin{pgfonlayer}{midground}
          \draw[arrow] (0, {\iclBotY - 0.14}) -- (purpleTitle.north);
        \end{pgfonlayer}

        \foreach \i/\cls/\letter/\angA/\angB in {
          1/cA/0/30/60,
          2/cB/1/120/150,
          3/cX/2/200/230,
          4/cA/0/10/40
        } {
          \pgfmathsetmacro{\rowy}{\rowytop - (\i - 1) * \rowdy}
          \node[anchor=east, font=\scriptsize] at ($(\trainx - 0.28, \rowy)$)
          {$h^{\mathrm{train}}_{\i}$};
          \filldraw[draw=\cls!60!black, line width=0.4pt, fill=\cls, rounded corners=1.5pt]
          ($(\trainx - 0.18, \rowy - 0.15)$) rectangle
          ($(\trainx + 0.48, \rowy + 0.15)$);
          \node[font=\scriptsize\bfseries, text=white, anchor=center]
          at ($(\trainx + 0.15, \rowy)$) {\letter};
          \foreach \k/\ang in {0/\angA, 1/\angB} {
            \pgfmathsetmacro{\cx}{\trainx + 0.68 + \k*0.32}
            \filldraw[draw=violet!50!black, line width=0.3pt, fill=violet!15]
            ($(\cx - 0.14, \rowy - 0.17)$) rectangle
            ($(\cx + 0.14, \rowy + 0.17)$);
            \draw[-{Stealth[length=3pt, width=2.4pt]},
              line width=0.4pt, violet!55!black]
              ($(\cx, \rowy) + ({0.09*cos(\ang+180)}, {0.09*sin(\ang+180)})$) --
              ($(\cx, \rowy) + ({0.09*cos(\ang)},     {0.09*sin(\ang)})$);
          }
          \coordinate (tr\i) at ($(\trainx + 1.14, \rowy)$);
        }

        \def\testytop{-7.90}
        \def\testdy{0.88}
        \foreach \i/\angA/\angB in {1/25/55, 2/140/170} {
          \pgfmathsetmacro{\rowy}{\testytop - (\i - 1) * \testdy}
          \foreach \k/\ang in {0/\angA, 1/\angB} {
            \pgfmathsetmacro{\cx}{\testx + \k*0.32 - 0.16}
            \filldraw[draw=violet!60!black, line width=0.3pt, fill=violet!22]
            ($(\cx - 0.14, \rowy - 0.17)$) rectangle
            ($(\cx + 0.14, \rowy + 0.17)$);
            \draw[-{Stealth[length=3pt, width=2.4pt]},
              line width=0.4pt, violet!65!black]
              ($(\cx, \rowy) + ({0.09*cos(\ang+180)}, {0.09*sin(\ang+180)})$) --
              ($(\cx, \rowy) + ({0.09*cos(\ang)},     {0.09*sin(\ang)})$);
          }
          \node[anchor=west, font=\scriptsize] at ($(\testx + 0.32, \rowy)$)
          {$h^{\mathrm{test}}_{\i}$};
          \coordinate (te\i) at ($(\testx - 0.30, \rowy)$);
          \coordinate (teR\i) at ($(\testx + 0.85, \rowy)$);
        }

        \draw[cA, line width=1.4pt, opacity=0.85] (te1) -- (tr1);
        \draw[cA, line width=1.0pt, opacity=0.65] (te1) -- (tr4);
        \draw[cB, line width=1.3pt, opacity=0.80] (te2) -- (tr2);
        \draw[cX, line width=1.0pt, opacity=0.70] (te2) -- (tr3);

        \node[anchor=west, font=\scriptsize, text=black, align=left]
          at (3.70, -7.95)
          {Attention-weighted\\ average of one-hot \\ encoded labels $(y_i)$};

        \foreach \i/\wA/\wB/\wC/\wD in {
          1/1.00/0.00/0.00/0.00,
          2/0.00/0.60/0.40/0.00
        } {
          \pgfmathsetmacro{\rowy}{\testytop - (\i - 1) * \testdy}
          \pgfmathsetmacro{\xstart}{\testx + 1.00}
          \pgfmathsetmacro{\barH}{0.12}
          \pgfmathsetmacro{\barW}{0.75}
          \pgfmathsetmacro{\sA}{\wA * \barW}
          \pgfmathsetmacro{\sB}{\wB * \barW}
          \pgfmathsetmacro{\sC}{\wC * \barW}
          \pgfmathsetmacro{\sD}{\wD * \barW}
          \filldraw[fill=cA, draw=black!40, line width=0.2pt]
          (\xstart, \rowy - \barH) rectangle (\xstart + \sA, \rowy + \barH);
          \pgfmathsetmacro{\xa}{\xstart + \sA}
          \filldraw[fill=cB, draw=black!40, line width=0.2pt]
          (\xa, \rowy - \barH) rectangle (\xa + \sB, \rowy + \barH);
          \pgfmathsetmacro{\xb}{\xa + \sB}
          \filldraw[fill=cX, draw=black!40, line width=0.2pt]
          (\xb, \rowy - \barH) rectangle (\xb + \sC, \rowy + \barH);
          \pgfmathsetmacro{\xc}{\xb + \sC}
          \filldraw[fill=cD, draw=black!40, line width=0.2pt]
          (\xc, \rowy - \barH) rectangle (\xc + \sD, \rowy + \barH);
        }
        \node[anchor=south, font=\scriptsize\bfseries, text=black, align=center]
        at ($(\testx + 1.40, \testytop + 0.2)$) {$p(y\!\mid\!h^{\mathrm{test}}_i)$};

        \node[anchor=north, font=\scriptsize, align=center] at (0, -9.55)
        {$\hat{y} \in \mathbb{R}^{N_\text{test} \times |\mathcal{Y}|}$ \,(class logits, many-class classification)};
        \draw[arrow]
        (0, -9.20) -- (0, -9.55);

        \node[fit={(\trainx - 1.50, -7.40) (\trainx - 0.50, -9.05)},
        inner sep=0pt] (purpleAnchorL) {};
        \node[fit={(\testx + 0.85, -7.40) (\testx + 1.85, -9.05)},
        inner sep=0pt] (purpleAnchorR) {};

      \end{scope}

      \def\stageW{6.8cm}
      \begin{scope}[on background layer]
        \node[fit=(tfcol), minimum width=\stageW,
          fill=groupCyan, draw=groupCyanEdge, thick,
        rounded corners=4pt, inner xsep=10pt, inner ysep=4pt] (cyanGroup) {};

        \node[fit=(tfrow), minimum width=\stageW,
          fill=groupGreen, draw=groupGreenEdge, thick,
        rounded corners=4pt, inner xsep=10pt, inner ysep=4pt] (greenGroup) {};

        \node[fit=(tficl), minimum width=\stageW,
          fill=groupOrange, draw=groupOrangeEdge, thick,
        rounded corners=4pt, inner xsep=10pt, inner ysep=4pt] (orangeGroup) {};

        \node[fit=(purpleAnchorL)(purpleAnchorR), minimum width=\stageW,
          fill=groupPurple, draw=groupPurpleEdge, thick,
        rounded corners=4pt, inner xsep=10pt, inner ysep=6pt] (purpleGroup) {};
      \end{scope}
    \end{scope}

    \begin{pgfonlayer}{midground}
      \node[plus, fill=orange!25, draw=orange!70!black] (cyansw) at (0, -0.05) {};
      \draw (0, -0.05) node[spdt, scale=0.3] {};
      \draw[arrow, dotted] (inducingSouth.south -| cyansw) -- node[right, pos=0.65, font=\tiny, text=black] {$3\mkern-6mu\times\mkern-6muC\mkern-6mu\times\mkern-6muK$} (cyansw.north);
      \draw[arrow, dash pattern={on 2pt off 1pt}] let
      \p1 = (colEmbOutRows |- cyanGroup.south),
      \p2 = (cyansw.east)
      in
      (\p1) -- node[right, pos=0.5, font=\tiny, text=black] {$C\mkern-6mu\times\mkern-6muN$} (\x1, \y2) -- (\p2);
      \draw[arrow] (cyansw.south) -- (greenTitle.north);
      \draw[dotted, thick] (linear.west) -- ++(-1.2, 0) coordinate (lhsCorner) -- (lhsCorner |- {$(cyansw.west) + (0.2, 0.4)$}) -- node[above=4pt, left, font=\tiny, text=black] {$(3\!+\!3)\mkern-6mu\times\mkern-6muC$} ($(cyansw.west) + (0.2, 0.4)$);

    \end{pgfonlayer}
  \end{scope}
\end{tikzpicture}
}
```
TabPFN-3 introduces several architectural innovations on top of the three-stage architecture:

-   **Attention-based many-class decoder.** For classification, the fixed-width MLP output head of previous TabPFN versions is replaced with an attention-based retrieval decoder that treats class prediction as soft nearest-neighbor retrieval over the in-context training set. The decoder is non-parametric in the class count, enabling native support for an arbitrary number of classes. A detailed description is given in Section `\ref{sec:many-class-decoder}`{=latex}.

-   **Row-chunking.** A two-phase inference scheme that decouples peak GPU activation memory from dataset size (rows $\times$ columns), while producing outputs equivalent to the unchunked computation: we precompute the distribution embedder's inducing-vector summary once over the full training set, then stream rows through feature embedding and column aggregation in fixed-size chunks that reuse this cached summary as their attention key/value set. See Section `\ref{sec:row_chunking}`{=latex} for more details.

-   **Reduced KV cache via multi-query attention.** In the ICL transformer, test-row queries attend to train-row keys and values using a single KV head (multi-query attention), while train rows retain full multi-head attention. This reduces the per-estimator KV cache to approximately 7 GiB for datasets of one million rows, enabling ultra-fast inference on common GPUs. This is described in detail in Section `\ref{sec:kv_cache}`{=latex}.

-   **Orthogonal target embeddings.** Training labels are encoded with learned embeddings initialized via orthogonal decomposition, providing near-maximally separated class representations at the start of training and improving gradient flow in the many-class regime.

-   **RMSNorm.** All normalization layers use RMSNorm in place of the layer normalization used in `\ourmodeltwofive`{=latex}. RMSNorm omits the mean-centering term, reducing compute while preserving training stability.

-   **Native missing-value handling.** For each cell that is `NaN`, TabPFN-3 computes a binary indicator and concatenates it with the cell value before embedding (a minimal sketch follows this list). The model therefore receives an explicit signal about missing data and can condition its predictions accordingly, rather than relying on upstream imputation.
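The missing-value pathway, sketched below under illustrative dimensions, mirrors the Linear$(3{+}3 \to d)$ cell embedding of the architecture diagram: three grouped cell values plus their three NaN indicators are projected jointly, so missingness enters the model as a signal rather than an imputed guess.

```python
import torch
import torch.nn as nn

d = 64                                               # hidden width (illustrative)
X = torch.tensor([[0.5, float("nan"), 1.2]])         # one feature triplet

mask = torch.isnan(X).float()                        # explicit NaN indicator
X_filled = torch.nan_to_num(X, nan=0.0)              # value channel, NaN -> 0

cell_embed = nn.Linear(3 + 3, d)                     # 3 values + 3 indicators
z = cell_embed(torch.cat([X_filled, mask], dim=-1))  # (1, d) cell embedding
```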

Many-class Decoder {#sec:many-class-decoder}
------------------

For multiclass classification, `\ourmodel `{=latex}replaces the fixed-width MLP classification head used in TabPFN-2.6 (and earlier versions) with an *attention-based retrieval decoder* over the in-context training set, which treats class prediction as a soft nearest-neighbor retrieval: the final-layer train embeddings $\{h^\mathrm{train}_n\}_{n=1}^{N_\mathrm{train}}$ act as keys, the corresponding one-hot label vectors $\mathbf{y}_n \in \{0,1\}^{C}$ as values, and test embeddings $h^\mathrm{test}_m$ as queries. After the usual learned linear projections $W_Q, W_K$ and a multi-head split, the decoder computes $$p_m \;=\; \frac{1}{H}\sum_{h=1}^{H}\sum_{n=1}^{N_\mathrm{train}}
            \alpha^{(h)}_{m,n}\,\mathbf{y}_n,
\qquad
\alpha^{(h)}_{m,n}=\mathrm{softmax}_n\!\left(
            \tfrac{q^{(h)}_m \cdot k^{(h)}_n}{\sqrt{D_h}}\right),$$ that is: a (head-averaged) attention-weighted average of the in-context one-hot labels, which is then converted to logits via $\log\!\big(\mathrm{clip}(p_m)\big)$. This formulation has two consequences. First, classes are no longer tied to fixed output positions of a parametric head, so the decoder is naturally permutation-equivariant in the class indices. Second, decoding is non-parametric in $C$: the decoder's parameters depend only on the embedding dimension and the number of attention heads, not on some $C_{\max}$, decoupling the head's capacity from the supported label cardinality.
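The decoder equation transcribes almost directly into code. The sketch below uses arbitrary small dimensions and random inputs; only the computation itself follows the formula above.

```python
import torch
import torch.nn as nn

H, D, Dh = 4, 64, 16                    # heads, embed dim, head dim (D = H*Dh)
N_train, N_test, C = 512, 8, 20

W_Q = nn.Linear(D, D, bias=False)
W_K = nn.Linear(D, D, bias=False)
h_train, h_test = torch.randn(N_train, D), torch.randn(N_test, D)
y_onehot = torch.eye(C)[torch.randint(C, (N_train,))]   # (N_train, C)

q = W_Q(h_test).view(N_test, H, Dh).transpose(0, 1)     # (H, N_test, Dh)
k = W_K(h_train).view(N_train, H, Dh).transpose(0, 1)   # (H, N_train, Dh)

alpha = torch.softmax(q @ k.transpose(-1, -2) / Dh**0.5, dim=-1)
p = (alpha @ y_onehot).mean(dim=0)      # head-averaged soft label retrieval
logits = torch.log(p.clamp_min(1e-10))  # log(clip(p)), shape (N_test, C)
```

Note that the decoder's parameter count ($2D^2$ here) is independent of $C$, which is exactly the non-parametric-in-$C$ property described above.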

**Class-count limit from pre-training.** Although the decoder is non-parametric in $C$, the trained `\ourmodel `{=latex}still fixes a hard ceiling $C_{\max} = 160$ at pre-training time via three checkpoint-bound tensors: the trainable orthogonal label embeddings $E_{\mathrm{col}}, E_{\mathrm{icl}} \in \mathbb{R}^{C_{\max} \times D}$ used by the column encoder and the ICL transformer, and the one-hot value tensor consumed by the decoder. Enlarging $C_{\max}$ at pre-training therefore costs only $\mathcal{O}(C_{\max}\,D)$ extra parameters and no extra decode-time memory.

Preprocessing {#sec:preprocessing}
-------------

As in previous versions, TabPFN-3 aggregates predictions across multiple estimators, each operating on a distinct combination of dataset permutations and feature transformations, forming an effective ensemble that enhances robustness and generalization. Individual estimators apply complementary feature transformations---combining robust scaling and soft clipping (following [@holzmuller2024realmlp]) with quantile transformations and standard scaling---to balance stability and sensitivity across varying feature distributions. As in TabPFN-2.5, a subset of estimators augments the feature matrix with singular value decomposition (SVD) components, capturing high-energy directions of global variance.

TabPFN-3 introduces two further improvements to this pipeline. First, features are subsampled in a round-robin fashion, ensuring that each feature appears in at least one estimator and is never systematically excluded from the ensemble. For datasets exceeding 100,000 rows, random feature subsampling is replaced by an informed selection based on Gini importance derived from a lightweight tree model fitted on a subsample, focusing each estimator on the most discriminative features rather than an arbitrary subset. Second, feature transformations such as quantile normalization are now executed on GPU, substantially reducing preprocessing latency and making the pipeline practical at the larger dataset scales supported by TabPFN-3. As in TabPFN-2.5 [@TabPFN-2.5], post-processing capabilities are available, including decision threshold tuning for metric-specific optimization (e.g., F1-score) and temperature scaling for probability calibration.
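The round-robin subsampling can be illustrated with a small sketch; this is our reading of the description above (the function and its details are hypothetical), the point being that a shuffled feature order is dealt out cyclically so no feature is systematically excluded.

```python
import numpy as np

def round_robin_feature_subsets(n_features, n_estimators, subset_size, seed=0):
    # Shuffle once, then deal features to estimators in turn, cycling
    # through the shuffled order until every subset is filled.
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_features)
    dealt = np.resize(order, n_estimators * subset_size)   # cycled order
    return [sorted(int(f) for f in dealt[i::n_estimators])
            for i in range(n_estimators)]

# Every feature 0..9 appears in at least one of the four subsets.
print(round_robin_feature_subsets(n_features=10, n_estimators=4, subset_size=3))
```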

Inference Optimization {#sec:inference-optimization}
----------------------

`\ourmodel `{=latex}introduces several inference-time optimizations that together reduce its compute and memory footprint enough to scale to one million rows on a single GPU with sub-second inference latency.

### Row-Chunking {#sec:row_chunking}

`\ourmodel`{=latex}'s pre-ICL stages---cell embedding, feature distribution embedding, and feature aggregation---materialize an $(n_\mathrm{train}+n_\mathrm{test}){\times}n_\mathrm{features}{\times}d$ activation, so peak memory can saturate the GPU well before any operation becomes compute-bound. One solution is to offload activations to CPU memory or disk, as in TabICLv2 [@qu2026tabiclv2]. This, however, requires a large amount of CPU memory (250 GB for a $1\text{M} \times 500$ table in @qu2026tabiclv2) or otherwise incurs substantial I/O overhead (@qu2026tabiclv2 report a 4x slowdown). We instead stream the row dimension in fixed-size slices and keep all activations on the GPU.

```{=latex}
\centering
```
![**Chunking flattens peak memory without impacting time per call.** Model forward pass without preprocessing, measured on an H100, for $n_\mathrm{features} \in \{10, 100, 500\}$. Top row: peak GPU memory (GiB) versus number of training rows; bottom row: time per call (ms). Three series per panel: `\ourmodel `{=latex}without chunking (blue), `\ourmodel `{=latex}with chunking (pink), and the `\ourmodeltwofive `{=latex}baseline (black). Both axes are log-scaled. Note that *TabPFN-3 is much faster than TabPFN-2.5, especially at large feature counts.*](figures/inference/chunked_vs_nonchunked_h100.png){#fig:chunked_vs_nonchunked width="\\linewidth"}

A naive row-wise stream is not directly applicable: the distribution embedder summarizes the training set into a fixed-size[^1] set of inducing points via cross-attention over all training rows, and splitting that call across chunks would change its semantics. `\ourmodel `{=latex}resolves this with a two-phase scheme exactly equivalent to the unchunked computation: (i) the inducing states are computed once over the full training set, chunked along the (independent) column dimension to bound its own memory cost; (ii) rows are then streamed through feature distribution embedding and the feature aggregator in fixed-size chunks, each reusing the precomputed inducing states as its attention key/value set, and the per-chunk row embeddings are concatenated along the row axis. The scheme adds a small overhead from recomputing cell embeddings in phase (ii) but avoids the disk-bandwidth bottleneck. We enable chunking when $n_\mathrm{train}+n_\mathrm{test} > 2048$.
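Structurally, the two-phase scheme looks as follows; the embedder and aggregator callables are stand-ins for the actual modules, and only the control flow is meant literally.

```python
import torch

def chunked_pre_icl(X, distribution_embedder, feature_aggregator,
                    inducing_states, chunk_size=2048):
    # Phase (i) is assumed done: `inducing_states` was computed once over
    # the full training set, chunked along the independent column axis.
    row_embeddings = []
    for start in range(0, X.shape[0], chunk_size):
        chunk = X[start:start + chunk_size]
        # Phase (ii): cell embeddings are recomputed per chunk; the chunk
        # attends to the cached inducing states as its key/value set, so
        # the result matches the unchunked computation exactly.
        h = distribution_embedder(chunk, kv=inducing_states)
        row_embeddings.append(feature_aggregator(h))
    return torch.cat(row_embeddings, dim=0)  # concatenate along the row axis
```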

Figure `\ref{fig:chunked_vs_nonchunked}`{=latex} highlights the different memory--compute trade-offs of `\ourmodel `{=latex}and `\ourmodeltwofive`{=latex}. Without chunking, the peak memory of `\ourmodel `{=latex}grows steeply with $n_\mathrm{train}$ and $n_\mathrm{features}$. This is because the model carries a pre-ICL activation $n_\mathrm{features}$-wide through cell embedding, feature distribution embedding, and feature aggregation before collapsing the feature axis into a single row representation for the ICL transformer. By contrast, `\ourmodeltwofive `{=latex}alternates row- and column-attention layers over a representation grouped into $n_\mathrm{features}/3$ tokens, and therefore never materializes a tensor wider than this. This explains why `\ourmodel`{=latex}'s unchunked peak memory exceeds `\ourmodeltwofive`{=latex}'s. Applying row-chunking to `\ourmodel `{=latex}flattens peak memory with respect to $n_\mathrm{features}$ and yields a ${\sim}5{\times}$ reduction at the largest shapes, enabling 1M-row inference, while incurring only a small wall-clock overhead of a few percent near $n_\mathrm{train}\approx 10^4$ that is amortized at larger scales once the $n_\mathrm{train}^2$ ICL row-attention dominates. At the same time, the feature-collapsed row representation gives `\ourmodel `{=latex}a substantial runtime advantage at large $n_\mathrm{train}$ or $n_\mathrm{features}$: its ICL row-attention scales as $n_\mathrm{train}^2$ independently of $n_\mathrm{features}$, whereas `\ourmodeltwofive`{=latex}'s row attention retains a linear dependence on $n_\mathrm{features}$ and scales with $n_\mathrm{features} \cdot n_\mathrm{train}^2$.

```{=latex}
\centering
```
```{=latex}
\centering
```
![**Chunking eliminates the OOM frontier; the KV-cache adds memory essentially constant in $n_\mathrm{features}$.** Maximum $n_\mathrm{train}$ that fits on one 80 GiB H100 for $n_\mathrm{features}\in\{10,50,200\}$. Bars: `\ourmodeltwofive`{=latex}, `\ourmodel`{=latex}, `\ourmodel`{=latex} + chunking, `\ourmodel`{=latex} + chunking + KV-cache. White labels: peak memory; \`\`$\geq 1.0$M'' marks bars that hit the search cap.](figures/inference/max_n_train_probe.png){#fig:max_n_train_probe width="\\linewidth"}

```{=latex}
\centering
```
![**Cached predict is 1--2 orders of magnitude faster than the uncached `\ourmodel`{=latex}.** Time per model forward pass without preprocessing on H100 at $n_\mathrm{train}=50{,}000$, $n_\mathrm{test}=100$, $n_\mathrm{features}\in\{10,100\}$. Bars: `\ourmodeltwofive `{=latex}cold fit+predict, `\ourmodel `{=latex}cold fit+predict, `\ourmodel `{=latex}fit-with-cache, `\ourmodel `{=latex}cached predict.](figures/inference/H100_kv_cache.png "fig:"){#fig:kv_cache_h100 width="\\linewidth"}


### Fast Inference with a Small KV-cache {#sec:kv_cache}

Being an in-context-learning model, `\ourmodel `{=latex}combines training (fit) and inference (predict) in one forward pass. While this allows for very fast training, it can make online or batched predictions too slow for production use cases. Caching the keys and values (KVs) from the train set removes this issue. While KV-caching has been available in our previous models, the memory cost of the cache was prohibitive for larger datasets. `\ourmodel `{=latex}solves this in two ways:

-   Compared to `\ourmodeltwofive`{=latex}, which needs to store an embedding for each cell of the table, `\ourmodel `{=latex}only needs to store three components: the per-block inducing states produced by the feature distribution embedder; the train-side keys and values of the ICL self-attention at every transformer block in the ICL stage; and the train embeddings of the final ICL layer, which are consumed by the many-class decoder. The inducing states are small, and the other two components scale only with the number of rows rather than rows × features.

-   We use multi-query attention with only a single KV head for the cross-attention between test and train samples, reducing the KV-cache size by a factor of eight.

This achieves a KV-cache size of 7 GiB per estimator for 1M-row datasets, making `\ourmodel`{=latex}'s default 8 estimators usable on common GPUs even for the largest datasets we support. As can be seen in Figure `\ref{fig:max_n_train_probe}`{=latex}, the peak memory of (chunked) cached predict is essentially flat across feature counts. On an H100, cached predict is one to three orders of magnitude faster than either the `\ourmodeltwofive `{=latex}baseline or `\ourmodel`{=latex}'s own cold \`\`fit+predict'' path (Figure `\ref{fig:kv_cache_h100}`{=latex}), achieving between 0.1 and 3 ms per test point for batches of 100 test points. The fit-with-cache call costs essentially the same as the cold fit+predict at every measured shape, including $n_\mathrm{train}=10^6$ where both complete in ${\sim}107$ s (Figure `\ref{fig:kv_cache_scaling_h100}`{=latex}).
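In practice the workflow is two calls: one fit that builds and retains the cache, then arbitrarily many cheap predicts. The sketch below follows the scikit-learn-style interface of the open-source `tabpfn` package; the `fit_mode` flag name is taken from the 2.x interface and may differ in the released TabPFN-3 API.

```python
import numpy as np
from tabpfn import TabPFNClassifier

X_train = np.random.rand(10_000, 20)
y_train = np.random.randint(0, 2, size=10_000)

# Build the train-side KV cache once at fit time (flag name from the 2.x
# open-source interface; treat as illustrative for TabPFN-3).
clf = TabPFNClassifier(fit_mode="fit_with_cache")
clf.fit(X_train, y_train)

# Each subsequent predict embeds only the test rows and attends against
# the cached train-side keys/values -- the fast path measured above.
proba = clf.predict_proba(np.random.rand(100, 20))
```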

```{=latex}
\centering
```
```{=latex}
\hspace{-3em}
```
![**`\ourmodel`{=latex}'s KV-cached predict allows for one to three orders of magnitude speedup**. We report results for a single estimator without preprocessing on an H100, for $n_\mathrm{features} \in \{10, 100\}$ and $n_\mathrm{test} = 100$. Four series per panel: `\ourmodeltwofive `{=latex}`fit+predict` (black, baseline), `\ourmodel `{=latex}cold `fit+predict` (blue, no cache reuse), `\ourmodel `{=latex}`fit (build cache)` that builds the cache (magenta -- overlaps the cold curve since the train-side work is identical; the cache is simply retained), and `\ourmodel `{=latex}cached `predict` (yellow). The KV-cache is built under the deployed multi-query test-side configuration ($n_\mathrm{kv,test}=1$).](figures/inference/H100_kv_cache_scaling.png){#fig:kv_cache_scaling_h100 width="0.85\\linewidth"}

### Model Distillation {#sec:distillation}

In production environments constrained by latency or memory budgets, hardware availability, or regulatory requirements that mandate familiar model classes, `\ourmodel `{=latex}also supports distillation into dataset-specific MLPs or tree ensembles via the engine introduced with `\ourmodeltwofive `{=latex}[@TabPFN-2.5]. The distilled artifact runs on CPU at the sub-millisecond latency of a standard MLP or tree ensemble while retaining most of `\ourmodel`{=latex}'s predictive performance on the dataset it was distilled for.

### Compilation and FlashAttention-3 {#sec:compile_and_fa3}

`\ourmodel `{=latex}ships with two opt-in performance features that target different bottlenecks: `torch.compile`, which fuses kernels and reduces dispatch overhead on the non-attention hot paths, and FlashAttention-3 (FA3) [@flash_attention_3], a Hopper-specific kernel for the in-context-learning attention. On MI250X, `torch.compile` reaches up to $1.58{\times}$ speedup on the non-chunked forward pass; on H100, FA3 reaches $1.5\text{--}1.7{\times}$ at $n_\mathrm{train}=10^6$ over the SDPA fallback. Both compose cleanly with row chunking and are auto-detected at runtime; see Appendix `\ref{app:compile-fa3}`{=latex} for the full measurements and per-shape breakdowns.
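Both features follow standard PyTorch patterns; the generic sketch below illustrates them outside of TabPFN, which wires them up internally and auto-detects hardware support. The module and tensor shapes are arbitrary placeholders.

```python
# Hedged, generic sketch of the two opt-ins in plain PyTorch terms.
import torch
import torch.nn.functional as F

# torch.compile fuses the non-attention hot path (here: a toy MLP block).
mlp = torch.nn.Sequential(torch.nn.Linear(512, 2048), torch.nn.GELU(),
                          torch.nn.Linear(2048, 512))
mlp = torch.compile(mlp)
out_mlp = mlp(torch.randn(32, 512))

# SDPA is the portable fallback for the ICL attention; on Hopper GPUs a
# FlashAttention-3 kernel (shipped separately via the flash-attn package)
# can replace it when detected at runtime.
q = k = v = torch.randn(8, 16, 1024, 64)
out_attn = F.scaled_dot_product_attention(q, k, v)
```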

### Improved interpretability for TabPFN

TabPFN-3's reduced KV-cache (Section `\ref{sec:kv_cache}`{=latex}) and fast inference make interpretability extensions significantly more practical.

Through the `tabpfn-extensions` package, TabPFN is directly integrated with the popular `shapiq` library [@shapiq], enabling efficient approximation of any-order Shapley interactions. `\autoref{fig:shap-kv-speedup}`{=latex} in the Appendix shows both the absolute runtime and the relative speed-ups achieved by KV caching. For large datasets, KV caching provides more than $120\times$ efficiency gains, reducing the runtime per test row to 1.08 seconds even for a training table with 200k rows and 500 features.
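A minimal sketch of the integration pattern, assuming `shapiq`'s generic `Explainer` interface and a toy dataset; the `tabpfn-extensions` package wraps this so the KV cache is reused across the many forward passes that Shapley estimation requires, rather than refitting per coalition.

```python
# Hedged sketch: pairing TabPFN with shapiq's generic Explainer interface.
# The budget and index choices below are illustrative assumptions.
import numpy as np
import shapiq
from tabpfn import TabPFNClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = rng.integers(0, 2, 500)

clf = TabPFNClassifier().fit(X, y)
explainer = shapiq.Explainer(model=clf, data=X, index="k-SII", max_order=2)
interactions = explainer.explain(X[0], budget=256)  # pairwise interactions
print(interactions)
```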

Synthetic Prior {#sec:synthetic-prior}
---------------

Following previous TabPFN model variants [@TabPFN-2.5; @hollmann2022tabpfnv1; @Hollmann2025tabpfnv2], TabPFN-3 is trained on synthetically generated data based on our Structural Causal Model (SCM) prior. A schematic flow chart demonstrating how our SCM prior works is shown in Figure `\ref{fig:prior}`{=latex}.

Our design philosophy for the prior is to maximize the breadth of possible datasets while capturing the structure models will encounter in real-world data. The result is an updated, more sophisticated prior that allows us to scale up training and continue extracting signal from the wide range of synthetic datasets it generates: our final TabPFN-3 model was trained on more than 8 trillion tokens. A toy version of the generative loop is sketched after the list of prior components below.

```{=latex}
\definecolor{PriorInk}{HTML}{0C0C0C}
```
```{=latex}
\definecolor{PriorOff}{HTML}{F8F8F9}
```
```{=latex}
\definecolor{PriorPlum}{HTML}{101075}
```
```{=latex}
\definecolor{PriorPurple}{HTML}{3F2670}
```
```{=latex}
\definecolor{PriorMauve}{HTML}{B6698D}
```
```{=latex}
\definecolor{PriorPlumLight}{HTML}{8585BB}
```
```{=latex}
\definecolor{PriorPurpleSoft}{HTML}{C5BAD2}
```
```{=latex}
\definecolor{PriorMauvePale}{HTML}{E5CFD9}
```
```{=latex}
\definecolor{PriorYellow}{HTML}{FFD400}
```
```{=latex}
\definecolor{HeaderBg}{HTML}{E0E8F5}
```
```{=latex}
\colorlet{PriorGrid}{PriorPurpleSoft}
```
```{=latex}
\colorlet{SourceColor}{PriorPlumLight!55!white}
```
```{=latex}
\colorlet{HiddenColor}{PriorOff!92!PriorInk}
```
```{=latex}
\colorlet{FeatColor}{PriorPlum}
```
```{=latex}
\colorlet{TargetColor}{PriorYellow}
```
```{=latex}
\colorlet{HidColor}{PriorPlumLight}
```
```{=latex}
\colorlet{ChildGreen}{PriorMauvePale}
```
```{=latex}
\colorlet{ArrowBlue}{PriorPurple}
```
```{=latex}
\colorlet{MidBlue}{PriorPurple}
```
```{=latex}
\colorlet{HdrLabel1}{PriorPlum}
```
```{=latex}
\colorlet{HdrLabel2}{PriorPurple}
```
```{=latex}
\begin{tikzpicture}[
    >=Latex, font=\small,
    scmnode/.style={circle,draw=PriorInk!80,line width=0.5pt,
                    minimum size=7mm,inner sep=0pt},
    src/.style ={scmnode,fill=SourceColor},
    hid/.style ={scmnode,fill=HidColor,text=PriorInk},
    feat/.style={scmnode,fill=FeatColor,text=white},
    tgt/.style ={scmnode,fill=TargetColor,text=PriorInk},
    par/.style ={scmnode,fill=SourceColor},
    chld/.style={scmnode,fill=ChildGreen},
    edge/.style={->,line width=0.5pt,draw=PriorInk!70},
    flow/.style={-{Triangle[length=3mm,width=2.4mm]},
                 line width=1.4pt,draw=ArrowBlue,line cap=round},
    panelttl/.style={font=\bfseries\small,text=PriorInk,anchor=west},
    pcap/.style={font=\itshape\footnotesize,text=PriorInk!90},
    boxout/.style={draw=PriorInk!80,rounded corners=2pt,line width=0.5pt,
                   inner sep=6pt,fill=white},
    minicard/.style={draw=PriorGrid,rounded corners=2pt,
                     line width=0.4pt,fill=white},
    dashbox/.style={draw=PriorInk!70,dashed,rounded corners=2pt,
                    line width=0.5pt,fill=white,inner sep=0pt},
]

\begin{scope}[on background layer]
  \foreach \xa/\ya/\xb/\yb in {%
        0/8.5/7.5/12.5,
        8.0/8.5/15.5/12.5,
        0/4.0/15.5/8.0,
        0/0/7.5/3.5,
        8.0/0/15.5/3.5}{
    \fill[white,rounded corners=3pt] (\xa,\ya) rectangle (\xb,\yb);
    \begin{scope}
      \clip[rounded corners=3pt] (\xa,\ya) rectangle (\xb,\yb);
      \fill[HeaderBg] (\xa,{\yb-0.7}) rectangle (\xb,\yb);
    \end{scope}
    \draw[PriorGrid,line width=0.3pt] (\xa,{\yb-0.7}) -- (\xb,{\yb-0.7});
    \draw[PriorGrid,rounded corners=3pt,line width=0.6pt]
          (\xa,\ya) rectangle (\xb,\yb);
  }
\end{scope}

\node[panelttl] at (0.20,12.15) {Step 1: Sample Hyper-parameters};
\node[panelttl] at (8.20,12.15) {Step 2: Sample DAG};
\node[panelttl] at (0.20, 7.65) {Step 3: Compute SCM};
\node[panelttl] at (0.20, 3.15) {Step 4: Extract Dataset};
\node[panelttl] at (8.20, 3.15) {Step 5: Post-processing};

\begin{scope}[shift={(0,8.5)}]
  \draw[minicard] (0.40,2.10) rectangle (1.95,2.90);
  \begin{scope}
    \clip (0.40,2.10) rectangle (1.95,3.80);
    \fill[SourceColor!70] plot[domain=-1.5:1.5,samples=40]
          ({1.175 + \x*0.42},{2.20 + 0.60*exp(-\x*\x)})
          -- ({1.175+1.5*0.42},2.20) -- ({1.175-1.5*0.42},2.20) -- cycle;
    \draw[MidBlue,line width=0.6pt] plot[domain=-1.5:1.5,samples=40]
          ({1.175 + \x*0.42},{2.20 + 0.60*exp(-\x*\x)});
    \draw[dashed,gray!70] (1.175,2.20) -- (1.175,2.80);
  \end{scope}

  \draw[minicard] (0.40,0.90) rectangle (1.95,1.70);
  \foreach \i/\h/\c in {0/0.55/PriorPlum, 1/0.70/PriorPlum, 2/0.40/PriorYellow,
                        3/0.30/PriorMauve, 4/0.50/PriorPlum}{
    \fill[\c] ({0.55+\i*0.27},1.00) rectangle
              ({0.78+\i*0.27},{1.30+\h*0.45});
  }
  \draw[gray!70] (0.45,1.00) -- (1.90,1.00);

  \foreach \y in {2.7,2.3,1.5,1.1}{
    \draw[gray!55,line width=2pt,line cap=round] (2.20,\y) -- (3.10,\y);
  }
  \foreach \y/\f in {2.7/0.55, 2.3/0.30, 1.5/0.65, 1.1/0.45}{
    \draw[MidBlue,line width=2pt,line cap=round]
          (2.20,\y) -- ({2.20+\f*0.90},\y);
    \fill[white] ({2.20+\f*0.90},\y) circle (0.10);
    \draw[MidBlue,line width=0.6pt] ({2.20+\f*0.90},\y) circle (0.10);
  }

  \draw[flow] (3.35,1.90) -- (4.25,1.9)
       node[midway,above=2pt,font=\bfseries\footnotesize] {Sample};

  \node[boxout,anchor=west,font=\scriptsize,inner sep=5pt, thin]
       at (4.45,1.90) {%
    \begin{tabular}{l@{\hspace{0.5em}}r}
      \texttt{num\_rows:}       & $N$ \\
      \texttt{num\_features:}   & $P$ \\
      \texttt{num\_classes:}    & $C$ \\
      \multicolumn{2}{c}{$\dots$} \\
    \end{tabular}};
\end{scope}

\begin{scope}[shift={(8.0,8.5)}]
  \node[font=\scriptsize\bfseries,anchor=south,text=PriorInk]
        at (1.80,2.82) {DAG Sampler};
  \draw[minicard] (0.30,1.92) rectangle (3.30,2.78);
  \draw[minicard] (0.50,1.95) rectangle (1.55,2.75);
  \foreach \px/\py in {0.70/2.55, 1.35/2.55, 0.80/2.25, 1.30/2.30}{
    \fill[PriorPlum] (\px,\py) circle (0.06);
    \draw[PriorInk!70,line width=0.3pt] (\px,\py) circle (0.06);
  }
  \draw[->,line width=0.55pt] (1.65,2.40) -- (1.80,2.40);
  \draw[minicard] (1.85,1.95) rectangle (3.05,2.75);
  \foreach \sx/\sy/\tx/\ty in {%
      2.05/2.55/2.80/2.55,
      2.05/2.55/2.10/2.20,
      2.80/2.55/2.80/2.20,
      2.10/2.20/2.80/2.20}{
    \draw[line width=0.35pt,draw=PriorInk!70] (\sx,\sy) -- (\tx,\ty);
  }
  \draw[line width=0.65pt,draw=PriorYellow] (2.05,2.55) -- (2.80,2.20);
  \foreach \px/\py in {2.05/2.55, 2.80/2.55, 2.10/2.20, 2.80/2.20}{
    \fill[PriorPlum] (\px,\py) circle (0.06);
    \draw[PriorInk!70,line width=0.3pt] (\px,\py) circle (0.06);
  }

  \node[font=\scriptsize\bfseries,anchor=south west,text=PriorInk]
        at (0.42,1.30) {Noise processes};
  \node[font=\tiny\itshape,anchor=south east,text=PriorInk!60]
        at (3.18,1.30) {$\varepsilon_i$};
  \draw[minicard] (0.30,0.20) rectangle (3.30,1.25);


  \foreach \xs/\ys in {%
      0.55/1.18, 0.78/1.02, 0.98/1.15, 1.20/1.00, 1.42/1.19,
      1.65/1.05, 1.88/1.13, 2.10/1.20, 2.32/1.04, 2.55/1.16,
      2.78/1.08, 3.00/1.12}{
    \fill[PriorPlum] (\xs,\ys) circle (0.04);
  }
  \foreach \xs/\ys in {%
      0.55/0.78, 0.80/0.92, 1.00/0.81, 1.22/0.95, 1.45/0.83,
      1.68/0.89, 1.90/0.76, 2.12/0.93, 2.35/0.85, 2.58/0.79,
      2.80/0.91, 3.02/0.84}{
    \fill[PriorMauve] (\xs,\ys) circle (0.04);
  }
  \foreach \xs/\ys in {%
      0.50/0.68, 0.75/0.52, 0.95/0.65, 1.18/0.55, 1.40/0.70,
      1.62/0.58, 1.85/0.51, 2.08/0.66, 2.30/0.60, 2.52/0.54,
      2.75/0.69, 2.98/0.57}{
    \fill[PriorPurple] (\xs,\ys) circle (0.04);
  }
  \foreach \xs/\ys in {%
      0.55/0.42, 0.78/0.28, 1.00/0.38, 1.23/0.45, 1.45/0.30,
      1.68/0.40, 1.90/0.26, 2.12/0.36, 2.35/0.43, 2.57/0.29,
      2.80/0.41, 3.03/0.33}{
    \fill[PriorPlumLight] (\xs,\ys) circle (0.045);
  }

  \draw[flow] (3.40,2.37) -- (4.10,2.37);
  \draw[flow] (3.40,0.73) -- (4.10,0.73);

  \node[hid] (n1) at (4.55,1.65) {$1$};
  \node[hid] (n4) at (5.65,2.90) {$4$};
  \node[hid] (n2) at (5.65,1.65) {$2$};
  \node[hid] (n3) at (6.95,2.30) {$3$};
  \node[hid] (n5) at (6.20,0.55) {$5$};
  \foreach \u/\v in {n1/n4,n1/n2,n1/n5,n4/n2,n4/n3,n2/n3,n2/n5,n3/n5}
    \draw[edge] (\u) -- (\v);
\end{scope}

\begin{scope}[shift={(0,4.0)}]
  \node[hid,scale=0.85] (m1) at (0.80,1.85) {$1$};
  \node[hid,scale=0.85] (m4) at (1.65,2.75) {$4$};
  \node[hid,scale=0.85] (m2) at (1.65,1.85) {$2$};
  \node[hid,scale=0.85] (m3) at (2.65,2.25) {$3$};
  \node[hid,scale=0.85] (m5) at (2.10,1.05) {$5$};
  \foreach \u/\v in {m1/m4,m1/m2,m1/m5,m4/m2,m4/m3,m2/m3,m2/m5,m3/m5}
    \draw[edge] (\u) -- (\v);

  \draw[flow] (3.20,1.90) -- (3.95,1.90);

  \node[pcap] at (5.40,3.10) {zoom-in};
  \draw[dashbox] (4.15,0.85) rectangle (6.65,2.95);
  \node[par] (zp4) at (4.80,2.40) {$4$};
  \node[par] (zp2) at (6.00,2.40) {$2$};
  \node[chld] (zc3) at (5.40,1.30) {$3$};
  \draw[edge] (zp4) -- (zp2);
  \draw[edge] (zp4) -- (zc3);
  \draw[edge] (zp2) -- (zc3);
  \node[pcap,align=center] at (5.40,0.40)
        {Child node value combines\\\& aggregates parents};

  \node[anchor=west,font=\small\bfseries,text=PriorInk]
        at (7.30,2.75) {Per-node structural equation:};
  \node[anchor=west,font=\small] at (7.50,2.25)
        {General:\quad $X_i = f_i\bigl(\mathrm{pa}(X_i)\bigr) + \varepsilon_i$};
  \node[anchor=west,font=\small] at (7.50,1.65)
        {Specific (child~$3$):};
  \node[anchor=west,font=\small] at (7.85,1.15)
        {$X_{3} = f\!\bigl(X_{4}, X_{2}\bigr) + \varepsilon_{3}$};

  \node[pcap,anchor=west] at (7.50,0.45)
        {$\ast$ Computed in topological order over $G(V,E)$};
\end{scope}

\begin{scope}[shift={(0,0)}]
  \node[feat] (e1) at (0.95,1.30) {$1$};
  \node[feat] (e4) at (2.10,2.40) {$4$};
  \node[hid] (e2) at (2.10,1.30) {$2$};
  \node[tgt]  (e3) at (3.40,1.85) {$3$};
  \node[feat]  (e5) at (2.65,0.40) {$5$};
  \foreach \u/\v in {e1/e4,e1/e2,e1/e5,e4/e2,e4/e3,e2/e3,e2/e5,e3/e5}
    \draw[edge] (\u) -- (\v);

  \node[feat,scale=0.85] at (4.65,2.25) {};
  \node[anchor=west,font=\footnotesize] at (4.90,2.25) {Features ($X$)};
  \node[tgt,scale=0.85]  at (4.65,1.45) {};
  \node[anchor=west,font=\footnotesize] at (4.90,1.45) {Target ($Y$)};
  \node[hid,scale=0.85]  at (4.65,0.65) {};
  \node[anchor=west,font=\footnotesize] at (4.90,0.65) {Hidden};
\end{scope}

\begin{scope}[shift={(8.0,0)}]
  \def\cellW{0.375}
  \def\cellH{0.25}
  \begin{scope}[shift={(0.20,0.50)}]
    \foreach \i in {0,1,2,3}{
      \fill[gray!45] ({\i*\cellW},1.50)
                     rectangle ({(\i+1)*\cellW-0.03},1.75);
      \draw[gray!50,line width=0.3pt]
            ({\i*\cellW},1.50) rectangle ({(\i+1)*\cellW-0.03},1.75);
    }
    \foreach \r in {0,1,2,3}{
      \foreach \i in {0,1,2,3}{
        \fill[white] ({\i*\cellW},{1.20 - \r*\cellH})
                     rectangle ({(\i+1)*\cellW-0.03},{1.45 - \r*\cellH});
        \draw[gray!50,line width=0.3pt]
              ({\i*\cellW},{1.20 - \r*\cellH})
              rectangle ({(\i+1)*\cellW-0.03},{1.45 - \r*\cellH});
      }
    }
    \node[pcap] at (0.74,-0.10) {Raw data};
  \end{scope}

  \draw[flow] (1.85,1.50) -- (2.55,1.50);

  \draw[minicard] (2.65,0.75) rectangle (4.65,2.40);
  \foreach \cx/\cy in {3.10/1.925, 4.20/1.925, 3.10/1.225, 4.20/1.225}{
    \draw[minicard] ({\cx-0.30},{\cy-0.25}) rectangle ({\cx+0.30},{\cy+0.25});
  }
  \foreach \pt in {(2.95,1.75),(3.05,1.85),(3.15,1.95),(3.25,2.05)}
    \fill[PriorInk!85] \pt circle (0.024);
  \draw[->,line width=0.6pt] (4.05,1.75) -- (4.37,2.05);
  \foreach \i/\h in {0/0.16,1/0.26,2/0.14,3/0.28}{
    \fill[PriorInk!85] ({2.85+0.14*\i},1.08)
                       rectangle ({2.95+0.14*\i},{1.08+\h});
  }
  \draw[PriorInk!85,line width=0.6pt]
        (3.95,1.05) -- (4.10,1.05) -- (4.10,1.22) --
        (4.25,1.22) -- (4.25,1.40) -- (4.43,1.40);
  \node[pcap] at (3.65,0.35) {Post-processing};

  \draw[flow] (4.80,1.50) -- (5.50,1.50);

  \begin{scope}[shift={(5.60,0.50)}]
    \foreach \i/\c in {0/PriorPlum, 1/PriorPurple,
                       2/PriorMauve, 3/PriorYellow}{
      \fill[\c] ({\i*\cellW},1.50)
                rectangle ({(\i+1)*\cellW-0.03},1.75);
      \draw[gray!50,line width=0.3pt]
            ({\i*\cellW},1.50) rectangle ({(\i+1)*\cellW-0.03},1.75);
    }
    \foreach \r in {0,1,2,3}{
      \foreach \i in {0,1,2,3}{
        \fill[white] ({\i*\cellW},{1.20 - \r*\cellH})
                     rectangle ({(\i+1)*\cellW-0.03},{1.45 - \r*\cellH});
        \draw[gray!50,line width=0.3pt]
              ({\i*\cellW},{1.20 - \r*\cellH})
              rectangle ({(\i+1)*\cellW-0.03},{1.45 - \r*\cellH});
      }
    }
    \node[pcap] at (0.74,-0.18) {Synthetic Dataset};
  \end{scope}
\end{scope}

\draw[flow] (7.55,10.50) -- (7.95,10.50);   %
\draw[flow] (11.75,8.45) -- (11.75,8.05);   %
\draw[flow] (3.75,3.95)  -- (3.75,3.55);    %
\draw[flow] (7.55,1.75)  -- (7.95,1.75);    %

\end{tikzpicture}
```
```{=latex}
\captionof{figure}{\textbf{Schematic visualization of our SCM prior.} (i) We first sample high-level hyperparameters for the dataset, including number of features and number of rows. (ii) Based on the hyperparameters, we utilize our graph sampling algorithms to generate a directed acyclic graph (DAG) underlying our SCM; in parallel, an i.i.d.\ noise sample $\varepsilon_i$ is drawn per node (each colour shade in the lower-left mini-panel corresponds to a different node). (iii) We compute a topological ordering of the DAG. Based on this, we create a computational graph: First we fill root nodes (i.e.\ exogenous variables) and subsequently traverse the computational graph in topological order, combining parent nodes using our combiner mechanisms and activations to propagate values to the child nodes. (iv) We choose suitable features and target variables from our fully computed SCM. (v) We apply post-processing to the dataset.}
```
`\label{fig:prior}`{=latex}

1.  **Graph generation.** We expand the distribution of graphs underlying the SCM by introducing new sampling algorithms, enabling richer structural diversity. Sample graphs are shown in Figure `\ref{fig:graph_sampling}`{=latex}.

2.  **Combiner mechanisms.** We introduce a host of new combiner mechanisms that combine values of parent nodes to propagate values to child nodes, some examples of which are visualized for a simple two-dimensional case in Figure `\ref{fig:mechanisms}`{=latex}. Increasing the variety of functional forms by which child nodes depend on the respective parent nodes allows for richer node relationships in the SCM.

3.  **Categorical variables.** Compared to TabPFN-2.5, we reworked the treatment of categorical variables in our SCM, moving from a comparatively simple categorical data model to more expressive variants.

4.  **High-frequency oscillators.** TabPFN-2.5 struggled with high-frequency oscillations despite performing well on sinusoidal data generally. Improved sinusoidal activations give TabPFN-3 strong performance across the full frequency spectrum.

5.  **Spatial prior.** Many tabular datasets have underlying spatial structure (e.g. datasets containing longitude and latitude as covariates, grids of sensors, etc.). We add spatial activations that allow our prior to encode spatial relationships between variables.

6.  **Many-class prior.** The flexible many-class decoder in TabPFN-3 enables native classification support for an arbitrary number of classes. We match this architectural design in the prior, ensuring high quality datasets that enable state-of-the-art downstream performance from binary datasets to datasets with hundreds of classes.

7.  **Temporal prior.** Many tabular datasets have temporal structure: rows are collected over intervals of time, train and test splits are often ordered by time rather than drawn i.i.d., and temporal dependencies between variables are common. We extend the SCM into a discrete-time Dynamic Structural Causal Model [@boeken2024dynamicstructuralcausalmodels].

8.  **Out-of-distribution prior.** We add out-of-distribution prediction tasks, allowing models trained on our prior data to remain performant under distribution shifts and to move beyond pure interpolation to extrapolation. A simple example of how our o.o.d. prior allows `\ourmodel `{=latex}to perform extrapolation is shown in Figure `\ref{fig:ood}`{=latex}, a capability that is notably absent from most tree-based algorithms as well as most other tabular foundation models.
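To make the five steps of Figure `\ref{fig:prior}`{=latex} concrete, the toy sketch below samples a random DAG, propagates noise through it in topological order with a single combiner and activation, and extracts a dataset. The production prior uses far richer graph samplers, combiners, activations, and post-processing than this minimal illustration.

```python
# Hedged toy version of the five prior steps: sample hyperparameters,
# sample a DAG, compute the SCM in topological order, extract a dataset,
# and post-process. Everything here is a simplified stand-in.
import numpy as np

rng = np.random.default_rng(0)

# Step 1: hyperparameters
n_rows, n_nodes = 1_000, 8

# Step 2: random DAG as a strictly upper-triangular adjacency matrix,
# so index order is already a topological order.
adj = np.triu(rng.random((n_nodes, n_nodes)) < 0.4, k=1)

# Step 3: compute the SCM, filling roots with noise and propagating
# parent values through a combiner (weighted sum) and an activation.
vals = np.zeros((n_rows, n_nodes))
for j in range(n_nodes):
    parents = np.flatnonzero(adj[:, j])
    eps = rng.normal(size=n_rows)                 # per-node noise epsilon_j
    if parents.size == 0:
        vals[:, j] = eps                          # exogenous root node
    else:
        w = rng.normal(size=parents.size)
        vals[:, j] = np.tanh(vals[:, parents] @ w) + eps

# Step 4: extract the dataset -- choose feature columns and a target.
target = int(rng.integers(n_nodes))
feats = [j for j in range(n_nodes) if j != target][:5]
X, y_cont = vals[:, feats], vals[:, target]

# Step 5: post-process, e.g. binarize the target for classification.
y = (y_cont > np.median(y_cont)).astype(int)
```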

TabPFN-3-Plus and Thinking mode {#sec:tabpfn3plus}
-------------------------------

On top of `\ourmodel`{=latex}, which we release open-source, our API and enterprise deployments provide access to `\ourmodelplus `{=latex}and its thinking mode \"TabPFN-3-Plus (Thinking)\" (named TabPFN-3-Thinking in our plots). These variants are fully compatible with the open-source `\ourmodel `{=latex}interface and can be used as a drop-in replacement, while offering additional capabilities:

#### Native text-feature support.

`\ourmodelplus `{=latex}accepts string-valued columns directly, without requiring upstream featurization. Free-text fields -- such as product names, insurance claim descriptions, or customer reviews -- are encoded jointly with numeric and categorical features inside the model, so cross-feature interactions between text and structured columns are learned end-to-end rather than imposed by a fixed encoder.

#### Thinking mode. {#thinking-mode.-1}

`\ourmodelenhanced `{=latex}applies additional inference-time computation on top of `\ourmodelplus `{=latex}to push prediction quality further. Thinking mode composes with native text-feature support, so a single call can handle mixed numerical, categorical, and text columns under the same inference-time-compute regime. We emphasize that our Thinking mode achieves this strong performance while only relying on TabPFN, without using LLMs, real data, internet search, or any other model.

`\ourmodelplus`{=latex}, including Thinking mode, is available through our API and through enterprise deployments including on-prem and VPC deployment on AWS SageMaker and Azure AI Foundry; see Section `\ref{sec:license}`{=latex} for licensing and access. Benchmark results are reported in Sections `\ref{sec:tabarena-result}`{=latex} (TabArena), `\ref{sec:tabstar-result}`{=latex} (TabSTAR), and `\ref{sec:large-data}`{=latex} (Large data). `\FloatBarrier`{=latex}

Experimental Results {#sec:results}
====================

In this section, we report experimental results across a variety of benchmarks. In Section `\ref{subsec:results_tabular}`{=latex}, we focus on public tabular benchmarks: TabArena [@erickson2025tabarena], TALENT [@talent_benchmark_jmlr], and the text-tabular TabSTAR collection [@arazi_tabstar_2025]. Section `\ref{subsec:results_internal}`{=latex} describes internal benchmarks spanning various subtypes of tabular learning, including large-scale datasets and features, many-class classification, and quantile regression. The subsequent sections extend beyond classic tabular learning: Section `\ref{subsec:results_ts}`{=latex} addresses time-series data, Section `\ref{subsec:results_rel}`{=latex} covers relational learning, and Section `\ref{subsec:results_embed}`{=latex} focuses on embeddings.

Public Tabular Benchmarks {#subsec:results_tabular}
-------------------------

### TabArena {#sec:tabarena-result}

```{=latex}
\centering
```
![**TabPFN-3 performance on the standard TabArena benchmark** [@erickson2025tabarena], including all 51 datasets (up to 100K rows). TabPFN-3 outperforms any other model in a forward pass, while `\ourmodelenhanced `{=latex}strongly outperforms all existing methods, including AutoGluon 1.5 extreme [@autogluon_tabular], a complex ensemble of models including TabPFN v2 tuned for 4 hours, in less than a tenth of the runtime. ](figures/tabarena_v3/all/tuning-impact-elo.png){#fig:tabarena_enhanced width="\\linewidth"}

TabArena [@erickson2025tabarena] (NeurIPS 2025 Datasets & Benchmarks) is a recent, heavily curated tabular benchmark, drawn from the largest pool of candidate datasets of any tabular benchmark to date, and created and maintained by open-source contributors from a wide range of institutions. In particular, it compares a large and regularly updated list of recent models, including tree-based models like CatBoost [@prokhorenkova2018catboost], LightGBM [@lightgbm] or XGBoost [@chen2016xgboost], as well as newer deep-learning models like RealMLP [@holzmuller2024realmlp], TabM [@gorishniy2024tabm], ModernNCA [@ye2025revisitingnearestneighbortabular] or xRFM [@beaglehole2025xrfmaccuratescalableinterpretable], the AutoML system AutoGluon [@autogluon_tabular], and other tabular foundation models like TabICL [@qu2025tabicl; @qu2026tabiclv2], TabDPT [@ma2025tabdptscalingtabularfoundation], TabSTAR [@arazi_tabstar_2025], LimiX [@zhang2025limix], Mitra [@zhang2025mitramixedsyntheticpriors] or TabPFN v2 [@Hollmann2025tabpfnv2]. The benchmark contains 51 datasets selected from 1,053 candidates to be representative of real-world tabular data. See @erickson2025tabarena for the list of datasets and Section `\ref{app:tabarena-metrics}`{=latex} for definitions of TabArena's Elo and Improvability metrics.

```{=latex}
\centering
```
```{=latex}
\centering
```
![**Our model family dominates the combined training + inference time/performance Pareto frontier on TabArena.** Runtime versus performance for `\ourmodel `{=latex}(1, 2, and 4 estimators) and `\ourmodelplus `{=latex}with Thinking mode against tuned and ensembled baselines. See Appendix `\ref{sec:tabarena_leaderboard_tables}`{=latex} for the full results.](figures/tabarena_v3/all/tuning_trajectories/pareto_n_configs_imp_total.png){#fig:tabarena_pareto width="\\linewidth"}

```{=latex}
\hfill
```
```{=latex}
\centering
```
![**Pairwise win rates on TabArena** for a curated set of the strongest models on TabArena. See Appendix `\ref{sec:tabarena_leaderboard_tables}`{=latex} for the full results.](figures/tabarena_v3/all/winrate_matrix.png){#fig:tabarena_leaderboard width="\\linewidth"}

#### Pushing the performance frontier on TabArena.

Figure `\ref{fig:tabarena_enhanced}`{=latex} shows the performance of `\ourmodel `{=latex}and `\ourmodelenhanced `{=latex}on TabArena. In one forward pass, `\ourmodel `{=latex}outperforms all other models, including tuned and ensembled baselines, by a significant margin, gaining 72 Elo points over our previous Real-TabPFN-2.5 tuned and ensembled. `\ourmodelenhanced`{=latex}, leveraging test-time computation, significantly outperforms the open-source `\ourmodel `{=latex}on TabArena, beating every non-TabPFN model (including tuned and ensembled baselines) by over 200 Elo points, and outperforming AutoGluon 1.5 extreme, a complex ensemble of models including TabPFN v2 tuned for 4 hours, by over 100 Elo points while being 10x faster. Looking at the win-rate matrix in Figure `\ref{fig:tabarena_leaderboard}`{=latex}, `\ourmodelplus `{=latex}with Thinking mode (respectively `\ourmodel`{=latex}) has over a 93% (respectively 80%) win rate against tuned and ensembled CatBoost, LightGBM and XGBoost, and a 69% (respectively 56%) win rate against AutoGluon 1.5 extreme tuned for 4 hours.

#### Dominating the time / performance Pareto-frontier.

The strong results of our models are achieved while being much faster to train than the baselines. In Figure `\ref{fig:tabarena_pareto}`{=latex}, our model family (`\ourmodel `{=latex}with 1, 2, and 4 estimators and `\ourmodelplus `{=latex}with Thinking mode) strictly dominates the combined training + inference time/performance Pareto frontier on TabArena by a large margin.

#### Scaling to larger datasets.

`\ourmodel `{=latex}was built to scale to large datasets, and `\ourmodelenhanced `{=latex}benefits from this scalability. While TabArena only contains datasets up to 100k rows, we can still observe very strong performance on the 15 largest datasets in TabArena with between 10k and 100k rows, as shown in Figure `\ref{fig:tabarena-hero-plot}`{=latex}. In particular, on this subset `\ourmodel `{=latex}outperforms any other model by 100 Elo, and `\ourmodelenhanced `{=latex}dramatically outperforms any other non-TabPFN model (including tuned and ensembled baselines) by over 420 Elo points, and beats AutoGluon 1.5 extreme (4h) by 220 Elo points. Looking at the win rate matrix in Figure `\ref{fig:tabarena_winrate_medium}`{=latex}, `\ourmodelenhanced `{=latex}has over 99% win rate against tuned and ensembled LightGBM and XGBoost, 98% win rate against CatBoost tuned and ensembled, and 82% win rate against AutoGluon 1.5 extreme tuned for 4 hours. In Section `\ref{sec:large-data}`{=latex}, we study the performance of our model beyond 100K rows, going up to 1M training rows.

### TALENT {#sec:talent-result}

The TALENT benchmark [@talent_benchmark_jmlr] provides a complementary view on the performance of `\ourmodel`{=latex}. Instead of a smaller curated list of datasets, this benchmark uses a large number of diverse datasets (300) from a wide range of domains. The strong results of TabPFN-3 on this benchmark confirm the robustness of its performance. Indeed, TabPFN-3 ranks first on the TALENT benchmark in aggregate, as shown in Figure `\ref{fig:TALENT-tabicl-datasets}`{=latex}, as well as for each task type (regression, binary and multiclass classification) in Figure `\ref{fig:per-task-TALENT-rank}`{=latex}.

```{=latex}
\centering
```
![**Average rank on the TALENT benchmark, using the TabICLv2 evaluation protocol from @qu2026tabiclv2 (274 datasets).** The original 300-dataset TALENT [@talent_benchmark_jmlr] minus the 26 development datasets used for TabPFN-2 / TabICLv2 development (removed in the TabICLv2 paper), spanning regression, binary and multiclass classification. Bars show mean rank (lower is better); error bars are 95% bootstrap confidence intervals over datasets (see appendix `\ref{app:TALENT}`{=latex}). Methods tagged *(N imputed, X%)* failed on some datasets and have that fraction of their score cells filled with K-nearest-neighbour values.](figures/TALENT/average_rank_horizontal_paper.png){#fig:TALENT-tabicl-datasets width="0.9\\linewidth"}

### TabSTAR {#sec:tabstar-result}

The TabSTAR study [@arazi_tabstar_2025] assembled 50 text-tabular datasets, gathered from previous work [@shi2021benchmarking; @grinsztajn2023vectorizing; @kim2024carte]. These datasets represent real-world tasks in which at least one feature is text-based and cannot be faithfully represented without text-processing methods. While the open-source version of `\ourmodel `{=latex}only supports numerical and categorical variables, `\ourmodelplus `{=latex}also offers native support for text features. We compare TabPFN API models with both text-aware models and numerical-only baselines. Figure `\ref{fig:text_leaderboard}`{=latex} shows that `\ourmodelplus `{=latex}dominates the leaderboard by a significant margin, and combining our thinking mode with native text support pushes performance further. Furthermore, among models that omit text features due to lack of native support, `\ourmodel `{=latex}remains the top performer. Appendix `\ref{app:TABSTAR}`{=latex} provides further details on the benchmark, as well as a performance breakdown by task type.

```{=latex}
\centering
```
![**Performance on the TabSTAR text-tabular collection**. `\ourmodelenhanced `{=latex}and `\ourmodelplus `{=latex}significantly outperform text-aware models such as CatBoost, TabSTAR and SAP-RPT-OSS. These models in turn dominate numerical-only baselines, among which TabPFN-3 obtains the best results.](figures/text_leaderboard/tabstar_leaderboard.png){#fig:text_leaderboard width="0.8\\linewidth"}

Internal Benchmarks {#subsec:results_internal}
-------------------

To complement the public TabArena [@erickson2025tabarena] and TALENT [@talent_benchmark_jmlr] benchmarks, we evaluate TabPFN-3 on a set of internal benchmarks designed to stress capabilities that are only partially covered by existing public evaluations. These benchmarks test whether TabPFN-3 pushes the frontier of tabular foundation models beyond the small- and medium-data regimes emphasized in prior work. In particular, we evaluate scaling to more than one million samples, high-dimensional feature spaces, many-class classification, and quantile regression.

Our primary comparisons are against the leading gradient boosted tree frameworks XGBoost [@chen2016xgboost], CatBoost [@prokhorenkova2018catboost], and LightGBM [@lightgbm], as well as TabICLv2 [@qu2026tabiclv2], a recent foundation model for tabular data with strong results on public benchmarks.

### Large Data {#sec:large-data}

#### Evaluation Protocol.

The primary baselines for our large-data evaluation are tree-based methods, which recent large-scale tabular benchmarks have shown to be highly competitive beyond 100,000 samples [@talent_benchmark_jmlr]. Our large-data benchmarking effort focuses on datasets with 100,000 to 1 million training rows and up to 200 features.

This benchmark targets the large-row regime for which `\ourmodel `{=latex}was designed. As described in Section `\ref{sec:arch_overview}`{=latex}, `\ourmodel `{=latex}first compresses feature information into fixed-dimensional row representations and subsequently performs in-context learning over these rows. This architectural decomposition enables inference on datasets with up to one million rows on a single GPU. At the same time, it induces a scaling trade-off: when both the number of rows and the number of features are very large, the early compression of feature information can become a bottleneck. We treat the high-dimensional, low-sample regime as a separate evaluation setting, studied in Section `\ref{sec_meany_feats}`{=latex}, rather than conflating it with the large-row setting considered here.

Our benchmark datasets span diverse real-world domains including healthcare, finance, logistics, and environmental science. For regression, the datasets in our benchmark exhibit temporal structure, where models are trained on past data and must generalize to future data. We found this setting to be the most common and representative of real-world deployment conditions.

#### Results.

`\ourmodel `{=latex}achieves state-of-the-art performance on our large-data benchmark, outperforming default and 8-hour-tuned gradient-boosted tree baselines in a single forward pass, as shown in Figure `\ref{fig:large_data_all}`{=latex}. Further, we show a preview version of `\ourmodelenhanced `{=latex}on large data, which improves on `\ourmodel `{=latex}further on the classification datasets (as `\ourmodelplus `{=latex}with Thinking mode does not yet support temporal datasets at the time of writing, we could not evaluate it on our regression benchmark). To better understand how `\ourmodel `{=latex}performance scales with training-set size, we report performance on subsampled versions of our datasets (keeping the test set constant and only considering datasets with 1M training samples) in Figure `\ref{fig:large_data_scaling}`{=latex}. Across the 100k--1M range, TabPFN-3 scales smoothly and retains the top normalized score at every training-set size.

```{=latex}
\centering
```
```{=latex}
\centering
```
![ **TabPFN-3 achieves state-of-the-art performance on the large-rows benchmark (up to 1M training rows and 200 features, 13 datasets)**, outperforming both default and 8-hour-tuned gradient-boosted tree baselines as well as TabICLv2 in a single forward pass. **(a)** Classification (9 datasets). **(b)** Regression (4 datasets, temporal splits). Normalized scores are higher-is-better; see Section `\ref{sec:metric_norm}`{=latex} for the normalization procedure and Appendix `\ref{app:large_data_datasets}`{=latex} for critical difference diagrams. ](figures/internal_benchmarking/large_data/big_data_cls_v2_wo_AG_w_thinking__roc_auc_normalized.png){#fig:large_data_all width="\\textwidth"}

```{=latex}
\centering
```
![ **TabPFN-3 achieves state-of-the-art performance on the large-rows benchmark (up to 1M training rows and 200 features, 13 datasets)**, outperforming both default and 8-hour-tuned gradient-boosted tree baselines as well as TabICLv2 in a single forward pass. **(a)** Classification (9 datasets). **(b)** Regression (4 datasets, temporal splits). Normalized scores are higher-is-better; see Section `\ref{sec:metric_norm}`{=latex} for the normalization procedure and Appendix `\ref{app:large_data_datasets}`{=latex} for critical difference diagrams. ](figures/internal_benchmarking/large_data/big_data_reg_v3_wo_AG__rmse_normalized.png){#fig:large_data_all width="\\textwidth"}

```{=latex}
\centering
```
![**`\ourmodel{}`{=latex} tops the normalized scaling curves for ROC-AUC OvR classification and RMSE regression across dataset scales.** Results are shown on the four large-data benchmark datasets that reach at least 1M training rows (one classification, three regression). For each dataset we subsample the training set to 100k, 250k, 500k and 1M rows with 3 random repeats. Shaded bands are 95% bootstrap confidence intervals across the four datasets and 3 repeats.](figures/internal_benchmarking/large_data/scaling_normalized_1M_combined.png){#fig:large_data_scaling width="0.8\\linewidth"}

#### Large data results from TALENT benchmark.

To confirm our internal results, we also extract the 14 available datasets in the TALENT benchmark with more than 100K and less than 1M training samples (see Appendix `\ref{app:large_data_datasets}`{=latex}). On this subset, TabPFN-3 is again the best ranked model against the baselines provided by the TALENT benchmark, as shown in Figure `\ref{fig:large_rows_rank}`{=latex}.

### Many-Class Classification {#sec:many-class-eval}

`\ourmodel `{=latex}introduces a many-class decoder (Section `\ref{sec:many-class-decoder}`{=latex}) that we trained to support up to 160 classes, a regime where most tabular foundation models fail entirely. Creating a benchmark from real-world datasets with naturally many classes is challenging; we therefore evaluate on a synthetic benchmark derived by bucketing regression targets from real regression benchmark datasets. We also confirm the strong performance of `\ourmodel `{=latex}on the 4 datasets from the TALENT benchmark that have more than 50 classes in Section `\ref{app:TALENT-many-class}`{=latex}.

#### Synthetic many-class benchmark.

We construct a synthetic benchmark by converting the TabArena regression datasets into many-class classification problems via jittered quantile binning; full construction details are given in Appendix `\ref{app:many_class_construction}`{=latex}. Figure `\ref{fig:many_class_synthetic}`{=latex} shows ROC-AUC (OvR) and accuracy. TabPFN-3 achieves the highest normalized ROC-AUC of $1.00$, ranking first overall and outperforming all baselines by a large margin. On ROC-AUC (OvR), the next best model is TabICLv2 at $0.89$, using its many-class wrapper to go beyond its 10-class limit. TabPFN-2.5 achieves $0.83$, using its own error-correcting-code-based many-class wrapper[^2]. Conventional tree-based methods and KNN all perform notably worse, even after $1$ hour of tuning.
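As a rough illustration of the binning step (the full construction is in Appendix `\ref{app:many_class_construction}`{=latex}, and this is one plausible minimal reading of it), the sketch below converts a continuous target into a many-class problem via Dirichlet-jittered quantile bins with shuffled label ids; the Dirichlet concentration is an illustrative assumption.

```python
# Hedged sketch: Dirichlet-jittered quantile binning of a regression target
# into a many-class classification target, with shuffled class ids so the
# label order carries no ordinal signal.
import numpy as np

rng = np.random.default_rng(0)
y_cont, n_classes = rng.lognormal(size=5_000), 100

props = rng.dirichlet(np.full(n_classes, 5.0))      # jittered bin proportions
edges = np.quantile(y_cont, np.cumsum(props)[:-1])  # interior bin edges
y_cls = np.digitize(y_cont, edges)                  # class ids in [0, n_classes)

perm = rng.permutation(n_classes)                   # shuffle class ids
y_cls = perm[y_cls]
```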

```{=latex}
\centering
```
```{=latex}
\centering
```
![ **On the synthetic many-class benchmark `\ourmodel `{=latex}achieves a normalized ROC-AUC (OvR) of $1.00$, outperforming all GBT baselines by a large margin**. The benchmark contains 9 datasets with up to 100 classes, derived from TabArena regression tasks via Dirichlet-jittered quantile binning with shuffled labels. Normalized scores are higher-is-better; see Section `\ref{sec:metric_norm}`{=latex} for the normalization procedure. The corresponding Critical Difference diagram can be seen in Figure `\ref{fig:many_class_roc_auc_cd}`{=latex}. ](figures/internal_benchmarking/many_class/tabarena_many_class_100_v2__roc_auc_normalized.png){#fig:many_class_synthetic width="\\textwidth"}

```{=latex}
\centering
```
![ **On the synthetic many-class benchmark `\ourmodel `{=latex}achieves a normalized ROC-AUC (OvR) of $1.00$, outperforming all GBT baselines by a large margin**. The benchmark contains 9 datasets with up to 100 classes, derived from TabArena regression tasks via Dirichlet-jittered quantile binning with shuffled labels. Normalized scores are higher-is-better; see Section `\ref{sec:metric_norm}`{=latex} for the normalization procedure. The corresponding Critical Difference diagram can be seen in Figure `\ref{fig:many_class_roc_auc_cd}`{=latex}. ](figures/internal_benchmarking/many_class/tabarena_many_class_100_v2__accuracy_normalized.png){#fig:many_class_synthetic width="\\textwidth"}

### Many Features {#sec_meany_feats}

The high-dimensional, low-sample regime poses a qualitatively different challenge from the large-row setting studied in Section `\ref{sec:large-data}`{=latex}. Whereas large-row benchmarks primarily test scalability to many training examples, the many-features setting tests robust generalization and feature-subset selection when the number of candidate features far exceeds the number of samples.

We evaluate this setting on a dedicated *many-features* slice of six real-world classification datasets with 100--320 samples, 1,100--22,200 features, and 2--4 classes, mostly from biomedical or gene-expression-style domains. Such large feature-to-sample ratios are challenging for tree-based methods because they increase the risk of selecting spurious feature interactions.

Figure `\ref{fig:many_feats_roc_auc}`{=latex} shows that `\ourmodel `{=latex}performs strongly on this challenging slice, reaching the best normalized ROC-AUC with 32 estimators. Earlier TabPFN variants, in particular Real-TabPFN-2.5 and TabPFN v2, also perform competitively, suggesting that TabPFN-style pretraining provides a robust inductive bias for high-dimensional, low-sample problems.

As described in Section `\ref{sec:preprocessing}`{=latex}, each `\ourmodel `{=latex}estimator is restricted to at most 200 input features by default. Thus, for datasets with tens of thousands of raw features, individual estimators operate on feature subsets rather than compressing the full feature set. At the same estimator budget, Real-TabPFN-2.5 can slightly outperform `\ourmodel`{=latex}; we hypothesize that this reflects two factors: Real-TabPFN-2.5 uses up to 500 features per estimator, providing broader feature-space coverage on some datasets, and its alternating row-wise and feature-wise attention may better exploit the selected feature subset. For `\ourmodel`{=latex}, increasing the number of estimators improves coverage of the raw feature space and raises the probability that informative feature subsets are included. In our OSS version, this estimator budget is scaled automatically for high-dimensional inputs, making the ensemble substantially more effective in this regime.

Overall, the many-features slice suggests that TabPFN estimators can be ensembled effectively in a high-noise feature-selection regime, where conventional tree-based methods are prone to overfitting to noisy or spurious feature interactions.
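The sketch below makes this subset-ensembling effect explicit; the released model handles subset selection and aggregation internally, so the `max_feats` cap and the plain probability average here are illustrative assumptions only.

```python
# Hedged sketch of feature-subset ensembling: each estimator sees a random
# slice of at most `max_feats` features and class probabilities are averaged.
# Fitting many TabPFN instances like this is slow; it is for illustration.
import numpy as np
from tabpfn import TabPFNClassifier

def subset_ensemble_proba(X_tr, y_tr, X_te,
                          n_estimators=32, max_feats=200, seed=0):
    rng = np.random.default_rng(seed)
    probas = []
    for _ in range(n_estimators):
        cols = rng.choice(X_tr.shape[1],
                          size=min(max_feats, X_tr.shape[1]),
                          replace=False)
        clf = TabPFNClassifier().fit(X_tr[:, cols], y_tr)
        probas.append(clf.predict_proba(X_te[:, cols]))
    # More subsets -> higher chance the informative features were covered.
    return np.mean(probas, axis=0)
```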

```{=latex}
\centering
```
```{=latex}
\centering
```
![ **TabPFN-3 reaches the best normalized ROC-AUC on the many-features slice** (six real-world classification datasets with 100--320 samples, 1,100--22,200 features, and 2--4 classes). Normalized scores are higher-is-better; see Section `\ref{sec:metric_norm}`{=latex} for the normalization procedure. ](figures/internal_benchmarking/many_feats/many_feats_1k_25k_multiclass_v2__roc_auc_normalized.png){#fig:many_feats_roc_auc width="\\linewidth"}

```{=latex}
\hfill
```
```{=latex}
\centering
```
![ **TabPFN-3 exhibits strong predictive distribution modeling on quantile regression.** Normalized pinball loss on our quantile regression benchmark, constructed from TabArena regression datasets and averaged across 10 quantile levels $q \in \{0.1, 0.2, \ldots, 0.9\}$ [@koenker_regression_quantiles]. Normalized scores are higher-is-better; see Section `\ref{sec:metric_norm}`{=latex} for the normalization procedure. ](figures/internal_benchmarking/quantile/tabarena_regression__pinball_loss_normalized.png){#fig:quantile_normalized width="\\linewidth"}

### Quantile Regression {#sec:quantile_regression}

Beyond point predictions, TabPFN-3 provides full predictive distributions via a bar-distribution regression head (Section `\ref{app:architecture-hyperparams}`{=latex}), from which arbitrary quantiles are decoded at inference by inverting the predicted CDF -- all from a single forward pass, with no retraining per quantile level. Since TabArena does not natively support quantile regression evaluation, we construct a dedicated benchmark from the TabArena regression datasets, evaluating all models on pinball loss [@koenker_regression_quantiles], averaged across 10 quantile levels $q \in \{0.1, 0.2, \ldots, 0.9\}$. We compare against four baselines spanning the typical strategies for quantile regression: a linear quantile regressor, which fits a separate pinball-loss model per quantile level; XGBoost in quantile mode, which uses a single multi-output booster but adds one tree per quantile per boosting round, scaling training cost roughly linearly in the number of levels; quantile random forests [@meinshausen2006qrf], which train a single MSE-objective forest and read off all quantiles from leaf-level empirical CDFs at no extra training cost; and TabICL-v2, a tabular foundation model with a quantile head.
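The decoding step amounts to inverting a histogram CDF. The sketch below shows this inversion together with the pinball loss used for evaluation; the uniform bar distribution is a stand-in for the regression head's actual output.

```python
# Hedged sketch: decode a quantile from a bar (histogram) predictive
# distribution by inverting its piecewise-linear CDF, plus the pinball loss.
import numpy as np

def decode_quantile(edges, probs, q):
    """Invert the piecewise-linear CDF of a histogram distribution."""
    cdf = np.concatenate([[0.0], np.cumsum(probs)])
    i = np.clip(np.searchsorted(cdf, q, side="right") - 1, 0, len(probs) - 1)
    frac = (q - cdf[i]) / max(probs[i], 1e-12)  # position inside bin i
    return edges[i] + frac * (edges[i + 1] - edges[i])

def pinball_loss(y_true, y_pred, q):
    diff = y_true - y_pred
    return np.mean(np.maximum(q * diff, (q - 1) * diff))

edges = np.linspace(-3, 3, 101)            # 100 bins over the target range
probs = np.full(100, 0.01)                 # toy uniform bar distribution
print(decode_quantile(edges, probs, 0.9))  # ~2.4 for this toy case
```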

TabPFN-3 achieves a normalized pinball loss score very close to $1.00$, ranking first overall and outperforming all baselines, demonstrating that the bar-distribution head produces well-calibrated predictive distributions superior to dedicated quantile regression baselines at no additional training cost per quantile level. The normalized Pinball loss is shown in Figure `\ref{fig:quantile_normalized}`{=latex}, while the corresponding Critical Difference plot can be found in the Appendix in Figure `\ref{fig:quantile_cd}`{=latex}. `\FloatBarrier`{=latex}

Time-Series Forecasting {#subsec:results_ts}
-----------------------

In addition to the classification and regression checkpoints, we release a new TabPFN-3 checkpoint for TabPFN-TS [@hoo2024tabpfn_ts] fine-tuned on **synthetic** time-series data for probabilistic time-series forecasting. This checkpoint can be used in our [`tabpfn-time-series`](https://github.com/PriorLabs/tabpfn-time-series) library. We evaluate it on fev-bench [@shchur2025fev], a benchmark containing 100 diverse time-series forecasting tasks. Following this benchmark, we report win rates and skill scores relative to the Seasonal Naive baseline in Table `\ref{tab:fev-bench}`{=latex} (full version in Appendix Table `\ref{tab:fev-bench-full}`{=latex}).

```{=latex}
\centering
```
```{=latex}
\vspace{-3mm}
```
```{=latex}
\centering
```
**(a) SQL (probabilistic)**\
`\resizebox{\linewidth}{!}{\begin{tabular}{lrrrrr}
\toprule
\textbf{Model} & \textbf{Win (\%)} & \textbf{Skill (\%)} & \textbf{Runtime (s)} & \textbf{Leak.\ (\%)} & \textbf{\# fails} \\
\midrule
Chronos-2      & 91.7 & 47.3 & 0.8   & 0  & 0 \\
\textcolor{PriorMauve}{TabPFN-TS-3}    & 73.6 & 43.1 & 234.6 & 0  & 0 \\
TiRex          & 83.4 & 42.6 & 0.2   & 1  & 0 \\
TimesFM-2.5    & 78.6 & 42.2 & 1.9   & 10 & 0 \\
Toto-1.0       & 71.6 & 40.7 & 22.1  & 8  & 0 \\
\textcolor{PriorMauve}{TabPFN-v2-TS}   & 64.1 & 39.6 & 88.9  & 0  & 2 \\
Moirai-2.0     & 66.2 & 39.3 & 0.3   & 28 & 0 \\
Chronos-Bolt   & 66.2 & 38.9 & 0.2   & 0  & 0 \\
Sundial-Base   & 47.1 & 33.4 & 8.0   & 1  & 0 \\
TabICL-v2      & 53.8 & 30.8 & 64.7  & 0  & 0 \\
Stat. Ensemble & 43.8 & 20.2 & 148.6 & 0  & 11 \\
Seasonal Naive & 19.1 & 0.0  & 0.5   & 0  & 0 \\
\bottomrule
\end{tabular}
}`{=latex}

```{=latex}
\hfill
```
```{=latex}
\centering
```
**(b) MASE (point)**\
`\resizebox{\linewidth}{!}{\begin{tabular}{lrrrrr}
\toprule
\textbf{Model} & \textbf{Win (\%)} & \textbf{Skill (\%)} & \textbf{Runtime (s)} & \textbf{Leak.\ (\%)} & \textbf{\# fails} \\
\midrule
Chronos-2      & 86.9 & 35.5 & 0.8   & 0  & 0 \\
\textcolor{PriorMauve}{TabPFN-TS-3}    & 69.8 & 30.6 & 234.6 & 0  & 0 \\
TimesFM-2.5    & 74.9 & 30.2 & 1.9   & 10 & 0 \\
TiRex          & 76.9 & 30.0 & 0.2   & 1  & 0 \\
Toto-1.0       & 66.3 & 28.2 & 22.1  & 8  & 0 \\
\textcolor{PriorMauve}{TabPFN-v2-TS}   & 58.5 & 27.6 & 88.9  & 0  & 2 \\
Moirai-2.0     & 61.4 & 27.3 & 0.3   & 28 & 0 \\
Chronos-Bolt   & 60.7 & 26.5 & 0.2   & 0  & 0 \\
Sundial-Base   & 53.4 & 24.7 & 8.0   & 1  & 0 \\
Stat. Ensemble & 46.7 & 15.7 & 148.6 & 0  & 11 \\
TabICL-v2      & 33.2 & 7.0  & 64.7  & 0  & 0 \\
Seasonal Naive & 20.0 & 0.0  & 0.5   & 0  & 0 \\
\bottomrule
\end{tabular}
}`{=latex}

```{=latex}
\footnotetext{The fev-bench
authors report 28.8 MASE skill for the original TabPFN-TS
\citep{shchur2025fev}; our re-run in Table~\ref{tab:fev-bench-full}
yields 27.6 — we report our own re-run for like-for-like comparison
across the cohort.}
```
```{=latex}
\centering
```
![**Qualitative forecast comparison on a fev-bench task (`rohlik_order_1D`).** Each model column shows the forecast horizon (zoomed to time 880-935) against the held-out ground truth, with the shaded band indicating the 10th-90th quantile. The leftmost panel shows the full training history. MASE and CRPS scores are reported per model. Additional examples, including covariate panels, are in `\Cref{app:time_series}`{=latex}.](figures/time_series/new_format/rohlik_orders_1D_s0.png){#fig:fev-bench-qualitative width="\\textwidth"}

Our checkpoint is evaluated with up to 32k historical time steps of context, well beyond the budgets typically used by patch- or window-based time-series foundation models. Compared to the original TabPFN-TS [@hoo2024tabpfn_ts] as evaluated by the fev-bench authors (39.6 SQL skill, 28.8 MASE skill; @shchur2025fev), our fine-tuned variant improves to **43.1 SQL skill** and **30.6 MASE skill**. On the full 100-task cohort it ranks 2nd on mean SQL skill (ahead of TiRex and TimesFM-2.5) and 2nd on MASE (ahead of TimesFM-2.5, which has $10\%$ flagged train/test leakage, and TiRex), in both cases behind only Chronos-2. Under the win-rate metric, TabPFN-TS-3 drops to 4th place, although we found win rates to be very sensitive to tiny score differences on a few datasets.

The strong performance of TabPFN-TS-3 is particularly noteworthy given that it is trained purely on synthetic data, while most other time-series models, including Chronos-2 [@ansari2025chronos2univariateuniversalforecasting], TiRex [@auer:25tirex] and TimesFM-2.5 [@pmlr-v235-das24c], are trained on real-world data. Synthetic-only pretraining avoids several pitfalls of real-data pretraining: historical series are leaky and frequently recirculated across forecasting libraries (fev-bench flags 10% leakage in TimesFM-2.5 and 28% in Moirai-2.0; see Table `\ref{tab:fev-bench}`{=latex}), forecasting the future from historical pretraining is fundamentally out-of-distribution, and the supply of public real-world time-series data is finite, so any model relying on it inherits both its biases and its ceiling. Our synthetic prior by design has zero contamination from any specific real time series.

We also show qualitative examples in Figure `\ref{fig:fev-bench-qualitative}`{=latex} to give a better intuition of our model forecasts. Appendix `\ref{app:time_series}`{=latex} complements this section with the full leaderboards (Table `\ref{tab:fev-bench-full}`{=latex}), pairwise comparisons (Figure `\ref{fig:fev-bench-pairwise}`{=latex}), additional qualitative forecasts and per-task SQL results.

Relational Data {#subsec:results_rel}
---------------

```{=latex}
\centering
```
```{=latex}
\centering
```
![**`\ourmodel `{=latex}tops performance on RelBenchV1 among foundation models.** Following @kumorfmv2, we report the mean ROC AUC for entity classification and MAE scores for entity regression normalized by LightGBM's MAE. RelGNN [@relgnn] achieves SOTA performance on both tasks, followed by TabPFN-REL, which sets a new SOTA for foundation models. Methods marked with $^{*}$ (KumoRFMv1, `\rtzero`{=latex}) likely follow a different evaluation protocol than the one outlined in RelBench, which overestimates model performance. ](figures/relbench/relbench_v1_classification_xticks.png){#fig:relbench width="\\textwidth"}

```{=latex}
\centering
```
![**`\ourmodel `{=latex}tops performance on RelBenchV1 among foundation models.** Following @kumorfmv2, we report the mean ROC AUC for entity classification and MAE scores for entity regression normalized by LightGBM's MAE. RelGNN [@relgnn] achieves SOTA performance on both tasks, followed by TabPFN-REL, which sets a new SOTA for foundation models. Methods marked with $^{*}$ (KumoRFMv1, `\rtzero`{=latex}) likely follow a different evaluation protocol than the one outlined in RelBench, which overestimates model performance. ](figures/relbench/relbench_v1_regression.png){#fig:relbench width="\\textwidth"}

Real-world data is often relational: commercial enterprises, healthcare systems, and financial institutions routinely store their core operational data across multiple interconnected tables in relational databases. Unlocking predictive insights from such data is therefore of substantial practical importance, and requires reasoning jointly over heterogeneous tables linked by complex foreign-key relationships. This has motivated the development of dedicated relational foundation models (RFMs) that aim to provide accurate, up-to-date predictions via in-context learning (ICL) without the need for costly per-task model training and hyperparameter tuning.

This has sparked the emergence of dedicated solutions for relational data, e.g., fully supervised solutions particularly tailored for relational data such as GraphSAGE [@graphsage], RelGT [@relgt] and RelGNN [@relgnn], closed-source relational foundation models like KumoRFMv1 [@kumorfmv1] and KumoRFMv2 [@kumorfmv2], as well as open-source RFMs, Griffin [@griffin] and `\rtzero`{=latex} [@rt_zero]. Recently, RDBLearn [@rdblearn] has shown that TFMs including TabPFN can be converted into RFMs by automatically flattening the underlying database into a table.

In this section, we build on this research and show how *TabPFN-REL* using TabPFN-3 achieves state of the art performance on the popular RelBenchV1 [@relbenchv1] benchmark for entity classification and regression.

For RelBench, we follow the general guidelines by truncating each database at the pre-specified test timestamp before constructing the featurization and context for all test entities. Following @kumorfmv2, we generally report baseline results as provided by the authors of the methods to ensure well-tuned baselines. For methods that likely follow a different evaluation regime, we rerun the evaluation using RelBench's data regime, falling back to author-reported numbers where rerunning is not possible due to model deprecation or missing checkpoints (as is the case for KumoRFMv1 and `\rtzero`{=latex}); we note that these may not be directly comparable due to potentially different data setups. For KumoRFMv2 we adapt the original scripts provided by the authors and use four estimators and a context size of $10{,}000$ (the maximum for each), which we found to slightly outperform the script defaults of one estimator and a context size of $5{,}000$ samples. We compare three versions of RDBLearn: vanilla RDBLearn, which tunes over a range of TFMs including TabPFN-2.5, and two versions that forgo the tuning and use either TabPFN-2.5 or `\ourmodel `{=latex}as a fixed TFM.[^3]
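As a rough illustration of the flattening idea behind RDBLearn-style pipelines, the sketch below truncates a child table at the test timestamp, aggregates it per entity, and joins the aggregates onto the entity table as extra TabPFN features; the table and column names are illustrative, not RelBench's actual schema.

```python
# Hedged sketch of leakage-free relational flattening: keep only child rows
# up to the cutoff timestamp, aggregate per entity, join onto the entity
# table. Schema names here are illustrative stand-ins.
import pandas as pd

def flatten(entities: pd.DataFrame, orders: pd.DataFrame,
            cutoff: pd.Timestamp) -> pd.DataFrame:
    past = orders[orders["timestamp"] <= cutoff]       # no future leakage
    aggs = past.groupby("entity_id")["amount"].agg(
        n_orders="count", total="sum", mean="mean")    # per-entity features
    return entities.join(aggs, on="entity_id").fillna(0.0)
```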

#### TabPFN-REL sets a new state-of-the-art among RFMs.

We report the aggregate performance of the different RFMs and fully-supervised baselines in `\autoref{fig:relbench}`{=latex} both for entity classification and entity regression on RelBenchV1, as well as per-dataset results in `\autoref{sec:app/relbench/per-dataset-results}`{=latex}. *TabPFN-REL achieves state-of-the-art performance among RFMs on both tasks*, with KumoRFMv1/v2 coming second on regression/classification. We attribute KumoRFMv1's strong classification results in part to a potentially different evaluation regime used by the authors, which likely overestimates performance, especially on the `rel-f1` task suite. We also observe that RDBLearn with the fixed TabPFN-3 backend consistently outperforms the original RDBLearn, which itself tunes over various TFMs including TabPFN-2.5. RDBLearn using `\ourmodel `{=latex}hence Pareto-dominates vanilla RDBLearn in terms of runtime and performance, and to the best of our knowledge sets a *new state-of-the-art among open-source RFMs*. *At the time of writing, TabPFN-3 therefore powers both the best overall relational foundation model (TabPFN-REL) and the best open-source alternative (RDBLearn + v3).*

#### Comparison to fully-supervised baselines.

The fully-supervised RelGNN outperforms TabPFN-REL, with the gap being larger on classification than regression. On regression, the gap between RelGNN and TabPFN-REL is slim, with TabPFN-REL achieving lower mean rank than RelGNN. RelGT and GraphSAGE fall behind TabPFN-REL both in terms of normalized score and rank. We note that training supervised methods is several orders of magnitude more expensive than the in-context learning performed in TabPFN-REL [@rdblearn; @kumorfmv1; @kumorfmv2]. This is both because training a single supervised model takes significantly longer than the forward pass of TabPFN-REL, and because supervised methods require extensive per-dataset hyperparameter tuning to achieve optimal performance. For example, we identified at least seven axes of variability in RelGNN's per-dataset configs, yielding thousands of possible hyperparameter combinations to search over.

```{=latex}
\FloatBarrier
```
Causal Inference
----------------

We follow up on our previous results [@TabPFN-2.5], which showed strong performance of TabPFN-2.5 as a meta (T/X/S) learner [@kunzel_meta_learners] on the RealCause benchmark, by providing an evaluation on the `scikit-uplift` benchmark [@user-guide-for-uplift-modeling]. In terms of the Qini score, a real-world evaluation metric for experimental data, all TabPFN-3 meta-learners improve over TabPFN-2.5, with the top two spots occupied by the T- and S-learners (Figure `\ref{fig:causal_inference}`{=latex}). In contrast, we observe slightly worse performance compared to TabPFN-2.5 on RealCause [@neal_realcause]. We provide a more in-depth analysis of the results and a description of the Qini evaluation protocol in Appendix `\ref{sec:causal_inference}`{=latex}.

Embeddings {#subsec:results_embed}
----------

Finally, we demonstrate that `\ourmodel `{=latex}generates semantically meaningful embeddings. We follow the approach developed by @ye2026closer for TabPFN v2: we partition the dataset into cross-validation folds and take the embeddings from the test portion of each fold. The embeddings we capture are the output of the ICL layers at the end of Stage 3 of our model (see Section `\ref{sec:arch_overview}`{=latex} for more details). Figure `\ref{fig:embeddings-scatter}`{=latex} shows that this approach continues to work well for TabPFN-3, with the generated embeddings capturing the dataset structure.
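A minimal sketch of this fold-wise extraction is shown below, with a hypothetical `get_row_embeddings` accessor standing in for the model's Stage-3 ICL outputs (the `tabpfn-extensions` repository hosts maintained embedding utilities); a random-projection stub makes the sketch run end-to-end.

```python
# Hedged sketch: embed every row out-of-fold, as in the cross-validation
# scheme described above. `get_row_embeddings` is a hypothetical accessor
# for the Stage-3 ICL outputs; replace it with the maintained utilities.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.decomposition import PCA

def dataset_embeddings(X, y, get_row_embeddings, n_splits=5):
    """Collect test-fold embeddings so every row is embedded out-of-fold."""
    emb = None
    for tr, te in KFold(n_splits, shuffle=True, random_state=0).split(X):
        e = get_row_embeddings(X[tr], y[tr], X[te])
        if emb is None:
            emb = np.zeros((len(X), e.shape[1]))
        emb[te] = e
    return emb

# Stand-in embedder so the sketch runs end-to-end.
rng = np.random.default_rng(0)
W = rng.normal(size=(10, 64))
stub = lambda X_tr, y_tr, X_te: X_te @ W

X, y = rng.normal(size=(300, 10)), rng.integers(0, 3, 300)
coords = PCA(n_components=2).fit_transform(dataset_embeddings(X, y, stub))
```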

```{=latex}
\centering
```
![ **TabPFN-3 extracts semantically-meaningful row embeddings.** The upper plots show 2D PCA applied directly to three classification datasets, where each point is a row, while the lower plots show PCA applied to embeddings of the rows. Color indicates the class. We observe that the embeddings are clustered by class. ](figures/embeddings/embeddings_scatter.png){#fig:embeddings-scatter}

Adoption {#sec:usecases_extensions}
========

TabPFN-3 ships into an already sprawling ecosystem. Since the v2 release, TabPFN has been picked up across academic ML research, applied science, and enterprise deployment. A substantial portion of the extension work referenced throughout this report (time-series, causal inference, relational data, interpretability) was driven by that community rather than initiated internally. This section describes the shape of that adoption -- where the model is in production, where it is being evaluated, which platforms make it accessible, and which research areas have published applications -- to give the v3 release its operational context.

Community and Open-Source Ecosystem
-----------------------------------

The open-source `tabpfn` package has surpassed 3.2 million PyPI downloads, and the original TabPFN Nature paper [@Hollmann2025tabpfnv2] has been cited in over 1,000 papers in the sixteen months since publication.[^4] A Discord community of over 2,000 users and hundreds of resolved GitHub issues have driven cross-platform stability work, edge-case fixes, and the maturation of the model from research artifact to production-grade library.

A separate `tabpfn-extensions` repository[^5] hosts community-driven extensions that compose with the core model: SHAP and SHAP-IQ interpretability, synthetic data generation and missing-value imputation, TabPFN-based feature selection, regression-via-classification, survival analysis and conditional randomization tests. TabPFN-3's reduced KV cache and inference improvements (Section `\ref{sec:methods}`{=latex}) directly accelerate every extension that depends on repeated forward passes -- most notably interpretability and conditional independence testing.
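
To make concrete why forward-pass speed matters here, below is a minimal, model-agnostic sketch of SHAP attribution on a fitted TabPFN classifier, using the generic `shap` permutation explainer rather than the dedicated `shapiq` integration. Every masked coalition the explainer evaluates is another TabPFN forward pass, which is exactly the workload the reduced KV cache amortizes.

```python
# Model-agnostic SHAP on TabPFN: a sketch, not the shapiq integration.
import shap
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = TabPFNClassifier().fit(X_train, y_train)

# Every masked coalition evaluated below triggers another forward pass --
# the repeated-inference workload that KV caching accelerates.
explainer = shap.Explainer(lambda X_: clf.predict_proba(X_)[:, 1], X_train[:100])
shap_values = explainer(X_test[:20])
```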

TabPFN also serves as a foundational layer for methods published as independent research, spanning time-series forecasting [@hoo2024tabpfn_ts], node classification on graphs [@Hayler2025GraphsTablesZeroShot; @eremeev2025turningtabularfoundationmodels], evolving data streams [@Lourenco2025ICLStreams], causal inference [@robertson_dopfn; @balazadeh_causalpfn; @feuerriegel_causalfm], reinforcement learning [@Schiff2025TabPFNRL], high-dimensional Bayesian optimization [@Yu2025GITBO], and multimodal encoding [@luo2025timetabpfnintegratedmultimodalengine]. As shown in Section `\ref{sec:results}`{=latex}, many of these extensions improve further when run with TabPFN-3 as the backend rather than v2.5 or v2.6.

Enterprise Engagements
----------------------

TabPFN has been deployed and evaluated across a wide range of enterprise settings. Examples include: *Hitachi Rail* deploys TabPFN for predictive maintenance on the Spanish rail network; in initial deployment, TabPFN reduced root-mean-square error by approximately 40% compared to their existing baseline [@hitachi_case_study]. *Creditplus Bank*, part of the Crédit Agricole group, will use distilled TabPFN models (Section `\ref{sec:distillation}`{=latex}) for assisting CPU-based credit decisioning in motor finance under appropriate credit-risk regulatory constraints [@creditplus_case_study]. *Oxford Cancer Analytics* applies TabPFN to proteomic liquid-biopsy data for early lung-disease detection [@oxcan_case_study]. A longer list of enterprise and commercial engagements is available on the Prior Labs website.

Platform Availability
---------------------

TabPFN is available through the open-source PyPI distribution for evaluation and non-commercial use, and through a managed API for commercial workloads. The model is currently listed on the *AWS SageMaker Marketplace*[^6] and the *Azure AI Foundry Model Catalog*[^7], with full support for batch and real-time inference on classification and regression tasks; the TabPFN-3 release on both marketplaces follows this report. A reference integration for *Databricks* is available through the Databricks Industry Solutions repository[^8]. See Section `\ref{sec:license}`{=latex} for license terms, commercial-use scope, and the contact path for production deployment.

Research Adoption Across Domains
--------------------------------

In addition to commercial engagement, we have collected more than 200 published research applications of TabPFN across a broad range of areas; the full list is in Appendix `\ref{app:use_cases}`{=latex}.

Adoption is strongest in *healthcare and life sciences* (98 applications), reflecting TabPFN's relative advantage in data-scarce settings: diagnosis, prognosis, treatment-response prediction, biomarker modeling, survival analysis, drug discovery, pharmacokinetics, radiomics, omics, and multimodal clinical data. *Manufacturing and industrial* applications (41 papers) span concrete and asphalt strength prediction, geotechnical modeling, tunnel construction, steel and semiconductor properties, IIoT intrusion detection, rotating-machinery fault classification, battery and circuit modeling, and materials discovery. *Energy and utilities* (24 papers) cluster around environmental monitoring, renewable-energy and geophysical prediction, water and climate systems, and industrial process optimization. *Financial services* (7 papers) include transaction analytics, churn prediction, return forecasting, actuarial modeling, and credit-risk prediction; the relatively small published count almost certainly underrepresents commercial traction in a domain that publishes little. The remaining 32 applications span uncertainty estimation, hypothesis testing, Shapley value estimation, graph node classification, cybersecurity, geoscience, agriculture, soil and lunar-regolith analysis, fuel-blend prediction, crop-yield forecasting, forensic ancestry prediction, and synthetic tabular data generation.

The distribution of these applications -- weighted toward domains characterized by limited, expensive, or heterogeneous data -- is consistent with the regime TabPFN was designed for, and is the empirical basis for the v3 capability choices described in Section `\ref{sec:methods}`{=latex}.

License and Availability {#sec:license}
========================

We release TabPFN-3 under the `TABPFN-3.0 License v1.0`, designed to be permissive for academic use, research, and evaluation in commercial settings. The license *explicitly allows* testing, evaluation, and internal benchmarking, so an organization can download the model and run preliminary assessments on its own datasets without a commercial agreement.

The key restriction is that the model, its derivatives, and its outputs cannot be used for commercial or production purposes. This includes, but is not limited to, revenue-generating products, competitive benchmarking for procurement decisions, client deliverables, and using model outputs as inputs to internal commercial decision-making.

For production use, we offer a *Commercial Enterprise License*, available for our managed API, Virtual Private Cloud deployments (at the time of publication: AWS SageMaker & Azure AI Foundry), and on-prem or other custom deployment modes across other software platforms such as Databricks and SAP. The Commercial Enterprise License provides access to our proprietary high-speed inference engine, dedicated support, integration tooling, additional internal models, and the `\ourmodelenhanced `{=latex}variant, which is not available as part of the open-source release. The managed API runs on our optimized GPU infrastructure and is the recommended option for users without dedicated local GPUs; it is accessible via a Python SDK[^9] (`pip install tabpfn-client`) or a standard REST API.
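
For orientation, a minimal sketch of the SDK workflow is shown below; exact entry points and the authentication flow may differ between client versions.

```python
# Sketch of the managed-API workflow (pip install tabpfn-client).
# Entry points and the login flow may differ between client versions.
import tabpfn_client
from tabpfn_client import TabPFNClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

tabpfn_client.init()  # one-time authentication against the API

X_train, X_test, y_train, y_test = train_test_split(
    *load_iris(return_X_y=True), random_state=0)

clf = TabPFNClassifier()           # inference runs on managed GPU infrastructure
clf.fit(X_train, y_train)          # uploads the training context
proba = clf.predict_proba(X_test)  # returns class probabilities
```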

The full `TABPFN-3.0 License v1.0` text is available at <https://huggingface.co/Prior-Labs/tabpfn_3/blob/main/LICENSE>. For commercial licensing inquiries, please contact <sales@priorlabs.ai>.

```{=latex}
\newpage
```
```{=latex}
\FloatBarrier
```
```{=latex}
\bibliographystyle{unsrtnat}
```
```{=latex}
\newpage
```
```{=latex}
\appendix
```
```{=latex}
\phantomsection
```
```{=latex}
\addcontentsline{toc}{section}{Appendix}
```
```{=latex}
\addtocontents{toc}{\protect\setcounter{tocdepth}{-10}}
```
Appendix Table of Contents {#appendix-table-of-contents .unnumbered}
==========================

```{=latex}
\startcontents[appendix]
```
```{=latex}
\makeatletter
```
```{=latex}
\protected@write
```
```{=latex}
\@auxout{}{\string\ttl@writefile{ptc}{\protect\setcounter{tocdepth}{2}}}
```
```{=latex}
\makeatother
```
```{=latex}
\printcontents[appendix]{}{1}{}
```
```{=latex}
\newpage
```
Contributors {#app:contributors}
============

#### Model Development & Deployment.

Noah Hollmann, Frank Hutter, Léo Grinsztajn, Klemens Flöge, Oscar Key, Felix Birkel, Philipp Jund, Brendan Roof, Mihir Manium, Shi Bin (Liam) Hoo, Magnus Bühler, Anurag Garg, Dominik Safaric, Jake Robertson, Benjamin Jäger, Simone Alessi, Adrian Hayler, Vladyslav Moroshan, Lennart Purucker, Philipp Singer, Alan Arazi, Julien Siems, Jan Hendrik Metzen, Georg Grab, Nick Erickson, Siyuan Guo, Eliott Kalfon, Simon Bing, David Salinas `\vspace{-1em}`{=latex}

#### Distribution & Product.

Sauraj Gambhir, Clara Cornu, Lilly Charlotte Wehrhahn, Diana Kriuchkova `\vspace{-2em}`{=latex}

#### Operations.

Kursat Kaya, Lydia Sidhoum, Marie Salmon, Jerry Chen\
`\vspace{-0.5em}`{=latex} `\noindent`{=latex}*Authors are ordered by their date of joining Prior Labs; all authors above were affiliated with Prior Labs at the time of contribution, and all work was done at Prior Labs.*

```{=latex}
\vspace{0.75em}
```
#### Scientific Advisors.

Samuel Müller, Madelon Hulsebos, Yann LeCun, Bernhard Schölkopf `\vspace{0.75em}`{=latex}\
`\noindent`{=latex}*Scientific advisors did not contribute IP.*

```{=latex}
\vspace{1em}
```
Acknowledgements {#app:acknowledge}
================

We acknowledge the EuroHPC Joint Undertaking for awarding this project access to the EuroHPC supercomputer LUMI, hosted by CSC (Finland) and the LUMI consortium through a EuroHPC Regular Access call.

![image](figures/acknowledgements/LUMI_logo_dark.png){width="0.4\\linewidth"}

Architectural Hyperparameters {#app:architecture-hyperparams}
=============================

The tables below list the architectural hyperparameters of the released TabPFN-3 classifier and regressor checkpoints. The two models share all hyperparameters; the only differences are in the output decoder, which is task-specific (noted where applicable).

```{=latex}
\centering
```
```{=latex}
\small
```
  **Hyperparameter**                   **Value** **Description**
  ---------------------------------- ----------- ----------------------------------------------------
  `embed_dim`                                128 Base embedding dimension used throughout the model
  `feature_group_size`                         3 Features per circular-shift group
  `dist_embed_num_blocks`                      3 Induced self-attention blocks
  `dist_embed_num_heads`                       8 Attention heads per block
  `dist_embed_num_inducing_points`           128 Inducing points per column

  : Stage 1 --- Feature embedding.

```{=latex}
\centering
```
```{=latex}
\small
```
  **Hyperparameter**            **Value** **Description**
  --------------------------- ----------- ---------------------------------------------
  `feat_agg_num_blocks`                 3 Transformer blocks
  `feat_agg_num_heads`                  8 Attention heads per block
  `feat_agg_num_cls_tokens`             4 CLS tokens aggregated per row
  `use_rope`                         True Rotary positional embeddings (RoPE) enabled
  `feat_agg_rope_base`            100 000 RoPE base frequency $\theta$

  : Stage 2 --- Feature aggregation.

```{=latex}
\centering
```
```{=latex}
\small
```
  **Hyperparameter**          **Value** **Description**
  ------------------------- ----------- ----------------------------------------------------------------------------------
  `icl_emsize` (derived)            512 $\texttt{embed\_dim} \times \texttt{feat\_agg\_num\_cls\_tokens} = 128 \times 4$
  `nlayers`                          24 Transformer blocks
  `icl_num_heads`                     8 Query heads per block
  `icl_num_kv_heads`                  8 KV heads for train rows (standard MHA)
  `icl_num_kv_heads_test`             1 KV heads for test rows

  : Stage 3 --- ICL transformer.

```{=latex}
\centering
```
```{=latex}
\small
```
  **Hyperparameter**      **Value** **Description**
  --------------------- ----------- --------------------------------------
  `max_num_classes`             160 Maximum supported class count
  `decoder_num_heads`             6 Attention heads in retrieval decoder
  `decoder_head_dim`             64 Head dimension in retrieval decoder

  : Many-class output decoder --- classifier.

```{=latex}
\centering
```
```{=latex}
\small
```
  **Hyperparameter**                         **Value** **Description**
  -------------------------- ------------------------- ----------------------------------------------------------------------------------------------------------------------------
  `architecture` (derived)     $512 \to 1024 \to 5000$ $\texttt{icl\_emsize} \to \texttt{icl\_emsize} \times \texttt{ff\_factor} \xrightarrow{\text{GELU}} \texttt{num\_buckets}$
  `num_buckets`                                   5000 Output buckets for quantile regression

  : MLP output decoder --- regressor (2-layer MLP).

```{=latex}
\centering
```
```{=latex}
\small
```
  **Hyperparameter**                   **Value** **Description**
  ---------------------------------- ----------- --------------------------------------------------
  `ff_factor`                                  2 Feed-forward expansion factor (all stages)
  `softmax_scaling_mlp_hidden_dim`            64 Hidden units in query-aware softmax-scaling MLPs

  : Shared settings (both classifier and regressor).
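
The derived quantities in the tables above follow directly from the base settings; the short sketch below makes the arithmetic explicit.

```python
# Derived sizes from the architectural hyperparameters listed above.
embed_dim = 128
feat_agg_num_cls_tokens = 4
ff_factor = 2
num_buckets = 5000

# Stage-3 ICL width: embed_dim x number of CLS tokens per row.
icl_emsize = embed_dim * feat_agg_num_cls_tokens
assert icl_emsize == 512

# Regressor decoder: icl_emsize -> icl_emsize * ff_factor -(GELU)-> num_buckets.
regressor_mlp_dims = (icl_emsize, icl_emsize * ff_factor, num_buckets)
assert regressor_mlp_dims == (512, 1024, 5000)
```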

Prior visualizations
====================

We provide a number of illustrative visualizations for the improvements to our prior. Figure `\ref{fig:graph_sampling}`{=latex} shows directed acyclic graphs sampled by our new graph-sampling algorithms; Figure `\ref{fig:mechanisms}`{=latex} visualizes the functional relationships generated by the new combiner mechanisms; Figure `\ref{fig:dataset}`{=latex} gives an example classification dataset generated from the prior; and Figure `\ref{fig:ood}`{=latex} demonstrates TabPFN-3's extrapolation capabilities, comparing to CatBoost.

```{=latex}
\centering
```
![**Visualization of directed acyclic graphs underlying our SCM prior**, produced by our new graph sampling algorithms.](figures/prior/model_report_graphs.png){#fig:graph_sampling width="1\\linewidth"}

```{=latex}
\centering
```
![**Visualization of functional relationships generated by the new combiner mechanisms in our SCM prior.** While mechanisms in the prior have variable dimensionality, for the sake of visualization we plot functions on a two-dimensional grid.](figures/prior/model_report_mechanisms_raster.png){#fig:mechanisms width="1\\linewidth"}

```{=latex}
\centering
```
```{=latex}
\centering
```
![**Example classification dataset generated from the prior.** There are four covariates; the subplot in row $i$, column $j$ shows a scatter plot of covariates $i$ and $j+1$, with the target class indicated by color.](figures/prior/dataset_15.png){#fig:dataset width="\\linewidth"}

```{=latex}
\centering
```
![**Example demonstrating the extrapolation capabilities of TabPFN-3 (using our out-of-distribution compatible preprocessing), comparing to CatBoost.** As can be seen, TabPFN-3 is able to extrapolate successfully, which tree-based algorithms and tabular foundation models often struggle with.](figures/prior/ood_overlay_small.png){#fig:ood width="1\\linewidth"}

Experimental results details
============================

Details on Causal Inference Results {#sec:causal_inference}
-----------------------------------

#### Causal Inference.

Many practical problems are rooted in causal logic, requiring an understanding of how interventions, rather than mere associations, shape outcomes. Estimating Conditional Average Treatment Effects (CATEs) serves as a primary tool for addressing these "what-if" scenarios, quantifying the expected change in an individual's response when a treatment is applied compared to when it is withheld. Previous results [@TabPFN-2.5] have shown that TabPFN-2.5, especially when used as a T-Learner [@kunzel_meta_learners], achieves SOTA performance on the RealCause benchmark [@neal_realcause]. While TabPFN-3 does not quite achieve the highest performance on RealCause (it is still surpassed by TabPFN-2.5), we see substantial improvements on larger datasets with up to 50k samples in the `scikit-uplift` library [@user-guide-for-uplift-modeling]. We describe the details of this evaluation below.
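
For concreteness, below is a minimal sketch of the T- and S-learners with TabPFN as the base estimator. Function names are ours; this illustrates the meta-learner recipe of @kunzel_meta_learners, not our exact evaluation harness.

```python
# T- and S-learner CATE estimation with TabPFN base learners: a sketch.
import numpy as np
from tabpfn import TabPFNRegressor

def t_learner_cate(X, y, t, X_query):
    """Fit one outcome model per treatment arm; CATE = mu1(x) - mu0(x)."""
    mu1 = TabPFNRegressor().fit(X[t == 1], y[t == 1])
    mu0 = TabPFNRegressor().fit(X[t == 0], y[t == 0])
    return mu1.predict(X_query) - mu0.predict(X_query)

def s_learner_cate(X, y, t, X_query):
    """Fit a single model on [X, t]; CATE contrasts t=1 against t=0."""
    mu = TabPFNRegressor().fit(np.column_stack([X, t]), y)
    q1 = np.column_stack([X_query, np.ones(len(X_query))])
    q0 = np.column_stack([X_query, np.zeros(len(X_query))])
    return mu.predict(q1) - mu.predict(q0)
```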

```{=latex}
\centering
```
![**TabPFN-3 as a T/X/S-Learner.** TabPFN-3 when used as a T/S-Learner achieves strong performance in terms of QINI-score ($\uparrow$) in Uplift Modeling on the `scikit-uplift` benchmark. We report worsened performance in terms of PEHE ($\downarrow$) on the RealCause benchmark compared to the previous version.](figures/causal/uplift.png "fig:"){#fig:causal_inference width="0.49\\linewidth"} ![**TabPFN-3 as a T/X/S-Learner.** TabPFN-3 when used as a T/S-Learner achieves strong performance in terms of QINI-score ($\uparrow$) in Uplift Modeling on the `scikit-uplift` benchmark. We report worsened performance in terms of PEHE ($\downarrow$) on the RealCause benchmark compared to the previous version.](figures/causal/RealCause.png "fig:"){#fig:causal_inference width="0.49\\linewidth"}

#### Real-World QINI Evaluation.

One of the major drawbacks in evaluating causal inference methods is what @holland1986statistics calls the Fundamental Problem of Causal Inference: individual treatment effects can never actually be observed in the real world. In simple terms, one cannot observe both potential outcomes for the same individual. Under the assumption of experimental (RCT) data, Uplift Modeling [@user-guide-for-uplift-modeling] makes it possible to evaluate the benefit of a causal estimator in terms of how well it ranks individuals by treatment effect. Crucially, this evaluation strategy does not require access to ground-truth (synthetic) treatment effects, and is arguably the most realistic evaluation of CATE estimators, for example when A/B testing data is available. Using only observed treatments and outcomes, one can compute the Area Under the QINI Curve (AUC-QINI) to evaluate CATE estimators by their ability to identify the individuals for which the treatment has a strong impact.
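
A sketch of this evaluation with `scikit-uplift` is shown below. The synthetic RCT data only makes the snippet self-contained; in practice the uplift scores would come from a CATE estimator such as the meta-learners sketched above.

```python
# AUC-QINI evaluation on (synthetic) RCT data with scikit-uplift.
import numpy as np
from sklift.metrics import qini_auc_score

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))
t = rng.integers(0, 2, size=n)  # randomized (RCT) treatment assignment
y = (X[:, 0] + 0.5 * t * (X[:, 1] > 0) + rng.normal(size=n) > 0).astype(int)

# In practice `uplift` holds per-individual CATE estimates from a meta-learner;
# here a toy score on the true effect driver illustrates the call.
uplift = X[:, 1]
auc_qini = qini_auc_score(y_true=y, uplift=uplift, treatment=t)
```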

#### Strong Performance in Uplift Modeling.

We report the mean normalized AUC-QINI score for the T/X/S meta-learners using TabPFN-2.5 and 3 (Figure `\ref{fig:causal_inference}`{=latex}). TabPFN-3 used as an S and T-Learner achieves stronger performance than other baselines. We observe somewhat worsened performance on the RealCause benchmark [@neal_realcause], which is characterized by smaller sample sizes.

Detailed TabArena Results {#app:tabarena-detailed}
-------------------------

### Evaluation Metrics {#app:tabarena-metrics}

We re-use the official TabArena [@erickson2025tabarena] evaluation metrics and code for generating TabArena plots and tables.

**Elo**: Following TabArena, we evaluate models using the Elo rating system [@elo1967proposed]. Elo is a pairwise-comparison-based rating system in which each model's rating predicts its expected win probability against others, with a 400-point Elo gap corresponding to a 10:1 ($\approx$91%) expected win rate. We calibrate 1000 Elo to the performance of the default TabArena random forest configuration across all figures, and perform 200 rounds of bootstrapping to obtain 95% confidence intervals, similar to what is done in Chatbot Arena [@chiang2024chatbot]. In our TabArena results, Elo scores are computed using ROC AUC for binary classification, log-loss for multiclass classification, and RMSE for regression.
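
Concretely, under the Elo model the expected win probability of a model rated $R_A$ against one rated $R_B$ is $$P(A \text{ beats } B) = \frac{1}{1 + 10^{(R_B - R_A)/400}}~,$$ so a 400-point gap yields $1/(1 + 10^{-1}) = 10/11 \approx 91\%$.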

**Improvability**: The improvability metric introduced in TabArena measures how much lower, in percent, the error of the best method is than that of the current method on a dataset; this is then averaged over datasets. Formally, for a single dataset $i$, $$\operatorname{Improvability} := \frac{\err_i - \besterr_i}{\err_i} \cdot 100\%~,$$ where $\err_i$ is the error of the current method and $\besterr_i$ the lowest error of any method on dataset $i$. For example, an error of $0.10$ against a best error of $0.08$ gives an improvability of $20\%$. Improvability always lies between $0\%$ and $100\%$.

### Experiment Details

For all TabArena results, we run experiments using the official TabArena code and evaluation pipeline. We will contribute a reproducible official TabArena submission for `\ourmodel `{=latex}shortly after it becomes publicly available. While not strictly necessary for making predictions on test data, we follow TabArena's fit-time procedure of fitting an 8-fold bagged ensemble to generate a cross-validation score and then refitting the model on the full training data for test-time prediction, as is done for the other tabular foundation models on TabArena.
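
A sketch of this fit procedure is given below; it is our paraphrase of the TabArena protocol, not its exact implementation.

```python
# 8-fold bagged CV score, then refit on the full training data: a sketch.
import numpy as np
from sklearn.metrics import log_loss
from sklearn.model_selection import KFold
from tabpfn import TabPFNClassifier

def bagged_cv_then_refit(X, y, n_folds=8, seed=0):
    oof = np.zeros((len(X), len(np.unique(y))))  # out-of-fold probabilities
    for tr, va in KFold(n_folds, shuffle=True, random_state=seed).split(X):
        oof[va] = TabPFNClassifier().fit(X[tr], y[tr]).predict_proba(X[va])
    cv_score = log_loss(y, oof)                 # cross-validation score
    final_model = TabPFNClassifier().fit(X, y)  # refit for test-time prediction
    return final_model, cv_score
```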

All results for non-`\ourmodel `{=latex}models in our TabArena experiments are taken from the official TabArena reported results. The cached results for the tabular foundation models (TabPFN-2.5, TabPFN-2.6, TabICLv2, and TabDPT) were produced on a single H200 GPU, while all results for `\ourmodel `{=latex}and `\ourmodelenhanced `{=latex}were produced on a single RTX 6000 GPU, a weaker GPU than the H200.

For both `\ourmodel `{=latex}and `\ourmodelenhanced`{=latex}, we ran all splits of TabArena, a total of 816 tasks across 51 datasets, and report results over all splits of each dataset.

### TabArena Pareto Frontier Explanation {#app:tabarena_pareto_frontier_explanation}

Figure `\ref{fig:tabarena_pareto_medium}`{=latex} and Figure `\ref{fig:tabarena_pareto}`{=latex} show TabArena Pareto frontiers of models across Improvability and the median combined train + inference time per 1000 samples. The connected points for a given model type indicate tuning + ensembling performance with points from left to right marking ensembles of increasing numbers of random configurations (1, 2, 5, 10, 25, 50, 100, 150, 201). The trajectories are sampled 20 times from all trials and averaged. The left-most points use the default configuration, and the right-most highlighted points use all configurations.

### TabArena Leaderboard Tables {#sec:tabarena_leaderboard_tables}

We present the leaderboard tables for `\hyperref[tab:tabarena_table]{TabArena}`{=latex}, `\hyperref[tab:tabarena_medium_table]{TabArena-medium}`{=latex}, `\hyperref[tab:tabarena_small_table]{TabArena-small}`{=latex}, `\hyperref[tab:tabarena_classification_table]{TabArena-classification}`{=latex}, and `\hyperref[tab:tabarena_regression_table]{TabArena-regression}`{=latex}, below.

Across all 5 views, `\ourmodel `{=latex}ranks highest among all models run in their default configuration, while `\ourmodelenhanced `{=latex}pushes even further, strongly outperforming AutoGluon 1.5 extreme and ranking first in Elo, wins, and Improvability on every leaderboard.

```{=latex}
\centering
```
::: {#tab:tabarena_table}
  ----------------------------- ------------------------------------------------- ---------------------------------- ----------------------------------- ------------------ ------------------
  **Model**                     **Elo ($\uparrow$)**                                   **\#wins ($\uparrow$)**                  **Improva-**                 **Train time**   **Predict time**
                                                                                                                          **bility ($\downarrow$)**        **per 1K \[s\]**   **per 1K \[s\]**
  TabPFN-3-Thinking             [**1800${}_{-72,+105}$**]{style="color: gold"}     [**13.2**]{style="color: gold"}     [**4.7%**]{style="color: gold"}                37.69               3.26
  AutoGluon 1.5 (extreme, 4h)   [**1695${}_{-68,+83}$**]{style="color: silver"}    [**5.8**]{style="color: bronze"}   [**5.7%**]{style="color: silver"}              289.07               4.03
  TabPFN-3 (D)                  [**1677${}_{-62,+86}$**]{style="color: bronze"}    [**6.3**]{style="color: silver"}   [**6.9%**]{style="color: bronze"}                2.31               0.74
  TabPFN-2.6 (D)                1623${}_{-56,+78}$                                               1.3                                8.7%                               5.48               0.55
  RealTabPFN-2.5 (T+E)          1602${}_{-62,+79}$                                               2.1                                8.3%                            2040.22               8.92
  TabICLv2 (D)                  1599${}_{-64,+77}$                                               5.3                                7.7%                               4.02               0.38
  RealTabPFN-2.5 (T)            1559${}_{-56,+69}$                                               1.4                                9.1%                            2040.22               1.22
  RealTabPFN-2.5 (D)            1526${}_{-48,+66}$                                               0.9                                9.5%                               5.81               0.64
  RealMLP (T+E)                 1514${}_{-45,+58}$                                               0.5                                11.2%                           2950.72              11.99
  TabDPT (T+E)                  1461${}_{-54,+63}$                                               2.0                                11.7%                           4907.64             286.65
  TabM (T+E)                    1449${}_{-44,+56}$                                               1.0                                12.6%                           3285.87               1.47
  LightGBM (T+E)                1438${}_{-31,+36}$                                               0.1                                13.6%                            416.98               2.64
  RealMLP (T)                   1433${}_{-47,+48}$                                               0.4                                12.5%                           2950.72               0.66
  CatBoost (T+E)                1420${}_{-42,+41}$                                               0.1                                13.2%                           1658.41               0.65
  CatBoost (T)                  1410${}_{-45,+41}$                                               0.5                                13.4%                           1658.41               0.08
  TabDPT (T)                    1405${}_{-56,+60}$                                               0.7                                12.9%                           4907.64              39.96
  TabM (T)                      1392${}_{-43,+54}$                                               0.3                                13.5%                           3285.87               0.17
  LightGBM (T)                  1390${}_{-29,+33}$                                               0.0                                14.3%                            416.98               0.33
  XGBoost (T+E)                 1379${}_{-35,+34}$                                               0.1                                14.4%                            693.49               1.69
  CatBoost (D)                  1371${}_{-44,+40}$                                               0.2                                14.2%                              6.83               0.08
  XGBoost (T)                   1354${}_{-35,+33}$                                               0.0                                14.7%                            693.49               0.31
  TabDPT (D)                    1326${}_{-56,+68}$                                               0.3                                15.3%                             47.62              43.74
  TabM (D)                      1299${}_{-44,+49}$                                               0.2                                15.7%                             10.49               0.13
  RealMLP (D)                   1234${}_{-37,+38}$                                               0.1                                17.1%                             10.06               1.69
  XGBoost (D)                   1215${}_{-38,+39}$                                               0.0                                17.5%                              1.94               0.12
  LightGBM (D)                  1189${}_{-29,+34}$                                               0.0                                18.0%                              1.96               0.14
  ----------------------------- ------------------------------------------------- ---------------------------------- ----------------------------------- ------------------ ------------------

  : **TabArena leaderboard using all 51 datasets with 816 total tasks.**
:::

```{=latex}
\centering
```
::: {#tab:tabarena_medium_table}
  ----------------------------- --------------------------------------------------- ---------------------------------- ----------------------------------- ------------------ ------------------
  **Model**                     **Elo ($\uparrow$)**                                     **\#wins ($\uparrow$)**                  **Improva-**                 **Train time**   **Predict time**
                                                                                                                            **bility ($\downarrow$)**        **per 1K \[s\]**   **per 1K \[s\]**
  TabPFN-3-Thinking             [**2146${}_{-87,+121}$**]{style="color: gold"}        [**6.2**]{style="color: gold"}     [**1.3%**]{style="color: gold"}                15.10               2.15
  AutoGluon 1.5 (extreme, 4h)   [**1907${}_{-50,+92}$**]{style="color: silver"}                    1.4                  [**3.4%**]{style="color: silver"}              191.18               2.21
  TabPFN-3 (D)                  [**1835${}_{-137,+224}$**]{style="color: bronze"}    [**3.3**]{style="color: silver"}   [**4.1%**]{style="color: bronze"}                0.83               0.27
  TabPFN-2.6 (D)                1741${}_{-72,+121}$                                                0.0                                6.4%                               2.76               0.70
  TabICLv2 (D)                  1712${}_{-108,+208}$                                 [**1.9**]{style="color: bronze"}                 5.3%                               0.76               0.14
  RealTabPFN-2.5 (T+E)          1663${}_{-111,+149}$                                               0.0                                7.2%                             735.58              11.74
  RealMLP (T+E)                 1645${}_{-94,+91}$                                                 0.0                                7.4%                            1719.82               1.67
  CatBoost (T+E)                1625${}_{-64,+86}$                                                 0.0                                7.4%                             777.59               0.25
  CatBoost (T)                  1616${}_{-67,+95}$                                                 0.3                                7.6%                             777.59               0.05
  RealTabPFN-2.5 (T)            1612${}_{-103,+130}$                                               0.1                                7.9%                             735.58               1.39
  LightGBM (T+E)                1604${}_{-56,+70}$                                                 0.0                                9.2%                             131.56               2.64
  CatBoost (D)                  1576${}_{-106,+105}$                                               0.1                                7.8%                               3.24               0.03
  XGBoost (T+E)                 1565${}_{-61,+90}$                                                 0.1                                9.3%                             282.13               0.56
  RealMLP (T)                   1554${}_{-85,+106}$                                                0.0                                8.7%                            1719.82               0.08
  TabM (T+E)                    1538${}_{-90,+157}$                                                0.7                                9.1%                            1993.14               0.62
  RealTabPFN-2.5 (D)            1536${}_{-90,+141}$                                                0.0                                8.7%                               1.88               0.64
  TabDPT (T+E)                  1533${}_{-124,+142}$                                               0.8                                8.8%                            4786.55             444.54
  LightGBM (T)                  1515${}_{-59,+80}$                                                 0.0                                10.3%                            131.56               0.13
  XGBoost (T)                   1514${}_{-56,+69}$                                                 0.0                                9.8%                             282.13               0.07
  TabM (T)                      1489${}_{-90,+158}$                                                0.0                                9.9%                            1993.14               0.06
  TabDPT (T)                    1411${}_{-125,+121}$                                               0.0                                11.3%                           4786.55              42.64
  XGBoost (D)                   1375${}_{-115,+101}$                                               0.0                                11.7%                              0.49               0.05
  TabDPT (D)                    1336${}_{-144,+131}$                                               0.0                                14.0%                             46.62              43.74
  TabM (D)                      1330${}_{-101,+123}$                                               0.0                                12.6%                              5.16               0.07
  RealMLP (D)                   1280${}_{-71,+79}$                                                 0.0                                13.7%                              6.75               0.23
  LightGBM (D)                  1263${}_{-63,+55}$                                                 0.0                                13.5%                              0.29               0.04
  ----------------------------- --------------------------------------------------- ---------------------------------- ----------------------------------- ------------------ ------------------

  : **TabArena-medium leaderboard on the 15 largest datasets in TabArena**, with 10k--100k training samples, evaluated on the full 135 tasks with 9 splits per dataset.
:::

```{=latex}
\centering
```
::: {#tab:tabarena_small_table}
  ----------------------------- ------------------------------------------------- ---------------------------------- ----------------------------------- ------------------ ------------------
  **Model**                     **Elo ($\uparrow$)**                                   **\#wins ($\uparrow$)**                  **Improva-**                 **Train time**   **Predict time**
                                                                                                                          **bility ($\downarrow$)**        **per 1K \[s\]**   **per 1K \[s\]**
  TabPFN-3-Thinking             [**1723${}_{-60,+100}$**]{style="color: gold"}      [**7.0**]{style="color: gold"}     [**6.1%**]{style="color: gold"}                52.78               3.40
  AutoGluon 1.5 (extreme, 4h)   [**1641${}_{-57,+79}$**]{style="color: silver"}    [**4.4**]{style="color: silver"}   [**6.6%**]{style="color: silver"}              346.57               6.56
  TabPFN-3 (D)                  [**1638${}_{-58,+85}$**]{style="color: bronze"}                  2.9                  [**8.1%**]{style="color: bronze"}                4.84               1.54
  RealTabPFN-2.5 (T+E)          1598${}_{-64,+97}$                                               2.1                                8.7%                            2289.05               8.05
  TabPFN-2.6 (D)                1596${}_{-49,+74}$                                               1.3                                9.7%                               7.03               0.55
  TabICLv2 (D)                  1574${}_{-83,+105}$                                [**3.4**]{style="color: bronze"}                 8.7%                               7.06               0.67
  RealTabPFN-2.5 (T)            1556${}_{-58,+75}$                                               1.2                                9.5%                            2289.05               1.14
  RealTabPFN-2.5 (D)            1542${}_{-52,+83}$                                               0.9                                9.9%                               6.76               0.64
  RealMLP (T+E)                 1482${}_{-47,+63}$                                               0.5                                12.7%                           3770.75              21.90
  TabDPT (T+E)                  1448${}_{-59,+76}$                                               1.2                                12.9%                           5119.36             218.71
  TabM (T+E)                    1430${}_{-52,+57}$                                               0.4                                14.0%                           3553.12               1.74
  TabDPT (T)                    1414${}_{-60,+72}$                                               0.7                                13.6%                           5119.36              28.35
  RealMLP (T)                   1402${}_{-42,+55}$                                               0.4                                14.2%                           3770.75               1.78
  LightGBM (T+E)                1392${}_{-34,+37}$                                               0.1                                15.5%                            892.41               2.57
  TabM (T)                      1368${}_{-53,+56}$                                               0.3                                15.0%                           3553.12               0.24
  CatBoost (T+E)                1362${}_{-43,+46}$                                               0.1                                15.6%                           2476.51               0.81
  LightGBM (T)                  1357${}_{-30,+36}$                                               0.0                                15.9%                            892.41               0.35
  CatBoost (T)                  1351${}_{-35,+48}$                                               0.1                                15.8%                           2476.51               0.10
  TabDPT (D)                    1331${}_{-67,+74}$                                               0.3                                15.9%                             50.32              43.71
  XGBoost (T+E)                 1326${}_{-37,+34}$                                               0.0                                16.5%                            884.18               2.37
  CatBoost (D)                  1312${}_{-35,+35}$                                               0.1                                16.9%                              9.64               0.13
  XGBoost (T)                   1309${}_{-39,+32}$                                               0.0                                16.7%                            884.18               0.39
  TabM (D)                      1296${}_{-47,+54}$                                               0.2                                17.0%                             13.18               0.17
  RealMLP (D)                   1224${}_{-42,+37}$                                               0.1                                18.5%                             15.69               4.69
  LightGBM (D)                  1169${}_{-40,+42}$                                               0.0                                19.9%                              3.61               0.17
  XGBoost (D)                   1165${}_{-38,+30}$                                               0.0                                19.9%                              3.29               0.25
  ----------------------------- ------------------------------------------------- ---------------------------------- ----------------------------------- ------------------ ------------------

  : **TabArena-small leaderboard on the 36 smallest datasets in TabArena**, with 500--10k training samples, evaluated on the full 681 tasks.
:::

```{=latex}
\centering
```
::: {#tab:tabarena_classification_table}
  ----------------------------- ------------------------------------------------- ---------------------------------- ----------------------------------- ------------------ ------------------
  **Model**                     **Elo ($\uparrow$)**                                   **\#wins ($\uparrow$)**                  **Improva-**                 **Train time**   **Predict time**
                                                                                                                          **bility ($\downarrow$)**        **per 1K \[s\]**   **per 1K \[s\]**
  TabPFN-3-Thinking             [**1782${}_{-72,+109}$**]{style="color: gold"}     [**10.0**]{style="color: gold"}     [**6.0%**]{style="color: gold"}                35.70               3.00
  AutoGluon 1.5 (extreme, 4h)   [**1689${}_{-82,+96}$**]{style="color: silver"}    [**4.8**]{style="color: silver"}   [**6.5%**]{style="color: silver"}              267.31               3.98
  TabPFN-3 (D)                  [**1660${}_{-75,+91}$**]{style="color: bronze"}                  3.7                  [**8.7%**]{style="color: bronze"}                2.43               0.75
  TabPFN-2.6 (D)                1604${}_{-69,+69}$                                               0.5                                10.6%                              5.17               0.54
  TabICLv2 (D)                  1593${}_{-75,+94}$                                 [**4.1**]{style="color: bronze"}                 9.3%                               4.15               0.41
  RealTabPFN-2.5 (T+E)          1578${}_{-75,+76}$                                               1.7                                10.2%                           2046.25               8.98
  RealTabPFN-2.5 (T)            1554${}_{-66,+72}$                                               1.2                                11.0%                           2046.25               1.33
  RealTabPFN-2.5 (D)            1539${}_{-63,+69}$                                               0.9                                11.2%                              5.76               0.79
  RealMLP (T+E)                 1492${}_{-45,+63}$                                               0.3                                13.5%                           2879.46              12.49
  TabM (T+E)                    1464${}_{-48,+75}$                                               1.0                                14.8%                           2466.21               1.50
  LightGBM (T+E)                1436${}_{-37,+48}$                                               0.1                                15.7%                            382.05               1.49
  RealMLP (T)                   1413${}_{-47,+55}$                                               0.4                                15.0%                           2879.46               0.60
  CatBoost (T+E)                1412${}_{-47,+55}$                                               0.1                                15.2%                           1372.94               0.56
  TabM (T)                      1411${}_{-58,+71}$                                               0.3                                15.6%                           2466.21               0.18
  TabDPT (T+E)                  1411${}_{-56,+80}$                                               0.5                                14.5%                           4940.61             307.75
  CatBoost (T)                  1404${}_{-45,+54}$                                               0.4                                15.4%                           1372.94               0.07
  LightGBM (T)                  1392${}_{-33,+43}$                                               0.0                                16.4%                            382.05               0.25
  XGBoost (T+E)                 1382${}_{-48,+50}$                                               0.1                                16.5%                            685.87               1.45
  CatBoost (D)                  1381${}_{-46,+46}$                                               0.2                                16.0%                              5.72               0.08
  XGBoost (T)                   1356${}_{-40,+45}$                                               0.0                                16.8%                            685.87               0.21
  TabDPT (T)                    1351${}_{-58,+66}$                                               0.6                                16.0%                           4940.61              41.61
  TabM (D)                      1315${}_{-48,+56}$                                               0.2                                18.0%                             10.21               0.14
  TabDPT (D)                    1270${}_{-57,+62}$                                               0.3                                18.9%                             49.21              43.82
  RealMLP (D)                   1244${}_{-34,+39}$                                               0.1                                19.6%                             10.47               1.71
  XGBoost (D)                   1231${}_{-50,+47}$                                               0.0                                19.6%                              1.77               0.12
  LightGBM (D)                  1192${}_{-40,+49}$                                               0.0                                20.6%                              1.79               0.12
  ----------------------------- ------------------------------------------------- ---------------------------------- ----------------------------------- ------------------ ------------------

  : **TabArena-classification leaderboard on the 38 classification datasets in TabArena.**
:::

```{=latex}
\centering
```
::: {#tab:tabarena_regression_table}
  ----------------------------- --------------------------------------------------- ---------------------------------- ----------------------------------- ------------------ ------------------
  **Model**                     **Elo ($\uparrow$)**                                     **\#wins ($\uparrow$)**                  **Improva-**                 **Train time**   **Predict time**
                                                                                                                            **bility ($\downarrow$)**        **per 1K \[s\]**   **per 1K \[s\]**
  TabPFN-3-Thinking             [**1959${}_{-150,+211}$**]{style="color: gold"}       [**3.2**]{style="color: gold"}     [**0.9%**]{style="color: gold"}                43.00               3.26
  TabPFN-3 (D)                  [**1827${}_{-142,+255}$**]{style="color: silver"}    [**2.5**]{style="color: silver"}   [**1.6%**]{style="color: silver"}                1.69               0.57
  AutoGluon 1.5 (extreme, 4h)   [**1804${}_{-97,+133}$**]{style="color: bronze"}                   1.1                                3.2%                             335.03               4.33
  TabPFN-2.6 (D)                1776${}_{-71,+131}$                                                0.8                                3.3%                               8.52               0.70
  RealTabPFN-2.5 (T+E)          1774${}_{-107,+174}$                                               0.5                  [**2.6%**]{style="color: bronze"}             1709.05               8.12
  TabDPT (T+E)                  1748${}_{-92,+171}$                                  [**1.5**]{style="color: bronze"}                 3.5%                            4786.55             239.54
  TabICLv2 (D)                  1700${}_{-159,+293}$                                               1.2                                3.2%                               2.10               0.25
  TabDPT (T)                    1696${}_{-79,+134}$                                                0.1                                3.9%                            4786.55              38.50
  RealMLP (T+E)                 1677${}_{-68,+126}$                                                0.2                                4.3%                            3995.01              10.05
  RealTabPFN-2.5 (T)            1654${}_{-113,+165}$                                               0.2                                3.4%                            1709.05               0.81
  TabDPT (D)                    1604${}_{-72,+153}$                                                0.0                                4.9%                              46.62              39.21
  RealMLP (T)                   1574${}_{-84,+114}$                                                0.0                                5.3%                            3995.01               0.84
  RealTabPFN-2.5 (D)            1558${}_{-110,+159}$                                               0.0                                4.9%                               7.04               0.51
  CatBoost (T+E)                1513${}_{-73,+113}$                                                0.0                                7.3%                            3552.96               0.97
  LightGBM (T+E)                1509${}_{-90,+107}$                                                0.0                                7.7%                             700.15               9.32
  CatBoost (T)                  1489${}_{-78,+119}$                                                0.1                                7.4%                            3552.96               0.10
  TabM (T+E)                    1463${}_{-96,+147}$                                                0.0                                6.2%                            4158.29               1.41
  LightGBM (T)                  1440${}_{-77,+119}$                                                0.0                                8.3%                             700.15               0.97
  XGBoost (T+E)                 1424${}_{-52,+72}$                                                 0.0                                8.2%                             834.93               2.61
  XGBoost (T)                   1403${}_{-60,+85}$                                                 0.0                                8.4%                             834.93               0.39
  CatBoost (D)                  1389${}_{-92,+107}$                                                0.0                                8.9%                              10.89               0.09
  TabM (T)                      1381${}_{-101,+147}$                                               0.0                                7.1%                            4158.29               0.17
  TabM (D)                      1284${}_{-118,+126}$                                               0.0                                8.8%                              13.32               0.13
  RealMLP (D)                   1235${}_{-81,+105}$                                                0.0                                9.8%                               8.90               1.64
  LightGBM (D)                  1210${}_{-35,+40}$                                                 0.0                                10.7%                              2.11               0.27
  XGBoost (D)                   1190${}_{-78,+99}$                                                 0.0                                11.3%                              2.24               0.24
  ----------------------------- --------------------------------------------------- ---------------------------------- ----------------------------------- ------------------ ------------------

  : **TabArena-regression leaderboard on the 13 regression datasets in TabArena.**
:::

Details on TALENT benchmark results {#app:TALENT}
-----------------------------------

### Benchmark description

The TALENT base benchmark [@talent_benchmark_jmlr] contains 300 datasets (120 binary classification, 80 multiclass classification, and 100 regression). Each dataset is split into 64% training, 16% validation, and 20% test sets.

#### Baselines.

We rely on precomputed baselines provided by the authors of the TALENT benchmark [@talent_benchmark_jmlr] (for the TALENT extensions used in the large-rows and many-class slices) or by the TabICLv2 paper [@qu2026tabiclv2] (for the main TALENT slice).

#### Metrics.

Following the TALENT paper and [@qu2026tabiclv2], we use accuracy for classification and RMSE for regression.

#### Datasets.

Following [@qu2026tabiclv2], we exclude from the main TALENT benchmark the 26 datasets used during TabPFN-2 / TabICLv2 development.

### Per-task-type breakdown {#app:TALENT-per-task}

```{=latex}
\centering
```
![**Average rank on the TALENT benchmark broken down by task type (regression, binary classification, multiclass classification), using the TabICLv2 evaluation protocol from @qu2026tabiclv2.** Bars show mean rank (lower is better); error bars are 95% bootstrap confidence intervals over datasets. Hatched bars mark methods with KNN-imputed scores. TabPFN-2.5, LimiX, and TabPFNv2 share a 10-class cap, so their scores on the 12 multiclass datasets with $>$10 classes are KNN-imputed.](figures/TALENT/by_task_average_rank_paper.png){#fig:per-task-TALENT-rank width="\\linewidth"}

### Many-class TALENT subset {#app:TALENT-many-class}

We report results on the subset of TALENT [@talent_benchmark_jmlr] datasets with more than $50$ classes, which yields 4 datasets with 100 classes, including 3 from the same family. While limited in number, these complement the results on synthetic data from Section `\ref{sec:many-class-eval}`{=latex}. Results are shown in Figure `\ref{fig:talent_many_class}`{=latex}.

```{=latex}
\centering
```
```{=latex}
\centering
```
```{=latex}
\footnotesize
```
  Dataset                          Classes   Samples   Feat.
  ------------------------------ --------- --------- -------
  `one-hundred-plants-margin`          100     1,600      64
  `one-hundred-plants-shape`           100     1,600      64
  `one-hundred-plants-texture`         100     1,599      64
  `helena`                             100    65,196      27

```{=latex}
\hfill
```
```{=latex}
\centering
```
![**Average rank on the many-classes TALENT slice (4 datasets, all 100 classes).** Three are the `one-hundred-plants` variants (margin / shape / texture, $\approx$1.6 k samples each) and one is `helena` (65 k samples).](figures/TALENT/many_classes_average_rank_paper_half.png){#fig:talent_many_class width="\\linewidth"}

### Large rows subset

We list here the datasets in the large-rows subset of TALENT used in Section `\ref{sec:large-data}`{=latex}. From the TALENT base and large extensions, we keep datasets with more than 100k samples and at most 1M training samples. We report the model ranking in Figure `\ref{fig:large_rows_rank}`{=latex}.

```{=latex}
\centering
```
```{=latex}
\vspace{0pt}
```
```{=latex}
\centering
```
```{=latex}
\scriptsize
```
```{=latex}
\resizebox{\linewidth}{!}{%
\begin{tabular}{lrrl}
\toprule
Dataset & Samples & Feat. & Task \\
\midrule
\texttt{microsoft} & 1{,}200{,}192 & 136 & Reg. \\
\texttt{poker-hand} & 1{,}025{,}009 & 10 & Multi. \\
\texttt{BNG(credit-a)} & 1{,}000{,}000 & 15 & Binary \\
\texttt{Higgs} & 1{,}000{,}000 & 28 & Binary \\
\texttt{Smoking\_and\_Drinking\_Dataset\_with\_body\_signal} & 991{,}346 & 23 & Binary \\
\texttt{yahoo} & 709{,}877 & 699 & Reg. \\
\texttt{Data\_Science\_for\_Good\_Kiva\_Crowdfunding} & 671{,}205 & 11 & Multi. \\
\texttt{covertype} & 581{,}012 & 54 & Multi. \\
\texttt{CDC\_Diabetes\_Health\_Indicators} & 253{,}680 & 21 & Binary \\
\texttt{accelerometer} & 153{,}004 & 4 & Multi. \\
\texttt{walking-activity} & 149{,}332 & 4 & Multi. \\
\texttt{Rain\_in\_Australia} & 145{,}460 & 18 & Multi. \\
\texttt{customer\_satisfaction\_in\_airline} & 129{,}880 & 21 & Binary \\
\texttt{diabetes\_130-us\_hospitals} & 101{,}766 & 20 & Binary \\
\bottomrule
\end{tabular}%
}
```
```{=latex}
\vspace{3.5em}
```
```{=latex}
\captionof{table}{Datasets in the large-rows TALENT slice.}
```
`\label{tab:large_rows_datasets}`{=latex}

```{=latex}
\hfill
```
```{=latex}
\vspace{0pt}
```
```{=latex}
\centering
```
![image](figures/TALENT/talent_large_rows_average_rank_paper_half.png){width="0.82\\linewidth"}

```{=latex}
\captionof{figure}{Average rank on the large-rows (100k-1M rows) TALENT slice.}
```
`\label{fig:large_rows_rank}`{=latex}

### Details

#### Per-dataset ranking.

For each (dataset, split) pair we rank all methods by their score (best $=1$; ties receive average ranks). The reported *mean rank* of a method is the average of these ranks across all (dataset, split) pairs in the slice.
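
For concreteness, a minimal sketch of this ranking step (array names are illustrative; `scipy.stats.rankdata` with average tie-breaking does the work):

```python
import numpy as np
from scipy.stats import rankdata

# One row per (dataset, split) pair, one column per method; scores are
# normalized so that higher is better.
scores = np.array([
    [0.90, 0.85, 0.90],   # methods 0 and 2 tie -> both get rank 1.5
    [0.70, 0.80, 0.60],
])
ranks = rankdata(-scores, method="average", axis=1)  # best = 1
mean_rank = ranks.mean(axis=0)                       # reported mean rank per method
print(mean_rank)                                     # [1.75 2.   2.25]
```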

#### Bootstrap confidence intervals.

95% confidence intervals are computed with a non-parametric bootstrap over (dataset, split) pairs: for each of $B = 2{,}000$ replicates we resample the pairs with replacement and recompute each method's mean rank, then take the empirical $2.5/97.5$ percentiles across replicates.
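
A compact sketch of this procedure, reusing the `ranks` matrix from the snippet above (one row per (dataset, split) pair, one column per method):

```python
import numpy as np

def bootstrap_rank_ci(ranks: np.ndarray, B: int = 2_000, seed: int = 0):
    """95% bootstrap confidence interval for each method's mean rank."""
    rng = np.random.default_rng(seed)
    n = ranks.shape[0]
    replicates = np.empty((B, ranks.shape[1]))
    for b in range(B):
        idx = rng.integers(0, n, size=n)          # resample pairs with replacement
        replicates[b] = ranks[idx].mean(axis=0)   # mean rank in this replicate
    return np.percentile(replicates, [2.5, 97.5], axis=0)
```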

Details on TabSTAR Text-Tabular Benchmark results {#app:TABSTAR}
-------------------------------------------------

The TabSTAR benchmark is a union of previous text-tabular benchmarks: the Multimodal AutoML Benchmark [@shi2021benchmarking], @grinsztajn2023vectorizing, and CARTE [@kim2024carte]. After deduplication and exclusion of unavailable datasets, the final benchmark contains 50 datasets: 15 classification and 35 regression tasks.[^10] Each model is run 5 times, with per-task metrics AUROC (binary classification), log-loss (multiclass), and RMSE (regression); results are normalized with MinMax scaling to the $[0,1]$ range. As in the original paper [@arazi_tabstar_2025], we limit each run to up to 100,000 examples. Figure `\ref{fig:text_leaderboard_cls}`{=latex} shows the results for classification, for which the TabSTAR model was reportedly the state of the art; we see that the TabPFN API family significantly outperforms it. Figure `\ref{fig:text_leaderboard_reg}`{=latex} shows the equivalent regression performance.

```{=latex}
\centering
```
![**Performance on the classification tasks of the TabSTAR text-tabular collection.** `\ourmodelenhanced `{=latex}and TabPFN-3-Plus significantly outperform the text-aware TabSTAR, which was otherwise the state-of-the-art reference for this task type.](figures/text_leaderboard/tabstar_classification.png){#fig:text_leaderboard_cls width="0.8\\linewidth"}

```{=latex}
\centering
```
![**Performance on the regression tasks of the TabSTAR text-tabular collection.** `\ourmodelenhanced `{=latex}and TabPFN-3-Plus significantly outperform all baselines.](figures/text_leaderboard/tabstar_regression.png){#fig:text_leaderboard_reg width="0.8\\linewidth"}

Per-dataset results on RelBenchV1 {#sec:app/relbench/per-dataset-results}
---------------------------------

We report per-dataset results for entity classification and entity regression as well as aggregate metrics in `\autoref{tab:relbench/classification}`{=latex} and `\autoref{tab:relbench/regression}`{=latex}, respectively.

```{=latex}
\centering
```
```{=latex}
\setlength{\tabcolsep}{2pt}
```
```{=latex}
\resizebox{\textwidth}{!}{%
\begin{tabular}{lrrrrrrrrrrrrrr}
\toprule
Method & \multicolumn{2}{c}{\texttt{f1}} & \multicolumn{2}{c}{\texttt{avito}} & \multicolumn{2}{c}{\texttt{event}} & \multicolumn{1}{c}{\texttt{trial}} & \multicolumn{2}{c}{\texttt{amazon}} & \multicolumn{2}{c}{\texttt{stack}} & \multicolumn{1}{c}{\texttt{hm}} & Avg AUROC $\uparrow$ & Rank $\downarrow$ \\
 & \texttt{\scriptsize dnf} & \texttt{\scriptsize top3} & \texttt{\scriptsize click} & \texttt{\scriptsize visit} & \texttt{\scriptsize repeat} & \texttt{\scriptsize ignore} & \texttt{\scriptsize out} & \texttt{\scriptsize user} & \texttt{\scriptsize item} & \texttt{\scriptsize eng} & \texttt{\scriptsize badge} & \texttt{\scriptsize churn} &  &  \\
\midrule
RelGNN & 75.29 & 85.69 & 68.23 & 66.18 & \textbf{79.61} & \underline{86.18} & 71.24 & \textbf{70.99} & \underline{82.64} & \textbf{90.75} & \textbf{88.98} & \textbf{70.93} & \textbf{78.06} & \textbf{2.83} \\
RelGT & 75.87 & 83.52 & 68.30 & \underline{66.78} & 76.09 & 81.57 & 68.61 & 70.39 & 82.55 & 90.53 & 86.32 & 69.27 & 76.65 & 4.50 \\
GraphSAGE & 72.62 & 75.54 & 65.90 & 66.20 & 76.89 & 81.62 & 68.60 & \underline{70.42} & \textbf{82.81} & 90.59 & \underline{88.86} & 69.88 & 75.83 & 5.17 \\
\addlinespace[3pt]
KumoRFMv1$^{*}$ & \textbf{82.41} & \textbf{91.07} & 64.85 & 64.11 & 76.08 & \textbf{89.20} & 70.79 & 67.29 & 79.93 & 87.09 & 80.00 & 67.71 & 76.71 & 6.75 \\
Griffin & 57.70 & 82.50 & 45.90 & 60.70 & 71.88 & 83.27 & 51.00 & 62.30 & 69.00 & 77.50 & 73.50 & 60.20 & 66.29 & 9.92 \\
RT$_{\text{zero}}^{*}$ & \underline{81.20} & \underline{89.30} & 59.50 & 61.80 & 73.22 & 77.47 & 51.80 & 64.00 & 70.90 & 75.70 & 80.10 & 62.80 & 70.65 & 8.67 \\
RDBLearn & 70.87 & 79.69 & \underline{69.04} & 65.49 & 75.04 & 82.52 & 71.58 & 67.57 & 82.07 & 89.39 & 85.26 & 68.05 & 75.55 & 6.83 \\
\addlinespace[3pt]
RDBLearn + v2.5 & 71.72 & 77.60 & 65.72 & 66.47 & 75.55 & 78.65 & \underline{72.90} & 69.74 & 82.18 & 90.23 & 82.81 & 70.11 & 75.31 & 6.46 \\
RDBLearn + v3 & 71.72 & 82.72 & \textbf{69.06} & 66.76 & 76.81 & 73.70 & 72.89 & 69.35 & 82.46 & 90.59 & 85.98 & 70.06 & 76.01 & 4.83 \\
\addlinespace[3pt]
KumoRFMv2 & 72.03 & 82.09 & 67.42$^{*}$ & \textbf{69.41$^{*}$} & \underline{79.34} & 78.86 & 72.03$^{*}$ & 67.71 & 80.18 & 88.69 & 85.40 & 67.81 & 75.91 & 5.75 \\
\addlinespace[3pt]
\textbf{TabPFN-REL} & 70.74 & 79.98 & 67.09 & 66.68 & 77.11 & 85.38 & \textbf{76.43} & 70.27 & \textbf{82.81} & \underline{90.66} & 85.17 & \underline{70.55} & \underline{76.91} & \underline{4.29} \\
\bottomrule
\end{tabular}}
```
```{=latex}
\captionof{table}{Per-dataset AUROC ($\uparrow$) on the RelBenchV1 entity classification tasks, together with the average AUROC and average rank across tasks.}
```
`\label{tab:relbench/classification}`{=latex}
```{=latex}
\centering
```
```{=latex}
\setlength{\tabcolsep}{2pt}
```
```{=latex}
\resizebox{\textwidth}{!}{%
\begin{tabular}{lrrrrrrrrrrr}
\toprule
Method & \multicolumn{1}{c}{\texttt{f1}} & \multicolumn{1}{c}{\texttt{avito}} & \multicolumn{1}{c}{\texttt{event}} & \multicolumn{2}{c}{\texttt{trial}} & \multicolumn{2}{c}{\texttt{amazon}} & \multicolumn{1}{c}{\texttt{stack}} & \multicolumn{1}{c}{\texttt{hm}} & Avg $S_{\text{KumoNorm}}$ $\downarrow$ & Rank $\downarrow$ \\
 & \texttt{\scriptsize pos} & \texttt{\scriptsize ctr} & \texttt{\scriptsize attend} & \texttt{\scriptsize adverse} & \texttt{\scriptsize succ} & \texttt{\scriptsize user} & \texttt{\scriptsize item} & \texttt{\scriptsize votes} & \texttt{\scriptsize sales} &  &  \\
\midrule
RelGNN & 3.798 & 0.037 & \underline{0.238} & 44.461 & \textbf{0.301} & \textbf{14.230} & 48.767 & \textbf{0.065} & 0.054 & \textbf{0.861} & \underline{3.72} \\
RelGT & 3.917 & 0.035 & 0.250 & 43.992 & \underline{0.326} & \underline{14.267} & 48.922 & \textbf{0.065} & 0.054 & 0.870 & 4.67 \\
GraphSAGE & 4.022 & 0.041 & 0.258 & 44.473 & 0.400 & 14.313 & 50.053 & \textbf{0.065} & 0.056 & 0.918 & 6.39 \\
\addlinespace[3pt]
KumoRFMv1$^{*}$ & \textbf{2.747} & 0.035 & 0.264 & 58.231 & 0.417 & 16.161 & 55.254 & \textbf{0.065} & \textbf{0.040} & 0.908 & 6.06 \\
Griffin & 4.460 & 0.050 & 0.461 & 78.232 & 0.463 & 35.590 & 53.214 & 0.092 & 0.151 & 1.471 & 10.56 \\
RT$_{\text{zero}}^{*}$ & \underline{2.901} & 0.058 & 0.379 & 73.999 & 0.455 & 18.802 & 57.996 & 0.110 & 0.089 & 1.240 & 9.44 \\
RDBLearn & 3.834 & 0.034 & \textbf{0.237} & 43.913 & 0.424 & 14.540 & 48.559 & \underline{0.068} & 0.064 & 0.906 & 5.11 \\
\addlinespace[3pt]
RDBLearn + v2.5 & 3.930 & 0.034 & 0.243 & 43.409 & 0.429 & 14.463 & 49.053 & \underline{0.068} & 0.066 & 0.913 & 6.28 \\
RDBLearn + v3 & 3.835 & 0.034 & 0.245 & 43.290 & 0.375 & 14.720 & 50.097 & \underline{0.068} & 0.064 & 0.898 & 5.89 \\
\addlinespace[3pt]
KumoRFMv2 & 4.022 & \underline{0.033} & 0.241 & \underline{41.974} & 0.433$^{*}$ & 14.627 & \textbf{45.352} & \textbf{0.065$^{**}$} & \underline{0.043} & 0.866 & 4.33 \\
\addlinespace[3pt]
\textbf{TabPFN-REL} & 3.757 & \textbf{0.031} & 0.241 & \textbf{40.202} & 0.385 & 14.359 & \underline{46.199} & \underline{0.068} & 0.059 & \underline{0.864} & \textbf{3.56} \\
\bottomrule
\end{tabular}}
```
```{=latex}
\captionof{table}{Per-dataset results ($\downarrow$) on the RelBenchV1 entity regression tasks, together with the average Kumo-normalized score ($S_{\text{KumoNorm}}$) and average rank across tasks.}
```
`\label{tab:relbench/regression}`{=latex}
Additional Details on Internal Benchmarks
=========================================

Methodology {#app:methodology}
-----------

#### Metric Normalization. {#sec:metric_norm}

To aggregate heterogeneous metrics across datasets, we apply a per-fold min--max normalization. For each (dataset, fold) pair and metric $m$, we rescale a model's raw score $s_m^{(b)}$ as $$\tilde{s}_m^{(b)} = \frac{s_m^{(b)} - \min_{b' \in \mathcal{B}}\, s_m^{(b')}}{\max_{b' \in \mathcal{B}}\, s_m^{(b')} - \min_{b' \in \mathcal{B}}\, s_m^{(b')}},$$ where $\mathcal{B}$ denotes the set of models we evaluate. This allows the model scores to live on a comparable $[0, 1]$ scale for each (dataset, fold) combination. We treat the tuned and default versions of a model as two different models. For lower-is-better metrics (e.g. RMSE, cross-entropy loss), we apply the additional transformation $\tilde{s}_m \mapsto 1 - \tilde{s}_m$, so that all metrics are higher-is-better on a common scale and can be meaningfully averaged or ranked across datasets and metric types.
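
A minimal sketch of this normalization for one (dataset, fold) pair; how the degenerate case where all models tie should be handled is not specified above, so the value chosen below is an assumption:

```python
import numpy as np

def minmax_normalize(scores, higher_is_better: bool = True) -> np.ndarray:
    """Min-max normalize the raw scores of all models in B on one (dataset, fold)."""
    s = np.asarray(scores, dtype=float)
    lo, hi = s.min(), s.max()
    if hi == lo:                      # all models tie (assumption: map to 0.5)
        return np.full_like(s, 0.5)
    s_tilde = (s - lo) / (hi - lo)    # rescale onto [0, 1]
    # Flip lower-is-better metrics (e.g. RMSE) so all metrics share one scale.
    return s_tilde if higher_is_better else 1.0 - s_tilde
```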

#### Statistical significance. {#sec:statistical_significance}

To assess whether performance differences between models are statistically significant, we report critical difference (CD) diagrams generated with `scikit-posthocs` [@scikit_posthocs_Terpilowski2019]. A CD diagram summarizes the statistical comparison of methods across multiple datasets: average ranks are computed per method across all datasets (lower is better), and methods connected by a horizontal bar are not significantly different from each other. Significance is assessed with a Friedman test followed by a Conover post-hoc analysis at level $\alpha=0.05$.
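
The diagrams can be reproduced along the following lines (illustrative data; `posthoc_conover_friedman` and `critical_difference_diagram` are the relevant `scikit-posthocs` entry points):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scikit_posthocs as sp
from scipy import stats

# Block design: rows = datasets, columns = methods, entries = normalized scores.
rng = np.random.default_rng(0)
results = pd.DataFrame(rng.random((30, 4)), columns=["A", "B", "C", "D"])

stat, p = stats.friedmanchisquare(*[results[c] for c in results])  # omnibus test
sig_matrix = sp.posthoc_conover_friedman(results)                  # pairwise p-values

avg_ranks = results.rank(axis=1, ascending=False).mean(axis=0)     # lower = better
sp.critical_difference_diagram(avg_ranks, sig_matrix)              # bars join p > 0.05
plt.show()
```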

Large Data Benchmark Details {#app:large_data_datasets}
----------------------------

Classification datasets span domains including healthcare (patient survival, disease diagnosis), customer analytics (satisfaction, credit risk), insurance (claim prediction), microfinance (loan outcomes), and high-energy physics (signal/background classification). Regression datasets cover retail sales forecasting, climate and weather modeling, food delivery logistics, and e-commerce price prediction. All 4 regression datasets use temporal train/test splits reflecting real-world deployment conditions where the test period strictly follows the training period. For classification, all datasets use IID splits. These datasets are selected to have between 100k and 1M training rows and fewer than 200 features, the regime TabPFN-3 was designed for.

Figures `\ref{fig:large_data_cd_cls}`{=latex} and `\ref{fig:large_data_cd_reg}`{=latex} show critical difference diagrams for ROC-AUC and RMSE respectively, based on average ranks across all datasets in each benchmark.

```{=latex}
\centering
```
![ **Critical difference diagram for ROC-AUC on the large-scale classification benchmark (100k--1M training rows).** TabPFN-3 ranks first (avg. rank 2.11). Its rank differences to the 8-hour-tuned XGB/CatBoost baselines are not statistically significant, while it ranks significantly ahead of tuned LightGBM, all default GBTs, and TabICLv2. Bars connect methods whose rank differences are not statistically significant at $\alpha = 0.05$ under a Conover-Friedman post-hoc test [@scikit_posthocs_Terpilowski2019]. ](figures/internal_benchmarking/large_data/big_data_cls_v2_wo_AG__roc_auc__cd_scikit_posthocs.png){#fig:large_data_cd_cls width="\\textwidth"}

```{=latex}
\centering
```
![ **Critical difference diagram for RMSE on the large-scale regression benchmark (100k--1M rows, 4 datasets, temporal splits).** Methods are ranked per (dataset, split); lower rank is better. TabPFN-3 achieves the best average rank ($2.25$). Its rank differences to the three 8h-tuned GBDTs and untuned CatBoost are not statistically significant, while it ranks significantly ahead of the remaining methods. Bars connect methods whose rank differences are not statistically significant at $\alpha = 0.05$ under a Conover-Friedman post-hoc test [@scikit_posthocs_Terpilowski2019]. ](figures/internal_benchmarking/large_data/big_data_reg_v3_wo_AG__rmse__cd_scikit_posthocs.png){#fig:large_data_cd_reg width="\\textwidth"}

Synthetic Many-Class Benchmark Construction {#app:many_class_construction}
-------------------------------------------

Continuous regression targets are partitioned into $K = 100$ bins using quantile-based bin edges whose spacings are drawn from a $\mathrm{Dirichlet}(\alpha{=}5.0)$ distribution, producing realistic class imbalance. Bins with fewer than 10 samples are merged with their nearest neighbour to guarantee sufficient representation for inner cross-validations. Class labels are then randomly permuted to remove the implicit ordinal structure inherited from the regression target.
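
A self-contained sketch of this construction (exact tie handling and the choice of merge neighbour are simplifying assumptions):

```python
import numpy as np

def regression_to_many_class(y, K=100, alpha=5.0, min_count=10, seed=0):
    rng = np.random.default_rng(seed)
    # Quantile bin edges whose spacings follow Dirichlet(alpha): imbalanced bins.
    quantile_levels = np.cumsum(rng.dirichlet(np.full(K, alpha)))[:-1]
    edges = np.quantile(y, quantile_levels)
    labels = np.searchsorted(edges, y, side="right")
    # Merge classes with fewer than min_count samples into their nearest class.
    while True:
        classes, counts = np.unique(labels, return_counts=True)
        small = classes[counts < min_count]
        if small.size == 0:
            break
        c = small[0]
        others = classes[classes != c]
        labels[labels == c] = others[np.argmin(np.abs(others - c))]
    # Re-index to 0..K'-1, then permute labels to remove the ordinal structure.
    _, labels = np.unique(labels, return_inverse=True)
    return rng.permutation(labels.max() + 1)[labels]
```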

We exclude datasets from TabArena whose targets have heavy point masses or too few distinct values to fill 100 quantile bins meaningfully --- wine\_quality (7 unique values), Food\_Delivery\_Time (45, discrete times), Fiat-500 (222, discrete prices), and QSAR-TID-11 (concentrated point masses). Dataset statistics are reported in Table `\ref{tab:many_class_datasets}`{=latex}. The resulting benchmark retains a large number of classes for most datasets (median $K=95$), while inducing moderate class imbalance (median IR $=9.9\times$) without collapsing the label distribution onto a few dominant classes (median $H/\log K=0.98$).

```{=latex}
\centering
```
```{=latex}
\resizebox{\textwidth}{!}{%
\begin{tabular}{lrrrrrrrrr}
\toprule
\textbf{Dataset} & \textbf{OpenML task} & \textbf{OpenML did} & $N$ & $K$ & \textbf{Merged} & \textbf{Min} & \textbf{Max} & \textbf{IR} & $H/\log K$ \\
\midrule
airfoil\_self\_noise            & 363612 & 46904 & 1{,}503  & 80      & 20 & 10  & 42    & $4.2\times$  & 0.984 \\
concrete\_compressive\_strength & 363625 & 46917 & 1{,}030  & 60      & 40 & 10  & 28    & $2.8\times$  & 0.991 \\
diamonds                        & 363631 & 46923 & 53{,}940 & 100     & 0  & 127 & 1{,}252 & $9.9\times$ & 0.979 \\
healthcare\_insurance\_expenses & 363675 & 46931 & 1{,}338  & 73      & 27 & 10  & 32    & $3.2\times$  & 0.988 \\
houses                          & 363678 & 46934 & 20{,}640 & 97      & 3  & 62  & 965   & $15.6\times$ & 0.974 \\
miami\_housing                  & 363686 & 46942 & 13{,}776 & 95      & 5  & 19  & 339   & $17.8\times$ & 0.970 \\
physiochemical\_protein         & 363693 & 46949 & 45{,}730 & 100     & 0  & 108 & 1{,}214 & $11.2\times$ & 0.982 \\
QSAR\_fish\_toxicity            & 363698 & 46954 & 907      & 51      & 49 & 10  & 37    & $3.7\times$  & 0.980 \\
superconductivity               & 363705 & 46961 & 21{,}263 & 100     & 0  & 28  & 508   & $18.1\times$ & 0.979 \\
\midrule
Aggregate (mean / median)       & ---    & ---   & ---      & 84 / 95 & --- & --- & ---  & $9.6\times$ / $9.9\times$ & 0.98 / 0.98 \\
\bottomrule
\end{tabular}%
}
```
```{=latex}
\captionof{table}{Datasets in the synthetic many-class benchmark. $N$: samples; $K$: classes after merging the initial 100 bins; Merged: number of merged bins; Min/Max: smallest/largest class size; IR $=$ Max/Min: imbalance ratio; $H/\log K$: normalized label entropy.}
```
`\label{tab:many_class_datasets}`{=latex}
Quantile Regression: Critical Difference Diagram
------------------------------------------------

```{=latex}
\centering
```
![ **Critical difference diagram for pinball loss on our quantile regression benchmark.** The quantile regression benchmark is constructed from TabArena regression datasets and evaluated across the nine quantile levels $q \in \{0.1, 0.2, \ldots, 0.9\}$. TabPFN-3 ranks first. Its rank difference to Quantile TabICLv2 is not statistically significant, while it ranks significantly ahead of all remaining baselines. Bars connect methods whose rank differences are not statistically significant at $\alpha = 0.05$ under a Conover-Friedman post-hoc test [@scikit_posthocs_Terpilowski2019]. ](figures/internal_benchmarking/quantile/tabarena_regression__pinball_loss__cd_scikit_posthocs.png){#fig:quantile_cd width="\\textwidth"}

Synthetic Many Class: Critical Difference Diagram
-------------------------------------------------

```{=latex}
\centering
```
![ **Critical difference diagram for ROC AUC on the synthetic many-class benchmark (up to 100 classes).** TabPFN-3 is top-ranked on every (dataset, split) pair and ranks significantly ahead of all baselines. Bars connect methods whose rank differences are not statistically significant at $\alpha = 0.05$ under a Conover-Friedman post-hoc test [@scikit_posthocs_Terpilowski2019]. ](figures/internal_benchmarking/many_class/tabarena_many_class_100_v2__roc_auc__cd_scikit_posthocs.png){#fig:many_class_roc_auc_cd width="\\textwidth"}

Supplementary Inference Time Details
====================================

Compilation and FlashAttention-3 {#app:compile-fa3}
--------------------------------

`\ourmodel `{=latex}is shipped with two opt-in performance features that target different bottlenecks: `torch.compile` and FlashAttention-3. At the shapes relevant to large-data inference, the bulk of forward-pass cost is dispatch overhead and attention compute, and the speed-ups of `torch.compile` and FlashAttention-3 compose cleanly with our chunking strategy without changing the model's behaviour.

#### torch.compile.

Three hot-path methods are wrapped with `@torch.compile(dynamic=True)`: feature preprocessing plus embedding grouping, the column-chunk processing block (used in the non-row-chunked path), and the row-chunk processing block (used when chunking is enabled). The `dynamic=True` mode keeps a single compiled graph across batch and feature-count variation, so the same compiled artefact serves the whole inference grid without re-tracing.
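
A minimal sketch of the pattern; the real hot-path blocks are internal to the model, so the toy module below only illustrates the `dynamic=True` behaviour:

```python
import torch
from torch import nn

class ChunkBlock(nn.Module):
    """Stand-in for one hot-path block (illustrative only)."""
    def __init__(self, d: int = 64):
        super().__init__()
        self.proj = nn.Linear(d, d)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.proj(x))

block = torch.compile(ChunkBlock(), dynamic=True)
out = block(torch.randn(32, 64))     # first call traces and compiles
out = block(torch.randn(4096, 64))   # new shape: same graph, no re-tracing
```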

Figure `\ref{fig:compile_vs_eager}`{=latex} shows the wall-clock impact on MI-250x. The y-axis is $T_\mathrm{eager} / T_\mathrm{compile}$, so a value above 1 means compile is faster on that shape; each marker is annotated with the absolute time. `torch.compile` fuses Python-level dispatch into single kernel calls, so it helps most where dispatch is the bottleneck: as $n_\mathrm{features}$ grows, more tensor work becomes compile-able per call. In the non-chunked series the speed-up climbs from $1.04$--$1.15{\times}$ at $n_\mathrm{features}=10$ to $1.10$--$1.46{\times}$ at $n_\mathrm{features}=100$ and $1.40$--$1.58{\times}$ at $n_\mathrm{features}=500$. The chunked series shows the same direction with a different shape: chunking already amortises some dispatch overhead by batching the inner loop, so compile's marginal benefit is largest at small $n_\mathrm{train}$ ($1.21$--$1.43{\times}$ at $n_\mathrm{train}=10^3$) and large $n_\mathrm{features}$ and converges toward parity ($0.95$--$1.06{\times}$) at $n_\mathrm{train} \ge 10^5$ for the smaller feature counts, where the residual cost is dominated by attention itself and compile has no further headroom to claim.

```{=latex}
\centering
```
![ MI-250x -- speed-up of `torch.compile` over eager execution of the `\ourmodel `{=latex}forward pass, for $n_\mathrm{features} \in \{10, 100, 500\}$. Values above 1 indicate compile wins. ](figures/inference/compile_vs_eager.png){#fig:compile_vs_eager width="\\linewidth"}

```{=latex}
\vspace{1em}
```
```{=latex}
\centering
```
![ H100 -- speed-up of the auto backend (Flash Attention 3 where eligible, SDPA fallback elsewhere) over the SDPA-only backend on the `\ourmodel `{=latex}architecture forward pass, for $n_\mathrm{features} \in \{10, 100, 500\}$. Values above 1 indicate FA3 wins. ](figures/inference/h100_auto_vs_sdpa.png){#fig:fa3_h100 width="\\linewidth"}

#### FlashAttention-3.

FlashAttention-3 (FA3) [@flash_attention_3] is a Hopper-specific attention kernel that delivers higher throughput and lower memory use than the generic Scaled Dot-Product Attention (SDPA) path. Attention dominates the forward-pass cost of large-$n_\mathrm{train}$ inference, so even a constant-factor improvement in the attention kernel translates into a meaningful end-to-end speed-up. We therefore expose FA3 as an auto-detecting backend: on Hopper-class GPUs with the FA3 library installed, the in-context-learning self-attention -- which carries the bulk of the attention cost at large $n_\mathrm{train}$ -- is routed through FA3, while attention sites whose head dimensions are not FA3-eligible silently fall back to SDPA. On non-Hopper devices (consumer Ada, AMD MI-250x, Blackwell) the same dispatcher selects SDPA.
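
Schematically, the auto-detection behaves like the sketch below; the FA3 import name and the exact eligibility checks are assumptions, and the real dispatcher additionally checks per-site head dimensions:

```python
import torch

def pick_attention_backend() -> str:
    """Return "fa3" on Hopper GPUs with FA3 installed, else "sdpa"."""
    try:
        import flash_attn_interface  # noqa: F401  (FA3 bindings; name assumed)
    except ImportError:
        return "sdpa"
    if torch.cuda.is_available() and torch.cuda.get_device_capability()[0] == 9:
        return "fa3"   # Hopper: compute capability 9.x
    return "sdpa"      # consumer Ada, AMD MI-250x, Blackwell, CPU, ...
```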

Figure `\ref{fig:fa3_h100}`{=latex} shows the H100 SDPA-versus-auto comparison in ratio form. The y-axis is $T_\mathrm{sdpa} / T_\mathrm{auto}$, so a value above 1 means FA3 is faster than SDPA on that shape; each marker is annotated with the absolute auto time so the magnitude being sped up is recoverable. The pattern matches the FA3 design profile. At small training sets ($n_\mathrm{train} \le 1000$) the FA3 dispatch and kernel-launch overhead exceeds the per-call attention work, and SDPA is 10--15% faster ($T_\mathrm{sdpa}/T_\mathrm{auto} \approx 0.84\text{--}0.91$ across feature counts). The cross-over arrives earlier for smaller $n_\mathrm{features}$: by $n_\mathrm{train}=10^4$ FA3 wins at $n_\mathrm{features}=10$ ($1.21{\times}$), is roughly even at $n_\mathrm{features}=100$ ($1.07{\times}$), and is at parity at $n_\mathrm{features}=500$ ($1.02{\times}$). At the inference shapes we care about ($n_\mathrm{train} \ge 10^5$) FA3 is clearly faster across all feature counts, with the speed-up climbing to $1.49$--$1.73{\times}$ at $n_\mathrm{train}=10^6$. Chunking does not interact with the FA3-versus-SDPA comparison: the chunked and non-chunked curves overlap to within run-to-run noise, since chunking changes the outer dispatch loop but leaves the underlying attention-kernel selection intact.

Interpretability: SHAP-Value Computation {#app:shap-kv-cache}
----------------------------------------

TabPFN-3's improved, smaller KV cache (Section `\ref{sec:kv_cache}`{=latex}) can speed up the computation of SHAP values by multiple orders of magnitude, because imputation-based approaches to SHAP-value computation reuse the same `fit` across many forward passes. `\autoref{fig:shap-kv-speedup}`{=latex} and `\autoref{fig:shap-kv-runtime}`{=latex} show the efficiency gains users can expect from enabling the KV cache during SHAP-value computation.
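
In `shapiq` terms, the pattern looks roughly as follows (toy data; the exact `Explainer` arguments are assumptions based on `shapiq`'s generic interface):

```python
import numpy as np
import shapiq
from tabpfn import TabPFNClassifier

rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 8)), rng.integers(0, 2, size=200)
X_train, y_train, x_test = X[:150], y[:150], X[150]

model = TabPFNClassifier()
model.fit(X_train, y_train)  # one fit; the KV cache is built from this context

# Imputation-based SHAP: many masked forward passes reuse the same fit,
# which is exactly where the cached keys and values pay off.
explainer = shapiq.Explainer(model=model, data=X_train, index="SV", max_order=1)
shap_values = explainer.explain(x_test, budget=1024)  # 1024 coalitions, as above
```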

```{=latex}
\centering
```
![**Speed-up for SHAP-value computation with the KV cache, across training-table dimensions.** All experiments were conducted on a single RTX Pro 6000 Blackwell with a fixed budget of 1024 coalitions and are averaged over 10 repetitions.](figures/shap/shap_kv_cache_speedup_heatmap_baseline.png){#fig:shap-kv-speedup width="\\textwidth"}

```{=latex}
\hfill
```
```{=latex}
\centering
```
![**Runtime for SHAP-value computation with the KV cache enabled, across training-table dimensions.** Same setup: a single RTX Pro 6000 Blackwell, a fixed budget of 1024 coalitions, averaged over 10 repetitions; shown is the expected runtime for computing SHAP values for one test row.](figures/shap/shap_runtime_heatmap_baseline_cacheON.png){#fig:shap-kv-runtime width="\\textwidth"}

Detailed Time-Series Forecasting Results on fev-bench {#app:time_series}
=====================================================

This appendix complements the Time-Series subsection of the main text (`\Cref{tab:fev-bench}`{=latex}, `\Cref{fig:fev-bench-qualitative}`{=latex}) with the full leaderboards (Table `\ref{tab:fev-bench-full}`{=latex}), pairwise comparisons (Figure `\ref{fig:fev-bench-pairwise}`{=latex}), additional qualitative forecasts (Section `\ref{sec:fev-bench-qualitative}`{=latex}), and per-task SQL results (Section `\ref{sec:fev-bench-per-task}`{=latex}).

Full leaderboards (SQL and MASE) {#full-leaderboards-sql-and-mase .unnumbered}
--------------------------------

```{=latex}
\centering
```
```{=latex}
\vspace{-3mm}
```
**(a) SQL (probabilistic)**\
`\resizebox{\linewidth}{!}{\begin{tabular}{lrrrrr}
\toprule
\textbf{Model} & \textbf{Win (\%)} & \textbf{Skill (\%)} & \textbf{Runtime (s)} & \textbf{Leak.\ (\%)} & \textbf{\# fails} \\
\midrule
Chronos-2            & 91.7 &  47.3 &   0.8 &  0 &  0 \\
\textcolor{PriorMauve}{TabPFN-TS-3}          & 73.6 &  43.1 & 234.6 &  0 &  0 \\
TiRex                & 83.4 &  42.6 &   0.2 &  1 &  0 \\
TimesFM-2.5          & 78.6 &  42.2 &   1.9 & 10 &  0 \\
Toto-1.0             & 71.6 &  40.7 &  22.1 &  8 &  0 \\
\textcolor{PriorMauve}{TabPFN-v2-TS}         & 64.1 &  39.6 &  88.9 &  0 &  2 \\
Moirai-2.0           & 66.2 &  39.3 &   0.3 & 28 &  0 \\
Chronos-Bolt         & 66.2 &  38.9 &   0.2 &  0 &  0 \\
Sundial-Base         & 47.1 &  33.4 &   8.0 &  1 &  0 \\
TabICL-v2$^{\dagger}$ & 53.8 &  30.8 &  64.7 &  0 &  0 \\
CatBoost (Recursive) & 35.7 &  23.0 &   0.3 &  0 &  0 \\
LightGBM (Recursive) & 33.4 &  21.7 &   0.3 &  0 &  0 \\
AutoARIMA            & 39.6 &  20.6 &  19.5 &  0 & 10 \\
Stat. Ensemble       & 43.8 &  20.2 & 148.6 &  0 & 11 \\
AutoTheta            & 27.1 &   5.5 &   3.3 &  0 &  0 \\
Seasonal Naive       & 19.1 &   0.0 &   0.5 &  0 &  0 \\
AutoETS              & 32.7 & -26.8 &   3.5 &  0 &  3 \\
Naive                & 12.6 & -45.4 &   0.5 &  0 &  0 \\
Drift                &  9.7 & -45.8 &   0.5 &  0 &  0 \\
\bottomrule
\end{tabular}
}`{=latex}

```{=latex}
\hfill
```
```{=latex}
\centering
```
**(b) MASE (point)**\
`\resizebox{\linewidth}{!}{\begin{tabular}{lrrrrr}
\toprule
\textbf{Model} & \textbf{Win (\%)} & \textbf{Skill (\%)} & \textbf{Runtime (s)} & \textbf{Leak.\ (\%)} & \textbf{\# fails} \\
\midrule
Chronos-2            & 86.9 &  35.5 &   0.8 &  0 &  0 \\
\textcolor{PriorMauve}{TabPFN-TS-3}          & 69.8 &  30.6 & 234.6 &  0 &  0 \\
TimesFM-2.5          & 74.9 &  30.2 &   1.9 & 10 &  0 \\
TiRex                & 76.9 &  30.0 &   0.2 &  1 &  0 \\
Toto-1.0             & 66.3 &  28.2 &  22.1 &  8 &  0 \\
\textcolor{PriorMauve}{TabPFN-v2-TS}         & 58.5 &  27.6 &  88.9 &  0 &  2 \\
Moirai-2.0           & 61.4 &  27.3 &   0.3 & 28 &  0 \\
Chronos-Bolt         & 60.7 &  26.5 &   0.2 &  0 &  0 \\
Sundial-Base         & 53.4 &  24.7 &   8.0 &  1 &  0 \\
CatBoost (Recursive) & 54.0 &  23.7 &   0.3 &  0 &  0 \\
LightGBM (Recursive) & 50.3 &  22.4 &   0.3 &  0 &  0 \\
Stat. Ensemble       & 46.7 &  15.7 & 148.6 &  0 & 11 \\
AutoARIMA            & 36.0 &  11.2 &  19.5 &  0 & 10 \\
AutoTheta            & 34.2 &  11.0 &   3.3 &  0 &  0 \\
TabICL-v2$^{\dagger}$ & 33.2 &   7.0 &  64.7 &  0 &  0 \\
AutoETS              & 33.5 &   2.3 &   3.5 &  0 &  3 \\
Seasonal Naive       & 20.0 &   0.0 &   0.5 &  0 &  0 \\
Naive                & 18.0 & -16.7 &   0.5 &  0 &  0 \\
Drift                & 15.3 & -18.1 &   0.5 &  0 &  0 \\
\bottomrule
\end{tabular}
}`{=latex}

```{=latex}
\captionof{table}{Full fev-bench leaderboards under (a) SQL (probabilistic) and (b) MASE (point): win rate, skill score, runtime, leakage rate, and number of failed tasks per model.}
```
`\label{tab:fev-bench-full}`{=latex}

Qualitative forecast examples {#sec:fev-bench-qualitative .unnumbered}
-----------------------------

```{=latex}
\centering
```
![`solar_with_weather_15T` --- 15-minute solar generation with weather covariates.](figures/time_series/model_comparison/solar_with_weather_15T_s0.png){width="\\textwidth"}

```{=latex}
\centering
```
![`rossmann_1W` --- weekly Rossmann store sales (series 1).](figures/time_series/model_comparison/rossmann_1W_s1.png){width="\\textwidth"}

```{=latex}
\centering
```
![`rohlik_orders_1D` --- daily online-grocery orders.](figures/time_series/model_comparison/rohlik_orders_1D_s0.png){width="\\textwidth"}

```{=latex}
\centering
```
![`LOOP_SEATTLE_1H` --- hourly Seattle freeway loop-detector counts.](figures/time_series/model_comparison/LOOP_SEATTLE_1H_s0.png){width="\\textwidth"}

```{=latex}
\centering
```
![`ETT_1H` --- hourly Electricity Transformer Temperature.](figures/time_series/model_comparison/ETT_1H_s0.png){width="\\textwidth"}

```{=latex}
\centering
```
![`entsoe_1H` --- hourly ENTSO-E European electricity load.](figures/time_series/model_comparison/entsoe_1H_s0.png){width="\\textwidth"}

Pairwise skill-score heatmaps {#pairwise-skill-score-heatmaps .unnumbered}
-----------------------------

```{=latex}
\centering
```
![Pairwise skill-score comparison on fev-bench (100 tasks) under SQL. Cell $(i, j)$ is the skill score of model $i$ relative to model $j$, with 95% confidence intervals from bootstrapped resampling; cells whose interval overlaps zero are shown in italics. Rows and columns are ordered by overall skill score. Best viewed on screen.](figures/time_series/pairwise_skill_score/pairwise_skill_score_sql.png){#fig:fev-bench-pairwise width="\\linewidth"}

```{=latex}
\hfill
```
```{=latex}
\centering
```
![Pairwise skill-score comparison on fev-bench (100 tasks) under MASE; layout as in the SQL heatmap above. Best viewed on screen.](figures/time_series/pairwise_skill_score/pairwise_skill_score_mase.png){#fig:fev-bench-pairwise-mase width="\\linewidth"}

fev-bench per-task SQL leaderboard {#sec:fev-bench-per-task .unnumbered}
----------------------------------

```{=latex}
\setlength{\tabcolsep}{2pt}
```
```{=latex}
\scriptsize
```
::: {#tab:fev-bench-per-task-sql}
<table><caption><strong>Per-task SQL on fev-bench (100 tasks).</strong> Lower is better; values are after leakage and failure imputation. Per-row top-three are highlighted with gold / silver / bronze backgrounds. Columns are the ten models with the most medal placements; ordered by overall SQL skill score. Values exceeding <span class="math inline">10<sup>3</sup></span> are capped for layout.</caption><thead><tr class="header"><th style="text-align: left;"><strong>Task name</strong></th><th style="text-align: right;"></th><th style="text-align: right;"></th><th style="text-align: right;"></th><th style="text-align: right;"></th><th style="text-align: right;"></th><th style="text-align: right;"></th><th style="text-align: right;"></th><th style="text-align: right;"></th><th style="text-align: right;"></th><th style="text-align: right;"></th></tr></thead><tbody><tr class="even"><td style="text-align: left;"><p>ETT_15T</p></td><td style="text-align: right;">0.546</td><td style="text-align: right;">0.626</td><td style="text-align: right;">0.568</td><td style="text-align: right;">0.577</td><td style="text-align: right;">0.593</td><td style="text-align: right;">0.602</td><td style="text-align: right;">0.574</td><td style="text-align: right;">0.574</td><td style="text-align: right;">0.762</td><td style="text-align: right;">1.263</td></tr><tr class="odd"><td style="text-align: left;">ETT_1D</td><td style="text-align: right;">1.132</td><td style="text-align: right;">1.138</td><td style="text-align: right;">1.101</td><td style="text-align: right;">1.144</td><td style="text-align: right;">1.143</td><td style="text-align: right;">1.230</td><td style="text-align: right;">1.132</td><td style="text-align: right;">1.132</td><td style="text-align: right;">1.271</td><td style="text-align: right;">1.356</td></tr><tr class="even"><td style="text-align: left;">ETT_1H</td><td style="text-align: right;">0.883</td><td style="text-align: right;">0.908</td><td style="text-align: right;">0.874</td><td style="text-align: right;">0.882</td><td style="text-align: right;">0.873</td><td style="text-align: right;">0.933</td><td style="text-align: right;">0.944</td><td style="text-align: right;">0.944</td><td style="text-align: right;">1.272</td><td style="text-align: right;">1.765</td></tr><tr class="odd"><td style="text-align: left;">ETT_1W</td><td style="text-align: right;">2.320</td><td style="text-align: right;">2.252</td><td style="text-align: right;">2.265</td><td style="text-align: right;">2.249</td><td style="text-align: right;">2.281</td><td style="text-align: right;">2.411</td><td style="text-align: right;">2.280</td><td style="text-align: right;">2.280</td><td style="text-align: right;">2.407</td><td style="text-align: right;">2.394</td></tr><tr class="even"><td style="text-align: left;">LOOP_SEATTLE_1D</td><td style="text-align: right;">0.779</td><td style="text-align: right;">0.769</td><td style="text-align: right;">0.792</td><td style="text-align: right;">0.774</td><td style="text-align: right;">0.831</td><td style="text-align: right;">0.780</td><td style="text-align: 
right;">0.805</td><td style="text-align: right;">0.805</td><td style="text-align: right;">0.820</td><td style="text-align: right;">0.825</td></tr><tr class="odd"><td style="text-align: left;">LOOP_SEATTLE_1H</td><td style="text-align: right;">0.639</td><td style="text-align: right;">0.667</td><td style="text-align: right;">0.656</td><td style="text-align: right;">0.621</td><td style="text-align: right;">0.698</td><td style="text-align: right;">0.679</td><td style="text-align: right;">0.765</td><td style="text-align: right;">0.765</td><td style="text-align: right;">1.501</td><td style="text-align: right;">2.639</td></tr><tr class="even"><td style="text-align: left;">LOOP_SEATTLE_5T</td><td style="text-align: right;">0.533</td><td style="text-align: right;">0.710</td><td style="text-align: right;">0.549</td><td style="text-align: right;">0.595</td><td style="text-align: right;">0.561</td><td style="text-align: right;">0.641</td><td style="text-align: right;">0.710</td><td style="text-align: right;">0.710</td><td style="text-align: right;">1.044</td><td style="text-align: right;">1.155</td></tr><tr class="odd"><td style="text-align: left;">M_DENSE_1D</td><td style="text-align: right;">0.646</td><td style="text-align: right;">0.757</td><td style="text-align: right;">0.746</td><td style="text-align: right;">0.708</td><td style="text-align: right;">0.842</td><td style="text-align: right;">0.756</td><td style="text-align: right;">0.759</td><td style="text-align: right;">0.759</td><td style="text-align: right;">0.965</td><td style="text-align: right;">1.073</td></tr><tr class="even"><td style="text-align: left;">M_DENSE_1H</td><td style="text-align: right;">0.585</td><td style="text-align: right;">0.585</td><td style="text-align: right;">0.587</td><td style="text-align: right;">0.556</td><td style="text-align: right;">0.621</td><td style="text-align: right;">0.646</td><td style="text-align: right;">0.595</td><td style="text-align: right;">0.595</td><td style="text-align: right;">1.127</td><td style="text-align: right;">59.020</td></tr><tr class="odd"><td style="text-align: left;">SZ_TAXI_15T</td><td style="text-align: right;">0.393</td><td style="text-align: right;">0.399</td><td style="text-align: right;">0.396</td><td style="text-align: right;">0.397</td><td style="text-align: right;">0.401</td><td style="text-align: right;">0.429</td><td style="text-align: right;">0.413</td><td style="text-align: right;">0.413</td><td style="text-align: right;">0.560</td><td style="text-align: right;">2.355</td></tr><tr class="even"><td style="text-align: left;">SZ_TAXI_1H</td><td style="text-align: right;">0.398</td><td style="text-align: right;">0.407</td><td style="text-align: right;">0.405</td><td style="text-align: right;">0.416</td><td style="text-align: right;">0.418</td><td style="text-align: right;">0.494</td><td style="text-align: right;">0.426</td><td style="text-align: right;">0.426</td><td style="text-align: right;">0.689</td><td style="text-align: right;"><span class="math inline">&gt; 10<sup>3</sup></span></td></tr><tr class="odd"><td style="text-align: left;">aust...tourism</td><td style="text-align: right;">0.677</td><td style="text-align: right;">0.695</td><td style="text-align: right;">0.786</td><td style="text-align: right;">0.732</td><td style="text-align: right;">0.890</td><td style="text-align: right;">0.699</td><td style="text-align: right;">0.918</td><td style="text-align: right;">0.928</td><td style="text-align: right;">0.730</td><td style="text-align: right;">0.762</td></tr><tr 
class="even"><td style="text-align: left;">bizitobs_l2c_1H</td><td style="text-align: right;">0.301</td><td style="text-align: right;">0.374</td><td style="text-align: right;">0.366</td><td style="text-align: right;">0.326</td><td style="text-align: right;">0.370</td><td style="text-align: right;">0.354</td><td style="text-align: right;">0.342</td><td style="text-align: right;">0.342</td><td style="text-align: right;">0.634</td><td style="text-align: right;">0.718</td></tr><tr class="odd"><td style="text-align: left;">bizitobs_l2c_5T</td><td style="text-align: right;">0.411</td><td style="text-align: right;">0.370</td><td style="text-align: right;">0.679</td><td style="text-align: right;">0.461</td><td style="text-align: right;">0.595</td><td style="text-align: right;">0.485</td><td style="text-align: right;">0.757</td><td style="text-align: right;">0.757</td><td style="text-align: right;">0.720</td><td style="text-align: right;">0.731</td></tr><tr class="even"><td style="text-align: left;">boomlet_1062</td><td style="text-align: right;">0.552</td><td style="text-align: right;">0.554</td><td style="text-align: right;">0.555</td><td style="text-align: right;">0.573</td><td style="text-align: right;">0.548</td><td style="text-align: right;">0.708</td><td style="text-align: right;">0.593</td><td style="text-align: right;">0.639</td><td style="text-align: right;">0.985</td><td style="text-align: right;">1.309</td></tr><tr class="odd"><td style="text-align: left;">boomlet_1209</td><td style="text-align: right;">0.680</td><td style="text-align: right;">0.768</td><td style="text-align: right;">0.729</td><td style="text-align: right;">0.705</td><td style="text-align: right;">0.645</td><td style="text-align: right;">1.016</td><td style="text-align: right;">0.756</td><td style="text-align: right;">0.784</td><td style="text-align: right;">2.469</td><td style="text-align: right;">1.264</td></tr><tr class="even"><td style="text-align: left;">boomlet_1225</td><td style="text-align: right;">0.186</td><td style="text-align: right;">0.199</td><td style="text-align: right;">0.188</td><td style="text-align: right;">0.190</td><td style="text-align: right;">0.183</td><td style="text-align: right;">0.215</td><td style="text-align: right;">0.195</td><td style="text-align: right;">0.203</td><td style="text-align: right;">0.280</td><td style="text-align: right;">0.318</td></tr><tr class="odd"><td style="text-align: left;">boomlet_1230</td><td style="text-align: right;">1.201</td><td style="text-align: right;">1.292</td><td style="text-align: right;">1.186</td><td style="text-align: right;">1.187</td><td style="text-align: right;">1.138</td><td style="text-align: right;">1.613</td><td style="text-align: right;">1.286</td><td style="text-align: right;">1.266</td><td style="text-align: right;">3.390</td><td style="text-align: right;"><span class="math inline">&gt; 10<sup>3</sup></span></td></tr><tr class="even"><td style="text-align: left;">boomlet_1282</td><td style="text-align: right;">0.421</td><td style="text-align: right;">0.413</td><td style="text-align: right;">0.409</td><td style="text-align: right;">0.403</td><td style="text-align: right;">0.407</td><td style="text-align: right;">0.425</td><td style="text-align: right;">0.427</td><td style="text-align: right;">0.462</td><td style="text-align: right;">0.739</td><td style="text-align: right;">0.914</td></tr><tr class="odd"><td style="text-align: left;">boomlet_1487</td><td style="text-align: right;">0.423</td><td style="text-align: right;">0.447</td><td 
style="text-align: right;">0.427</td><td style="text-align: right;">0.412</td><td style="text-align: right;">0.400</td><td style="text-align: right;">0.745</td><td style="text-align: right;">0.456</td><td style="text-align: right;">0.482</td><td style="text-align: right;">0.681</td><td style="text-align: right;">0.724</td></tr><tr class="even"><td style="text-align: left;">boomlet_1631</td><td style="text-align: right;">0.572</td><td style="text-align: right;">0.622</td><td style="text-align: right;">0.598</td><td style="text-align: right;">0.579</td><td style="text-align: right;">0.581</td><td style="text-align: right;">0.697</td><td style="text-align: right;">0.591</td><td style="text-align: right;">0.619</td><td style="text-align: right;">0.851</td><td style="text-align: right;">0.721</td></tr><tr class="odd"><td style="text-align: left;">boomlet_1676</td><td style="text-align: right;">0.569</td><td style="text-align: right;">0.602</td><td style="text-align: right;">0.571</td><td style="text-align: right;">0.563</td><td style="text-align: right;">0.554</td><td style="text-align: right;">0.831</td><td style="text-align: right;">0.573</td><td style="text-align: right;">0.608</td><td style="text-align: right;">0.850</td><td style="text-align: right;">0.756</td></tr><tr class="even"><td style="text-align: left;">boomlet_1855</td><td style="text-align: right;">0.462</td><td style="text-align: right;">0.504</td><td style="text-align: right;">0.450</td><td style="text-align: right;">0.473</td><td style="text-align: right;">0.452</td><td style="text-align: right;">0.623</td><td style="text-align: right;">0.465</td><td style="text-align: right;">0.470</td><td style="text-align: right;">1.123</td><td style="text-align: right;">1.185</td></tr><tr class="odd"><td style="text-align: left;">boomlet_1975</td><td style="text-align: right;">0.133</td><td style="text-align: right;">0.251</td><td style="text-align: right;">0.192</td><td style="text-align: right;">0.167</td><td style="text-align: right;">0.126</td><td style="text-align: right;">0.207</td><td style="text-align: right;">0.220</td><td style="text-align: right;">0.179</td><td style="text-align: right;">0.548</td><td style="text-align: right;">0.611</td></tr><tr class="even"><td style="text-align: left;">boomlet_2187</td><td style="text-align: right;">0.712</td><td style="text-align: right;">0.835</td><td style="text-align: right;">0.711</td><td style="text-align: right;">0.802</td><td style="text-align: right;">0.764</td><td style="text-align: right;">0.934</td><td style="text-align: right;">0.807</td><td style="text-align: right;">0.775</td><td style="text-align: right;">1.273</td><td style="text-align: right;">1.307</td></tr><tr class="odd"><td style="text-align: left;">boomlet_285</td><td style="text-align: right;">0.290</td><td style="text-align: right;">0.354</td><td style="text-align: right;">0.345</td><td style="text-align: right;">0.397</td><td style="text-align: right;">0.319</td><td style="text-align: right;">0.713</td><td style="text-align: right;">0.427</td><td style="text-align: right;">0.477</td><td style="text-align: right;">1.262</td><td style="text-align: right;">1.203</td></tr><tr class="even"><td style="text-align: left;">boomlet_619</td><td style="text-align: right;">0.323</td><td style="text-align: right;">0.326</td><td style="text-align: right;">0.341</td><td style="text-align: right;">0.340</td><td style="text-align: right;">0.310</td><td style="text-align: right;">0.331</td><td style="text-align: right;">0.329</td><td 
style="text-align: right;">0.471</td><td style="text-align: right;">0.777</td><td style="text-align: right;">0.894</td></tr><tr class="odd"><td style="text-align: left;">boomlet_772</td><td style="text-align: right;">0.283</td><td style="text-align: right;">0.305</td><td style="text-align: right;">0.296</td><td style="text-align: right;">0.295</td><td style="text-align: right;">0.281</td><td style="text-align: right;">0.330</td><td style="text-align: right;">0.314</td><td style="text-align: right;">0.339</td><td style="text-align: right;">1.179</td><td style="text-align: right;"><span class="math inline">&gt; 10<sup>3</sup></span></td></tr><tr class="even"><td style="text-align: left;">boomlet_963</td><td style="text-align: right;">0.717</td><td style="text-align: right;">0.786</td><td style="text-align: right;">0.718</td><td style="text-align: right;">0.739</td><td style="text-align: right;">0.720</td><td style="text-align: right;">0.796</td><td style="text-align: right;">0.751</td><td style="text-align: right;">0.779</td><td style="text-align: right;">1.335</td><td style="text-align: right;">1.609</td></tr><tr class="odd"><td style="text-align: left;">ecdc_ili</td><td style="text-align: right;">2.271</td><td style="text-align: right;">2.457</td><td style="text-align: right;">2.411</td><td style="text-align: right;">2.215</td><td style="text-align: right;">2.554</td><td style="text-align: right;">2.382</td><td style="text-align: right;">2.454</td><td style="text-align: right;">2.653</td><td style="text-align: right;">3.837</td><td style="text-align: right;">4.079</td></tr><tr class="even"><td style="text-align: left;">entsoe_15T</td><td style="text-align: right;">0.454</td><td style="text-align: right;">0.648</td><td style="text-align: right;">0.469</td><td style="text-align: right;">0.471</td><td style="text-align: right;">0.591</td><td style="text-align: right;">0.484</td><td style="text-align: right;">0.478</td><td style="text-align: right;">0.506</td><td style="text-align: right;">0.781</td><td style="text-align: right;">3.029</td></tr><tr class="odd"><td style="text-align: left;">entsoe_1H</td><td style="text-align: right;">0.429</td><td style="text-align: right;">0.385</td><td style="text-align: right;">0.470</td><td style="text-align: right;">0.468</td><td style="text-align: right;">0.480</td><td style="text-align: right;">0.442</td><td style="text-align: right;">0.487</td><td style="text-align: right;">0.457</td><td style="text-align: right;">0.892</td><td style="text-align: right;">1.905</td></tr><tr class="even"><td style="text-align: left;">entsoe_30T</td><td style="text-align: right;">0.434</td><td style="text-align: right;">0.579</td><td style="text-align: right;">0.523</td><td style="text-align: right;">0.566</td><td style="text-align: right;">0.496</td><td style="text-align: right;">0.512</td><td style="text-align: right;">0.488</td><td style="text-align: right;">0.529</td><td style="text-align: right;">0.847</td><td style="text-align: right;">2.493</td></tr><tr class="odd"><td style="text-align: left;">epf_be</td><td style="text-align: right;">0.503</td><td style="text-align: right;">0.533</td><td style="text-align: right;">0.527</td><td style="text-align: right;">0.494</td><td style="text-align: right;">0.565</td><td style="text-align: right;">0.532</td><td style="text-align: right;">0.528</td><td style="text-align: right;">0.573</td><td style="text-align: right;">1.213</td><td style="text-align: right;">1.534</td></tr><tr class="even"><td style="text-align: 
left;">epf_de</td><td style="text-align: right;">0.491</td><td style="text-align: right;">0.437</td><td style="text-align: right;">1.032</td><td style="text-align: right;">1.030</td><td style="text-align: right;">1.106</td><td style="text-align: right;">0.440</td><td style="text-align: right;">1.016</td><td style="text-align: right;">1.021</td><td style="text-align: right;">1.167</td><td style="text-align: right;">1.401</td></tr><tr class="odd"><td style="text-align: left;">epf_fr</td><td style="text-align: right;">0.362</td><td style="text-align: right;">0.374</td><td style="text-align: right;">0.401</td><td style="text-align: right;">0.409</td><td style="text-align: right;">0.426</td><td style="text-align: right;">0.331</td><td style="text-align: right;">0.409</td><td style="text-align: right;">0.439</td><td style="text-align: right;">1.146</td><td style="text-align: right;">0.899</td></tr><tr class="even"><td style="text-align: left;">epf_np</td><td style="text-align: right;">0.658</td><td style="text-align: right;">0.633</td><td style="text-align: right;">0.966</td><td style="text-align: right;">1.171</td><td style="text-align: right;">1.037</td><td style="text-align: right;">0.659</td><td style="text-align: right;">0.925</td><td style="text-align: right;">0.971</td><td style="text-align: right;">1.284</td><td style="text-align: right;">1.933</td></tr><tr class="odd"><td style="text-align: left;">epf_pjm</td><td style="text-align: right;">0.382</td><td style="text-align: right;">0.382</td><td style="text-align: right;">0.404</td><td style="text-align: right;">0.426</td><td style="text-align: right;">0.452</td><td style="text-align: right;">0.427</td><td style="text-align: right;">0.441</td><td style="text-align: right;">0.422</td><td style="text-align: right;">0.487</td><td style="text-align: right;">0.914</td></tr><tr class="even"><td style="text-align: left;">ercot_1D</td><td style="text-align: right;">0.869</td><td style="text-align: right;">0.845</td><td style="text-align: right;">0.818</td><td style="text-align: right;">0.830</td><td style="text-align: right;">0.880</td><td style="text-align: right;">0.981</td><td style="text-align: right;">0.947</td><td style="text-align: right;">0.916</td><td style="text-align: right;">1.255</td><td style="text-align: right;">1.382</td></tr><tr class="odd"><td style="text-align: left;">ercot_1H</td><td style="text-align: right;">1.029</td><td style="text-align: right;">1.108</td><td style="text-align: right;">1.065</td><td style="text-align: right;">1.151</td><td style="text-align: right;">1.095</td><td style="text-align: right;">1.208</td><td style="text-align: right;">1.098</td><td style="text-align: right;">1.138</td><td style="text-align: right;">1.260</td><td style="text-align: right;">2.676</td></tr><tr class="even"><td style="text-align: left;">ercot_1M</td><td style="text-align: right;">0.755</td><td style="text-align: right;">0.755</td><td style="text-align: right;">0.806</td><td style="text-align: right;">0.772</td><td style="text-align: right;">1.007</td><td style="text-align: right;">0.903</td><td style="text-align: right;">0.973</td><td style="text-align: right;">0.773</td><td style="text-align: right;">0.762</td><td style="text-align: right;">0.756</td></tr><tr class="odd"><td style="text-align: left;">ercot_1W</td><td style="text-align: right;">0.966</td><td style="text-align: right;">0.996</td><td style="text-align: right;">0.955</td><td style="text-align: right;">0.932</td><td style="text-align: right;">1.060</td><td 
style="text-align: right;">1.228</td><td style="text-align: right;">1.053</td><td style="text-align: right;">0.961</td><td style="text-align: right;">2.095</td><td style="text-align: right;">2.068</td></tr><tr class="even"><td style="text-align: left;">fav...stores_1D</td><td style="text-align: right;">0.916</td><td style="text-align: right;">0.989</td><td style="text-align: right;">0.968</td><td style="text-align: right;">0.949</td><td style="text-align: right;">1.036</td><td style="text-align: right;">0.970</td><td style="text-align: right;">0.980</td><td style="text-align: right;">1.032</td><td style="text-align: right;">1.197</td><td style="text-align: right;">1.238</td></tr><tr class="odd"><td style="text-align: left;">fav...stores_1M</td><td style="text-align: right;">1.794</td><td style="text-align: right;">1.923</td><td style="text-align: right;">1.856</td><td style="text-align: right;">1.998</td><td style="text-align: right;">2.009</td><td style="text-align: right;">1.934</td><td style="text-align: right;">2.091</td><td style="text-align: right;">2.087</td><td style="text-align: right;">1.943</td><td style="text-align: right;">1.942</td></tr><tr class="even"><td style="text-align: left;">fav...stores_1W</td><td style="text-align: right;">2.024</td><td style="text-align: right;">2.054</td><td style="text-align: right;">2.046</td><td style="text-align: right;">1.968</td><td style="text-align: right;">2.128</td><td style="text-align: right;">2.123</td><td style="text-align: right;">2.197</td><td style="text-align: right;">2.101</td><td style="text-align: right;">2.220</td><td style="text-align: right;">2.357</td></tr><tr class="odd"><td style="text-align: left;">fav...trans_1D</td><td style="text-align: right;">0.685</td><td style="text-align: right;">1.283</td><td style="text-align: right;">1.031</td><td style="text-align: right;">0.975</td><td style="text-align: right;">0.975</td><td style="text-align: right;">1.225</td><td style="text-align: right;">0.975</td><td style="text-align: right;">0.975</td><td style="text-align: right;">1.185</td><td style="text-align: right;">1.181</td></tr><tr class="even"><td style="text-align: left;">fav...trans_1M</td><td style="text-align: right;">0.943</td><td style="text-align: right;">1.214</td><td style="text-align: right;">1.089</td><td style="text-align: right;">1.133</td><td style="text-align: right;">1.397</td><td style="text-align: right;">1.244</td><td style="text-align: right;">1.390</td><td style="text-align: right;">1.358</td><td style="text-align: right;">1.152</td><td style="text-align: right;">1.179</td></tr><tr class="odd"><td style="text-align: left;">fav...trans_1W</td><td style="text-align: right;">1.228</td><td style="text-align: right;">1.579</td><td style="text-align: right;">1.384</td><td style="text-align: right;">1.428</td><td style="text-align: right;">1.557</td><td style="text-align: right;">1.912</td><td style="text-align: right;">1.463</td><td style="text-align: right;">1.428</td><td style="text-align: right;">1.559</td><td style="text-align: right;">1.647</td></tr><tr class="even"><td style="text-align: left;">fred_md_2025/cee</td><td style="text-align: right;">3.468</td><td style="text-align: right;">4.823</td><td style="text-align: right;">3.349</td><td style="text-align: right;">4.490</td><td style="text-align: right;">4.490</td><td style="text-align: right;">3.873</td><td style="text-align: right;">4.490</td><td style="text-align: right;">4.490</td><td style="text-align: right;">3.745</td><td style="text-align: 
right;">3.643</td></tr><tr class="odd"><td style="text-align: left;">fred_md/macro</td><td style="text-align: right;">5.680</td><td style="text-align: right;">6.623</td><td style="text-align: right;">5.307</td><td style="text-align: right;">5.842</td><td style="text-align: right;">5.842</td><td style="text-align: right;">6.399</td><td style="text-align: right;">5.842</td><td style="text-align: right;">5.842</td><td style="text-align: right;">5.743</td><td style="text-align: right;">5.794</td></tr><tr class="even"><td style="text-align: left;">fred_qd_2025/cee</td><td style="text-align: right;">2.192</td><td style="text-align: right;">2.455</td><td style="text-align: right;">2.046</td><td style="text-align: right;">2.181</td><td style="text-align: right;">1.773</td><td style="text-align: right;">2.292</td><td style="text-align: right;">2.296</td><td style="text-align: right;">2.365</td><td style="text-align: right;">1.903</td><td style="text-align: right;">2.123</td></tr><tr class="odd"><td style="text-align: left;">fred_qd/macro</td><td style="text-align: right;">3.537</td><td style="text-align: right;">4.040</td><td style="text-align: right;">3.530</td><td style="text-align: right;">3.593</td><td style="text-align: right;">3.402</td><td style="text-align: right;">4.240</td><td style="text-align: right;">3.616</td><td style="text-align: right;">3.654</td><td style="text-align: right;">3.615</td><td style="text-align: right;">3.904</td></tr><tr class="even"><td style="text-align: left;">gvar</td><td style="text-align: right;">0.578</td><td style="text-align: right;">0.594</td><td style="text-align: right;">0.577</td><td style="text-align: right;">0.590</td><td style="text-align: right;">0.576</td><td style="text-align: right;">0.674</td><td style="text-align: right;">0.593</td><td style="text-align: right;">0.596</td><td style="text-align: right;">0.590</td><td style="text-align: right;">0.593</td></tr><tr class="odd"><td style="text-align: left;">hermes</td><td style="text-align: right;">0.609</td><td style="text-align: right;">0.619</td><td style="text-align: right;">0.651</td><td style="text-align: right;">0.618</td><td style="text-align: right;">0.985</td><td style="text-align: right;">0.705</td><td style="text-align: right;">0.704</td><td style="text-align: right;">0.675</td><td style="text-align: right;">1.416</td><td style="text-align: right;">1.673</td></tr><tr class="even"><td style="text-align: left;">hier...sales_1D</td><td style="text-align: right;">0.557</td><td style="text-align: right;">0.552</td><td style="text-align: right;">0.547</td><td style="text-align: right;">0.552</td><td style="text-align: right;">0.547</td><td style="text-align: right;">0.572</td><td style="text-align: right;">0.551</td><td style="text-align: right;">0.551</td><td style="text-align: right;">0.720</td><td style="text-align: right;">0.793</td></tr><tr class="odd"><td style="text-align: left;">hier...sales_1W</td><td style="text-align: right;">0.616</td><td style="text-align: right;">0.625</td><td style="text-align: right;">0.621</td><td style="text-align: right;">0.618</td><td style="text-align: right;">0.637</td><td style="text-align: right;">0.637</td><td style="text-align: right;">0.637</td><td style="text-align: right;">0.637</td><td style="text-align: right;">0.746</td><td style="text-align: right;">10.477</td></tr><tr class="even"><td style="text-align: left;">hospital</td><td style="text-align: right;">0.686</td><td style="text-align: right;">0.673</td><td style="text-align: 
right;">0.688</td><td style="text-align: right;">0.680</td><td style="text-align: right;">0.733</td><td style="text-align: right;">0.696</td><td style="text-align: right;">0.697</td><td style="text-align: right;">0.697</td><td style="text-align: right;">0.697</td><td style="text-align: right;">0.726</td></tr><tr class="odd"><td style="text-align: left;">hosp...sions_1D</td><td style="text-align: right;">0.554</td><td style="text-align: right;">0.554</td><td style="text-align: right;">0.555</td><td style="text-align: right;">0.556</td><td style="text-align: right;">0.555</td><td style="text-align: right;">0.562</td><td style="text-align: right;">0.556</td><td style="text-align: right;">0.556</td><td style="text-align: right;">0.557</td><td style="text-align: right;">0.556</td></tr><tr class="even"><td style="text-align: left;">hosp...sions_1W</td><td style="text-align: right;">0.576</td><td style="text-align: right;">0.581</td><td style="text-align: right;">0.585</td><td style="text-align: right;">0.580</td><td style="text-align: right;">0.598</td><td style="text-align: right;">0.581</td><td style="text-align: right;">0.586</td><td style="text-align: right;">0.587</td><td style="text-align: right;">0.579</td><td style="text-align: right;">0.578</td></tr><tr class="odd"><td style="text-align: left;">jena_weather_10T</td><td style="text-align: right;">0.354</td><td style="text-align: right;">0.398</td><td style="text-align: right;">0.389</td><td style="text-align: right;">0.357</td><td style="text-align: right;">0.368</td><td style="text-align: right;">0.413</td><td style="text-align: right;">0.418</td><td style="text-align: right;">0.418</td><td style="text-align: right;">0.673</td><td style="text-align: right;">0.742</td></tr><tr class="even"><td style="text-align: left;">jena_weather_1D</td><td style="text-align: right;">1.111</td><td style="text-align: right;">1.143</td><td style="text-align: right;">1.072</td><td style="text-align: right;">1.090</td><td style="text-align: right;">1.112</td><td style="text-align: right;">1.155</td><td style="text-align: right;">1.075</td><td style="text-align: right;">1.075</td><td style="text-align: right;">1.339</td><td style="text-align: right;">1.664</td></tr><tr class="odd"><td style="text-align: left;">jena_weather_1H</td><td style="text-align: right;">0.353</td><td style="text-align: right;">0.429</td><td style="text-align: right;">0.356</td><td style="text-align: right;">0.359</td><td style="text-align: right;">0.362</td><td style="text-align: right;">0.413</td><td style="text-align: right;">0.367</td><td style="text-align: right;">0.367</td><td style="text-align: right;">0.452</td><td style="text-align: right;">0.553</td></tr><tr class="even"><td style="text-align: left;">kdd_cup_2022_10T</td><td style="text-align: right;">0.425</td><td style="text-align: right;">0.456</td><td style="text-align: right;">0.533</td><td style="text-align: right;">0.533</td><td style="text-align: right;">0.533</td><td style="text-align: right;">0.555</td><td style="text-align: right;">0.533</td><td style="text-align: right;">0.533</td><td style="text-align: right;">0.777</td><td style="text-align: right;">0.747</td></tr><tr class="odd"><td style="text-align: left;">kdd_cup_2022_1D</td><td style="text-align: right;">0.704</td><td style="text-align: right;">0.709</td><td style="text-align: right;">0.697</td><td style="text-align: right;">0.698</td><td style="text-align: right;">0.704</td><td style="text-align: right;">0.715</td><td style="text-align: 
right;">0.708</td><td style="text-align: right;">0.709</td><td style="text-align: right;">0.730</td><td style="text-align: right;">0.751</td></tr><tr class="even"><td style="text-align: left;">kdd_cup_2022_30T</td><td style="text-align: right;">0.439</td><td style="text-align: right;">0.459</td><td style="text-align: right;">0.432</td><td style="text-align: right;">0.505</td><td style="text-align: right;">0.429</td><td style="text-align: right;">0.543</td><td style="text-align: right;">0.427</td><td style="text-align: right;">0.561</td><td style="text-align: right;">0.679</td><td style="text-align: right;">0.772</td></tr><tr class="odd"><td style="text-align: left;">m5_1D</td><td style="text-align: right;">0.722</td><td style="text-align: right;">0.720</td><td style="text-align: right;">0.714</td><td style="text-align: right;">0.729</td><td style="text-align: right;">0.729</td><td style="text-align: right;">1.254</td><td style="text-align: right;">0.729</td><td style="text-align: right;">0.729</td><td style="text-align: right;">1.254</td><td style="text-align: right;">0.853</td></tr><tr class="even"><td style="text-align: left;">m5_1M</td><td style="text-align: right;">0.977</td><td style="text-align: right;">0.986</td><td style="text-align: right;">0.974</td><td style="text-align: right;">0.980</td><td style="text-align: right;">1.044</td><td style="text-align: right;">1.002</td><td style="text-align: right;">0.996</td><td style="text-align: right;">1.000</td><td style="text-align: right;">1.022</td><td style="text-align: right;">1.108</td></tr><tr class="odd"><td style="text-align: left;">m5_1W</td><td style="text-align: right;">0.900</td><td style="text-align: right;">0.904</td><td style="text-align: right;">0.903</td><td style="text-align: right;">0.917</td><td style="text-align: right;">0.905</td><td style="text-align: right;">0.928</td><td style="text-align: right;">0.907</td><td style="text-align: right;">0.917</td><td style="text-align: right;">0.936</td><td style="text-align: right;">0.953</td></tr><tr class="even"><td style="text-align: left;">proenfo_gfc12</td><td style="text-align: right;">0.649</td><td style="text-align: right;">0.614</td><td style="text-align: right;">0.908</td><td style="text-align: right;">0.917</td><td style="text-align: right;">0.917</td><td style="text-align: right;">0.834</td><td style="text-align: right;">0.917</td><td style="text-align: right;">0.917</td><td style="text-align: right;">1.305</td><td style="text-align: right;">2.431</td></tr><tr class="odd"><td style="text-align: left;">proenfo_gfc14</td><td style="text-align: right;">0.430</td><td style="text-align: right;">0.426</td><td style="text-align: right;">0.721</td><td style="text-align: right;">0.767</td><td style="text-align: right;">0.767</td><td style="text-align: right;">0.515</td><td style="text-align: right;">0.767</td><td style="text-align: right;">0.767</td><td style="text-align: right;">0.906</td><td style="text-align: right;">1.110</td></tr><tr class="even"><td style="text-align: left;">proenfo_gfc17</td><td style="text-align: right;">0.485</td><td style="text-align: right;">0.528</td><td style="text-align: right;">0.889</td><td style="text-align: right;">0.900</td><td style="text-align: right;">0.900</td><td style="text-align: right;">0.672</td><td style="text-align: right;">0.900</td><td style="text-align: right;">0.900</td><td style="text-align: right;">1.142</td><td style="text-align: right;">2.135</td></tr><tr class="odd"><td style="text-align: left;">redset_15T</td><td 
style="text-align: right;">0.790</td><td style="text-align: right;">1.208</td><td style="text-align: right;">0.833</td><td style="text-align: right;">0.741</td><td style="text-align: right;">0.818</td><td style="text-align: right;">1.250</td><td style="text-align: right;">1.041</td><td style="text-align: right;">1.243</td><td style="text-align: right;">1.231</td><td style="text-align: right;">1.231</td></tr><tr class="even"><td style="text-align: left;">redset_1H</td><td style="text-align: right;">1.365</td><td style="text-align: right;">1.338</td><td style="text-align: right;">1.337</td><td style="text-align: right;">1.367</td><td style="text-align: right;">1.306</td><td style="text-align: right;">1.321</td><td style="text-align: right;">1.410</td><td style="text-align: right;">2.279</td><td style="text-align: right;">1.859</td><td style="text-align: right;">2.377</td></tr><tr class="odd"><td style="text-align: left;">redset_5T</td><td style="text-align: right;">0.654</td><td style="text-align: right;">0.749</td><td style="text-align: right;">0.787</td><td style="text-align: right;">0.723</td><td style="text-align: right;">0.719</td><td style="text-align: right;">0.711</td><td style="text-align: right;">0.793</td><td style="text-align: right;">1.026</td><td style="text-align: right;">2.690</td><td style="text-align: right;">1.224</td></tr><tr class="even"><td style="text-align: left;">restaurant</td><td style="text-align: right;">0.685</td><td style="text-align: right;">0.686</td><td style="text-align: right;">0.682</td><td style="text-align: right;">0.677</td><td style="text-align: right;">0.704</td><td style="text-align: right;">0.693</td><td style="text-align: right;">0.689</td><td style="text-align: right;">0.689</td><td style="text-align: right;">0.709</td><td style="text-align: right;">1.021</td></tr><tr class="odd"><td style="text-align: left;">rohlik_orders_1D</td><td style="text-align: right;">0.959</td><td style="text-align: right;">1.052</td><td style="text-align: right;">0.986</td><td style="text-align: right;">1.006</td><td style="text-align: right;">1.135</td><td style="text-align: right;">1.341</td><td style="text-align: right;">0.970</td><td style="text-align: right;">1.051</td><td style="text-align: right;">1.211</td><td style="text-align: right;">1.447</td></tr><tr class="even"><td style="text-align: left;">rohlik_orders_1W</td><td style="text-align: right;">1.300</td><td style="text-align: right;">1.415</td><td style="text-align: right;">1.300</td><td style="text-align: right;">1.328</td><td style="text-align: right;">1.493</td><td style="text-align: right;">1.524</td><td style="text-align: right;">1.532</td><td style="text-align: right;">1.428</td><td style="text-align: right;">1.398</td><td style="text-align: right;">1.419</td></tr><tr class="odd"><td style="text-align: left;">rohlik_sales_1D</td><td style="text-align: right;">0.881</td><td style="text-align: right;">0.899</td><td style="text-align: right;">1.148</td><td style="text-align: right;">1.096</td><td style="text-align: right;">1.218</td><td style="text-align: right;">1.375</td><td style="text-align: right;">1.170</td><td style="text-align: right;">1.147</td><td style="text-align: right;">1.248</td><td style="text-align: right;">1.266</td></tr><tr class="even"><td style="text-align: left;">rohlik_sales_1W</td><td style="text-align: right;">1.274</td><td style="text-align: right;">1.159</td><td style="text-align: right;">1.425</td><td style="text-align: right;">1.401</td><td style="text-align: 
right;">1.505</td><td style="text-align: right;">1.221</td><td style="text-align: right;">1.516</td><td style="text-align: right;">1.522</td><td style="text-align: right;">1.646</td><td style="text-align: right;">14.453</td></tr><tr class="odd"><td style="text-align: left;">rossmann_1D</td><td style="text-align: right;">0.283</td><td style="text-align: right;">0.245</td><td style="text-align: right;">0.539</td><td style="text-align: right;">0.502</td><td style="text-align: right;">0.568</td><td style="text-align: right;">0.232</td><td style="text-align: right;">0.527</td><td style="text-align: right;">0.525</td><td style="text-align: right;">0.578</td><td style="text-align: right;">0.594</td></tr><tr class="even"><td style="text-align: left;">rossmann_1W</td><td style="text-align: right;">0.308</td><td style="text-align: right;">0.256</td><td style="text-align: right;">0.482</td><td style="text-align: right;">0.495</td><td style="text-align: right;">0.494</td><td style="text-align: right;">0.254</td><td style="text-align: right;">0.497</td><td style="text-align: right;">0.487</td><td style="text-align: right;">0.501</td><td style="text-align: right;">0.518</td></tr><tr class="odd"><td style="text-align: left;">solar_1D</td><td style="text-align: right;">0.594</td><td style="text-align: right;">0.601</td><td style="text-align: right;">0.614</td><td style="text-align: right;">0.618</td><td style="text-align: right;">0.622</td><td style="text-align: right;">0.615</td><td style="text-align: right;">0.637</td><td style="text-align: right;">0.635</td><td style="text-align: right;">0.653</td><td style="text-align: right;">0.656</td></tr><tr class="even"><td style="text-align: left;">solar_1W</td><td style="text-align: right;">0.895</td><td style="text-align: right;">0.924</td><td style="text-align: right;">1.121</td><td style="text-align: right;">1.096</td><td style="text-align: right;">1.392</td><td style="text-align: right;">0.870</td><td style="text-align: right;">1.658</td><td style="text-align: right;">0.940</td><td style="text-align: right;">1.296</td><td style="text-align: right;">1.212</td></tr><tr class="odd"><td style="text-align: left;">s...weather_15T</td><td style="text-align: right;">0.677</td><td style="text-align: right;">0.671</td><td style="text-align: right;">0.846</td><td style="text-align: right;">0.906</td><td style="text-align: right;">0.784</td><td style="text-align: right;">0.747</td><td style="text-align: right;">0.839</td><td style="text-align: right;">0.809</td><td style="text-align: right;">1.194</td><td style="text-align: right;">2.529</td></tr><tr class="even"><td style="text-align: left;">s...weather_1H</td><td style="text-align: right;">0.767</td><td style="text-align: right;">0.660</td><td style="text-align: right;">0.900</td><td style="text-align: right;">0.815</td><td style="text-align: right;">0.876</td><td style="text-align: right;">0.701</td><td style="text-align: right;">0.907</td><td style="text-align: right;">0.816</td><td style="text-align: right;">1.458</td><td style="text-align: right;">2.182</td></tr><tr class="odd"><td style="text-align: left;">uci...ality_1D</td><td style="text-align: right;">1.046</td><td style="text-align: right;">1.147</td><td style="text-align: right;">1.128</td><td style="text-align: right;">1.205</td><td style="text-align: right;">1.260</td><td style="text-align: right;">1.186</td><td style="text-align: right;">1.138</td><td style="text-align: right;">1.092</td><td style="text-align: right;">1.123</td><td style="text-align: 
right;">1.181</td></tr><tr class="even"><td style="text-align: left;">uci...ality_1H</td><td style="text-align: right;">0.798</td><td style="text-align: right;">0.934</td><td style="text-align: right;">0.865</td><td style="text-align: right;">0.877</td><td style="text-align: right;">0.870</td><td style="text-align: right;">0.931</td><td style="text-align: right;">0.945</td><td style="text-align: right;">0.899</td><td style="text-align: right;">1.561</td><td style="text-align: right;"><span class="math inline">&gt; 10<sup>3</sup></span></td></tr><tr class="odd"><td style="text-align: left;">uk_nat_1D/cum</td><td style="text-align: right;">7.826</td><td style="text-align: right;">10.394</td><td style="text-align: right;">7.653</td><td style="text-align: right;">7.051</td><td style="text-align: right;">6.188</td><td style="text-align: right;">13.045</td><td style="text-align: right;">6.763</td><td style="text-align: right;">8.157</td><td style="text-align: right;">7.712</td><td style="text-align: right;">7.184</td></tr><tr class="even"><td style="text-align: left;">uk_nat_1D/new</td><td style="text-align: right;">2.037</td><td style="text-align: right;">2.071</td><td style="text-align: right;">1.992</td><td style="text-align: right;">2.135</td><td style="text-align: right;">2.039</td><td style="text-align: right;">2.076</td><td style="text-align: right;">2.135</td><td style="text-align: right;">2.122</td><td style="text-align: right;">2.799</td><td style="text-align: right;">2.741</td></tr><tr class="odd"><td style="text-align: left;">uk_nat_1W/cum</td><td style="text-align: right;">2.783</td><td style="text-align: right;">3.478</td><td style="text-align: right;">3.192</td><td style="text-align: right;">4.011</td><td style="text-align: right;">2.824</td><td style="text-align: right;">2.872</td><td style="text-align: right;">3.014</td><td style="text-align: right;">3.435</td><td style="text-align: right;">2.238</td><td style="text-align: right;">2.399</td></tr><tr class="even"><td style="text-align: left;">uk_nat_1W/new</td><td style="text-align: right;">4.968</td><td style="text-align: right;">4.784</td><td style="text-align: right;">4.532</td><td style="text-align: right;">3.783</td><td style="text-align: right;">5.098</td><td style="text-align: right;">4.143</td><td style="text-align: right;">3.873</td><td style="text-align: right;">4.148</td><td style="text-align: right;">5.741</td><td style="text-align: right;">5.024</td></tr><tr class="odd"><td style="text-align: left;">uk_utla_1D/new</td><td style="text-align: right;">3.725</td><td style="text-align: right;">3.815</td><td style="text-align: right;">3.729</td><td style="text-align: right;">3.512</td><td style="text-align: right;">4.036</td><td style="text-align: right;">3.801</td><td style="text-align: right;">3.565</td><td style="text-align: right;">3.531</td><td style="text-align: right;">5.582</td><td style="text-align: right;">5.623</td></tr><tr class="even"><td style="text-align: left;">uk_utla_1W/cum</td><td style="text-align: right;">17.442</td><td style="text-align: right;">18.932</td><td style="text-align: right;">19.435</td><td style="text-align: right;">18.486</td><td style="text-align: right;">16.286</td><td style="text-align: right;">16.912</td><td style="text-align: right;">19.325</td><td style="text-align: right;">17.489</td><td style="text-align: right;">14.331</td><td style="text-align: right;">16.313</td></tr><tr class="odd"><td style="text-align: left;">us_cons_1M</td><td style="text-align: right;">1.464</td><td 
style="text-align: right;">1.698</td><td style="text-align: right;">1.467</td><td style="text-align: right;">1.605</td><td style="text-align: right;">1.564</td><td style="text-align: right;">1.571</td><td style="text-align: right;">1.513</td><td style="text-align: right;">1.516</td><td style="text-align: right;">1.486</td><td style="text-align: right;">1.445</td></tr><tr class="even"><td style="text-align: left;">us_cons_1Q</td><td style="text-align: right;">1.724</td><td style="text-align: right;">2.302</td><td style="text-align: right;">1.803</td><td style="text-align: right;">1.927</td><td style="text-align: right;">1.707</td><td style="text-align: right;">2.673</td><td style="text-align: right;">1.796</td><td style="text-align: right;">1.764</td><td style="text-align: right;">1.908</td><td style="text-align: right;">1.886</td></tr><tr class="odd"><td style="text-align: left;">us_cons_1Y</td><td style="text-align: right;">3.730</td><td style="text-align: right;">4.807</td><td style="text-align: right;">3.634</td><td style="text-align: right;">4.007</td><td style="text-align: right;">3.898</td><td style="text-align: right;">4.180</td><td style="text-align: right;">4.807</td><td style="text-align: right;">4.108</td><td style="text-align: right;">3.786</td><td style="text-align: right;">4.081</td></tr><tr class="even"><td style="text-align: left;">walmart</td><td style="text-align: right;">0.648</td><td style="text-align: right;">0.696</td><td style="text-align: right;">0.707</td><td style="text-align: right;">0.679</td><td style="text-align: right;">0.907</td><td style="text-align: right;">0.662</td><td style="text-align: right;">0.845</td><td style="text-align: right;">0.774</td><td style="text-align: right;">1.217</td><td style="text-align: right;"><span class="math inline">&gt; 10<sup>3</sup></span></td></tr><tr class="odd"><td style="text-align: left;">world_co2_emis</td><td style="text-align: right;">2.670</td><td style="text-align: right;">2.761</td><td style="text-align: right;">2.643</td><td style="text-align: right;">2.876</td><td style="text-align: right;">2.716</td><td style="text-align: right;">2.720</td><td style="text-align: right;">2.875</td><td style="text-align: right;">2.754</td><td style="text-align: right;">2.688</td><td style="text-align: right;">7.724</td></tr><tr class="even"><td style="text-align: left;">world_life_exp</td><td style="text-align: right;">1.187</td><td style="text-align: right;">1.190</td><td style="text-align: right;">1.109</td><td style="text-align: right;">1.210</td><td style="text-align: right;">1.639</td><td style="text-align: right;">1.149</td><td style="text-align: right;">1.785</td><td style="text-align: right;">1.345</td><td style="text-align: right;">1.305</td><td style="text-align: right;">1.302</td></tr><tr class="odd"><td style="text-align: left;">world_tourism</td><td style="text-align: right;">3.052</td><td style="text-align: right;">3.149</td><td style="text-align: right;">3.052</td><td style="text-align: right;">3.562</td><td style="text-align: right;">3.208</td><td style="text-align: right;">2.795</td><td style="text-align: right;">3.264</td><td style="text-align: right;">3.164</td><td style="text-align: right;">2.552</td><td style="text-align: right;">2.882</td></tr></tbody></table>

: **Per-task SQL on fev-bench (100 tasks).** Lower is better; values are reported after leakage and failure imputation. The top three per row are highlighted with gold / silver / bronze backgrounds. Columns show the ten models with the most medal placements, ordered by overall SQL skill score. Values exceeding $10^3$ are capped for layout.
:::
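
The table-construction conventions described in the caption (medal-based column selection and value capping) can be sketched in a few lines. The following is a minimal illustration, not the exact evaluation pipeline: it assumes a long-format results frame with columns `task`, `model`, and `sql`, and uses mean SQL as a stand-in for the benchmark's overall skill score.

```python
import pandas as pd

def select_table_columns(results: pd.DataFrame, n_models: int = 10) -> list[str]:
    """Pick the models shown as columns: most top-three ("medal") placements,
    ordered by an overall score (mean SQL as an illustrative stand-in)."""
    medals: dict[str, int] = {}
    for _, group in results.groupby("task"):
        # Lower SQL is better, so the three smallest values per task are medals.
        for model in group.nsmallest(3, "sql")["model"]:
            medals[model] = medals.get(model, 0) + 1
    top = sorted(medals, key=medals.get, reverse=True)[:n_models]
    return sorted(top, key=lambda m: results.loc[results["model"] == m, "sql"].mean())

def format_cell(value: float, cap: float = 1e3) -> str:
    # Values exceeding the cap are rendered as "> 10^3" for layout.
    return r"$> 10^3$" if value > cap else f"{value:.3f}"
```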

TabPFN Use Case Overview {#app:use_cases}
========================

Previous TabPFN models have been applied across a broad range of domains; below, we list 202 published use cases spanning different industries.
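
Nearly all of these applications access TabPFN through the same scikit-learn-style workflow. As a point of reference, here is a minimal sketch using the open-source `tabpfn` package on a placeholder dataset; exact constructor arguments may differ across package versions.

```python
# Minimal sketch of the typical workflow behind the use cases below,
# via the scikit-learn-style API of the open-source `tabpfn` package.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier

X, y = load_breast_cancer(return_X_y=True)  # placeholder dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = TabPFNClassifier()           # pretrained; no task-specific training loop
clf.fit(X_train, y_train)          # "fit" stores the training set as context
proba = clf.predict_proba(X_test)  # class probabilities in one forward pass
```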

Highlights {#highlights .unnumbered}
==========

We highlight a selection of representative use cases that demonstrate TabPFN's strengths across domains:

1.  TabPFN enabled non-invasive early detection of pancreatic cancer by integrating NMR metabolomics with clinical and protein biomarkers [@wu2026panmetai]. [Link](https://www.nature.com/articles/s41467-026-69426-9)

2.  TabPFN provided highly accurate predictions of donor mobilization success using baseline and post-mobilization variables, facilitating early triage and improved transplantation outcomes [@adil2026deep]. [Link](https://doi.org/10.1016/j.jtct.2026.02.016)

3.  TabPFN was used for effective differentiation between psychotic and non-psychotic major depression, improving classification accuracy and supporting psychiatric diagnosis [@zheng2026differentiation]. [Link](https://doi.org/10.1016/j.jad.2026.121454)

4.  TabPFN served as a high-fidelity surrogate model for optimizing geopolymer concrete mix design, achieving superior accuracy, generalization, and low-uncertainty predictions compared to other ML approaches [@sichani2025machine]. [Link](https://www.nature.com/articles/s41598-025-29088-x)

5.  TabPFN enabled robust prediction of silica nanoparticle cytotoxicity [@zhang2025boosting]. [Link](https://www.nature.com/articles/s41598-025-33872-0)

6.  TabPFN demonstrated superior performance and translational feasibility for liver fibrosis staging [@CHEN2026102726]. [Link](https://www.cell.com/cell-reports-medicine/fulltext/S2666-3791(26)00143-6)

7.  TabPFN enables accurate prediction of reaction kinetics as the top-performing regression model on multi-source experimental data, facilitating mechanistic understanding of biochar-catalyzed antibiotic degradation processes [@latif2026deep]. [Link](https://link.springer.com/article/10.1007/s42773-026-00606-y)

8.  TabPFN was employed as a core modeling component for learning from multimodal tabular data under strict temporal constraints, enabling strong discriminative performance, improved probability calibration, and effective causal forecasting in early rug-pull detection [@shoaei2026lroo]. [Link](http://arxiv.org/abs/2603.11324v1)

9.  TabPFN was fine-tuned into a domain-specific model (FinPFN) for regime-aware stock return prediction, improving performance in non-stationary financial markets by adapting to evolving feature--return relationships [@wang2025metalearning]. [Link](https://www.sciencedirect.com/science/article/abs/pii/S1386418125000825)

10. TabPFN enabled early fault classification in rotating machinery, addressing data scarcity in industrial scenarios [@manuf_usecase1_rotating_faults_tabpfn]. [Link](https://ieeexplore.ieee.org/abstract/document/10318062)

Healthcare and Life Sciences {#healthcare-and-life-sciences .unnumbered}
============================

We collected 98 published TabPFN use cases in this area. Applications span diagnosis, prognosis, treatment response prediction, and biomarker-based modeling under frequent data scarcity.

1.  TabPFN enabled non-invasive early detection of pancreatic cancer by integrating NMR metabolomics with clinical and protein biomarkers [@wu2026panmetai]. [Link](https://www.nature.com/articles/s41467-026-69426-9)

2.  TabPFN enables highly accurate and cost-efficient molecular property prediction by pairing in-context learning with frozen molecular embeddings and descriptors [@hicham2026tabularfoundationmodelsincontext]. [Link](https://arxiv.org/abs/2604.16123)

3.  TabPFN enabled robust prediction of silica nanoparticle cytotoxicity [@zhang2025boosting]. [Link](https://www.nature.com/articles/s41598-025-33872-0)

4.  TabPFN was combined with BulkFormer to improve prediction accuracy of post-transplant kidney function for better assessment of organ viability during machine perfusion or cold storage [@tingle2026combining]. [Link](https://doi.org/10.21203/rs.3.rs-9242336/v1)

5.  TabPFN enhances survival analysis, leading to superior performance compared to specialized methods [@seletkov2026survivalincontextpriorfittedincontext]. [Link](https://arxiv.org/pdf/2603.29475)

6.  TabPFN demonstrated superior performance and translational feasibility for liver fibrosis staging [@CHEN2026102726]. [Link](https://www.cell.com/cell-reports-medicine/fulltext/S2666-3791(26)00143-6)

7.  TabPFN was leveraged in cardiovascular disease diagnosis [@hasan2026advancing]. [Link](https://doi.org/10.1038/s41598-026-35451-3)

8.  TabPFN enabled accurate prediction of appendicular lean mass (ALM) from multimodal clinical data and improved sarcopenia screening by maintaining robust performance despite missing modalities [@kita2026transformerbased]. [Link](https://doi.org/10.1186/s12967-026-08079-0)

9.  TabPFN was employed in the winning solution for predicting walking function [@villines2026asia]. [Link](https://doi.org/10.46292/sci25-00137)

10. TabPFN demonstrated high accuracy and specificity in matching cell line transcriptomes to reference kidney cell types using curated kidney marker gene lists, enhancing robust assessment of cell line identity [@Schoberth2026.03.30.715265]. [Link](https://doi.org/10.64898/2026.03.30.715265)

11. TabPFN was used to enhance prediction accuracy of protein coupling based on structural features, improving biological insight into protein interactions [@Pasquale2026.03.07.710286]. [Link](https://www.biorxiv.org/content/10.64898/2026.03.07.710286v1)

12. TabPFN supports risk stratification and adverse event prediction in chemotherapy-based stem cell mobilization, enabling improved ward management and resource allocation [@schwarz2026predicting]. [Link](https://www.nature.com/articles/s41746-026-02394-y)

13. TabPFN was used with other ML models to improve radiomics-based breast cancer diagnosis, enhancing feature-combination performance and classification accuracy [@daniels2026application]. [Link](https://www.nature.com/articles/s41598-026-40472-z)

14. TabPFN enhances model interpretability and accuracy in differentiating complex spinal infections, aiding clinical decision-making in ambiguous diagnostic cases [@githubGitHubSmallriver2024STBNet]. [Link](https://github.com/Smallriver2024/STBNet)

15. TabPFN enables improved data quality and predictive model reliability by integrating unstructured clinical text with automated pipelines, enhancing early disease prediction and clinical decision-making [@domingoaldama2026automatingearlydiseaseprediction]. [Link](http://arxiv.org/abs/2603.28167)

16. TabPFN improved severity classification performance in diabetic retinopathy, supporting more accurate staging and treatment planning [@fang2026multitask]. [Link](https://www.sciencedirect.com/science/article/pii/S1572100026001225)

17. TabPFN was integrated into the multimodal MuCB-tabpfn framework, enabling high predictive accuracy in estimating pollutant concentrations in human blood [@liu2026mucbtabpfn]. [Link](https://www.sciencedirect.com/science/article/pii/S0147651326003842)

18. TabPFN enables better generalization and accuracy in modeling complex drug formulation data, improving AI-driven formulation design workflows [@zhong2026physicsbased]. [Link](https://www.sciencedirect.com/science/article/pii/S0168365926002622)

19. TabPFN enables state-of-the-art real-time stress detection by enhancing accuracy and interpretability of multimodal physiological and sensor data [@githubGitHubRishabhmannuMultiModalStressDetectionML]. [Link](https://github.com/Rishabhmannu/MultiModal-Stress-Detection-ML)

20. TabPFN was applied as a robust and data-efficient alternative for tabular learning in drug discovery, improving performance on small and medium datasets and under out-of-distribution conditions [@chen2026tabpfn]. [Link](https://doi.org/10.1021/acs.jcim.5c02823)

21. TabPFN was used to enhance clinical risk prediction from electronic health records by providing robust modeling under real-world constraints, improving prognosis accuracy and reliability [@pham2026retrievalaligned]. [Link](https://doi.org/10.21203/rs.3.rs-9085469/v1)

22. TabPFN achieved the highest performance in predicting breast cancer-related lymphedema (BCRL) risk with strong minority-class discrimination and accurate calibration [@sadek2026from]. [Link](https://doi.org/10.1186/s12874-026-02805-4)

23. TabPFN achieved strong generalization performance in predicting adsorption capacity in zeolites, with physically meaningful interpretability [@johnsson2026predicting]. [Link](https://doi.org/10.1021/acs.jpcc.5c08611)

24. TabPFN achieved superior discriminative performance in predicting recurrent spontaneous abortion (RSA) risk by integrating multidimensional clinical data into accurate and interpretable screening models [@chen2026multidimensional]. [Link](https://doi.org/10.3389/fimmu.2026.1774359)

25. TabPFN was used to encode structured EHR data for predicting peak VO$_2$ and identifying high-risk heart failure patients [@huang2026multimodal]. [Link](https://doi.org/10.1038/s41746-026-02493-w)

26. TabPFN provided highly accurate predictions of donor mobilization success using baseline and post-mobilization variables, facilitating early triage and improved transplantation outcomes [@adil2026deep]. [Link](https://doi.org/10.1016/j.jtct.2026.02.016)

27. TabPFN was integrated into the FocalTab framework to improve classification accuracy, handle class imbalance, and support early identification of adolescent alcohol use [@liu2026classification]. [Link](https://doi.org/10.64898/2026.02.24.26347002)

28. TabPFN demonstrated strong robustness in cross-cohort microbiome disease prediction under domain shift, maintaining competitive performance across datasets [@mu2026systematic]. [Link](https://doi.org/10.21203/rs.3.rs-8912605/v1)

29. TabPFN was used as a meta-learner combining predictions of multiple base models to capture complex interactions and enhance early coronary artery disease prediction accuracy [@papakyriakopoulos2026heart]. [Link](https://doi.org/10.21203/rs.3.rs-8239358/v1)

30. TabPFN enables Bayesian inference via in-context learning without per-dataset training, improving accuracy, calibration, and inference speed in scientific disease modeling tasks [@dinnage2026niche]. [Link](https://doi.org/10.32942/x2vq10)

31. TabPFN was extended to multimodal learning through MMPFN, enabling effective integration of non-tabular modalities with structured clinical data [@kim2026multimodalpfn]. [Link](http://arxiv.org/abs/2602.20223v2)

32. TabPFN enables unified Bayesian modeling to improve bioactivity prediction across the ChEMBL database, supporting more efficient drug discovery pipelines [@backenkhler2026chempfn]. [Link](https://doi.org/10.26434/chemrxiv.15000292/v1)

33. TabPFN was used for effective differentiation between psychotic and non-psychotic major depression, improving classification accuracy and supporting psychiatric diagnosis [@zheng2026differentiation]. [Link](https://doi.org/10.1016/j.jad.2026.121454)

34. TabPFN enables more accurate and efficient causal inference to aid early diagnosis and understanding of Long COVID [@githubGitHubSindyPinTACO]. [Link](https://github.com/SindyPin/TACO)

35. TabPFN was utilized to improve clinical risk prediction models on MIMIC-III data, enhancing both accuracy and efficiency [@githubGitHubAhmedAlMaroufFoundationModel_on_Mimic3_ClinRisk]. [Link](https://github.com/AhmedAlMarouf/FoundationModel_on_Mimic3_ClinRisk)

36. TabPFN outperformed current methods in predicting HFNC therapy outcomes and demonstrated potential for improved performance with additional clinical measurements [@yu2025evaluating]. [Link](https://doi.org/10.1186/s13054-025-05765-1)

37. TabPFN was used in a hybrid model combining radiomics and deep learning features to improve risk stratification for post-TIPS hepatic encephalopathy [@miao2025enhancing]. [Link](https://doi.org/10.1007/s12072-025-10934-z)

38. TabPFN was fine-tuned as a proxy model to predict synthetic likelihood of hMOFs, enabling high-fidelity large-scale screening in materials-related biomedical contexts [@wu2025digital]. [Link](https://doi.org/10.1002/adfm.202519565)

39. TabPFN improved intra-European ancestry prediction accuracy when combined with ML-based marker selection, outperforming traditional approaches [@maurer2025enhancing]. [Link](https://doi.org/10.1101/2025.11.08.687358)

40. TabPFN improves renal tumor classification accuracy in CT radiomics by effectively handling small, high-dimensional datasets without extensive tuning [@liu2025tabular]. [Link](https://doi.org/10.21037/qims-2025-1132)

41. TabPFN demonstrates competitive performance as a count-based model for clinical prediction on structured EHR data compared to transformer-based pipelines [@gao2025countbased]. [Link](http://arxiv.org/abs/2511.00782v1)

42. TabPFN improves empathy detection accuracy and cross-subject generalization in human-centered video datasets [@hasan2025privacypreserving]. [Link](http://arxiv.org/abs/2504.10808v2)

43. TabPFN enables accurate prediction of reaction kinetics, facilitating mechanistic understanding in biochar-catalyzed antibiotic degradation processes [@latif2026deep]. [Link](https://link.springer.com/article/10.1007/s42773-026-00606-y)

44. TabPFN yields competitive or superior performance for multiple imputation tasks compared to alternative statistical and ML methods [@sepin2026multiple]. [Link](https://www.mdpi.com/2571-905X/9/2/38)

45. TabPFN improves multimodal skin cancer diagnosis by combining structured lesion features with clinical data for more accurate and interpretable predictions [@fan2026lightweight]. [Link](https://www.sciencedirect.com/science/article/pii/S0020025526003609)

46. TabPFN supports pediatric disease classification in clinical decision support systems, reducing misdiagnosis in emergency settings [@fan2026lightweight]. [Link](https://www.scitepress.org/Papers/2026/143473/143473.pdf)

47. TabPFN improves EEG seizure classification across subjects, achieving high accuracy and strong generalization [@obaido2026evaluating]. [Link](https://doi.org/10.3390/app16073120)

48. TabPFN improves kelp origin prediction using stable isotope data, providing robust and interpretable environmental insights [@kang2026enhancing]. [Link](https://doi.org/10.1016/j.foodchem.2026.148591)

49. TabPFN predicts CO$_2$ frosting temperatures in natural gas mixtures with high accuracy and interpretability [@youcefi2026accurate]. [Link](https://doi.org/10.1016/j.chemolab.2026.105679)

50. TabPFN improves ADMET modeling by increasing prediction accuracy, simplifying deployment, and producing compact models [@chupakhin2025descriptorfirst]. [Link](https://doi.org/10.1021/acs.jcim.5c02094)

51. TabPFN enhances analysis and classification of volatile organic compounds using mass spectrometry data, improving efficiency in chemical and biomedical analysis [@granitto2025tabpfn]. [Link](https://doi.org/10.1038/s41598-025-29128-6)

52. TabPFN was applied to distinguish cancer patients from healthy individuals using immune system profiles from peripheral blood, facilitating predictions of immunotherapy responses [@hc_usecase1_bostongene_tabpfn]. [Link](https://www.linkedin.com/pulse/how-bostongene-utilized-tabpfn-identify-immune-system-profiles-vexle/)

53. A machine learning model employing TabPFN was developed for non-invasive diagnostic prediction of minimal change disease in patients with nephrotic syndrome, utilizing clinical biomarkers [@hc_usecase2_mcd_scirep]. [Link](https://www.nature.com/articles/s41598-024-73898-4)

54. TabPFN was integrated into a system for analyzing T-cell receptor repertoires combined with clinical biomarkers to forecast immunotherapy outcomes in cancer patients, as explored by researchers at BostonGene [@hc_usecase3_immunotypes_cancercell]. [Link](https://www.cell.com/cancer-cell/fulltext/S1535-6108(24)00132-6)

55. TabPFN enabled early detection of stillbirth risks through analysis of cardiotocography data, supporting improved prenatal care [@hc_usecase4_stillbirth_slas]. [Link](https://www.sciencedirect.com/science/article/pii/S2472630324000852)

56. Predictive modeling for postoperative outcomes following anterior cervical corpectomy utilized TabPFN to assess patient demographics and surgical parameters [@hc_usecase5_acc_asj]. [Link](https://pmc.ncbi.nlm.nih.gov/articles/PMC11366553/)

57. A hybrid model incorporating TabPFN was introduced to predict dementia progression in Parkinson's disease patients, handling small datasets and missing values effectively [@hc_usecase6_pd_dementia_lightgbm_tabpfn]. [Link](https://journals.sagepub.com/doi/full/10.1177/20552076241272585)

58. A machine learning model based on TabPFN was developed to predict 90-day unfavorable outcomes in stroke patients with distal vessel occlusions using CT perfusion imaging [@hc_usecase7_dmvo_ajnr]. [Link](https://www.ajnr.org/content/early/2024/10/28/ajnr.A8547.abstract)

59. TabPFN facilitated the prediction of non-invasive ventilation outcomes in patients with acute hypoxemic respiratory failure, supporting early identification of treatment failures [@hc_usecase9_niv_tabpfn]. [Link](https://www.researchgate.net/profile/Antonio-Esquinas/publication/393595503_Early-prediction-of-non-invasive_ventilation_outcome_using_the_TabPFN_machine_learning_model_a_multi-centre_validation_study/links/68718bc56e247f362b18c4b8/Early-prediction-of-non-invasive-ventilation-outcome-using-the-TabPFN-machine-learning-model-a-multi-centre-validation-study.pdf)

60. An interpretable Transformer-based model leveraging TabPFN was created to predict intravenous immunoglobulin resistance in pediatric patients with Kawasaki disease [@hc_usecase10_kawasaki_tabpfnv2]. [Link](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0327564)

61. TabPFN was employed in visual representation techniques for prostate cancer diagnosis, converting clinical biomarkers and symptom data into formats suitable for analysis [@hc_usecase51_prostate_visual_rep]. [Link](https://www.mdpi.com/2306-5354/11/7/635)

62. TabPFN was used to combine clinical, MR morphological, and delta-radiomics features to predict lymphovascular invasion in invasive breast cancer patients [@hc_usecase11_lvi_breast_tabpfn]. [Link](https://journals.sagepub.com/doi/full/10.1177/15330338251362050)

63. TabPFN is proposed to predict mental health trajectories through digital phenotyping, enabling proactive and personalized interventions in precision psychiatry [@hc_usecase12_precision_psychiatry_tabpfn]. [Link](https://onlinelibrary.wiley.com/doi/epdf/10.1002/mdr2.70017)

64. TabPFN contributed to cardiovascular disease risk stratification using clinical features from a large patient cohort, incorporating interpretability techniques [@hc_usecase13_ml_health_tabpfn]. [Link](https://github.com/Bruno-LSo/ML-Health-TABPFN)

65. TabPFN outperformed traditional machine learning models for early prediction of acute kidney injury in hospitalized patients, demonstrating generalizability across datasets [@hc_usecase14_aki_ssrn_tabpfn]. [Link](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5397006)

66. TabPFN was integrated into a framework for predicting postoperative mobility and discharge destinations in older adults using sensor data [@hc_usecase15_postop_mobility_sensors]. [Link](https://www.mdpi.com/1424-8220/25/16/5021)

67. TabPFN supported the prediction of infant temperament from maternal mental health data, aiding early identification of at-risk infants [@hc_usecase16_infant_temperament_tabpfn]. [Link](https://www.frontiersin.org/journals/public-health/articles/10.3389/fpubh.2025.1659987/abstract)

68. TabPFN was employed to characterize clinical risk profiles for complications in type 2 diabetes mellitus patients, focusing on neuropathy and retinopathy [@hc_usecase17_t2dm_complications_tabpfn]. [Link](https://www.frontiersin.org/journals/endocrinology/articles/10.3389/fendo.2025.1657366/abstract)

69. TabPFN was extended with a longitudinal-to-cross-sectional transformation to forecast Alzheimer's disease progression on neuroimaging datasets [@hc_usecase18_ad_l2c_tabpfn]. [Link](https://arxiv.org/abs/2508.17649)

70. TabPFN supported uncertainty calibration evaluation in medical data using variational techniques [@hc_usecase19_uncertainty_vbll_tabpfn]. [Link](https://arxiv.org/abs/2509.10048)

71. TabPFN was applied to predict tumor response to chemotherapy in cholangiocarcinoma patients using RNA expression landscapes [@hc_usecase20_cholangio_aacr_tabpfn]. [Link](https://aacrjournals.org/clincancerres/article/31/13_Supplement/A020/763312)

72. TabPFN was incorporated into a generative model framework for tasks like data augmentation and imputation in biomedicine [@hc_usecase21_tabpfgen]. [Link](https://arxiv.org/abs/2406.05216)

73. TabPFN facilitated the prediction of gallstone malignancy risks through analysis of associated disease factors [@hc_usecase22_gallstone_malignancy_tabpfn]. [Link](https://www.mdpi.com/2077-0383/14/17/6091)

74. TabPFN was used in classifying tuberculosis treatment outcomes based on clinical and sociodemographic data from national registries [@hc_usecase23_tb_outcomes_tabpfn]. [Link](https://www.researchsquare.com/article/rs-7502054/v1)

75. TabPFN contributed to early prediction of gestational diabetes using cell-free DNA and genetic scores from early pregnancy blood samples [@hc_usecase24_gdm_cfdna_tabpfn]. [Link](https://www.medrxiv.org/content/10.1101/2025.09.03.25334985v1)

76. TabPFN was used for predicting schizophrenia based on sense of agency features, emphasizing interpretability [@hc_usecase25_schizophrenia_soa_tabpfn]. [Link](https://www.sciencedirect.com/science/article/abs/pii/S187620182500317X)

77. TabPFN was integrated into a physiologically based pharmacokinetic model for predicting dissolution and absorption of amorphous solid dispersions in drug development [@hc_usecase26_pbpk_asd_tabpfn]. [Link](https://doi.org/10.1016/j.jconrel.2025.114123)

78. TabPFN enabled classification of respiratory diseases from sound data, addressing clinical spectrum diversity [@hc_usecase27_respiratory_sounds_tabpfn]. [Link](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5529540)

79. TabPFN was applied to small-data tabular learning in drug discovery, handling data scarcity and distribution shifts [@hc_usecase28_drug_discovery_small_data_tabpfn]. [Link](https://chemrxiv.org/engage/chemrxiv/article-details/68d29b1cf2aff1677025b18f)

80. TabPFN facilitated prediction of coronary heart disease risk in patients with cardiovascular-kidney-metabolic syndrome, optimizing evaluation in small samples [@hc_usecase29_ckm_chd_tabpfn]. [Link](https://pmc.ncbi.nlm.nih.gov/articles/PMC12437168/)

81. TabPFN was used to predict success of allogeneic stem cell mobilization in donors, aiding transplant therapies [@hc_usecase30_stem_cell_mobilization_tabpfn]. [Link](https://www.biorxiv.org/content/10.1101/2025.09.17.676674v1.full)

82. TabPFN contributed to predicting manual strength using anthropometric data, focusing on accuracy and interpretability [@hc_usecase31]. [Link](https://pubmed.ncbi.nlm.nih.gov/41021732/)

83. TabPFN supported uncertainty-guided model selection for biomolecule efficacy prediction, enhancing ensemble optimization in drug discovery, as studied at GSK [@hc_usecase32]. [Link](https://www.arxiv.org/abs/2510.02476)

84. TabPFN was utilized in a multitask deep learning framework for optimizing in vitro fertilization decisions, including embryo transfer and pregnancy prediction [@hc_usecase33]. [Link](https://dspace.mit.edu/handle/1721.1/162969)

85. TabPFN enabled a framework for early Long COVID detection through causal gene identification and interpretability [@hc_usecase34]. [Link](https://www.medrxiv.org/content/10.1101/2025.10.02.25337138v1.full.pdf)

86. TabPFN was used for neoadjuvant therapy recommendations in breast cancer, integrating multi-omics data [@hc_usecase35]. [Link](https://www.medrxiv.org/content/10.1101/2025.10.03.25337255v1)

87. TabPFN facilitated prediction of recurrence and progression in oral potentially malignant disorder patients post-surgery [@hc_usecase36]. [Link](https://journals.lww.com/international-journal-of-surgery/abstract/9900/artificial_intelligence_for_predicting.3354.aspx)

88. TabPFN supported prediction of occult lymph node metastasis in non-small cell lung cancer patients treated with stereotactic ablative radiotherapy [@hc_usecase37]. [Link](https://www.redjournal.org/article/S0360-3016(25)05890-0/fulltext)

89. TabPFN was used in stroke diagnosis, addressing dataset imbalance and model interpretability for clinical decisions [@hc_usecase38]. [Link](https://www.ijsab.com/jsr-volume-9-issue-1/8205)

90. TabPFN was used to predict diabetes-related hypo- and hyperglycemia during hemodialysis using continuous glucose monitoring data, facilitating improved patient management [@hc_usecase40]. [Link](https://www.medrxiv.org/content/10.1101/2025.10.24.25338707v1)

91. TabPFN was applied to enhance diagnosis of hypervascular thyroid nodules using multimodal ultrasound features [@hc_usecase52_thyroid_hypervascular_multimodal]. [Link](https://pmc.ncbi.nlm.nih.gov/articles/PMC12432950/)

92. TabPFN was integrated with radiomics and clinical features to predict endovascular treatment success in femoropopliteal chronic total occlusions, supporting interventional planning [@hc_usecase53_fp_cto_radiomics]. [Link](https://www.researchgate.net/publication/396892115_Radiomics_enhance_the_prediction_of_endovascular_treatment_success_for_femoropopliteal_chronic_total_occlusions_a_proof-of-concept_study)

93. TabPFN was applied to CorvisST biomechanical indices to classify corneal disorders, improving diagnostic accuracy in ophthalmology [@hc_usecase41_corvisst_corneal]. [Link](https://pubmed.ncbi.nlm.nih.gov/41130662/)

94. TabPFN was incorporated into a non-invasive sleep staging framework using respiratory sound features, advancing passive sleep monitoring [@hc_usecase42_sleepstage_resp_sounds]. [Link](https://www.mdpi.com/1424-8220/25/20/6282)

95. TabPFN supported prediction of vancomycin blood concentrations to optimize antimicrobial dosing strategies in clinical practice [@hc_usecase43_vancomycin_mimic4]. [Link](https://journal.china-pharmacy.com/en/article/doi/10.6039/j.issn.1001-0408.2025.19.16/)

96. TabPFN was used to predict negative self-rated oral health in adults, identifying risk factors for targeted public-health interventions [@hc_usecase44_sroh_jdent]. [Link](https://www.sciencedirect.com/science/article/pii/S0300571225006104)

97. TabPFN was extended to very high-dimensional feature spaces to enable robust analysis of biomedical data, improving stability and interpretability in clinical applications [@hc_usecase45_tabpfn_wide]. [Link](https://arxiv.org/abs/2510.06162)

98. TabPFN predicted gastrointestinal bleeding risk in pediatric Henoch--Schönlein purpura patients, supporting early clinical intervention [@hc_usecase50_gibleed_hsp]. [Link](https://www.frontiersin.org/journals/physiology/articles/10.3389/fphys.2025.1630807/full)

Financial Services, Banking, and Insurance {#financial-services-banking-and-insurance .unnumbered}
==========================================

We collected 7 published TabPFN use cases in this area. These applications include risk modeling, actuarial analysis, credit-related prediction, and customer analytics.

1.  TabPFN improves low-supervision transaction analytics by doubling zero-shot Matthews correlation coefficient (MCC) on churn prediction and enhancing few-shot MCC, enabling better knowledge-grounded reasoning in financial transaction analysis [@sakhno2026financial]. [Link](http://arxiv.org/abs/2603.15459)

2.  TabPFN serves as a strong tabular baseline for financial transaction analytics (e.g., churn prediction) [@li2025classimbalancedaware]. [Link](http://arxiv.org/abs/2501.10677v2)

3.  TabPFN was employed as a core modeling component for learning from multimodal tabular data under strict temporal constraints, enabling strong discriminative performance, improved probability calibration, and effective causal forecasting in early rug-pull detection [@shoaei2026lroo]. [Link](http://arxiv.org/abs/2603.11324v1)

4.  TabPFN was used to predict forward financial returns, aiding investment strategy evaluation via the adjusted Sharpe ratio and enhancing financial forecasting accuracy [@githubGitHubZx20030501sp500marketpredictiontabpfn]. [Link](https://github.com/zx20030501/sp500-market-prediction-tabpfn)

5.  TabPFN was fine-tuned into a domain-specific model (FinPFN) for regime-aware stock return prediction, improving performance in non-stationary financial markets by adapting to evolving feature--return relationships [@wang2025metalearning]. [Link](https://www.sciencedirect.com/science/article/abs/pii/S1386418125000825)

6.  TabPFN was benchmarked against leading AutoML frameworks on financial classification tasks, demonstrating strong performance in multiclass settings [@leyh2025automlfinance]. [Link](https://aisel.aisnet.org/acis2025/28/)

7.  TabPFN facilitated cross-selling of health insurance products through deep learning analysis of customer data [@fin_usecase2_crosssell_health_insurance]. [Link](https://ieeexplore.ieee.org/abstract/document/10475046)

Energy and Utilities {#energy-and-utilities .unnumbered}
====================

We collected 24 published TabPFN use cases in this area. They include environmental forecasting, renewable-energy prediction, and process or asset optimization across energy and utility systems.

1.  TabPFN was used as a surrogate model for fast one-step predictions under irregular measurements, aiding the delay-aware digital twin framework in handling nonlinear dynamics and operational delays in biogas production control [@wang2026aidriven]. [Link](https://doi.org/10.1016/j.compchemeng.2026.109637)

2.  TabPFN provided superior fitting performance for models analyzing biochar's impact on soil cadmium contamination, improving prediction accuracy in artificial and natural aging scenarios [@meng2026achieve]. [Link](https://www.sciencedirect.com/science/article/pii/S0016706126001205)

3.  TabPFN was used to improve the robustness and accuracy of photovoltaic power forecasting models by providing unified in-context prediction and strong generalization with heterogeneous inputs [@qiao2026cloudedgecollaborativelargemodels]. [Link](https://arxiv.org/pdf/2603.22343)

4.  TabPFN enables effective learning and prediction with very limited data by leveraging pretrained tabular inference, improving model performance in challenging geological prediction tasks [@wang2026predicting]. [Link](https://link.springer.com/article/10.1007/s00603-026-05420-3)

5.  TabPFN was used as a baseline for comparison in spatiotemporal forecasting of small Earth data, demonstrating value despite being surpassed in accuracy and robustness by the proposed method [@yang2025simplerobustforecastingspatiotemporally]. [Link](http://arxiv.org/abs/2510.08920v1)

6.  TabPFN demonstrated superior predictive performance under sparse sampling conditions, enabling accurate high-resolution mapping of groundwater bicarbonate concentrations and evaluation of scaling risks [@doiLeveragingTabPFN]. [Link](https://doi.org/10.6084/m9.figshare.31646935.v1)

7.  TabPFN was used for slope stability assessment, providing superior accuracy and robustness with limited sample sizes and enhancing regional scale evaluation efficiency [@li2026tabpfnbased]. [Link](https://doi.org/10.1016/j.rockmb.2026.100326)

8.  TabPFN surpasses other models in solar energy meteorology [@liu2026evaluating]. [Link](https://doi.org/10.1016/j.solener.2026.114472)

9.  The TabPFN regressor was used as a predictive model for evaluating the trophic level index from multi-source remote sensing data within the study's modeling framework [@si2025resolving]. [Link](https://doi.org/10.1038/s41545-025-00525-8)

10. TabPFN-based data augmentation improved model robustness under limited data, enabling accurate predictions of electrochemical performance and efficient screening of hard carbon candidates [@chen2025dataaugmentedmachinelearningpredicting]. [Link](http://arxiv.org/abs/2510.12833v1)

11. TabPFN was employed to predict river algal blooms through multi-classification of chlorophyll-a concentrations, aiding water management [@energy_usecase1_river_algal_tabpfn]. [Link](https://www.earticle.net/Article/A456244)

12. TabPFN facilitated wildfire propagation prediction in Canadian conifer forests, classifying fire types for environmental risk assessment [@energy_usecase2_wildfire_automl]. [Link](https://www.sciencedirect.com/science/article/pii/S157495412400253X)

13. TabPFN was integrated into a machine learning framework for optimizing energy consumption at wastewater treatment plants [@energy_usecase3_wwtp_tabpfnreg]. [Link](https://www.researchgate.net/publication/390516459_Machine_learning_framework_for_energy_consumption_optimization_using_the_TabPFNRegressor_algorithm)

14. TabPFN supported rainfall forecast post-processing using historical error patterns from environmental data [@energy_usecase4_rainfall_tabpfn]. [Link](https://github.com/aarxshi/rainfall_tabpfn)

15. TabPFN enabled solar forecast error adjustment, particularly during rapid weather changes, as developed by Open Climate Fix [@energy_usecase5_solar_adjuster_ocf]. [Link](https://gist.github.com/anshulg954/5f4423ee6b3d3151fa8d0d7fcd98d3eb)

16. TabPFN was applied to predict ash fusibility in high-alkali coal for improved energy production [@energy_usecase6_ash_fusibility_high_alkali]. [Link](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5406504)

17. TabPFN contributed to predicting Henry coefficients for alkanes in zeolites, aiding hydroisomerization in sustainable fuel production [@energy_usecase7_henry_zeolites]. [Link](https://pubs.acs.org/doi/full/10.1021/acs.jpcc.5c03868)

18. TabPFN facilitated shape-selectivity modeling in zeolites for long-chain alkane hydroisomerization, optimizing catalyst design [@energy_usecase8_shape_selectivity_zeolites]. [Link](https://doi.org/10.4233/uuid:f36da034-5cb3-42ca-a53d-d351f68a9ffa)

19. TabPFN was used in an integrated framework for estimated ultimate recovery prediction and fracturing optimization in shale gas reservoirs [@energy_usecase9_shale_eur_fracturing]. [Link](https://www.researchgate.net/publication/395761327_Coupling_EUR_Prediction_with_Fracturing_Optimization_An_Integrated_Machine_Learning_Framework_for_Shale_Gas_Development)

20. TabPFN supported core data augmentation for enhanced reservoir parameter prediction in oil and gas exploration [@energy_usecase10_core_augmentation_reservoir]. [Link](https://www.researchgate.net/publication/395434405_Enhancing_Reservoir_Parameter_Prediction_Workflows_via_Advanced_Core_Data_Augmentation)

21. TabPFN was employed to optimize energy performance in multistage centrifugal pumps through entropy generation analysis [@energy_usecase11_multistage_pump_tabpfn]. [Link](https://www.sciencedirect.com/science/article/abs/pii/S0360544225040411)

22. TabPFN was applied to generate advanced global heat flow maps at 0.2° resolution, integrating high-resolution geophysical data to improve geothermal resource modeling [@energy_usecase13_global_heatflow_02]. [Link](https://www.researchgate.net/publication/396728153_The_First_02_Resolution_Global_Continental_Heat_Flow_Map_Advancing_Fine-Scale_Geothermal_Modeling)

23. TabPFN contributed to FuelCast, standardizing benchmarks for ship fuel consumption prediction and improving efficiency in maritime operations [@energy_usecase14_fuelcast]. [Link](https://arxiv.org/abs/2510.08217)

24. TabPFN was used as the main supervised classifier to automatically identify thunderstorm ground enhancements from particle detector and environmental measurements [@energy_usecase15_tge_tabpfn]. [Link](https://arxiv.org/abs/2510.25125)
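
To make the recurring classification pattern concrete, the following minimal sketch illustrates the setup of item 11: a continuous water-quality indicator is binned into severity classes and TabPFN predicts class probabilities in a single forward pass. The open-source `tabpfn` package and the synthetic stand-in data are assumptions for illustration; this is not the cited authors' pipeline.

```python
# Minimal sketch of the pattern in item 11 (assumptions: the open-source
# `tabpfn` package; synthetic stand-in features instead of real monitoring data).
import numpy as np
from tabpfn import TabPFNClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 8))                         # e.g., hydro-meteorological features
chl_a = np.exp(X[:, 0] + 0.5 * rng.normal(size=400))  # toy chlorophyll-a values
y = np.digitize(chl_a, bins=[0.5, 1.5, 4.0])          # bin into 4 bloom-severity classes

clf = TabPFNClassifier()                              # pretrained; no task-specific tuning
clf.fit(X[:300], y[:300])
proba = clf.predict_proba(X[300:])                    # class probabilities per test row
print(proba[:2].round(3))
```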

Industrial and Manufacturing {#industrial-and-manufacturing .unnumbered}
============================

We collected 41 published TabPFN use cases in this area. These applications cover industrial prediction, process optimization, and engineering-related modeling tasks.

1.  TabPFN served as a high-fidelity surrogate model for optimizing geopolymer concrete mix design, achieving superior accuracy, generalization, and low-uncertainty predictions compared to other ML approaches [@sichani2025machine] (a minimal surrogate-modeling sketch follows this list). [Link](https://www.nature.com/articles/s41598-025-29088-x)

2.  TabPFN enables rapid prediction of structural crack behavior, supporting reliability assessment and failure analysis in ultra-high-performance concrete [@mahmoodzadeh2025machine]. [Link](https://www.nature.com/articles/s41598-025-23610-x)

3.  TabPFN leveraged prior-data pretraining to predict the height of the water-conducting fracture zone (WCFZ) from only 76 field samples without extensive tuning, providing superior and generalizable performance compared to other ML models [@wang2026highfidelity]. [Link](https://doi.org/10.1088/2631-8695/ae586d)

4.  TabPFN's multitask-aware prior adaptation improves predictive accuracy and computational efficiency in steel property prediction, enabling scalable, rapid, and reliable deployment for industrial quality control and process optimization [@sinodinos2026multitaskinformedpriorincontextlearning]. [Link](http://arxiv.org/abs/2603.22738v1)

5.  TabPFN's pre-trained foundation model enables strong small-data regression and well-calibrated uncertainty estimates in a single forward pass, significantly reducing evaluation cycles for active learning in materials discovery [@hu2026foundationmodelsurrogatesenabledataefficient]. [Link](http://arxiv.org/abs/2603.12567v3)

6.  TabPFN demonstrated strong generalization ability in predicting crash severity, contributing to improved data-driven safety interventions in electric vehicle crash contexts [@Somvanshi_2025]. [Link](http://arxiv.org/abs/2509.11449v1)

7.  TabPFN excelled in zero-shot inference and robustness for rare crash categories, enhancing classification of uncommon SAE automation levels with limited data [@somvanshi2025applyingmambaattentiontabpfntabtransformers]. [Link](http://arxiv.org/abs/2506.03160v1)

8.  TabPFN 2.5's dataset-level embedding identified 'engineering-like' synthetic datasets to enable continued pre-training on synthetic tasks, significantly improving accuracy and data efficiency over baseline models and AutoGluon on engineering regression datasets [@regenwetter2026engineeringregressionrealdatatraining]. [Link](http://arxiv.org/abs/2603.04692v1)

9.  TabPFN achieved the highest prediction accuracy in predicting concrete fracture properties and, combined with SHAP analysis, provided detailed and unbiased insights into nonlinear and interaction effects [@nikzad2026from]. [Link](https://doi.org/10.1016/j.mlwa.2026.100877)

10. TabPFN significantly reduces computational overhead and data requirements while enabling rapid, flexible, and data-efficient engineering design with competitive diversity and low performance error in generated designs [@wang2026tabpfnzeroshotparametricengineering]. [Link](http://arxiv.org/abs/2602.02735v1)

11. TabPFN served as a backbone combined with graph neural network embeddings and MagpieEX descriptors for effective, data-efficient, and physics-aware materials property prediction, outperforming more sophisticated models [@li2025contextlearningfoundationmodels]. [Link](http://arxiv.org/abs/2601.00133v1)

12. TabPFN was used for spatial predictions and imputations in geotechnical modeling, achieving superior accuracy, faster inference, and well-calibrated predictive distributions compared to hierarchical Bayesian baselines [@Saito_2026]. [Link](http://arxiv.org/abs/2509.03191v1)

13. TabPFN outperformed alternative models, enabling more accurate performance prediction of biochar-modified concrete [@k2026advanced]. [Link](https://www.e3s-conferences.org/articles/e3sconf/pdf/2026/20/e3sconf_isdcp2026_01008.pdf)

14. TabPFN was used for accurate and reliable monitoring of driver alertness levels in challenging driving environments, proving more effective than traditional models like logistic regression and XGBoost [@liu2025prediction]. [Link](https://doi.org/10.1080/15389588.2025.2577155)

15. TabPFN enabled highly accurate and unbiased prediction of the elastic modulus of recycled aggregate concrete (RAC), improving trustworthiness and interpretability in a challenging heterogeneous materials domain [@lu2025more]. [Link](https://doi.org/10.3390/ma18225221)

16. TabPFN provided meta-learned prior knowledge that enhanced predictive performance and uncertainty quantification in the PSF-Net model for reliable 5G RF-EMF exposure assessment [@zhang2025psfnet]. [Link](https://doi.org/10.4271/2025-99-0127)

17. TabPFN showed superior performance in predicting the Hardgrove grindability index, improving model accuracy [@zhu2026demystifying]. [Link](https://www.sciencedirect.com/science/article/pii/S0016236126010513)

18. TabPFN delivered the best overall performance with the lowest error metrics and highest R^2^ and composite score, demonstrating superior predictive capability for asphalt concrete strength [@xing2026interpretable]. [Link](https://doi.org/10.20944/preprints202603.2259.v1)

19. TabPFN was applied to efficient multi-objective optimization of non-linear mixture designs, improving strength, reducing costs, and lowering carbon emissions for sustainable mining applications [@wang2026cleaner]. [Link](https://www.sciencedirect.com/science/article/pii/S095965262600658X)

20. TabPFN was employed for highly accurate and statistically superior predictions of pavement roughness by capturing complex interactions among traffic loads, structural parameters, and climatic factors [@qin2026interpretable]. [Link](https://doi.org/10.3390/buildings16071358)

21. TabPFN enables accurate prediction of cemented paste backfill (CPB) strength with limited data, improving efficiency and supporting theoretical understanding and practical application in mining industry tailings management [@zhang2025strength]. [Link](https://doi.org/10.1016/j.rineng.2025.108269)

22. An improved spatiotemporal architecture built on TabPFN enhances robustness and accuracy in geological condition detection, enabling better multi-step predictions with uncertainty quantification in tunnel construction [@zhang2026datadriven]. [Link](https://www.sciencedirect.com/science/article/pii/S1474034626003071)

23. TabPFN was utilized as a core component in a multi-objective optimization framework to design cemented foam backfill optimizing high strength, low cost, and low carbon emissions [@wang2026cleaner]. [Link](https://doi.org/10.1016/j.jclepro.2026.148119)

24. TabPFN enhances prediction accuracy and reliability with small sample sizes and missing features in geotechnical engineering [@wang2026predicting]. [Link](https://doi.org/10.1007/s00603-026-05420-3)

25. TabPFN enabled interpretable and uncertainty-aware parameter inference, improving predictions and revealing geotechnical relationships without model retraining for data-scarce applications [@saito2026tabpfnextensionsinterpretablegeotechnical]. [Link](http://arxiv.org/abs/2603.21033v1)

26. TabPFN was used to accurately predict compressive strength in geopolymer concrete from small datasets, supporting optimization of material composition and process parameters in construction material science [@stelmakh2025compressive]. [Link](https://doi.org/10.3390/a18120744)

27. TabPFN was used to improve prediction accuracy in concrete property estimation by integrating knowledge-constrained data augmentation [@deng2026enhancing]. [Link](https://doi.org/10.1016/j.asoc.2026.115037)

28. TabPFN enabled efficient and accurate mapping of key leaf-vein texture parameters to lubrication performance metrics, facilitating multi-objective optimization to identify optimal texture designs that improve journal bearing performance [@yin2026surrogateassisted]. [Link](https://doi.org/10.1016/j.triboint.2026.111936)

29. TabPFN enables robust mapping between operating boundary conditions and latent features to manage data scarcity and enhance regression accuracy, resulting in faster and more accurate temperature field reconstruction [@mao2026datadriven]. [Link](https://doi.org/10.3390/app16042029)

30. TabPFN enables encoding of structured device-physics primitives for reliable and precise analog circuit optimization, outperforming Gaussian-process methods in sample efficiency and final metric quality [@liu2025exploitingfunctionfamilystructureanalog]. [Link](http://arxiv.org/abs/2512.00712v1)

31. TabPFN enabled early fault classification in rotating machinery, addressing data scarcity in industrial scenarios [@manuf_usecase1_rotating_faults_tabpfn]. [Link](https://ieeexplore.ieee.org/abstract/document/10318062)

32. TabPFN facilitated microcontroller performance prediction, aiding semiconductor screening with minimal supervision, as studied at Infineon Technologies [@manuf_usecase2_mcu_performance_tabpfn]. [Link](https://iris.polito.it/handle/11583/3002056)

33. TabPFN was applied to caisson inclination prediction in ultra-deep construction, in combination with data denoising techniques [@manuf_usecase3_caisson_inclination_ml]. [Link](https://www.sciencedirect.com/science/article/abs/pii/S2214391225001734)

34. TabPFN supported event classification in phase-sensitive optical time-domain reflectometry systems for distributed fiber sensing [@manuf_usecase4_photdr_event_classification]. [Link](https://opg.optica.org/oe/fulltext.cfm?uri=oe-33-17-36646&id=575783)

35. TabPFN was integrated into an adaptive ensemble for intrusion detection in Industrial Internet of Things networks [@ruizvillafranca2024tabpfnbased]. [Link](https://rdcu.be/eASzJ)

36. TabPFN was paired with a random forest in a framework for attack recognition in Internet of Things networks, improving interpretability [@manuf_usecase6_rf_tabpfn_iot_attack]. [Link](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11142329)

37. TabPFN was used in cryogenic-assisted abrasive waterjet machining to improve surface integrity in titanium alloys [@manuf_usecase8_cryo_awj_ti64]. [Link](https://www.sciencedirect.com/science/article/abs/pii/S2214993725004531)

38. TabPFN supported in-context learning for thermal behavior prediction in nano-phase change materials for battery systems [@manuf_usecase9_nano_pcm_thermal_icl]. [Link](https://www.sciencedirect.com/science/article/pii/S036054422504335X)

39. TabPFN was applied to explainable strength evaluation in multicomponent concrete mixtures [@manuf_usecase10_multicomponent_concrete]. [Link](https://www.mdpi.com/1996-1944/18/19/4456)

40. TabPFN was integrated into a multimodal fusion framework linking microstructure to friction behavior in martensitic stainless steel, improving wear resistance in materials engineering applications [@manuf_usecase11_martensitic_friction_multimodal]. [Link](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5346149)

41. TabPFN supported multiscale modeling to predict soil salinity in arid farmland, advancing sustainable agricultural management in regions such as Xinjiang [@manuf_usecase12_soil_salinity_multiscale]. [Link](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5591702)
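
Many of the industrial entries above share the surrogate-modeling recipe of items 1 and 5: fit TabPFN on a small design-of-experiments table, then query it for point predictions and uncertainty when screening candidates. The sketch below illustrates this under stated assumptions (the open-source `tabpfn` package; toy data standing in for a real mix-design matrix); the quantile keywords follow recent `tabpfn` releases and should likewise be treated as an assumption.

```python
# Minimal sketch of the surrogate pattern in items 1 and 5 (assumptions: the
# open-source `tabpfn` package; toy data instead of a real experimental table).
import numpy as np
from tabpfn import TabPFNRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(76, 6))                  # 76 samples, 6 mix/process variables
y = X @ rng.normal(size=6) + 0.1 * rng.normal(size=76)

reg = TabPFNRegressor()                        # pretrained; no task-specific tuning
reg.fit(X[:60], y[:60])
point = reg.predict(X[60:])                    # point predictions for candidates

# Quantile output for uncertainty-aware screening (exact keyword arguments
# assumed from recent `tabpfn` releases).
lo, hi = reg.predict(X[60:], output_type="quantiles", quantiles=[0.1, 0.9])
print(point[:3].round(3), (hi - lo)[:3].round(3))
```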

Other Industries {#other-industries .unnumbered}
================

We collected 32 further published TabPFN use cases spanning a heterogeneous set of domains and prediction tasks.

1.  TabPFN enables the construction of credal sets for model classes where this was previously infeasible, broadening uncertainty representation and improving uncertainty estimation [@hofman2026efficientcredalpredictiondecalibration]. [Link](http://arxiv.org/abs/2603.08495v1)

2.  TabPFN enables efficient and valid hypothesis testing for feature relevance in tabular data, allowing accurate statistical inference in nonlinear and correlated settings [@salem2026validfeaturelevelinferencetabular]. [Link](http://arxiv.org/abs/2603.06609v1)

3.  TabPFN enables efficient computation of conditional Shapley values, resulting in faster and often more accurate explainable AI analysis [@olsen2026computingconditionalshapleyvalues]. [Link](http://arxiv.org/abs/2602.09489v1)

4.  TabPFN enables effective node classification by leveraging engineered tabular features from graph data as a practical and competitive alternative to graph-specific and language-based foundation models [@choi2025tabpfncompetegnnsnode]. [Link](http://arxiv.org/abs/2512.08798v1)

5.  TabPFN was integrated as the surrogate model enabling accurate and efficient prediction with uncertainty estimation, enhancing the performance, scalability, and zero-shot transfer capability of the DB-SAEA framework [@du2025metablackboxoptimizationbispacelandscape]. [Link](http://arxiv.org/abs/2511.15551v1)

6.  TabPFN was used to model the relationship between nuclear structure properties and $\alpha$-particle preformation factors, improving $\alpha$-decay half-life predictions and enabling insights into nuclear shell effects and magic numbers [@qi2026systematicstudyalphaparticlepreformation]. [Link](http://arxiv.org/abs/2511.14705v1)

7.  TabPFN served as the foundation for TabMGP, enabling state-of-the-art predictive capabilities with effective epistemic uncertainty quantification and improved posterior inference in tabular data contexts [@ng2026tabmgpmartingaleposteriortabpfn]. [Link](http://arxiv.org/abs/2510.25154v2)

8.  TabPFN demonstrated superior utility for real-world operational yield forecasting due to faster tuning and reduced feature engineering requirements [@sabo2025rowsyieldsfoundationmodels]. [Link](http://arxiv.org/abs/2506.19046v1)

9.  TabPFN serves as the base learner in a multi-stage ensemble to model recognition probabilities of rural villages, enabling identification of high-potential but under-observed candidates in geospatial, highly imbalanced datasets [@jiang2026mitigating]. [Link](https://www.mdpi.com/2073-445X/15/4/535)

10. TabPFN was used as a base learner in a stacking ensemble model, improving prediction accuracy and performance for soil salinity retrieval from multispectral imagery data [@hu2026coastal] (a minimal sketch of this stacking pattern follows the list). [Link](https://doi.org/10.3390/rs18050671)

11. TabPFN serves as the foundational model for ExplainerPFN, enabling zero-shot estimation of Shapley values for feature importance without access to the predictive model or reference explanations [@fonseca2026explainerpfntabularfoundationmodels]. [Link](http://arxiv.org/abs/2601.23068v1)

12. TabPFN enables accurate classification of Near-Earth Objects as Potentially Hazardous, facilitating early identification and monitoring of potential asteroid threats [@githubGitHubAvuiiAsteroidSafe]. [Link](https://github.com/Avuii/AsteroidSafe)

13. TabPFN improves malware detection performance in limited data scenarios by outperforming traditional ensemble models, enhancing cybersecurity workflows [@leroy2026memorybasedmalwaredetectionlimited]. [Link](http://arxiv.org/abs/2601.07305v1)

14. TabPFN achieved the best performance in predicting mycotoxin contamination, outperforming baseline and transfer learning models to enhance prediction accuracy for early interventions [@inglis2025predictingmycotoxincontaminationirish]. [Link](http://arxiv.org/abs/2512.22243v1)

15. TabPFN was used in a classification pipeline whose latent space provided a 2D representation of the blazar population, revealing a continuum between blazar types [@oukacha2025unifiedschemeblazarevolution]. [Link](http://arxiv.org/abs/2507.03088v2)

16. TabPFN enhances accuracy and efficiency in predicting grapevine diseases by processing complex environmental data and providing per-pixel disease probabilities for precise vineyard disease management [@zhao2024grapevinediseasepredictionusing]. [Link](http://arxiv.org/abs/2406.07094v1)

17. TabPFN enhances synthetic tabular data generation by providing probabilistic modeling capabilities that improve data quality, realism, and utility [@githubGitHubSebhaanTabPFGen]. [Link](https://github.com/sebhaan/TabPFGen)

18. TabPFN was modified for microbiome data classification in metagenomics, matching species abundance patterns with synthetic priors [@other_usecase1_microbiome_zero_inflated]. [Link](https://openreview.net/forum?id=3I0bVvUj25)

19. TabPFN enabled lunar regolith analysis for classifying meteorite compositions from spectral data [@other_usecase2_lunar_meteorites]. [Link](https://www.sciencedirect.com/science/article/pii/S2095268624001010)

20. TabPFN facilitated winter wheat yield forecasting in agricultural regions by integrating climate and remote sensing data [@other_usecase3_winter_wheat_yield_ssrn]. [Link](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5380177)

21. TabPFN was applied to assess the impact of flooding on housing prices across geographic areas [@other_usecase4_flood_housing_prices_ml_climate]. [Link](https://github.com/melina-thegarza/ml-climate/blob/main/doc/ML_Climate___Final.pdf)

22. TabPFN showed the strongest performance on 31 predictive soil modeling datasets containing 30 to 460 samples [@other_usecase5_soil_mapping_new_default]. [Link](https://arxiv.org/abs/2508.09888)

23. TabPFN was applied to shallow natural gas hazard prediction in tunnel construction [@other_usecase6_shallow_gas_tunnel_tabpfn]. [Link](https://www.sciencedirect.com/science/article/pii/S2590123025029366)

24. TabPFN supported automated feature engineering for energy consumption forecasting in domain-specific applications [@other_usecase7_autoenergy_feature_eng]. [Link](https://www.sciencedirect.com/science/article/pii/S0950705125013413)

25. TabPFN enabled Australian rice phenology prediction using remote sensing and weather data for crop management [@other_usecase8_rice_phenology_tabpfn]. [Link](https://www.mdpi.com/2072-4292/17/17/3050)

26. TabPFN was applied to a multi-stage framework for predicting fuel blend properties through automated feature engineering [@other_usecase9_fuel_blend_framework]. [Link](https://chemrxiv.org/engage/chemrxiv/article-details/68dc888d3e708a7649ff0ec9)

27. TabPFN enabled kriging prior regression for incorporating spatial context in soil mapping predictions [@other_usecase10_kriging_prior_regression]. [Link](https://arxiv.org/abs/2509.09408)

28. TabPFN enhanced clone-type recognition across programming languages through metrics-driven analysis, improving stability and interpretability in software engineering [@other_usecase12_clone_type]. [Link](https://wiley.authorea.com/users/980519/articles/1346750-metrics-first-language-aware-clone-type-recognition-auditable-signals-across-c-c-java-and-python)

29. TabPFN informed the development of TabImpute, enabling efficient zero-shot imputation for missing tabular data and improving preprocessing pipelines [@other_usecase14_tabimpute]. [Link](https://www.arxiv.org/abs/2510.02625)

30. TabPFN, alongside TabICL and related foundation models, was evaluated for intrusion detection, improving cybersecurity performance in IoT networks [@other_usecase16_cyber_fm_tabpfn_tabicl]. [Link](https://www.mdpi.com/2079-9292/14/19/3792)

31. TabPFN was used in forensic science to advance biogeographical ancestry predictions [@Heinzel2025]. [Link](https://www.sciencedirect.com/science/article/pii/S1872497325000705)

32. TabPFN was used as a benchmark model for predicting avocado alternate bearing from Sentinel-2 and climate features [@other_usecase21_avocado_alt_bearing]. [Link](https://www.preprints.org/manuscript/202510.2413)
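
Items 9 and 10 use TabPFN as a base learner inside a stacking ensemble. The sketch below shows one way to express that pattern with scikit-learn; the `tabpfn` package's scikit-learn-compatible `TabPFNClassifier` and the synthetic data are assumptions, not the cited authors' exact setup.

```python
# Minimal stacking sketch for the pattern in items 9-10 (assumptions: the
# open-source `tabpfn` package with a scikit-learn-compatible classifier).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier

X, y = make_classification(n_samples=500, n_features=12, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# TabPFN sits alongside a classical model; the meta-learner combines their
# out-of-fold predictions.
stack = StackingClassifier(
    estimators=[
        ("tabpfn", TabPFNClassifier()),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_train, y_train)
print("held-out accuracy:", stack.score(X_test, y_test))
```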

[^1]: We use 128 inducing points, much smaller than the dataset sizes of interest, which often exceed 100,000 rows.

[^2]: <https://github.com/PriorLabs/tabpfn-extensions/tree/main/src/tabpfn_extensions/many_class>

[^3]: The reported results were produced with early checkpoints that did not undergo the full training pipeline and that handle binary and multiclass classification separately. They can be identified on HuggingFace by the `20260417_<TASK_TYPE>` suffix.

[^4]: Google Scholar entry and pepy.tech `tabpfn` download statistics, both accessed May 8, 2026.

[^5]: <https://github.com/PriorLabs/tabpfn-extensions>

[^6]: <https://aws.amazon.com/marketplace/pp/prodview-chfhncrdzlb3s>

[^7]: <https://ai.azure.com/catalog/models/TabPFN-2.5>

[^8]: <https://github.com/databricks-industry-solutions/tabpfn-databricks>

[^9]: The Python client SDK is available on PyPI: <https://github.com/PriorLabs/tabpfn-client>.

[^10]: The TabSTAR paper reports 14 classification tasks, having mistakenly treated *Spotify Genres* as a regression dataset.
