---
abstract: |
  Deep learning architectures for supervised learning on tabular data range from simple multilayer perceptrons (MLP) to sophisticated Transformers and retrieval-augmented methods. This study highlights a major, yet so far overlooked opportunity for designing substantially better MLP-based tabular architectures. Namely, our new model TabM relies on *efficient ensembling*, where one TabM efficiently imitates an ensemble of MLPs and produces multiple predictions per object. Compared to a traditional deep ensemble, in TabM, the underlying implicit MLPs are trained simultaneously, and (by default) share most of their parameters, which results in significantly better performance and efficiency. Using TabM as a new baseline, we perform a large-scale evaluation of tabular DL architectures on public benchmarks in terms of both task performance and efficiency, which renders the landscape of tabular DL in a new light. Generally, we show that MLPs, including `\model`{=latex}, form a line of stronger and more practical models compared to attention- and retrieval-based architectures. In particular, we find that `\model`{=latex} demonstrates the best performance among tabular DL models. Then, we conduct an empirical analysis on the ensemble-like nature of `\model`{=latex}. We observe that the multiple predictions of `\model`{=latex} are weak individually, but powerful collectively. Overall, our work brings an impactful technique to tabular DL and advances the performance-efficiency `\mbox{trade-off}`{=latex} with `\model`{=latex} --- a simple and powerful baseline for researchers and practitioners. The code is available at: [\\repository](\repository).
author:
- |
  Yury Gorishniy [^1]\
  Yandex `\And`{=latex} Akim Kotelnikov\
  HSE University, Yandex `\And`{=latex} Artem Babenko\
  Yandex
bibliography:
- references.bib
title: |
  `\model`{=latex}: Advancing Tabular Deep Learning\
  with Parameter-Efficient Ensembling
---

```{=latex}
\newcommand{\fix}{\marginpar{FIX}}
```
```{=latex}
\newcommand{\new}{\marginpar{NEW}}
```
```{=latex}
\maketitle
```
Introduction
============

Supervised learning on tabular data is a ubiquitous machine learning (ML) scenario in a wide range of industrial applications. Among classic non-deep-learning methods, the state-of-the-art solution for such tasks is gradient-boosted decision trees (GBDT) [@prokhorenkova2018catboost; @chen2016xgboost; @ke2017lightgbm]. Deep learning (DL) models for tabular data, in turn, are reportedly improving, and the most recent works claim to perform on par or even outperform GBDT on academic benchmarks [@hollmann2022tabpfn; @chen2023trompt; @chen2023excelformer; @gorishniy2023tabr].

However, from the practical perspective, it is unclear if tabular DL offers any obvious go-to baselines beyond simple architectures in the spirit of a multilayer perceptron (MLP). *First*, the scale and consistency of performance improvements of new methods w.r.t. simple MLP-like baselines are not always explicitly analyzed in the literature. Thus, one has to infer those statistics from numerous per-dataset performance scores, which makes it hard to reason about the progress. At the same time, due to the extreme diversity of tabular datasets, consistency is an especially valuable and hard-to-achieve property for a hypothetical go-to baseline. *Second*, efficiency-related properties, such as training time, and especially inference throughput, sometimes receive less attention. While methods are usually equally affordable on small-to-medium datasets (e.g. $<$100K objects), their applicability to larger datasets remains uncertain. *Third*, some recent work generally suggests that the progress on academic benchmarks may not transfer that well to real-world tasks [@rubachev2024tabred]. With all the above in mind, in this work, we thoroughly evaluate existing tabular DL methods and find that non-MLP models do not yet offer a convincing replacement for MLPs.

At the same time, we identify a previously overlooked path towards more powerful, reliable, and reasonably efficient tabular DL models. In a nutshell, we find that the parameter-efficient approach to deep ensembling, where most weights are shared between ensemble members, allows one to make simple and strong tabular models out of plain MLPs. For example, MLP coupled with BatchEnsemble [@wen2020batchensemble] --- a long-existing method --- right away outperforms popular attention-based models, such as FT-Transformer [@gorishniy2021revisiting], while being simpler and more efficient. This result alone suggests that efficient ensembling is a low-hanging fruit for tabular DL.

Our work builds on the above observations and offers `\model`{=latex} --- a new powerful and practical model for researchers and practitioners. Drawing an informal parallel with GBDT (an ensemble of decision trees), `\model`{=latex} can also be viewed as a simple base model (MLP) combined with an ensembling-like technique, providing high performance and simple implementation at the same time.

**Main contributions.** We summarize our main contributions as follows:

1.  We present `\model`{=latex} --- a simple DL architecture for supervised learning on tabular data. `\model`{=latex} is based on MLP and parameter-efficient ensembling techniques closely related to BatchEnsemble [@wen2020batchensemble]. In particular, `\model`{=latex} produces **M**ultiple predictions per object. `\model`{=latex} easily competes with GBDT and outperforms prior tabular DL models, while being more efficient than attention- and retrieval-based DL architectures.

2.  We provide a fresh perspective on tabular DL models in a large-scale evaluation along four dimensions: performance ranks, performance score distributions, training time, and inference throughput. One of our findings is that MLPs, including `\model`{=latex}, hit an appealing performance-efficiency tradeoff, which is not the case for attention- and retrieval-based models.

3.  We show that the two key reasons for TabM's high performance are the collective training of the underlying implicit MLPs and the weight sharing. We also show that the multiple predictions of `\model`{=latex} are weak and overfitted individually, while their average is strong and generalizable.

Related work {#sec:related-work}
============

**Decision-tree-based models.** Gradient-boosted decision trees (GBDT) [@chen2016xgboost; @ke2017lightgbm; @prokhorenkova2018catboost] is a strong and efficient baseline for tabular tasks. GBDT is a classic machine learning model, specifically, an ensemble of decision trees. Our model `\model`{=latex} is a deep learning model, specifically, a parameter-efficient ensemble of MLPs.

**Tabular deep learning architectures.** A large number of deep learning architectures for tabular data have been proposed over the recent years. That includes attention-based architectures [@song2019autoint; @gorishniy2021revisiting; @somepalli2021saint; @kossen2021self; @yan2023t2g], retrieval-augmented architectures [@somepalli2021saint; @kossen2021self; @gorishniy2023tabr; @ye2024modern], MLP-like models [@gorishniy2021revisiting; @klambauer2017self; @wang2020dcn2] and others [@arik2020tabnet; @popov2020neural; @chen2023trompt; @marton2024grande; @hollmann2022tabpfn]. Compared to prior work, the key difference of our model `\model`{=latex} is its computation flow, where one `\model`{=latex} imitates an ensemble of MLPs by producing multiple independently trained predictions. Prior attempts to bring ensemble-like elements to tabular DL [@badirli2020gradient; @popov2020neural] were not found promising [@gorishniy2021revisiting]. Also, being a simple feed-forward MLP-based model, `\model`{=latex} is significantly more efficient than some of the prior work. Compared to attention-based models, `\model`{=latex} does not suffer from quadratic computational complexity w.r.t. the dataset dimensions. Compared to retrieval-based models, `\model`{=latex} is easily applicable to large datasets.

**Improving tabular MLP-like models.** Multiple recent studies achieved competitive performance with MLP-like architectures on tabular tasks by applying architectural modifications [@gorishniy2022embeddings], regularizations [@kadra2021well; @jeffares2023tangos; @holzmüller2024better], and custom training techniques [@bahri2021scarf; @rubachev2022revisiting]. Thus, it seems that tabular MLPs have good potential, but one has to deal with overfitting and optimization issues to reveal that potential. Our model `\model`{=latex} achieves high performance with MLP in a different way, namely, by using it as the base backbone in a parameter-efficient ensemble in the spirit of BatchEnsemble [@wen2020batchensemble]. Our approach is orthogonal to the aforementioned training techniques and architectural advances.

**Deep ensembles.** In this paper, by a deep ensemble, we imply multiple DL models of the same architecture trained independently [@jeffares2023joint] for the same task under different random seeds (i.e. with different initializations, training batch sequences, etc.). The prediction of a deep ensemble is the mean prediction of its members. Deep ensembles often significantly outperform single DL models of the same architecture [@fort2020deep] and can excel in other tasks like uncertainty estimation or out-of-distribution detection [@lakshminarayanan2017simple]. It was observed that individual members of deep ensembles can learn to extract diverse information from the input, and the power of deep ensembles depends on this diversity [@allenzhu2023towards]. The main drawback of deep ensembles is the cost and inconvenience of training and using multiple models.

**Parameter-efficient deep \`\`ensembles".** To achieve the performance of deep ensembles at a lower cost, multiple studies proposed architectures that imitate ensembles by producing multiple predictions with one model [@lee2015why; @zhang2020diversified; @wen2020batchensemble; @havasi2020training; @antoran2020depth; @turkoglu2022film]. Such models can be viewed as \`\`ensembles" where the implicit ensemble members share a large amount of their weights. There are also non-architectural approaches to efficient ensembling, e.g. FGE [@garipov2018loss], but we do not explore them, because we are interested specifically in architectural techniques. In this paper, we highlight parameter-efficient ensembling as an impactful paradigm for tabular DL. In particular, we describe two simple variations of BatchEnsemble [@wen2020batchensemble] that are highly effective for tabular MLPs. One variation uses a more efficient parametrization, and another one uses an improved initialization.

`\model`{=latex} {#sec:model}
================

In this section, we present `\model`{=latex} --- a **Tab**ular DL model that makes **M**ultiple predictions.

Preliminaries {#sec:model-preliminaries}
-------------

**Notation.** We consider classification and regression tasks on tabular data. $x$ and $y$ denote the features and a label, respectively, of one object from a given dataset. A machine learning model takes $x$ as input and produces $\hat{y}$ as a prediction of $y$. $N \in \mathbb{N}$ and $d \in \mathbb{N}$ respectively denote the \`\`depth" (e.g. the number of blocks) and \`\`width" (e.g. the size of the latent representation) of a given neural network. $d_y \in \mathbb{N}$ is the output representation size (e.g. $d_y = 1$ for regression tasks, and $d_y$ equals the number of classes for classification tasks).

**Datasets.** Our benchmark consists of `\ndatasets`{=latex} publicly available datasets used in prior work, including @grinsztajn2022why [@gorishniy2023tabr; @rubachev2024tabred]. The main properties of our benchmark are summarized in `\autoref{tab:datasets}`{=latex}, and more details are provided in `\autoref{A:sec:datasets}`{=latex}.

```{=latex}
\centering
```
```{=latex}
\setlength
```
```{=latex}
\tabcolsep{5pt}
```
```{=latex}
\scalebox{0.8}{    \begin{tabular}{ccccccccccccc}
    \toprule
    \multicolumn{1}{c}{\#Datasets} &
    \multicolumn{4}{c}{Train size} &
    \multicolumn{4}{c}{\#Features} &
    \multicolumn{2}{c}{Task type} &
    \multicolumn{2}{c}{Split type}
    \\
    \cmidrule(lr){1-1} \cmidrule(lr){2-5} \cmidrule(lr){6-9} \cmidrule(lr){10-11} \cmidrule(lr){12-13}
    & {\footnotesize Min.} & {\footnotesize Q50}& {\footnotesize Mean} & {\footnotesize Max.} & {\footnotesize Min.} & {\footnotesize Q50}& {\footnotesize Mean} & {\footnotesize Max.} & {\footnotesize \#Regr.} & {\footnotesize \#Classif.} & {\footnotesize Random} & {\footnotesize Domain-aware}
    \\
    \midrule
    \ndatasets & 1.8K & 12K & 76K & 723K & 3 & 20 & 108 & 986 & 28 & 18 & 37 & 9 \\
    \bottomrule
    \end{tabular}}
```
`\label{tab:datasets}`{=latex}

**Domain-aware splits.** We pay extra attention to datasets with what we call \`\`domain-aware" splits, including the eight datasets from the TabReD benchmark [@rubachev2024tabred] and the Microsoft dataset [@microsoft]. For these datasets, their original real-world splits are available, e.g. time-aware splits as in TabReD. Such datasets were shown to be challenging for some methods because they naturally exhibit a certain degree of distribution shift between training and test parts [@rubachev2024tabred]. The random splits of the remaining `\nrandomsplits`{=latex} datasets are inherited from prior work.

**Experiment setup.** We use the setup from [@gorishniy2023tabr], and describe it in detail in `\autoref{A:sec:impl-experiment-setup}`{=latex}. Most importantly, on each dataset, a given model undergoes hyperparameter tuning on the *validation* set, then the tuned model is trained from scratch under multiple random seeds, and the *test* metric averaged over the random seeds becomes the final score of the model on the dataset.

**Metrics.** We use RMSE (the root mean square error) for regression tasks, and accuracy or ROC-AUC for classification tasks depending on the dataset source. See `\autoref{A:sec:impl-metrics}`{=latex} for details.

Also, throughout the paper, we often use the relative performance of models w.r.t. MLP as the key metric. This metric gives a unified perspective on all tasks and allows reasoning about the scale of improvements w.r.t. a simple baseline (MLP). Formally, on a given dataset, the metric is defined as $\left( \frac{\text{score}}{\text{baseline}} - 1\right) \cdot 100\%$, where \`\`score" is the metric of a given model, and \`\`baseline" is the metric of MLP. In this computation, for regression tasks, we convert the raw metrics from RMSE to $R^2$ to better align the scales of classification and regression metrics.
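For illustration, the metric above amounts to a one-line computation (a minimal sketch with hypothetical scores; for regression tasks, the RMSE-to-$R^2$ conversion is assumed to have been applied beforehand):

```python
def relative_performance(score: float, baseline: float) -> float:
    """Relative performance w.r.t. the MLP baseline, in percent.

    Both arguments are task metrics (e.g. accuracy, or R^2 for
    regression after the conversion from RMSE described above).
    """
    return (score / baseline - 1.0) * 100.0

# Hypothetical example: a model with 0.85 accuracy vs. an MLP with 0.80.
print(round(relative_performance(0.85, 0.80), 2))  # 6.25 (% over MLP)
```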

A quick introduction to BatchEnsemble {#sec:model-batchensemble}
--------------------------------------

For a given architecture, let's consider any linear layer $l$ in it: $l(x) = Wx + b$, where $x \in \R^{d_1}$, $W \in \R^{d_2 \times d_1}$, $b \in \R^{d_2}$. To simplify the notation, let $d_1 = d_2 = d$. In a traditional deep ensemble, the $i$-th member has its own set of weights $W_i, b_i$ for this linear layer: `\mbox{$l_i(x_i) = W_ix_i + b_i$}`{=latex}, where $x_i$ is the object representation within the $i$-th member. By contrast, in BatchEnsemble, this linear layer is either (1) fully shared between all members, or (2) mostly shared: `\mbox{$l_i(x_i) = s_i \odot (W(r_i \odot x_i)) + b_i$}`{=latex}, where $\odot$ is the elementwise multiplication, $W \in \R^{d \times d}$ is shared between all members, and $r_i, s_i, b_i \in \R^d$ are *not* shared between the members. This is equivalent to defining the $i$-th weight matrix as `\mbox{$W_i = W \odot (s_ir_i^T)$}`{=latex}. To ensure diversity of the ensemble members, $r_i$ and $s_i$ of all members are initialized randomly with $\pm 1$. All other layers are fully shared between the members of BatchEnsemble.

The described parametrization allows packing all ensemble members in one model that simultaneously takes $k$ objects as input, and applies all $k$ implicit members in parallel, without explicitly materializing each member. This is achieved by replacing one or more linear layers of the original neural network with their BatchEnsemble versions: `\mbox{$l_\text{BE}(X) = ((X \odot R) W) \odot S + B$}`{=latex}, where `\mbox{$X \in \R^{k \times d}$}`{=latex} stores $k$ object representations (one per member), and $R, S, B \in \R^{k \times d}$ store the non-shared weights ($r_i$, $s_i$, $b_i$) of the members, as shown in the lower left part of `\autoref{fig:model}`{=latex}.
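The equivalence between the vectorized layer and the per-member view $W_i = W \odot (s_ir_i^T)$ can be verified with a few lines of NumPy (an illustrative sketch with toy sizes; the rows of `X` play the role of the $x_i$, and `W` is applied in the $l(x) = Wx$ convention):

```python
import numpy as np

rng = np.random.default_rng(0)
k, d = 4, 8  # illustrative ensemble size and layer width

W = rng.normal(size=(d, d))               # shared weight matrix
B = rng.normal(size=(k, d))               # per-member biases b_i
R = rng.choice([-1.0, 1.0], size=(k, d))  # per-member input adapters r_i
S = rng.choice([-1.0, 1.0], size=(k, d))  # per-member output adapters s_i

X = rng.normal(size=(k, d))               # one representation per member

# Vectorized BatchEnsemble layer: all k members in a single matmul.
out_vectorized = ((X * R) @ W.T) * S + B

# Equivalent per-member view: W_i = W * outer(s_i, r_i), then l_i(x_i).
out_per_member = np.stack(
    [(W * np.outer(S[i], R[i])) @ X[i] + B[i] for i in range(k)]
)

assert np.allclose(out_vectorized, out_per_member)
```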

**Terminology.** In this paper, we call $r_i$, $s_i$, $b_i$, $R$, $S$ and $B$ *adapters*, and the implicit members of parameter-efficient ensembles (e.g. BatchEnsemble) --- *implicit submodels* or simply *submodels*.\
**Overhead to the model size.** With BatchEnsemble, adding a new ensemble member means adding only one row to each of the matrices $R$, $S$, and $B$, which results in $3d$ new parameters per layer. For typical values of $d$, this is a negligible overhead to the original layer size $d^2 + d$.\
**Overhead to the runtime.** Thanks to modern hardware, the large number of shared weights, and the parallel execution of the $k$ forward passes, the runtime overhead of BatchEnsemble can be (significantly) lower than $\times k$ [@wen2020batchensemble]. Intuitively, the more the original workload underutilizes the hardware, the better the chances of paying less than a $\times k$ overhead.
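A back-of-the-envelope computation makes the model-size overhead concrete (an illustrative width; the exact numbers depend on the architecture):

```python
d = 512  # a typical layer width (illustrative)

per_member_adapters = 3 * d  # one new row in each of R, S and B
full_layer = d * d + d       # the shared W and bias of one linear layer

overhead = per_member_adapters / full_layer
# Roughly half a percent of extra parameters per added ensemble member.
print(f"{overhead:.2%} extra parameters per member per layer")
```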

Architecture {#sec:model-design}
------------

TabM is one model representing an ensemble of $k$ MLPs. In contrast to conventional deep ensembles, in TabM, the $k$ MLPs are trained in parallel and (by default) share most of their weights, which leads to better performance and efficiency. We present multiple variants of TabM that differ in their weight-sharing strategies: `\model`{=latex} and `\modelmini`{=latex} are the most effective variants, and `\modelpacked`{=latex} is a conceptually important variant that can be useful in some cases. We obtain our models in several steps, starting from essential baselines. We always use the ensemble size $k=32$ and analyze this hyperparameter in `\autoref{sec:analysis-k}`{=latex}. In `\autoref{A:sec:model-motivation}`{=latex}, we explain that using MLP as the base model is crucial because of its excellent efficiency.

**MLP.** We define MLP as a sequence of $N$ simple blocks followed by a linear prediction head:\
$\text{MLP}(x) = \text{Linear}(\text{Block}_N(\ldots(\text{Block}_1(x))))$, where $\text{Block}_i(x) = \text{Dropout}(\text{ReLU}(\text{Linear}(x)))$.
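The forward pass of this MLP can be sketched in a few lines of NumPy (illustrative sizes; dropout is the identity at inference and is omitted here):

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, d, d_y, N = 10, 16, 1, 3  # illustrative sizes

def mlp_forward(x, blocks, head):
    """MLP(x) = Linear(Block_N(...(Block_1(x)))), dropout disabled."""
    for W, b in blocks:
        x = np.maximum(W @ x + b, 0.0)  # Block_i: ReLU(Linear(x))
    W_head, b_head = head
    return W_head @ x + b_head          # final linear prediction head

dims = [n_features] + [d] * N
blocks = [(rng.normal(size=(dims[i + 1], dims[i])), np.zeros(dims[i + 1]))
          for i in range(N)]
head = (rng.normal(size=(d_y, d)), np.zeros(d_y))

y_hat = mlp_forward(rng.normal(size=n_features), blocks, head)
assert y_hat.shape == (d_y,)  # e.g. a single regression output
```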

**`\mlpxk`{=latex} = MLP + Deep Ensemble.** We denote the traditional deep ensemble of $k$ independently trained MLPs as `\mlpxk`{=latex}. To clarify, this means tuning hyperparameters of one MLP, then independently training $k$ tuned MLPs under different random seeds, and then averaging their predictions. The performance of `\mlpxk`{=latex} is reported in `\autoref{fig:model-design}`{=latex}. Notably, the results are already better and more stable than those of FT-Transformer [@gorishniy2021revisiting] --- the popular attention-based baseline.

Although the described approach is a somewhat default way to implement an ensemble, it is not optimized for the task performance of the ensemble. First, for each of the $k$ MLPs, the training is stopped based on the individual validation score, which is optimal for each individual MLP, but can be suboptimal for their ensemble. Second, the hyperparameters are also tuned for one MLP without knowing about the subsequent ensembling. All TabM variants are free from these issues.

**`\modelpacked`{=latex} = MLP + Packed-Ensemble.** As the first step towards better and more efficient ensembles of MLPs, we implement $k$ MLPs as one large model using Packed-Ensemble [@laurent2022packed]. This results in `\modelpacked`{=latex} illustrated in `\autoref{fig:model}`{=latex}. As an architecture, `\modelpacked`{=latex} is equivalent to `\mlpxk`{=latex} and stores $k$ independent MLPs without any weight sharing. However, the critical difference is that `\modelpacked`{=latex} processes $k$ inputs in parallel, which means that one training step of `\modelpacked`{=latex} consists of $k$ parallel training steps of the individual MLPs. This allows monitoring the performance of the ensemble during training and stopping the training when it is optimal for the whole ensemble, not for individual MLPs. As a consequence, this also allows tuning hyperparameters for `\modelpacked`{=latex} as for one model. As shown in `\autoref{fig:model-design}`{=latex}, `\modelpacked`{=latex} delivers significantly better performance compared to `\mlpxk`{=latex}. Efficiency-wise, for typical depth and width of MLPs, the runtime overhead of `\modelpacked`{=latex} is noticeably less than $\times k$ due to the parallel execution of the $k$ forward passes on modern hardware. Nevertheless, the $\times k$ overhead of `\modelpacked`{=latex} in model size motivates further exploration.

**`\modelnaive`{=latex} = MLP + BatchEnsemble.** To reduce the size of `\modelpacked`{=latex}, we now turn to weight sharing between the MLPs, and naively apply BatchEnsemble [@wen2020batchensemble] instead of Packed-Ensemble, as described in `\autoref{sec:model-batchensemble}`{=latex}. This gives us `\modelnaive  `{=latex}--- a preliminary version of `\model`{=latex}. In fact, the architecture (but not the initialization) of `\modelnaive`{=latex} is already equivalent to that of `\model`{=latex}, so `\autoref{fig:model}`{=latex} is applicable. Interestingly, `\autoref{fig:model-design}`{=latex} reports higher performance of `\modelnaive`{=latex} compared to `\modelpacked`{=latex}. Thus, constraining the ensemble with weight sharing turns out to be a highly effective regularization on tabular tasks. The alternatives to BatchEnsemble are discussed in `\autoref{A:sec:model-motivation}`{=latex}.

```{=latex}
\centering
```
![ *(Upper left)* A high-level illustration of TabM. One TabM represents an ensemble of $k$ MLPs processing $k$ inputs in parallel. The remaining parts of the figure are three different parametrizations of the $k$ MLP backbones. *(Upper right)* `\modelpacked`{=latex} consists of $k$ fully independent MLPs. `\mbox{\textit{(Lower left)}}`{=latex} `\model`{=latex} is obtained by injecting three non-shared adapters $R$, $S$, $B$ in each of the $N$ linear layers of *one* MLP (`\mbox{$^*$ the}`{=latex} initialization differs from @wen2020batchensemble). *(Lower right)* `\modelmini`{=latex} is obtained by keeping only the very first adapter $R$ of `\model`{=latex}  and removing the remaining $3N - 1$ adapters. *(Details)* Input transformations such as one-hot-encoding or feature embeddings [@gorishniy2022embeddings] are omitted for simplicity. `Drop` denotes dropout [@srivastava2014dropout]. ](figures/model.png){#fig:model width="0.99\\linewidth"}

```{=latex}
\centering
```
![ The performance of models described in `\autoref{sec:model-design}`{=latex} on `\ndatasets`{=latex} datasets from `\autoref{tab:datasets}`{=latex}; plus several baselines on the left. For a given model, one dot on a jitter plot describes the performance score on one of the `\ndatasets`{=latex} datasets. The box plots describe the percentiles of the jitter plots: the boxes describe the 25th, 50th, and 75th percentiles, and the whiskers describe the 10th and 90th percentiles. Outliers are clipped. The numbers at the bottom are the mean and standard deviations over the jitter plots. For each model, hyperparameters are tuned. \`\`$\text{Model}^{\times k}$" denotes an ensemble of $k$ models. ](figures/figure2-split.png){#fig:model-design width="0.99\\linewidth"}

**`\modelmini`{=latex} = MLP + MiniEnsemble.** By construction, the just discussed `\modelnaive`{=latex} (illustrated as \`\``\model`{=latex}" in `\autoref{fig:model}`{=latex}) has $3N$ adapters: $R$, $S$ and $B$ in each of the $N$ blocks. Let's consider the very first adapter, i.e. the first adapter $R$ in the first linear layer. Informally, its role can be described as mapping the $k$ inputs living in the same representation space to $k$ different representation spaces *before* the tabular features are mixed with $W$ for the first time. A simple experiment reveals that this adapter is critical. First, we remove it from `\modelnaive`{=latex} and keep the remaining $3N-1$ adapters untouched, which gives us `\modelbad`{=latex} with worse performance, as shown in `\autoref{fig:model-design}`{=latex}. Then, we do the opposite: we keep only the very first adapter of `\modelnaive`{=latex} and remove the remaining $3N - 1$ adapters, which gives us `\modelmini`{=latex} --- the minimal version of `\model`{=latex}. `\modelmini`{=latex} is illustrated in `\autoref{fig:model}`{=latex}, where we call the described approach \`\`MiniEnsemble". `\autoref{fig:model-design}`{=latex} shows that `\modelmini`{=latex} performs even slightly better than `\modelnaive`{=latex}, despite having only one adapter instead of $3N$ adapters.
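The MiniEnsemble idea can be sketched in NumPy as follows (a minimal illustration with a depth-1 backbone; the per-member prediction heads and the toy sizes are assumptions of this sketch, not a definitive implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
k, d, n_features = 4, 16, 10  # illustrative sizes

# The only adapter: the very first R, initialized randomly with +/-1.
R = rng.choice([-1.0, 1.0], size=(k, n_features))
W, b = rng.normal(size=(d, n_features)), np.zeros(d)   # fully shared block
heads_W = rng.normal(size=(k, 1, d))                   # per-member heads
heads_b = np.zeros((k, 1))

x = rng.normal(size=n_features)
X = np.tile(x, (k, 1)) * R          # map one input into k different "spaces"
H = np.maximum(X @ W.T + b, 0.0)    # shared ReLU(Linear(.)) for all members
preds = np.einsum('kod,kd->ko', heads_W, H) + heads_b  # k predictions
y_hat = preds.mean(axis=0)          # the final prediction is the average
assert preds.shape == (k, 1) and y_hat.shape == (1,)
```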

**`\model`{=latex} = MLP + BatchEnsemble + Better initialization.** The just obtained results motivate the next step. We go back to the architecture of `\modelnaive`{=latex} with all $3N$ adapters, but initialize all multiplicative adapters $R$ and $S$, except for the very first one, deterministically with $1$. As such, at initialization, the deterministically initialized adapters have no effect, and the model behaves like `\modelmini`{=latex}, but these adapters are free to add more expressivity during training. This gives us `\model`{=latex}, illustrated in `\autoref{fig:model}`{=latex}. `\autoref{fig:model-design}`{=latex} shows that `\model`{=latex} is the best variation so far.
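A quick NumPy check (toy sizes) illustrates why the deterministic initialization is a no-op at the start of training: with all-ones multiplicative adapters, the BatchEnsemble layer collapses to the plain shared linear layer, so the extra expressivity only appears as the adapters move away from $1$ during training.

```python
import numpy as np

rng = np.random.default_rng(0)
k, d = 4, 8  # illustrative ensemble size and width

W = rng.normal(size=(d, d))  # shared weight matrix
X = rng.normal(size=(k, d))  # k member representations

# All multiplicative adapters except the very first one are initialized
# with ones in TabM; such a layer initially equals the shared layer.
R = np.ones((k, d))
S = np.ones((k, d))
assert np.allclose(((X * R) @ W.T) * S, X @ W.T)
```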

**Hyperparameters.** Compared to MLP, the only new hyperparameter of `\model`{=latex} is $k$ --- the number of implicit submodels. We heuristically set $k=32$ and do not tune this value. We analyze the influence of $k$ in `\autoref{sec:analysis-k}`{=latex}. We also share additional observations on the learning rate in `\autoref{A:sec:model-hyperparameters}`{=latex}.

**Limitations and practical considerations** are discussed in `\autoref{A:sec:model-limitations}`{=latex}.

Important practical modifications of `\model`{=latex} {#sec:model-important}
-----------------------------------------------------

$\mathbf{\spadesuit \sim}$ **Shared training batches**. Recall that the order of training objects usually varies between ensemble members because of the random shuffling with different seeds. For `\model`{=latex}, in terms of `\autoref{fig:model}`{=latex}, that corresponds to $X$ storing $k$ different training objects $\{x_i\}_{i = 1}^k$. We observed that reusing the training batches between the `\model`{=latex}'s submodels results in only a minor performance loss on average (depending on the dataset), as illustrated with `\modelspade`{=latex} in `\autoref{fig:model-design}`{=latex}. In practice, due to the simpler implementation and better efficiency, sharing training batches can be a reasonable starting point.

$\mathbf{\dagger \sim}$ **Non-linear feature embeddings**. In `\autoref{fig:model-design}`{=latex}, `\modelminiemb`{=latex} denotes `\modelmini`{=latex} with non-linear feature embeddings from [@gorishniy2022embeddings], which demonstrates the high utility of feature embeddings for `\model`{=latex}. Specifically, we use a slightly modified version of the piecewise-linear embeddings (see `\autoref{A:sec:impl-feature-embeddings}`{=latex} for details).

$\mathbf{\times N \sim}$ **Deep ensemble**. In `\autoref{fig:model-design}`{=latex}, `\modelminiembensfive`{=latex} denotes an ensemble of five independent `\modelminiemb`{=latex} models, showing that `\model`{=latex} itself can benefit from conventional deep ensembling.

Summary {#sec:model-summary}
-------

The story behind `\model`{=latex} shows that technical details of *how* to construct and train an ensemble have a major impact on task performance. Most importantly, we highlight simultaneous training of the (implicit) ensemble members and weight sharing between them. The former is responsible for the ensemble-aware stopping of the training, and the latter apparently serves as a form of regularization.

Evaluating tabular deep learning architectures {#sec:evaluation}
==============================================

Now, we perform an empirical comparison of many tabular models, including `\model`{=latex}.

Baselines {#sec:evaluation-baselines}
---------

In the main text, we use the following baselines: MLP (defined in `\autoref{sec:model-design}`{=latex}), FT-Transformer denoted as \`\`FT-T" (the attention-based model from @gorishniy2021revisiting), SAINT (the attention- and retrieval-based model from @somepalli2021saint), T2G-Former denoted as \`\`T2G" (the attention-based model from @yan2023t2g), ExcelFormer denoted as \`\`Excel" (the attention-based model from @chen2023excelformer), TabR (the retrieval-based model from @gorishniy2023tabr), ModernNCA denoted as \`\`MNCA" (the retrieval-based model from @ye2024modern) and GBDT, including XGBoost [@chen2016xgboost], LightGBM [@ke2017lightgbm] and CatBoost [@prokhorenkova2018catboost].

The models with non-linear feature embeddings from @gorishniy2022embeddings are marked with $\dagger$ or $\ddagger$ depending on the embedding type (see `\autoref{A:sec:impl-feature-embeddings}`{=latex} for details on feature embeddings):

-   $\text{MLP}^\dagger$ and `\modelminiemb`{=latex} use a modified version of the piecewise-linear embeddings.

-   $\text{TabR}^\ddagger$, $\text{MNCA}^\ddagger$, and $\text{MLP}^\ddagger$ (also known as MLP-PLR) use various periodic embeddings.

More baselines are evaluated in `\autoref{A:sec:extended-results}`{=latex}. Implementation details are provided in `\autoref{A:sec:implementation-details}`{=latex}.

Task performance {#sec:evaluation-performance}
----------------

We evaluate all models following the protocol announced in `\autoref{sec:model-preliminaries}`{=latex} and report the results in `\autoref{fig:performance}`{=latex} (see also the critical difference diagram in `\autoref{A:fig:cdd}`{=latex}). We make the following observations:

1.  The performance ranks render `\model`{=latex} as the top-tier DL model.

2.  The middle and right parts of `\autoref{fig:performance}`{=latex} provide a fresh perspective on the per-dataset metrics. `\model`{=latex} holds its leadership among the DL models. Meanwhile, many DL methods turn out to be no better or even worse than MLP on a non-negligible number of datasets, which shows them as less reliable solutions, and changes the ranking, especially on the domain-aware splits (right).

3.  One important characteristic of a model is the *weakest* part of its performance profile (e.g. the 10th or 25th percentiles in the middle plot) since it shows how reliable the model is on \`\`inconvenient" datasets. From that perspective, MLP^`\textdagger`{=latex}^ seems to be a decent practical option between the plain MLP and `\model`{=latex}, especially given its simplicity and efficiency compared to retrieval-based alternatives, such as TabR and ModernNCA.

**Summary.** `\model`{=latex} confidently demonstrates the best performance among tabular DL models, and can serve as a reliable go-to DL baseline. This is not the case for attention- and retrieval-based models. Overall, MLP-like models, including `\model`{=latex}, form a representative set of tabular DL baselines.

```{=latex}
\centering
```
![ The task performance of tabular models on the `\ndatasets`{=latex} datasets from `\autoref{tab:datasets}`{=latex}. *(Left)* The mean and standard deviations of the performance ranks over all datasets summarize the head-to-head comparison between the models on all datasets. *(Middle & Right)* The relative performance w.r.t. the plain multilayer perceptron (MLP) allows reasoning about the scale and consistency of improvements over this simple baseline. One dot of a jitter plot corresponds to the performance of a model on one of the `\ndatasets`{=latex} datasets. The box plots visualize the 10th, 25th, 50th, 75th, and 90th percentiles of the jitter plots. Outliers are clipped. The separation in random and domain-aware dataset splits is explained in `\autoref{sec:model-preliminaries}`{=latex}. (`\mbox{$^*$Evaluated}`{=latex} under the common protocol without data augmentations) ](figures/figure3-split-tiny.png){#fig:performance width="0.99\\linewidth"}

Efficiency {#sec:evaluation-efficiency}
----------

Now, we evaluate tabular models in terms of training and inference efficiency, which becomes a serious reality check for some of the methods. We benchmark exactly those hyperparameter configurations of models that are presented in `\autoref{fig:performance}`{=latex} (see `\autoref{a:sec:extended-efficiency}`{=latex} for the motivation).

**`\modelminiembopt`{=latex} & `\modelminiembspadeopt`{=latex}.** Additionally, in this section, we mark with an asterisk ($^*$) the versions of `\model`{=latex} enhanced with two efficiency-related plugins available out of the box in PyTorch [@paszke2019pytorch]: automatic mixed precision (AMP) and `torch.compile` [@ansel2024pytorch2]. The purpose of these `\model`{=latex} variants is to showcase the potential of modern hardware and software for a powerful tabular DL model, and they should not be directly compared to the other DL models. At the same time, the implementation simplicity of `\model`{=latex} plays an important role here, as it facilitates the seamless integration of the aforementioned PyTorch plugins.
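As a rough sketch of how such plugins can be attached to a training step (illustrative only; `make_amp_training_step` is a hypothetical helper, and the actual integration lives in the repository):

```python
import torch

def make_amp_training_step(model, optimizer, loss_fn,
                           device_type="cuda", use_compile=False):
    """Hypothetical sketch: attach torch.compile and automatic mixed
    precision (AMP) to a plain training step. Both plugins are no-ops
    when disabled, so the step also runs on CPU."""
    if use_compile:
        model = torch.compile(model)
    amp_on = device_type == "cuda"
    scaler = torch.cuda.amp.GradScaler(enabled=amp_on)

    def step(x, y):
        optimizer.zero_grad()
        with torch.autocast(device_type=device_type, enabled=amp_on):
            loss = loss_fn(model(x), y)
        scaler.scale(loss).backward()  # scaling is a no-op when AMP is off
        scaler.step(optimizer)
        scaler.update()
        return loss.item()

    return step
```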

**Training time.** We focus on training times on larger datasets, because on small datasets, all methods become almost equally affordable, regardless of the formal relative difference. Nevertheless, in `\autoref{A:fig:efficiency}`{=latex}, we provide measurements on small datasets as well. The left side of `\autoref{fig:efficiency}`{=latex} reveals that `\model`{=latex} offers practical training times. By contrast, the long training times of attention- and retrieval-based models become one more limitation of these methods.

**Inference throughput.** The right side of `\autoref{fig:efficiency}`{=latex} tells essentially the same story as the left side. In `\autoref{a:sec:extended-efficiency}`{=latex}, we also report the inference throughput on GPU with large batch sizes.

**Applicability to large datasets.** In `\autoref{tab:large}`{=latex}, we report metrics on two large datasets. As expected, attention- and retrieval-based models struggle, yielding extremely long training times, or being simply inapplicable without additional effort. See `\autoref{A:sec:impl-evaluation-efficiency}`{=latex} for implementation details.

**Parameter count.** Most tabular networks are overall compact. This, in particular, applies to `\model`{=latex}, because its size is by design comparable to MLP. We report model sizes in `\autoref{a:sec:extended-efficiency}`{=latex}.

**Summary.** Simple MLPs are the fastest DL models, with `\model`{=latex} being the runner-up. The attention- and retrieval-based models are significantly slower. Overall, MLP-like models, including `\model`{=latex}, form a representative set of practical and accessible tabular DL baselines.

```{=latex}
\centering
```
```{=latex}
\centering
```
![ Training times (left) and inference throughput (right) of the models from `\autoref{fig:performance}`{=latex}. One dot represents a measurement on one dataset. `\modelminiembopt`{=latex} is the optimized `\modelminiemb`{=latex} (see `\autoref{sec:evaluation-efficiency}`{=latex}). ](figures/fig4-train-time.png){#fig:efficiency width="0.95\\linewidth"}

```{=latex}
\hfill
```
```{=latex}
\centering
```
![ Training times (left) and inference throughput (right) of the models from `\autoref{fig:performance}`{=latex}. One dot represents a measurement on one dataset. `\modelminiembopt`{=latex} is the optimized `\modelminiemb`{=latex} (see `\autoref{sec:evaluation-efficiency}`{=latex}). ](figures/inference-cpu-v4-mini.png){#fig:efficiency width="0.95\\linewidth"}

```{=latex}
\centering
```
```{=latex}
\scalebox{0.875}{
\begin{tabular}{lcc|cccccc}
\toprule


& \#Objects
& \#Features
& {\transparent{1.0}\cellcolor[HTML]{C2DAEA}} {\color{black} $\mathrm{XGBoost}$}
& {\transparent{1.0}\cellcolor[HTML]{C2DAEA}} {\color{black} $\mathrm{MLP}$}
& {\transparent{1.0}\cellcolor[HTML]{C1E6BC}} {\color{black} \modelminiembspadeopt}
& {\transparent{1.0}\cellcolor[HTML]{C1E6BC}} {\color{black} \modelminiemb}
& {\transparent{1.0}\cellcolor[HTML]{FEDAB3}} {\color{black} $\mathrm{FT}\text{-}\mathrm{T}$}
& {\transparent{1.0}\cellcolor[HTML]{FEDAB3}} {\color{black} $\mathrm{TabR}$}
\\
\midrule
\multirow{ 2}{*}{Maps Routing}
& \multirow{ 2}{*}{$6.5$M}
& \multirow{ 2}{*}{$986$}
& $0.1601$
& $0.1592$
& $0.1583$
& $\mathbf{0.1582}$
& $0.1594$
& \multirow{ 2}{*}{OOM}
\\
&
&
& $28$m
& $\mathbf{15}$\textbf{m}
& $2$h
& $13.5$h
& $45.5$h
&
\\
\hline
\multirow{2}{*}{Weather}
& \multirow{2}{*}{$13$M}
& \multirow{2}{*}{$103$}
& $1.4234$
& $1.4842$
& $\mathbf{1.4090}$
& $\mathbf{1.4112}$
& $1.4409$
& \multirow{ 2}{*}{OOM}
\\
&
&
& $\mathbf{10}$\textbf{m}
& $15$m
& $1.3$h
& $3.3$h
& $13.5$h
&
\\
\bottomrule
\end{tabular}
}
```
Analysis {#sec:analysis}
========

Performance and training dynamics of the individual submodels {#sec:analysis-optimization}
-------------------------------------------------------------

Recall that the prediction of `\model`{=latex} is defined as the mean prediction of its $k$ implicit submodels that share most of their weights. In this section, we take a closer look at these submodels.

For the next experiment, we intentionally simplify the setup, as described in detail in `\autoref{A:sec:impl-analysis-optimization}`{=latex}. Most importantly, all models have the same depth ($3$) and width ($512$), and are trained without early stopping, i.e. training continues beyond the optimal number of epochs. We use `\modelmini`{=latex} from `\autoref{fig:model}`{=latex} with $k=32$, denoted as `\modelminik{32}`{=latex}. We use `\modelminik{1}`{=latex} (i.e. essentially one plain MLP) as a natural baseline for the submodels of `\modelminik{32}`{=latex}, because each of the $32$ submodels has the architecture of `\modelminik{1}`{=latex}.

We visualize the training profiles on four diverse datasets (two classification and two regression problems of different sizes) in `\autoref{fig:training-curves}`{=latex}. As a reminder, the mean of the $k$ **`\color{fontindividual}`{=latex} individual** losses is what is explicitly optimized during the training of `\modelmini`{=latex}, the loss of the **`\color{fontcollective}`{=latex} collective** mean prediction corresponds to how `\modelmini`{=latex} makes predictions on inference, and `\modelminik{1}`{=latex} is just a **`\color{fontmlp}`{=latex} baseline**.
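The distinction between the individual and collective losses can be made concrete with a small NumPy sketch (illustrative only; `individual_and_collective_mse` is a hypothetical helper). For the squared error, convexity guarantees that the loss of the mean prediction never exceeds the mean of the individual losses:

```python
import numpy as np

def individual_and_collective_mse(preds, y):
    """preds: (k, n) per-submodel predictions; y: (n,) targets.
    Returns the mean of the k individual MSE losses and the MSE
    of the collective mean prediction."""
    individual = ((preds - y) ** 2).mean(axis=1)         # (k,) per-submodel losses
    collective = ((preds.mean(axis=0) - y) ** 2).mean()  # loss of the mean prediction
    return individual.mean(), collective
```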

```{=latex}
\begin{figure*}[!h]
    \centering
    \includegraphics[width=0.99\linewidth]{figures/losses-simple-split.pdf}
    \includegraphics[width=0.99\linewidth]{figures/losses-split.pdf}
    \caption{
        The training profiles of \modelminik{32} and \modelminik{1} as described in \autoref{sec:analysis-optimization}.
        \textit{(Upper)} The training curves. $k=32[i]$ represents the mean \textbf{i}ndividual loss over the $32$ submodels.
        \textit{(Lower)} Same as the first row, but in the train-test coordinates: each dot represents some epoch from the first row, and the training generally goes from left to right.
        This allows reasoning about overfitting by comparing test loss values for a given train loss value.
    }
    \label{fig:training-curves}
\end{figure*}
```
In the upper row of `\autoref{fig:training-curves}`{=latex}, the collective mean prediction of the submodels is superior to their individual predictions in terms of both training and test losses. After the initial epochs, the training loss of the baseline MLP is lower than that of the collective and individual predictions.

In the lower row of `\autoref{fig:training-curves}`{=latex}, we see a stark contrast between the individual and collective performance of the submodels. Compared to the baseline MLP, the submodels look overfitted individually, while their collective prediction exhibits substantially better generalization. This result is clear evidence of a non-trivial diversity among the submodels: without it, their collective test performance would be similar to their individual test performance. Additionally, we report the performance of the **B**est submodel of `\model`{=latex} across many datasets under the name `\modelbesthead`{=latex} in `\autoref{fig:submodels}`{=latex}. Notably, even the best submodel of `\model`{=latex} is individually no better than a simple MLP.

**Summary.** `\model`{=latex} draws its power from the collective prediction of weak, but diverse submodels.

Selecting submodels after training {#sec:analysis-selecting-submodels}
----------------------------------

The design of `\model`{=latex} allows selecting only a subset of submodels after training based on any criteria, simply by pruning extra prediction heads and the corresponding rows of the adapter matrices. To showcase this mechanism, after the training, we **G**reedily construct a subset of `\model`{=latex}'s submodels with the best collective performance on the validation set, and denote this \`\`pruned" `\model`{=latex} as `\modelgreedyheads`{=latex}. The performance reported in `\autoref{fig:submodels}`{=latex} shows that `\modelgreedyheads`{=latex} is slightly behind the vanilla `\model`{=latex}. On average over `\ndatasets`{=latex} datasets, the greedy submodel selection results in $8.8 \pm 6.6$ submodels out of the initial $k=32$, which can result in faster inference. See `\autoref{A:sec:impl-analysis-selecting-submodels}`{=latex} for implementation details.
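The greedy selection procedure can be sketched as follows (illustrative NumPy code, not our implementation; `greedy_select_submodels` is a hypothetical name):

```python
import numpy as np

def greedy_select_submodels(val_preds, y_val, metric):
    """Greedily grow a subset of submodels whose collective (mean)
    prediction minimizes `metric` on the validation set.
    val_preds: (k, n) per-submodel validation predictions."""
    k = len(val_preds)
    selected, best_score = [], None
    while len(selected) < k:
        best_i, best_i_score = None, None
        for i in range(k):
            if i in selected:
                continue
            score = metric(val_preds[selected + [i]].mean(axis=0), y_val)
            if best_i_score is None or score < best_i_score:
                best_i, best_i_score = i, score
        if best_score is not None and best_i_score >= best_score:
            break  # adding any further submodel no longer helps
        selected.append(best_i)
        best_score = best_i_score
    return selected
```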

```{=latex}
\centering
```
![image](figures/submodels-split.png){width="0.95\\linewidth"} `\captionof{figure}{
        The performance on the \ndatasets\ datasets from \autoref{tab:datasets}.
        \modelbesthead\ and \modelgreedyheads\ are described in \autoref{sec:analysis-optimization} and \autoref{sec:analysis-selecting-submodels}.
    }`{=latex} `\label{fig:submodels}`{=latex}

```{=latex}
\hfill
```
```{=latex}
\centering
```
![image](figures/d-n-k-ablation-split.png){width="0.97\\linewidth"} `\captionof{figure}{
        The average performance of \model\ with $n$ layers of the width $d$ across $17$ datasets as a function of $k$.
    }`{=latex} `\label{fig:d-k-ablation}`{=latex}

How does the performance of `\model`{=latex} depend on $k$? {#sec:analysis-k}
-----------------------------------------------------------

To answer the question in the title, we consider `\model`{=latex} with $n$ layers of width $d$ and different values of $k$, and report the average performance over multiple datasets in `\autoref{fig:d-k-ablation}`{=latex} (the implementation details are provided in `\autoref{A:sec:impl-analysis-k}`{=latex}). The solid curves correspond to $n = 3$, and the dark green curves correspond to $d = 512$. Our main observations are as follows. *First,* it seems that the \`\`larger" `\model`{=latex} is (i.e. the larger $n$ and $d$ are), the more submodels it can accommodate effectively. For example, note how the solid curves corresponding to different $d$ diverge at $k = 2$ and $k = 4$. *Second,* excessively large values of $k$ can be detrimental. Perhaps, weight sharing limits the number of submodels that can productively \`\`coexist" in one network, despite the presence of non-shared adapters. *Third*, too narrow ($d = 64$) or too shallow ($n = 1$) configurations of `\model`{=latex} can lead to suboptimal performance, at least in the scope of the middle-to-large datasets considered in this work.

Parameter-efficient ensembling reduces the number of dead neurons {#sec:analysis-dead-neurons}
-----------------------------------------------------------------

Here, we show empirically that the design of `\model`{=latex} naturally leads to a higher utilization of the backbone's weights. Even without technical definitions, this sounds intuitive, since `\model`{=latex} has to implement $k$ (diverse) computations using roughly the same number of weights as one MLP.

Let's consider `\modelmini`{=latex} as illustrated in `\autoref{fig:model}`{=latex}. By design, each of the shared neurons of `\modelmini`{=latex} is used $k$ times per forward pass, where \`\`neuron" refers to the combination of a linear transformation and the subsequent nonlinearity (e.g. ReLU). By contrast, in a plain MLP (or in `\modelmini`{=latex} with $k=1$), each neuron is used only once per forward pass. Thus, technically, a neuron in `\modelmini`{=latex} has more chances to be activated, which may lead to a lower fraction of dead neurons in `\modelmini`{=latex} compared to MLP (a dead neuron is a neuron that never activates, and thus has no impact on the prediction). Using the experiment setup from `\autoref{sec:analysis-optimization}`{=latex}, we compute the fraction of dead neurons in `\modelmini`{=latex} using its best validation checkpoint. On average across `\ndatasets`{=latex} datasets, for $k = 1$ and $k = 32$, the fractions of dead neurons are $0.29 \pm 0.17$ and $0.14 \pm 0.09$, respectively, which is in line with the described intuition. Technically, on a given dataset, this metric is computed as the fraction of neurons that never activate on a fixed set of $2048$ training objects.
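The dead-neuron metric can be computed as in the following sketch (illustrative only; it assumes access to the pre-activation values on the fixed set of training objects):

```python
import numpy as np

def dead_neuron_fraction(preact):
    """preact: (n_objects, n_neurons) pre-ReLU values collected on a
    fixed set of training objects. A neuron is "dead" if its
    pre-activation is non-positive on every object, i.e. it never
    activates and has no impact on the prediction."""
    alive = (preact > 0).any(axis=0)
    return 1.0 - alive.mean()
```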

Conclusion & Future work {#sec:conclusion}
========================

In this work, we have demonstrated that tabular multilayer perceptrons (MLPs) greatly benefit from parameter-efficient ensembling. Using this insight, we have developed `\model`{=latex} --- a simple MLP-based model with state-of-the-art performance. In a large-scale comparison with many tabular DL models, we have demonstrated that `\model`{=latex} is ready to serve as a new powerful and efficient tabular DL baseline. Along the way, we highlighted the important technical details behind `\model`{=latex} and discussed the individual performance of the implicit submodels underlying `\model`{=latex}.

One idea for future work is to bring the power of (parameter-)efficient ensembles to other, non-tabular, domains with optimization-related challenges and, ideally, lightweight base models. Another idea is to evaluate `\model`{=latex} for uncertainty estimation and out-of-distribution (OOD) detection on tabular data, which is inspired by works like @lakshminarayanan2017simple.

```{=latex}
\newpage
```
**Reproducibility statement.** The code is provided in the following repository: [link](\repository). It contains the implementation of `\model`{=latex}, hyperparameter tuning scripts, evaluation scripts, configuration files with hyperparameters (the TOML files in the `exp/` directory), and the report files with the main metrics (the JSON files in the `exp/` directory). In the paper, the model is described in `\autoref{sec:model}`{=latex}, and the implementation details are provided in `\autoref{A:sec:implementation-details}`{=latex}.

```{=latex}
\bibliographystyle{iclr2025_conference}
```
```{=latex}
\newpage
```
```{=latex}
\appendix
```
Additional discussion on `\model`{=latex}
=========================================

Motivation {#A:sec:model-motivation}
----------

**Why BatchEnsemble?** Among relatively easy-to-use \`\`efficient ensembling" methods, besides BatchEnsemble, there are dropout ensembles [@lakshminarayanan2017simple], naive multi-head architectures, and TreeNet [@lee2015why]. However, in the literature, they have been consistently outperformed by more advanced methods, including BatchEnsemble [@wen2020batchensemble], MIMO [@havasi2020training], and FiLM-Ensemble [@turkoglu2022film].

Among advanced methods, BatchEnsemble seems to be one of the simplest and most flexible options. For example, FiLM-Ensemble [@turkoglu2022film] requires normalization layers to be present in the original architecture, which is not always the case for tabular MLPs. MIMO [@havasi2020training], in turn, imposes additional limitations compared to BatchEnsemble. *First*, it requires *concatenating* (not *stacking*, as with BatchEnsemble) all $k$ input representations, which increases the input size of the first linear layer. With the relatively high number of submodels $k = 32$ used in our paper, this can be an issue on datasets with a large number of features, especially when feature embeddings [@gorishniy2022embeddings] are used. For example, for $k = 32$, the number of features $m = 1000$, and the feature embedding size $l = 32$, the input size approaches one million, resulting in an extremely large first linear layer of the MLP. *Second*, with BatchEnsemble, it is easy to explicitly materialize, analyze, and prune individual submodels. By contrast, in MIMO, all submodels are implicitly entangled within one MLP, and there is no easy way to access individual submodels.
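The arithmetic behind the above example can be double-checked directly:

```python
# Back-of-the-envelope check of the MIMO vs. BatchEnsemble example above.
k, m, l = 32, 1000, 32    # submodels, features, feature embedding size
mimo_input = k * m * l    # MIMO concatenates all k input representations
be_input = m * l          # BatchEnsemble stacks them; the per-submodel input size is unchanged
```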

**Why MLPs?** Despite the applicability of BatchEnsemble [@wen2020batchensemble] to almost any architecture, we focus specifically on MLPs. The key reason is *efficiency*. *First,* to achieve high performance, throughout the paper, we use the relatively large number of submodels $k = 32$. However, the desired less-than-$\times k$ runtime overhead of BatchEnsemble typically happens only when the original model underutilizes the power of parallel computations of a given hardware. This will not be the case for attention-based models on datasets with a large number of features, as well as for retrieval-based models on datasets with a large number of objects. *Second,* as we show in `\autoref{sec:evaluation-efficiency}`{=latex}, attention- and retrieval-based models are already slow as-is. By contrast, MLPs are exceptionally efficient, to the extent that slowing them down even by an order of magnitude will still result in practical models.

Also, generally speaking, the definition of MLP suggested in `\autoref{sec:model-design}`{=latex} and used in `\model`{=latex} is not special, and more advanced MLP-like backbones can be used. However, in preliminary experiments, we did not observe the benefits of more advanced backbones. Perhaps, small technical differences between backbones become less impactful in the context of parameter-efficient ensembling, at least in the scope of middle-to-large-sized datasets.

`\model`{=latex} with feature embeddings {#A:sec:model-emb}
----------------------------------------

**Notation.** In this paper, we use $\dagger$ to mark `\model`{=latex} variants with the piecewise-linear embeddings (e.g. `\modelminiemb`{=latex}, `\modelemb`{=latex}, etc.).

**Implementation details.** In fact, there are no changes in the usage of feature embeddings compared to plain MLPs: feature embeddings are applied, and the result is flattened, before being passed to the backbones in terms of `\autoref{fig:model}`{=latex}. For example, if a dataset has $m$ continuous features and all of them are embedded, the very first adapter $R$ will have the shape $k \times md_e$, where $d_e$ is the feature embedding size. For `\modelminiemb`{=latex} and `\modelemb`{=latex}, we initialize the first multiplicative adapter $R$ of the first linear layer from the standard normal distribution $\mathcal{N}(0, 1)$. The remaining details are best understood from the source code.

**Efficiency.** When feature embeddings are used, the simplified batching strategy from `\autoref{sec:model-important}`{=latex} allows for a more efficient implementation, where the feature embeddings are applied to the original `batch_size` objects and the result is simply cloned $k$ times (instead of embedding $k \times \texttt{batch\_size}$ objects as with the original batching strategy).
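The simplified strategy can be sketched as follows (illustrative NumPy code; `embed_then_clone` is a hypothetical name, and the actual implementation is in the source code):

```python
import numpy as np

def embed_then_clone(x, embed, k):
    """Simplified batching sketch: apply the feature embeddings to the
    original batch once, flatten, then clone the result k times,
    instead of embedding k * batch_size objects.
    x: (batch, m); embed maps (batch, m) -> (batch, m, d_e)."""
    e = embed(x)                             # (batch, m, d_e)
    flat = e.reshape(e.shape[0], -1)         # (batch, m * d_e)
    return np.repeat(flat[None], k, axis=0)  # (k, batch, m * d_e)
```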

Hyperparameters {#A:sec:model-hyperparameters}
---------------

We noticed that the typical optimal learning rate for `\model`{=latex} is higher than for MLP (note that, on each dataset, the batch size is the same for all DL models). We hypothesize that the reason is the effectively larger batch size for `\model`{=latex} because of how the training batches are constructed (even if the simplified batching strategy from `\autoref{sec:model-important}`{=latex} is used).

Limitations and practical considerations {#A:sec:model-limitations}
----------------------------------------

`\model`{=latex} does not introduce any new limitations compared to BatchEnsemble [@wen2020batchensemble]. Nevertheless, we note the following:

-   The MLP backbone used in `\model`{=latex} is one of the simplest possible, and generally, more advanced backbones can be used. That said, some backbones may require additional care when used in `\model`{=latex}. For example, we did not explore backbones with normalization layers. For such layers, it is possible to allocate non-shared trainable affine transformations for each implicit submodel by adding one multiplicative and one additive adapter after the normalization layer (i.e. like in FiLM-Ensemble [@turkoglu2022film]). Additional experiments are required to find the best strategy.

-   For ensemble-like models, such as `\model`{=latex}, the notion of \`\`the final object embedding" changes: now, it is not a single vector, but a set of $k$ vectors. If exactly one object embedding is required, then additional experiments may be needed to find the best way to combine $k$ embeddings into one. The presence of multiple object embeddings can also be important for scenarios when `\model`{=latex} is used for solving more than one task, in particular when it is pretrained as a generic feature extractor and then reused for other tasks. The main practical guideline is that the $k$ prediction branches should not interact with each other (e.g. through attention, pooling, etc.) and should always be trained separately.

Extended results {#A:sec:extended-results}
================

This section complements `\autoref{sec:evaluation}`{=latex}.

Additional baselines
--------------------

In addition to the models from `\autoref{sec:evaluation-baselines}`{=latex}, we consider the following baselines:

-   MLP-PLR [@gorishniy2022embeddings], that is, an MLP with periodic embeddings.

-   ResNet [@gorishniy2021revisiting]

-   SNN [@klambauer2017self]

-   DCNv2 [@wang2020dcn2]

-   AutoInt [@song2019autoint]

-   MLP-Mixer is our adaptation of [@tolstikhin2021mlp] for tabular data.

-   Trompt [@chen2023trompt] (our reimplementation, since there is no official implementation)

We also evaluated TabPFN [@hollmann2022tabpfn] where possible. The results for this model are available only in `\autoref{A:sec:per-dataset-results}`{=latex}, because this model is by design not applicable to regression tasks, which constitute a considerable fraction of our datasets. Overall, TabPFN specializes in small datasets. In line with that, the performance of TabPFN on our benchmark was not competitive.

Task performance {#task-performance}
----------------

`\autoref{A:fig:main-comparison}`{=latex} is a different version of `\autoref{fig:performance}`{=latex} with additional baselines. Overall, none of the additional baselines affect our main story.

`\autoref{A:fig:cdd}`{=latex} is the critical difference diagram (CDD) computed over exactly the same results that were used for building `\autoref{fig:performance}`{=latex}.

```{=latex}
\centering
```
![ An extended comparison of tabular models as in `\autoref{fig:performance}`{=latex}. Note that the ranks (left) are computed only over the `\nrandomsplits`{=latex} datasets with random splits because ResNet, AutoInt, and MLP-Mixer were evaluated on only $1$ out of the $9$ datasets with domain-aware splits. ](figures/figure3-appendix-split.png){#A:fig:main-comparison width="0.97\\linewidth"}

```{=latex}
\centering
```
![ Critical difference diagram. The computation method is taken from @kim2024carte. ](figures/cdd-split.png){#A:fig:cdd width="0.6\\linewidth"}

```{=latex}
\newpage
```
Efficiency {#a:sec:extended-efficiency}
----------

This section complements `\autoref{sec:evaluation-efficiency}`{=latex}.

**Additional results.** `\autoref{A:fig:efficiency}`{=latex} complements `\autoref{fig:efficiency}`{=latex} by providing the training times on smaller datasets and the inference throughput on GPU with large batch sizes.

`\autoref{A:tab:n-params}`{=latex} provides the number of trainable parameters for some of the models from `\autoref{fig:performance}`{=latex}.

**Motivation for the benchmark setup.** Comparing models under all possible kinds of budgets (task performance, the number of parameters, training time, etc.) on all possible hardware (GPU, CPU, etc.) with all possible batch sizes is rather infeasible. As such, we set a narrow goal of *providing a high-level intuition on the efficiency in a transparent setting*. Thus, benchmarking the transparently obtained tuned hyperparameter configurations works well for our goal. Yet, this choice also has a limitation: the hyperparameter tuning process is not aware of the efficiency budget, so it can prefer much heavier configurations even if they lead to tiny performance improvements, which will negatively affect efficiency without a good reason. Overall, we hope that the large number of datasets compensates for potentially imperfect per-dataset measurements.

**Motivation for the two setups for measuring inference throughput.**

-   The setup on the right side of `\autoref{fig:efficiency}`{=latex} simulates the online per-object predictions.

-   The setup on the right side of `\autoref{A:fig:efficiency}`{=latex} simulates the offline batched computations.

```{=latex}
\centering
```
```{=latex}
\centering
```
![ (*Left*) Training time on datasets with less than 100K objects. (*Right*) Inference throughput on GPU with maximum possible batch size (i.e. the batch size depends on a model). ](figures/training-time-split-small.png){#A:fig:efficiency width="0.95\\linewidth"}

```{=latex}
\hfill
```
```{=latex}
\centering
```
![ (*Left*) Training time on datasets with less than 100K objects. (*Right*) Inference throughput on GPU with maximum possible batch size (i.e. the batch size depends on a model). ](figures/inference-gpu-v4-mini.png){#A:fig:efficiency width="0.95\\linewidth"}

```{=latex}
\centering
```
```{=latex}
\scalebox{0.75}{\begin{tabular}{ccccccc}
\toprule
\model & MLP & FT-T & T2G & TabR & ModernNCA & SAINT \\
\midrule
$1.4M\pm1.3M$ &
$1.0M\pm1.0M$ &
$1.2M\pm1.2M$ &
$2.1M\pm1.6M$ &
$858K\pm1.4M$ &
$1.0M\pm1.1M$ &
$175.4M\pm565.4M$ \\
\bottomrule

\end{tabular}}
```
`\label{A:tab:n-params}`{=latex}

Datasets {#A:sec:datasets}
========

In total, we use `\ndatasets`{=latex} datasets:

1.  $38$ datasets are taken from [@gorishniy2023tabr], which includes:

    1.  $28$ datasets from [@grinsztajn2022why]. See the original paper for the precise dataset information.

    2.  $10$ datasets from other sources. Their properties are provided in `\autoref{A:tab:default-datasets}`{=latex}.

2.  $8$ datasets from the TabReD benchmark [@rubachev2024tabred]. Their properties are provided in `\autoref{A:tab:tabred-datasets}`{=latex}.

In fact, the aforementioned $38$ datasets from @gorishniy2023tabr are only a subset of the datasets used in @gorishniy2023tabr. Namely, we did not include the following datasets:

-   The datasets that, according to @rubachev2024tabred, have incorrect splits and/or label leakage, including: $\mathrm{Bike\_Sharing\_Demand}$, $\mathrm{compass}$, $\mathrm{electricity}$, $\mathrm{SGEMM\_GPU\_kernel\_performance}$, $\mathrm{sulfur}$, $\mathrm{visualizing\_soil}$, and the weather forecasting dataset (it is replaced by the correct weather forecasting dataset from TabReD [@rubachev2024tabred]).

-   $\mathrm{rl}$ from [@grinsztajn2022why]. We observed abnormal results on this dataset. Since the dataset is anonymized, a proper investigation was impossible, so we removed it to avoid confusion.

-   $\mathrm{yprop\_4\_1}$ from [@grinsztajn2022why]. Strictly speaking, this dataset was omitted due to a mistake on our side. For future work, we note that the typical performance gaps on this dataset have low absolute values in terms of RMSE. Perhaps, $R^2$ may be a more appropriate metric for this dataset.

```{=latex}
\begin{table*}[h!]\setlength\tabcolsep{2.2pt}
    \centering
    \caption{
        Properties of those datasets from \citet{gorishniy2023tabr}
        that are not part of \citet{grinsztajn2022why} or TabReD \citet{rubachev2024tabred}.
        ``\# Num'', ``\# Bin'', and ``\# Cat'' denote the number of numerical, binary, and categorical features, respectively.
        The table is taken from \citep{gorishniy2023tabr}.
    }
    \label{A:tab:default-datasets}
    \scalebox{0.9}{\begin{tabular}{llcccccclc}
\toprule
Name & \# Train & \# Validation & \# Test & \# Num & \# Bin & \# Cat & Task type & Batch size \\
\midrule
Churn Modelling & $6\,400$ & $1\,600$ & $2\,000$ & $7$ & $3$ & $1$ & Binclass & 128 \\
California Housing & $13\,209$ & $3\,303$ & $4\,128$ & $8$ & $0$ & $0$ & Regression & 256 \\
House 16H & $14\,581$ & $3\,646$ & $4\,557$ & $16$ & $0$ & $0$ & Regression & 256 \\
Adult & $26\,048$ & $6\,513$ & $16\,281$ & $6$ & $1$ & $8$ & Binclass & 256 \\
Diamond & $34\,521$ & $8\,631$ & $10\,788$ & $6$ & $0$ & $3$ & Regression & 512 \\
Otto Group Products & $39\,601$ & $9\,901$ & $12\,376$ & $93$ & $0$ & $0$ & Multiclass & 512 \\
Higgs Small & $62\,751$ & $15\,688$ & $19\,610$ & $28$ & $0$ & $0$ & Binclass & 512 \\
Black Friday & $106\,764$ & $26\,692$ & $33\,365$ & $4$ & $1$ & $4$ & Regression & 512 \\
Covertype & $371\,847$ & $92\,962$ & $116\,203$ & $10$ & $4$ & $1$ & Multiclass & 1024 \\
Microsoft & $723\,412$ & $235\,259$ & $241\,521$ & $131$ & $5$ & $0$ & Regression & 1024 \\
\bottomrule
\end{tabular}

}
\end{table*}
```
```{=latex}
\begin{table*}[h!]\setlength\tabcolsep{2.2pt}
    \centering
    \caption{
        Properties of the datasets from the TabReD benchmark \citep{rubachev2024tabred}.
        ``\# Num'', ``\# Bin'', and ``\# Cat'' denote the number of numerical, binary, and categorical features, respectively.
    }
    \label{A:tab:tabred-datasets}
    \scalebox{0.9}{\begin{tabular}{llcccccclc}
\toprule
Name & \# Train & \# Validation & \# Test & \# Num & \# Bin & \# Cat & Task type & Batch size \\
\midrule
Sberbank Housing & $18\,847$ & $4\,827$ & $4\,647$ & $365$ & $17$ & $10$ & Regression & 256 \\
Ecom Offers & $109\,341$ & $24\,261$ & $26\,455$ & $113$ & $6$ & $0$ & Binclass & 1024 \\
Maps Routing & $160\,019$ & $59\,975$ & $59\,951$ & $984$ & $0$ & $2$ & Regression & 1024 \\
Homesite Insurance & $224\,320$ & $20\,138$ & $16\,295$ & $253$ & $23$ & $23$ & Binclass & 1024 \\
Cooking Time & $227\,087$ & $51\,251$ & $41\,648$ & $186$ & $3$ & $3$ & Regression & 1024 \\
Homecredit Default & $267\,645$ & $58\,018$ & $56\,001$ & $612$ & $2$ & $82$ & Binclass & 1024 \\
Delivery ETA & $279\,415$ & $34\,174$ & $36\,927$ & $221$ & $1$ & $1$ & Regression & 1024 \\
Weather & $106\,764$ & $42\,359$ & $40\,840$ & $100$ & $3$ & $0$ & Regression & 1024 \\
\bottomrule
\end{tabular}

}
\end{table*}
```
Implementation details {#A:sec:implementation-details}
======================

Hardware {#A:sec:impl-hardware}
--------

Most of the experiments were conducted on a single NVIDIA A100 GPU. In rare exceptions, we used a machine with a single NVIDIA 2080 Ti GPU and Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz.

Experiment setup {#A:sec:impl-experiment-setup}
----------------

We mostly follow the experiment setup from [@gorishniy2023tabr]. As such, some of the text below is copied from [@gorishniy2023tabr].

**Data preprocessing.** For each dataset, for all DL-based solutions, the same preprocessing was used for fair comparison. For numerical features, by default, we used a slightly modified version of the quantile normalization from the Scikit-learn package [@pedregosa2011scikit] (see the source code), with rare exceptions when it turned out to be detrimental (for such datasets, we used the standard normalization or no normalization). For categorical features, we used one-hot encoding. Binary features (i.e. the ones that take only two distinct values) are mapped to $\{0,1\}$ without any further preprocessing. We completely follow [@rubachev2024tabred] on `\autoref{A:tab:tabred-datasets}`{=latex} datasets.
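A minimal sketch of this preprocessing in scikit-learn (illustrative only; the exact, slightly modified quantile normalization is available in the source code, and `make_preprocessor` is a hypothetical helper):

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, QuantileTransformer

def make_preprocessor(num_cols, cat_cols, n_train):
    """Illustrative sketch of the described default preprocessing:
    quantile normalization with a normal output distribution for
    numerical features and one-hot encoding for categorical ones."""
    return ColumnTransformer([
        ("num", QuantileTransformer(
            output_distribution="normal",
            n_quantiles=min(n_train, 1000)), num_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
    ])
```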

**Training neural networks.** For DL-based algorithms, we minimize cross-entropy for classification problems and mean squared error for regression problems. We use the AdamW optimizer [@loshchilov2019decoupled]. We do not apply learning rate schedules. We do not use data augmentations. We apply global gradient clipping to $1.0$. For each dataset, we used a predefined dataset-specific batch size. We continue training until there are $\texttt{patience}$ consecutive epochs without improvements on the validation set; we set $\texttt{patience} = 16$ for the DL models.
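The described protocol can be sketched as follows (an illustrative PyTorch snippet, not the actual training code; `train_with_patience` is a hypothetical name, and the `max_epochs` safety cap is an addition for this sketch):

```python
import torch

def train_with_patience(model, loss_fn, train_loader, val_loss_fn,
                        patience=16, lr=1e-3, max_epochs=10_000):
    """Sketch of the described protocol: AdamW, no LR schedule, global
    gradient clipping at 1.0, and early stopping after `patience`
    consecutive epochs without validation improvement."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    best_val, bad_epochs = float("inf"), 0
    for _ in range(max_epochs):
        for x, y in train_loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
            optimizer.step()
        val = val_loss_fn(model)
        if val < best_val:
            best_val, bad_epochs = val, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    return best_val
```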

**Hyperparameter tuning.** In most cases, hyperparameter tuning is performed with the TPE sampler (typically, 50--100 iterations) from the Optuna package [@akiba2019optuna]. The hyperparameter tuning spaces for most models are provided in the individual sections below (for example, for `\model`{=latex}: `\autoref{A:sec:impl-model}`{=latex}). Following [@rubachev2024tabred], we use $25$ iterations on some datasets from `\autoref{A:tab:tabred-datasets}`{=latex}.
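For illustration, a minimal sketch of such a tuning loop; note that plain random sampling stands in here for Optuna's TPE sampler, and the sampled space only mirrors the style of the spaces listed in the tables below:

```python
import math
import random

def tune(objective, n_iterations=100, seed=0):
    """Illustrative tuning loop; random sampling in place of Optuna's TPE.
    Lower objective value is better."""
    rng = random.Random(seed)
    best_value, best_config = float('inf'), None
    for _ in range(n_iterations):
        config = {
            'n_layers': rng.randint(1, 5),                   # UniformInt[1, 5]
            'width': rng.randint(64, 1024),                  # UniformInt[64, 1024]
            'dropout': 0.0 if rng.random() < 0.5 else rng.uniform(0.0, 0.5),
            'lr': math.exp(rng.uniform(math.log(1e-4), math.log(5e-3))),  # LogUniform
        }
        value = objective(config)  # train with `config`, return validation error
        if value < best_value:
            best_value, best_config = value, config
    return best_config
```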

**Evaluation.** On a given dataset, for a given model, the tuned hyperparameters are evaluated under multiple (in most cases, $15$) random seeds. The mean test metric and its standard deviation over these random seeds are then used to compare algorithms as described in `\autoref{A:sec:impl-metrics}`{=latex}.

Metrics {#A:sec:impl-metrics}
-------

We use Root Mean Squared Error (RMSE) for regression tasks, ROC-AUC for the classification datasets from `\autoref{A:tab:tabred-datasets}`{=latex} (following @rubachev2024tabred), and accuracy for the remaining datasets (following @gorishniy2023tabr). We also tried computing ROC-AUC for all classification datasets, but did not observe any significant changes (see `\autoref{A:fig:evaluation-roc-auc}`{=latex}), so we kept the metrics used in prior work. By default, the mean test score and its standard deviation are obtained by training a given model with tuned hyperparameters from scratch on a given dataset under 15 different random seeds.

**How we compute ranks.** Our method of computing ranks used in `\autoref{fig:performance}`{=latex} does not count small improvements as wins, hence the reduced range of ranks compared to other studies. Intuitively, our ranks can be considered as "tiers".

Recall that, on a given dataset, the performance of a given model A is expressed by the mean $\text{A}_\text{mean}$ and the standard deviation $\text{A}_\text{std}$ of the performance score over multiple random seeds. Assuming that a higher score is better, we define model A to be better than model B if: $\text{A}_\text{mean} - \text{A}_\text{std} > \text{B}_\text{mean}$. In other words, a model is considered better if it has a better mean score and the margin is larger than its standard deviation.

On a given dataset, when there are many models, we sort them in descending score order. Starting from the best model (with a rank equal to $1$) we iterate over models and assign the rank $1$ to all models that are no worse than the best model according to the above rule. The first model in descending order that is worse than the best model is assigned rank $2$ and becomes the new reference model. We continue the process until all models are ranked. Ranks are computed independently for each dataset.
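The ranking procedure above can be sketched as follows (an illustrative re-implementation, not the paper's exact code):

```python
def compute_ranks(means, stds):
    """Tier-style ranks for one dataset: model A beats model B iff
    A_mean - A_std > B_mean (higher score = better)."""
    order = sorted(range(len(means)), key=lambda i: means[i], reverse=True)
    ranks = [0] * len(means)
    reference, rank = order[0], 1  # the best model is the first reference
    for i in order:
        # The first model that the reference beats gets the next rank
        # and becomes the new reference.
        if means[reference] - stds[reference] > means[i]:
            rank += 1
            reference = i
        ranks[i] = rank
    return ranks
```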

Implementation details of `\autoref{sec:evaluation-efficiency}`{=latex} {#A:sec:impl-evaluation-efficiency}
-----------------------------------------------------------------------

**Applicability to large datasets.** The two datasets used in `\autoref{tab:large}`{=latex} are the *full* versions of the \`\`Weather'' and \`\`Maps Routing'' datasets from the TabReD benchmark [@rubachev2024tabred]. Their smaller versions with subsampled training sets were already included in `\autoref{tab:datasets}`{=latex} and were used when building `\autoref{fig:performance}`{=latex}. The validation and test sets are identical for the small and large versions of these datasets, so the task metrics are comparable between the two versions. When running models on the large versions, we reused the hyperparameters tuned on the small versions. Thus, this experiment can be seen as a quick assessment of the applicability of several tabular DL models to large datasets without a strong focus on task performance. All models except FT-Transformer were evaluated under $3$ random seeds; FT-Transformer was evaluated under $1$ random seed.

```{=latex}
\centering
```
![ Same as `\autoref{fig:performance}`{=latex}, but ROC-AUC is used as the metric for all classification datasets. The two multiclass datasets presented in our benchmark are not taken into account. ](figures/figure3-split-roc-appedndix.png){#A:fig:evaluation-roc-auc width="0.95\\linewidth"}

Implementation details of `\autoref{sec:analysis-optimization}`{=latex} {#A:sec:impl-analysis-optimization}
-----------------------------------------------------------------------

**Experiment setup.** This paragraph complements the description of the experiment setup in `\autoref{sec:analysis-optimization}`{=latex}. Namely, in addition to what is mentioned in the main text:

-   Dropout and weight decay are turned off.

-   To get representative training profiles for all models, the learning rates are tuned separately for `\modelminik{1}`{=latex} and `\modelminik{32}`{=latex} on validation sets using the usual metrics (i.e. RMSE or accuracy) as the guidance. The grid for learning rate tuning was: `\mbox{\texttt{numpy.logspace(numpy.log10(1e-5), numpy.log10(5e-3), num=25)}}`{=latex}.

Implementation details of `\autoref{sec:analysis-selecting-submodels}`{=latex} {#A:sec:impl-analysis-selecting-submodels}
------------------------------------------------------------------------------

**`\modelgreedyheads`{=latex}.** Here, we clarify the implementation details for `\modelgreedyheads`{=latex} described in `\autoref{sec:analysis-selecting-submodels}`{=latex}. `\modelgreedyheads`{=latex} is obtained from a trained `\model`{=latex} by greedily selecting submodels from `\model`{=latex} starting from the best one and stopping when two conditions are simultaneously true for the first time: (1) adding any new submodel does not improve the validation metric of the collective prediction; (2) the current validation metric is already better than that of the initial model with all $k$ submodels. To clarify, during the greedy selection, the $i$-th submodel is considered to be better than the $j$-th submodel if adding the $i$-th submodel to the aggregated prediction leads to a better validation metric (i.e. it is *not* the same as adding the submodels in the order of their individual validation metrics).
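Assuming predictions are averaged and a higher metric is better, the selection can be sketched as follows (`head_preds` and `metric_fn` are illustrative names, not identifiers from the actual code):

```python
def greedy_select_heads(head_preds, metric_fn):
    """Greedy submodel selection from a trained k-head model.
    `head_preds[i]`: validation predictions of head i;
    `metric_fn`: scores an averaged prediction (higher = better)."""
    k = len(head_preds)
    full_score = metric_fn(sum(head_preds) / k)  # all k heads together
    selected, score = [], float('-inf')
    while len(selected) < k:
        # Evaluate adding each remaining head to the aggregated prediction.
        gains = {
            i: metric_fn(sum(head_preds[j] for j in selected + [i]) / (len(selected) + 1))
            for i in range(k) if i not in selected
        }
        best_head = max(gains, key=gains.get)
        # Stop when (1) no addition improves the validation metric and
        # (2) the current selection already beats the full k-head model.
        if selected and gains[best_head] <= score and score > full_score:
            break
        selected.append(best_head)
        score = gains[best_head]
    return selected, score
```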

Implementation details of `\autoref{sec:analysis-k}`{=latex} {#A:sec:impl-analysis-k}
------------------------------------------------------------

`\autoref{fig:d-k-ablation}`{=latex} shows the mean percentage improvements (see `\autoref{A:sec:impl-metrics}`{=latex}) over MLP across $17$ datasets: all datasets except for Covertype from `\autoref{A:tab:default-datasets}`{=latex}, and all datasets from TabReD [@rubachev2024tabred]. We used a dropout rate of $0.1$ and tuned the learning rate separately for each value of $k$. The score on each dataset is averaged over $5$ seeds.

Non-linear embeddings for continuous features {#A:sec:impl-feature-embeddings}
---------------------------------------------

**Notation.** We use the notation based on $\dagger$ and $\ddagger$ only for brevity. Any other unambiguous notation can be used in future work.

**Updated piecewise-linear embeddings.** We use a slightly different implementation of the piecewise-linear embeddings compared to @gorishniy2022embeddings. Architecture-wise, our implementation corresponds to the \`\`Q-L'' and \`\`T-L'' variations from Table 2 in @gorishniy2022embeddings (we use the quantile-based bins for simplicity). In practice, our implementation is significantly faster and uses a different parametrization and initialization. See the source code for details.

**Other models.** Since it is not feasible to test all combinations of backbones and embeddings, for the baselines, we stick to the embeddings used in the original papers (this applies to TabR [@gorishniy2023tabr], ExcelFormer [@chen2023excelformer] and ModernNCA [@ye2024modern]). For all models with feature embeddings (including TabM, MLP, TabR, ModernNCA and ExcelFormer), the embedding-related details are described in the corresponding sections below.

`\model`{=latex} {#A:sec:impl-model}
----------------

**Feature embeddings.** `\modelminiemb`{=latex} and `\modelemb`{=latex} are the versions of `\model`{=latex} with non-linear feature embeddings. `\modelminiemb`{=latex} and `\modelemb`{=latex} use the updated piecewise-linear feature embeddings mentioned in `\autoref{A:sec:impl-feature-embeddings}`{=latex}.

`\autoref{A:tab:tabm-space}`{=latex} provides the hyperparameter tuning spaces for `\model`{=latex} and `\modelmini`{=latex}. `\autoref{A:tab:tabm-emb-space}`{=latex} provides the hyperparameter tuning spaces for `\modelemb`{=latex} and `\modelminiemb`{=latex}.

```{=latex}
\centering
```
```{=latex}
\renewcommand{\arraystretch}{1.2}
```
  Parameter              Distribution or Value
  ---------------------- -------------------------------------------------------- --
  $k$                    $32$
  \# layers              $\mathrm{UniformInt}[1,5]$
  Width (hidden size)    $\mathrm{UniformInt}[64,1024]$
  Dropout rate           $\{0.0, \mathrm{Uniform}[0.0,0.5]\}$
  Learning rate          $\mathrm{LogUniform}[1e\text{-}4, 5e\text{-}3]$
  Weight decay           $\{0, \mathrm{LogUniform}[1e\text{-}4, 1e\text{-}1]\}$
  \# Tuning iterations   \(A\) 100 (B) 50

  : The hyperparameter tuning space for `\model`{=latex} and `\modelmini`{=latex}. Here, (B) = {Covertype, Microsoft, `\autoref{A:tab:tabred-datasets}`{=latex}} and (A) contains all other datasets.

`\label{A:tab:tabm-space}`{=latex}

```{=latex}
\centering
```
```{=latex}
\renewcommand{\arraystretch}{1.2}
```
  Parameter              Distribution or Value
  ---------------------- -------------------------------------------------------- --
  $k$                    $32$
  \# layers              $\mathrm{UniformInt}[1,4]$
  Width (hidden size)    $\mathrm{UniformInt}[64,1024]$
  Dropout rate           $\{0.0, \mathrm{Uniform}[0.0,0.5]\}$
  \# PLE bins            $\mathrm{UniformInt}[8, 32]$
  Learning rate          $\mathrm{LogUniform}[5e\text{-}5, 3e\text{-}3]$
  Weight decay           $\{0, \mathrm{LogUniform}[1e\text{-}4, 1e\text{-}1]\}$
  \# Tuning iterations   \(A\) 100 (B) 50

  : The hyperparameter tuning space for `\modelminiemb`{=latex} and `\modelemb`{=latex}. Here, (B) = {Covertype, Microsoft, `\autoref{A:tab:tabred-datasets}`{=latex}} and (A) contains all other datasets.

`\label{A:tab:tabm-emb-space}`{=latex}

MLP
---

**Feature embeddings.** MLP^$\dagger$^ and MLP^$\ddagger$^ are the versions of MLP with non-linear feature embeddings. MLP^$\dagger$^ uses the updated piecewise-linear embeddings mentioned in `\autoref{A:sec:impl-feature-embeddings}`{=latex}. MLP^$\ddagger$^ (also known as MLP-PLR) uses the periodic embeddings [@gorishniy2022embeddings]; technically, it is the `PeriodicEmbeddings` class from the `rtdl_num_embeddings` Python package. We tested two variations: `lite=False` and `lite=True`. Only the former is reported in the paper; results for both are available in the source code.

`\autoref{A:tab:mlp-space}`{=latex}, `\autoref{A:tab:mlp-emb-space}`{=latex}, `\autoref{A:tab:mlp-plr-space}`{=latex} provide the hyperparameter tuning spaces for MLP, MLP^$\dagger$^ and MLP^$\ddagger$^, respectively.

```{=latex}
\centering
```
```{=latex}
\renewcommand{\arraystretch}{1.2}
```
  Parameter              Distribution
  ---------------------- -------------------------------------------------------- --
  \# layers              $\mathrm{UniformInt}[1,6]$
  Width (hidden size)    $\mathrm{UniformInt}[64,1024]$
  Dropout rate           $\{0.0, \mathrm{Uniform}[0.0,0.5]\}$
  Learning rate          $\mathrm{LogUniform}[3e\text{-}5, 1e\text{-}3]$
  Weight decay           $\{0, \mathrm{LogUniform}[1e\text{-}4, 1e\text{-}1]\}$
  \# Tuning iterations   100

  : The hyperparameter tuning space for MLP.

`\label{A:tab:mlp-space}`{=latex}

```{=latex}
\centering
```
```{=latex}
\renewcommand{\arraystretch}{1.2}
```
  Parameter              Distribution
  ---------------------- -------------------------------------------------------- --
  \# layers              $\mathrm{UniformInt}[1,5]$
  Width (hidden size)    $\mathrm{UniformInt}[64,1024]$
  Dropout rate           $\{0.0, \mathrm{Uniform}[0.0,0.5]\}$
  Learning rate          $\mathrm{LogUniform}[3e\text{-}5, 1e\text{-}3]$
  Weight decay           $\{0, \mathrm{LogUniform}[1e\text{-}4, 1e\text{-}1]\}$
  d\_embedding           $\mathrm{UniformInt}[8,32]$
  n\_bins                $\mathrm{UniformInt}[2,128]$
  \# Tuning iterations   100

  : The hyperparameter tuning space for $\mathrm{MLP}^{\dagger}$.

`\label{A:tab:mlp-emb-space}`{=latex}

```{=latex}
\centering
```
```{=latex}
\renewcommand{\arraystretch}{1.2}
```
  Parameter                Distribution
  ------------------------ -------------------------------------------------------- --
  \# layers                $\mathrm{UniformInt}[1,5]$
  Width (hidden size)      $\mathrm{UniformInt}[64,1024]$
  Dropout rate             $\{0.0, \mathrm{Uniform}[0.0,0.5]\}$
  Learning rate            $\mathrm{LogUniform}[3e\text{-}5, 1e\text{-}3]$
  Weight decay             $\{0, \mathrm{LogUniform}[1e\text{-}4, 1e\text{-}1]\}$
  n\_frequencies           $\mathrm{UniformInt}[16,96]$
  d\_embedding             $\mathrm{UniformInt}[16,32]$
  frequency\_init\_scale   $\mathrm{LogUniform}[1e\text{-}2, 1e\text{1}]$
  \# Tuning iterations     100

  : The hyperparameter tuning space for $\mathrm{MLP}^{\ddagger}$.

`\label{A:tab:mlp-plr-space}`{=latex}

TabR
----

**Feature embeddings.** TabR^$\ddagger$^ is the version of TabR with non-linear feature embeddings. TabR^$\ddagger$^ uses the periodic embeddings [@gorishniy2022embeddings], specifically, `PeriodicEmbeddings(lite=True)` from the `rtdl_num_embeddings` Python package on most datasets. On the datasets from `\autoref{A:tab:tabred-datasets}`{=latex}, TabR^$\ddagger$^ uses the `PeriodicEmbeddings(lite=True)` embeddings on the Sberbank Housing and Ecom Offers datasets, and `LinearReLUEmbeddings` on the rest (to fit the computations into the GPU memory, following the original TabR paper).

Since we follow the training and evaluation protocols from [@gorishniy2023tabr], and TabR was proposed in [@gorishniy2023tabr], we simply reuse the results for TabR. More details can be found in Appendix D of [@gorishniy2023tabr]. When tuning TabR^$\ddagger$^ on the datasets from `\autoref{A:tab:tabred-datasets}`{=latex}, we used $25$ tuning iterations and the same tuning space as for TabR in [@rubachev2024tabred].

FT-Transformer
--------------

We used the implementation from the `rtdl_revisiting_models` Python package. The results on the datasets from `\autoref{A:tab:tabred-datasets}`{=latex} were copied from [@rubachev2024tabred], because the experiment setups are compatible.

```{=latex}
\centering
```
```{=latex}
\renewcommand{\arraystretch}{1.2}
```
  Parameter                             Distribution or Value
  ------------------------------------- -------------------------------------------------------- --
  \# blocks                             $\mathrm{UniformInt}[1,4]$
  $d_{token}$                           $\mathrm{UniformInt}[16,384]$
  Attention dropout rate                $\mathrm{Uniform}[0.0,0.5]$
  FFN hidden dimension expansion rate   $\mathrm{Uniform}[\nicefrac{2}{3},\nicefrac{8}{3}]$
  FFN dropout rate                      $\mathrm{Uniform}[0.0,0.5]$
  Residual dropout rate                 $\{0.0, \mathrm{Uniform}[0.0,0.2] \}$
  Learning rate                         $\mathrm{LogUniform}[3e\text{-}5, 1e\text{-}3]$
  Weight decay                          $\{0, \mathrm{LogUniform}[1e\text{-}4, 1e\text{-}1]\}$
  \# Tuning iterations                  \(A\) 100 (B) 50

  :  The hyperparameter tuning space for FT-Transformer [@gorishniy2021revisiting]. Here, (B) = {Covertype, Microsoft} and (A) contains all other datasets (except `\autoref{A:tab:tabred-datasets}`{=latex}).

ModernNCA
---------

**Feature embeddings.** We adapted the official implementation of @ye2024modern. We used the periodic embeddings [@gorishniy2022embeddings] (specifically, `PeriodicEmbeddings(lite=True)` from the `rtdl_num_embeddings` Python package) for ModernNCA^$\ddagger$^ and no embeddings for ModernNCA. `\autoref{A:tab:mnca-space}`{=latex} and `\autoref{A:tab:mnca-emb-space}`{=latex} provide the hyperparameter tuning spaces for ModernNCA and ModernNCA^$\ddagger$^, respectively.

```{=latex}
\centering
```
```{=latex}
\renewcommand{\arraystretch}{1.2}
```
  Parameter              Distribution
  ---------------------- -------------------------------------------------------- --
  \# blocks              $\mathrm{UniformInt}[0, 2]$
  $d_{block}$            $\mathrm{UniformInt}[64,1024]$
  dim                    $\mathrm{UniformInt}[64,1024]$
  Dropout rate           $\mathrm{Uniform}[0.0,0.5]$
  Sample rate            $\mathrm{Uniform}[0.05, 0.6]$
  Learning rate          $\mathrm{LogUniform}[1e\text{-}5, 1e\text{-}1]$
  Weight decay           $\{0, \mathrm{LogUniform}[1e\text{-}6, 1e\text{-}3]\}$
  \# Tuning iterations   \(A\) 100 (B, C) 50

  :  The hyperparameter tuning space for ModernNCA. Here, (C) = {`\autoref{A:tab:tabred-datasets}`{=latex}}, (B) = {Covertype, Microsoft} and (A) contains all other datasets.

`\label{A:tab:mnca-space}`{=latex}

```{=latex}
\centering
```
```{=latex}
\renewcommand{\arraystretch}{1.2}
```
  Parameter                Distribution
  ------------------------ -------------------------------------------------------- --
  \# blocks                $\mathrm{UniformInt}[0, 2]$
  $d_{block}$              $\mathrm{UniformInt}[64,1024]$
  dim                      $\mathrm{UniformInt}[64,1024]$
  Dropout rate             $\mathrm{Uniform}[0.0,0.5]$
  Sample rate              $\mathrm{Uniform}[0.05, 0.6]$
  Learning rate            $\mathrm{LogUniform}[1e\text{-}5, 1e\text{-}1]$
  Weight decay             $\{0, \mathrm{LogUniform}[1e\text{-}6, 1e\text{-}3]\}$
  n\_frequencies           $\mathrm{UniformInt}[16, 96]$
  d\_embedding             $\mathrm{UniformInt}[16, 32]$
  frequency\_init\_scale   $\mathrm{LogUniform}[0.01, 10]$
  \# Tuning iterations     \(A\) 100 (B, C) 50

  :  The hyperparameter tuning space for ModernNCA^$\ddagger$^. Here, (C) = {`\autoref{A:tab:tabred-datasets}`{=latex}}, (B) = {Covertype, Microsoft} and (A) contains all other datasets.

`\label{A:tab:mnca-emb-space}`{=latex}

T2G-Former
----------

We adapted the implementation and hyperparameters of [@yan2023t2g] from the official repository[^2]. `\autoref{A:tab:t2g-space}`{=latex} provides the hyperparameter tuning space.

```{=latex}
\centering
```
```{=latex}
\renewcommand{\arraystretch}{1.2}
```
  Parameter                             Distribution or Value
  ------------------------------------- ---------------------------------------------------------------------- --
  \# blocks                             \(A\) $\mathrm{UniformInt}[3,4]$ (B, C) $\mathrm{UniformInt}[1,3]$
  $d_{token}$                           $\mathrm{UniformInt}[64,512]$
  Attention dropout rate                $\mathrm{Uniform}[0.0,0.5]$
  FFN hidden dimension expansion rate   (A, B) $\mathrm{Uniform}[\nicefrac{2}{3},\nicefrac{8}{3}]$ (C) $4/3$
  FFN dropout rate                      $\mathrm{Uniform}[0.0,0.5]$
  Residual dropout rate                 $\{0.0, \mathrm{Uniform}[0.0,0.2] \}$
  Learning rate                         $\mathrm{LogUniform}[3e\text{-}5, 1e\text{-}3]$
  Col. Learning rate                    $\mathrm{LogUniform}[5e\text{-}3, 5e\text{-}2]$
  Weight decay                          $\{0, \mathrm{LogUniform}[1e\text{-}6, 1e\text{-}1]\}$
  \# Tuning iterations                  \(A\) 100 (B) 50 (C) 25

  : The hyperparameter tuning space for T2G-Former [@yan2023t2g]. Here, (C) = {`\autoref{A:tab:tabred-datasets}`{=latex}}, (B) = {Covertype, Microsoft} and (A) contains all other datasets. Also, we used $50$ tuning iterations on some datasets from [@grinsztajn2022why].

`\label{A:tab:t2g-space}`{=latex}

SAINT
-----

We fully adopted the hyperparameters and protocol from [@gorishniy2023tabr] to evaluate SAINT on the benchmark from [@grinsztajn2022why]. The results on the datasets from `\autoref{A:tab:default-datasets}`{=latex} were taken directly from [@gorishniy2023tabr]. Additional details can be found in Appendix D of [@gorishniy2023tabr]. We used the default configuration on large datasets due to the very high cost of tuning (see `\autoref{A:tab:saint-hp}`{=latex}).

```{=latex}
\centering
```
```{=latex}
\renewcommand{\arraystretch}{1.2}
```
::: {#A:tab:saint-hp}
  Parameter                             Value
  ------------------------------------- --------------- --
  depth                                 $2$
  $d_{token}$                           $32$
  $n_{heads}$                           $4$
  $d_{head}$                            $8$
  Attention dropout rate                $0.1$
  FFN hidden dimension expansion rate   $1$
  FFN dropout rate                      $0.8$
  Learning rate                         $1e\text{-}4$
  Weight decay                          $1e\text{-}2$

  : The default hyperparameters for SAINT [@somepalli2021saint] on datasets from [@rubachev2024tabred].
:::

Excelformer
-----------

**Feature embeddings.** ExcelFormer [@chen2023excelformer] uses custom non-linear feature embeddings based on a GLU-style activation, see the original paper for details.

We adapted the implementation and hyperparameters of [@chen2023excelformer] from the official repository[^3]. For a fair comparison with other models, we did not use the augmentation techniques from the paper in our experiments. See `\autoref{A:tab:excel-space}`{=latex}.

```{=latex}
\centering
```
```{=latex}
\renewcommand{\arraystretch}{1.2}
```
  Parameter                Distribution or Value
  ------------------------ ------------------------------------------------------------------------------------------------- --
  \# blocks                (A, B) $\mathrm{UniformInt}[2,5]$ (C) $\mathrm{UniformInt}[2,4]$ (D) $\mathrm{UniformInt}[1,3]$
  $d_{token}$              (A, B) $\{32, 64, 128, 256\}$ (C) $\{16, 32, 64\}$ (D) $\{4, 8, 16, 32\}$
  $n_{heads}$              (A,B) $\{4, 8, 16, 32\}$ (C) $\{4, 8, 16\}$ (D) $4$
  Attention dropout rate   $0.3$
  FFN dropout rate         $0.0$
  Residual dropout rate    $\mathrm{Uniform}[0.0,0.5]$
  Learning rate            $\mathrm{LogUniform}[3e\text{-}5, 1e\text{-}3]$
  Weight decay             $\{0, \mathrm{LogUniform}[1e\text{-}4, 1e\text{-}1]\}$
  \# Tuning iterations     \(A\) 100 (B) 50 (C, D) 25

  : The hyperparameter tuning space for Excelformer [@chen2023excelformer]. Here, (D) = {Homecredit, Maps Routing}, (C) = {`\autoref{A:tab:tabred-datasets}`{=latex} w/o (D)}, (B) = {Covertype, Microsoft} and (A) contains all other datasets.

`\label{A:tab:excel-space}`{=latex}

CatBoost, XGBoost and LightGBM
------------------------------

Since our setup is taken directly from [@gorishniy2023tabr], we simply reused their results for GBDTs from the official repository[^4]. Importantly, in a series of preliminary experiments, we confirmed that those results are reproducible in our instance of their setup. The details can be found in Appendix D of [@gorishniy2023tabr]. The results on the datasets from `\autoref{A:tab:tabred-datasets}`{=latex} were copied from [@rubachev2024tabred].

AutoInt
-------

We used the implementation from [@gorishniy2021revisiting], which is an adaptation of the official implementation[^5].

```{=latex}
\centering
```
```{=latex}
\renewcommand{\arraystretch}{1.2}
```
  Parameter                Distribution
  ------------------------ -------------------------------------------------------- --
  \# blocks                $\mathrm{UniformInt}[1,6]$
  $d_{token}$              $\mathrm{UniformInt}[8,64]$
  $n_{heads}$              2
  Attention dropout rate   $\{0, \mathrm{Uniform}[0.0,0.5]\}$
  Embedding dropout rate   $\{0, \mathrm{Uniform}[0.0,0.5]\}$
  Learning rate            $\mathrm{LogUniform}[3e\text{-}5, 1e\text{-}3]$
  Weight decay             $\{0, \mathrm{LogUniform}[1e\text{-}4, 1e\text{-}1]\}$
  \# Tuning iterations     \(A\) 100 (B) 50

  : The hyperparameter tuning space for AutoInt [@song2019autoint]. Here, (B) = {Covertype, Microsoft} and (A) contains all other datasets.

```{=latex}
\vspace{1em}
```
### TabPFN

Since TabPFN supports at most 10K training samples, we use a different subsample of size 10K of the training set for each random seed. Also, TabPFN is not applicable to regression tasks or to datasets with more than $100$ features.
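One way to draw such per-seed subsamples (an illustrative sketch; see the source code for the exact procedure, and note that `subsample_indices` is a hypothetical helper name):

```python
import random

def subsample_indices(n_train, seed, limit=10_000):
    """A different training subsample of at most `limit` objects per seed."""
    if n_train <= limit:
        return list(range(n_train))
    # Deterministic per-seed sample without replacement.
    return random.Random(seed).sample(range(n_train), limit)
```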

```{=latex}
\newpage
```
Per-dataset results with standard deviations {#A:sec:per-dataset-results}
============================================

```{=latex}
\centering
```
```{=latex}
\newcommand{\topalign}[1]{%
\vtop{\vskip 0pt #1}}
```
```{=latex}
\begin{longtable}{p{0.5\textwidth}p{0.5\textwidth}}
\caption{
Extended results for the main benchmark.
Results are grouped by datasets.
One ensemble consists of five models trained independently under different random seeds.
}\\

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{churn \textuparrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$0.8553 \pm 0.0029$} & {\footnotesize$0.8582 \pm 0.0008$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & {\footnotesize$0.8624 \pm 0.0008$}\\
{\footnotesize $\mathrm{ResNet}$ } & {\footnotesize$0.8545 \pm 0.0044$} & {\footnotesize$0.8565 \pm 0.0035$}\\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$0.8567 \pm 0.0020$} & {\footnotesize$0.8570 \pm 0.0017$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$0.8506 \pm 0.0051$} & {\footnotesize$0.8533 \pm 0.0033$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$0.8600 \pm nan$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & {\footnotesize$0.8607 \pm 0.0047$} & {\footnotesize$0.8622 \pm 0.0003$}\\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & {\footnotesize$0.8592 \pm 0.0036$} & {\footnotesize$0.8630 \pm 0.0005$}\\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$0.8618 \pm 0.0023$} & {\footnotesize$0.8625 \pm nan$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$0.8603 \pm 0.0029$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$0.8593 \pm 0.0028$} & {\footnotesize$0.8598 \pm 0.0025$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$0.8613 \pm 0.0015$} & -- \\
{\footnotesize $\mathrm{MLP^{\ddagger-lite}}$ } & {\footnotesize$0.8624 \pm 0.0010$} & {\footnotesize$0.8638 \pm 0.0012$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$0.8624 \pm 0.0026$} & {\footnotesize$0.8640 \pm 0.0010$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$0.8580 \pm 0.0028$} & {\footnotesize$0.8605 \pm 0.0018$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$0.8605 \pm 0.0022$} & {\footnotesize$0.8608 \pm 0.0013$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$0.8600 \pm 0.0008$} & {\footnotesize$0.8600 \pm 0.0000$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$0.8582 \pm 0.0017$} & {\footnotesize$0.8588 \pm 0.0008$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$0.8599 \pm 0.0025$} & {\footnotesize$0.8620 \pm 0.0023$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$0.8625 \pm 0.0021$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$0.8595 \pm 0.0028$} & {\footnotesize$0.8615 \pm 0.0013$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$0.8606 \pm 0.0032$} & {\footnotesize$0.8607 \pm 0.0008$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$0.8613 \pm 0.0025$} & {\footnotesize$0.8615 \pm 0.0005$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$0.8605 \pm 0.0016$} & {\footnotesize$0.8612 \pm 0.0008$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$0.8609 \pm 0.0024$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$0.8633 \pm 0.0018$} & {\footnotesize$0.8638 \pm 0.0012$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$0.8606 \pm 0.0023$} & {\footnotesize$0.8630 \pm 0.0030$}\\
\bottomrule
\end{tabular}}

&

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{california \textdownarrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$0.4948 \pm 0.0058$} & {\footnotesize$0.4880 \pm 0.0022$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & -- \\
{\footnotesize $\mathrm{ResNet}$ } & {\footnotesize$0.4915 \pm 0.0031$} & {\footnotesize$0.4862 \pm 0.0017$}\\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$0.4971 \pm 0.0122$} & {\footnotesize$0.4779 \pm 0.0022$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$0.5033 \pm 0.0075$} & {\footnotesize$0.4933 \pm 0.0035$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$0.4579 \pm nan$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & {\footnotesize$0.4682 \pm 0.0063$} & {\footnotesize$0.4490 \pm 0.0028$}\\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & {\footnotesize$0.4746 \pm 0.0056$} & {\footnotesize$0.4509 \pm 0.0029$}\\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$0.4544 \pm 0.0048$} & {\footnotesize$0.4350 \pm nan$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$0.4680 \pm 0.0048$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$0.4635 \pm 0.0048$} & {\footnotesize$0.4515 \pm 0.0016$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$0.4640 \pm 0.0100$} & {\footnotesize$0.4462 \pm nan$}\\
{\footnotesize $\mathrm{MLP^{\ddagger-lite}}$ } & {\footnotesize$0.4652 \pm 0.0045$} & {\footnotesize$0.4549 \pm 0.0006$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$0.4597 \pm 0.0058$} & {\footnotesize$0.4482 \pm 0.0026$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$0.4530 \pm 0.0029$} & {\footnotesize$0.4491 \pm 0.0010$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$0.4327 \pm 0.0016$} & {\footnotesize$0.4316 \pm 0.0007$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$0.4352 \pm 0.0019$} & {\footnotesize$0.4339 \pm 0.0008$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$0.4294 \pm 0.0012$} & {\footnotesize$0.4265 \pm 0.0003$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$0.4030 \pm 0.0023$} & {\footnotesize$0.3964 \pm 0.0013$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$0.3998 \pm 0.0033$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$0.4239 \pm 0.0012$} & {\footnotesize$0.4231 \pm 0.0005$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$0.4142 \pm 0.0031$} & {\footnotesize$0.4071 \pm 0.0029$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$0.4509 \pm 0.0032$} & {\footnotesize$0.4490 \pm 0.0018$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$0.4414 \pm 0.0012$} & {\footnotesize$0.4402 \pm 0.0001$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$0.4413 \pm 0.0020$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$0.4479 \pm 0.0022$} & {\footnotesize$0.4461 \pm 0.0011$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$0.4275 \pm 0.0024$} & {\footnotesize$0.4244 \pm 0.0006$}\\
\bottomrule
\end{tabular}}

\\

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{house \textdownarrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$3.1117 \pm 0.0294$} & {\footnotesize$3.0706 \pm 0.0140$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & -- \\
{\footnotesize $\mathrm{ResNet}$ } & {\footnotesize$3.1143 \pm 0.0258$} & {\footnotesize$3.0706 \pm 0.0098$}\\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$3.3327 \pm 0.0878$} & {\footnotesize$3.1303 \pm 0.0410$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$3.2176 \pm 0.0376$} & {\footnotesize$3.1320 \pm 0.0155$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$3.0638 \pm nan$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & {\footnotesize$3.2157 \pm 0.0436$} & {\footnotesize$3.1261 \pm 0.0095$}\\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & {\footnotesize$3.1871 \pm 0.0519$} & {\footnotesize$3.0184 \pm 0.0086$}\\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$3.2460 \pm 0.0685$} & {\footnotesize$3.1097$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$3.2424 \pm 0.0595$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$3.1823 \pm 0.0460$} & {\footnotesize$3.0974 \pm 0.0334$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$3.1613 \pm 0.0320$} & {\footnotesize$3.0982$}\\
{\footnotesize $\mathrm{MLP^{\ddagger-lite}}$ } & {\footnotesize$3.0633 \pm 0.0248$} & {\footnotesize$3.0170 \pm 0.0070$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$3.0775 \pm 0.0336$} & {\footnotesize$3.0268 \pm 0.0170$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$3.0999 \pm 0.0351$} & {\footnotesize$3.0401 \pm 0.0071$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$3.1773 \pm 0.0102$} & {\footnotesize$3.1644 \pm 0.0068$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$3.1774 \pm 0.0087$} & {\footnotesize$3.1672 \pm 0.0050$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$3.1172 \pm 0.0125$} & {\footnotesize$3.1058 \pm 0.0022$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$3.0667 \pm 0.0403$} & {\footnotesize$2.9958 \pm 0.0270$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$3.1048 \pm 0.0410$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$3.0884 \pm 0.0286$} & {\footnotesize$3.0538 \pm 0.0072$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$3.0704 \pm 0.0388$} & {\footnotesize$3.0149 \pm 0.0308$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$3.0002 \pm 0.0182$} & {\footnotesize$2.9796 \pm 0.0024$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$3.0038 \pm 0.0097$} & {\footnotesize$2.9906 \pm 0.0026$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$3.0082 \pm 0.0184$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$3.0394 \pm 0.0139$} & {\footnotesize$3.0206 \pm 0.0128$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$2.9976 \pm 0.0196$} & {\footnotesize$2.9854 \pm 0.0076$}\\
\bottomrule
\end{tabular}}

&

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{adult \textuparrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$0.8540 \pm 0.0018$} & {\footnotesize$0.8559 \pm 0.0011$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & -- \\
{\footnotesize $\mathrm{ResNet}$ } & {\footnotesize$0.8554 \pm 0.0011$} & {\footnotesize$0.8562 \pm 0.0006$}\\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$0.8582 \pm 0.0011$} & {\footnotesize$0.8593 \pm 0.0002$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$0.8582 \pm 0.0009$} & {\footnotesize$0.8603 \pm 0.0012$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$0.8590$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & {\footnotesize$0.8592 \pm 0.0016$} & {\footnotesize$0.8612 \pm 0.0004$}\\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & {\footnotesize$0.8598 \pm 0.0013$} & {\footnotesize$0.8617 \pm 0.0002$}\\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$0.8613 \pm 0.0024$} & {\footnotesize$0.8641$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$0.8601 \pm 0.0019$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$0.8588 \pm 0.0015$} & {\footnotesize$0.8608 \pm 0.0011$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$0.8601 \pm 0.0011$} & {\footnotesize$0.8622$}\\
{\footnotesize $\mathrm{MLP^{\ddagger-lite}}$ } & {\footnotesize$0.8693 \pm 0.0007$} & {\footnotesize$0.8702 \pm 0.0006$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$0.8694 \pm 0.0011$} & {\footnotesize$0.8704 \pm 0.0008$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$0.8603 \pm 0.0009$} & {\footnotesize$0.8616 \pm 0.0006$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$0.8720 \pm 0.0006$} & {\footnotesize$0.8723 \pm 0.0002$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$0.8713 \pm 0.0007$} & {\footnotesize$0.8721 \pm 0.0004$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$0.8714 \pm 0.0012$} & {\footnotesize$0.8723 \pm 0.0007$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$0.8646 \pm 0.0022$} & {\footnotesize$0.8680 \pm 0.0019$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$0.8699 \pm 0.0011$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$0.8677 \pm 0.0018$} & {\footnotesize$0.8696 \pm 0.0003$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$0.8717 \pm 0.0008$} & {\footnotesize$0.8742 \pm 0.0006$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$0.8582 \pm 0.0011$} & {\footnotesize$0.8588 \pm 0.0003$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$0.8575 \pm 0.0008$} & {\footnotesize$0.8583 \pm 0.0004$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$0.8572 \pm 0.0010$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$0.8598 \pm 0.0011$} & {\footnotesize$0.8604 \pm 0.0000$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$0.8700 \pm 0.0007$} & {\footnotesize$0.8701 \pm 0.0003$}\\
\bottomrule
\end{tabular}}

\\

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{diamond \textdownarrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$0.1404 \pm 0.0012$} & {\footnotesize$0.1362 \pm 0.0003$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & -- \\
{\footnotesize $\mathrm{ResNet}$ } & {\footnotesize$0.1396 \pm 0.0029$} & {\footnotesize$0.1361 \pm 0.0011$}\\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$0.1420 \pm 0.0032$} & {\footnotesize$0.1374 \pm 0.0020$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$0.1473 \pm 0.0057$} & {\footnotesize$0.1424 \pm 0.0008$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$0.1391$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & {\footnotesize$0.1392 \pm 0.0014$} & {\footnotesize$0.1361 \pm 0.0004$}\\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & {\footnotesize$0.1400 \pm 0.0025$} & {\footnotesize$0.1378 \pm 0.0008$}\\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$0.1766 \pm 0.0023$} & {\footnotesize$0.1712$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$0.1369 \pm 0.0019$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$0.1376 \pm 0.0013$} & {\footnotesize$0.1360 \pm 0.0002$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$0.1372 \pm 0.0011$} & {\footnotesize$0.1346$}\\
{\footnotesize $\mathrm{MLP^{\ddagger-lite}}$ } & {\footnotesize$0.1342 \pm 0.0008$} & {\footnotesize$0.1325 \pm 0.0004$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$0.1337 \pm 0.0010$} & {\footnotesize$0.1317 \pm 0.0003$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$0.1323 \pm 0.0010$} & {\footnotesize$0.1301 \pm 0.0005$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$0.1368 \pm 0.0004$} & {\footnotesize$0.1363 \pm 0.0001$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$0.1359 \pm 0.0002$} & {\footnotesize$0.1358 \pm 0.0001$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$0.1335 \pm 0.0006$} & {\footnotesize$0.1327 \pm 0.0004$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$0.1327 \pm 0.0010$} & {\footnotesize$0.1311 \pm 0.0005$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$0.1333 \pm 0.0013$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$0.1370 \pm 0.0018$} & {\footnotesize$0.1348 \pm 0.0005$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$0.1327 \pm 0.0012$} & {\footnotesize$0.1315 \pm 0.0006$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$0.1342 \pm 0.0017$} & {\footnotesize$0.1327 \pm 0.0004$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$0.1310 \pm 0.0007$} & {\footnotesize$0.1307 \pm 0.0002$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$0.1309 \pm 0.0008$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$0.1323 \pm 0.0007$} & {\footnotesize$0.1317 \pm 0.0002$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$0.1315 \pm 0.0006$} & {\footnotesize$0.1312 \pm 0.0001$}\\
\bottomrule
\end{tabular}}

&

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{otto \textuparrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$0.8175 \pm 0.0022$} & {\footnotesize$0.8222 \pm 0.0007$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & {\footnotesize$0.7408 \pm 0.0028$}\\
{\footnotesize $\mathrm{ResNet}$ } & {\footnotesize$0.8174 \pm 0.0021$} & {\footnotesize$0.8198 \pm 0.0006$}\\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$0.8064 \pm 0.0021$} & {\footnotesize$0.8208 \pm 0.0023$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$0.8087 \pm 0.0020$} & {\footnotesize$0.8156 \pm 0.0013$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$0.8093$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & {\footnotesize$0.8050 \pm 0.0034$} & {\footnotesize$0.8111 \pm 0.0020$}\\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & {\footnotesize$0.8092 \pm 0.0040$} & {\footnotesize$0.8136 \pm 0.0010$}\\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$0.8102 \pm 0.0022$} & {\footnotesize$0.8220$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$0.8119 \pm 0.0018$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$0.8133 \pm 0.0033$} & {\footnotesize$0.8221 \pm 0.0013$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$0.8161 \pm 0.0019$} & {\footnotesize$0.8272$}\\
{\footnotesize $\mathrm{MLP^{\ddagger-lite}}$ } & {\footnotesize$0.8190 \pm 0.0021$} & {\footnotesize$0.8271 \pm 0.0015$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$0.8189 \pm 0.0015$} & {\footnotesize$0.8253 \pm 0.0000$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$0.8205 \pm 0.0021$} & {\footnotesize$0.8290 \pm 0.0006$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$0.8297 \pm 0.0011$} & {\footnotesize$0.8316 \pm 0.0008$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$0.8302 \pm 0.0009$} & {\footnotesize$0.8316 \pm 0.0013$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$0.8250 \pm 0.0013$} & {\footnotesize$0.8268 \pm 0.0002$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$0.8179 \pm 0.0022$} & {\footnotesize$0.8236 \pm 0.0009$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$0.8246 \pm 0.0018$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$0.8275 \pm 0.0012$} & {\footnotesize$0.8313 \pm 0.0006$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$0.8265 \pm 0.0015$} & {\footnotesize$0.8304 \pm 0.0006$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$0.8268 \pm 0.0014$} & {\footnotesize$0.8300 \pm 0.0007$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$0.8275 \pm 0.0014$} & {\footnotesize$0.8284 \pm 0.0005$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$0.8254 \pm 0.0022$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$0.8282 \pm 0.0014$} & {\footnotesize$0.8299 \pm 0.0005$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$0.8342 \pm 0.0012$} & {\footnotesize$0.8356 \pm 0.0004$}\\
\bottomrule
\end{tabular}}

\\

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{higgs-small \textuparrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$0.7180 \pm 0.0027$} & {\footnotesize$0.7192 \pm 0.0005$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & {\footnotesize$0.6727 \pm 0.0034$}\\
{\footnotesize $\mathrm{ResNet}$ } & {\footnotesize$0.7256 \pm 0.0020$} & {\footnotesize$0.7307 \pm 0.0001$}\\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$0.7164 \pm 0.0030$} & {\footnotesize$0.7237 \pm 0.0011$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$0.7142 \pm 0.0024$} & {\footnotesize$0.7171 \pm 0.0020$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$0.7262$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & {\footnotesize$0.7240 \pm 0.0028$} & {\footnotesize$0.7287 \pm 0.0008$}\\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & {\footnotesize$0.7248 \pm 0.0023$} & {\footnotesize$0.7334 \pm 0.0007$}\\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$0.7262 \pm 0.0017$} & {\footnotesize$0.7329$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$0.7236 \pm 0.0019$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$0.7281 \pm 0.0016$} & {\footnotesize$0.7334 \pm 0.0013$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$0.7352 \pm 0.0037$} & {\footnotesize$0.7400$}\\
{\footnotesize $\mathrm{MLP^{\ddagger-lite}}$ } & {\footnotesize$0.7260 \pm 0.0017$} & {\footnotesize$0.7304 \pm 0.0008$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$0.7261 \pm 0.0010$} & {\footnotesize$0.7270 \pm 0.0003$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$0.7210 \pm 0.0016$} & {\footnotesize$0.7252 \pm 0.0005$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$0.7246 \pm 0.0015$} & {\footnotesize$0.7264 \pm 0.0013$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$0.7256 \pm 0.0009$} & {\footnotesize$0.7263 \pm 0.0007$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$0.7260 \pm 0.0011$} & {\footnotesize$0.7273 \pm 0.0010$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$0.7223 \pm 0.0010$} & {\footnotesize$0.7257 \pm 0.0008$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$0.7294 \pm 0.0014$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$0.7263 \pm 0.0023$} & {\footnotesize$0.7292 \pm 0.0006$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$0.7300 \pm 0.0020$} & {\footnotesize$0.7348 \pm 0.0008$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$0.7383 \pm 0.0028$} & {\footnotesize$0.7409 \pm 0.0010$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$0.7394 \pm 0.0018$} & {\footnotesize$0.7409 \pm 0.0008$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$0.7392 \pm 0.0016$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$0.7338 \pm 0.0011$} & {\footnotesize$0.7345 \pm 0.0008$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$0.7361 \pm 0.0011$} & {\footnotesize$0.7383 \pm 0.0008$}\\
\bottomrule
\end{tabular}}

&

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{black-friday \textdownarrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$0.6955 \pm 0.0004$} & {\footnotesize$0.6942 \pm 0.0002$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & -- \\
{\footnotesize $\mathrm{ResNet}$ } & {\footnotesize$0.6929 \pm 0.0008$} & {\footnotesize$0.6907 \pm 0.0002$}\\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$0.6968 \pm 0.0013$} & {\footnotesize$0.6936 \pm 0.0007$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$0.6996 \pm 0.0013$} & {\footnotesize$0.6978 \pm 0.0004$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$0.6983$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & {\footnotesize$0.6994 \pm 0.0082$} & {\footnotesize$0.6927 \pm 0.0021$}\\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & {\footnotesize$0.6905 \pm 0.0021$} & {\footnotesize$0.6851 \pm 0.0011$}\\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$0.6947 \pm 0.0016$} & {\footnotesize$0.6908$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$0.6934 \pm 0.0009$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$0.6987 \pm 0.0192$} & {\footnotesize$0.6879 \pm 0.0023$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$0.6887 \pm 0.0046$} & {\footnotesize$0.6832$}\\
{\footnotesize $\mathrm{MLP^{\ddagger-lite}}$ } & {\footnotesize$0.6849 \pm 0.0006$} & {\footnotesize$0.6824 \pm 0.0002$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$0.6857 \pm 0.0004$} & {\footnotesize$0.6838 \pm 0.0002$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$0.6836 \pm 0.0006$} & {\footnotesize$0.6812 \pm 0.0002$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$0.6806 \pm 0.0001$} & {\footnotesize$0.6805 \pm 0.0000$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$0.6799 \pm 0.0003$} & {\footnotesize$0.6795 \pm 0.0001$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$0.6822 \pm 0.0003$} & {\footnotesize$0.6813 \pm 0.0002$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$0.6899 \pm 0.0004$} & {\footnotesize$0.6883 \pm 0.0002$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$0.6761 \pm 0.0009$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$0.6893 \pm 0.0004$} & {\footnotesize$0.6883 \pm 0.0000$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$0.6885 \pm 0.0007$} & {\footnotesize$0.6863 \pm 0.0003$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$0.6875 \pm 0.0015$} & {\footnotesize$0.6866 \pm 0.0003$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$0.6869 \pm 0.0004$} & {\footnotesize$0.6865 \pm 0.0001$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$0.6865 \pm 0.0005$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$0.6863 \pm 0.0006$} & {\footnotesize$0.6856 \pm 0.0003$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$0.6781 \pm 0.0004$} & {\footnotesize$0.6773 \pm 0.0001$}\\
\bottomrule
\end{tabular}}

\\

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{covtype2 \textuparrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$0.9630 \pm 0.0012$} & {\footnotesize$0.9664 \pm 0.0004$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & {\footnotesize$0.7606 \pm 0.0022$}\\
{\footnotesize $\mathrm{ResNet}$ } & {\footnotesize$0.9638 \pm 0.0005$} & {\footnotesize$0.9685 \pm 0.0003$}\\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$0.9622 \pm 0.0019$} & {\footnotesize$0.9673 \pm 0.0011$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$0.9636 \pm 0.0010$} & {\footnotesize$0.9677 \pm 0.0002$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$0.9286$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & {\footnotesize$0.9614 \pm 0.0016$} & {\footnotesize$0.9696 \pm 0.0005$}\\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & {\footnotesize$0.9663 \pm 0.0019$} & {\footnotesize$0.9699 \pm 0.0014$}\\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$0.9606 \pm 0.0018$} & {\footnotesize$0.9670$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$0.9669 \pm 0.0010$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$0.9698 \pm 0.0008$} & {\footnotesize$0.9731 \pm 0.0006$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$0.9668 \pm 0.0008$} & {\footnotesize$0.9708$}\\
{\footnotesize $\mathrm{MLP^{\ddagger-lite}}$ } & {\footnotesize$0.9690 \pm 0.0008$} & {\footnotesize$0.9721 \pm 0.0006$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$0.9713 \pm 0.0006$} & {\footnotesize$0.9758 \pm 0.0000$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$0.9697 \pm 0.0008$} & {\footnotesize$0.9721 \pm 0.0005$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$0.9710 \pm 0.0002$} & {\footnotesize$0.9713 \pm 0.0000$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$0.9709 \pm 0.0003$} & -- \\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$0.9670 \pm 0.0003$} & {\footnotesize$0.9680 \pm 0.0002$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$0.9737 \pm 0.0005$} & {\footnotesize$0.9745 \pm 0.0006$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$0.9752 \pm 0.0003$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$0.9724 \pm 0.0003$} & {\footnotesize$0.9729 \pm 0.0001$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$0.9747 \pm 0.0002$} & {\footnotesize$0.9747 \pm 0.0002$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$0.9712 \pm 0.0008$} & {\footnotesize$0.9729 \pm 0.0003$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$0.9735 \pm 0.0004$} & {\footnotesize$0.9743 \pm 0.0001$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$0.9730 \pm 0.0005$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$0.9710 \pm 0.0007$} & {\footnotesize$0.9727 \pm 0.0002$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$0.9755 \pm 0.0003$} & {\footnotesize$0.9762 \pm 0.0001$}\\
\bottomrule
\end{tabular}}

&

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{microsoft \textdownarrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$0.7475 \pm 0.0003$} & {\footnotesize$0.7460 \pm 0.0003$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & -- \\
{\footnotesize $\mathrm{ResNet}$ } & {\footnotesize$0.7472 \pm 0.0004$} & {\footnotesize$0.7452 \pm 0.0004$}\\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$0.7499 \pm 0.0003$} & {\footnotesize$0.7477 \pm 0.0001$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$0.7488 \pm 0.0004$} & {\footnotesize$0.7470 \pm 0.0001$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$0.7476$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & {\footnotesize$0.7482 \pm 0.0005$} & {\footnotesize$0.7455 \pm 0.0002$}\\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & {\footnotesize$0.7482 \pm 0.0008$} & {\footnotesize$0.7436 \pm 0.0001$}\\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$0.7479 \pm 0.0007$} & {\footnotesize$0.7442$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$0.7625 \pm 0.0066$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$0.7460 \pm 0.0007$} & {\footnotesize$0.7422 \pm 0.0004$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$0.7460 \pm 0.0006$} & {\footnotesize$0.7427$}\\
{\footnotesize $\mathrm{MLP^{\ddagger-lite}}$ } & {\footnotesize$0.7446 \pm 0.0002$} & {\footnotesize$0.7434 \pm 0.0002$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$0.7444 \pm 0.0003$} & {\footnotesize$0.7429 \pm 0.0001$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$0.7465 \pm 0.0005$} & {\footnotesize$0.7448 \pm 0.0001$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$0.7413 \pm 0.0001$} & {\footnotesize$0.7410 \pm 0.0000$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$0.7417 \pm 0.0001$} & {\footnotesize$0.7413 \pm 0.0000$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$0.7412 \pm 0.0001$} & {\footnotesize$0.7406 \pm 0.0000$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$0.7503 \pm 0.0006$} & {\footnotesize$0.7485 \pm 0.0002$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$0.7501 \pm 0.0005$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$0.7458 \pm 0.0003$} & {\footnotesize$0.7448 \pm 0.0002$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$0.7460 \pm 0.0008$} & {\footnotesize$0.7435 \pm 0.0004$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$0.7434 \pm 0.0003$} & {\footnotesize$0.7424 \pm 0.0001$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$0.7432 \pm 0.0004$} & {\footnotesize$0.7426 \pm 0.0001$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$0.7432 \pm 0.0004$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$0.7436 \pm 0.0002$} & {\footnotesize$0.7430 \pm 0.0002$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$0.7423 \pm 0.0002$} & {\footnotesize$0.7416 \pm 0.0001$}\\
\bottomrule
\end{tabular}}

\\
\end{longtable}
```
```{=latex}
\begin{longtable}{p{0.5\textwidth}p{0.5\textwidth}}
\caption{Extended results for the benchmark of \cite{grinsztajn2022why}. Results are grouped by dataset. One ensemble consists of five models trained independently with different random seeds. A value without a standard deviation corresponds to a single run.}\\

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{wine \textuparrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$0.7778 \pm 0.0153$} & {\footnotesize$0.7907 \pm 0.0117$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & {\footnotesize$0.7908 \pm 0.0063$}\\
{\footnotesize $\mathrm{ResNet}$ } & {\footnotesize$0.7710 \pm 0.0137$} & {\footnotesize$0.7839 \pm 0.0083$}\\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$0.7492 \pm 0.0147$} & {\footnotesize$0.7764 \pm 0.0095$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$0.7818 \pm 0.0143$} & {\footnotesize$0.7994 \pm 0.0097$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$0.7818 \pm 0.0081$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & {\footnotesize$0.7745 \pm 0.0144$} & {\footnotesize$0.7909 \pm 0.0160$}\\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & {\footnotesize$0.7769 \pm 0.0149$} & {\footnotesize$0.7950 \pm 0.0087$}\\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$0.7631 \pm 0.0171$} & {\footnotesize$0.7765 \pm 0.0121$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$0.7684 \pm 0.0144$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$0.7755 \pm 0.0133$} & {\footnotesize$0.7894 \pm 0.0083$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$0.7733 \pm 0.0118$} & {\footnotesize$0.7933 \pm 0.0137$}\\
{\footnotesize $\mathrm{MLP^{\ddagger-lite}}$ } & {\footnotesize$0.7803 \pm 0.0157$} & {\footnotesize$0.7964 \pm 0.0146$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$0.7733 \pm 0.0185$} & {\footnotesize$0.7856 \pm 0.0160$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$0.7814 \pm 0.0132$} & {\footnotesize$0.7919 \pm 0.0098$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$0.7949 \pm 0.0178$} & {\footnotesize$0.8010 \pm 0.0186$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$0.7890 \pm 0.0160$} & {\footnotesize$0.7929 \pm 0.0106$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$0.7994 \pm 0.0131$} & {\footnotesize$0.8057 \pm 0.0098$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$0.7936 \pm 0.0114$} & {\footnotesize$0.8055 \pm 0.0057$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$0.7804 \pm 0.0148$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$0.7911 \pm 0.0135$} & {\footnotesize$0.8005 \pm 0.0121$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$0.7867 \pm 0.0113$} & {\footnotesize$0.7953 \pm 0.0114$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$0.7961 \pm 0.0136$} & {\footnotesize$0.8011 \pm 0.0084$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$0.7943 \pm 0.0124$} & {\footnotesize$0.7985 \pm 0.0139$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$0.7879 \pm 0.0161$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$0.7890 \pm 0.0130$} & {\footnotesize$0.7937 \pm 0.0103$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$0.7839 \pm 0.0169$} & {\footnotesize$0.7917 \pm 0.0143$}\\
\bottomrule
\end{tabular}}

&

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{phoneme \textuparrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$0.8525 \pm 0.0126$} & {\footnotesize$0.8635 \pm 0.0099$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & {\footnotesize$0.8684 \pm 0.0050$}\\
{\footnotesize $\mathrm{ResNet}$ } & {\footnotesize$0.8456 \pm 0.0121$} & {\footnotesize$0.8504 \pm 0.0066$}\\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$0.8342 \pm 0.0151$} & {\footnotesize$0.8543 \pm 0.0118$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$0.8596 \pm 0.0124$} & {\footnotesize$0.8687 \pm 0.0080$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$0.8465 \pm 0.0205$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & {\footnotesize$0.8623 \pm 0.0138$} & {\footnotesize$0.8754 \pm 0.0095$}\\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & {\footnotesize$0.8629 \pm 0.0123$} & {\footnotesize$0.8757 \pm 0.0095$}\\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$0.8551 \pm 0.0092$} & {\footnotesize$0.8711 \pm 0.0081$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$0.8657 \pm 0.0130$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$0.8667 \pm 0.0127$} & {\footnotesize$0.8795 \pm 0.0093$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$0.8672 \pm 0.0166$} & {\footnotesize$0.8765 \pm 0.0141$}\\
{\footnotesize $\mathrm{MLP^{\ddagger-lite}}$ } & {\footnotesize$0.8742 \pm 0.0120$} & {\footnotesize$0.8861 \pm 0.0071$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$0.8757 \pm 0.0118$} & {\footnotesize$0.8856 \pm 0.0065$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$0.8647 \pm 0.0098$} & {\footnotesize$0.8761 \pm 0.0076$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$0.8682 \pm 0.0174$} & {\footnotesize$0.8771 \pm 0.0156$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$0.8702 \pm 0.0129$} & {\footnotesize$0.8733 \pm 0.0126$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$0.8827 \pm 0.0117$} & {\footnotesize$0.8897 \pm 0.0055$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$0.8781 \pm 0.0096$} & {\footnotesize$0.8840 \pm 0.0054$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$0.8772 \pm 0.0087$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$0.8835 \pm 0.0079$} & {\footnotesize$0.8861 \pm 0.0057$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$0.8828 \pm 0.0082$} & {\footnotesize$0.8925 \pm 0.0056$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$0.8701 \pm 0.0167$} & {\footnotesize$0.8766 \pm 0.0128$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$0.8831 \pm 0.0121$} & {\footnotesize$0.8880 \pm 0.0108$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$0.8762 \pm 0.0144$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$0.8803 \pm 0.0098$} & {\footnotesize$0.8842 \pm 0.0067$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$0.8780 \pm 0.0119$} & {\footnotesize$0.8817 \pm 0.0101$}\\
\bottomrule
\end{tabular}}

\\

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{analcatdata\_supreme \textdownarrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$0.0782 \pm 0.0081$} & {\footnotesize$0.0766 \pm 0.0090$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & -- \\
{\footnotesize $\mathrm{ResNet}$ } & {\footnotesize$0.0852 \pm 0.0076$} & {\footnotesize$0.0823 \pm 0.0078$}\\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$0.0811 \pm 0.0137$} & {\footnotesize$0.0759 \pm 0.0086$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$0.0826 \pm 0.0096$} & {\footnotesize$0.0779 \pm 0.0098$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$0.0782 \pm 0.0095$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & {\footnotesize$0.0783 \pm 0.0078$} & {\footnotesize$0.0768 \pm 0.0083$}\\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & {\footnotesize$0.0770 \pm 0.0082$} & {\footnotesize$0.0759 \pm 0.0081$}\\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$0.0796 \pm 0.0101$} & {\footnotesize$0.0776 \pm 0.0101$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$0.0773 \pm 0.0078$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$0.0787 \pm 0.0086$} & {\footnotesize$0.0775 \pm 0.0091$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$0.0775 \pm 0.0081$} & {\footnotesize$0.0763 \pm 0.0084$}\\
{\footnotesize $\mathrm{MLP^{\ddagger-lite}}$ } & {\footnotesize$0.0798 \pm 0.0088$} & {\footnotesize$0.0769 \pm 0.0092$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$0.0786 \pm 0.0073$} & {\footnotesize$0.0720 \pm 0.0053$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$0.0774 \pm 0.0064$} & {\footnotesize$0.0759 \pm 0.0063$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$0.0801 \pm 0.0126$} & {\footnotesize$0.0774 \pm 0.0107$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$0.0778 \pm 0.0115$} & {\footnotesize$0.0767 \pm 0.0110$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$0.0780 \pm 0.0067$} & {\footnotesize$0.0734 \pm 0.0022$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$0.0803 \pm 0.0066$} & {\footnotesize$0.0759 \pm 0.0046$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$0.0807 \pm 0.0088$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$0.0809 \pm 0.0072$} & {\footnotesize$0.0784 \pm 0.0062$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$0.0825 \pm 0.0090$} & {\footnotesize$0.0793 \pm 0.0072$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$0.0777 \pm 0.0099$} & {\footnotesize$0.0769 \pm 0.0105$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$0.0786 \pm 0.0055$} & {\footnotesize$0.0781 \pm 0.0054$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$0.0808 \pm 0.0063$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$0.0773 \pm 0.0077$} & {\footnotesize$0.0763 \pm 0.0077$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$0.0764 \pm 0.0071$} & {\footnotesize$0.0749 \pm 0.0076$}\\
\bottomrule
\end{tabular}}

&

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{Mercedes\_Benz\_Greener\_Manufacturing \textdownarrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$8.3045 \pm 0.8708$} & {\footnotesize$8.2682 \pm 0.8992$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & -- \\
{\footnotesize $\mathrm{ResNet}$ } & {\footnotesize$8.4434 \pm 0.7982$} & {\footnotesize$8.3178 \pm 0.8482$}\\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$8.3540 \pm 0.8314$} & {\footnotesize$8.3021 \pm 0.8579$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$8.2718 \pm 0.8152$} & {\footnotesize$8.2236 \pm 0.8479$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$8.3409 \pm 0.9840$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & {\footnotesize$8.4001 \pm 0.9256$} & {\footnotesize$8.3237 \pm 0.9658$}\\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & {\footnotesize$8.2860 \pm 0.8656$} & {\footnotesize$8.2398 \pm 0.9023$}\\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$8.2244 \pm 0.8514$} & {\footnotesize$8.1918 \pm 0.9387$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$8.3556 \pm 0.9566$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$8.2252 \pm 0.8617$} & {\footnotesize$8.1616 \pm 0.8834$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$8.2120 \pm 0.8485$} & {\footnotesize$8.1654 \pm 0.9339$}\\
{\footnotesize $\mathrm{MLP^{\ddagger-lite}}$ } & {\footnotesize$8.3045 \pm 0.8708$} & {\footnotesize$8.2682 \pm 0.8992$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$8.3045 \pm 0.8708$} & {\footnotesize$8.2682 \pm 0.8992$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$8.3045 \pm 0.8708$} & {\footnotesize$8.2682 \pm 0.8992$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$8.2177 \pm 0.8175$} & {\footnotesize$8.2092 \pm 0.8458$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$8.2078 \pm 0.8231$} & {\footnotesize$8.1618 \pm 0.8566$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$8.1629 \pm 0.8193$} & {\footnotesize$8.1554 \pm 0.8439$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$8.3506 \pm 0.8149$} & {\footnotesize$8.2694 \pm 0.8399$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$8.3187 \pm 0.8186$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$8.2557 \pm 0.8602$} & {\footnotesize$8.1771 \pm 0.8710$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$8.2557 \pm 0.8602$} & {\footnotesize$8.1771 \pm 0.8710$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$8.2215 \pm 0.8940$} & {\footnotesize$8.1995 \pm 0.9130$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$8.2052 \pm 0.9043$} & {\footnotesize$8.1965 \pm 0.9306$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$8.2235 \pm 0.8867$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$8.2075 \pm 0.9185$} & {\footnotesize$8.1986 \pm 0.9442$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$8.2075 \pm 0.9185$} & {\footnotesize$8.1986 \pm 0.9442$}\\
\bottomrule
\end{tabular}}

\\

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{KDDCup09\_upselling \textuparrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$0.7759 \pm 0.0137$} & {\footnotesize$0.7806 \pm 0.0125$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & -- \\
{\footnotesize $\mathrm{ResNet}$ } & {\footnotesize$0.7811 \pm 0.0124$} & {\footnotesize$0.7861 \pm 0.0109$}\\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$0.7850 \pm 0.0161$} & {\footnotesize$0.7884 \pm 0.0135$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$0.7884 \pm 0.0122$} & {\footnotesize$0.7940 \pm 0.0116$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$0.7994 \pm 0.0055$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & {\footnotesize$0.8004 \pm 0.0075$} & {\footnotesize$0.8037 \pm 0.0063$}\\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & {\footnotesize$0.7979 \pm 0.0105$} & {\footnotesize$0.8010 \pm 0.0094$}\\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$0.7903 \pm 0.0074$} & {\footnotesize$0.7939 \pm 0.0099$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$0.7942 \pm 0.0112$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$0.7957 \pm 0.0127$} & {\footnotesize$0.7960 \pm 0.0139$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$0.8037 \pm 0.0100$} & {\footnotesize$0.7988 \pm 0.0084$}\\
{\footnotesize $\mathrm{MLP^{\ddagger-lite}}$ } & {\footnotesize$0.7962 \pm 0.0093$} & {\footnotesize$0.7995 \pm 0.0105$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$0.8005 \pm 0.0097$} & {\footnotesize$0.8032 \pm 0.0117$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$0.7925 \pm 0.0123$} & {\footnotesize$0.7963 \pm 0.0089$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$0.7930 \pm 0.0108$} & {\footnotesize$0.7950 \pm 0.0102$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$0.7932 \pm 0.0119$} & {\footnotesize$0.7969 \pm 0.0115$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$0.7992 \pm 0.0117$} & {\footnotesize$0.8010 \pm 0.0121$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$0.7838 \pm 0.0136$} & {\footnotesize$0.7859 \pm 0.0167$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$0.7908 \pm 0.0123$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$0.7939 \pm 0.0097$} & {\footnotesize$0.7989 \pm 0.0115$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$0.7960 \pm 0.0131$} & {\footnotesize$0.8008 \pm 0.0110$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$0.8002 \pm 0.0103$} & {\footnotesize$0.8021 \pm 0.0074$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$0.8024 \pm 0.0111$} & {\footnotesize$0.8054 \pm 0.0123$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$0.7988 \pm 0.0118$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$0.7971 \pm 0.0117$} & {\footnotesize$0.7982 \pm 0.0107$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$0.8024 \pm 0.0075$} & {\footnotesize$0.8035 \pm 0.0088$}\\
\bottomrule
\end{tabular}}

&

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{kdd\_ipums\_la\_97-small \textuparrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$0.8828 \pm 0.0061$} & {\footnotesize$0.8845 \pm 0.0055$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & {\footnotesize$0.8578 \pm 0.0046$}\\
{\footnotesize $\mathrm{ResNet}$ } & {\footnotesize$0.8823 \pm 0.0070$} & {\footnotesize$0.8824 \pm 0.0060$}\\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$0.8770 \pm 0.0072$} & {\footnotesize$0.8824 \pm 0.0068$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$0.8722 \pm 0.0093$} & {\footnotesize$0.8733 \pm 0.0083$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$0.8847 \pm 0.0070$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & {\footnotesize$0.8808 \pm 0.0083$} & {\footnotesize$0.8830 \pm 0.0081$}\\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & {\footnotesize$0.8762 \pm 0.0100$} & {\footnotesize$0.8770 \pm 0.0088$}\\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$0.8803 \pm 0.0054$} & {\footnotesize$0.8823 \pm 0.0071$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$0.8837 \pm 0.0055$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$0.8795 \pm 0.0077$} & {\footnotesize$0.8792 \pm 0.0062$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$0.8833 \pm 0.0054$} & {\footnotesize$0.8841 \pm 0.0062$}\\
{\footnotesize $\mathrm{MLP^{\ddagger-lite}}$ } & {\footnotesize$0.8765 \pm 0.0108$} & {\footnotesize$0.8765 \pm 0.0108$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$0.8816 \pm 0.0057$} & {\footnotesize$0.8818 \pm 0.0048$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$0.8757 \pm 0.0101$} & {\footnotesize$0.8756 \pm 0.0104$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$0.8825 \pm 0.0089$} & {\footnotesize$0.8835 \pm 0.0085$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$0.8792 \pm 0.0075$} & {\footnotesize$0.8802 \pm 0.0067$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$0.8793 \pm 0.0088$} & {\footnotesize$0.8803 \pm 0.0100$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$0.8798 \pm 0.0081$} & {\footnotesize$0.8819 \pm 0.0078$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$0.8831 \pm 0.0050$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$0.8819 \pm 0.0054$} & {\footnotesize$0.8832 \pm 0.0048$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$0.8837 \pm 0.0062$} & {\footnotesize$0.8860 \pm 0.0059$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$0.8845 \pm 0.0063$} & {\footnotesize$0.8848 \pm 0.0070$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$0.8823 \pm 0.0079$} & {\footnotesize$0.8825 \pm 0.0071$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$0.8818 \pm 0.0082$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$0.8784 \pm 0.0123$} & {\footnotesize$0.8786 \pm 0.0133$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$0.8779 \pm 0.0094$} & {\footnotesize$0.8784 \pm 0.0108$}\\
\bottomrule
\end{tabular}}

\\

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{wine\_quality \textdownarrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$0.6707 \pm 0.0178$} & {\footnotesize$0.6530 \pm 0.0152$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & -- \\
{\footnotesize $\mathrm{ResNet}$ } & {\footnotesize$0.6687 \pm 0.0166$} & {\footnotesize$0.6543 \pm 0.0170$}\\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$0.7010 \pm 0.0171$} & {\footnotesize$0.6699 \pm 0.0139$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$0.6604 \pm 0.0174$} & {\footnotesize$0.6245 \pm 0.0140$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$0.6605 \pm 0.0153$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & {\footnotesize$0.6840 \pm 0.0126$} & {\footnotesize$0.6478 \pm 0.0146$}\\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & {\footnotesize$0.6672 \pm 0.0263$} & {\footnotesize$0.6294 \pm 0.0200$}\\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$0.6881 \pm 0.0182$} & {\footnotesize$0.6664 \pm 0.0179$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$0.6797 \pm 0.0161$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$0.6787 \pm 0.0149$} & {\footnotesize$0.6564 \pm 0.0250$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$0.6783 \pm 0.0170$} & {\footnotesize$0.6570 \pm 0.0273$}\\
{\footnotesize $\mathrm{MLP^{\ddagger-lite}}$ } & {\footnotesize$0.6569 \pm 0.0167$} & {\footnotesize$0.6328 \pm 0.0155$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$0.6532 \pm 0.0133$} & {\footnotesize$0.6336 \pm 0.0140$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$0.6721 \pm 0.0180$} & {\footnotesize$0.6463 \pm 0.0262$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$0.6039 \pm 0.0134$} & {\footnotesize$0.6025 \pm 0.0139$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$0.6135 \pm 0.0138$} & {\footnotesize$0.6122 \pm 0.0144$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$0.6088 \pm 0.0132$} & {\footnotesize$0.6060 \pm 0.0137$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$0.6315 \pm 0.0097$} & {\footnotesize$0.6197 \pm 0.0096$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$0.6412 \pm 0.0105$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$0.6154 \pm 0.0083$} & {\footnotesize$0.6058 \pm 0.0149$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$0.6099 \pm 0.0144$} & {\footnotesize$0.6028 \pm 0.0157$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$0.6169 \pm 0.0123$} & {\footnotesize$0.6131 \pm 0.0126$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$0.6328 \pm 0.0172$} & {\footnotesize$0.6297 \pm 0.0180$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$0.6369 \pm 0.0179$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$0.6314 \pm 0.0142$} & {\footnotesize$0.6272 \pm 0.0146$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$0.6294 \pm 0.0120$} & {\footnotesize$0.6241 \pm 0.0118$}\\
\bottomrule
\end{tabular}}

&

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{isolet \textdownarrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$2.2744 \pm 0.2203$} & {\footnotesize$2.0018 \pm 0.1111$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & -- \\
{\footnotesize $\mathrm{ResNet}$ } & {\footnotesize$2.2077 \pm 0.2248$} & {\footnotesize$1.9206 \pm 0.1478$}\\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$2.2449 \pm 0.1579$} & {\footnotesize$2.0176 \pm 0.0770$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$2.4269 \pm 0.2382$} & {\footnotesize$2.1142 \pm 0.1262$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$2.6219 \pm 0.0315$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & {\footnotesize$2.6130 \pm 0.1658$} & {\footnotesize$2.3308 \pm 0.1088$}\\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & {\footnotesize$2.3344 \pm 0.2073$} & {\footnotesize$2.0915 \pm 0.1159$}\\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$2.8691 \pm 0.0882$} & {\footnotesize$2.5989 \pm 0.0664$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$2.7696 \pm 0.0200$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$2.4879 \pm 0.2524$} & {\footnotesize$2.1501 \pm 0.1506$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$2.2867 \pm 0.2489$} & {\footnotesize$1.9179 \pm 0.1530$}\\
{\footnotesize $\mathrm{MLP^{\ddagger-lite}}$ } & {\footnotesize$2.2719 \pm 0.1006$} & {\footnotesize$2.1026 \pm 0.1088$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$2.1832 \pm 0.1124$} & {\footnotesize$2.0775 \pm 0.0805$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$2.0979 \pm 0.1779$} & {\footnotesize$1.9283 \pm 0.1334$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$2.7567 \pm 0.0470$} & {\footnotesize$2.7294 \pm 0.0366$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$2.7005 \pm 0.0296$} & {\footnotesize$2.6903 \pm 0.0290$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$2.8847 \pm 0.0227$} & {\footnotesize$2.8574 \pm 0.0148$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$1.9760 \pm 0.1738$} & {\footnotesize$1.7627 \pm 0.1520$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$1.9919 \pm 0.1813$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$1.7905 \pm 0.1594$} & {\footnotesize$1.6205 \pm 0.1676$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$1.8912 \pm 0.1851$} & {\footnotesize$1.7147 \pm 0.1348$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$1.8831 \pm 0.1194$} & {\footnotesize$1.8578 \pm 0.1088$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$1.8433 \pm 0.1196$} & {\footnotesize$1.8230 \pm 0.1197$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$1.9091 \pm 0.1345$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$1.9421 \pm 0.0971$} & {\footnotesize$1.9013 \pm 0.0813$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$1.7799 \pm 0.0859$} & {\footnotesize$1.7560 \pm 0.0795$}\\
\bottomrule
\end{tabular}}

\\

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{cpu\_act \textdownarrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$2.6814 \pm 0.2291$} & {\footnotesize$2.4953 \pm 0.1150$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & -- \\
{\footnotesize $\mathrm{ResNet}$ } & {\footnotesize$2.3933 \pm 0.0641$} & {\footnotesize$2.3005 \pm 0.0397$}\\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$2.7868 \pm 0.1999$} & {\footnotesize$2.4884 \pm 0.0327$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$2.5811 \pm 0.1480$} & {\footnotesize$2.3863 \pm 0.0324$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$2.2133 \pm 0.0221$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & {\footnotesize$2.2537 \pm 0.0536$} & {\footnotesize$2.1708 \pm 0.0349$}\\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & {\footnotesize$2.3079 \pm 0.0829$} & {\footnotesize$2.1831 \pm 0.0470$}\\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$2.3094 \pm 0.2401$} & {\footnotesize$2.1411 \pm 0.0767$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$2.2781 \pm 0.0630$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$2.2394 \pm 0.0508$} & {\footnotesize$2.1494 \pm 0.0268$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$2.2111 \pm 0.0413$} & {\footnotesize$2.1330 \pm 0.0316$}\\
{\footnotesize $\mathrm{MLP^{\ddagger-lite}}$ } & {\footnotesize$2.2730 \pm 0.0457$} & {\footnotesize$2.1899 \pm 0.0419$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$2.2671 \pm 0.0383$} & {\footnotesize$2.1940 \pm 0.0433$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$2.3309 \pm 0.0719$} & {\footnotesize$2.2516 \pm 0.0574$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$2.5237 \pm 0.3530$} & {\footnotesize$2.4723 \pm 0.3789$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$2.2223 \pm 0.0894$} & {\footnotesize$2.2067 \pm 0.0916$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$2.1239 \pm 0.0489$} & {\footnotesize$2.1092 \pm 0.0499$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$2.2980 \pm 0.0529$} & {\footnotesize$2.2228 \pm 0.0501$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$2.1278 \pm 0.0783$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$2.2603 \pm 0.0479$} & {\footnotesize$2.2339 \pm 0.0508$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$2.2105 \pm 0.0483$} & {\footnotesize$2.1396 \pm 0.0474$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$2.1940 \pm 0.0523$} & {\footnotesize$2.1677 \pm 0.0487$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$2.1402 \pm 0.0588$} & {\footnotesize$2.1265 \pm 0.0580$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$2.1549 \pm 0.0626$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$2.1638 \pm 0.0420$} & {\footnotesize$2.1508 \pm 0.0416$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$2.1391 \pm 0.0542$} & {\footnotesize$2.1221 \pm 0.0570$}\\
\bottomrule
\end{tabular}}

&

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{bank-marketing \textuparrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$0.7860 \pm 0.0057$} & {\footnotesize$0.7887 \pm 0.0052$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & {\footnotesize$0.7894 \pm 0.0091$}\\
{\footnotesize $\mathrm{ResNet}$ } & {\footnotesize$0.7921 \pm 0.0076$} & {\footnotesize$0.7932 \pm 0.0066$}\\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$0.7859 \pm 0.0068$} & {\footnotesize$0.7917 \pm 0.0078$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$0.7836 \pm 0.0074$} & {\footnotesize$0.7882 \pm 0.0054$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$0.7975 \pm 0.0080$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & {\footnotesize$0.7917 \pm 0.0071$} & {\footnotesize$0.7956 \pm 0.0058$}\\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & {\footnotesize$0.7954 \pm 0.0059$} & {\footnotesize$0.8001 \pm 0.0048$}\\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$0.7957 \pm 0.0090$} & {\footnotesize$0.7985 \pm 0.0106$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$0.7953 \pm 0.0058$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$0.7918 \pm 0.0076$} & {\footnotesize$0.7951 \pm 0.0071$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$0.7918 \pm 0.0058$} & {\footnotesize$0.7955 \pm 0.0047$}\\
{\footnotesize $\mathrm{MLP^{\ddagger-lite}}$ } & {\footnotesize$0.7947 \pm 0.0101$} & {\footnotesize$0.7977 \pm 0.0117$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$0.7988 \pm 0.0092$} & {\footnotesize$0.8024 \pm 0.0093$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$0.7981 \pm 0.0065$} & {\footnotesize$0.8008 \pm 0.0057$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$0.8013 \pm 0.0081$} & {\footnotesize$0.8030 \pm 0.0076$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$0.8006 \pm 0.0078$} & {\footnotesize$0.8013 \pm 0.0072$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$0.8026 \pm 0.0068$} & {\footnotesize$0.8056 \pm 0.0082$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$0.7995 \pm 0.0054$} & {\footnotesize$0.8015 \pm 0.0037$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$0.8023 \pm 0.0088$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$0.7961 \pm 0.0065$} & {\footnotesize$0.8003 \pm 0.0077$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$0.7977 \pm 0.0081$} & {\footnotesize$0.8010 \pm 0.0084$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$0.7908 \pm 0.0068$} & {\footnotesize$0.7915 \pm 0.0068$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$0.7944 \pm 0.0060$} & {\footnotesize$0.7944 \pm 0.0052$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$0.7935 \pm 0.0064$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$0.7941 \pm 0.0055$} & {\footnotesize$0.7943 \pm 0.0045$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$0.7989 \pm 0.0086$} & {\footnotesize$0.8002 \pm 0.0074$}\\
\bottomrule
\end{tabular}}

\\

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{Brazilian\_houses \textdownarrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$0.0473 \pm 0.0179$} & {\footnotesize$0.0440 \pm 0.0207$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & -- \\
{\footnotesize $\mathrm{ResNet}$ } & {\footnotesize$0.0505 \pm 0.0181$} & {\footnotesize$0.0458 \pm 0.0207$}\\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$0.0477 \pm 0.0172$} & {\footnotesize$0.0427 \pm 0.0207$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$0.0630 \pm 0.0162$} & {\footnotesize$0.0556 \pm 0.0175$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$0.0404 \pm 0.0266$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & {\footnotesize$0.0470 \pm 0.0192$} & {\footnotesize$0.0437 \pm 0.0217$}\\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & {\footnotesize$0.0513 \pm 0.0234$} & {\footnotesize$0.0484 \pm 0.0262$}\\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$0.0450 \pm 0.0156$} & {\footnotesize$0.0418 \pm 0.0190$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$0.0479 \pm 0.0205$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$0.0438 \pm 0.0181$} & {\footnotesize$0.0412 \pm 0.0204$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$0.0468 \pm 0.0165$} & {\footnotesize$0.0436 \pm 0.0211$}\\
{\footnotesize $\mathrm{MLP^{\ddagger-lite}}$ } & {\footnotesize$0.0426 \pm 0.0180$} & {\footnotesize$0.0397 \pm 0.0206$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$0.0437 \pm 0.0203$} & {\footnotesize$0.0407 \pm 0.0230$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$0.0421 \pm 0.0209$} & {\footnotesize$0.0409 \pm 0.0226$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$0.0541 \pm 0.0270$} & {\footnotesize$0.0535 \pm 0.0287$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$0.0603 \pm 0.0249$} & {\footnotesize$0.0589 \pm 0.0271$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$0.0468 \pm 0.0312$} & {\footnotesize$0.0456 \pm 0.0332$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$0.0490 \pm 0.0152$} & {\footnotesize$0.0454 \pm 0.0170$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$0.0451 \pm 0.0163$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$0.0527 \pm 0.0157$} & {\footnotesize$0.0509 \pm 0.0180$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$0.0553 \pm 0.0192$} & {\footnotesize$0.0511 \pm 0.0191$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$0.0443 \pm 0.0213$} & {\footnotesize$0.0431 \pm 0.0233$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$0.0417 \pm 0.0208$} & {\footnotesize$0.0413 \pm 0.0222$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$0.0424 \pm 0.0201$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$0.0433 \pm 0.0232$} & {\footnotesize$0.0428 \pm 0.0247$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$0.0416 \pm 0.0215$} & {\footnotesize$0.0406 \pm 0.0230$}\\
\bottomrule
\end{tabular}}

&

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{MagicTelescope \textuparrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$0.8539 \pm 0.0060$} & {\footnotesize$0.8566 \pm 0.0061$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & {\footnotesize$0.8579 \pm 0.0064$}\\
{\footnotesize $\mathrm{ResNet}$ } & {\footnotesize$0.8589 \pm 0.0068$} & {\footnotesize$0.8651 \pm 0.0049$}\\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$0.8432 \pm 0.0074$} & {\footnotesize$0.8490 \pm 0.0046$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$0.8536 \pm 0.0052$} & {\footnotesize$0.8567 \pm 0.0047$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$0.8605 \pm 0.0102$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & {\footnotesize$0.8522 \pm 0.0056$} & {\footnotesize$0.8560 \pm 0.0034$}\\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & {\footnotesize$0.8571 \pm 0.0080$} & {\footnotesize$0.8624 \pm 0.0044$}\\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$0.8480 \pm 0.0090$} & {\footnotesize$0.8543 \pm 0.0075$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$0.8595 \pm 0.0060$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$0.8588 \pm 0.0046$} & {\footnotesize$0.8643 \pm 0.0037$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$0.8553 \pm 0.0055$} & {\footnotesize$0.8595 \pm 0.0051$}\\
{\footnotesize $\mathrm{MLP^{\ddagger-lite}}$ } & {\footnotesize$0.8591 \pm 0.0061$} & {\footnotesize$0.8626 \pm 0.0044$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$0.8575 \pm 0.0056$} & {\footnotesize$0.8605 \pm 0.0051$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$0.8593 \pm 0.0054$} & {\footnotesize$0.8621 \pm 0.0037$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$0.8550 \pm 0.0094$} & {\footnotesize$0.8589 \pm 0.0110$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$0.8547 \pm 0.0085$} & {\footnotesize$0.8556 \pm 0.0086$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$0.8586 \pm 0.0070$} & {\footnotesize$0.8588 \pm 0.0077$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$0.8682 \pm 0.0058$} & {\footnotesize$0.8729 \pm 0.0038$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$0.8641 \pm 0.0052$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$0.8602 \pm 0.0061$} & {\footnotesize$0.8628 \pm 0.0041$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$0.8622 \pm 0.0085$} & {\footnotesize$0.8681 \pm 0.0064$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$0.8607 \pm 0.0058$} & {\footnotesize$0.8622 \pm 0.0050$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$0.8622 \pm 0.0049$} & {\footnotesize$0.8631 \pm 0.0046$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$0.8600 \pm 0.0055$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$0.8606 \pm 0.0055$} & {\footnotesize$0.8618 \pm 0.0049$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$0.8644 \pm 0.0088$} & {\footnotesize$0.8673 \pm 0.0075$}\\
\bottomrule
\end{tabular}}

\\

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{Ailerons \textdownarrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$0.0002 \pm 0.0000$} & {\footnotesize$0.0002 \pm 0.0000$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & -- \\
{\footnotesize $\mathrm{ResNet}$ } & {\footnotesize$0.0002 \pm 0.0000$} & {\footnotesize$0.0002 \pm 0.0000$}\\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$0.0002 \pm 0.0000$} & {\footnotesize$0.0002 \pm 0.0000$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$0.0002 \pm 0.0000$} & {\footnotesize$0.0002 \pm 0.0000$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$0.0002 \pm 0.0000$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & {\footnotesize$0.0002 \pm 0.0000$} & {\footnotesize$0.0002 \pm 0.0000$}\\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & {\footnotesize$0.0002 \pm 0.0000$} & {\footnotesize$0.0002 \pm 0.0000$}\\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$0.0002 \pm 0.0000$} & {\footnotesize$0.0002 \pm 0.0000$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$0.0002 \pm 0.0000$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$0.0002 \pm 0.0000$} & {\footnotesize$0.0002 \pm 0.0000$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$0.0002 \pm 0.0000$} & {\footnotesize$0.0002 \pm 0.0000$}\\
{\footnotesize $\mathrm{MLP^{\ddagger\text{-}lite}}$ } & {\footnotesize$0.0002 \pm 0.0000$} & {\footnotesize$0.0002 \pm 0.0000$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$0.0002 \pm 0.0000$} & {\footnotesize$0.0002 \pm 0.0000$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$0.0002 \pm 0.0000$} & {\footnotesize$0.0002 \pm 0.0000$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$0.0002 \pm 0.0000$} & {\footnotesize$0.0002 \pm 0.0000$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$0.0002 \pm 0.0000$} & {\footnotesize$0.0002 \pm 0.0000$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$0.0002 \pm 0.0000$} & {\footnotesize$0.0002 \pm 0.0000$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$0.0002 \pm 0.0000$} & {\footnotesize$0.0002 \pm 0.0000$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$0.0002 \pm 0.0000$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$0.0002 \pm 0.0000$} & {\footnotesize$0.0002 \pm 0.0000$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$0.0002 \pm 0.0000$} & {\footnotesize$0.0002 \pm 0.0000$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$0.0002 \pm 0.0000$} & {\footnotesize$0.0002 \pm 0.0000$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$0.0002 \pm 0.0000$} & {\footnotesize$0.0002 \pm 0.0000$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$0.0002 \pm 0.0000$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$0.0002 \pm 0.0000$} & {\footnotesize$0.0002 \pm 0.0000$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$0.0002 \pm 0.0000$} & {\footnotesize$0.0002 \pm 0.0000$}\\
\bottomrule
\end{tabular}}

&

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{MiamiHousing2016 \textdownarrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$0.1614 \pm 0.0033$} & {\footnotesize$0.1574 \pm 0.0043$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & -- \\
{\footnotesize $\mathrm{ResNet}$ } & {\footnotesize$0.1548 \pm 0.0030$} & {\footnotesize$0.1511 \pm 0.0027$}\\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$0.1683 \pm 0.0099$} & {\footnotesize$0.1575 \pm 0.0047$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$0.1618 \pm 0.0029$} & {\footnotesize$0.1557 \pm 0.0021$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$0.1478 \pm 0.0028$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & {\footnotesize$0.1537 \pm 0.0035$} & {\footnotesize$0.1478 \pm 0.0027$}\\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & {\footnotesize$0.1527 \pm 0.0037$} & {\footnotesize$0.1479 \pm 0.0033$}\\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$0.1519 \pm 0.0038$} & {\footnotesize$0.1442 \pm 0.0022$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$0.1507 \pm 0.0022$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$0.1514 \pm 0.0029$} & {\footnotesize$0.1462 \pm 0.0031$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$0.1523 \pm 0.0023$} & {\footnotesize$0.1478 \pm 0.0024$}\\
{\footnotesize $\mathrm{MLP^{\ddagger\text{-}lite}}$ } & {\footnotesize$0.1514 \pm 0.0025$} & {\footnotesize$0.1479 \pm 0.0017$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$0.1512 \pm 0.0019$} & {\footnotesize$0.1470 \pm 0.0024$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$0.1461 \pm 0.0015$} & {\footnotesize$0.1433 \pm 0.0022$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$0.1440 \pm 0.0029$} & {\footnotesize$0.1434 \pm 0.0029$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$0.1461 \pm 0.0025$} & {\footnotesize$0.1455 \pm 0.0030$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$0.1417 \pm 0.0021$} & {\footnotesize$0.1408 \pm 0.0026$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$0.1417 \pm 0.0025$} & {\footnotesize$0.1390 \pm 0.0020$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$0.1392 \pm 0.0023$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$0.1503 \pm 0.0040$} & {\footnotesize$0.1477 \pm 0.0032$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$0.1475 \pm 0.0031$} & {\footnotesize$0.1438 \pm 0.0024$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$0.1483 \pm 0.0030$} & {\footnotesize$0.1465 \pm 0.0029$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$0.1478 \pm 0.0012$} & {\footnotesize$0.1471 \pm 0.0011$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$0.1482 \pm 0.0012$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$0.1481 \pm 0.0021$} & {\footnotesize$0.1471 \pm 0.0020$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$0.1408 \pm 0.0019$} & {\footnotesize$0.1399 \pm 0.0018$}\\
\bottomrule
\end{tabular}}

\\

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{OnlineNewsPopularity \textdownarrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$0.8643 \pm 0.0007$} & {\footnotesize$0.8632 \pm 0.0005$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & -- \\
{\footnotesize $\mathrm{ResNet}$ } & {\footnotesize$0.8665 \pm 0.0011$} & {\footnotesize$0.8639 \pm 0.0000$}\\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$0.8714 \pm 0.0013$} & {\footnotesize$0.8648 \pm 0.0004$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$0.8692 \pm 0.0015$} & {\footnotesize$0.8665 \pm 0.0005$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$0.8623 \pm \mathrm{nan}$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & {\footnotesize$0.8636 \pm 0.0022$} & {\footnotesize$0.8596 \pm 0.0008$}\\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & {\footnotesize$0.8615 \pm 0.0008$} & {\footnotesize$0.8598 \pm 0.0004$}\\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$0.8605 \pm 0.0024$} & {\footnotesize$0.8556 \pm \mathrm{nan}$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$0.8600 \pm 0.0007$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$0.8629 \pm 0.0019$} & {\footnotesize$0.8603 \pm 0.0000$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$0.8632 \pm 0.0009$} & {\footnotesize$0.8572 \pm \mathrm{nan}$}\\
{\footnotesize $\mathrm{MLP^{\ddagger\text{-}lite}}$ } & {\footnotesize$0.8604 \pm 0.0009$} & {\footnotesize$0.8591 \pm 0.0004$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$0.8594 \pm 0.0004$} & {\footnotesize$0.8585 \pm 0.0001$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$0.8585 \pm 0.0003$} & {\footnotesize$0.8581 \pm 0.0001$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$0.8545 \pm 0.0002$} & {\footnotesize$0.8543 \pm 0.0000$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$0.8546 \pm 0.0002$} & {\footnotesize$0.8544 \pm 0.0000$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$0.8532 \pm 0.0003$} & {\footnotesize$0.8527 \pm 0.0001$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$0.8677 \pm 0.0013$} & {\footnotesize$0.8633 \pm 0.0009$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$0.8624 \pm 0.0011$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$0.8651 \pm 0.0003$} & {\footnotesize$0.8650 \pm 0.0002$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$0.8647 \pm 0.0010$} & {\footnotesize$0.8624 \pm 0.0006$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$0.8584 \pm 0.0003$} & {\footnotesize$0.8581 \pm 0.0001$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$0.8579 \pm 0.0003$} & {\footnotesize$0.8575 \pm 0.0001$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$0.8579 \pm 0.0004$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$0.8588 \pm 0.0004$} & {\footnotesize$0.8581 \pm 0.0003$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$0.8563 \pm 0.0004$} & {\footnotesize$0.8558 \pm 0.0002$}\\
\bottomrule
\end{tabular}}

&

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{credit \textuparrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$0.7735 \pm 0.0042$} & {\footnotesize$0.7729 \pm 0.0047$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & {\footnotesize$0.7636 \pm 0.0045$}\\
{\footnotesize $\mathrm{ResNet}$ } & {\footnotesize$0.7721 \pm 0.0033$} & {\footnotesize$0.7738 \pm 0.0027$}\\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$0.7703 \pm 0.0034$} & {\footnotesize$0.7746 \pm 0.0026$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$0.7712 \pm 0.0045$} & {\footnotesize$0.7716 \pm 0.0059$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$0.7740 \pm 0.0006$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & {\footnotesize$0.7737 \pm 0.0050$} & {\footnotesize$0.7765 \pm 0.0058$}\\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & {\footnotesize$0.7748 \pm 0.0038$} & {\footnotesize$0.7768 \pm 0.0059$}\\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$0.7724 \pm 0.0038$} & {\footnotesize$0.7740 \pm 0.0069$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$0.7739 \pm 0.0052$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$0.7745 \pm 0.0041$} & {\footnotesize$0.7767 \pm 0.0040$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$0.7744 \pm 0.0046$} & {\footnotesize$0.7762 \pm 0.0057$}\\
{\footnotesize $\mathrm{MLP^{\ddagger\text{-}lite}}$ } & {\footnotesize$0.7749 \pm 0.0055$} & {\footnotesize$0.7767 \pm 0.0075$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$0.7734 \pm 0.0034$} & {\footnotesize$0.7747 \pm 0.0043$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$0.7758 \pm 0.0040$} & {\footnotesize$0.7772 \pm 0.0055$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$0.7698 \pm 0.0027$} & {\footnotesize$0.7706 \pm 0.0029$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$0.7686 \pm 0.0028$} & {\footnotesize$0.7726 \pm 0.0034$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$0.7734 \pm 0.0035$} & {\footnotesize$0.7752 \pm 0.0038$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$0.7730 \pm 0.0043$} & {\footnotesize$0.7740 \pm 0.0040$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$0.7723 \pm 0.0037$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$0.7739 \pm 0.0032$} & {\footnotesize$0.7757 \pm 0.0026$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$0.7734 \pm 0.0045$} & {\footnotesize$0.7754 \pm 0.0040$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$0.7751 \pm 0.0042$} & {\footnotesize$0.7755 \pm 0.0049$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$0.7760 \pm 0.0043$} & {\footnotesize$0.7771 \pm 0.0044$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$0.7754 \pm 0.0045$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$0.7752 \pm 0.0047$} & {\footnotesize$0.7754 \pm 0.0048$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$0.7761 \pm 0.0033$} & {\footnotesize$0.7760 \pm 0.0028$}\\
\bottomrule
\end{tabular}}

\\

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{elevators \textdownarrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$0.0020 \pm 0.0001$} & {\footnotesize$0.0019 \pm 0.0000$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & -- \\
{\footnotesize $\mathrm{ResNet}$ } & {\footnotesize$0.0019 \pm 0.0000$} & {\footnotesize$0.0019 \pm 0.0000$}\\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$0.0019 \pm 0.0000$} & {\footnotesize$0.0019 \pm 0.0000$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$0.0020 \pm 0.0001$} & {\footnotesize$0.0019 \pm 0.0000$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$0.0018 \pm 0.0000$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & {\footnotesize$0.0019 \pm 0.0000$} & {\footnotesize$0.0018 \pm 0.0000$}\\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & {\footnotesize$0.0019 \pm 0.0000$} & {\footnotesize$0.0018 \pm 0.0000$}\\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$0.0019 \pm 0.0000$} & {\footnotesize$0.0018 \pm 0.0000$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$0.0018 \pm 0.0000$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$0.0019 \pm 0.0000$} & {\footnotesize$0.0018 \pm 0.0000$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$0.0019 \pm 0.0000$} & {\footnotesize$0.0018 \pm 0.0000$}\\
{\footnotesize $\mathrm{MLP^{\ddagger\text{-}lite}}$ } & {\footnotesize$0.0019 \pm 0.0000$} & {\footnotesize$0.0018 \pm 0.0000$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$0.0018 \pm 0.0000$} & {\footnotesize$0.0018 \pm 0.0000$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$0.0018 \pm 0.0000$} & {\footnotesize$0.0018 \pm 0.0000$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$0.0020 \pm 0.0000$} & {\footnotesize$0.0020 \pm 0.0000$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$0.0020 \pm 0.0000$} & {\footnotesize$0.0020 \pm 0.0000$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$0.0020 \pm 0.0000$} & {\footnotesize$0.0019 \pm 0.0000$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$0.0049 \pm 0.0000$} & {\footnotesize$0.0049 \pm 0.0000$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$0.0019 \pm 0.0001$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$0.0019 \pm 0.0000$} & {\footnotesize$0.0019 \pm 0.0000$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$0.0018 \pm 0.0000$} & {\footnotesize$0.0018 \pm 0.0000$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$0.0019 \pm 0.0000$} & {\footnotesize$0.0018 \pm 0.0000$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$0.0018 \pm 0.0000$} & {\footnotesize$0.0018 \pm 0.0000$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$0.0018 \pm 0.0000$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$0.0018 \pm 0.0000$} & {\footnotesize$0.0018 \pm 0.0000$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$0.0018 \pm 0.0000$} & {\footnotesize$0.0018 \pm 0.0000$}\\
\bottomrule
\end{tabular}}

&

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{fifa \textdownarrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$0.8038 \pm 0.0124$} & {\footnotesize$0.8011 \pm 0.0143$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & -- \\
{\footnotesize $\mathrm{ResNet}$ } & {\footnotesize$0.8025 \pm 0.0140$} & {\footnotesize$0.7985 \pm 0.0149$}\\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$0.8046 \pm 0.0135$} & {\footnotesize$0.7993 \pm 0.0129$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$0.8074 \pm 0.0140$} & {\footnotesize$0.8031 \pm 0.0147$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$0.7880 \pm 0.0180$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & {\footnotesize$0.7923 \pm 0.0128$} & {\footnotesize$0.7886 \pm 0.0127$}\\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & {\footnotesize$0.7936 \pm 0.0119$} & {\footnotesize$0.7903 \pm 0.0133$}\\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$0.7909 \pm 0.0111$} & {\footnotesize$0.7862 \pm 0.0161$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$0.7901 \pm 0.0118$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$0.7928 \pm 0.0132$} & {\footnotesize$0.7888 \pm 0.0130$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$0.7928 \pm 0.0139$} & {\footnotesize$0.7904 \pm 0.0183$}\\
{\footnotesize $\mathrm{MLP^{\ddagger\text{-}lite}}$ } & {\footnotesize$0.7940 \pm 0.0118$} & {\footnotesize$0.7898 \pm 0.0141$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$0.7907 \pm 0.0092$} & {\footnotesize$0.7870 \pm 0.0096$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$0.7806 \pm 0.0104$} & {\footnotesize$0.7800 \pm 0.0114$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$0.7800 \pm 0.0108$} & {\footnotesize$0.7795 \pm 0.0114$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$0.7806 \pm 0.0120$} & {\footnotesize$0.7787 \pm 0.0122$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$0.7835 \pm 0.0116$} & {\footnotesize$0.7817 \pm 0.0114$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$0.7902 \pm 0.0119$} & {\footnotesize$0.7863 \pm 0.0120$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$0.7914 \pm 0.0136$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$0.7967 \pm 0.0138$} & {\footnotesize$0.7933 \pm 0.0145$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$0.7909 \pm 0.0107$} & {\footnotesize$0.7866 \pm 0.0106$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$0.7974 \pm 0.0144$} & {\footnotesize$0.7954 \pm 0.0160$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$0.7953 \pm 0.0135$} & {\footnotesize$0.7942 \pm 0.0148$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$0.7948 \pm 0.0135$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$0.7938 \pm 0.0156$} & {\footnotesize$0.7920 \pm 0.0176$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$0.7771 \pm 0.0107$} & {\footnotesize$0.7761 \pm 0.0117$}\\
\bottomrule
\end{tabular}}

\\

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{house\_sales \textdownarrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$0.1790 \pm 0.0009$} & {\footnotesize$0.1763 \pm 0.0003$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & -- \\
{\footnotesize $\mathrm{ResNet}$ } & {\footnotesize$0.1755 \pm 0.0014$} & {\footnotesize$0.1738 \pm 0.0006$}\\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$0.1862 \pm 0.0032$} & {\footnotesize$0.1778 \pm 0.0015$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$0.1800 \pm 0.0008$} & {\footnotesize$0.1770 \pm 0.0004$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$0.1667 \pm \mathrm{nan}$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & {\footnotesize$0.1700 \pm 0.0014$} & {\footnotesize$0.1670 \pm 0.0008$}\\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & {\footnotesize$0.1704 \pm 0.0007$} & {\footnotesize$0.1690 \pm 0.0005$}\\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$0.1713 \pm 0.0010$} & {\footnotesize$0.1668 \pm \mathrm{nan}$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$0.1713 \pm 0.0015$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$0.1690 \pm 0.0010$} & {\footnotesize$0.1659 \pm 0.0004$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$0.1689 \pm 0.0010$} & {\footnotesize$0.1664 \pm \mathrm{nan}$}\\
{\footnotesize $\mathrm{MLP^{\ddagger\text{-}lite}}$ } & {\footnotesize$0.1699 \pm 0.0008$} & {\footnotesize$0.1687 \pm 0.0007$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$0.1690 \pm 0.0005$} & {\footnotesize$0.1676 \pm 0.0003$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$0.1687 \pm 0.0004$} & {\footnotesize$0.1681 \pm 0.0001$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$0.1694 \pm 0.0003$} & {\footnotesize$0.1689 \pm 0.0001$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$0.1692 \pm 0.0004$} & {\footnotesize$0.1686 \pm 0.0001$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$0.1669 \pm 0.0001$} & {\footnotesize$0.1667 \pm 0.0000$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$0.1689 \pm 0.0009$} & {\footnotesize$0.1657 \pm 0.0003$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$0.1636 \pm 0.0009$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$0.1737 \pm 0.0013$} & {\footnotesize$0.1714 \pm 0.0005$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$0.1694 \pm 0.0007$} & {\footnotesize$0.1670 \pm 0.0003$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$0.1692 \pm 0.0011$} & {\footnotesize$0.1680 \pm 0.0005$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$0.1666 \pm 0.0003$} & {\footnotesize$0.1662 \pm 0.0002$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$0.1667 \pm 0.0003$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$0.1673 \pm 0.0004$} & {\footnotesize$0.1668 \pm 0.0001$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$0.1652 \pm 0.0003$} & {\footnotesize$0.1644 \pm 0.0001$}\\
\bottomrule
\end{tabular}}

&

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{medical\_charges \textdownarrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$0.0816 \pm 0.0001$} & {\footnotesize$0.0814 \pm 0.0000$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & -- \\
{\footnotesize $\mathrm{ResNet}$ } & {\footnotesize$0.0824 \pm 0.0003$} & {\footnotesize$0.0817 \pm 0.0001$}\\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$0.0818 \pm 0.0003$} & {\footnotesize$0.0815 \pm 0.0001$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$0.0827 \pm 0.0006$} & {\footnotesize$0.0817 \pm 0.0001$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$0.0812 \pm \mathrm{nan}$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & {\footnotesize$0.0822 \pm 0.0007$} & {\footnotesize$0.0814 \pm 0.0001$}\\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & {\footnotesize$0.0814 \pm 0.0002$} & {\footnotesize$0.0811 \pm 0.0000$}\\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$0.0817 \pm 0.0004$} & {\footnotesize$0.0813 \pm \mathrm{nan}$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$0.0814 \pm 0.0002$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$0.0814 \pm 0.0002$} & {\footnotesize$0.0812 \pm 0.0000$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$0.0813 \pm 0.0002$} & {\footnotesize$0.0811 \pm \mathrm{nan}$}\\
{\footnotesize $\mathrm{MLP^{\ddagger\text{-}lite}}$ } & {\footnotesize$0.0812 \pm 0.0002$} & {\footnotesize$0.0810 \pm 0.0000$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$0.0812 \pm 0.0001$} & {\footnotesize$0.0809 \pm 0.0001$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$0.0812 \pm 0.0000$} & {\footnotesize$0.0811 \pm 0.0000$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$0.0825 \pm 0.0001$} & {\footnotesize$0.0825 \pm 0.0000$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$0.0820 \pm 0.0000$} & {\footnotesize$0.0820 \pm 0.0000$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$0.0816 \pm 0.0000$} & {\footnotesize$0.0815 \pm 0.0000$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$0.0815 \pm 0.0002$} & {\footnotesize$0.0812 \pm 0.0000$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$0.0811 \pm 0.0001$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$0.0811 \pm 0.0001$} & {\footnotesize$0.0810 \pm 0.0000$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$0.0809 \pm 0.0000$} & {\footnotesize$0.0808 \pm 0.0000$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$0.0813 \pm 0.0001$} & {\footnotesize$0.0812 \pm 0.0000$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$0.0812 \pm 0.0000$} & {\footnotesize$0.0812 \pm 0.0000$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$0.0812 \pm 0.0000$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$0.0813 \pm 0.0000$} & {\footnotesize$0.0813 \pm 0.0000$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$0.0811 \pm 0.0001$} & {\footnotesize$0.0811 \pm 0.0000$}\\
\bottomrule
\end{tabular}}

\\

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{pol \textdownarrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$5.5244 \pm 0.5768$} & {\footnotesize$4.9945 \pm 0.5923$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & -- \\
{\footnotesize $\mathrm{ResNet}$ } & {\footnotesize$6.3739 \pm 0.6286$} & {\footnotesize$5.8181 \pm 0.6054$}\\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$6.5374 \pm 0.9479$} & {\footnotesize$5.1814 \pm 0.7775$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$6.1816 \pm 0.7366$} & {\footnotesize$5.5959 \pm 0.8243$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$3.2337 \pm 0.0605$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & {\footnotesize$3.3295 \pm 0.3379$} & {\footnotesize$2.7999 \pm 0.1776$}\\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & {\footnotesize$3.2011 \pm 0.2921$} & {\footnotesize$2.8698 \pm 0.2577$}\\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$3.0682 \pm 0.2389$} & {\footnotesize$2.5816 \pm 0.0368$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$2.7203 \pm 0.1858$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$2.6974 \pm 0.1666$} & {\footnotesize$2.3718 \pm 0.0724$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$2.9539 \pm 0.1994$} & {\footnotesize$2.6282 \pm 0.0730$}\\
{\footnotesize $\mathrm{MLP^{\ddagger\text{-}lite}}$ } & {\footnotesize$2.8239 \pm 0.2173$} & {\footnotesize$2.5266 \pm 0.0605$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$2.5452 \pm 0.1221$} & {\footnotesize$2.3700 \pm 0.0867$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$2.4958 \pm 0.1292$} & {\footnotesize$2.3651 \pm 0.1223$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$4.2963 \pm 0.0644$} & {\footnotesize$4.2548 \pm 0.0488$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$4.2320 \pm 0.3369$} & {\footnotesize$4.1880 \pm 0.3110$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$3.6320 \pm 0.1006$} & {\footnotesize$3.5505 \pm 0.0896$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$6.0708 \pm 0.5368$} & {\footnotesize$5.5578 \pm 0.4036$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$2.5770 \pm 0.1689$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$5.7878 \pm 0.4884$} & {\footnotesize$5.3773 \pm 0.5463$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$2.9083 \pm 0.1364$} & {\footnotesize$2.6717 \pm 0.0530$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$3.3595 \pm 0.4017$} & {\footnotesize$3.2130 \pm 0.3979$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$3.0198 \pm 0.2975$} & {\footnotesize$2.9595 \pm 0.3107$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$3.0358 \pm 0.3077$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$3.1351 \pm 0.1952$} & {\footnotesize$3.0478 \pm 0.2061$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$2.2808 \pm 0.0343$} & {\footnotesize$2.2383 \pm 0.0111$}\\
\bottomrule
\end{tabular}}

&

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{superconduct \textdownarrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$10.8740 \pm 0.0868$} & {\footnotesize$10.4118 \pm 0.0429$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & -- \\
{\footnotesize $\mathrm{ResNet}$ } & {\footnotesize$10.7711 \pm 0.1454$} & {\footnotesize$10.3495 \pm 0.0168$}\\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$10.8108 \pm 0.0957$} & {\footnotesize$10.4342 \pm 0.0179$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$10.8562 \pm 0.1300$} & {\footnotesize$10.3342 \pm 0.0509$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$10.4442 \pm \mathrm{nan}$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & {\footnotesize$11.0019 \pm 0.1391$} & {\footnotesize$10.4469 \pm 0.0521$}\\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & {\footnotesize$10.7502 \pm 0.0800$} & {\footnotesize$10.3281 \pm 0.0450$}\\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$11.0879 \pm 0.1571$} & {\footnotesize$10.4094 \pm \mathrm{nan}$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$10.7807 \pm 0.1074$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$10.8256 \pm 0.1692$} & {\footnotesize$10.3391 \pm 0.0794$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$10.8310 \pm 0.1406$} & {\footnotesize$10.3017 \pm \mathrm{nan}$}\\
{\footnotesize $\mathrm{MLP^{\ddagger\text{-}lite}}$ } & {\footnotesize$10.5058 \pm 0.0758$} & {\footnotesize$10.2322 \pm 0.0463$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$10.5061 \pm 0.0330$} & {\footnotesize$10.2440 \pm 0.0127$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$10.7220 \pm 0.0757$} & {\footnotesize$10.3758 \pm 0.0606$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$10.1610 \pm 0.0201$} & {\footnotesize$10.1413 \pm 0.0025$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$10.1634 \pm 0.0118$} & {\footnotesize$10.1552 \pm 0.0050$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$10.2422 \pm 0.0222$} & {\footnotesize$10.2116 \pm 0.0058$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$10.8842 \pm 0.1073$} & {\footnotesize$10.4800 \pm 0.0280$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$10.3835 \pm 0.0562$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$10.4419 \pm 0.0640$} & {\footnotesize$10.2926 \pm 0.0261$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$10.5651 \pm 0.0616$} & {\footnotesize$10.3155 \pm 0.0253$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$10.3379 \pm 0.0338$} & {\footnotesize$10.1943 \pm 0.0291$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$10.2628 \pm 0.0275$} & {\footnotesize$10.2300 \pm 0.0108$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$10.2572 \pm 0.0463$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$10.2472 \pm 0.0208$} & {\footnotesize$10.2094 \pm 0.0057$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$10.1326 \pm 0.0186$} & {\footnotesize$10.0866 \pm 0.0070$}\\
\bottomrule
\end{tabular}}

\\

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{jannis \textuparrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$0.7840 \pm 0.0018$} & {\footnotesize$0.7872 \pm 0.0007$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & {\footnotesize$0.7419 \pm 0.0018$}\\
{\footnotesize $\mathrm{ResNet}$ } & {\footnotesize$0.7923 \pm 0.0024$} & {\footnotesize$0.7958 \pm 0.0010$}\\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$0.7712 \pm 0.0029$} & {\footnotesize$0.7825 \pm 0.0009$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$0.7818 \pm 0.0025$} & {\footnotesize$0.7859 \pm 0.0011$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$0.8027$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & {\footnotesize$0.7933 \pm 0.0018$} & {\footnotesize$0.7983 \pm 0.0013$}\\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & {\footnotesize$0.7927 \pm 0.0025$} & {\footnotesize$0.8019 \pm 0.0012$}\\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$0.7954 \pm 0.0015$} & {\footnotesize$0.8021$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$0.7971 \pm 0.0028$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$0.7940 \pm 0.0028$} & {\footnotesize$0.7998 \pm 0.0006$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$0.7998 \pm 0.0024$} & {\footnotesize$0.8052$}\\
{\footnotesize $\mathrm{MLP^{\ddagger-lite}}$ } & {\footnotesize$0.7923 \pm 0.0018$} & {\footnotesize$0.7945 \pm 0.0010$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$0.7947 \pm 0.0017$} & {\footnotesize$0.7967 \pm 0.0011$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$0.7891 \pm 0.0013$} & {\footnotesize$0.7900 \pm 0.0006$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$0.7967 \pm 0.0019$} & {\footnotesize$0.7998 \pm 0.0007$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$0.7956 \pm 0.0017$} & {\footnotesize$0.7968 \pm 0.0005$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$0.7985 \pm 0.0018$} & {\footnotesize$0.8009 \pm 0.0012$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$0.7983 \pm 0.0022$} & {\footnotesize$0.8023 \pm 0.0018$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$0.8051 \pm 0.0023$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$0.7993 \pm 0.0019$} & {\footnotesize$0.8042 \pm 0.0013$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$0.8068 \pm 0.0021$} & {\footnotesize$0.8128 \pm 0.0007$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$0.8066 \pm 0.0015$} & {\footnotesize$0.8075 \pm 0.0004$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$0.8080 \pm 0.0019$} & {\footnotesize$0.8102 \pm 0.0017$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$0.8064 \pm 0.0018$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$0.8053 \pm 0.0012$} & {\footnotesize$0.8066 \pm 0.0001$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$0.8078 \pm 0.0008$} & {\footnotesize$0.8086 \pm 0.0005$}\\
\bottomrule
\end{tabular}}

&

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{MiniBooNE \textuparrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$0.9480 \pm 0.0007$} & {\footnotesize$0.9498 \pm 0.0001$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & {\footnotesize$0.9266 \pm 0.0012$}\\
{\footnotesize $\mathrm{ResNet}$ } & {\footnotesize$0.9488 \pm 0.0011$} & {\footnotesize$0.9504 \pm 0.0005$}\\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$0.9433 \pm 0.0011$} & {\footnotesize$0.9470 \pm 0.0010$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$0.9476 \pm 0.0013$} & {\footnotesize$0.9491 \pm 0.0010$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$0.9473$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & {\footnotesize$0.9447 \pm 0.0014$} & {\footnotesize$0.9473 \pm 0.0010$}\\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & {\footnotesize$0.9446 \pm 0.0014$} & {\footnotesize$0.9483 \pm 0.0002$}\\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$0.9430 \pm 0.0015$} & {\footnotesize$0.9451$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$0.9471 \pm 0.0009$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$0.9467 \pm 0.0014$} & {\footnotesize$0.9486 \pm 0.0010$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$0.9475 \pm 0.0014$} & {\footnotesize$0.9508$}\\
{\footnotesize $\mathrm{MLP^{\ddagger-lite}}$ } & {\footnotesize$0.9466 \pm 0.0009$} & {\footnotesize$0.9478 \pm 0.0004$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$0.9473 \pm 0.0010$} & {\footnotesize$0.9493 \pm 0.0004$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$0.9482 \pm 0.0008$} & {\footnotesize$0.9492 \pm 0.0001$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$0.9436 \pm 0.0006$} & {\footnotesize$0.9452 \pm 0.0003$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$0.9422 \pm 0.0009$} & {\footnotesize$0.9427 \pm 0.0003$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$0.9453 \pm 0.0008$} & {\footnotesize$0.9459 \pm 0.0005$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$0.9487 \pm 0.0008$} & {\footnotesize$0.9500 \pm 0.0002$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$0.9475 \pm 0.0007$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$0.9488 \pm 0.0010$} & {\footnotesize$0.9505 \pm 0.0001$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$0.9493 \pm 0.0012$} & {\footnotesize$0.9501 \pm 0.0008$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$0.9500 \pm 0.0005$} & {\footnotesize$0.9505 \pm 0.0002$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$0.9503 \pm 0.0006$} & {\footnotesize$0.9501 \pm 0.0002$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$0.9496 \pm 0.0010$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$0.9495 \pm 0.0005$} & {\footnotesize$0.9500 \pm 0.0002$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$0.9490 \pm 0.0004$} & {\footnotesize$0.9492 \pm 0.0002$}\\
\bottomrule
\end{tabular}}

\\

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{nyc-taxi-green-dec-2016 \textdownarrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$0.3951 \pm 0.0009$} & {\footnotesize$0.3921 \pm 0.0003$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & -- \\
{\footnotesize $\mathrm{ResNet}$ } & {\footnotesize$0.3899 \pm 0.0016$} & {\footnotesize$0.3873 \pm 0.0009$}\\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$0.3919 \pm 0.0009$} & {\footnotesize$0.3889 \pm 0.0003$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$0.3933 \pm 0.0013$} & {\footnotesize$0.3899 \pm 0.0004$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$0.3979$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & {\footnotesize$0.4084 \pm 0.0256$} & {\footnotesize$0.3967 \pm 0.0059$}\\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & {\footnotesize$0.3914 \pm 0.0026$} & {\footnotesize$0.3861 \pm 0.0013$}\\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$0.3969 \pm 0.0036$} & {\footnotesize$0.3897$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$0.3905 \pm 0.0013$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$0.3937 \pm 0.0064$} & {\footnotesize$0.3889 \pm 0.0018$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$0.3908 \pm 0.0045$} & {\footnotesize$0.3858$}\\
{\footnotesize $\mathrm{MLP^{\ddagger-lite}}$ } & {\footnotesize$0.3812 \pm 0.0018$} & {\footnotesize$0.3761 \pm 0.0016$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$0.3795 \pm 0.0016$} & {\footnotesize$0.3733 \pm 0.0013$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$0.3680 \pm 0.0006$} & {\footnotesize$0.3653 \pm 0.0005$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$0.3792 \pm 0.0002$} & {\footnotesize$0.3787 \pm 0.0000$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$0.3688 \pm 0.0002$} & {\footnotesize$0.3684 \pm 0.0000$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$0.3647 \pm 0.0005$} & {\footnotesize$0.3632 \pm 0.0003$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$0.3577 \pm 0.0222$} & {\footnotesize$0.3380 \pm 0.0027$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$0.3725 \pm 0.0091$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$0.3728 \pm 0.0012$} & {\footnotesize$0.3720 \pm 0.0010$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$0.3536 \pm 0.0052$} & {\footnotesize$0.3407 \pm 0.0009$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$0.3866 \pm 0.0006$} & {\footnotesize$0.3855 \pm 0.0003$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$0.3849 \pm 0.0005$} & {\footnotesize$0.3843 \pm 0.0002$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$0.3848 \pm 0.0005$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$0.3853 \pm 0.0005$} & {\footnotesize$0.3845 \pm 0.0003$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$0.3485 \pm 0.0038$} & {\footnotesize$0.3448 \pm 0.0020$}\\
\bottomrule
\end{tabular}}

&

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{particulate-matter-ukair-2017 \textdownarrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$0.3759 \pm 0.0004$} & {\footnotesize$0.3729 \pm 0.0003$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & -- \\
{\footnotesize $\mathrm{ResNet}$ } & {\footnotesize$0.3743 \pm 0.0007$} & {\footnotesize$0.3718 \pm 0.0005$}\\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$0.3759 \pm 0.0012$} & {\footnotesize$0.3738 \pm 0.0004$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$0.3790 \pm 0.0007$} & {\footnotesize$0.3744 \pm 0.0002$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$0.3700$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & {\footnotesize$0.3723 \pm 0.0011$} & {\footnotesize$0.3692 \pm 0.0010$}\\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & {\footnotesize$0.3741 \pm 0.0010$} & {\footnotesize$0.3698 \pm 0.0004$}\\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$0.3699 \pm 0.0014$} & {\footnotesize$0.3652$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$0.3704 \pm 0.0014$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$0.3735 \pm 0.0012$} & {\footnotesize$0.3686 \pm 0.0004$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$0.3676 \pm 0.0024$} & {\footnotesize$0.3631$}\\
{\footnotesize $\mathrm{MLP^{\ddagger-lite}}$ } & {\footnotesize$0.3665 \pm 0.0008$} & {\footnotesize$0.3642 \pm 0.0003$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$0.3657 \pm 0.0007$} & {\footnotesize$0.3629 \pm 0.0002$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$0.3649 \pm 0.0011$} & {\footnotesize$0.3637 \pm 0.0008$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$0.3641 \pm 0.0001$} & {\footnotesize$0.3640 \pm 0.0000$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$0.3637 \pm 0.0001$} & {\footnotesize$0.3635 \pm 0.0000$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$0.3647 \pm 0.0004$} & {\footnotesize$0.3637 \pm 0.0002$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$0.3613 \pm 0.0005$} & {\footnotesize$0.3590 \pm 0.0002$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$0.3596 \pm 0.0004$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$0.3670 \pm 0.0004$} & {\footnotesize$0.3649 \pm 0.0002$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$0.3646 \pm 0.0001$} & {\footnotesize$0.3643 \pm 0.0000$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$0.3686 \pm 0.0006$} & {\footnotesize$0.3679 \pm 0.0003$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$0.3671 \pm 0.0007$} & {\footnotesize$0.3665 \pm 0.0002$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$0.3667 \pm 0.0009$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$0.3664 \pm 0.0006$} & {\footnotesize$0.3655 \pm 0.0002$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$0.3593 \pm 0.0004$} & {\footnotesize$0.3589 \pm 0.0000$}\\
\bottomrule
\end{tabular}}

\\

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{road-safety \textuparrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$0.7857 \pm 0.0019$} & {\footnotesize$0.7873 \pm 0.0004$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & {\footnotesize$0.7338 \pm 0.0032$}\\
{\footnotesize $\mathrm{ResNet}$ } & {\footnotesize$0.7875 \pm 0.0007$} & {\footnotesize$0.7898 \pm 0.0008$}\\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$0.7781 \pm 0.0014$} & {\footnotesize$0.7823 \pm 0.0012$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$0.7847 \pm 0.0010$} & {\footnotesize$0.7865 \pm 0.0002$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$0.7804$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & {\footnotesize$0.7826 \pm 0.0030$} & {\footnotesize$0.7883 \pm 0.0013$}\\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & {\footnotesize$0.7878 \pm 0.0032$} & {\footnotesize$0.7919 \pm 0.0015$}\\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$0.7864 \pm 0.0053$} & {\footnotesize$0.7907$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$0.7584 \pm 0.0584$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$0.7907 \pm 0.0012$} & {\footnotesize$0.7943 \pm 0.0007$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$0.7912 \pm 0.0026$} & {\footnotesize$0.7961$}\\
{\footnotesize $\mathrm{MLP^{\ddagger-lite}}$ } & {\footnotesize$0.7867 \pm 0.0018$} & {\footnotesize$0.7903 \pm 0.0002$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$0.7853 \pm 0.0014$} & {\footnotesize$0.7881 \pm 0.0007$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$0.7899 \pm 0.0009$} & {\footnotesize$0.7935 \pm 0.0003$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$0.8101 \pm 0.0017$} & {\footnotesize$0.8129 \pm 0.0004$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$0.7982 \pm 0.0012$} & {\footnotesize$0.7996 \pm 0.0005$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$0.8012 \pm 0.0009$} & {\footnotesize$0.8022 \pm 0.0002$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$0.8403 \pm 0.0014$} & {\footnotesize$0.8441 \pm 0.0005$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$0.8374 \pm 0.0013$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$0.8080 \pm 0.0013$} & {\footnotesize$0.8121 \pm 0.0006$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$0.8232 \pm 0.0017$} & {\footnotesize$0.8287 \pm 0.0008$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$0.7946 \pm 0.0013$} & {\footnotesize$0.7961 \pm 0.0005$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$0.7958 \pm 0.0011$} & {\footnotesize$0.7968 \pm 0.0004$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$0.7954 \pm 0.0016$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$0.7933 \pm 0.0030$} & {\footnotesize$0.7970 \pm 0.0006$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$0.7999 \pm 0.0023$} & {\footnotesize$0.8059 \pm 0.0012$}\\
\bottomrule
\end{tabular}}

&

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{year \textdownarrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$8.9628 \pm 0.0232$} & {\footnotesize$8.8931 \pm 0.0066$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & -- \\
{\footnotesize $\mathrm{ResNet}$ } & {\footnotesize$8.9658 \pm 0.0239$} & {\footnotesize$8.8755 \pm 0.0066$}\\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$9.2761 \pm 0.0401$} & {\footnotesize$9.0640 \pm 0.0156$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$9.0054 \pm 0.0256$} & {\footnotesize$8.9351 \pm 0.0073$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$8.9707$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & {\footnotesize$9.0430 \pm 0.0280$} & {\footnotesize$8.9619 \pm 0.0092$}\\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & {\footnotesize$8.9589 \pm 0.0182$} & {\footnotesize$8.9086 \pm 0.0177$}\\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$9.0395 \pm 0.0266$} & {\footnotesize$8.9551$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$9.0248 \pm 0.0225$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$9.0005 \pm 0.0215$} & {\footnotesize$8.9360 \pm 0.0013$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$8.9775 \pm 0.0138$} & {\footnotesize$8.8979$}\\
{\footnotesize $\mathrm{MLP^{\ddagger-lite}}$ } & {\footnotesize$8.9355 \pm 0.0103$} & {\footnotesize$8.9063 \pm 0.0030$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$8.9455 \pm 0.0173$} & {\footnotesize$8.9083 \pm 0.0046$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$8.9379 \pm 0.0206$} & {\footnotesize$8.8753 \pm 0.0038$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$9.0307 \pm 0.0028$} & {\footnotesize$9.0245 \pm 0.0015$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$9.0200 \pm 0.0025$} & {\footnotesize$9.0128 \pm 0.0015$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$9.0370 \pm 0.0073$} & {\footnotesize$9.0054 \pm 0.0028$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$9.0069 \pm 0.0152$} & {\footnotesize$8.9132 \pm 0.0088$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$8.9721 \pm 0.0105$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$8.9476 \pm 0.0152$} & {\footnotesize$8.8977 \pm 0.0037$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$8.8973 \pm 0.0082$} & {\footnotesize$8.8550 \pm 0.0031$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$8.8701 \pm 0.0110$} & {\footnotesize$8.8517 \pm 0.0022$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$8.8705 \pm 0.0043$} & {\footnotesize$8.8642 \pm 0.0028$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$8.8723 \pm 0.0080$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$8.9164 \pm 0.0089$} & {\footnotesize$8.9021 \pm 0.0036$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$8.8737 \pm 0.0119$} & {\footnotesize$8.8564 \pm 0.0054$}\\
\bottomrule
\end{tabular}}

\\
\end{longtable}
```
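The "Ensemble" columns above follow one protocol: five models trained independently under different random seeds, with their test predictions averaged. A minimal sketch of that averaging step is below; `train_model` and the toy linear predictor are hypothetical stand-ins for any of the tabulated models, used only to make the seed-ensemble mechanics concrete.

```python
import numpy as np

SEEDS = [0, 1, 2, 3, 4]  # five independent training runs, one per seed


def train_model(seed):
    # Hypothetical stand-in for training one model (MLP, TabM, GBDT, ...)
    # under a fixed random seed; returns a prediction function.
    rng = np.random.default_rng(seed)
    w = rng.normal(size=3)  # toy linear "model"
    return lambda X: X @ w


X_test = np.ones((4, 3))  # four dummy test objects with three features

# Train the five ensemble members and average their test predictions.
members = [train_model(seed) for seed in SEEDS]
per_member = np.stack([m(X_test) for m in members])  # shape: (5, n_objects)
ensemble_pred = per_member.mean(axis=0)              # the "Ensemble" output
```

For classification, the same averaging is typically applied to predicted probabilities rather than raw scores.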
```{=latex}
\begin{longtable}{p{0.5\textwidth}p{0.5\textwidth}}
\caption{Extended results on the TabReD benchmark \cite{rubachev2024tabred}. Results are grouped by dataset. Each ensemble consists of five models trained independently under different random seeds.}\\

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{sberbank-housing \textdownarrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$0.2529 \pm 0.0078$} & {\footnotesize$0.2474 \pm 0.0052$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & -- \\
{\footnotesize $\mathrm{ResNet}$ } & -- & -- \\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$0.2616 \pm 0.0049$} & {\footnotesize$0.2506 \pm 0.0015$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$0.2671 \pm 0.0140$} & {\footnotesize$0.2555 \pm 0.0033$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$0.2509$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & -- & -- \\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & -- & -- \\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$0.2533 \pm 0.0046$} & {\footnotesize$0.2485$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$0.2467 \pm 0.0019$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$0.2440 \pm 0.0038$} & {\footnotesize$0.2367 \pm 0.0010$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$0.2416 \pm 0.0025$} & {\footnotesize$0.2343$}\\
{\footnotesize $\mathrm{MLP^{\ddagger-lite}}$ } & {\footnotesize$0.2528 \pm 0.0055$} & {\footnotesize$0.2503 \pm 0.0029$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$0.2412 \pm 0.0031$} & {\footnotesize$0.2355 \pm 0.0006$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$0.2383 \pm 0.0032$} & {\footnotesize$0.2327 \pm 0.0009$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$0.2419 \pm 0.0012$} & {\footnotesize$0.2416 \pm 0.0007$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$0.2468 \pm 0.0009$} & {\footnotesize$0.2467 \pm 0.0002$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$0.2482 \pm 0.0034$} & {\footnotesize$0.2473 \pm 0.0016$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$0.2820 \pm 0.0323$} & {\footnotesize$0.2603 \pm 0.0048$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$0.2542 \pm 0.0101$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$0.2593 \pm 0.0053$} & {\footnotesize$0.2520 \pm 0.0032$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$0.2448 \pm 0.0039$} & {\footnotesize$0.2404 \pm 0.0025$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$0.2469 \pm 0.0035$} & {\footnotesize$0.2440 \pm 0.0026$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$0.2439 \pm 0.0021$} & {\footnotesize$0.2428 \pm 0.0006$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$0.2436 \pm 0.0027$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$0.2433 \pm 0.0017$} & {\footnotesize$0.2422 \pm 0.0004$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$0.2334 \pm 0.0018$} & {\footnotesize$0.2324 \pm 0.0009$}\\
\bottomrule
\end{tabular}}

&

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{ecom-offers \textuparrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$0.5989 \pm 0.0017$} & {\footnotesize$0.5995 \pm 0.0011$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & -- \\
{\footnotesize $\mathrm{ResNet}$ } & -- & -- \\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$0.5996 \pm 0.0043$} & {\footnotesize$0.6039 \pm 0.0028$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$0.5912 \pm 0.0056$} & {\footnotesize$0.5961 \pm 0.0033$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$0.5803$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & -- & -- \\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & -- & -- \\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$0.5759 \pm 0.0066$} & {\footnotesize$0.5759$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$0.5812 \pm 0.0098$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$0.5775 \pm 0.0063$} & {\footnotesize$0.5817 \pm 0.0021$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$0.5791 \pm 0.0056$} & {\footnotesize$0.5824$}\\
{\footnotesize $\mathrm{MLP^{\ddagger-lite}}$ } & {\footnotesize$0.5800 \pm 0.0029$} & {\footnotesize$0.5819 \pm 0.0011$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$0.5846 \pm 0.0048$} & {\footnotesize$0.5872 \pm 0.0018$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$0.5949 \pm 0.0013$} & {\footnotesize$0.5953 \pm 0.0006$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$0.5763 \pm 0.0072$} & {\footnotesize$0.5917 \pm 0.0035$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$0.5758 \pm 0.0006$} & {\footnotesize$0.5758 \pm 0.0003$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$0.5596 \pm 0.0068$} & {\footnotesize$0.5067 \pm 0.0011$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$0.5943 \pm 0.0019$} & {\footnotesize$0.5977 \pm 0.0009$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$0.5762 \pm 0.0052$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$0.5765 \pm 0.0087$} & {\footnotesize$0.5820 \pm 0.0047$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$0.5758 \pm 0.0050$} & {\footnotesize$0.5796 \pm 0.0009$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$0.5948 \pm 0.0006$} & {\footnotesize$0.5952 \pm 0.0004$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$0.5941 \pm 0.0003$} & {\footnotesize$0.5941 \pm 0.0000$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$0.5970 \pm 0.0010$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$0.5942 \pm 0.0003$} & {\footnotesize$0.5943 \pm 0.0001$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$0.5910 \pm 0.0012$} & {\footnotesize$0.5913 \pm 0.0002$}\\
\bottomrule
\end{tabular}}

\\

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{maps-routing \textdownarrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$0.1625 \pm 0.0001$} & {\footnotesize$0.1621 \pm 0.0000$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & -- \\
{\footnotesize $\mathrm{ResNet}$ } & -- & -- \\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$0.1656 \pm 0.0004$} & {\footnotesize$0.1636 \pm 0.0001$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$0.1634 \pm 0.0002$} & {\footnotesize$0.1625 \pm 0.0000$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$0.1624$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & -- & -- \\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & -- & -- \\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$0.1628 \pm 0.0001$} & {\footnotesize$0.1621$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$0.1634$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$0.1625 \pm 0.0003$} & {\footnotesize$0.1619 \pm 0.0001$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$0.1616 \pm 0.0001$} & {\footnotesize$0.1608$}\\
{\footnotesize $\mathrm{MLP^{\ddagger-lite}}$ } & {\footnotesize$0.1618 \pm 0.0002$} & {\footnotesize$0.1613 \pm 0.0000$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$0.1618 \pm 0.0002$} & {\footnotesize$0.1613 \pm 0.0001$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$0.1620 \pm 0.0002$} & {\footnotesize$0.1614 \pm 0.0000$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$0.1616 \pm 0.0001$} & {\footnotesize$0.1614 \pm 0.0000$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$0.1618 \pm 0.0000$} & {\footnotesize$0.1616 \pm 0.0000$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$0.1619 \pm 0.0001$} & {\footnotesize$0.1615 \pm 0.0000$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$0.1639 \pm 0.0003$} & {\footnotesize$0.1622 \pm 0.0002$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$0.1622 \pm 0.0002$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$0.1625 \pm 0.0001$} & {\footnotesize$0.1621 \pm 0.0001$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$0.1627 \pm 0.0002$} & {\footnotesize$0.1623 \pm 0.0001$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$0.1612 \pm 0.0001$} & {\footnotesize$0.1609 \pm 0.0000$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$0.1612 \pm 0.0001$} & {\footnotesize$0.1610 \pm 0.0001$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$0.1611 \pm 0.0001$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$0.1612 \pm 0.0001$} & {\footnotesize$0.1610 \pm 0.0000$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$0.1610 \pm 0.0001$} & {\footnotesize$0.1609 \pm 0.0000$}\\
\bottomrule
\end{tabular}}

&

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{homesite-insurance \textuparrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$0.9506 \pm 0.0005$} & {\footnotesize$0.9514 \pm 0.0001$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & -- \\
{\footnotesize $\mathrm{ResNet}$ } & -- & -- \\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$0.9398 \pm 0.0053$} & {\footnotesize$0.9432 \pm 0.0018$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$0.9473 \pm 0.0013$} & {\footnotesize$0.9484 \pm 0.0007$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$0.9588$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & -- & -- \\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & -- & -- \\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$0.9622 \pm 0.0004$} & {\footnotesize$0.9635$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$0.9613$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$0.9622 \pm 0.0006$} & {\footnotesize$0.9633 \pm 0.0001$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$0.9624 \pm 0.0006$} & {\footnotesize$0.9637$}\\
{\footnotesize $\mathrm{MLP^{\ddagger-lite}}$ } & {\footnotesize$0.9609 \pm 0.0009$} & {\footnotesize$0.9626 \pm 0.0003$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$0.9617 \pm 0.0004$} & {\footnotesize$0.9630 \pm 0.0002$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$0.9582 \pm 0.0014$} & {\footnotesize$0.9599 \pm 0.0002$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$0.9601 \pm 0.0002$} & {\footnotesize$0.9602 \pm 0.0000$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$0.9603 \pm 0.0002$} & {\footnotesize$0.9604 \pm 0.0001$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$0.9606 \pm 0.0003$} & {\footnotesize$0.9609 \pm 0.0001$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$0.9487 \pm 0.0014$} & {\footnotesize$0.9505 \pm 0.0001$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$0.9556 \pm 0.0021$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$0.9514 \pm 0.0038$} & {\footnotesize$0.9522 \pm 0.0027$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$0.9620 \pm 0.0006$} & {\footnotesize$0.9635 \pm 0.0002$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$0.9641 \pm 0.0004$} & {\footnotesize$0.9644 \pm 0.0003$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$0.9640 \pm 0.0002$} & {\footnotesize$0.9642 \pm 0.0001$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$0.9641 \pm 0.0003$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$0.9643 \pm 0.0003$} & {\footnotesize$0.9645 \pm 0.0001$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$0.9631 \pm 0.0003$} & {\footnotesize$0.9634 \pm 0.0001$}\\
\bottomrule
\end{tabular}}

\\

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{cooking-time \textdownarrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$0.4828 \pm 0.0002$} & {\footnotesize$0.4822 \pm 0.0000$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & -- \\
{\footnotesize $\mathrm{ResNet}$ } & -- & -- \\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$0.4834 \pm 0.0003$} & {\footnotesize$0.4822 \pm 0.0001$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$0.4835 \pm 0.0006$} & {\footnotesize$0.4818 \pm 0.0002$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$0.4809$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & -- & -- \\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & -- & -- \\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$0.4821 \pm 0.0005$} & {\footnotesize$0.4808$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$0.4840$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$0.4820 \pm 0.0008$} & {\footnotesize$0.4813 \pm 0.0005$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$0.4809 \pm 0.0008$} & {\footnotesize$0.4797$}\\
{\footnotesize $\mathrm{MLP^{\ddagger-lite}}$ } & {\footnotesize$0.4811 \pm 0.0004$} & {\footnotesize$0.4805 \pm 0.0001$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$0.4809 \pm 0.0006$} & {\footnotesize$0.4804 \pm 0.0003$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$0.4812 \pm 0.0004$} & {\footnotesize$0.4807 \pm 0.0002$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$0.4823 \pm 0.0001$} & {\footnotesize$0.4821 \pm 0.0000$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$0.4826 \pm 0.0001$} & {\footnotesize$0.4825 \pm 0.0001$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$0.4823 \pm 0.0001$} & {\footnotesize$0.4820 \pm 0.0001$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$0.4828 \pm 0.0008$} & {\footnotesize$0.4814 \pm 0.0004$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$0.4818 \pm 0.0006$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$0.4825 \pm 0.0004$} & {\footnotesize$0.4819 \pm 0.0003$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$0.4818 \pm 0.0005$} & {\footnotesize$0.4809 \pm 0.0003$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$0.4803 \pm 0.0006$} & {\footnotesize$0.4797 \pm 0.0003$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$0.4804 \pm 0.0002$} & {\footnotesize$0.4802 \pm 0.0000$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$0.4800 \pm 0.0002$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$0.4803 \pm 0.0001$} & {\footnotesize$0.4801 \pm 0.0001$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$0.4804 \pm 0.0001$} & {\footnotesize$0.4803 \pm 0.0000$}\\
\bottomrule
\end{tabular}}

&

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{homecredit-default \textuparrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$0.8538 \pm 0.0014$} & {\footnotesize$0.8566 \pm 0.0005$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & -- \\
{\footnotesize $\mathrm{ResNet}$ } & -- & -- \\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$0.8471 \pm 0.0019$} & {\footnotesize$0.8549 \pm 0.0002$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$0.8541 \pm 0.0016$} & {\footnotesize$0.8569 \pm 0.0010$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$0.8355$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & -- & -- \\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & -- & -- \\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$0.8513 \pm 0.0024$} & {\footnotesize$0.8564$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$0.8377$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$0.8571 \pm 0.0023$} & {\footnotesize$0.8611 \pm 0.0013$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$0.8597 \pm 0.0007$} & {\footnotesize$0.8629$}\\
{\footnotesize $\mathrm{MLP^{\ddagger-lite}}$ } & {\footnotesize$0.8598 \pm 0.0009$} & {\footnotesize$0.8607 \pm 0.0003$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$0.8572 \pm 0.0011$} & {\footnotesize$0.8590 \pm 0.0003$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$0.8568 \pm 0.0039$} & {\footnotesize$0.8614 \pm 0.0014$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$0.8670 \pm 0.0005$} & {\footnotesize$0.8674 \pm 0.0001$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$0.8664 \pm 0.0004$} & {\footnotesize$0.8667 \pm 0.0000$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$0.8627$} & -- \\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$0.8501 \pm 0.0027$} & {\footnotesize$0.8548 \pm 0.0003$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$0.8547 \pm 0.0021$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$0.8531 \pm 0.0018$} & {\footnotesize$0.8569 \pm 0.0004$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$0.8544 \pm 0.0033$} & {\footnotesize$0.8606 \pm 0.0024$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$0.8583 \pm 0.0010$} & {\footnotesize$0.8599 \pm 0.0006$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$0.8599 \pm 0.0010$} & {\footnotesize$0.8607 \pm 0.0002$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$0.8588 \pm 0.0013$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$0.8605 \pm 0.0010$} & {\footnotesize$0.8614 \pm 0.0007$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$0.8635 \pm 0.0008$} & {\footnotesize$0.8646 \pm 0.0004$}\\
\bottomrule
\end{tabular}}

\\

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{delivery-eta \textdownarrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$0.5493 \pm 0.0007$} & {\footnotesize$0.5478 \pm 0.0006$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & -- \\
{\footnotesize $\mathrm{ResNet}$ } & -- & -- \\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$0.5516 \pm 0.0014$} & {\footnotesize$0.5495 \pm 0.0004$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$0.5495 \pm 0.0008$} & {\footnotesize$0.5479 \pm 0.0001$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$0.5519$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & -- & -- \\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & -- & -- \\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$0.5552 \pm 0.0030$} & {\footnotesize$0.5524$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$0.5528$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$0.5542 \pm 0.0026$} & {\footnotesize$0.5523 \pm 0.0018$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$0.5527 \pm 0.0016$} & {\footnotesize$0.5512$}\\
{\footnotesize $\mathrm{MLP^{\ddagger-lite}}$ } & {\footnotesize$0.5521 \pm 0.0014$} & {\footnotesize$0.5512 \pm 0.0005$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$0.5535 \pm 0.0019$} & {\footnotesize$0.5526 \pm 0.0009$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$0.5521 \pm 0.0019$} & {\footnotesize$0.5511 \pm 0.0007$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$0.5468 \pm 0.0002$} & {\footnotesize$0.5463 \pm 0.0001$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$0.5468 \pm 0.0001$} & {\footnotesize$0.5465 \pm 0.0000$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$0.5465 \pm 0.0001$} & {\footnotesize$0.5461 \pm 0.0000$}\\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$0.5514 \pm 0.0024$} & {\footnotesize$0.5480 \pm 0.0005$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$0.5520 \pm 0.0015$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$0.5498 \pm 0.0007$} & {\footnotesize$0.5488 \pm 0.0002$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$0.5507 \pm 0.0013$} & {\footnotesize$0.5494 \pm 0.0006$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$0.5510 \pm 0.0015$} & {\footnotesize$0.5504 \pm 0.0004$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$0.5494 \pm 0.0004$} & {\footnotesize$0.5492 \pm 0.0001$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$0.5509 \pm 0.0003$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$0.5497 \pm 0.0007$} & {\footnotesize$0.5495 \pm 0.0003$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$0.5510 \pm 0.0019$} & {\footnotesize$0.5502 \pm 0.0000$}\\
\bottomrule
\end{tabular}}

&

\topalign{
\setlength\tabcolsep{2.5pt}
\renewcommand{\arraystretch}{0.8}
\begin{tabular}{lll}

\multicolumn{3}{c}{\small{weather \textdownarrow}} \\
\toprule
{\small Method} & {\small Single model} & {\small Ensemble} \\
\midrule\\[-0.7cm]
\multicolumn{3}{c}{} \\[0.05cm]
{\footnotesize $\mathrm{MLP}$ } & {\footnotesize$1.5378 \pm 0.0054$} & {\footnotesize$1.5111 \pm 0.0029$}\\
{\footnotesize $\mathrm{TabPFN}$ } & -- & -- \\
{\footnotesize $\mathrm{ResNet}$ } & -- & -- \\
{\footnotesize $\mathrm{DCN2}$ } & {\footnotesize$1.5606 \pm 0.0057$} & {\footnotesize$1.5292 \pm 0.0028$}\\
{\footnotesize $\mathrm{SNN}$ } & {\footnotesize$1.5280 \pm 0.0085$} & {\footnotesize$1.5013 \pm 0.0034$}\\
{\footnotesize $\mathrm{Trompt}$ } & {\footnotesize$1.5187$} & -- \\
{\footnotesize $\mathrm{AutoInt}$ } & -- & -- \\
{\footnotesize $\mathrm{MLP\texttt{-}Mixer}$ } & -- & -- \\
{\footnotesize $\mathrm{Excel^*}$ } & {\footnotesize$1.5131 \pm 0.0022$} & {\footnotesize$1.4707$}\\
{\footnotesize $\mathrm{SAINT}$ } & {\footnotesize$1.5097 \pm 0.0045$} & -- \\
{\footnotesize $\mathrm{FT\texttt{-}T}$ } & {\footnotesize$1.5104 \pm 0.0097$} & {\footnotesize$1.4719 \pm 0.0040$}\\
{\footnotesize $\mathrm{T2G}$ } & {\footnotesize$1.4849 \pm 0.0087$} & {\footnotesize$1.4513$}\\
{\footnotesize $\mathrm{MLP^{\ddagger-lite}}$ } & {\footnotesize$1.5170 \pm 0.0040$} & {\footnotesize$1.4953 \pm 0.0023$}\\
{\footnotesize $\mathrm{MLP^\ddagger}$ } & {\footnotesize$1.5139 \pm 0.0031$} & {\footnotesize$1.4978 \pm 0.0020$}\\
{\footnotesize $\mathrm{MLP^\dagger}$ } & {\footnotesize$1.5162 \pm 0.0020$} & {\footnotesize$1.5066 \pm 0.0008$}\\
{\footnotesize $\mathrm{XGBoost}$ } & {\footnotesize$1.4671 \pm 0.0006$} & {\footnotesize$1.4629 \pm 0.0002$}\\
{\footnotesize $\mathrm{LightGBM}$ } & {\footnotesize$1.4625 \pm 0.0008$} & {\footnotesize$1.4581 \pm 0.0003$}\\
{\footnotesize $\mathrm{CatBoost}$ } & {\footnotesize$1.4688 \pm 0.0019$} & -- \\
{\footnotesize $\mathrm{TabR}$ } & {\footnotesize$1.4666 \pm 0.0039$} & {\footnotesize$1.4547 \pm 0.0008$}\\
{\footnotesize $\mathrm{TabR^\ddagger}$ } & {\footnotesize$1.4458 \pm 0.0018$} & -- \\
{\footnotesize $\mathrm{MNCA}$ } & {\footnotesize$1.5062 \pm 0.0054$} & {\footnotesize$1.4822 \pm 0.0013$}\\
{\footnotesize $\mathrm{MNCA^\ddagger}$ } & {\footnotesize$1.5008 \pm 0.0034$} & {\footnotesize$1.4782 \pm 0.0011$}\\
{\footnotesize $\mathrm{TabM^{\spadesuit}}$ } & {\footnotesize$1.4786 \pm 0.0039$} & {\footnotesize$1.4715 \pm 0.0020$}\\
{\footnotesize $\mathrm{TabM}$ } & {\footnotesize$1.4722 \pm 0.0024$} & {\footnotesize$1.4675 \pm 0.0009$}\\
{\footnotesize $\mathrm{TabM[G]}$ } & {\footnotesize$1.4728 \pm 0.0022$} & -- \\
{\footnotesize $\mathrm{TabM_{mini}}$ } & {\footnotesize$1.4716 \pm 0.0016$} & {\footnotesize$1.4669 \pm 0.0010$}\\
{\footnotesize $\mathrm{TabM_{mini}^\dagger}$ } & {\footnotesize$1.4651 \pm 0.0020$} & {\footnotesize$1.4581 \pm 0.0016$}\\
\bottomrule
\end{tabular}}

\\
\end{longtable}
```
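For intuition about the efficient-ensembling mechanism underlying `\model`{=latex}, the following is a minimal illustrative sketch, not the authors' implementation: all class and variable names here are hypothetical. It shows $k$ implicit MLPs that share one weight matrix per layer and differ only through cheap per-member rank-1 scalings (in the spirit of BatchEnsemble), producing $k$ predictions per object that are averaged at inference.

```python
import numpy as np

rng = np.random.default_rng(0)

class EnsembleLinear:
    """One shared weight matrix, adapted by per-member rank-1 scalings,
    so k implicit linear layers share most of their parameters."""
    def __init__(self, d_in, d_out, k):
        self.W = rng.normal(0.0, d_in ** -0.5, (d_in, d_out))  # shared
        self.r = np.ones((k, d_in))    # per-member input scaling
        self.s = np.ones((k, d_out))   # per-member output scaling
        self.b = np.zeros((k, d_out))  # per-member bias

    def __call__(self, x):
        # x: (k, batch, d_in) -> (k, batch, d_out)
        return (x * self.r[:, None, :]) @ self.W * self.s[:, None, :] + self.b[:, None, :]

k, batch, d_in, d_hidden = 4, 32, 8, 16
layers = [EnsembleLinear(d_in, d_hidden, k), EnsembleLinear(d_hidden, d_hidden, k)]
head = EnsembleLinear(d_hidden, 1, k)

x = rng.normal(size=(batch, d_in))
z = np.broadcast_to(x, (k, batch, d_in))  # same objects fed to all k members
for layer in layers:
    z = np.maximum(layer(z), 0.0)         # ReLU
per_member = head(z)                       # (k, batch, 1): k predictions per object
prediction = per_member.mean(axis=0)       # ensemble-style average at inference
```

The key point matching the paper's description is that the $k$ members are trained simultaneously in one forward pass and share the dominant parameter count (the `W` matrices), while the per-member adapters keep their predictions diverse enough to average.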

[^1]: Corresponding author: `yurygorishniy@gmail.com`

[^2]: https://github.com/jyansir/t2g-former

[^3]: https://github.com/WhatAShot/ExcelFormer

[^4]: https://github.com/yandex-research/tabular-dl-tabr

[^5]: https://github.com/shichence/AutoInt
