TabM: Advancing Tabular Deep Learning with Parameter-Efficient Ensembling
Source
- Raw Markdown: paper_tabm-2024.md
- PDF: paper_tabm-2024.pdf
- Preprint: arXiv 2410.24210
- Venue page: ICLR 2025
- Official code: yandex-research/tabm
- Numerical embedding package used by the official code: yandex-research/rtdl-num-embeddings
Core Claim
TabM is a practical static-tabular deep-learning baseline that turns an MLP into a parameter-efficient ensemble: one model represents multiple implicit MLP submodels, trains them in parallel, shares most weights by default, and averages multiple predictions per object.
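A minimal sketch of this mechanism, assuming BatchEnsemble-style multiplicative adapters on a shared linear layer; EnsembleLinear, the initialization, and the shapes below are illustrative placeholders, not the official tabm implementation.

```python
# Illustrative sketch of parameter-efficient ensembling: one shared weight
# matrix plus k cheap per-submodel adapters, with predictions averaged over k.
import torch
import torch.nn as nn


class EnsembleLinear(nn.Module):
    """One shared weight matrix plus k lightweight per-submodel adapters."""

    def __init__(self, d_in: int, d_out: int, k: int) -> None:
        super().__init__()
        self.weight = nn.Parameter(torch.empty(d_in, d_out))  # shared by all k submodels
        self.r = nn.Parameter(torch.ones(k, d_in))            # per-submodel input scaling
        self.s = nn.Parameter(torch.ones(k, d_out))           # per-submodel output scaling
        self.bias = nn.Parameter(torch.zeros(k, d_out))       # per-submodel bias
        nn.init.xavier_uniform_(self.weight)                  # initialization simplified

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, k, d_in) -> (batch, k, d_out); the k submodels run in parallel.
        return torch.einsum("bki,io->bko", x * self.r, self.weight) * self.s + self.bias


k, d_in, d_hidden = 32, 10, 64
backbone = nn.Sequential(
    EnsembleLinear(d_in, d_hidden, k), nn.ReLU(), EnsembleLinear(d_hidden, 1, k)
)
x = torch.randn(8, d_in)
x = x.unsqueeze(1).expand(-1, k, -1)   # replicate each object for the k implicit submodels
per_submodel = backbone(x)             # (batch, k, 1): one prediction per submodel
prediction = per_submodel.mean(dim=1)  # averaged per-object prediction, as at inference
```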
Benchmarked Model Entry
- Model: TabM
- Family: MLP-based tabular deep learning with parameter-efficient ensembling.
- Venue: ICLR 2025; this local slug uses the arXiv submission year because the paper and local citation keys are 2024-based.
- Primary task surface: supervised classification and regression on static tabular datasets.
- Main architecture variants: TabM, TabM-mini, and TabM-packed.
- Typical ensemble size in the paper and official package guidance: k=32.
- Official artifact: the tabm PyTorch package and repository.
Key Contributions
- Shows that parameter-efficient ensembling is a strong path for tabular MLPs, competing with or outperforming attention- and retrieval-based tabular deep-learning architectures under the paper’s protocol.
- Uses simultaneous training of implicit submodels so early stopping and hyperparameter tuning can target the collective prediction rather than independent single-model checkpoints (see the training sketch after this list).
- Uses weight sharing between submodels as both an efficiency mechanism and an effective regularizer.
- Introduces practical variants, including TabM-mini, which keeps only the first multiplicative adapter, and TabM-packed, which packs fully independent submodels.
- Reports training and inference efficiency comparisons, including larger tabular datasets where attention- and retrieval-based methods become less practical.
- Reuses non-linear numerical feature embeddings, especially updated piecewise-linear embeddings, as a high-utility input interface for continuous features.
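Sketch of the training pattern referenced in the list above: every implicit submodel is trained on the same batch with its own loss term, while validation and early stopping score only the averaged collective prediction. The tiny stand-in backbone and all names here are illustrative assumptions, not the paper's exact protocol or the official tabm code.

```python
# k submodels trained jointly; the collective (averaged) prediction drives validation.
import torch
import torch.nn.functional as F

k, d_in = 32, 10
W = torch.nn.Parameter(torch.randn(k, d_in, 1) * 0.1)  # stand-in for k implicit submodels

def backbone(x: torch.Tensor) -> torch.Tensor:
    # x: (batch, k, d_in) -> (batch, k, 1), one prediction per submodel
    return torch.einsum("bki,kio->bko", x, W)

opt = torch.optim.AdamW([W], lr=1e-3)
X_train, y_train = torch.randn(256, d_in), torch.randn(256, 1)
X_val, y_val = torch.randn(64, d_in), torch.randn(64, 1)

best_val = float("inf")
for epoch in range(20):
    per_submodel = backbone(X_train.unsqueeze(1).expand(-1, k, -1))           # (256, k, 1)
    loss = F.mse_loss(per_submodel, y_train.unsqueeze(1).expand(-1, k, -1))   # mean of per-submodel losses
    opt.zero_grad(); loss.backward(); opt.step()

    with torch.no_grad():
        val_pred = backbone(X_val.unsqueeze(1).expand(-1, k, -1)).mean(dim=1)  # collective prediction
        val_loss = F.mse_loss(val_pred, y_val).item()
    best_val = min(best_val, val_loss)  # early stopping targets the ensemble, not any single submodel
```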
Numerical Feature Embedding Notes
TabM is relevant to Number Tokenization, but it should not be conflated with text number tokenization: its numerical embeddings operate on typed table columns, not on literal numerals in a token stream.
The default tabular pipeline keeps numerical features as scalars after preprocessing, while categorical features are one-hot encoded. When num_embeddings is passed in the official implementation, the numerical embedding module is applied before the MLP backbone, the per-feature representations are flattened, and the embedding module is shared across the implicit submodels.
The official TabM package allows LinearReLUEmbeddings, PiecewiseLinearEmbeddings, and PeriodicEmbeddings from rtdl_num_embeddings as num_embeddings. LinearReLUEmbeddings maps each scalar feature through a per-feature linear layer plus ReLU. PiecewiseLinearEmbeddings first encodes a scalar through bins, then learns embeddings over that piecewise-linear representation. PeriodicEmbeddings uses learned frequencies, cosine/sine activations, and an outer linear layer; the paper uses related periodic embeddings for MLP-PLR and several baselines.
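A hedged usage sketch of the input interface described in the two paragraphs above, using class names from rtdl_num_embeddings; the constructor arguments and shapes are assumptions based on the package's documented behavior and may differ between package versions.

```python
# Per-feature numerical embeddings applied before the MLP backbone, then flattened
# and concatenated with one-hot categorical features.
import torch
from rtdl_num_embeddings import LinearReLUEmbeddings

batch, n_num, n_cat_onehot, d_embedding = 8, 6, 5, 16
x_num = torch.randn(batch, n_num)                # continuous features after preprocessing
x_cat_onehot = torch.zeros(batch, n_cat_onehot)  # one-hot encoded categorical features (placeholder)

# Per-feature linear layer + ReLU; rtdl_num_embeddings.PeriodicEmbeddings would be a
# drop-in alternative with learned frequencies, sin/cos activations, and an outer linear layer.
emb = LinearReLUEmbeddings(n_num, d_embedding)

tokens = emb(x_num)                                 # (batch, n_num, d_embedding): one embedding per feature
flat = tokens.flatten(1)                            # flattened before the MLP backbone, shared across submodels
mlp_input = torch.cat([flat, x_cat_onehot], dim=1)  # concatenated with the one-hot categoricals
```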
The paper’s TabM-with-embeddings variants use an updated piecewise-linear embedding with quantile-based bins, a faster implementation, and different parametrization/initialization from the earlier numerical-embedding paper. In the current official package, PiecewiseLinearEmbeddings(..., version="B") is required for TabM because that version starts from a linear component and learns the piecewise-linear contribution incrementally.
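A sketch of the updated piecewise-linear embedding setup described above, assuming the current rtdl_num_embeddings API; the activation flag and bin count below are illustrative choices, not prescriptions from the paper.

```python
# Quantile-based bins computed on the training split, then version="B" embeddings.
import torch
from rtdl_num_embeddings import PiecewiseLinearEmbeddings, compute_bins

X_train_num = torch.randn(1000, 6)           # training-split continuous features
bins = compute_bins(X_train_num, n_bins=48)  # quantile-based bins from the training data
emb = PiecewiseLinearEmbeddings(bins, d_embedding=16, activation=False, version="B")
tokens = emb(torch.randn(32, 6))             # (32, 6, 16): per-feature embeddings
```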
Method Notes
TabM is not a tabular foundation model. It does not learn an in-context inference procedure from synthetic tasks like TabPFN-style models; instead, it is trained per supervised tabular dataset. Its value for the knowledge base is as a strong static-tabular baseline and as a concrete catalogue of continuous numeric-feature embedding options.
For time-series and world-model work, the portable idea is the input interface: auxiliary numeric values can be represented as typed scalar features with per-feature embeddings, not only as text numerals, bit tokens, Fourier number tokens, or raw observation samples.
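An illustrative sketch of that portable idea (not from the paper): auxiliary numeric inputs of a time-series model represented as typed scalar features with per-feature embeddings rather than as text numerals. All names and shapes here are hypothetical.

```python
# One linear+ReLU embedding per exogenous channel, in the spirit of LinearReLUEmbeddings.
import torch

n_exog, d_embedding, batch, horizon = 4, 8, 16, 24
weight = torch.randn(n_exog, d_embedding) * 0.1
bias = torch.zeros(n_exog, d_embedding)

exog = torch.randn(batch, horizon, n_exog)               # known future covariates / control inputs
tokens = torch.relu(exog.unsqueeze(-1) * weight + bias)  # (batch, horizon, n_exog, d_embedding)
conditioning = tokens.flatten(-2)                        # (batch, horizon, n_exog * d_embedding)
```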
Evidence And Results
The benchmark uses 46 public tabular datasets: 38 from prior tabular-DL work and 8 from TabReD. The paper reports that TabM is a top-tier tabular deep-learning model under this protocol and emphasizes that MLP-like models, including TabM, offer a better performance-efficiency tradeoff than the evaluated attention- and retrieval-based tabular architectures.
The paper also analyzes TabM’s ensemble-like behavior: individual submodels can be weak or overfit, while their collective averaged prediction generalizes better. This supports treating TabM as an efficient ensemble rather than a single ordinary MLP.
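A tiny sketch of how that comparison is computed: score each implicit submodel separately and then the averaged prediction; the tensors and names are illustrative.

```python
# Per-submodel error vs. error of the collective (averaged) prediction.
import torch
import torch.nn.functional as F

per_submodel = torch.randn(64, 32, 1)  # (objects, k submodels, prediction)
targets = torch.randn(64, 1)
per_submodel_mse = F.mse_loss(
    per_submodel, targets.unsqueeze(1).expand_as(per_submodel), reduction="none"
).mean(dim=(0, 2))                                            # one score per submodel
ensemble_mse = F.mse_loss(per_submodel.mean(dim=1), targets)  # score of the collective prediction
```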
Limitations
TabM targets static supervised tabular prediction. It does not encode temporal order, next-state dynamics, event streams, actions, control inputs, interventions, or causal rollouts by itself. Its numerical feature embeddings are column-specific and often depend on training-set preprocessing or bins, so they should not be treated as universal number representations in the same sense as FoNE or BitTokens.
Links Into The Wiki
- TabM
- Number Tokenization
- Tabular Foundation Models
- Time-Series Foundation Models
- Time-Series Benchmark Hygiene
Open Questions
- Should the wiki ingest the earlier numerical-embedding source behind rtdl_num_embeddings as its own page?
- Which TabM-style embedding option is most appropriate for known future exogenous variables, numeric control inputs, and intervention intensities in multivariate time-series models?
- Can per-feature piecewise-linear embeddings coexist with point-wise time-series tokenizers without losing cross-channel alignment?