Mamba-3: Improved Sequence Modeling using State Space Principles
Source
- Raw Markdown: paper_mamba-3-2026.md
- PDF: paper_mamba-3-2026.pdf
- Preprint: arXiv 2603.15569
- Official blog post: Princeton Language and Intelligence
- Official code: state-spaces/mamba
- Conference page: OpenReview ICLR 2026
Core Claim
Mamba-3 advances the Mamba-2 line from an SSM-first perspective: a more expressive exponential-trapezoidal discretization, complex-valued state transitions, and a multi-input multi-output (MIMO) formulation together improve model quality and state tracking while preserving low-latency recurrent inference.
Key Contributions
- Formalizes the Mamba-1 and Mamba-2 discretization as an exponential-Euler rule, then generalizes it to an exponential-trapezoidal update (the two rules are contrasted in the sketch after this list).
- Reintroduces complex-valued state dynamics through an efficient real-valued rotation formulation, giving the recurrent state a better mechanism for parity-like state tracking.
- Adds a MIMO state update that increases useful decoding FLOPs and modeling power without materially increasing decode latency in the reported setting (see the shape sketch below).
- Releases fast training and inference kernels for the updated architecture.
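For reference, the textbook contrast between the two discretization rules for a single channel is shown below, using the usual SSM notation (step size \(\Delta_t\), state matrix \(A\), input projection \(B_t\), input \(x_t\)). The exact weighting Mamba-3 learns for the previous-step term is not reproduced here; this is the standard trapezoidal form, not the paper's parameterization.

```latex
% Exponential-Euler update (the Mamba-1/2 rule): only the current input enters.
h_t = e^{\Delta_t A}\, h_{t-1} + \Delta_t B_t x_t

% Exponential-trapezoidal update (textbook form of the generalization):
% the input contribution is averaged over both endpoints of the step,
% with the earlier endpoint carried forward through the state transition.
h_t = e^{\Delta_t A}\, h_{t-1}
      + \frac{\Delta_t}{2}\left( e^{\Delta_t A} B_{t-1} x_{t-1} + B_t x_t \right)
```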
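The shape-level sketch below illustrates why a MIMO update packs more useful compute into each decode step without growing the recurrent state. All tensor names and the rank-r formulation are illustrative assumptions for exposition, not the paper's exact update.

```python
import torch

# Hypothetical sizes: d = channels, n = state size per channel, r = MIMO rank.
d, n, r = 8, 16, 4

# --- SISO decode step (Mamba-2 style): a rank-1 state update per channel ---
h = torch.zeros(d, n)                 # recurrent state
a = torch.rand(d)                     # per-channel decay in (0, 1)
B = torch.randn(n)                    # input projection (illustrative)
C = torch.randn(n)                    # output projection (illustrative)
x = torch.randn(d)                    # one token: a single scalar per channel
h = a[:, None] * h + x[:, None] * B   # (d,1) * (1,n) outer product
y_siso = h @ C                        # (d,) output

# --- MIMO decode step: a rank-r update into a state of the same size ---
h = torch.zeros(d, n)                 # same state shape as before
X = torch.randn(d, r)                 # r input streams per channel
B_mimo = torch.randn(r, n)            # maps r inputs into the state
C_mimo = torch.randn(n, r)            # reads r outputs back out
h = a[:, None] * h + X @ B_mimo       # (d,r) @ (r,n): ~r times the input FLOPs
y_mimo = h @ C_mimo                   # (d, r) output

# The state stays (d, n) in both variants; MIMO spends more arithmetic per
# decode step on the same memory traffic, which is why latency barely moves.
```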
Evidence And Results
The paper reports that at the 1.5B scale, Mamba-3 SISO improves average downstream accuracy by 0.6 points over the next best model in its comparison, and the MIMO variant adds another 1.2 points, for a total 1.8-point gain over that baseline. It also reports that Mamba-3 with state size 64 can match Mamba-2 with state size 128, i.e., the same language-modeling quality from half the recurrent state in the paper's state-size experiments.
On state-tracking tasks, the data-dependent RoPE/complex-state variant solves parity and modular arithmetic tasks that Mamba-2 and non-complex variants fail in the reported experiments.
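To make the parity point concrete, the toy recurrence below has each 1-bit rotate a two-dimensional state by π, which is the real-valued (cos/sin) form of a complex eigenvalue on the unit circle. The function name and readout are illustrative, not the paper's construction, but they show the mechanism a non-negative real decay lacks: a decay can only shrink or preserve the state, whereas the rotation flips it with every new 1-bit.

```python
import math
import torch

def parity_via_rotation(bits):
    """Track the parity of a 0/1 sequence with a 2-D rotating state.

    Each 1-bit rotates the state by pi; each 0-bit leaves it alone.
    This is the real-valued (cos/sin) form of a complex eigenvalue on
    the unit circle -- exactly what a non-negative real decay cannot do.
    """
    h = torch.tensor([1.0, 0.0])             # state starts at angle 0
    for b in bits:
        theta = math.pi * float(b)           # data-dependent rotation angle
        rot = torch.tensor([[math.cos(theta), -math.sin(theta)],
                            [math.sin(theta),  math.cos(theta)]])
        h = rot @ h
    return 0 if h[0] > 0 else 1              # sign of the cos component = parity

assert parity_via_rotation([1, 0, 1, 1]) == 1
assert parity_via_rotation([1, 1, 0, 0]) == 0
```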
Relevance To This Wiki
Mamba-3 is important background for efficient sequence models because it shows that the Mamba family is still gaining expressivity from state-space principles rather than only from kernel engineering. It sharpens the difference between efficient recurrent models that preserve linear hidden-state updates and ParaRNN’s attempt to make genuinely nonlinear hidden-state updates trainable in parallel.
For time-series and world-model readers, the complex-state and MIMO directions are especially relevant because they move SSMs closer to richer latent-state dynamics while retaining recurrent inference.
Limitations
- Mamba-3 remains in the structured SSM family; it is not a general nonlinear RNN solver.
- The MIMO variant improves inference-time hardware utilization but can come at the cost of slower training.
- The main evaluation surface is language modeling, retrieval, and synthetic state tracking, so claims about numeric time series, trajectories, or action-conditioned world models need separate validation.
Links Into The Wiki
- Mamba-3
- Efficient Recurrent Sequence Models
- Time-Series Scaling And Efficiency
- Mamba
- Mamba-2
- ParaRNN
Open Questions
- Do complex-valued recurrent state transitions help numeric time-series and control-input modeling, or are the gains mostly for symbolic state tracking?
- Can MIMO SSM updates improve multivariate time-series models without erasing channel-specific deviations?
- How much of Mamba-3’s gain transfers to hybrid architectures that combine attention, SSMs, and nonlinear RNN cells?