Mamba-3: Improved Sequence Modeling using State Space Principles
Source
- Raw Markdown: paper_mamba-3-2026.md
- PDF: paper_mamba-3-2026.pdf
- Preprint: arXiv 2603.15569
- Official blog post: Princeton Language and Intelligence
- Official code: state-spaces/mamba
- Conference page: OpenReview ICLR 2026
Core Claim
Mamba-3 advances the Mamba-2 line from an SSM-first perspective: a more expressive exponential-trapezoidal discretization, complex-valued state transitions, and a multi-input multi-output (MIMO) formulation together improve model quality and state tracking while preserving low-latency recurrent inference.
Key Contributions
- Formalizes the Mamba-1 and Mamba-2 discretization as an exponential-Euler rule, then generalizes it to an exponential-trapezoidal update (the two rules are contrasted in the sketch after this list).
- Reintroduces complex-valued state dynamics through an efficient real-valued rotation formulation, giving the recurrent state a better mechanism for parity-like state tracking.
- Adds a MIMO state update that increases useful decoding FLOPs and modeling power without materially increasing decode latency in the reported setting (see the shape sketch below).
- Releases fast training and inference kernels for the updated architecture.
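For reference, the textbook contrast between the two discretization rules for a single channel is shown below, using the usual SSM notation (step size \(\Delta_t\), state matrix \(A\), input projection \(B_t\), input \(x_t\)). The exact weighting Mamba-3 learns for the previous-step term is not reproduced here; this is the standard trapezoidal form, not the paper's parameterization.

```latex
% Exponential-Euler update (the Mamba-1/2 rule): only the current input enters.
h_t = e^{\Delta_t A}\, h_{t-1} + \Delta_t B_t x_t

% Exponential-trapezoidal update (textbook form of the generalization):
% the input contribution is averaged over both endpoints of the step,
% with the earlier endpoint carried forward through the state transition.
h_t = e^{\Delta_t A}\, h_{t-1}
      + \frac{\Delta_t}{2}\left( e^{\Delta_t A} B_{t-1} x_{t-1} + B_t x_t \right)
```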
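The shape-level sketch below illustrates why a MIMO update packs more useful compute into each decode step without growing the recurrent state. All tensor names and the rank-r formulation are illustrative assumptions for exposition, not the paper's exact update.

```python
import torch

# Hypothetical sizes: d = channels, n = state size per channel, r = MIMO rank.
d, n, r = 8, 16, 4

# --- SISO decode step (Mamba-2 style): a rank-1 state update per channel ---
h = torch.zeros(d, n)                 # recurrent state
a = torch.rand(d)                     # per-channel decay in (0, 1)
B = torch.randn(n)                    # input projection (illustrative)
C = torch.randn(n)                    # output projection (illustrative)
x = torch.randn(d)                    # one token: a single scalar per channel
h = a[:, None] * h + x[:, None] * B   # (d,1) * (1,n) outer product
y_siso = h @ C                        # (d,) output

# --- MIMO decode step: a rank-r update into a state of the same size ---
h = torch.zeros(d, n)                 # same state shape as before
X = torch.randn(d, r)                 # r input streams per channel
B_mimo = torch.randn(r, n)            # maps r inputs into the state
C_mimo = torch.randn(n, r)            # reads r outputs back out
h = a[:, None] * h + X @ B_mimo       # (d,r) @ (r,n): ~r times the input FLOPs
y_mimo = h @ C_mimo                   # (d, r) output

# The state stays (d, n) in both variants; MIMO spends more arithmetic per
# decode step on the same memory traffic, which is why latency barely moves.
```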
Evidence And Results
The paper reports that at the 1.5B scale, Mamba-3 SISO improves average downstream accuracy by 0.6 points over the next best model in its comparison, and the MIMO variant adds another 1.2 points, for a total 1.8-point gain over that baseline. It also reports that Mamba-3 with state size 64 can match Mamba-2 with state size 128, i.e., the same language-modeling quality from half the recurrent state in the paper's state-size experiments.
On state-tracking tasks, the data-dependent RoPE/complex-state variant solves parity and modular arithmetic tasks that Mamba-2 and non-complex variants fail in the reported experiments.
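To make the parity point concrete, the toy recurrence below has each 1-bit rotate a two-dimensional state by π, which is the real-valued (cos/sin) form of a complex eigenvalue on the unit circle. The function name and readout are illustrative, not the paper's construction, but they show the mechanism a non-negative real decay lacks: a decay can only shrink or preserve the state, whereas the rotation flips it with every new 1-bit.

```python
import math
import torch

def parity_via_rotation(bits):
    """Track the parity of a 0/1 sequence with a 2-D rotating state.

    Each 1-bit rotates the state by pi; each 0-bit leaves it alone.
    This is the real-valued (cos/sin) form of a complex eigenvalue on
    the unit circle -- exactly what a non-negative real decay cannot do.
    """
    h = torch.tensor([1.0, 0.0])             # state starts at angle 0
    for b in bits:
        theta = math.pi * float(b)           # data-dependent rotation angle
        rot = torch.tensor([[math.cos(theta), -math.sin(theta)],
                            [math.sin(theta),  math.cos(theta)]])
        h = rot @ h
    return 0 if h[0] > 0 else 1              # sign of the cos component = parity

assert parity_via_rotation([1, 0, 1, 1]) == 1
assert parity_via_rotation([1, 1, 0, 0]) == 0
```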
Relevance To This Wiki
Mamba-3 is important background for efficient sequence models because it shows that the Mamba family is still gaining expressivity from state-space principles rather than only from kernel engineering. It sharpens the difference between efficient recurrent models that preserve linear hidden-state updates and ParaRNN’s attempt to make genuinely nonlinear hidden-state updates trainable in parallel.
For time-series and world-model readers, the complex-state and MIMO directions are especially relevant because they move SSMs closer to richer latent-state dynamics while retaining recurrent inference.
Limitations
- Mamba-3 remains in the structured SSM family; it is not a general nonlinear RNN solver.
- The MIMO variant improves inference-time hardware utilization but can come at the cost of slower training.
- The main evaluation surface is language modeling, retrieval, and synthetic state tracking, so claims about numeric time series, trajectories, or action-conditioned world models need separate validation.
Links Into The Wiki
- Mamba-3
- Efficient Recurrent Sequence Models
- Time-Series Scaling And Efficiency
- Mamba
- Mamba-2
- ParaRNN
Open Questions
- Do complex-valued recurrent state transitions help numeric time-series and control-input modeling, or are the gains mostly for symbolic state tracking?
- Can MIMO SSM updates improve multivariate time-series models without erasing channel-specific deviations?
- How much of Mamba-3’s gain transfers to hybrid architectures that combine attention, SSMs, and nonlinear RNN cells?