Efficient Recurrent Sequence Models
Summary
Efficient recurrent sequence models aim to recover the serving advantages of a compact latent state while avoiding the historical training bottleneck of sequential RNN unrolling. The Mamba line keeps the recurrence linear in the hidden state and recovers parallelism through scans or structured matrix algorithms; ParaRNN keeps nonlinear recurrent cells and makes the hidden-state trajectory parallelizable through Newton linearization plus parallel reduction.
Linear Recurrent State Path
Mamba introduces selective SSMs: input-dependent state-space parameters let the model decide what to remember or forget, while the recurrence stays linear in the hidden state, so the whole trajectory can be computed with a hardware-aware parallel scan. It is the core reference for the idea that a compact recurrent state can compete with attention on token sequences when selectivity and kernel design are strong enough.
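A minimal NumPy sketch of the scan idea, not Mamba's hardware-aware kernel: a scalar per-channel recurrence h_t = a_t h_{t-1} + b_t with input-dependent coefficients (the gate and drive below are illustrative choices), computed both by a reference loop and by a log-depth Hillis-Steele scan over affine maps.

```python
import numpy as np

def sequential_recurrence(a, b):
    # Reference O(T) loop: h_t = a_t * h_{t-1} + b_t with h_{-1} = 0.
    h, out = 0.0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return np.array(out)

def parallel_scan(a, b):
    # Hillis-Steele inclusive scan over affine maps h -> a*h + b.
    # Composing step t-shift then step t gives (a_t*a_prev, a_t*b_prev + b_t);
    # positions with no predecessor are padded with the identity map (1, 0).
    a, b = a.copy(), b.copy()
    shift = 1
    while shift < len(a):
        a_prev = np.concatenate([np.ones(shift), a[:-shift]])
        b_prev = np.concatenate([np.zeros(shift), b[:-shift]])
        b = a * b_prev + b
        a = a * a_prev
        shift *= 2
    return b  # b_t now equals h_t for h_{-1} = 0

rng = np.random.default_rng(0)
x = rng.normal(size=8)
a = 1.0 / (1.0 + np.exp(-x))   # input-dependent gate in (0, 1): "selectivity"
b = 0.5 * x                    # input-dependent drive
assert np.allclose(sequential_recurrence(a, b), parallel_scan(a, b))
```

The point is that each step is an affine map of the hidden state, and affine maps compose associatively, so the O(T) loop collapses to O(log T) parallel rounds; selectivity lives entirely in how a_t and b_t are computed from the input.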
Mamba-2 reframes SSMs as semiseparable matrix mixers. The structured state-space duality (SSD) view turns the recurrent update into a structured matrix-multiplication problem, and the resulting SSD algorithm improves training efficiency and permits larger state sizes.
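A hedged illustration of the duality in the scalar-state case (names are illustrative): the recurrence h_t = a_t h_{t-1} + b_t x_t with readout y_t = c_t h_t equals multiplication by a lower-triangular semiseparable matrix M with M[t, s] = c_t (a_{s+1} ··· a_t) b_s. SSD's actual algorithm exploits this structure blockwise rather than materializing M, as the quadratic-cost version below does.

```python
import numpy as np

def recurrent(a, b, c, x):
    # Recurrent view: h_t = a_t * h_{t-1} + b_t * x_t, y_t = c_t * h_t.
    h, ys = 0.0, []
    for t in range(len(x)):
        h = a[t] * h + b[t] * x[t]
        ys.append(c[t] * h)
    return np.array(ys)

def matrix_mixer(a, b, c, x):
    # Matrix view: y = M @ x with semiseparable M[t, s] = c_t * prod(a[s+1..t]) * b_s.
    T = len(x)
    M = np.zeros((T, T))
    for t in range(T):
        for s in range(t + 1):
            M[t, s] = c[t] * np.prod(a[s + 1:t + 1]) * b[s]
    return M @ x

rng = np.random.default_rng(1)
T = 6
a = rng.uniform(0.5, 1.0, T)
b, c, x = rng.normal(size=T), rng.normal(size=T), rng.normal(size=T)
assert np.allclose(recurrent(a, b, c, x), matrix_mixer(a, b, c, x))
```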
Mamba-3 stays within the structured SSM family but adds more expressive state dynamics: exponential-trapezoidal discretization, complex-valued state transitions, and MIMO updates. Its relevance here is that richer state tracking can be added while staying close to the efficient SSM inference contract.
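For orientation, the textbook discretizations of a continuous SSM are shown below; this is standard state-space discretization (the zero-order hold used by earlier SSMs versus the classical trapezoidal rule that an exponential-trapezoidal scheme refines), not Mamba-3's exact formulas.

```latex
% Continuous SSM with step size \Delta:
%   h'(t) = A\,h(t) + B\,x(t)
% Zero-order hold, standard in earlier SSMs:
\bar{A}_{\mathrm{ZOH}} = e^{\Delta A}, \qquad
\bar{B}_{\mathrm{ZOH}} = A^{-1}\!\left(e^{\Delta A} - I\right) B
% Trapezoidal (bilinear) rule:
\bar{A}_{\mathrm{trap}} = \left(I - \tfrac{\Delta}{2} A\right)^{-1}\!\left(I + \tfrac{\Delta}{2} A\right), \qquad
\bar{B}_{\mathrm{trap}} = \left(I - \tfrac{\Delta}{2} A\right)^{-1} \Delta B
```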
Nonlinear Recurrent State Path
ParaRNN is the key new source because it challenges the assumption that nonlinear recurrent hidden-state updates are inherently impractical at training time. It treats the entire hidden-state trajectory as one nonlinear system of equations and solves it with Newton's method; each Newton step reduces to a linear recurrence over Jacobians and residuals, which is solved by parallel reduction.
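A self-contained sketch of the Newton-over-the-trajectory idea, not the paper's implementation (the cell f and all names are illustrative): stack the recurrence residuals F_t = h_t - f(h_{t-1}, x_t) into one system, linearize, and note that each Newton step is itself a linear recurrence in the correction d, exactly the form the scan above solves.

```python
import numpy as np

W, U = 0.9, 0.5  # illustrative cell weights

def f(h_prev, x):
    # A genuinely nonlinear recurrent cell (not the paper's architecture).
    return np.tanh(W * h_prev + U * x)

def df_dh(h_prev, x):
    # Derivative of f with respect to the previous hidden state.
    return W * (1.0 - np.tanh(W * h_prev + U * x) ** 2)

def newton_trajectory(x, iters=None):
    T = len(x)
    iters = T if iters is None else iters  # this causal system converges in <= T passes
    h = np.zeros(T)                        # initial guess for h_0..h_{T-1}, h_{-1} = 0
    for _ in range(iters):
        h_prev = np.concatenate([[0.0], h[:-1]])
        F = h - f(h_prev, x)               # residuals of the recurrence
        J = df_dh(h_prev, x)               # sub-diagonal Jacobian entries
        # Newton system: d_t - J_t * d_{t-1} = -F_t, i.e. the LINEAR recurrence
        # d_t = J_t * d_{t-1} - F_t. A plain loop here; a parallel scan at scale.
        d, acc = np.empty(T), 0.0
        for t in range(T):
            acc = J[t] * acc - F[t]
            d[t] = acc
        h = h + d
    return h

x = np.sin(np.linspace(0.0, 3.0, 16))
h_ref, acc = [], 0.0
for x_t in x:                              # ground truth: sequential unroll
    acc = f(acc, x_t)
    h_ref.append(acc)
assert np.allclose(newton_trajectory(x), np.array(h_ref))
```

For this sketch's triangular system, each Newton pass fixes at least one more time step exactly, so it converges in at most T iterations, and typically far fewer when the cell is contractive; the per-pass linear solve is what parallel reduction accelerates.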
The important distinction is that ParaRNN’s recurrent cell is genuinely nonlinear in the hidden state, while Mamba-family models preserve a linear hidden-state recurrence with input-dependent parameters. This makes ParaRNN especially relevant for domains where nonlinear latent dynamics are the modeling prior, including numeric time series and action-conditioned world models; note, however, that the paper itself evaluates token-sequence language modeling, not those domains.
Relevance For Time-Series And World Models
For this wiki, these papers are architectural background rather than direct forecasting evidence. They matter because many time-series and trajectory models need a compact latent state, long context, and efficient inference. Mamba-style selective state can be a strong passive dynamics backbone; ParaRNN suggests that nonlinear latent-state dynamics might become practical at scale if the solver remains stable and structured.
When writing about numeric time-series models, cite these sources as sequence-model primitives and then separately cite the concrete forecasting or trajectory paper for empirical claims. Do not treat language-model perplexity as evidence for multivariate time-series forecasting, event-stream modeling, or action-conditioned world modeling without a bridging experiment.
Open Questions
- Which time-series or trajectory regimes actually need nonlinear hidden-state updates rather than selective linear recurrent state?
- Can ParaRNN-style solvers handle explicit actions, control inputs, or interventions while preserving sparse or block-diagonal Jacobian structure?
- Do Mamba-3 complex state transitions help periodic, rotational, or conservation-like dynamics in numeric time series?
- What is the right benchmark for comparing attention, SSMs, and nonlinear RNNs when serving cost, context length, channel count, and state tracking all matter?