Efficient Parallel Samplers for Recurrent-Depth Models and Their Connection to Diffusion Language Models
Source
- Raw Markdown: paper_parallel-samplers-recurrent-depth-2025.md
- PDF: paper_parallel-samplers-recurrent-depth-2025.pdf
- Preprint: arXiv 2510.14961
- Official code: seal-rg/recurrent-pretraining
Core Claim
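The paper connects recurrent-depth language models to diffusion language models and introduces a diffusion-forcing sampler that decodes a new token at every forward pass while the latent states of already-drafted tokens continue to be refined in parallel through recurrence.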
Relevance To This Wiki
It addresses a practical bottleneck of recurrent-depth models: how to use loop compute without paying fully serial autoregressive latency.
Limitations
The sampler is designed for language generation over discrete tokens. The diffusion analogy should not be extended to continuous numeric trajectories without a separate generative interface for numeric outputs.
Foundation TSFM Relevance
The paper is potentially relevant to parallel rollouts or forecast refinement, provided recurrent-depth state updates can be separated from output emission.
Links Into The Wiki
- Parallel Samplers for Recurrent-Depth Models
- Looped Transformers And Test-Time Memory
- Efficient Recurrent Sequence Models
- Time-Series Scaling And Efficiency
- Huginn
- Foundation Time-Series Model Research Agenda
Open Questions
- What matched-budget baseline should this source be compared against: unique-depth Transformer layers, recurrent state, explicit memory, or extra inference steps?
- Which claims transfer from token-sequence reasoning to multivariate time-series state tracking, event streams, or action-conditioned world models?
- Can diffusion-style recurrent-depth sampling transfer to continuous numeric trajectories without losing causal time semantics?