LLM Sleep
Summary
LLM Sleep is a sleep-time memory-consolidation method for SSM-attention hybrid language models. It is best read as a test-time or serving-time compute allocation pattern. Before the KV cache is cleared, the model runs offline recurrent passes over the context window and updates its SSM fast weights. Later wake-time predictions then use the consolidated state with a single forward pass.
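The sleep/wake split described above can be illustrated with a toy associative memory. This is a minimal sketch under stated assumptions, not the method from the preprint: the class `ToySleeper`, the delta-rule update, the learning rate, and the number of passes are all hypothetical stand-ins chosen to show the control flow (several passes over the window, then cache eviction, then a single read at wake time).

```python
from dataclasses import dataclass, field


def matvec(weights, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


@dataclass
class ToySleeper:
    """Hypothetical toy: a KV cache plus a fast-weight matrix."""
    dim: int
    lr: float = 0.5  # assumed learning rate for the delta rule
    kv_cache: list = field(default_factory=list)
    fast_weights: list = field(init=False)

    def __post_init__(self):
        self.fast_weights = [[0.0] * self.dim for _ in range(self.dim)]

    def observe(self, key, value):
        # Wake-time context is held in the cache until the next sleep.
        self.kv_cache.append((key, value))

    def sleep(self, passes=4):
        # Offline recurrent passes over the window: each pass applies a
        # delta-rule correction to the fast weights (an assumed update rule).
        for _ in range(passes):
            for key, value in self.kv_cache:
                pred = matvec(self.fast_weights, key)
                for i in range(self.dim):
                    err = value[i] - pred[i]
                    for j in range(self.dim):
                        self.fast_weights[i][j] += self.lr * err * key[j]
        # Only after consolidation is the cache cleared.
        self.kv_cache.clear()

    def wake_predict(self, query):
        # Single read of the consolidated state; no cache lookup is needed.
        return matvec(self.fast_weights, query)
```

In this toy, more sleep passes shrink the recall error for the stored pairs, while the wake-time read stays a single matrix-vector product regardless of how many passes were spent.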
Role In The Wiki
Use this page as the object card for the method. The source page carries the evidence details, limitations, and agenda mapping.
LLM Sleep sits between compact recurrent-state models and recurrent-depth models: the recurrence is spent on fast-weight formation before context eviction, not on every prediction token. For this wiki, its main relevance is the infinite-context streaming analogy: finite windows can roll off, provided the system first runs a learned state-refresh step that consolidates them.
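The streaming analogy can be sketched as a driver loop that refreshes a persistent state before each window is evicted. This is an illustrative sketch only: `stream_with_refresh`, `RunningState`, and the chunk-at-a-time eviction policy are assumptions made for the example, and the running sum stands in for whatever learned state-refresh step a real system would use.

```python
from typing import Callable, Iterable, List, Tuple


class RunningState:
    """Hypothetical stand-in for a learned consolidated state."""

    def __init__(self) -> None:
        self.count = 0
        self.total = 0.0

    def absorb(self, window: List[float]) -> None:
        # Placeholder "consolidation": fold the window into summary statistics.
        self.count += len(window)
        self.total += sum(window)


def stream_with_refresh(
    stream: Iterable[float],
    window_size: int,
    consolidate: Callable[[List[float]], None],
) -> Tuple[List[float], int]:
    """Fill a finite window; consolidate it, then evict it, when it is full."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    window: List[float] = []
    refreshes = 0
    for item in stream:
        window.append(item)
        if len(window) == window_size:
            consolidate(window)  # state refresh happens first
            window.clear()       # eviction happens only afterwards
            refreshes += 1
    # Items still in the window have not been consolidated yet.
    return window, refreshes
```

The ordering is the point of the sketch: the window is cleared only after the state has absorbed it, so memory use stays bounded by `window_size` however long the stream runs.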
Relation To Foundation TSFM Agenda
Use the source-level agenda mapping in language-models-need-sleep-2026 rather than duplicating verdict rows here. At the entity level, this page should stay as the object card; source pages carry slot-level evidence, limitations, and missing pieces.
Evidence
Official Artifacts
- Preprint: arXiv 2605.26099
No official code, project page, blog post, or author X thread was verified during ingest.