MoDA

Summary

MoDA, or Mixture-of-Depths Attention, is a Transformer attention operator that lets each head jointly attend to current sequence key/value pairs and depth key/value memories from previous layers.

Role In The Wiki

MoDA is the current source-backed object for content-based inter-layer retrieval. It is relevant to intermediate-layer representations, depth scaling, dynamic compute, and the open question of whether layer aggregation should be fixed, learned, sparse, or attention-based.

Official Artifacts

Evidence

Relation To Foundation TSFM Agenda

Use the source-level agenda mapping in moda-2026. At the entity level, MoDA should stay as the object card for the method, code, blog, and thread.

The core transfer hypothesis is narrow: MoDA may help future time-series or world-model systems retrieve useful intermediate state across depth, but the current evidence is language-model architecture evidence, not numeric time-series or control evidence.