Representation Collapse
Summary
Representation collapse is the failure mode in which a predictive representation learner maps distinct inputs to nearly identical or otherwise uninformative embeddings. The predictive objective is then trivially satisfied (every prediction matches every target) while the representation carries no usable information about the input.
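Collapse is straightforward to detect empirically. The sketch below (assuming PyTorch; `collapse_diagnostics` and its toy data are illustrative, not from any of the cited papers) flags a collapsed batch by its near-unit mean pairwise cosine similarity and near-zero per-dimension variance.

```python
import torch

def collapse_diagnostics(z: torch.Tensor) -> dict:
    """z: (N, D) batch of embeddings; returns simple collapse indicators."""
    z_unit = torch.nn.functional.normalize(z, dim=1)
    cos = z_unit @ z_unit.T                       # (N, N) cosine similarities
    n = z.shape[0]
    off_diag = cos[~torch.eye(n, dtype=torch.bool)]
    return {
        "mean_pairwise_cosine": off_diag.mean().item(),  # -> 1.0 under collapse
        "mean_dim_std": z.std(dim=0).mean().item(),      # -> 0.0 under collapse
    }

# A collapsed batch: every input mapped to (nearly) the same vector.
collapsed = torch.randn(1, 64).repeat(128, 1) + 1e-4 * torch.randn(128, 64)
healthy = torch.randn(128, 64)
print(collapse_diagnostics(collapsed))  # cosine ~ 1.0, std ~ 0.0
print(collapse_diagnostics(healthy))    # cosine ~ 0.0, std ~ 1.0
```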
What The Wiki Currently Believes
- LeJEPA argues that a good JEPA objective should drive embeddings toward an isotropic Gaussian target distribution (see the moment-matching sketch after this list).
- LeWorldModel uses Gaussian regularization to stabilize end-to-end pixel-space world-model training without EMA target networks, pretrained encoders, or auxiliary supervision.
- NEPA uses next-embedding prediction with causal masking and a stop-gradient, showing that a simpler visual predictive objective can work without pixel reconstruction or discrete tokens (see the stop-gradient sketch after this list).
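Neither paper's exact objective is reproduced here. As a rough illustration of the Gaussian-target family that LeJEPA and LeWorldModel belong to, the PyTorch sketch below penalizes the batch's first and second moments for deviating from N(0, I); the function name and plain moment-matching form are assumptions, not the published losses.

```python
import torch

def isotropic_gaussian_penalty(z: torch.Tensor) -> torch.Tensor:
    """z: (N, D) embeddings. Zero iff batch mean is 0 and covariance is I."""
    mu = z.mean(dim=0)                            # (D,) batch mean
    zc = z - mu                                   # centered embeddings
    cov = (zc.T @ zc) / (z.shape[0] - 1)          # (D, D) sample covariance
    eye = torch.eye(z.shape[1], device=z.device)
    # Pull the mean toward 0 and the covariance toward identity.
    return mu.pow(2).sum() + (cov - eye).pow(2).sum()
```

A collapsed batch has a near-zero covariance, so this term cannot be minimized by mapping every input to one point; that is the sense in which a Gaussian target rules out collapse by construction.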
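For contrast, here is a hedged sketch of the stop-gradient predictive route in the spirit of NEPA: predict the next input's embedding and block gradients through the target branch. `encoder` and `predictor` are placeholder modules, and the negative-cosine loss is a common choice for such objectives, not necessarily NEPA's exact formulation.

```python
import torch
import torch.nn.functional as F

def next_embedding_loss(encoder, predictor, x_t, x_next):
    """Negative cosine between predicted and stop-gradient next embedding."""
    z_pred = predictor(encoder(x_t))   # predicted embedding of the next input
    with torch.no_grad():              # stop-gradient: target branch gets no grads
        z_target = encoder(x_next)
    z_pred = F.normalize(z_pred, dim=-1)
    z_target = F.normalize(z_target, dim=-1)
    return -(z_pred * z_target).sum(dim=-1).mean()

# Toy usage with linear modules standing in for real networks.
enc = torch.nn.Linear(32, 16)
pred = torch.nn.Linear(16, 16)
loss = next_embedding_loss(enc, pred, torch.randn(8, 32), torch.randn(8, 32))
loss.backward()  # gradients flow through the prediction branch only
```

Here nothing explicitly penalizes a collapsed embedding distribution; the asymmetry between the prediction branch and the gradient-free target branch is what keeps training from degenerating.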
Evidence
The sources agree that collapse prevention is central, but they disagree on mechanism: LeJEPA and LeWorldModel prevent collapse explicitly, by matching the embedding distribution to a Gaussian target, while NEPA relies on the implicit asymmetry of stop-gradient predictive training.
Open Questions
- Which collapse-prevention mechanism is most robust at frontier data/model scale?
- Can a single target embedding distribution work across visual, temporal, and language modalities?