LeJEPA: Provable And Scalable Self-Supervised Learning Without The Heuristics
Source
- Raw Markdown: paper_lejepa-2025.md
- PDF: paper_lejepa-2025.pdf
Core Claim
LeJEPA argues that JEPA embeddings should follow an isotropic Gaussian distribution, the provably optimal choice for downstream prediction risk, and introduces SIGReg to enforce that distribution efficiently.
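Stated loosely in my own notation (not the paper's exact theorem): for embeddings $z = f_\theta(x) \in \mathbb{R}^d$, the claim is that

$$p^\star(z) = \mathcal{N}(0, I_d)$$

minimizes downstream prediction risk among embedding distributions, so training should pull the empirical distribution of $z$ toward $\mathcal{N}(0, I_d)$.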
Key Contributions
- Proves that the isotropic Gaussian is the optimal embedding distribution for minimizing downstream prediction risk.
- Introduces Sketched Isotropic Gaussian Regularization (SIGReg); a minimal sketch follows this list.
- Combines the JEPA predictive loss with SIGReg to remove reliance on heuristics such as stop-gradient, EMA teacher-student updates, and hyperparameter schedulers.
- Validates across many datasets, architectures, and domains.
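A minimal PyTorch sketch of the SIGReg idea, assuming random unit-norm projections scored with an Epps-Pulley-style characteristic-function statistic against the standard normal; the function name, defaults, and crude numerical integration are illustrative assumptions, not the paper's released code:

```python
import torch

def sigreg_loss(z: torch.Tensor, num_projections: int = 256,
                num_t: int = 17, t_max: float = 3.0) -> torch.Tensor:
    """Hypothetical SIGReg sketch: push a batch of embeddings toward
    an isotropic standard Gaussian. Not the paper's exact code."""
    n, d = z.shape
    # Sketch the d-dimensional distribution with random unit directions.
    dirs = torch.randn(d, num_projections, device=z.device, dtype=z.dtype)
    dirs = dirs / dirs.norm(dim=0, keepdim=True)
    proj = z @ dirs                                  # (n, P)
    # Epps-Pulley-style statistic: compare each projection's empirical
    # characteristic function to the standard normal CF exp(-t^2 / 2).
    t = torch.linspace(-t_max, t_max, num_t, device=z.device, dtype=z.dtype)
    tp = proj.unsqueeze(-1) * t                      # (n, P, T)
    ecf_re = torch.cos(tp).mean(dim=0)               # (P, T)
    ecf_im = torch.sin(tp).mean(dim=0)
    target = torch.exp(-0.5 * t**2)                  # Gaussian CF is real-valued
    weight = torch.exp(-0.5 * t**2)                  # keeps the integral finite
    stat = ((ecf_re - target) ** 2 + ecf_im ** 2) * weight
    return stat.mean()                               # average over directions and t
```

Because everything reduces to one-dimensional statistics over projections, the cost grows linearly in batch size and embedding dimension, which is the scalability argument.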
Method Notes
LeJEPA combines the standard JEPA predictive loss with SIGReg as its only regularizer. SIGReg sketches the high-dimensional embedding distribution into many random one-dimensional projections and applies a univariate goodness-of-fit test (Epps-Pulley) against the standard Gaussian on each, which keeps time and memory costs linear and leaves a single trade-off hyperparameter.
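Reusing sigreg_loss from the sketch above, the combined objective might look as follows; the MSE prediction loss, the predictor interface, and lam = 0.05 are assumptions for illustration, not values from the paper:

```python
import torch.nn.functional as F

def lejepa_loss(ctx_emb, tgt_emb, predictor, lam: float = 0.05):
    """Illustrative combined objective: JEPA prediction loss plus
    SIGReg on both embedding branches, weighted by a single
    trade-off hyperparameter (lam is an assumed value)."""
    pred = predictor(ctx_emb)                        # predict target from context
    pred_loss = F.mse_loss(pred, tgt_emb)
    reg = sigreg_loss(ctx_emb) + sigreg_loss(tgt_emb)
    return pred_loss + lam * reg
```

Note there is no stop-gradient or EMA teacher in this sketch: both branches receive gradients, per the paper's claim that SIGReg alone prevents collapse.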
Evidence And Results
The source reports broad empirical validation, with stable training across architectures and domains and ImageNet-1k linear-evaluation results for large ViT models.
Limitations
The paper’s strongest claim is generality. The wiki should test that claim against multimodal and control-specific sources such as VL-JEPA and LeWorldModel.
Links Into The Wiki
- LeJEPA is central to JEPA, Representation Collapse, and Self-Supervised Representation Learning.
Open Questions
- Can SIGReg remain sufficient at frontier multimodal scale?
- Is the isotropic Gaussian target universally optimal or domain-dependent?