LeJEPA: Provable And Scalable Self-Supervised Learning Without The Heuristics

Source

Core Claim

LeJEPA argues that the optimal distribution for JEPA embeddings is the isotropic Gaussian, and introduces SIGReg to enforce that target efficiently during training.
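The "sketched" part can be grounded in a standard fact (Cramér–Wold): a random vector follows the isotropic Gaussian if and only if every one-dimensional projection of it is standard normal. In symbols, for an embedding $z \in \mathbb{R}^d$:

```latex
z \sim \mathcal{N}(0, I_d)
\quad\Longleftrightarrow\quad
u^{\top} z \sim \mathcal{N}(0, 1) \quad \text{for every unit vector } u \in \mathbb{R}^d .
```

Testing finitely many random projections therefore gives a tractable surrogate for an intractable $d$-dimensional goodness-of-fit test.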

Key Contributions

  • Develops a theory identifying the isotropic Gaussian as the embedding distribution that minimizes downstream prediction risk.
  • Introduces Sketched Isotropic Gaussian Regularization (SIGReg); see the sketch after this list.
  • Combines JEPA predictive loss with SIGReg to reduce reliance on stop-gradient, EMA, teacher-student, and scheduler heuristics.
  • Validates across many datasets, architectures, and domains.
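Concretely, the projection view suggests a regularizer that maps the embedding batch onto random unit directions and penalizes each projection's deviation from $\mathcal{N}(0,1)$. A minimal PyTorch sketch, assuming a characteristic-function (Epps–Pulley–style) discrepancy; the function name `sigreg_loss` and the quadrature details (frequency grid, Gaussian weight) are illustrative choices, not the paper's exact implementation:

```python
import torch

def sigreg_loss(z: torch.Tensor, num_projections: int = 64,
                num_freqs: int = 17, t_max: float = 3.0) -> torch.Tensor:
    """Sketched isotropic-Gaussian penalty on a batch of embeddings z of shape (n, d)."""
    d = z.shape[1]
    # Sketch: random unit directions in embedding space.
    dirs = torch.randn(d, num_projections, device=z.device)
    dirs = dirs / dirs.norm(dim=0, keepdim=True)
    proj = z @ dirs                                   # (n, num_projections)

    # Frequency grid for the characteristic-function comparison.
    t = torch.linspace(-t_max, t_max, num_freqs, device=z.device)
    tp = proj.unsqueeze(-1) * t                       # (n, P, T)
    ecf_re = torch.cos(tp).mean(dim=0)                # empirical CF, real part
    ecf_im = torch.sin(tp).mean(dim=0)                # empirical CF, imaginary part
    target = torch.exp(-0.5 * t**2)                   # CF of N(0, 1) is real-valued

    # Squared CF error, Gaussian-weighted so the integral stays finite;
    # the mean over the grid is a crude quadrature.
    sq_err = (ecf_re - target) ** 2 + ecf_im ** 2
    weight = torch.exp(-0.5 * t**2)
    return (sq_err * weight).mean()
```

Each projection is one-dimensional, so the cost is linear in batch size and embedding dimension, which is what makes the sketching scalable.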

Method Notes

Within the wiki, LeJEPA sits at the center of the JEPA, Representation Collapse, and Self-Supervised Representation Learning topics: the JEPA predictive loss supplies the training signal, while SIGReg takes over the anti-collapse role that stop-gradient, EMA teachers, and related heuristics usually play.
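A hedged sketch of how the two terms might combine in a training step, reusing `sigreg_loss` from above; `encoder`, `predictor`, the two-view setup, and the weight `lam` are illustrative assumptions (real JEPA variants typically predict masked targets rather than a second global view):

```python
import torch.nn.functional as F

def lejepa_step(encoder, predictor, view_a, view_b, lam=0.05):
    """JEPA predictive loss + SIGReg, with no stop-gradient or EMA teacher."""
    z_a = encoder(view_a)                      # context embedding
    z_b = encoder(view_b)                      # target embedding, same weights
    pred_loss = F.mse_loss(predictor(z_a), z_b)
    reg = sigreg_loss(z_a) + sigreg_loss(z_b)  # pull both views toward N(0, I)
    return pred_loss + lam * reg               # single trade-off hyperparameter
```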

Evidence And Results

The source reports broad empirical validation, stable training across architectures and domains, and ImageNet-1k linear-probe results for large ViT models.
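Linear evaluation itself is standard and independent of LeJEPA: freeze the encoder, embed the dataset, and fit a linear classifier on the frozen features. A sketch of the extraction half (`encoder`, `loader`, and the device string are assumed names):

```python
import torch

@torch.no_grad()
def extract_features(encoder, loader, device="cuda"):
    """Embed a dataset with a frozen encoder for a linear probe."""
    encoder.eval().to(device)
    feats, labels = [], []
    for x, y in loader:
        feats.append(encoder(x.to(device)).cpu())
        labels.append(y)
    return torch.cat(feats), torch.cat(labels)
```

A `torch.nn.Linear(d, num_classes)` head trained with cross-entropy on these features, encoder weights fixed, is the usual recipe behind such linear-probe numbers.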

Limitations

The paper’s strongest claim is generality. The wiki should test that claim against multimodal and control-specific sources such as VL-JEPA and LeWorldModel.

Open Questions

  • Can SIGReg remain sufficient at frontier multimodal scale?
  • Is the isotropic Gaussian target universally optimal or domain-dependent?