LeJEPA: Provable And Scalable Self-Supervised Learning Without The Heuristics

Source

Core Claim

LeJEPA argues that the optimal distribution for JEPA embeddings is the isotropic Gaussian, and introduces SIGReg to enforce that target efficiently during training.
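The "sketched" part can be grounded in a standard fact (Cramér–Wold): a random vector follows the isotropic Gaussian if and only if every one-dimensional projection of it is standard normal. In symbols, for an embedding $z \in \mathbb{R}^d$:

```latex
z \sim \mathcal{N}(0, I_d)
\quad\Longleftrightarrow\quad
u^{\top} z \sim \mathcal{N}(0, 1) \quad \text{for every unit vector } u \in \mathbb{R}^d .
```

Testing finitely many random projections therefore gives a tractable surrogate for an intractable $d$-dimensional goodness-of-fit test.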

Key Contributions

  • Develops a theory identifying the isotropic Gaussian as the embedding distribution that minimizes downstream prediction risk.
  • Introduces Sketched Isotropic Gaussian Regularization (SIGReg); see the sketch after this list.
  • Combines JEPA predictive loss with SIGReg to reduce reliance on stop-gradient, EMA, teacher-student, and scheduler heuristics.
  • Validates across many datasets, architectures, and domains.
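Concretely, the projection view suggests a regularizer that maps the embedding batch onto random unit directions and penalizes each projection's deviation from $\mathcal{N}(0,1)$. A minimal PyTorch sketch, assuming a characteristic-function (Epps–Pulley–style) discrepancy; the function name `sigreg_loss` and the quadrature details (frequency grid, Gaussian weight) are illustrative choices, not the paper's exact implementation:

```python
import torch

def sigreg_loss(z: torch.Tensor, num_projections: int = 64,
                num_freqs: int = 17, t_max: float = 3.0) -> torch.Tensor:
    """Sketched isotropic-Gaussian penalty on a batch of embeddings z of shape (n, d)."""
    d = z.shape[1]
    # Sketch: random unit directions in embedding space.
    dirs = torch.randn(d, num_projections, device=z.device)
    dirs = dirs / dirs.norm(dim=0, keepdim=True)
    proj = z @ dirs                                   # (n, num_projections)

    # Frequency grid for the characteristic-function comparison.
    t = torch.linspace(-t_max, t_max, num_freqs, device=z.device)
    tp = proj.unsqueeze(-1) * t                       # (n, P, T)
    ecf_re = torch.cos(tp).mean(dim=0)                # empirical CF, real part
    ecf_im = torch.sin(tp).mean(dim=0)                # empirical CF, imaginary part
    target = torch.exp(-0.5 * t**2)                   # CF of N(0, 1) is real-valued

    # Squared CF error, Gaussian-weighted so the integral stays finite;
    # the mean over the grid is a crude quadrature.
    sq_err = (ecf_re - target) ** 2 + ecf_im ** 2
    weight = torch.exp(-0.5 * t**2)
    return (sq_err * weight).mean()
```

Each projection is one-dimensional, so the cost is linear in batch size and embedding dimension, which is what makes the sketching scalable.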

Method Notes

Within the wiki, LeJEPA sits at the center of the JEPA, Representation Collapse, and Self-Supervised Representation Learning topics: the JEPA predictive loss supplies the training signal, while SIGReg takes over the anti-collapse role that stop-gradient, EMA teachers, and related heuristics usually play.
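A hedged sketch of how the two terms might combine in a training step, reusing `sigreg_loss` from above; `encoder`, `predictor`, the two-view setup, and the weight `lam` are illustrative assumptions (real JEPA variants typically predict masked targets rather than a second global view):

```python
import torch.nn.functional as F

def lejepa_step(encoder, predictor, view_a, view_b, lam=0.05):
    """JEPA predictive loss + SIGReg, with no stop-gradient or EMA teacher."""
    z_a = encoder(view_a)                      # context embedding
    z_b = encoder(view_b)                      # target embedding, same weights
    pred_loss = F.mse_loss(predictor(z_a), z_b)
    reg = sigreg_loss(z_a) + sigreg_loss(z_b)  # pull both views toward N(0, I)
    return pred_loss + lam * reg               # single trade-off hyperparameter
```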

Evidence And Results

The source reports broad empirical validation, stable training across architectures and domains, and ImageNet-1k linear-probe results for large ViT models.
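Linear evaluation itself is standard and independent of LeJEPA: freeze the encoder, embed the dataset, and fit a linear classifier on the frozen features. A sketch of the extraction half (`encoder`, `loader`, and the device string are assumed names):

```python
import torch

@torch.no_grad()
def extract_features(encoder, loader, device="cuda"):
    """Embed a dataset with a frozen encoder for a linear probe."""
    encoder.eval().to(device)
    feats, labels = [], []
    for x, y in loader:
        feats.append(encoder(x.to(device)).cpu())
        labels.append(y)
    return torch.cat(feats), torch.cat(labels)
```

A `torch.nn.Linear(d, num_classes)` head trained with cross-entropy on these features, encoder weights fixed, is the usual recipe behind such linear-probe numbers.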

Limitations

The paper’s strongest claim is generality. The wiki should test that claim against multimodal and control-specific sources such as VL-JEPA and LeWorldModel.

Open Questions

  • Can SIGReg remain sufficient at frontier multimodal scale?
  • Is the isotropic Gaussian target universally optimal or domain-dependent?