Joint Embedding Predictive Architecture
Summary
JEPA is the wiki’s central pattern for learning by predicting in representation space instead of reconstructing raw observations or generating tokens.
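The predict-in-representation-space idea can be made concrete in a few lines. The toy sketch below is illustrative only (the linear encoder, identity predictor, and all names are assumptions, not code from any cited paper): it encodes a context and a target with a shared encoder, predicts the target's embedding from the context, and scores the prediction in embedding space rather than against raw pixels or tokens.

```python
# Toy JEPA-style objective: predict the target's *embedding*, not the target itself.
import numpy as np

rng = np.random.default_rng(0)

D_IN, D_EMB = 8, 4
W_enc = rng.normal(size=(D_IN, D_EMB)) * 0.1  # shared encoder (toy linear map)
W_pred = np.eye(D_EMB)                        # predictor (toy identity init)

def encode(x):
    return x @ W_enc

def jepa_loss(x_context, x_target):
    s_ctx = encode(x_context)
    s_tgt = encode(x_target)   # real systems stabilize this branch (stop-grad / EMA teacher)
    s_hat = s_ctx @ W_pred     # predict the target embedding from the context embedding
    return float(np.mean((s_hat - s_tgt) ** 2))  # loss lives in representation space

x = rng.normal(size=(16, D_IN))
loss = jepa_loss(x, x + 0.01 * rng.normal(size=x.shape))
print(round(loss, 6))
```

Because the loss compares embeddings, the encoder is free to discard unpredictable detail; this is also why, without an extra stabilizer or regularizer, the trivial constant-embedding solution collapses the loss to zero.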
What The Wiki Currently Believes
- A Path Towards Autonomous Machine Intelligence frames JEPA as a building block for predictive world models and hierarchical planning.
- Introduction to Latent Variable Energy-Based Models presents H-JEPA as a hierarchical stack of joint embedding predictors for multi-level prediction under uncertainty.
- LeJEPA argues that JEPA embeddings should match an explicit target distribution, specifically an isotropic Gaussian, and proposes SIGReg as a scalable way to enforce it.
- LeWorldModel applies JEPA to action-conditioned pixel world modeling with a two-term objective.
- VL-JEPA extends the idea to vision-language learning by predicting target text embeddings rather than generating text tokens autoregressively.
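The SIGReg claim above can be illustrated with a sketch. SIGReg tests whether embeddings look isotropic Gaussian along random one-dimensional projections; the code below substitutes a simple moment-matching penalty for the paper's actual test statistic, so the function name and penalty form are assumptions for illustration, not LeJEPA's implementation.

```python
# Sketched Gaussian penalty (illustrative stand-in for SIGReg): project embeddings
# onto random unit directions and penalize deviation of each 1-D projection from
# a standard Gaussian, here via its first three moments.
import numpy as np

def sketched_gaussian_penalty(z, n_dirs=32, seed=0):
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(z.shape[1], n_dirs))
    dirs /= np.linalg.norm(dirs, axis=0, keepdims=True)  # random unit directions
    p = z @ dirs                                         # (batch, n_dirs) 1-D projections
    mean, var = p.mean(axis=0), p.var(axis=0)
    skew = ((p - mean) ** 3).mean(axis=0) / np.maximum(var, 1e-8) ** 1.5
    # An isotropic standard Gaussian has mean 0, variance 1, skewness 0
    # along every direction, so each term below vanishes in the ideal case.
    return float((mean ** 2 + (var - 1.0) ** 2 + skew ** 2).mean())

rng = np.random.default_rng(1)
z_gauss = rng.normal(size=(4096, 16))  # embeddings already isotropic Gaussian
z_collapsed = np.ones((4096, 16))      # fully collapsed embeddings
pen_g = sketched_gaussian_penalty(z_gauss)
pen_c = sketched_gaussian_penalty(z_collapsed)
print(pen_g < pen_c)  # expect True: collapse is heavily penalized
```

The design point this illustrates: because collapsed embeddings have zero variance along every projection, a distribution-matching penalty of this kind rules out collapse directly, which is why LeJEPA positions it as an alternative to stop-gradient and teacher-student stabilizers.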
Evidence
The source set shows JEPA moving from architecture proposal to theory, then to domain-specific systems: autonomous intelligence in APTAMI, lecture-note grounding in LVEBM, theory and regularization in LeJEPA, pixel control in LeWorldModel, and vision-language tasks in VL-JEPA.
Open Questions
- Can SIGReg-style Gaussian regularization replace stop-gradient and teacher-student stabilizers at very large multimodal scale?
- Which domains require latent variables beyond deterministic embeddings?