LeWorldModel: Stable End-To-End Joint-Embedding Predictive Architecture From Pixels

Source

Core Claim

LeWorldModel trains a stable, end-to-end joint-embedding predictive architecture (JEPA) world model from raw pixels, combining next-embedding prediction with Gaussian-distribution regularization of the latent space.

Key Contributions

  • Presents a two-term objective for stable pixel world modeling.
  • Avoids common stabilization crutches: EMA target networks, pretrained encoders, auxiliary supervision, and stacks of heuristic losses.
  • Uses Gaussian-distributed latent embeddings to prevent collapse.
  • Reports fast planning and meaningful physical latent structure on control tasks.
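The two-term objective named above can be sketched in miniature. Everything below is illustrative, not the paper's implementation: the linear "encoder", the moment-matching form of the Gaussian regularizer, and all names (`jepa_two_term_loss`, `reg_weight`) are assumptions. The key structural points it does reflect are that the same online encoder embeds both frames (no EMA target network) and that a single regularization term, rather than a stack of auxiliary losses, discourages collapse.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(W, x):
    """Toy linear 'encoder': flattened pixels -> embedding."""
    return x @ W

def jepa_two_term_loss(W, P, obs_t, act_t, obs_t1, reg_weight=1.0):
    z_t = encode(W, obs_t)                    # embed current frame
    z_t1 = encode(W, obs_t1)                  # embed next frame (same weights, no EMA)
    # Next-embedding prediction from current embedding and action
    z_pred = np.concatenate([z_t, act_t], axis=1) @ P
    pred_loss = np.mean((z_pred - z_t1) ** 2)
    # Stand-in Gaussian regularizer: match batch moments to N(0, I)
    # so embeddings cannot collapse to a constant
    mean, var = z_t1.mean(axis=0), z_t1.var(axis=0)
    reg = np.mean(mean ** 2) + np.mean((var - 1.0) ** 2)
    return pred_loss + reg_weight * reg
```

Note how a collapsed encoder (all embeddings equal) drives the prediction term to zero but makes the variance term large, which is the intuition behind using a distributional constraint instead of architectural tricks.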

Method Notes

LeWorldModel operationalizes ideas from APTAMI, LeJEPA, and World Models.

Evidence And Results

The abstract reports training a model of roughly 15M parameters on a single GPU, planning up to 48× faster than foundation-model-based world models, and competitive control performance across 2D and 3D tasks.
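Planning speed comes from rolling out trajectories entirely in the small latent space rather than in pixels. A minimal picture of this, assuming a simple random-shooting planner (the paper's actual planner may differ, and `plan_random_shooting` with its signature is hypothetical): sample candidate action sequences, roll each out through the learned dynamics in embedding space, and keep the cheapest.

```python
import numpy as np

def plan_random_shooting(dynamics, z0, goal_z, horizon=8,
                         n_samples=256, act_dim=2, rng=None):
    """Sample candidate action sequences, roll each out through the latent
    dynamics, and return the sequence whose final embedding lands closest
    to the goal embedding."""
    if rng is None:
        rng = np.random.default_rng(0)
    candidates = rng.uniform(-1.0, 1.0, size=(n_samples, horizon, act_dim))
    best_cost, best_seq = np.inf, None
    for seq in candidates:
        z = z0
        for a in seq:
            z = dynamics(z, a)        # one latent-space rollout step
        cost = float(np.sum((z - goal_z) ** 2))
        if cost < best_cost:
            best_cost, best_seq = cost, seq
    return best_seq, best_cost
```

With a ~15M-parameter dynamics model, each rollout step is one small forward pass, which is where a large speedup over foundation-model rollouts would plausibly come from.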

Limitations

The paper notes that it remains limited by short planning horizons, by the coverage of its offline training data, and by its reliance on explicit action labels.

Open Questions

  • Can LeWorldModel scale to long-horizon hierarchical planning?
  • Can inverse dynamics reduce dependence on explicit action labels?
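The inverse-dynamics question can be made concrete with a toy experiment (entirely synthetic, not from the paper): if transitions follow a simple linear rule, a linear inverse-dynamics model fit by least squares recovers the actions from pairs of consecutive embeddings alone, suggesting how action labels might be replaced by inferred actions. The dimensions and the rule `z' = z + a @ B` below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
latent_dim, act_dim = 4, 2

# Synthetic transitions under a toy linear rule z' = z + a @ B,
# where the actions a would be unobserved at deployment time.
B = rng.normal(size=(act_dim, latent_dim))
z = rng.normal(size=(500, latent_dim))
a = rng.normal(size=(500, act_dim))
z_next = z + a @ B

# Inverse-dynamics model: a linear map from (z, z') back to a,
# fit by ordinary least squares.
X = np.concatenate([z, z_next], axis=1)
G, *_ = np.linalg.lstsq(X, a, rcond=None)
recon_err = float(np.mean((X @ G - a) ** 2))  # near zero in this toy setup
```

Real embeddings would need a nonlinear inverse model and would only pseudo-label actions approximately, but the same fit-then-relabel structure applies.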