Self-Supervised Representation Learning

Summary

The wiki’s SSL thread compares large-scale visual representation learning (DINOv3) with predictive embedding objectives (the JEPA family and NEPA) that avoid reconstructing raw pixels.

What The Wiki Currently Believes

  • DINOv3 is the scaled vision-foundation-model reference point, with strong dense features and broad frozen transfer.
  • LeJEPA argues for a theory-grounded JEPA objective with SIGReg.
  • NEPA shows next-embedding prediction can make strong vision learners without pixels, tokens, contrastive loss, or task-specific heads.
  • VL-JEPA applies predictive embedding learning to vision-language tasks.

Evidence

The corpus suggests a spectrum from large-scale SSL systems to simpler predictive objectives. DINOv3 shows the value of scale and careful training; LeJEPA and NEPA ask whether the objective itself can be simpler and more principled; VL-JEPA extends the predictive approach to vision-language settings.
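The common thread across the JEPA-family objectives above can be sketched in a few lines: embed a context view and a target view, predict the target's embedding from the context's embedding, and take a loss entirely in embedding space, with no pixel reconstruction. This is a minimal illustrative sketch, not any paper's actual method: the linear "encoders" `W_ctx`/`W_tgt`, the predictor `W_pred`, and all dimensions are hypothetical stand-ins for real networks (in practice the target encoder is an EMA copy and receives no gradient).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen purely for illustration.
D_IN, D_EMB = 16, 8

# Linear maps stand in for real encoder networks.
W_ctx = rng.normal(size=(D_IN, D_EMB)) * 0.1   # context encoder
W_tgt = W_ctx.copy()                           # target encoder (EMA copy in practice)
W_pred = np.eye(D_EMB)                         # predictor head (identity here for simplicity)

def jepa_loss(x_context, x_target):
    """Predict the target view's embedding from the context view's embedding.

    The loss lives entirely in embedding space; no pixels are reconstructed.
    """
    z_ctx = x_context @ W_ctx    # embed the visible/context view
    z_tgt = x_target @ W_tgt     # embed the masked/target view (stop-gradient in practice)
    z_hat = z_ctx @ W_pred       # predict the target embedding from the context
    return float(np.mean((z_hat - z_tgt) ** 2))

x = rng.normal(size=(4, D_IN))   # a small batch of "patches"
print(jepa_loss(x, x + 0.01 * rng.normal(size=x.shape)))
```

Without an extra term, objectives like this can collapse to a constant embedding; the design choices the wiki tracks (SIGReg in LeJEPA, next-embedding prediction in NEPA) are different ways of keeping the predicted embedding space non-degenerate.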

Open Questions

  • Which predictive objective best preserves dense spatial structure?
  • How much of DINOv3’s performance comes from scale versus objective design?