VL-JEPA
Summary
VL-JEPA is a vision-language model that predicts continuous target-text embeddings instead of autoregressively generating text tokens.
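The contrast with token generation can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than VL-JEPA's actual architecture or objective: the linear `predict_embedding` stand-in, the cosine-distance `embedding_loss`, and the `nearest_candidate` lookup are hypothetical names, and NumPy arrays stand in for real encoder outputs.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to (approximately) unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def predict_embedding(visual_emb, query_emb, weights):
    """Hypothetical predictor: a single linear map over the concatenated
    visual and query embeddings (a toy stand-in, not the real model)."""
    joint = np.concatenate([visual_emb, query_emb], axis=-1)
    return l2_normalize(joint @ weights)


def embedding_loss(predicted, target):
    """Assumed objective: mean cosine distance between the predicted and
    target-text embeddings. No token-level cross-entropy is involved."""
    cos = np.sum(l2_normalize(predicted) * l2_normalize(target), axis=-1)
    return float(np.mean(1.0 - cos))


def nearest_candidate(predicted, candidate_embs):
    """Answer a closed-set question by picking the candidate text whose
    embedding is most similar to the prediction."""
    sims = l2_normalize(predicted) @ l2_normalize(candidate_embs).T
    return np.argmax(sims, axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    visual = rng.normal(size=(2, 4))    # batch of 2 visual embeddings
    query = rng.normal(size=(2, 4))     # batch of 2 query embeddings
    weights = rng.normal(size=(8, 3))   # maps the joint input to a 3-d text space
    pred = predict_embedding(visual, query, weights)
    candidates = l2_normalize(rng.normal(size=(5, 3)))
    print(pred.shape, nearest_candidate(pred, candidates))
```

Because the output is a vector rather than a token sequence, a closed-set task reduces to one forward pass plus a similarity lookup; text only needs to be produced when a readable answer is required.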
Role In The Wiki
VL-JEPA extends JEPA-style representation prediction to general-domain vision-language tasks and supports selective decoding.
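This page does not define selective decoding, so the sketch below shows only one plausible reading: a text decoder is called when the predicted embedding has drifted far enough from the last decoded one, and skipped otherwise. The `selective_decode` function, its `threshold` parameter, and the `decode_fn` callback are hypothetical names introduced for illustration.

```python
import numpy as np


def selective_decode(embedding_stream, decode_fn, threshold=0.2):
    """Decode only when the embedding moves away from the last decoded one.

    embedding_stream: iterable of 1-d arrays (one predicted embedding per step).
    decode_fn: hypothetical callback that turns an embedding into text.
    threshold: assumed cosine-distance trigger; the value is arbitrary.
    Returns a list of (step, decoded_output) pairs.
    """
    outputs = []
    last_decoded = None
    for step, emb in enumerate(embedding_stream):
        emb = emb / (np.linalg.norm(emb) + 1e-8)
        # Always decode the first step; afterwards decode only on drift.
        if last_decoded is None or 1.0 - float(emb @ last_decoded) > threshold:
            outputs.append((step, decode_fn(emb)))
            last_decoded = emb
    return outputs


if __name__ == "__main__":
    e0, e1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    stream = [e0, e0, e1, e1]
    # Toy decoder: report which axis dominates the embedding.
    print(selective_decode(stream, lambda e: f"state-{int(np.argmax(e))}"))
```

In this toy stream the decoder runs twice for four steps, which is the kind of saving selective decoding is meant to provide when consecutive predictions are nearly identical.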