Perception Encoder

Summary

Perception Encoder is a Meta vision-encoder family trained around scaled contrastive vision-language learning, then specialized through language and spatial alignment.

Role In The Wiki

Perception Encoder is the clearest current source for the claim that a strong visual encoder’s best reusable features may be internal rather than final-layer outputs. It complements Guillotine Regularization: Guillotine names the layer-cutting effect in SSL/projector settings, while Perception Encoder shows the same pattern in a large contrastive vision-language system and uses alignment tuning to expose the hidden features.

Evidence

Official Artifacts