Perception Encoder
Summary
Perception Encoder is a family of vision encoders from Meta, pretrained with scaled contrastive vision-language learning and then specialized through language alignment and spatial alignment.
Role In The Wiki
Perception Encoder is the clearest current source for the claim that a strong visual encoder’s most reusable features may sit in intermediate layers rather than in the final-layer output. It complements Guillotine Regularization: Guillotine Regularization names the layer-cutting effect in self-supervised (SSL) settings with projector heads, while Perception Encoder shows the same pattern in a large contrastive vision-language system and uses alignment tuning to expose those intermediate features.
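The claim about internal versus final-layer features can be made concrete with a layer-wise probe: score the features from every layer on a downstream task and compare. The sketch below is a toy illustration only. The function names (`centroid_probe_accuracy`, `rank_layers`) are hypothetical, the nearest-centroid probe is a cheap stand-in for a linear probe, and the feature values are hand-made to mimic the pattern, not measurements from Perception Encoder.

```python
from collections import defaultdict


def centroid_probe_accuracy(train_x, train_y, test_x, test_y):
    """Nearest-centroid probe (stand-in for a linear probe on frozen features)."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(train_x, train_y):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}

    def predict(x):
        # Pick the class whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
        )

    correct = sum(predict(x) == y for x, y in zip(test_x, test_y))
    return correct / len(test_y)


def rank_layers(features_by_layer, labels, split):
    """Probe every layer's features; return (index of best layer, all scores)."""
    scores = [
        centroid_probe_accuracy(
            feats[:split], labels[:split], feats[split:], labels[split:]
        )
        for feats in features_by_layer
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Hand-made toy features (assumption, for illustration only):
# the middle layer separates the classes, the final layer has collapsed.
labels = [0, 1, 0, 1, 0, 1, 0, 1]
early = [[0.1], [0.2], [0.3], [0.0], [0.2], [0.1], [0.0], [0.3]]
middle = [[-1.0], [1.0], [-0.9], [1.1], [-1.2], [0.8], [-1.0], [0.9]]
final = [[0.5] for _ in labels]

best, scores = rank_layers([early, middle, final], labels, split=4)
```

In this toy setup `best` is the index of the middle layer, because the final layer's features carry no class information. With a real encoder, the per-layer features would come from intermediate activations, and the probe would be whatever downstream evaluation matters for the task.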