Tuna-2

Summary

Tuna-2 is a unified multimodal model that operates directly in pixel space and discards pretrained vision encoders.

Role In The Wiki

Tuna-2 is currently the strongest counterpoint in the corpus to semantic-encoder-first multimodal design.

Evidence