Tuna-2
Summary
Tuna-2 is a pixel-space unified multimodal model that discards pretrained vision encoder modules.
Role In The Wiki
Tuna-2 is the strongest current counterpoint to semantic-encoder-first multimodal design in the corpus.