Tokenizer Transfer
Summary
Tokenizer transfer is the problem of moving a trained model's capabilities from one tokenization scheme to another without retraining from scratch.
What The Wiki Currently Believes
- Bolmo treats byteification as a special case of tokenizer transfer: a subword LM serves as a teacher for a byte-level latent-tokenizer architecture.
Evidence
Bolmo’s main empirical claim is that byte-level LMs can be made competitive quickly via exact distillation from existing subword LMs, rather than by paying the full cost of byte-level pretraining from scratch.
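One way to see what "exact" distillation across tokenization regimes can mean: a subword teacher's next-token distribution can be marginalized into a byte-level target distribution, since each token determines a byte sequence. The sketch below illustrates this for the first byte only; the toy vocabulary and probabilities are illustrative assumptions, not Bolmo's actual interface or training recipe.

```python
# Hedged sketch: deriving a byte-level distillation target from a subword
# teacher's next-token distribution. Names and numbers are illustrative.
from collections import defaultdict

def next_byte_distribution(token_probs: dict[str, float]) -> dict[int, float]:
    """Marginalize a next-*token* distribution into a next-*byte* one:
    P(next byte = b) = sum of P(t) over tokens t whose UTF-8 encoding
    starts with byte b. A byte-level student trained against such targets
    inherits the teacher's distribution exactly at this position; later
    bytes would condition on the byte prefix consumed so far."""
    byte_probs: dict[int, float] = defaultdict(float)
    for token, p in token_probs.items():
        first_byte = token.encode("utf-8")[0]
        byte_probs[first_byte] += p
    return dict(byte_probs)

# Toy teacher distribution over next subword tokens.
teacher = {"the": 0.5, "there": 0.2, "a": 0.2, "an": 0.1}
targets = next_byte_distribution(teacher)
# Byte 't' collects mass from "the" and "there"; 'a' from "a" and "an".
```

Because the mapping from token sequences to byte sequences is deterministic, targets built this way preserve the teacher's probability mass rather than approximating it, which is what distinguishes this from ordinary soft-label distillation.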
Open Questions
- Which capabilities are lost when moving from subword to byte-level latent patches?
- Can tokenizer transfer be combined with learned dynamic chunking rather than fixed byte patches?