Tokenizer Transfer

Summary

Tokenizer transfer is the problem of moving a trained language model's capabilities from one tokenization scheme to another without retraining from scratch.

What The Wiki Currently Believes

  • Bolmo treats byteification as a special case of tokenizer transfer: a subword LM serves as a teacher for a byte-level latent-tokenizer architecture.
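The byteification framing above can be made concrete with a minimal sketch. The toy vocabulary and `byteify` helper below are illustrative assumptions, not part of Bolmo: the point is only that a subword token sequence can be losslessly re-expressed as bytes, at which point the token boundaries the subword tokenizer provided become latent structure the byte-level model must handle itself.

```python
# Sketch only: SUBWORD_VOCAB and byteify are hypothetical, not Bolmo's API.
# A subword vocab maps each token id to the bytes it covers.
SUBWORD_VOCAB = {0: b"Hello", 1: b", ", 2: b"world", 3: b"!"}

def byteify(token_ids):
    """Re-express a subword token sequence as a flat byte sequence.

    The mapping is lossless for the text, but the token boundaries
    (here: after "Hello", ", ", "world") become latent: a byte-level
    model no longer gets that segmentation for free.
    """
    return b"".join(SUBWORD_VOCAB[t] for t in token_ids)

byte_seq = byteify([0, 1, 2, 3])  # b"Hello, world!"
```

Viewed this way, byteification is just tokenizer transfer to the degenerate "tokenizer" whose vocabulary is the 256 byte values.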

Evidence

Bolmo’s main empirical claim is that byte-level LMs can reach competitive performance quickly via exact distillation from existing subword LMs, rather than paying the full cost of byte-level pretraining from scratch.
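One way to see why exact distillation is possible at all: under a fixed, canonical tokenization, the teacher's per-token log-probabilities already determine the log-likelihood of the underlying byte sequence, so a byte-level student has exact sequence-level targets. The sketch below is an assumption-laden illustration (the `teacher_steps` data, the uniform per-byte allocation, and all names are hypothetical, not Bolmo's actual objective); it only shows that token log-probs can be redistributed over bytes while preserving the total.

```python
import math

# Hypothetical teacher trace: (bytes covered by the token, token log-prob).
teacher_steps = [(b"Hel", math.log(0.5)), (b"lo", math.log(0.25))]

def byte_level_targets(steps):
    """Spread each subword token's log-prob uniformly over its bytes.

    Any allocation whose per-byte values sum to the token log-prob
    preserves the sequence log-likelihood; uniform is just the
    simplest choice for illustration.
    """
    targets = []
    for token_bytes, lp in steps:
        per_byte = lp / len(token_bytes)
        targets.extend((bytes([b]), per_byte) for b in token_bytes)
    return targets

targets = byte_level_targets(teacher_steps)
seq_lp = sum(lp for _, lp in targets)
# seq_lp equals the teacher's sequence log-likelihood:
# log(0.5) + log(0.25), now expressed as 5 per-byte targets.
```

The contrast with from-scratch pretraining is that these targets come from a single forward pass of the teacher rather than from gradient steps over raw data.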

Open Questions

  • Which capabilities are lost when moving from subword tokens to byte-level latent patches?
  • Can tokenizer transfer be combined with learned dynamic chunking rather than fixed byte patches?