Tokenizer Transfer
Summary
Tokenizer transfer is the problem of moving a trained model's capabilities from one tokenization scheme to another without retraining from scratch.
What The Wiki Currently Believes
- Bolmo treats byteification as a special case of tokenizer transfer: a subword LM serves as a teacher for a byte-level latent-tokenizer architecture.
Evidence
Bolmo’s main empirical claim is that byte-level LMs can be made competitive quickly via exact distillation from existing subword LMs, rather than by paying the full cost of byte-level pretraining from scratch.
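One way to see what "exact" distillation across tokenization regimes can mean: a subword teacher's next-token distribution can be marginalized into a byte-level target distribution, since each token determines a byte sequence. The sketch below illustrates this for the first byte only; the toy vocabulary and probabilities are illustrative assumptions, not Bolmo's actual interface or training recipe.

```python
# Hedged sketch: deriving a byte-level distillation target from a subword
# teacher's next-token distribution. Names and numbers are illustrative.
from collections import defaultdict

def next_byte_distribution(token_probs: dict[str, float]) -> dict[int, float]:
    """Marginalize a next-*token* distribution into a next-*byte* one:
    P(next byte = b) = sum of P(t) over tokens t whose UTF-8 encoding
    starts with byte b. A byte-level student trained against such targets
    inherits the teacher's distribution exactly at this position; later
    bytes would condition on the byte prefix consumed so far."""
    byte_probs: dict[int, float] = defaultdict(float)
    for token, p in token_probs.items():
        first_byte = token.encode("utf-8")[0]
        byte_probs[first_byte] += p
    return dict(byte_probs)

# Toy teacher distribution over next subword tokens.
teacher = {"the": 0.5, "there": 0.2, "a": 0.2, "an": 0.1}
targets = next_byte_distribution(teacher)
# Byte 't' collects mass from "the" and "there"; 'a' from "a" and "an".
```

Because the mapping from token sequences to byte sequences is deterministic, targets built this way preserve the teacher's probability mass rather than approximating it, which is what distinguishes this from ordinary soft-label distillation.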
Open Questions
- Which capabilities are lost when moving from subword to byte-level latent patches?
- Can tokenizer transfer be combined with learned dynamic chunking rather than fixed byte patches?