Latent Tokenization

Summary

Latent tokenization is the broader pattern of replacing a fixed, externally defined tokenizer with chunks, concepts, or abstraction levels that the model learns itself.

What The Wiki Currently Believes

  • H-Net learns content- and context-dependent chunking inside a hierarchical network (see the chunking sketch after this list).
  • Synergy learns a routing mechanism that bridges byte-level and higher-level abstraction.
  • ConceptMoE merges semantically similar token sequences into concept representations before they reach the more expensive concept-level model.
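
None of these papers exposes the same interface, but the shared mechanism is easy to sketch. The toy module below (hypothetical names such as LearnedChunker and boundary_scorer; it is not the H-Net, Synergy, or ConceptMoE implementation) scores every byte position, opens a new chunk wherever the score crosses a threshold, and mean-pools each span into a single chunk vector for a downstream backbone. The hard threshold itself is non-differentiable; the boundary probabilities are returned so a training signal can flow through them.

    import torch
    import torch.nn as nn

    class LearnedChunker(nn.Module):
        """Toy content-dependent chunker: score each byte position, open a
        chunk where the score exceeds a threshold, and mean-pool each span
        into one chunk vector. Illustrative only, not any paper's method."""

        def __init__(self, d_model: int, threshold: float = 0.5):
            super().__init__()
            self.boundary_scorer = nn.Linear(d_model, 1)  # per-position boundary logit
            self.threshold = threshold

        def forward(self, byte_embeddings: torch.Tensor):
            # byte_embeddings: (seq_len, d_model) for a single sequence
            probs = torch.sigmoid(self.boundary_scorer(byte_embeddings)).squeeze(-1)
            is_boundary = probs > self.threshold
            is_boundary[0] = True  # the first position always opens a chunk
            # chunk_ids[i] = index of the chunk that position i belongs to
            chunk_ids = torch.cumsum(is_boundary.long(), dim=0) - 1
            num_chunks = int(chunk_ids.max().item()) + 1
            # Mean-pool byte embeddings into their chunks.
            chunks = torch.zeros(num_chunks, byte_embeddings.size(-1))
            counts = torch.zeros(num_chunks, 1)
            chunks.index_add_(0, chunk_ids, byte_embeddings)
            counts.index_add_(0, chunk_ids, torch.ones(byte_embeddings.size(0), 1))
            return chunks / counts, probs  # probs kept as a differentiable signal

    # Usage: 64 byte positions with 32-dim embeddings collapse into fewer chunks.
    x = torch.randn(64, 32)
    chunks, boundary_probs = LearnedChunker(32)(x)
    print(chunks.shape)  # often around (32, 32) with random weights and threshold 0.5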

Evidence

These papers suggest that segmentation is no longer just a preprocessing step: it can become a differentiable compute-allocation and abstraction problem solved inside the model.
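
One concrete reading of "compute allocation": the expected number of chunks is just the sum of the boundary probabilities, so the model can be regularized toward a compute budget with an ordinary gradient. A minimal sketch, assuming boundary probabilities like those returned by the chunker above (the function name compute_allocation_loss and the target ratio are made up for illustration):

    import torch

    def compute_allocation_loss(boundary_probs: torch.Tensor,
                                target_ratio: float = 0.25) -> torch.Tensor:
        """Penalize deviating from a target compression ratio. The expected
        chunk count is differentiable in the boundary probabilities, which is
        what lets segmentation double as compute allocation."""
        expected_chunks = boundary_probs.sum()
        ratio = expected_chunks / boundary_probs.numel()  # expected chunks per position
        return (ratio - target_ratio) ** 2

    # With ~25% of positions marked as boundaries, the expensive backbone sees
    # only ~1/4 as many positions as the byte-level encoder.
    print(compute_allocation_loss(torch.rand(64)))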

Open Questions

  • Should learned tokenization be byte-native, token-compressive, or concept-level?
  • How should learned chunking interact with attention, the KV cache, and MoE routing? (See the cache sketch below.)
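
The KV-cache part of that question has a concrete flavor: if the backbone attends over chunks instead of bytes, its cache shrinks in proportion to the compression ratio. A back-of-envelope sketch with made-up model dimensions (the helper kv_cache_bytes and the 4x compression are illustrative, not taken from any of the papers):

    def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int,
                       head_dim: int, bytes_per_elem: int = 2) -> int:
        """Standard KV-cache footprint: K and V tensors per layer, each of
        shape (seq_len, n_kv_heads, head_dim), stored at bytes_per_elem."""
        return 2 * n_layers * seq_len * n_kv_heads * head_dim * bytes_per_elem

    # Byte-level vs chunk-level cache for a hypothetical 32-layer backbone,
    # assuming the chunker compresses 8192 bytes into 2048 chunks (4x).
    byte_level  = kv_cache_bytes(seq_len=8192, n_layers=32, n_kv_heads=8, head_dim=128)
    chunk_level = kv_cache_bytes(seq_len=2048, n_layers=32, n_kv_heads=8, head_dim=128)
    print(byte_level // 2**20, "MiB vs", chunk_level // 2**20, "MiB")  # 1024 MiB vs 256 MiB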