Dynamic Chunking For End-To-End Hierarchical Sequence Modeling

Source

Core Claim

H-Net replaces the tokenizer-LM-detokenizer pipeline with an end-to-end hierarchical network that learns dynamic content- and context-dependent byte chunking.

Key Contributions

  • Introduces dynamic chunking learned jointly with the rest of the model.
  • Builds an explicit hierarchical network over byte-level inputs.
  • Shows a one-stage hierarchy can outperform a compute- and data-matched BPE Transformer.
  • Reports improved scaling with multiple hierarchy stages and strong gains in domains where tokenization heuristics are weak.

Method Notes

H-Net learns where to place chunk boundaries directly from byte-level inputs: an encoder produces byte representations, a routing step predicts boundaries between adjacent positions, boundary positions are downsampled into chunks for the main network, and the output is upsampled back to byte resolution, with the whole pipeline trained end to end. H-Net is a central reference for latent tokenization and byte-level language models.
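As a minimal sketch of the content-dependent boundary idea: place a chunk boundary wherever adjacent byte representations are dissimilar, then keep only boundary positions for the coarser level. The cosine-similarity scoring, fixed threshold, and function name below are illustrative simplifications, not the paper's exact routing module.

```python
import numpy as np

def dynamic_chunk(hidden: np.ndarray, threshold: float = 0.5):
    """Hypothetical sketch of content-dependent chunking.

    hidden: (seq_len, dim) byte-level representations.
    Returns the downsampled chunk-start representations and the
    boolean boundary mask.
    """
    # Cosine similarity between each position and its predecessor.
    a, b = hidden[:-1], hidden[1:]
    cos = np.sum(a * b, axis=-1) / (
        np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + 1e-8
    )
    # Dissimilar neighbors -> high boundary probability.
    p = (1.0 - cos) / 2.0
    # Position 0 always starts a chunk; elsewhere, threshold the score.
    boundaries = np.concatenate([[True], p > threshold])
    # Downsample: the main network sees only chunk-start positions.
    return hidden[boundaries], boundaries

rng = np.random.default_rng(0)
h = rng.standard_normal((16, 8)).astype(np.float32)
chunks, mask = dynamic_chunk(h)
print(f"{chunks.shape[0]} chunks from {h.shape[0]} bytes")
```

In the actual model the boundary decision is learned jointly with the rest of the network (with a smoothing mechanism keeping it differentiable), rather than fixed by a hand-set threshold as here.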

Evidence And Results

The abstract reports improved byte-level language modeling, increased robustness to character-level perturbations, learned chunk boundaries that align with meaningful units, and nearly 4x better data efficiency on DNA sequence modeling relative to baselines.

Limitations

H-Net emphasizes end-to-end chunking trained from scratch, and so does not directly reuse pretrained subword LMs, while Bolmo emphasizes practical transfer from existing subword models. The two address different deployment problems.

Open Questions

  • How should dynamic chunking scale to multimodal or action-conditioned data?
  • Can H-Net-like hierarchy inherit capabilities from pretrained subword systems?