ConceptMoE: Adaptive Token-To-Concept Compression For Implicit Compute Allocation

Source

Core Claim

ConceptMoE improves efficiency and effectiveness by merging semantically similar token sequences into concept representations before expensive MoE computation.

Key Contributions

  • Introduces learnable token-to-concept chunking based on semantic similarity (see the sketch after this list).
  • Uses MoE models as the testbed, comparing architectures under matched total parameters and activated FLOPs.
  • Reports improvements on language pretraining, long-context understanding, multimodal benchmarks, and continual training conversion.
  • Reduces attention computation and KV cache requirements at higher compression ratios.
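
The chunking module is described as learnable in the source; as a rough, non-authoritative sketch of the idea, the snippet below greedily merges adjacent token states whose cosine similarity exceeds a fixed threshold and mean-pools each chunk into one concept vector. The function name `chunk_tokens_to_concepts`, the fixed threshold, and mean pooling are illustrative assumptions, not the paper's mechanism.

```python
import numpy as np

def chunk_tokens_to_concepts(hidden, sim_threshold=0.8):
    """Greedily merge adjacent token states whose cosine similarity exceeds
    a threshold, mean-pooling each chunk into a single concept vector.

    hidden: (seq_len, d_model) array of token hidden states.
    Returns (num_concepts, d_model) concept vectors and the chunk sizes.
    """
    normed = hidden / np.linalg.norm(hidden, axis=-1, keepdims=True)
    chunks, current = [], [0]
    for t in range(1, hidden.shape[0]):
        # Similarity between this token and the previous one decides the boundary.
        if float(normed[t] @ normed[t - 1]) >= sim_threshold:
            current.append(t)
        else:
            chunks.append(current)
            current = [t]
    chunks.append(current)
    concepts = np.stack([hidden[idx].mean(axis=0) for idx in chunks])
    sizes = [len(idx) for idx in chunks]
    return concepts, sizes

# Toy usage: 10 token states of dimension 16.
rng = np.random.default_rng(0)
h = rng.normal(size=(10, 16))
concepts, sizes = chunk_tokens_to_concepts(h, sim_threshold=0.2)
print(len(sizes), "concepts from", h.shape[0], "tokens; chunk sizes:", sizes)
```

A learned variant would replace the fixed threshold with a predicted boundary probability trained end-to-end with the MoE backbone, which is what "learnable chunking" suggests.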

Method Notes

ConceptMoE connects Latent Tokenization with Mixture Of Experts: compression is not only a preprocessing step but also an implicit compute-allocation mechanism, since spans that compress into fewer concepts occupy fewer positions in the expensive MoE layers, leaving relatively more expert compute for information-dense spans.
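
To make the compute-allocation reading concrete, here is a back-of-the-envelope cost model (an assumption for illustration, not the source's accounting): at compression ratio r, attention cost shrinks roughly by r² and the KV cache by r, while per-position expert FLOPs stay fixed, so spans that resist compression implicitly receive more MoE compute. The sequence length, model width, layer count, and FLOP formulas below are illustrative choices.

```python
def compression_savings(seq_len, compression_ratio, d_model=4096, n_layers=32):
    """Illustrative effect of concept compression on attention cost and
    KV-cache size (simplified formulas, not the paper's exact accounting)."""
    concepts = seq_len / compression_ratio
    # Self-attention score/context FLOPs scale with the square of sequence length.
    attn_flops_tokens = n_layers * 2 * seq_len ** 2 * d_model
    attn_flops_concepts = n_layers * 2 * concepts ** 2 * d_model
    # KV cache scales linearly with the number of cached positions.
    kv_tokens = n_layers * 2 * seq_len * d_model
    kv_concepts = n_layers * 2 * concepts * d_model
    return attn_flops_tokens / attn_flops_concepts, kv_tokens / kv_concepts

attn_x, kv_x = compression_savings(seq_len=32_768, compression_ratio=2.0)
print(f"attention FLOPs reduced ~{attn_x:.0f}x, KV cache reduced ~{kv_x:.0f}x")
# Since MoE expert FLOPs are spent per position, spans that merge fewer tokens
# per concept implicitly receive a larger share of expert compute.
```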

Evidence And Results

The abstract reports +0.9 language pretraining points, +2.3 long-context points, +0.6 multimodal points, and +5.5 points during continual training conversion under controlled settings.

Limitations

The source does not remove tokenization entirely; it compresses already-tokenized streams into concepts. It should be compared with byte-native methods such as H-Net and Synergy.

Open Questions

  • How stable are learned concept boundaries across domains?
  • Can concept compression be combined with byte-level or pixel-level inputs?