Convergent Evolution: How Different Language Models Learn Similar Number Representations

Source

Raw Markdown: paper_convergent-evolution-number-representations-2026.md
PDF: paper_convergent-evolution-number-representations-2026.pdf
Preprint: arXiv 2604.20817
Model collection: Hugging Face collection
Gonzo ML discussion: post 5315
Review: ArxivIQ note

Status And Credibility

April 2026 arXiv preprint from a credible academic author team. Treat it as important current evidence for number-representation diagnostics, with the usual preprint caveat until its peer-review status is known.

Core Claim

Many language models and even raw number-token frequencies show Fourier spikes at periods such as $T = 2, 5, 10$, but those spikes do not guarantee useful modular number representations. The paper separates spectral convergence, where embeddings have periodic Fourier power, from geometric convergence, where residue classes such as $n \bmod T$ are linearly separable.
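
The spectral/geometric distinction can be illustrated with a toy sketch. This is not the paper's construction or code: the two synthetic embeddings, the helper names `dominant_period` and `residue_probe_accuracy`, and the least-squares probe are all assumptions chosen for illustration. Both embeddings put their Fourier power at period $T = 10$, but only the two-dimensional one keeps every residue class distinct.

```python
import numpy as np

def dominant_period(emb):
    """Period (in number-index units) that carries the most Fourier power."""
    n = emb.shape[0]
    centered = emb - emb.mean(axis=0)
    # Power spectrum along the number axis, summed over embedding dimensions.
    power = (np.abs(np.fft.rfft(centered, axis=0)) ** 2).sum(axis=1)
    k = int(np.argmax(power[1:])) + 1  # skip the DC bin
    return n / k

def residue_probe_accuracy(emb, T):
    """Training accuracy of a least-squares linear probe for n mod T."""
    n = emb.shape[0]
    labels = np.arange(n) % T
    X = np.hstack([emb, np.ones((n, 1))])      # add a bias column
    Y = np.eye(T)[labels]                      # one-hot residue targets
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return float(((X @ W).argmax(axis=1) == labels).mean())

N, T = 200, 10
theta = 2 * np.pi * np.arange(N) / T

# Toy embedding A: a full circle, so each residue sits at its own angle.
circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)
# Toy embedding B: cosine only, so residues r and T - r collapse together.
cos_only = np.cos(theta)[:, None]

for name, emb in [("circle", circle), ("cos_only", cos_only)]:
    print(name, dominant_period(emb), residue_probe_accuracy(emb, T))
```

In this toy, the cosine-only embedding cannot be probed perfectly by any classifier, linear or otherwise, because residues such as 3 and 7 map to the same point. A spectrum-only diagnostic would not distinguish the two cases.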

Key Contributions

Shows Fourier spikes across Transformers, non-Transformer LMs, classical word embeddings, and raw number-token frequency distributions.
Proves that Fourier-domain sparsity is necessary but not sufficient for mod-$T$ geometric separability.
Uses controlled 300M-parameter pretraining experiments to test the roles of data, architecture, optimizer, tokenizer, and context.
Shows two routes to geometric convergence: language co-occurrence structure and multi-token addition tasks that force modular subproblems.
Shows single-token addition can leave representations seed- and optimizer-dependent because it does not impose the same modular pressure.
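
The last two items turn on the difference between single-token and multi-token addition. A minimal sketch of why the multi-token case involves modular subproblems, assuming (hypothetically) that each token holds one base-`base` chunk of the number. The function name `add_by_tokens` and the chunking scheme are illustrative and are not taken from the paper or its tokenizers.

```python
def add_by_tokens(a, b, base=10):
    """Add two non-negative integers chunk by chunk, least-significant chunk first.

    Returns the output chunks in most-significant-first order.
    """
    out, carry = [], 0
    while a or b or carry:
        s = a % base + b % base + carry
        out.append(s % base)   # modular subproblem for this output token
        carry = s // base      # carry passed on to the next chunk
        a //= base
        b //= base
    return out[::-1] or [0]

print(add_by_tokens(478, 256))               # one digit per token
print(add_by_tokens(999999, 1, base=1000))   # three digits per token
```

Every output chunk is a sum reduced modulo `base`, so a model that emits the answer token by token has to solve a mod-`base` problem at each position. When both operands and the result fit in a single token, no such per-token reduction is needed.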

Why It Matters For Number Tokenization

This source is a guardrail for Fourier-number enthusiasm. FoNE intentionally builds Fourier number embeddings; this paper shows why a visible Fourier spectrum alone is not evidence that a model has learned functional numeracy.

For time-series and numeric-feature work, the lesson is broader: representation diagnostics should test usable geometry, not only visible basis structure. A periodic basis can be present because of token frequencies or co-occurrence artifacts while still failing the downstream operation that motivated the basis.
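
A small sketch of how frequency statistics alone can produce a periodic spectrum, with no embedding involved. The log-frequency profile below is entirely hypothetical: it simply assumes that multiples of 5 and 10 occur more often than other numbers, and the bump sizes are arbitrary.

```python
import numpy as np

N = 200
n = np.arange(N)

# Hypothetical log-frequency profile (an assumption, not measured data):
# a flat baseline plus bumps at multiples of 10 and multiples of 5.
log_freq = 1.0 + 2.0 * (n % 10 == 0) + 1.0 * (n % 5 == 0)

# Power spectrum of the mean-centered profile along the number axis.
spectrum = np.abs(np.fft.rfft(log_freq - log_freq.mean())) ** 2

# Bins holding non-negligible power, and the periods they correspond to.
spike_bins = np.flatnonzero(spectrum > 1e-6 * spectrum.max())
spike_periods = N / spike_bins
print(spike_bins, spike_periods)
```

Under these assumptions the only bins with power are harmonics of period 10, which include periods 10, 5, and 2. Since the input is a frequency profile rather than a learned representation, the spikes say nothing about whether any model can use residue structure, which is why a probe on usable geometry is the more direct test.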

Limitations

The evidence concerns text-number token embeddings and controlled arithmetic training. It does not directly evaluate scalar sensor values, units, missingness, uncertainty, exogenous numeric variables, or action/control intensities in time-series foundation models.

The paper is strongest as a diagnostic and attribution source, not as a direct proposal for a new numeric encoding.

Foundation TSFM Relevance

Agenda slot	Verdict	Evidence	Missing pieces
Number tokenization	warning	Fourier spikes can be universal but non-functional; mod-$T$ probes test geometry more directly.	Need TSFM-specific probes over scalar values, units, regimes, and control inputs.
Representation quality	adjacent	Distinguishes spectral structure from linearly usable modular structure.	Need probes tied to forecasting, generation, editing, and action utility.
Benchmark hygiene	warning	Representation-level diagnostics can mistake training-distribution artifacts for learned structure.	Need attribution and ablation protocols for numeric TSFM representations.

Links Into The Wiki

Open Questions

Which TSFM numeric embeddings show only spectral structure, and which expose task-usable geometry?
Do periodic point-wise scalar embeddings help with noisy continuous observations, or mainly with discrete modular arithmetic?
What probes should test whether numeric features preserve units, scale, uncertainty, and intervention intensity?
