Convergent Evolution: How Different Language Models Learn Similar Number Representations

Source

Status And Credibility

Recent April 2026 arXiv preprint from a credible academic author team. Treat as important current evidence for number-representation diagnostics, with the usual preprint caveat until peer-review status is known.

Core Claim

Many language models, and even raw number-token frequencies, show Fourier spikes at periods such as T = 2, 5, and 10, but those spikes do not guarantee useful modular number representations. The paper separates spectral convergence, where embeddings have periodic Fourier power, from geometric convergence, where residue classes such as n mod T are linearly separable.
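The spectral/geometric distinction can be made concrete with a toy sketch. This is not the paper's code: the function names `period_power_fraction` and `mod_probe_accuracy`, the least-squares linear probe, and the choice of T = 10 over N = 100 integers are illustrative assumptions.

```python
import numpy as np

def period_power_fraction(E, T):
    """Fraction of non-DC Fourier power that sits at period T, pooled over dims.

    E has shape (N, d); row n is the embedding of integer n.
    Assumes N is a multiple of T so period T lands exactly on bin N // T.
    """
    N = E.shape[0]
    spectrum = np.abs(np.fft.rfft(E, axis=0)) ** 2   # shape (N // 2 + 1, d)
    power = spectrum[1:].sum(axis=1)                 # drop DC; index i is bin i + 1
    return float(power[N // T - 1] / power.sum())

def mod_probe_accuracy(E, T):
    """Accuracy of a least-squares linear probe that predicts n mod T from E[n]."""
    N = E.shape[0]
    labels = np.arange(N) % T
    X = np.hstack([np.ones((N, 1)), E])              # add a bias column
    Y = np.eye(T)[labels]                            # one-hot targets
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return float(np.mean((X @ W).argmax(axis=1) == labels))

N, T = 100, 10                                       # hypothetical sizes
theta = 2 * np.pi * np.arange(N) / T

# Two hand-built embeddings with the same dominant period.
cos_only = np.cos(theta)[:, None]                            # 1-D
circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)    # 2-D

for name, E in [("cos_only", cos_only), ("circle", circle)]:
    print(name, period_power_fraction(E, T), mod_probe_accuracy(E, T))
```

In this toy setup both embeddings put essentially all non-DC power at period 10, yet the probe reaches 1.0 accuracy on the circle and only 0.2 on the cosine-only embedding, because the cosine maps residues r and 10 - r to the same value. The probe is fit and scored on the same integers, so it measures separability, not generalization.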

Key Contributions

  • Shows Fourier spikes across Transformers, non-Transformer LMs, classical word embeddings, and raw number-token frequency distributions.
  • Proves that Fourier-domain sparsity is necessary but not sufficient for mod-T geometric separability.
  • Uses controlled 300M-parameter pretraining experiments to test the roles of data, architecture, optimizer, tokenizer, and context.
  • Shows two routes to geometric convergence: language co-occurrence structure and multi-token addition tasks that force modular subproblems.
  • Shows single-token addition can leave representations seed- and optimizer-dependent because it does not impose the same modular pressure.
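One way to read the multi-token addition result is that each emitted answer token is itself a modular subproblem. The sketch below assumes single-digit tokens, which may not match the paper's tokenizer (with k-digit tokens the modulus would be 10^k), and the helper name `add_digit_tokens` is hypothetical.

```python
def add_digit_tokens(a, b):
    """Schoolbook addition; returns the digits of a + b, least significant first."""
    da = [int(c) for c in reversed(str(a))]
    db = [int(c) for c in reversed(str(b))]
    out, carry = [], 0
    for i in range(max(len(da), len(db))):
        s = (da[i] if i < len(da) else 0) + (db[i] if i < len(db) else 0) + carry
        out.append(s % 10)    # each emitted digit token is a mod-10 residue
        carry = s // 10       # the carry feeds the next position
    if carry:
        out.append(carry)
    return out

digits = add_digit_tokens(478, 964)
print(digits)
```

The units digit depends only on the operands mod 10, so a model that must emit it as a separate token needs access to that residue. A single-token answer carries no such per-position requirement, which is consistent with the weaker modular pressure the last bullet describes.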

Why It Matters For Number Tokenization

This source is a guardrail for Fourier-number enthusiasm. FoNE intentionally builds Fourier number embeddings; this paper shows why a visible Fourier spectrum alone is not evidence that a model has learned functional numeracy.

For time-series and numeric-feature work, the lesson is broader: representation diagnostics should test usable geometry, not only visible basis structure. A periodic basis can be present because of token frequencies or co-occurrence artifacts while still failing the downstream operation that motivated the basis.
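The token-frequency point can be illustrated without any model. The sketch below uses synthetic counts, not data from the paper: it assumes multiples of 10 appear five times as often as other integers, and the names `counts` and `dft_magnitudes` are hypothetical. It uses only the standard library.

```python
import cmath

def dft_magnitudes(x):
    """Magnitudes of the discrete Fourier transform for bins 0 .. N // 2."""
    N = len(x)
    return [
        abs(sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N)))
        for k in range(N // 2 + 1)
    ]

N = 100
# Synthetic frequency profile: round numbers (multiples of 10) are over-represented.
counts = [5 if n % 10 == 0 else 1 for n in range(N)]

mags = dft_magnitudes(counts)

# Bin k corresponds to period N / k, so periods 10, 5, and 2 sit at bins 10, 20, and 50.
for period in (10, 5, 2):
    print(period, round(mags[N // period], 6))
print("bin 7 (no round-number structure):", round(mags[7], 6))
```

A frequency profile that only favors round numbers already produces spikes at periods 10, 5, and 2, so seeing the same spikes in an embedding spectrum says little by itself about whether the downstream operation works.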

Limitations

The evidence concerns text-number token embeddings and controlled arithmetic training. It does not directly evaluate scalar sensor values, units, missingness, uncertainty, exogenous numeric variables, or action/control intensities in time-series foundation models.

The paper is strongest as a diagnostic and attribution source, not as a direct proposal for a new numeric encoding.

Foundation TSFM Relevance

Agenda slot | Verdict | Evidence | Missing pieces
Number tokenization | warning | Fourier spikes can be universal but non-functional; mod-T probes test geometry more directly. | Need TSFM-specific probes over scalar values, units, regimes, and control inputs.
Representation quality | adjacent | Distinguishes spectral structure from linearly usable modular structure. | Need probes tied to forecasting, generation, editing, and action utility.
Benchmark hygiene | warning | Representation-level diagnostics can mistake training-distribution artifacts for learned structure. | Need attribution and ablation protocols for numeric TSFM representations.

Open Questions

  • Which TSFM numeric embeddings show only spectral structure, and which expose task-usable geometry?
  • Do periodic point-wise scalar embeddings help with noisy continuous observations, or mainly with discrete modular arithmetic?
  • What probes should test whether numeric features preserve units, scale, uncertainty, and intervention intensity?