Contradictions And Open Tensions

Semantic Latents Versus Pixel Fidelity

Reconstruction or Semantics? argues that semantic latent spaces can be more policy-relevant than reconstruction-focused ones for robotic world models. The Prism Hypothesis and Tuna-2 complicate that claim: Prism frames semantic and pixel encoders as occupying different frequency bands, while Tuna-2 argues that end-to-end pixel embeddings can beat pretrained vision encoders for unified multimodal understanding and generation. The wiki should not collapse these into one rule; the right latent appears task-dependent.
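Prism's frequency-band framing can be made concrete with a toy decomposition. The sketch below is purely illustrative and not the paper's method: the signal, the cutoff, and the band split are invented here to show the intuition that coarse (low-frequency) structure and fine (high-frequency) detail are separable, complementary components of the same input.

```python
import numpy as np

# Toy illustration (not the Prism paper's pipeline): split a 1D signal,
# standing in for an image row, into low- and high-frequency bands. The
# hypothesis frames semantic encoders as keeping coarse structure, while
# pixel-level reconstruction also needs the high-frequency residual.
rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 4 * np.pi, 256)) + 0.3 * rng.standard_normal(256)

spectrum = np.fft.rfft(x)
cutoff = 8                 # assumed band boundary, chosen arbitrarily
low = spectrum.copy()
low[cutoff:] = 0           # keep only coarse structure
high = spectrum - low      # fine detail / texture band

x_low = np.fft.irfft(low, n=x.size)
x_high = np.fft.irfft(high, n=x.size)

# The two bands sum back to the original signal (up to float error),
# so neither band alone carries all the information.
assert np.allclose(x_low + x_high, x)
```

The point of the toy: if a semantic encoder discards the `x_high` band, no policy loss on its latents can recover pixel fidelity, which is one way to read the tension with Tuna-2's end-to-end pixel embeddings.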

Heuristic-Free JEPA Versus Stabilized Predictive Training

LeJEPA and LeWorldModel emphasize Gaussian/SIGReg-style regularization as a path away from teacher-student, stop-gradient, and schedule heuristics. NEPA still uses causal masking and stop-gradient for next-embedding visual prediction. The open question is whether Gaussian regularization can replace these stabilizers across large-scale vision and multimodal settings.
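The contrast can be sketched in a few lines. The loss below is an illustrative stand-in, not LeJEPA's actual SIGReg statistic: it only matches the first two moments of random 1D projections to N(0, 1), whereas the real method uses a proper goodness-of-fit test. The function name and constants are invented here.

```python
import numpy as np

def sigreg_style_loss(z, num_dirs=16, rng=None):
    """Illustrative sketch (not LeJEPA's exact SIGReg): push embeddings
    toward an isotropic Gaussian by matching the first two moments of
    random 1D projections to N(0, 1). No teacher network, no
    stop-gradient, no EMA schedule is involved."""
    if rng is None:
        rng = np.random.default_rng(0)
    d = z.shape[1]
    dirs = rng.standard_normal((d, num_dirs))
    dirs /= np.linalg.norm(dirs, axis=0, keepdims=True)  # unit directions
    proj = z @ dirs                                      # (batch, num_dirs)
    mean_penalty = (proj.mean(axis=0) ** 2).mean()
    var_penalty = ((proj.var(axis=0) - 1.0) ** 2).mean()
    return mean_penalty + var_penalty

# A roughly Gaussian batch incurs a near-zero penalty; a collapsed
# (constant) batch is heavily penalized -- the failure mode that
# stop-gradient heuristics are usually deployed to prevent.
rng = np.random.default_rng(1)
good = rng.standard_normal((4096, 32))
collapsed = np.ones((4096, 32))
assert sigreg_style_loss(good) < sigreg_style_loss(collapsed)
```

This is why the regularizer can, in principle, replace collapse-prevention heuristics: collapse itself becomes a high-loss configuration. Whether that holds at NEPA's scale and in its causal next-embedding setting is exactly the open question.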

Tokenizer Removal Has Multiple Incompatible Paths

H-Net learns hierarchical byte chunking end to end, Synergy learns routing over byte-level abstraction, Bolmo byteifies existing subword LMs through distillation, and ConceptMoE compresses token streams into concepts inside an MoE. These are not the same claim. They agree that fixed tokenization is limiting, but disagree on whether the future is byte-level modeling, learned chunking, concept-level compute allocation, or transfer from subword models.
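To make the "learned chunking" branch of this disagreement concrete, here is a deliberately tiny sketch in the spirit of H-Net's dynamic chunking, not its actual routing module: boundaries are placed wherever consecutive byte representations become dissimilar, so chunk granularity adapts to content instead of being fixed by a tokenizer. The function, threshold, and embeddings are all invented for illustration.

```python
import numpy as np

def chunk_boundaries(byte_embs, threshold=0.5):
    """Toy sketch of similarity-based dynamic chunking (not H-Net's
    trained module): emit a chunk boundary after position i whenever
    the cosine similarity between byte embeddings i and i+1 drops
    below the threshold."""
    a, b = byte_embs[:-1], byte_embs[1:]
    cos = (a * b).sum(-1) / (
        np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + 1e-8
    )
    return [i + 1 for i, c in enumerate(cos) if c < threshold]

# Two "runs" of similar bytes produce one boundary between them.
embs = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
print(chunk_boundaries(embs))  # -> [2]
```

A trained system would learn the embeddings and the boundary decision jointly; the distillation (Bolmo) and concept-compression (ConceptMoE) paths avoid this boundary problem entirely, which is part of why the approaches are hard to reconcile.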

Synthetic Time-Series Data Is Promising But Not Yet Settled

CauKer and ChatTS both use synthetic data to overcome scarcity, but they target different bottlenecks: classification TSFM pretraining versus time-series-language alignment. TimeOmni-1 adds a reasoning-suite requirement, and TimeOmni-VL adds fidelity-preserving time-series/image conversion for generation. The open tension is whether synthetic data quality, reasoning annotations, or representation fidelity is the dominant constraint.
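As a minimal sketch of the kernel-composition idea behind this kind of synthesis, in the spirit of CauKer but not its actual pipeline (which also composes causal graphs and covers a richer kernel bank), the snippet below randomly combines two base Gaussian-process kernels and draws one sample. All kernel parameters and the jitter value are assumptions made here.

```python
import numpy as np

def rbf(t, length=0.2):
    # Smooth-trend kernel: nearby time points are highly correlated.
    d = t[:, None] - t[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def periodic(t, period=0.25, length=0.5):
    # Seasonality kernel: correlation recurs every `period` units.
    d = np.abs(t[:, None] - t[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / length ** 2)

def sample_series(rng, n=128):
    """Illustrative kernel-composition synthesis (not CauKer's actual
    generator): randomly sum or multiply base kernels, then draw one
    Gaussian-process sample via a Cholesky factor."""
    t = np.linspace(0.0, 1.0, n)
    if rng.random() < 0.5:
        k = rbf(t) + periodic(t)   # additive trend + seasonality
    else:
        k = rbf(t) * periodic(t)   # locally periodic structure
    k += 1e-4 * np.eye(n)          # jitter for numerical stability
    return np.linalg.cholesky(k) @ rng.standard_normal(n)

rng = np.random.default_rng(0)
series = sample_series(rng)
```

Note what this sketch does not give you: reasoning annotations (TimeOmni-1's bottleneck) or a fidelity-preserving visual representation (TimeOmni-VL's), which is why synthesis alone may not be the dominant constraint.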

Forecasting Accuracy Versus Reasoning And Control

Eidos advocates latent-space predictive learning for robust forecasting, while TimeOmni-1 targets explicit reasoning and TimeOmni-VL targets unified understanding/generation. These goals overlap but are not interchangeable: a model can be a strong forecaster without being a strong reasoning model, and a reasoning model can fail at numerical fidelity.