Gemini Robotics 1.5

Source

Core Claim

Gemini Robotics 1.5 is a robot foundation-model family that pairs a vision-language-action (VLA) model with an embodied-reasoning vision-language model (VLM) acting as orchestrator. The orchestrator handles planning and progress checking and hands subtasks to the VLA, which generates the control inputs; language and thinking text serve as context throughout.

Method Notes

  • Gemini Robotics-ER 1.5 is the higher-level embodied reasoning model; Gemini Robotics 1.5 is the VLA/action model.
  • The VLA model outputs continuous numeric robot control inputs and can emit thinking text when that mode is enabled.
  • The source is a strong anchor for hierarchical text-conditioned control, but it should not be classified as diffusion or flow unless a future source states the action generator objective explicitly.
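
The orchestrator/action-model split described in these notes can be sketched as a toy control loop. This is a minimal illustration under stated assumptions, not the paper's implementation or any real Gemini API: the classes `ToyOrchestrator` and `ToyVLA`, the `VLAOutput` record, the `run_episode` function, the 1-D "position" environment, and the clipped proportional policy are all hypothetical stand-ins chosen only to make the subtask handoff, progress check, and optional thinking text concrete.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class VLAOutput:
    action: float                   # continuous numeric control input (1-D toy)
    thinking: Optional[str] = None  # populated only when thinking mode is enabled


class ToyOrchestrator:
    """Hypothetical stand-in for the embodied-reasoning model (plan + progress check)."""

    def __init__(self, plan: List[Tuple[str, float]], tol: float = 0.05) -> None:
        self.plan = plan    # (subtask text, goal position) pairs; goals are a toy assumption
        self.tol = tol
        self.index = 0      # index of the subtask currently handed to the action model

    def current_subtask(self) -> Optional[str]:
        return self.plan[self.index][0] if self.index < len(self.plan) else None

    def update_progress(self, position: float) -> bool:
        """Advance to the next subtask if the current goal is reached; report whether it was."""
        if self.index < len(self.plan) and abs(position - self.plan[self.index][1]) <= self.tol:
            self.index += 1
            return True
        return False


class ToyVLA:
    """Hypothetical stand-in for the action model: (subtask text, observation) -> action."""

    def __init__(self, goals: Dict[str, float], thinking: bool = False, max_step: float = 0.5) -> None:
        self.goals = goals
        self.thinking = thinking
        self.max_step = max_step

    def act(self, subtask: str, position: float) -> VLAOutput:
        error = self.goals[subtask] - position
        action = max(-self.max_step, min(self.max_step, error))  # clipped proportional step
        thought = f"{subtask}: remaining error {error:+.2f}" if self.thinking else None
        return VLAOutput(action=action, thinking=thought)


def run_episode(orchestrator: ToyOrchestrator, vla: ToyVLA,
                position: float = 0.0, max_steps: int = 50) -> Tuple[List[str], float]:
    completed: List[str] = []
    for _ in range(max_steps):
        subtask = orchestrator.current_subtask()
        if subtask is None:                  # plan exhausted
            break
        out = vla.act(subtask, position)     # subtask handoff in text form
        position += out.action               # toy environment: action is a displacement
        if orchestrator.update_progress(position):
            completed.append(subtask)
    return completed, position


if __name__ == "__main__":
    plan = [("reach the cup", 1.0), ("move to the bin", 2.0)]
    done, final_position = run_episode(ToyOrchestrator(plan), ToyVLA(dict(plan), thinking=True))
    print(done, final_position)
```

The sketch keeps the two roles behind separate interfaces so that thinking mode can be toggled on the action model without touching the planner, which mirrors the notes' point that execution gains and orchestration gains are hard to separate unless each part can be varied independently.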

Evidence And Limitations

The paper reports multi-embodiment control, Motion Transfer across robot platforms, gains from thinking mode, and an agentic system that combines the orchestrator and the action model. Limitations include the models not being fully public, bounded safety claims, and the difficulty of separating VLA execution gains from higher-level orchestration gains.

Foundation TSFM Relevance

| Agenda slot | Verdict | Evidence | Missing pieces |
|---|---|---|---|
| Causal structure, counterfactuals, and control | adjacent | Combines an embodied reasoning model, VLA action model, progress understanding, and continuous numeric robot control inputs, which is an analogy for the digital-world robot action interface. | No passive-to-counterfactual dynamics model, candidate-action future evaluation, or analogous digital telemetry/topology/action API. |
| Context interface | adjacent | Uses language, images/video, subtask handoffs, and thinking as context for action. | Does not define channel context or general context schemas for multivariate time-series systems. |
| Benchmarks | warning | Real-robot A/B/n evaluations reduce some variance across tasks and embodiments. | Private model and benchmark details limit reproducibility and TSFM comparability. |

Open Questions

  • Which task gains require natural-language thinking, and which could be handled by latent or action-space subgoals?
  • How should this wiki evaluate private robotics models whose architecture details and weights are not fully public?