Gemini Robotics 1.5

Source

Raw Markdown: paper_gemini-robotics-1-5-2025.md
PDF: paper_gemini-robotics-1-5-2025.pdf
Preprint: arXiv 2510.03342

Core Claim

Gemini Robotics 1.5 is a robot foundation-model family that pairs a vision-language-action (VLA) model with an embodied-reasoning vision-language model (VLM) acting as orchestrator. Language and thinking serve as context, and the orchestrator hands subtasks to the action model; together the two cover planning, progress checking, and control-input generation.

Method Notes

Gemini Robotics-ER 1.5 is the higher-level embodied reasoning model; Gemini Robotics 1.5 is the VLA/action model.
The VLA model outputs continuous numeric robot control inputs and can emit thinking text when that mode is enabled.
The source is a strong anchor for hierarchical text-conditioned control, but it should not be classified as a diffusion-based or flow-based method unless a future source explicitly states the action generator's objective.
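
The split between orchestrator and action model described in these notes can be sketched as a simple control loop. This is a toy illustration only, not the paper's implementation: the class names `ToyOrchestrator` and `ToyActionModel`, the `progress` field, the fixed control values, and the success threshold are all hypothetical, and the real models' interfaces are not public.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ActionOutput:
    controls: List[float]  # continuous numeric control inputs
    thinking: str = ""     # optional thinking text; empty when the mode is off


class ToyOrchestrator:
    """Hypothetical stand-in for the embodied-reasoning model."""

    def __init__(self, plan: List[str]) -> None:
        self.plan = list(plan)  # ordered natural-language subtasks
        self.index = 0

    def current_subtask(self) -> Optional[str]:
        # Subtask handoff: the text passed down to the action model.
        return self.plan[self.index] if self.index < len(self.plan) else None

    def subtask_done(self, observation: Dict[str, float]) -> bool:
        # Toy progress check (assumption): done once progress reaches 1.0.
        return observation["progress"] >= 1.0

    def advance(self) -> None:
        self.index += 1


class ToyActionModel:
    """Hypothetical stand-in for the VLA: (observation, subtask) -> controls."""

    def __init__(self, thinking_enabled: bool = False) -> None:
        self.thinking_enabled = thinking_enabled

    def act(self, observation: Dict[str, float], subtask: str) -> ActionOutput:
        thinking = f"working on: {subtask}" if self.thinking_enabled else ""
        # Fixed placeholder controls; a real model would condition on its inputs.
        return ActionOutput(controls=[0.5, 0.0], thinking=thinking)


def run_episode(
    orchestrator: ToyOrchestrator,
    action_model: ToyActionModel,
    max_steps: int = 20,
) -> List[str]:
    """Run the loop and return the subtask that was active at each step."""
    observation = {"progress": 0.0}
    log: List[str] = []
    for _ in range(max_steps):
        subtask = orchestrator.current_subtask()
        if subtask is None:  # plan exhausted
            break
        output = action_model.act(observation, subtask)
        # Toy dynamics (assumption): the first control value advances progress.
        observation["progress"] += output.controls[0]
        log.append(subtask)
        if orchestrator.subtask_done(observation):
            orchestrator.advance()
            observation["progress"] = 0.0
    return log
```

Calling `run_episode(ToyOrchestrator(["pick up mug", "place mug on shelf"]), ToyActionModel())` steps through each subtask in order. The sketch also makes the attribution problem concrete: task success depends on both the plan and the low-level controls, so gains from either layer are hard to separate from the outcome alone.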

Evidence And Limitations

The paper reports multi-embodiment control, Motion Transfer across robot platforms, thinking-mode gains, and an agentic system that combines the orchestrator and the action model. Limitations include models that are not fully public, safety claims that are bounded in scope, and the difficulty of separating VLA execution gains from higher-level orchestration gains.

Foundation TSFM Relevance

Agenda slot	Verdict	Evidence	Missing pieces
Causal structure, counterfactuals, and control	adjacent	Combines an embodied reasoning model, a VLA action model, progress understanding, and continuous numeric robot control inputs, which serves as an analogy for a digital-world counterpart of the robot action interface.	No passive-to-counterfactual dynamics model, candidate-action future evaluation, or analogous digital telemetry/topology/action API.
Context interface	adjacent	Uses language, images/video, subtask handoffs, and thinking as context for action.	Does not define channel context or general context schemas for multivariate time-series systems.
Benchmarks	warning	Real-robot A/B/n evaluations reduce some variance across tasks and embodiments.	Private model and benchmark details limit reproducibility and TSFM comparability.
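
One way to read the variance-reduction point in the Benchmarks row is as an interleaved trial schedule. The sketch below assumes that A/B/n evaluation means running every policy once per block, in shuffled order, on the same task, so that all policies see similar scene and hardware conditions. The function name `abn_schedule` and its parameters are hypothetical, and the paper's actual protocol may differ.

```python
import random
from typing import List, Sequence, Tuple


def abn_schedule(
    policies: Sequence[str],
    tasks: Sequence[str],
    blocks_per_task: int,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Return (task, policy) trials, interleaving all policies within each block."""
    rng = random.Random(seed)  # seeded so the schedule is reproducible
    schedule: List[Tuple[str, str]] = []
    for task in tasks:
        for _ in range(blocks_per_task):
            block = list(policies)
            rng.shuffle(block)  # randomize order to avoid position effects
            schedule.extend((task, policy) for policy in block)
    return schedule
```

Each policy gets the same number of trials per task, and no policy is systematically run first or last within a block.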

Links Into The Wiki

Open Questions

Which task gains require natural-language thinking, and which could be handled by latent or action-space subgoals?
How should this wiki evaluate private robotics models whose architecture details and weights are not fully public?

Gemini Robotics 1.5: Pushing the Frontier of Generalist Robots with Advanced Embodied Reasoning, Thinking, and Motion Transfer