GR00T N1: An Open Foundation Model for Generalist Humanoid Robots

Source

Raw Markdown: paper_gr00t-n1-2025.md
PDF: paper_gr00t-n1-2025.pdf
Preprint: arXiv 2503.14734
Official code: github.com/NVIDIA/Isaac-GR00T
Official dataset: PhysicalAI-Robotics-GR00T-X-Embodiment-Sim

Core Claim

GR00T N1 is an open humanoid VLA policy with a dual-system design: a vision-language module interprets observations and instructions, while a Diffusion Transformer trained with action flow matching generates high-frequency continuous motor actions.
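
For readers unfamiliar with flow matching, the following is a minimal sketch of a generic objective under a common linear-interpolation convention. The symbols are assumptions introduced here for illustration: A is an action chunk, ε is Gaussian noise, c stands for the conditioning (VLM outputs and robot state), V_θ is the action module, and K is the number of integration steps. The paper's own notation, sign convention, and step count may differ.

```latex
% Noised action chunk on a straight path from noise (\tau = 0) to data (\tau = 1)
A^{\tau} = \tau A + (1 - \tau)\,\epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I), \quad \tau \in [0, 1]

% Training: regress the path velocity dA^{\tau}/d\tau = A - \epsilon
\mathcal{L}(\theta) = \mathbb{E}_{A,\,\epsilon,\,\tau}
  \left\| V_\theta(A^{\tau}, \tau, c) - (A - \epsilon) \right\|^2

% Sampling: K forward Euler steps starting from A^{0} = \epsilon
A^{\tau + 1/K} = A^{\tau} + \tfrac{1}{K}\, V_\theta(A^{\tau}, \tau, c),
\qquad \tau = 0, \tfrac{1}{K}, \dots, \tfrac{K-1}{K}
```

Under this convention the sampler integrates the same velocity that the loss regresses, so a well-trained V_θ carries a noise sample to an action chunk in K steps.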

Method Notes

The source explicitly separates slower semantic processing from faster action generation.
System 2 is a VLM for image/language interpretation; System 1 is a DiT-style action module conditioned on VLM outputs and robot state.
The model is a control-input generator over short trajectories, not a future-observation world model.
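
The fast/slow split described above can be sketched as a toy control loop. This is an illustration under stated assumptions, not the released implementation. `system2_encode`, `toy_velocity`, `sample_action_chunk`, and `run_control_loop` are hypothetical names. The "VLM" is a simple tanh. The action head is replaced by the exact flow-matching velocity for a single target chunk. Robot state is omitted for brevity, and the horizon, step count, and refresh interval are illustrative values.

```python
import math
import random


def system2_encode(observation):
    """Hypothetical stand-in for the slow VLM: observation -> conditioning vector."""
    return [math.tanh(v) for v in observation]


def toy_velocity(x, tau, context):
    """Toy stand-in for the fast action head (assumption, not a trained model).

    This is the exact flow-matching velocity when every action row in the
    chunk should end up equal to `context`.
    """
    return [[(c - xi) / (1.0 - tau) for xi, c in zip(row, context)] for row in x]


def sample_action_chunk(context, horizon, num_steps, rng):
    """Integrate from Gaussian noise to an action chunk with forward Euler steps."""
    x = [[rng.gauss(0.0, 1.0) for _ in context] for _ in range(horizon)]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        v = toy_velocity(x, k * dt, context)  # tau stays below 1, so no division by zero
        x = [[xi + dt * vi for xi, vi in zip(xr, vr)] for xr, vr in zip(x, v)]
    return x


def run_control_loop(observations, refresh_every, horizon=16, num_steps=4, seed=0):
    """Refresh the slow context every `refresh_every` ticks; sample actions every tick."""
    rng = random.Random(seed)
    context, system2_calls, chunks = None, 0, []
    for tick, obs in enumerate(observations):
        if tick % refresh_every == 0:  # slow path: re-encode the scene
            context = system2_encode(obs)
            system2_calls += 1
        # fast path: reuse the cached context to generate a short action trajectory
        chunks.append(sample_action_chunk(context, horizon, num_steps, rng))
    return chunks, system2_calls


observations = [[0.1 * i] * 7 for i in range(6)]
chunks, system2_calls = run_control_loop(observations, refresh_every=3)
```

With six ticks and a refresh interval of three, the slow encoder runs twice while the fast sampler runs on every tick. The output of each call is a chunk of actions rather than a predicted observation, which matches the note that the model is a control-input generator and not a world model.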

Evidence And Limitations

The source reports simulation and real GR-1 humanoid evaluations, cross-embodiment data use, and public release artifacts. Limitations include bounded task suites, post-training requirements for specific embodiments, and safety constraints that must be handled outside the released checkpoint.

Foundation TSFM Relevance

Agenda slot	Verdict	Evidence	Missing pieces
Causal structure, counterfactuals, and control	partially closes	Generates short-horizon motor trajectories conditioned on observations, instructions, VLM outputs, and robot state; its fast/slow split offers a control-architecture analogy for digital-world robots.	It generates actions only: it is neither a future-observation world model that can compare candidate interventions nor a state model for observability or digital systems.
Multi-modal future distributions	adjacent	Uses action flow matching / diffusion-style trajectory generation for continuous control.	Does not expose calibrated multiple future system states for planning under uncertainty.

Links Into The Wiki

Open Questions

How much does actionless-video pretraining help once pseudo-actions introduce labeling error?
Can the System 2/System 1 split become a standard interface for humanoid fast/slow control?
