---
source_type: official_company_writeup
title: "Helix: A Vision-Language-Action Model for Generalist Humanoid Control"
publisher: Figure AI
published: 2025-02-20
retrieved: 2026-05-17
url: https://www.figure.ai/news/helix
---

# Helix: A Vision-Language-Action Model for Generalist Humanoid Control

## Provenance

This raw source note records the official Figure AI technical writeup for Helix. The source is a company-published web article, not an arXiv paper, so this file preserves the verified source URL, the publication date, and the extracted technical facts that the wiki source page needs.

## Extracted Technical Facts

- Helix is presented as a generalist humanoid Vision-Language-Action model for natural-language-conditioned robot actions.
- The official writeup describes a System 2 / System 1 architecture for whole upper-body control.
- System 2 is described as a 7B-parameter, open-source, open-weight, internet-pretrained VLM that runs at 7-9 Hz for scene understanding and language comprehension.
- System 1 is described as an 80M-parameter cross-attention encoder-decoder visuomotor Transformer that outputs continuous upper-body control inputs at 200 Hz.
- Observations include monocular robot images and robot state such as wrist pose and finger positions.
- Actions/control inputs include wrist poses, finger flexion and abduction, torso and head orientation targets, plus a synthetic task-completion action.
- Training is described as end-to-end from raw pixels and text commands to continuous actions using a standard regression loss.
- Deployment is described as asynchronous model-parallel inference on dual embedded GPUs: System 2 updates a shared latent representation while System 1 runs the high-rate control loop.
- Figure reports about 500 hours of teleoperated training trajectories with hindsight language labels generated by a VLM, and presents qualitative demonstrations of full-upper-body control, multi-robot collaboration, and novel-object generalization.
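
The two reported rates imply that System 1 reuses each System 2 latent for many control steps. The sketch below works through that arithmetic and simulates the reuse pattern with a single-threaded, deterministic loop. It is only an illustration under stated assumptions, not Figure's implementation: the function names, the 8 Hz midpoint, and the integer "latent version" stand-in are all hypothetical, and the writeup describes a genuinely asynchronous deployment across two GPUs rather than a lockstep simulation.

```python
from collections import Counter

S1_HZ = 200           # System 1 control rate reported in the writeup
S2_HZ_RANGE = (7, 9)  # System 2 rate range reported in the writeup


def s1_steps_per_s2_update(s1_hz: int, s2_hz: int) -> float:
    """Average number of System 1 control steps per System 2 latent refresh."""
    return s1_hz / s2_hz


def simulate_latent_reuse(n_ticks: int, s1_hz: int = S1_HZ, s2_hz: int = 8) -> list[int]:
    """Return the latent version that System 1 would read at each control tick.

    Assumption (not from the source): System 2 publishes a new latent at an
    exactly fixed rate, with the first one available at tick 0. The 8 Hz
    default is just the midpoint of the reported 7-9 Hz range.
    """
    versions = []
    for tick in range(n_ticks):
        # Integer arithmetic avoids floating-point drift: count how many
        # System 2 refreshes have completed by the time of this tick.
        versions.append(tick * s2_hz // s1_hz + 1)
    return versions


if __name__ == "__main__":
    for s2_hz in S2_HZ_RANGE:
        steps = s1_steps_per_s2_update(S1_HZ, s2_hz)
        print(f"System 2 at {s2_hz} Hz: about {steps:.1f} System 1 steps per latent")

    # One simulated second of control: 200 ticks at 200 Hz.
    reuse = Counter(simulate_latent_reuse(n_ticks=S1_HZ))
    print(dict(reuse))
```

At the reported rates, each latent is consumed for roughly 22 to 29 System 1 steps (200/9 ≈ 22.2, 200/7 ≈ 28.6). In the 8 Hz simulation, one second of control reads eight latent versions, 25 ticks each. In a real asynchronous system the reuse count would vary from refresh to refresh with System 2's inference latency.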

## Caveats

- The source is not a peer-reviewed paper and does not publish model weights, datasets, detailed benchmark protocols, ablations, or failure-rate statistics.
- Vendor claims, such as first-of-their-kind demonstrations and commercial readiness, should be attributed to Figure unless independently verified.