Learning Sim-to-Real Humanoid Locomotion in 15 Minutes

Amazon FAR (Frontier AI & Robotics)
* Equal contribution

We provide a simple recipe with FastSAC and FastTD3 for rapid sim-to-real iteration on humanoids


Abstract

Massively parallel simulation has reduced reinforcement learning (RL) training time for robots from days to minutes. However, fast and reliable sim-to-real RL for humanoid control remains difficult because of factors such as high dimensionality and domain randomization. In this work, we introduce a simple and practical recipe based on off-policy RL algorithms, i.e., FastSAC and FastTD3, that enables rapid training of humanoid locomotion policies in just 15 minutes on a single RTX 4090 GPU. Our recipe stabilizes off-policy RL algorithms at massive scale, with thousands of parallel environments, through carefully tuned design choices and minimalist reward functions. We demonstrate rapid end-to-end learning of humanoid locomotion controllers on Unitree G1 and Booster T1 robots under strong domain randomization, e.g., randomized dynamics, rough terrain, and push perturbations, as well as fast training of whole-body human-motion tracking policies. An open-source implementation of our recipe is available in the Holosoma repository.


Sim-to-Real RL Recipe with off-policy algorithms: FastSAC and FastTD3

FastSAC and FastTD3, introduced in prior work, are high-performance variants of the popular off-policy RL algorithms SAC and TD3, optimized for large-scale training with parallel simulation. In this work, we introduce a simple and practical recipe that scales FastSAC and FastTD3 further to enable rapid sim-to-real iteration for humanoid control. In particular, the recipe trains whole-body humanoid control policies, either for locomotion using all joints or for motion tracking.
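For readers less familiar with the underlying algorithms, the sketch below shows the textbook bootstrapped critic targets of standard TD3 and SAC, both of which take the minimum over two target critics. It is not the FastSAC or FastTD3 implementation and says nothing about the additional design choices in our recipe. The function names `td3_target` and `sac_target` and the default hyperparameters are illustrative assumptions.

```python
from typing import List, Sequence


def td3_target(
    rewards: Sequence[float],
    dones: Sequence[float],
    q1_next: Sequence[float],
    q2_next: Sequence[float],
    gamma: float = 0.99,
) -> List[float]:
    """Textbook TD3 target: y = r + gamma * (1 - done) * min(Q1', Q2').

    q1_next and q2_next are the two target-critic values evaluated at the
    next state and the (noise-smoothed) target-policy action.
    """
    return [
        r + gamma * (1.0 - d) * min(q1, q2)
        for r, d, q1, q2 in zip(rewards, dones, q1_next, q2_next)
    ]


def sac_target(
    rewards: Sequence[float],
    dones: Sequence[float],
    q1_next: Sequence[float],
    q2_next: Sequence[float],
    log_probs_next: Sequence[float],
    alpha: float = 0.2,
    gamma: float = 0.99,
) -> List[float]:
    """Textbook SAC target with an entropy bonus weighted by alpha.

    y = r + gamma * (1 - done) * (min(Q1', Q2') - alpha * log pi(a'|s'))
    """
    return [
        r + gamma * (1.0 - d) * (min(q1, q2) - alpha * lp)
        for r, d, q1, q2, lp in zip(
            rewards, dones, q1_next, q2_next, log_probs_next
        )
    ]
```

With thousands of parallel environments, each of these lists would in practice be a large batched tensor on the GPU; plain Python lists are used here only to keep the sketch dependency-free.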


Sim-to-Real Humanoid Locomotion in 15 Minutes

We train full-fledged humanoid locomotion policies with randomized dynamics, rough terrain, push perturbations, and an automatic action-rate curriculum, all end-to-end, in just 15 minutes on a single RTX 4090 GPU. All locomotion videos are recorded using the checkpoints trained for 15 minutes.
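To give a concrete picture of what an action-rate curriculum can look like, the sketch below penalizes the squared change between consecutive actions and adjusts the penalty weight automatically from a training statistic. The update rule here (raise the weight when the mean episode length reaches a target, lower it otherwise) is an assumption chosen for illustration, and the names `action_rate_penalty` and `ActionRateCurriculum` and all default values are hypothetical. The schedule actually used in the recipe may differ.

```python
from dataclasses import dataclass
from typing import Sequence


def action_rate_penalty(
    action: Sequence[float], prev_action: Sequence[float], weight: float
) -> float:
    """Negative reward proportional to the squared change in action."""
    return -weight * sum((a - p) ** 2 for a, p in zip(action, prev_action))


@dataclass
class ActionRateCurriculum:
    """Hypothetical automatic schedule for the action-rate penalty weight."""

    weight: float = 0.01
    min_weight: float = 0.01
    max_weight: float = 1.0
    factor: float = 1.5                   # multiplicative step size (assumed)
    target_episode_length: float = 500.0  # survival threshold in steps (assumed)

    def update(self, mean_episode_length: float) -> float:
        # Policy survives long enough: tighten the smoothness penalty.
        if mean_episode_length >= self.target_episode_length:
            self.weight = min(self.weight * self.factor, self.max_weight)
        # Policy falls early: relax the penalty so it can learn to balance.
        else:
            self.weight = max(self.weight / self.factor, self.min_weight)
        return self.weight
```

A training loop would call `update` once per logging interval and pass the returned weight to `action_rate_penalty` when computing rewards, so smoothness is only enforced strongly once the policy can already stay upright.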

G1 Walking

G1 Side Walking

G1 Turning

T1 Walking

T1 Side Walking

T1 Turning


Push Perturbations

Our policies, trained for only 15 minutes, can stand and walk stably and are robust to push perturbations.

G1 Push Perturbation

T1 Push Perturbation


Whole-Body Tracking

We demonstrate whole-body tracking capabilities with various challenging motions.

Box Lifting

Dancing

Push