ManiSkill Baselines

This repository contains unofficial baselines for ManiSkill (specifically, version 0.5.3). The baselines are heavily tuned, so they generally achieve better sample efficiency and performance.

Installation

Install all dependencies via mamba or conda by running the following commands:

mamba env create -f environment.yml
mamba activate ms

Note: mamba is a drop-in replacement for conda. Feel free to use conda if you prefer it.

Download and link the necessary assets for ManiSkill:

python -m mani_skill2.utils.download_asset all # if you need the assets for all tasks
python -m mani_skill2.utils.download_asset ${ENV_ID} # if you only need the assets for one task

These commands download the assets to ./data. You may move the assets to any location. Then add the following line to your ~/.bashrc or ~/.zshrc:

export MS2_ASSET_DIR=<path>/<to>/<data>

and restart your terminal.
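
To confirm that the variable is visible in the new terminal, a small check like the following may help. This is only a sketch: check_ms2_assets is a hypothetical helper name, not part of this repository or of ManiSkill, and it only verifies that MS2_ASSET_DIR is set and points to an existing directory, not that the assets inside are complete.

```shell
# Hypothetical helper (not part of the repository): verify that
# MS2_ASSET_DIR is set and points to an existing directory.
check_ms2_assets() {
    if [ -z "${MS2_ASSET_DIR:-}" ]; then
        echo "MS2_ASSET_DIR is not set" >&2
        return 1
    fi
    if [ ! -d "${MS2_ASSET_DIR}" ]; then
        echo "MS2_ASSET_DIR is not a directory: ${MS2_ASSET_DIR}" >&2
        return 1
    fi
    echo "MS2_ASSET_DIR OK: ${MS2_ASSET_DIR}"
}
```

Calling check_ms2_assets after restarting the terminal prints the resolved path on success and returns a non-zero status otherwise.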

TODOs

Benchmark Overview

Task	SAC (state)	SAC (RGBD)	PPO (state)	PPO (RGBD)	Diffusion Policy (state)	Diffusion Policy (RGBD)
PickCube	✅	✅	✅	✅	✅	✅
StackCube	✅		❌		✅	✅
PickSingleYCB	✅		✅
PickSingleEGAD	✅		✅
PickClutterYCB	✅		⚠️
PegInsertionSide	✅		❌		✅	❌
TurnFaucet	✅		✅		⚠️	⚠️
PlugCharger	⚠️		❌
PandaAvoidObstacles	❌		❌
OpenCabinetDrawer	✅		⚠️
OpenCabinetDoor	✅		⚠️
MoveBucket	✅		❌
PushChair	⚠️		⚠️		⚠️	⚠️

✅ = works well
⚠️ = works, but there is still room for improvement
❌ = doesn't work at all
blank = not tested yet

Run Experiments

Run the following commands from the repository root directory.

SAC

State observation:

python rl/sac_state.py --env-id PickCube-v1 --total-timesteps 500_000
python rl/sac_state.py --env-id StackCube-v1 --total-timesteps 5_000_000
python rl/sac_state.py --env-id PickSingleYCB-v1 --total-timesteps 5_000_000
python rl/sac_state.py --env-id PickSingleEGAD-v1 --total-timesteps 2_000_000
python rl/sac_state.py --env-id PickClutterYCB-v1 --total-timesteps 15_000_000
python rl/sac_state.py --env-id PegInsertionSide-v1 --total-timesteps 10_000_000 --gamma 0.9 --control-mode pd_ee_delta_pose
python rl/sac_state.py --env-id TurnFaucet-v0 --total-timesteps 20_000_000 --gamma 0.95 --control-mode pd_ee_delta_pose
python rl/sac_state.py --env-id PlugCharger-v0 --total-timesteps 15_000_000 --control-mode pd_ee_delta_pose
python rl/sac_state.py --env-id OpenCabinetDrawer_unified-v1 --total-timesteps 3_000_000 --gamma 0.95 --bootstrap-at-done truncated --control-mode base_pd_joint_vel_arm_pd_joint_vel
python rl/sac_state.py --env-id OpenCabinetDoor_unified-v1 --total-timesteps 5_000_000 --gamma 0.95 --bootstrap-at-done truncated --control-mode base_pd_joint_vel_arm_pd_joint_vel
python rl/sac_state.py --env-id MoveBucket_unified-v1 --total-timesteps 80_000_000 --gamma 0.9 --bootstrap-at-done truncated --control-mode base_pd_joint_vel_arm_pd_joint_vel --eval-freq 500_000 --log-freq 20_000
python rl/sac_state.py --env-id PushChair_unified-v1 --total-timesteps 20_000_000 --gamma 0.9 --bootstrap-at-done truncated --control-mode base_pd_joint_vel_arm_pd_joint_vel --eval-freq 500_000 --log-freq 20_000

RGBD observation:

python rl/sac_rgbd.py --env-id PickCube-v1 --total-timesteps 500_000

Notes:

If you want to use Weights and Biases (wandb) to track learning progress, please add --track to your commands.
You can tune --num-envs to get better speed.
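
When launching several of the state-observation runs above with tracking enabled, a thin wrapper can reduce repetition. The sketch below is an illustration only: run_sac_state and DRY_RUN are hypothetical names introduced here, not part of the repository. By default it prints each command instead of executing it, so the command line can be inspected first.

```shell
# Hypothetical wrapper (not part of the repository): build the SAC
# state-observation command with --track and either print it or run it.
# Usage: run_sac_state ENV_ID TOTAL_TIMESTEPS [extra flags...]
run_sac_state() {
    env_id="$1"
    steps="$2"
    shift 2
    # Rebuild the positional parameters as the full command line.
    set -- python rl/sac_state.py --env-id "$env_id" --total-timesteps "$steps" --track "$@"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run (the default): only show what would be executed.
        echo "$*"
    else
        "$@"
    fi
}

# Task IDs, step counts, and flags are copied from the commands listed above.
run_sac_state PickCube-v1 500_000
run_sac_state PegInsertionSide-v1 10_000_000 --gamma 0.9 --control-mode pd_ee_delta_pose
```

Setting DRY_RUN=0 in the environment makes the wrapper execute the commands from the repo root dir instead of printing them.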

PPO

State observation:

python rl/ppo_state.py --env-id PickCube-v1 --total-timesteps 3_000_000
python rl/ppo_state.py --env-id PickSingleYCB-v1 --total-timesteps 50_000_000 --gamma 0.9 --utd 0.025
python rl/ppo_state.py --env-id PickSingleEGAD-v1 --total-timesteps 5_000_000 --utd 0.025
python rl/ppo_state.py --env-id PickClutterYCB-v1 --total-timesteps 50_000_000
python rl/ppo_state.py --env-id TurnFaucet-v0 --total-timesteps 20_000_000 --gamma 0.99 --utd 0.025 --control-mode pd_ee_delta_pose
python rl/ppo_state.py --env-id OpenCabinetDrawer_unified-v1 --total-timesteps 30_000_000 --gamma 0.95 --utd 0.025 --bootstrap-at-done truncated --control-mode base_pd_joint_vel_arm_pd_joint_vel --eval-freq 500_000 --log-freq 20_000
python rl/ppo_state.py --env-id OpenCabinetDoor_unified-v1 --total-timesteps 50_000_000 --gamma 0.95 --utd 0.025 --bootstrap-at-done truncated --control-mode base_pd_joint_vel_arm_pd_joint_vel --eval-freq 500_000 --log-freq 20_000
python rl/ppo_state.py --env-id PushChair_unified-v1 --total-timesteps 20_000_000 --gamma 0.8 --bootstrap-at-done truncated --control-mode base_pd_joint_vel_arm_pd_joint_vel --eval-freq 500_000 --log-freq 20_000

RGBD observation:

python rl/ppo_rgbd.py --env-id PickCube-v1 --total-timesteps 5_000_000

Notes:

PPO usually yields worse sample efficiency than SAC.

Diffusion Policy

State observation:

python bc/diffusion_unet.py --env-id PegInsertionSide-v0 --demo-path PATH_TO_MS2_OFFICIAL_DEMO

RGBD observation:

python bc/diffusion_unet_rgbd.py --env-id StackCube-v0 --demo-path PATH_TO_MS2_OFFICIAL_DEMO

Acknowledgments

This codebase is built upon the CleanRL repository.

License

This project is licensed under the MIT License - see the LICENSE file for details. Note that the repository relies on third-party code, which is subject to their respective licenses.
