FoRL

A library for First-order Reinforcement Learning algorithms.

In a world dominated by Policy Gradient based approaches, we have created a library that attempts to learn policies via First-Order Gradients (FOG), also known as pathwise gradients or the reparameterization trick. Why? FOGs are known to have lower variance, which translates to more efficient learning, but they also don't perform well on discontinuous loss landscapes.
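To make the variance claim concrete, here is a minimal sketch that is not part of the library. It assumes a toy objective E[x^2] with x ~ N(mu, sigma^2), chosen only for illustration, and compares per-sample estimates of the gradient with respect to mu from the pathwise estimator and from the score-function estimator used by policy-gradient methods. The function name and defaults are hypothetical.

```python
import random
import statistics


def gradient_samples(mu=1.0, sigma=1.0, n=20000, seed=0):
    """Per-sample estimates of d/dmu E[x^2] for x ~ N(mu, sigma^2)."""
    rng = random.Random(seed)
    pathwise, score = [], []
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        x = mu + sigma * eps  # reparameterized sample
        # First-order (pathwise): differentiate f(x) = x^2 through x(mu).
        pathwise.append(2.0 * x)
        # Score function: f(x) * d/dmu log p(x; mu).
        score.append(x * x * (x - mu) / sigma**2)
    return pathwise, score


pathwise, score = gradient_samples()
print("true gradient:", 2.0)
print("pathwise mean/var:", statistics.mean(pathwise), statistics.variance(pathwise))
print("score-fn mean/var:", statistics.mean(score), statistics.variance(score))
```

Both estimators are unbiased for the true gradient 2 * mu. For mu = sigma = 1 the analytic per-sample variance is 4 for the pathwise estimator and 30 for the score-function estimator, so the sample variances printed above should land near those values.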

Applications:

Differentiable simulation
Model-based RL
World models

Installation

Tested only on Ubuntu 22.04. Requires Python, conda and an Nvidia GPU with >12GB VRAM.

git clone git@github.com:pairlab/FoRL.git
cd FoRL
conda env create -f environment.yaml
ln -s $CONDA_PREFIX/lib $CONDA_PREFIX/lib64  # hack to get CUDA to work inside conda
pip install -e .

Examples

Dflex

One of the first differentiable simulators for robotics. It was first proposed alongside the SHAC algorithm but is now deprecated.

cd scripts
conda activate forl
python train_dflex.py env=dflex_ant

The script is fully configured and usable with hydra.

Warp

The successor to dflex, Warp is Nvidia's current effort to create a universal differentiable simulator.

TODO examples

Gym interface

We try to comply with the normal gym interface, but due to the nature of FOG methods we cannot do so fully. As such, we require gym envs passed to our algorithms to satisfy the following:

s, r, d, info = env.step(a) must accept and return PyTorch Tensors and maintain gradients through the function.
The info dict must contain termination and truncation key-value pairs. Our library does not use the d done flag.
env.reset(grad=True) must accept an optional kwarg grad which, if true, resets the gradient graph but does not reset the environment.
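The three requirements above can be illustrated with a toy environment. This is a minimal sketch, not the library's own code: the class name PointMassEnv, its dynamics, the batched tensor shapes and the step limit are all assumptions made for the example.

```python
import torch


class PointMassEnv:
    """Hypothetical toy env: push a batch of 1-D points toward the origin."""

    def __init__(self, num_envs=4, max_steps=8):
        self.num_envs = num_envs
        self.max_steps = max_steps
        self.reset()

    def reset(self, grad=False):
        if grad:
            # Cut the autograd graph but keep the current simulation state.
            self.pos = self.pos.detach()
        else:
            # Full reset of state and step counters.
            self.pos = torch.ones(self.num_envs, 1)
            self.steps = torch.zeros(self.num_envs, dtype=torch.long)
        return self.pos

    def step(self, a):
        # Differentiable dynamics: gradients flow from reward back to a.
        self.pos = self.pos + 0.1 * a
        self.steps = self.steps + 1
        r = -(self.pos ** 2).sum(dim=-1)
        termination = self.pos.abs().squeeze(-1) > 10.0  # task failure
        truncation = self.steps >= self.max_steps        # time limit
        d = termination | truncation
        info = {"termination": termination, "truncation": truncation}
        return self.pos, r, d, info


env = PointMassEnv()
s = env.reset()
a = torch.zeros(env.num_envs, 1, requires_grad=True)
s, r, d, info = env.step(a)
r.sum().backward()      # a.grad now holds d(reward)/d(action)
s = env.reset(grad=True)  # same state, fresh gradient graph
```

Calling reset(grad=True) between optimization steps keeps the simulation running while preventing the autograd graph from growing without bound.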

Example implementation of this interface

Current algorithms

Short Horizon Actor Critic (SHAC)
Adaptive Horizon Actor Critic (AHAC)
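As a rough illustration of the short-horizon idea behind these algorithms, the sketch below rolls a policy out for a few steps through differentiable dynamics, bootstraps with a critic at the end of the horizon, and backpropagates through the whole rollout. It is heavily simplified and is not the library's implementation: toy_step, the linear actor and critic, and the hyperparameters are assumptions, and episode termination handling is omitted.

```python
import torch
from torch import nn


def toy_step(s, a):
    """Assumed differentiable dynamics: nudge the state, penalize distance."""
    s_next = s + 0.1 * a
    r = -(s_next ** 2).sum(dim=-1)
    return s_next, r


def short_horizon_actor_loss(actor, critic, s0, horizon=8, gamma=0.99):
    """Negative discounted return over a short rollout plus a critic bootstrap."""
    s, ret, discount = s0, torch.zeros(s0.shape[0]), 1.0
    for _ in range(horizon):
        a = actor(s)
        s, r = toy_step(s, a)
        ret = ret + discount * r
        discount *= gamma
    # Bootstrap beyond the horizon with the critic's value estimate.
    ret = ret + discount * critic(s).squeeze(-1)
    return -ret.mean()


torch.manual_seed(0)
actor, critic = nn.Linear(1, 1), nn.Linear(1, 1)
loss = short_horizon_actor_loss(actor, critic, torch.ones(4, 1))
loss.backward()  # first-order gradient of the return w.r.t. actor parameters
```

In this sketch only the actor's gradients would be applied by an optimizer; the critic would be trained separately on value targets.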

Notes

Due to the nature of GPU acceleration, it is currently impossible to guarantee deterministic experiments. You can make them "less random" by using seeding(seed, True), but that slows down GPUs.
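For readers wondering what such a seeding helper typically involves, here is a minimal sketch using standard PyTorch settings. The function seed_everything is hypothetical and is not the library's seeding(); the real function may set different or additional flags.

```python
import random

import torch


def seed_everything(seed, deterministic=False):
    """Hypothetical helper; not the library's own seeding()."""
    random.seed(seed)
    torch.manual_seed(seed)  # seeds PyTorch's random number generators
    if deterministic:
        # Trade speed for reproducibility in cuDNN kernels.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False


seed_everything(0, True)
first = torch.rand(3)
seed_everything(0, True)
second = torch.rand(3)
print(torch.equal(first, second))
```

Disabling cuDNN benchmarking and forcing deterministic kernels is the usual source of the slowdown, and some GPU operations can remain nondeterministic even with these flags set.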

TODOs

Upgrade python version
Vectorize critic
Try Mish activation - helps
Try stop gradient on actor - hurts
Try regressing values - hurts
Try return normalization - stabler
Tune critic grad norm - higher values seem to help
Verify save/load
Think about simplified gym interface that is compatible with rl_games
More dflex examples
Add warp support
Add AHAC algorithm
Try smooth reward functions
