# Diffusion-based Policy Learning for RL
`diffusion_policy` implements [Diffusion Policy](https://diffusion-policy.cs.columbia.edu/), a diffusion model that predicts robot action sequences in reinforcement learning tasks.

This example implements a robot control model for pushing a T-shaped block into a target area. The model takes the current state observation as input and outputs a trajectory of subsequent steps to follow.
To execute the script, run `diffusion_policy.py`.
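For example, from this directory (assuming the RL requirements listed below are installed):

```bash
python diffusion_policy.py
```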
# Diffuser Locomotion
These examples show how to run [Diffuser](https://arxiv.org/abs/2205.09991) in Diffusers.
There are two ways to use the script `run_diffuser_locomotion.py`; the key option is the variable `n_guide_steps` (see the sketch after this list):

- When `n_guide_steps=0`, trajectories are sampled from the diffusion model but are not fine-tuned to maximize reward in the environment.
- By default, `n_guide_steps=2`, which matches the original implementation.
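As an illustration, here is a minimal sketch of how `n_guide_steps` might appear in the script's sampling configuration. Only `n_guide_steps` is documented in this README; the other keys are assumptions for illustration and may not match the actual script:

```python
# Hypothetical sketch of the guidance setting in run_diffuser_locomotion.py.
# Only n_guide_steps is described in this README; the other keys are
# illustrative assumptions.
config = {
    "n_guide_steps": 2,  # default; set to 0 to sample without reward guidance
    "n_samples": 64,     # assumed: candidate trajectories sampled per step
    "horizon": 32,       # assumed: length of each planned trajectory
}
```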
You will need some RL-specific requirements to run the examples:
```bash
pip install -f https://download.pytorch.org/whl/torch_stable.html \
    free-mujoco-py \
    einops \
    gym==0.24.1 \
    protobuf==3.20.1 \
    git+https://github.com/rail-berkeley/d4rl.git \
    mediapy \
    Pillow==9.0.0
```
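With the requirements in place, the locomotion example can be launched the same way as the policy example (assuming it is run from this directory):

```bash
python run_diffuser_locomotion.py
```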