From e6faf607f71b86bf210cc4eca06555304f445611 Mon Sep 17 00:00:00 2001
From: Sayak Paul
Date: Thu, 5 Oct 2023 14:29:00 +0200
Subject: [PATCH] add: entry for DDPO support. (#5250)

* add: entry for DDPO support.

* move to training

* address steven's comments./
---
 docs/source/en/_toctree.yml     |  2 ++
 docs/source/en/training/ddpo.md | 17 +++++++++++++++++
 2 files changed, 19 insertions(+)
 create mode 100644 docs/source/en/training/ddpo.md

diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml
index d95e553bd3..b8aa71dacb 100644
--- a/docs/source/en/_toctree.yml
+++ b/docs/source/en/_toctree.yml
@@ -106,6 +106,8 @@
     title: Custom Diffusion
   - local: training/t2i_adapters
     title: T2I-Adapters
+  - local: training/ddpo
+    title: Reinforcement learning training with DDPO
   title: Training
 - sections:
   - local: using-diffusers/other-modalities
diff --git a/docs/source/en/training/ddpo.md b/docs/source/en/training/ddpo.md
new file mode 100644
index 0000000000..1ec961dfdd
--- /dev/null
+++ b/docs/source/en/training/ddpo.md
@@ -0,0 +1,17 @@
+<!--Copyright 2023 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# Reinforcement learning training with DDPO
+
+You can fine-tune Stable Diffusion on a reward function via reinforcement learning with the 🤗 TRL library and 🤗 Diffusers. This is done with the Denoising Diffusion Policy Optimization (DDPO) algorithm introduced by Black et al. in [Training Diffusion Models with Reinforcement Learning](https://arxiv.org/abs/2305.13301), which is implemented in 🤗 TRL with the [`~trl.DDPOTrainer`].
+
+For more information, check out the [`~trl.DDPOTrainer`] API reference and the [Finetune Stable Diffusion Models with DDPO via TRL](https://huggingface.co/blog/trl-ddpo) blog post.
\ No newline at end of file
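
As a concrete starting point for the doc page added above, below is a minimal sketch of what DDPO fine-tuning with TRL can look like. It assumes the `DDPOConfig`, `DDPOTrainer`, and `DefaultDDPOStableDiffusionPipeline` classes from TRL's DDPO example; the prompt function, the toy brightness reward, the hyperparameter values, and the `runwayml/stable-diffusion-v1-5` checkpoint are illustrative choices rather than part of this patch, so check the [`~trl.DDPOTrainer`] API reference before relying on them.

```python
# Minimal DDPO sketch with TRL (names and values below are illustrative, not from this patch).
from trl import DDPOConfig, DDPOTrainer, DefaultDDPOStableDiffusionPipeline


def prompt_fn():
    # Return a prompt to sample images from, plus a metadata dict.
    return "a photo of a cute corgi", {}


def reward_fn(images, prompts, prompt_metadata):
    # Toy reward that prefers brighter images; swap in a real scorer
    # (e.g. an aesthetic score model) for meaningful results.
    rewards = images.float().mean(dim=(1, 2, 3))
    return rewards, {}


config = DDPOConfig(
    num_epochs=10,
    sample_num_steps=50,
    sample_batch_size=2,
    train_batch_size=1,
    train_gradient_accumulation_steps=2,
)

# Wrap a Stable Diffusion checkpoint in the DDPO-compatible pipeline from TRL.
pipeline = DefaultDDPOStableDiffusionPipeline("runwayml/stable-diffusion-v1-5")

trainer = DDPOTrainer(config, reward_fn, prompt_fn, pipeline)
trainer.train()
```

The reward function is where most of the design effort goes in practice; the blog post linked in the doc page trains against a learned aesthetic scorer rather than a hand-written heuristic like the one sketched here.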