From e6faf607f71b86bf210cc4eca06555304f445611 Mon Sep 17 00:00:00 2001
From: Sayak Paul
Date: Thu, 5 Oct 2023 14:29:00 +0200
Subject: [PATCH] add: entry for DDPO support. (#5250)

* add: entry for DDPO support.

* move to training

* address steven's comments./
---
 docs/source/en/_toctree.yml     |  2 ++
 docs/source/en/training/ddpo.md | 17 +++++++++++++++++
 2 files changed, 19 insertions(+)
 create mode 100644 docs/source/en/training/ddpo.md

diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml
index d95e553bd3..b8aa71dacb 100644
--- a/docs/source/en/_toctree.yml
+++ b/docs/source/en/_toctree.yml
@@ -106,6 +106,8 @@
     title: Custom Diffusion
   - local: training/t2i_adapters
     title: T2I-Adapters
+  - local: training/ddpo
+    title: Reinforcement learning training with DDPO
   title: Training
 - sections:
   - local: using-diffusers/other-modalities
diff --git a/docs/source/en/training/ddpo.md b/docs/source/en/training/ddpo.md
new file mode 100644
index 0000000000..1ec961dfdd
--- /dev/null
+++ b/docs/source/en/training/ddpo.md
@@ -0,0 +1,17 @@
+<!--Copyright 2023 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# Reinforcement learning training with DDPO
+
+You can fine-tune Stable Diffusion on a reward function via reinforcement learning with the 🤗 TRL library and 🤗 Diffusers. This is done with the Denoising Diffusion Policy Optimization (DDPO) algorithm introduced by Black et al. in [Training Diffusion Models with Reinforcement Learning](https://arxiv.org/abs/2305.13301), which is implemented in 🤗 TRL with the [`~trl.DDPOTrainer`].
+
+For more information, check out the [`~trl.DDPOTrainer`] API reference and the [Finetune Stable Diffusion Models with DDPO via TRL](https://huggingface.co/blog/trl-ddpo) blog post.
\ No newline at end of file
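
As a concrete starting point for the doc page added above, below is a minimal sketch of what DDPO fine-tuning with TRL can look like. It assumes the `DDPOConfig`, `DDPOTrainer`, and `DefaultDDPOStableDiffusionPipeline` classes from TRL's DDPO example; the prompt function, the toy brightness reward, the hyperparameter values, and the `runwayml/stable-diffusion-v1-5` checkpoint are illustrative choices rather than part of this patch, so check the [`~trl.DDPOTrainer`] API reference before relying on them.

```python
# Minimal DDPO sketch with TRL (names and values below are illustrative, not from this patch).
from trl import DDPOConfig, DDPOTrainer, DefaultDDPOStableDiffusionPipeline


def prompt_fn():
    # Return a prompt to sample images from, plus a metadata dict.
    return "a photo of a cute corgi", {}


def reward_fn(images, prompts, prompt_metadata):
    # Toy reward that prefers brighter images; swap in a real scorer
    # (e.g. an aesthetic score model) for meaningful results.
    rewards = images.float().mean(dim=(1, 2, 3))
    return rewards, {}


config = DDPOConfig(
    num_epochs=10,
    sample_num_steps=50,
    sample_batch_size=2,
    train_batch_size=1,
    train_gradient_accumulation_steps=2,
)

# Wrap a Stable Diffusion checkpoint in the DDPO-compatible pipeline from TRL.
pipeline = DefaultDDPOStableDiffusionPipeline("runwayml/stable-diffusion-v1-5")

trainer = DDPOTrainer(config, reward_fn, prompt_fn, pipeline)
trainer.train()
```

The reward function is where most of the design effort goes in practice; the blog post linked in the doc page trains against a learned aesthetic scorer rather than a hand-written heuristic like the one sketched here.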