mirror of https://github.com/huggingface/diffusers.git synced 2026-01-29 07:22:12 +03:00

Quentin Gallouédec c8bb1ff53e Use HF Papers (#11567)

* Use HF Papers

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

2025-05-19 06:22:33 -10:00

Reinforcement learning training with DDPO

You can fine-tune Stable Diffusion on a reward function with reinforcement learning, using the 🤗 TRL library together with 🤗 Diffusers. The method is Denoising Diffusion Policy Optimization (DDPO), introduced by Black et al. in "Training Diffusion Models with Reinforcement Learning" and implemented in 🤗 TRL as the [~trl.DDPOTrainer].
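
To make "fine-tuning on a reward function" more concrete, here is a minimal sketch of the kind of update a DDPO-style method performs: rewards for a batch of generated images are normalized into advantages, and the denoising policy is then trained with a PPO-style clipped importance-sampling loss over the log-probabilities of its denoising steps. This is an illustration in plain Python, not the TRL implementation. The function names (`normalize_rewards`, `clipped_policy_loss`) and the `clip_range` value are hypothetical and chosen only for readability.

```python
import math
from typing import List, Sequence


def normalize_rewards(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Convert raw rewards into advantages (zero mean, roughly unit variance)."""
    mean = sum(rewards) / len(rewards)
    variance = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(variance)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_policy_loss(
    log_prob_new: Sequence[float],
    log_prob_old: Sequence[float],
    advantages: Sequence[float],
    clip_range: float = 0.2,  # illustrative value, not a library default
) -> float:
    """PPO-style clipped surrogate loss averaged over denoising steps.

    Each entry is one denoising step: its log-probability under the current
    model, under the model that sampled it, and the advantage of the final image.
    """
    losses = []
    for lp_new, lp_old, adv in zip(log_prob_new, log_prob_old, advantages):
        # Importance ratio between the current and the sampling policy.
        ratio = math.exp(lp_new - lp_old)
        clipped_ratio = min(max(ratio, 1.0 - clip_range), 1.0 + clip_range)
        # Take the more pessimistic (larger) of the two losses.
        losses.append(max(-adv * ratio, -adv * clipped_ratio))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    advantages = normalize_rewards([1.0, 2.0, 3.0])
    loss = clipped_policy_loss([-1.0, -1.1, -0.9], [-1.0, -1.0, -1.0], advantages)
    print(advantages, loss)
```

In practice the reward would come from a scorer applied to the generated images (for example an aesthetic or prompt-alignment model), and the log-probabilities from the diffusion model's sampler; both are left out of this sketch.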

For more information, see the [~trl.DDPOTrainer] API reference and the blog post "Finetune Stable Diffusion Models with DDPO via TRL".
