<!--Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Improve generation quality with FreeU

[[open-in-colab]]

The UNet is responsible for denoising during the reverse diffusion process, and its architecture carries two distinct kinds of features:

1. Backbone features primarily contribute to the denoising process.
2. Skip features mainly introduce high-frequency features into the decoder module and can make the network overlook the semantics in the backbone features.

However, the skip connections can sometimes introduce unnatural image details. [FreeU](https://hf.co/papers/2309.11497) is a technique for improving image quality by rebalancing the contributions from the UNet’s skip connections and backbone feature maps.

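To make the rebalancing concrete, the sketch below mimics the two operations on NumPy arrays: it scales half of the backbone channels by a factor `b` and damps the low-frequency part of the skip features by a factor `s` with a Fourier mask. It is a simplified illustration and not the Diffusers implementation; the function names `scale_backbone` and `filter_skip`, the `threshold` parameter, and the toy array shapes are assumptions made for this example.

```py
import numpy as np


def scale_backbone(backbone, b):
    """Amplify the first half of the backbone channels by the factor b."""
    out = backbone.copy()
    half = out.shape[1] // 2
    out[:, :half] *= b
    return out


def filter_skip(skip, s, threshold=1):
    """Scale the low-frequency components of the skip features by s."""
    # Move to the frequency domain and shift the zero frequency to the center.
    freq = np.fft.fftshift(np.fft.fftn(skip, axes=(-2, -1)), axes=(-2, -1))

    # Mask that equals s in a small window around the center and 1 elsewhere.
    height, width = skip.shape[-2:]
    crow, ccol = height // 2, width // 2
    mask = np.ones(skip.shape)
    mask[..., crow - threshold : crow + threshold, ccol - threshold : ccol + threshold] = s

    # Apply the mask and return to the spatial domain.
    freq = np.fft.ifftshift(freq * mask, axes=(-2, -1))
    return np.fft.ifftn(freq, axes=(-2, -1)).real


# Toy feature maps with shape (batch, channels, height, width).
rng = np.random.default_rng(0)
backbone = rng.standard_normal((1, 4, 8, 8))
skip = rng.standard_normal((1, 4, 8, 8))

backbone_scaled = scale_backbone(backbone, b=1.2)
skip_filtered = filter_skip(skip, s=0.9)
```

In this sketch, `b > 1` strengthens the backbone contribution and `s < 1` weakens the low-frequency content carried by the skip connection, which is the direction of the values used later in this guide.
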
FreeU is applied during inference and does not require any additional training. The technique works for different tasks such as text-to-image, image-to-image, and text-to-video.

In this guide, you will apply FreeU to the [`StableDiffusionPipeline`], [`StableDiffusionXLPipeline`], and [`TextToVideoSDPipeline`]. You need to install Diffusers from source to run the examples below.

## StableDiffusionPipeline

Load the pipeline:

```py
from diffusers import DiffusionPipeline
import torch

pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, safety_checker=None
).to("cuda")
```

Then enable the FreeU mechanism with the FreeU-specific hyperparameters. `b1` and `b2` are scaling factors for the backbone features, and `s1` and `s2` are scaling factors for the skip features.

```py
pipeline.enable_freeu(s1=0.9, s2=0.2, b1=1.2, b2=1.4)
```

The values above are from the official FreeU [code repository](https://github.com/ChenyangSi/FreeU) where you can also find [reference hyperparameters](https://github.com/ChenyangSi/FreeU#range-for-more-parameters) for different models.

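If you switch between checkpoints, it can help to keep the scaling factors in one place. The sketch below collects the values used in this guide in a plain dictionary. `FREEU_PRESETS` and `freeu_kwargs_for` are hypothetical names for this example and are not part of Diffusers.

```py
# Scaling factors used in this guide, keyed by checkpoint name.
FREEU_PRESETS = {
    "runwayml/stable-diffusion-v1-5": {"s1": 0.9, "s2": 0.2, "b1": 1.2, "b2": 1.4},
    "stabilityai/stable-diffusion-2-1": {"s1": 0.9, "s2": 0.2, "b1": 1.1, "b2": 1.2},
    "stabilityai/stable-diffusion-xl-base-1.0": {"s1": 0.6, "s2": 0.4, "b1": 1.1, "b2": 1.2},
    "cerspense/zeroscope_v2_576w": {"s1": 0.9, "s2": 0.2, "b1": 1.2, "b2": 1.4},
}


def freeu_kwargs_for(model_id):
    """Return a copy of the preset for model_id, or raise if none is recorded."""
    try:
        return dict(FREEU_PRESETS[model_id])
    except KeyError:
        raise KeyError(f"No FreeU preset recorded for {model_id!r}") from None


kwargs = freeu_kwargs_for("stabilityai/stable-diffusion-2-1")
```

With a loaded pipeline, `pipeline.enable_freeu(**kwargs)` would then apply the selected preset.
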
<Tip>

Disable the FreeU mechanism by calling `disable_freeu()` on a pipeline.

</Tip>

And then run inference:

```py
prompt = "A squirrel eating a burger"
seed = 2023
image = pipeline(prompt, generator=torch.manual_seed(seed)).images[0]
image
```

The figure below compares the results without FreeU and with FreeU, using the same `prompt` and `seed` as above:

![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/freeu/sdv1_5_freeu.jpg)

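A comparison like the one in the figure can be scripted by running the same prompt and seed twice, once with FreeU disabled and once with it enabled. The helper below is a minimal sketch: `compare_freeu` and its `make_generator` argument are hypothetical names, and it assumes that calling `disable_freeu()` on a pipeline where FreeU is not enabled is harmless.

```py
def compare_freeu(pipeline, prompt, seed, make_generator, **freeu_kwargs):
    """Generate one image without FreeU and one with it, using the same seed."""
    # Baseline: make sure FreeU is off before the first run.
    pipeline.disable_freeu()
    baseline = pipeline(prompt, generator=make_generator(seed)).images[0]

    # Same prompt and seed, now with the FreeU scaling factors applied.
    pipeline.enable_freeu(**freeu_kwargs)
    try:
        with_freeu = pipeline(prompt, generator=make_generator(seed)).images[0]
    finally:
        # Switch FreeU off again so later calls match the baseline setup.
        pipeline.disable_freeu()

    return baseline, with_freeu
```

With the pipeline loaded earlier, `compare_freeu(pipeline, prompt, seed, torch.manual_seed, s1=0.9, s2=0.2, b1=1.2, b2=1.4)` would return the two images for a side-by-side check.
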
Let's see how Stable Diffusion 2 results are impacted:

```py
from diffusers import DiffusionPipeline
import torch

pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16, safety_checker=None
).to("cuda")

prompt = "A squirrel eating a burger"
seed = 2023

pipeline.enable_freeu(s1=0.9, s2=0.2, b1=1.1, b2=1.2)
image = pipeline(prompt, generator=torch.manual_seed(seed)).images[0]
image
```

![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/freeu/sdv2_1_freeu.jpg)

## Stable Diffusion XL

Finally, let's take a look at how FreeU affects Stable Diffusion XL results:

```py
from diffusers import DiffusionPipeline
import torch

pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16,
).to("cuda")

prompt = "A squirrel eating a burger"
seed = 2023

# Comes from
# https://wandb.ai/nasirk24/UNET-FreeU-SDXL/reports/FreeU-SDXL-Optimal-Parameters--Vmlldzo1NDg4NTUw
pipeline.enable_freeu(s1=0.6, s2=0.4, b1=1.1, b2=1.2)
image = pipeline(prompt, generator=torch.manual_seed(seed)).images[0]
image
```

![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/freeu/sdxl_freeu.jpg)

## Text-to-video generation

FreeU can also be used to improve video quality:

```python
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video
import torch

model_id = "cerspense/zeroscope_v2_576w"
pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

prompt = "an astronaut riding a horse on mars"
seed = 2023

# The values come from
# https://github.com/lyn-rgb/FreeU_Diffusers#video-pipelines
pipe.enable_freeu(b1=1.2, b2=1.4, s1=0.9, s2=0.2)
video_frames = pipe(prompt, height=320, width=576, num_frames=30, generator=torch.manual_seed(seed)).frames[0]
export_to_video(video_frames, "astronaut_rides_horse.mp4")
```

Thanks to [kadirnar](https://github.com/kadirnar/) for helping to integrate the feature, and to [justindujardin](https://github.com/justindujardin) for the helpful discussions.