<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

[[open-in-colab]]

# Basic performance

Diffusion is a random process that is computationally demanding. You may need to run the [`DiffusionPipeline`] several times before getting a desired output, which is why it's important to balance generation speed and memory usage carefully so you can iterate faster.

This guide covers some basic performance tips for using the [`DiffusionPipeline`]. Refer to the docs in the Inference Optimization section, such as [Accelerate inference](./optimization/fp16) or [Reduce memory usage](./optimization/memory), for more detailed performance guides.

## Memory usage

Reducing the amount of memory used indirectly speeds up generation and can help a model fit on device.

The [`~DiffusionPipeline.enable_model_cpu_offload`] method moves a model to the CPU when it is not in use to save GPU memory.

```py
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.bfloat16,
    device_map="cuda"
)
pipeline.enable_model_cpu_offload()

prompt = """
cinematic film still of a cat sipping a margarita in a pool in Palm Springs, California
highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain
"""
pipeline(prompt).images[0]
print(f"Max memory allocated: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")
```
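
If model-level offloading still doesn't free enough memory, Diffusers also provides [`~DiffusionPipeline.enable_sequential_cpu_offload`], which offloads at the submodule level. It saves more memory but is noticeably slower; see [Reduce memory usage](./optimization/memory) for details. A minimal sketch, used in place of the call above:

```py
# Offload individual submodules instead of whole models; lower memory, slower generation.
pipeline.enable_sequential_cpu_offload()
```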

## Inference speed

Denoising is the most computationally demanding part of the diffusion process, so methods that optimize this step accelerate inference. Try the following methods for a speed up.

- Add `device_map="cuda"` to place the pipeline on a GPU. Placing a model on an accelerator, like a GPU, increases speed because it performs computations in parallel.
- Set `torch_dtype=torch.bfloat16` to execute the pipeline in half-precision. Reducing the data type precision increases speed because it takes less time to perform computations in a lower precision.

```py
import torch
import time
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.bfloat16,
    device_map="cuda"
)
```

- Use a faster scheduler, such as [`DPMSolverMultistepScheduler`], which only requires ~20-25 steps.
- Set `num_inference_steps` to a lower value. Reducing the number of inference steps reduces the overall number of computations. However, this can result in lower generation quality. See the sketch after the timing example below for how to pass it to the pipeline.

```py
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)

prompt = """
cinematic film still of a cat sipping a margarita in a pool in Palm Springs, California
highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain
"""

start_time = time.perf_counter()
image = pipeline(prompt).images[0]
end_time = time.perf_counter()

print(f"Image generation took {end_time - start_time:.3f} seconds")
```
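
Both options can be combined. As a minimal sketch (reusing the pipeline and prompt defined above, with `20` as an illustrative step count), pass `num_inference_steps` directly to the pipeline call:

```py
# Fewer denoising steps trade a little quality for speed;
# ~20-25 steps generally works well with DPMSolverMultistepScheduler.
image = pipeline(prompt, num_inference_steps=20).images[0]
```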

## Generation quality

Many modern diffusion models deliver high-quality images out-of-the-box. However, you can still improve generation quality by trying the following.

- Try a more detailed and descriptive prompt. Include details such as the image medium, subject, style, and aesthetic. A negative prompt may also help by guiding a model away from undesirable features with words like "low quality" or "blurry".

```py
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.bfloat16,
    device_map="cuda"
)

prompt = """
cinematic film still of a cat sipping a margarita in a pool in Palm Springs, California
highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain
"""
negative_prompt = "low quality, blurry, ugly, poor details"
pipeline(prompt, negative_prompt=negative_prompt).images[0]
```

For more details about creating better prompts, take a look at the [Prompt techniques](./using-diffusers/weighted_prompts) doc.

- Try a different scheduler, like [`HeunDiscreteScheduler`] or [`LMSDiscreteScheduler`], that trades generation speed for quality.

```py
import torch
from diffusers import DiffusionPipeline, HeunDiscreteScheduler

pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.bfloat16,
    device_map="cuda"
)
pipeline.scheduler = HeunDiscreteScheduler.from_config(pipeline.scheduler.config)

prompt = """
cinematic film still of a cat sipping a margarita in a pool in Palm Springs, California
highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain
"""
negative_prompt = "low quality, blurry, ugly, poor details"
pipeline(prompt, negative_prompt=negative_prompt).images[0]
```

## Next steps

Diffusers offers more advanced and powerful optimizations such as [group-offloading](./optimization/memory#group-offloading) and [regional compilation](./optimization/fp16#regional-compilation). To learn more about how to maximize performance, take a look at the Inference Optimization section.
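
As a starting point before diving into those guides, the sketch below compiles the UNet with `torch.compile`, a simpler relative of regional compilation covered in [Accelerate inference](./optimization/fp16). It assumes the `pipeline` object from the earlier examples, and the compile mode shown is only one possible setting:

```py
# Compile the UNet for faster repeated inference; the first call is slower while compilation runs.
pipeline.unet = torch.compile(pipeline.unet, mode="reduce-overhead", fullgraph=True)
```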