<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Quickstart

Modular Diffusers is a framework for quickly building flexible and customizable pipelines. At the core of Modular Diffusers are [`ModularPipelineBlocks`] that can be combined with other blocks to adapt to new workflows. The blocks are converted into a [`ModularPipeline`], a friendly user-facing interface for running generation tasks.

This guide shows you how to run a modular pipeline, understand its structure, and customize it by modifying the blocks that compose it.

## Run a pipeline

[`ModularPipeline`] is the main interface for loading, running, and managing modular pipelines.

```py
import torch
from diffusers import ModularPipeline

pipe = ModularPipeline.from_pretrained("Qwen/Qwen-Image")
pipe.load_components(torch_dtype=torch.bfloat16)
pipe.to("cuda")

image = pipe(
    prompt="cat wizard with red hat, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney",
).images[0]
image
```

[`~ModularPipeline.from_pretrained`] uses lazy loading - it reads the configuration to learn where to load each component from, but doesn't actually load the model weights until you call [`~ModularPipeline.load_components`]. This gives you control over when and how components are loaded.
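
Because nothing is loaded up front, you can also bring components up selectively. Below is a minimal sketch; the `names` argument of [`~ModularPipeline.load_components`] is an assumption here, so check the signature in your version of diffusers.

```py
import torch
from diffusers import ModularPipeline

pipe = ModularPipeline.from_pretrained("Qwen/Qwen-Image")
# No weights are in memory yet - from_pretrained only read the configuration

# Load a single component, e.g. to inspect or swap it before loading the rest
# (the `names` argument is an assumption; verify it in your diffusers version)
pipe.load_components(names="text_encoder", torch_dtype=torch.bfloat16)
print(type(pipe.text_encoder).__name__)  # Qwen2_5_VLForConditionalGeneration
```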

Learn more about creating and loading pipelines in the [Creating a pipeline](https://huggingface.co/docs/diffusers/modular_diffusers/modular_pipeline#creating-a-pipeline) and [Loading components](https://huggingface.co/docs/diffusers/modular_diffusers/modular_pipeline#loading-components) guides.

## Understand the structure

A [`ModularPipeline`] has two parts:

- **State**: the loaded components (models, schedulers, processors) and configuration
- **Definition**: the [`ModularPipelineBlocks`] that specify the inputs, outputs, expected components, and computation logic (see the sketch below)
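
Both parts are directly accessible on the pipeline object. A quick sketch, using the pipeline loaded above:

```py
# Definition: the block graph that describes what the pipeline does
print(type(pipe.blocks).__name__)  # QwenImageAutoBlocks

# State: loaded components are exposed as attributes on the pipeline
print(type(pipe.vae).__name__)  # AutoencoderKLQwenImage
```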

The blocks define *what* the pipeline does. Access them through `pipe.blocks`.

```py
print(pipe.blocks)
```

```
QwenImageAutoBlocks(
  Class: SequentialPipelineBlocks

  Description: Auto Modular pipeline for text-to-image, image-to-image, inpainting, and controlnet tasks using QwenImage.

  Supported workflows:
  - `text2image`: requires `prompt`
  - `image2image`: requires `prompt`, `image`
  - `inpainting`: requires `prompt`, `mask_image`, `image`
  - `controlnet_text2image`: requires `prompt`, `control_image`
  ...

  Components:
      text_encoder (`Qwen2_5_VLForConditionalGeneration`)
      vae (`AutoencoderKLQwenImage`)
      transformer (`QwenImageTransformer2DModel`)
      ...

  Sub-Blocks:
    [0] text_encoder (QwenImageAutoTextEncoderStep)
    [1] vae_encoder (QwenImageAutoVaeEncoderStep)
    [2] controlnet_vae_encoder (QwenImageOptionalControlNetVaeEncoderStep)
    [3] denoise (QwenImageAutoCoreDenoiseStep)
    [4] decode (QwenImageAutoDecodeStep)
)
```

The printed output shows:

- The supported workflows (text2image, image2image, inpainting, etc.)
- The sub-blocks it's composed of (text_encoder, vae_encoder, denoise, decode)

### Workflows

`QwenImageAutoBlocks` is a [`ConditionalPipelineBlocks`], so this pipeline supports multiple workflows and adapts its behavior based on the inputs you provide. For example, if you pass `image` to the pipeline, it runs an image-to-image workflow instead of text-to-image.

```py
from diffusers.utils import load_image

input_image = load_image("https://github.com/Trgtuan10/Image_storage/blob/main/cute_cat.png?raw=true")

image = pipe(
    prompt="cat wizard with red hat, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney",
    image=input_image,
).images[0]
```

Use `get_workflow()` to extract the blocks for a specific workflow.

```py
img2img_blocks = pipe.blocks.get_workflow("image2image")
```

Conditional blocks are convenient for users, but their conditional logic adds complexity when customizing or debugging. Extracting a workflow gives you the specific blocks relevant to your workflow, making it easier to work with. Learn more in the [AutoPipelineBlocks](https://huggingface.co/docs/diffusers/modular_diffusers/auto_pipeline_blocks) guide.
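
The extracted blocks can be turned into a standalone pipeline with [`~ModularPipelineBlocks.init_pipeline`], which is covered in more detail below. A minimal sketch, reusing the same repository and inputs as above:

```py
# Build a dedicated image-to-image pipeline from the extracted workflow
img2img_pipe = img2img_blocks.init_pipeline("Qwen/Qwen-Image")
img2img_pipe.load_components(torch_dtype=torch.bfloat16)
img2img_pipe.to("cuda")

image = img2img_pipe(
    prompt="cat wizard with red hat, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney",
    image=input_image,
).images[0]
```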

### Sub-blocks

`QwenImageAutoBlocks` is itself composed of smaller blocks: `text_encoder`, `vae_encoder`, `controlnet_vae_encoder`, `denoise`, and `decode`. Access them through the `sub_blocks` property.
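
`sub_blocks` behaves like an ordered dict, so listing the block names is straightforward (a small sketch; the exact container type is an implementation detail):

```py
print(list(pipe.blocks.sub_blocks.keys()))
# ['text_encoder', 'vae_encoder', 'controlnet_vae_encoder', 'denoise', 'decode']
```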

The `doc` property is useful for seeing the full documentation of any block, including its inputs, outputs, and components.

```py
vae_encoder_block = pipe.blocks.sub_blocks["vae_encoder"]
print(vae_encoder_block.doc)
```

This block can be converted to a pipeline and run on its own with [`~ModularPipelineBlocks.init_pipeline`].

```py
vae_encoder_pipe = vae_encoder_block.init_pipeline()

# Reuse the VAE that is already loaded with the update_components() method
vae_encoder_pipe.update_components(vae=pipe.vae)

# Run just this block
image_latents = vae_encoder_pipe(image=input_image).image_latents
print(image_latents.shape)
```

It reuses the VAE from our original pipeline instead of reloading it, keeping memory usage efficient. Learn more in the [Loading components](https://huggingface.co/docs/diffusers/modular_diffusers/modular_pipeline#loading-components) guide.

Since blocks are composable, you can modify the pipeline's definition by adding, removing, or swapping blocks to create new workflows. In the next section, we'll add a canny edge detection block to a ControlNet pipeline, so you can pass a regular image instead of a pre-processed canny edge map. Removing a block works through the same dict-like `sub_blocks` interface, as sketched below.
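
A minimal sketch, assuming `sub_blocks` supports the standard dict-style `pop` (treat that as an assumption and verify it in your version):

```py
# Extract a concrete workflow to modify
t2i_blocks = pipe.blocks.get_workflow("text2image")

# Drop the decode step, e.g. to handle latent decoding yourself
# (`pop` is an assumption based on the dict-like sub_blocks interface)
t2i_blocks.sub_blocks.pop("decode")
```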

## Compose new workflows

Let's add a canny edge detection block to a ControlNet pipeline. First, load a pre-built canny block from the Hub (see [Building Custom Blocks](https://huggingface.co/docs/diffusers/modular_diffusers/custom_blocks) to create your own).

```py
from diffusers.modular_pipelines import ModularPipelineBlocks

# Load a canny block from the Hub
canny_block = ModularPipelineBlocks.from_pretrained(
    "diffusers-internal-dev/canny-filtering",
    trust_remote_code=True,
)

print(canny_block.doc)
```

```
class CannyBlock

  Inputs:
      image (`Union[Image, ndarray]`):
          Image to compute canny filter on
      low_threshold (`int`, *optional*, defaults to 50):
          Low threshold for the canny filter.
      high_threshold (`int`, *optional*, defaults to 200):
          High threshold for the canny filter.
      ...

  Outputs:
      control_image (`PIL.Image`):
          Canny map for input image
```

Use `get_workflow` to extract the ControlNet workflow from [`QwenImageAutoBlocks`].

```py
# Get the controlnet workflow that we want to work with
blocks = pipe.blocks.get_workflow("controlnet_text2image")
print(blocks.doc)
```

```
class SequentialPipelineBlocks

  Inputs:
      prompt (`str`):
          The prompt or prompts to guide image generation.
      control_image (`Image`):
          Control image for ControlNet conditioning.
      ...
```

It requires `control_image` as input. After inserting the canny block, the pipeline will accept a regular image instead.

```py
# Insert canny at the beginning of the workflow
blocks.sub_blocks.insert("canny", canny_block, 0)

# Check the updated structure: CannyBlock is now listed as the first sub-block
print(blocks)

# Check the updated doc: the pipeline now takes "image" as input
# even though it's a controlnet pipeline, because canny preprocesses it into control_image
print(blocks.doc)
```

```
class SequentialPipelineBlocks

  Inputs:
      image (`Union[Image, ndarray]`):
          Image to compute canny filter on
      low_threshold (`int`, *optional*, defaults to 50):
          Low threshold for the canny filter.
      high_threshold (`int`, *optional*, defaults to 200):
          High threshold for the canny filter.
      prompt (`str`):
          The prompt or prompts to guide image generation.
      ...
```

Now the pipeline takes `image` as input - the canny block will preprocess it into `control_image` automatically.

Create a pipeline from the modified blocks and load a ControlNet model. We use [`ComponentsManager`] to enable CPU offloading for reduced memory usage (learn more in the [ComponentsManager](./components_manager) guide).

```py
from diffusers import ComponentsManager

manager = ComponentsManager()
manager.enable_auto_cpu_offload(device="cuda:0")

pipeline = blocks.init_pipeline("Qwen/Qwen-Image", components_manager=manager)
pipeline.load_components(torch_dtype=torch.bfloat16)

# Load the ControlNet model
controlnet_spec = pipeline.get_component_spec("controlnet")
controlnet_spec.pretrained_model_name_or_path = "InstantX/Qwen-Image-ControlNet-Union"
controlnet = controlnet_spec.load(torch_dtype=torch.bfloat16)
pipeline.update_components(controlnet=controlnet)
```

Now run the pipeline - the canny block preprocesses the image for ControlNet.

```py
from diffusers.utils import load_image

prompt = "cat wizard with red hat, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney"
image = load_image("https://github.com/Trgtuan10/Image_storage/blob/main/cute_cat.png?raw=true")

output = pipeline(
    prompt=prompt,
    image=image,
).images[0]
output
```
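
The result is a regular `PIL.Image`, so it can be saved directly (the filename is just an example).

```py
output.save("cat_wizard_controlnet.png")
```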

## Next steps

<hfoptions id="next">
<hfoption id="Build custom blocks">

Learn how to create your own blocks with custom logic in the [Building Custom Blocks](./custom_blocks) guide.

</hfoption>
<hfoption id="Share components">

Use [`ComponentsManager`](./components_manager) to share models across multiple pipelines and manage memory efficiently.

</hfoption>
<hfoption id="Visual interface">

Connect modular pipelines to [Mellon](https://github.com/cubiq/Mellon), a visual node-based interface for building workflows. Custom blocks built with Modular Diffusers work out of the box with Mellon - no UI code required. Read more in the Mellon guide.

</hfoption>
</hfoptions>