# Quickstart

Modular Diffusers is a framework for quickly building flexible and customizable pipelines. At the core of Modular Diffusers are [ModularPipelineBlocks] that can be combined with other blocks to adapt to new workflows. The blocks are converted into a [ModularPipeline], a friendly user-facing interface for running generation tasks.

This guide shows you how to run a modular pipeline, understand its structure, and customize it by modifying the blocks that compose it.

## Run a pipeline

[ModularPipeline] is the main interface for loading, running, and managing modular pipelines.

```py
import torch
from diffusers import ModularPipeline

pipe = ModularPipeline.from_pretrained("Qwen/Qwen-Image")
pipe.load_components(torch_dtype=torch.bfloat16)
pipe.to("cuda")

image = pipe(
    prompt="A cat astronaut floating in space",
).images[0]
image
```

[~ModularPipeline.from_pretrained] uses lazy loading - it reads the configuration and knows where to load each component from, but doesn't actually load the model weights until you call [~ModularPipeline.load_components]. This gives you control over when and how components are loaded.
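The lazy-loading split can be illustrated with a small stand-alone sketch. The `LazyPipeline` class below is hypothetical, not part of diffusers; it only mirrors the pattern: configuration is read eagerly, while heavyweight components are materialized only when explicitly requested.

```python
# Hypothetical sketch of the lazy-loading pattern: from_pretrained() only
# records *where* each component lives; the expensive work happens later,
# in load_components().
class LazyPipeline:
    def __init__(self, config):
        self.config = config      # cheap: just metadata
        self.components = {}      # stays empty until load_components()

    @classmethod
    def from_pretrained(cls, repo_id):
        # In the real API this step reads the repo's configuration.
        config = {"unet": f"{repo_id}/unet", "vae": f"{repo_id}/vae"}
        return cls(config)

    def load_components(self):
        # Only now are the (simulated) weights actually "loaded".
        for name, path in self.config.items():
            self.components[name] = f"weights from {path}"

pipe = LazyPipeline.from_pretrained("some/repo")
assert pipe.components == {}  # nothing loaded yet
pipe.load_components()
assert "unet" in pipe.components
```

Between the two calls you could, for example, swap a component spec or change dtypes, which is exactly the flexibility the real two-step API is designed for.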

Learn more about creating and loading pipelines in the Creating a pipeline and Loading components guides.

## Understand the structure

The pipeline you loaded from "Qwen/Qwen-Image" is built from a [ModularPipelineBlocks] called QwenImageAutoBlocks. Print it to see its structure.

```py
print(pipe.blocks)
```

```
class QwenImageAutoBlocks

  Auto Modular pipeline for text-to-image, image-to-image, inpainting, and controlnet tasks.

  Supported workflows:
    - `text2image`: requires `prompt`
    - `image2image`: requires `prompt`, `image`
    - `inpainting`: requires `prompt`, `mask_image`, `image`
    - `controlnet_text2image`: requires `prompt`, `control_image`
    ...

  Sub-blocks:
    - text_encoder: QwenImageTextEncoderStep
    - vae_encoder: QwenImageAutoVaeEncoderStep
    - denoise: QwenImageAutoCoreDenoiseStep
    - decode: QwenImageAutoDecodeStep
```

From this output you can see two things:

- It supports multiple workflows (`text2image`, `image2image`, `inpainting`, etc.)
- It is composed of sub-blocks (`text_encoder`, `vae_encoder`, `denoise`, `decode`)

### Workflows

This pipeline supports multiple workflows and adapts its behavior based on the inputs you provide. For example, if you pass image to the pipeline, it runs an image-to-image workflow instead of text-to-image.

```py
from diffusers.utils import load_image

input_image = load_image("https://github.com/Trgtuan10/Image_storage/blob/main/cute_cat.png?raw=true")

image = pipe(
    prompt="A cat astronaut floating in space",
    image=input_image,
).images[0]
```
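The input-based dispatch can be sketched as a simple rule table. This is illustrative only; the real selection logic lives inside `QwenImageAutoBlocks`, but the rules below mirror the "Supported workflows" list printed earlier.

```python
# Illustrative sketch of how an auto-block might pick a workflow from the
# inputs it receives. Most specific rule first: the first rule whose
# required inputs are all present wins.
WORKFLOWS = [
    ("inpainting", {"prompt", "mask_image", "image"}),
    ("controlnet_text2image", {"prompt", "control_image"}),
    ("image2image", {"prompt", "image"}),
    ("text2image", {"prompt"}),
]

def select_workflow(**inputs):
    provided = {k for k, v in inputs.items() if v is not None}
    for name, required in WORKFLOWS:
        if required <= provided:
            return name
    raise ValueError("no workflow matches the given inputs")

assert select_workflow(prompt="a cat") == "text2image"
assert select_workflow(prompt="a cat", image="img") == "image2image"
assert select_workflow(prompt="a cat", image="img", mask_image="m") == "inpainting"
```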

Learn more about conditional blocks in the AutoPipelineBlocks guide.

Use get_workflow() to extract the blocks for a specific workflow.

```py
img2img_blocks = pipe.blocks.get_workflow("image2image")
```

### Sub-blocks

Blocks are the fundamental units of the modular system. Each block is a definition that specifies the inputs, outputs, and computation logic for a single step, and blocks can be composed together in different ways.
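As a stand-alone sketch (toy classes, not the actual diffusers block API), a block can be thought of as a named computation with declared inputs and outputs, and a pipeline as a sequence of blocks threading a shared state:

```python
# Toy sketch of the block idea: each block declares its inputs/outputs and a
# computation; composing blocks means running them in order over shared state.
class Block:
    inputs, outputs = (), ()
    def __call__(self, state): ...

class Upper(Block):
    inputs, outputs = ("text",), ("upper_text",)
    def __call__(self, state):
        state["upper_text"] = state["text"].upper()

class Exclaim(Block):
    inputs, outputs = ("upper_text",), ("result",)
    def __call__(self, state):
        state["result"] = state["upper_text"] + "!"

def run(blocks, **inputs):
    state = dict(inputs)
    for block in blocks:
        block(state)   # each block reads and writes the shared state
    return state

state = run([Upper(), Exclaim()], text="hello")
assert state["result"] == "HELLO!"
```

Because each block only touches its declared inputs and outputs, blocks can be reordered, swapped, or run individually, which is the property the rest of this guide exploits.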

Let's take a look at the vae_encoder block as an example. Use the doc property to see the full documentation for any block, including its inputs, outputs, and components.

```py
vae_encoder_block = pipe.blocks.sub_blocks["vae_encoder"]
print(vae_encoder_block.doc)
```

Just like QwenImageAutoBlocks, this block can be converted to a pipeline and run on its own.

```py
vae_encoder_pipe = vae_encoder_block.init_pipeline()

# Reuse the VAE we already loaded with the update_components() method
vae_encoder_pipe.update_components(vae=pipe.vae)

# Run just this block
image_latents = vae_encoder_pipe(image=input_image).image_latents
print(image_latents.shape)
```

This reuses the VAE from our original pipeline instead of loading it again, keeping memory usage efficient. Learn more in the Loading components guide.
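The memory benefit comes from both pipelines holding a reference to the same component object rather than two copies of the weights. A minimal sketch of the idea (hypothetical classes, not diffusers code):

```python
# Sketch: update_components-style sharing passes the same object around,
# so no extra copy of the weights is created.
class FakeVAE:
    def __init__(self):
        self.weights = [0.0] * 1000  # stand-in for large tensors

class MiniPipeline:
    def __init__(self):
        self.components = {}
    def update_components(self, **comps):
        self.components.update(comps)  # stores references, not copies

full_pipe = MiniPipeline()
full_pipe.update_components(vae=FakeVAE())

encoder_pipe = MiniPipeline()
encoder_pipe.update_components(vae=full_pipe.components["vae"])

# Both pipelines point at the exact same object in memory.
assert encoder_pipe.components["vae"] is full_pipe.components["vae"]
```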

You can also add new blocks to compose new workflows. Let's add a Canny edge-detection block to create a ControlNet pipeline.

First, load the canny block from the Hub and insert it into the controlnet workflow. If you want to learn how to create your own custom blocks and share them on the Hub, check out the Building Custom Blocks guide.

```py
from diffusers.modular_pipelines import ModularPipelineBlocks

# Load a canny block from the Hub
canny_block = ModularPipelineBlocks.from_pretrained(
    "diffusers-internal-dev/canny-filtering",
    trust_remote_code=True,
)

# Get the controlnet workflow and insert canny at the beginning
blocks = pipe.blocks.get_workflow("controlnet_text2image")
blocks.sub_blocks.insert("canny", canny_block, 0)

# Check the updated structure - notice the pipeline now takes "image" as input
# even though it's a controlnet pipeline, because canny preprocesses it into control_image
print(blocks.doc)
```
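Why inserting the canny block changes the pipeline's inputs can be seen in a toy sketch: once a preprocessing step produces `control_image` internally, the caller only needs to supply the raw `image`. The classes below are hypothetical stand-ins, not the diffusers API.

```python
# Toy sketch: inserting a preprocessing block shifts which inputs the caller
# must supply, because downstream inputs are now produced internally.
class Canny:
    inputs, outputs = ("image",), ("control_image",)

class Denoise:
    inputs, outputs = ("prompt", "control_image"), ("latents",)

def external_inputs(blocks):
    produced, needed = set(), set()
    for block in blocks:
        # Inputs not produced by an earlier block must come from the caller.
        needed |= set(block.inputs) - produced
        produced |= set(block.outputs)
    return needed

# Without canny, the caller must provide control_image directly...
assert external_inputs([Denoise()]) == {"prompt", "control_image"}
# ...with canny inserted first, a raw image is enough.
assert external_inputs([Canny(), Denoise()]) == {"prompt", "image"}
```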

Create a pipeline from the modified blocks and load a ControlNet model.

```py
pipeline = blocks.init_pipeline("Qwen/Qwen-Image")
pipeline.load_components(torch_dtype=torch.bfloat16)

# Load the ControlNet model
controlnet_spec = pipeline.get_component_spec("controlnet")
controlnet_spec.pretrained_model_name_or_path = "InstantX/Qwen-Image-ControlNet-Union"
controlnet = controlnet_spec.load(torch_dtype=torch.bfloat16)
pipeline.update_components(controlnet=controlnet)
pipeline.to("cuda")
```

Now run the pipeline - the canny block preprocesses the image for ControlNet.

```py
from diffusers.utils import load_image

prompt = "cat wizard with red hat, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney"
image = load_image("https://github.com/Trgtuan10/Image_storage/blob/main/cute_cat.png?raw=true")

output = pipeline(
    prompt=prompt,
    image=image,
).images[0]
output
```

## Next steps

Learn how to create your own blocks with custom logic in the Building Custom Blocks guide.

Use ComponentsManager to share models across multiple pipelines and manage memory efficiently.

Connect modular pipelines to Mellon, a visual node-based interface for building workflows. Custom blocks built with Modular Diffusers work out of the box with Mellon, with no UI code required. Read more in the Mellon guide.