Overview

The Modular Diffusers Framework consists of three main components:

ModularPipelineBlocks

Pipeline blocks are the fundamental building blocks of the Modular Diffusers system. All pipeline blocks inherit from the base class ModularPipelineBlocks.

To use a ModularPipelineBlocks that is officially supported in 🧨 Diffusers:

>>> from diffusers.modular_pipelines.stable_diffusion_xl import StableDiffusionXLTextEncoderStep
>>> text_encoder_block = StableDiffusionXLTextEncoderStep()

Each ModularPipelineBlocks defines its requirements for components, configs, inputs, intermediate inputs, and outputs. You can see that this text encoder block uses text encoders and tokenizers as well as a guider component. It takes user inputs such as prompt and negative_prompt, and returns a list of conditional text embeddings.

>>> text_encoder_block
StableDiffusionXLTextEncoderStep(
  Class: PipelineBlock
  Description: Text Encoder step that generate text_embeddings to guide the image generation
    Components:
        text_encoder (`CLIPTextModel`)
        text_encoder_2 (`CLIPTextModelWithProjection`)
        tokenizer (`CLIPTokenizer`)
        tokenizer_2 (`CLIPTokenizer`)
        guider (`ClassifierFreeGuidance`)
    Configs:
        force_zeros_for_empty_prompt (default: True)
  Inputs:
    prompt=None, prompt_2=None, negative_prompt=None, negative_prompt_2=None, cross_attention_kwargs=None, clip_skip=None
  Intermediates:
    - outputs: prompt_embeds, negative_prompt_embeds, pooled_prompt_embeds, negative_pooled_prompt_embeds
)

Pipeline blocks are essentially "definitions": they specify the components, configs, and computational steps of a pipeline. They do not hold any model state, and they are not runnable until converted into a ModularPipeline object.
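The split between a stateless "definition" and a runnable pipeline can be illustrated with a minimal sketch. The classes below are hypothetical simplifications for illustration, not the actual diffusers implementation:

```python
# Minimal sketch of the "blocks are definitions" idea: the block only
# declares what it needs; a separate pipeline object holds loaded components.
class TinyBlock:
    # Declarative specification only -- no model weights live here.
    expected_components = ["text_encoder", "tokenizer"]

    def __call__(self, components, state):
        # The block reads what it needs from the pipeline's components
        # and writes its result into the shared state.
        encoder = components["text_encoder"]
        state["prompt_embeds"] = encoder(state["prompt"])
        return state


class TinyPipeline:
    """Holds actual component state, making the block runnable."""

    def __init__(self, block):
        self.block = block
        self.components = {}

    def load_components(self, **components):
        # In the real API this loads model weights; here we just register objects.
        self.components.update(components)

    def __call__(self, **inputs):
        missing = [n for n in self.block.expected_components if n not in self.components]
        if missing:
            raise RuntimeError(f"Components not loaded: {missing}")
        return self.block(self.components, dict(inputs))


pipeline = TinyPipeline(TinyBlock())
# A stand-in "encoder" so the sketch runs without any model downloads.
pipeline.load_components(text_encoder=lambda text: text.upper(), tokenizer=str.split)
result = pipeline(prompt="a cat")
```

Calling the pipeline before load_components would raise, which mirrors the real behavior: the block alone is just a specification.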

Read more about how to write your own ModularPipelineBlocks here

PipelineState & BlockState

PipelineState and BlockState manage dataflow between pipeline blocks. PipelineState acts as the global state container that ModularPipelineBlocks operate on - each block gets a local view (BlockState) of the relevant variables it needs from PipelineState, performs its operations, and then updates PipelineState with any changes.

You typically don't need to manually create or manage these state objects. The ModularPipeline automatically creates and manages them for you. However, understanding their roles is important for developing custom pipeline blocks.
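To make the pattern concrete, here is a simplified sketch of the global-state/local-view relationship. The classes are illustrative stand-ins, not the actual PipelineState and BlockState implementations:

```python
class PipelineState:
    """Global container for all intermediate values in a pipeline run."""

    def __init__(self, **inputs):
        self.values = dict(inputs)

    def get_block_state(self, names):
        # Give a block a local view holding only the variables it asked for.
        return BlockState({k: self.values[k] for k in names if k in self.values})

    def update(self, block_state):
        # Merge the block's (possibly modified) view back into the global state.
        self.values.update(block_state.values)


class BlockState:
    """Local view of the PipelineState for one block."""

    def __init__(self, values):
        self.values = values


# A toy "block step": take a local view, modify it, write it back.
state = PipelineState(prompt="a cat", latents=2)
local = state.get_block_state(["latents"])   # block sees only what it needs
local.values["latents"] = local.values["latents"] * 2
state.update(local)                          # changes flow back to global state
```

The key property is that a block never touches the global state directly; it only reads and writes through its local view.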

ModularPipeline

ModularPipeline is the main interface to create and execute pipelines in the Modular Diffusers system.

Modular Repo

ModularPipeline only works with modular repositories. You can find an example modular repo here.

The main differences from standard diffusers repositories are:

  1. modular_model_index.json vs model_index.json

In standard model_index.json, each component entry is a (library, class) tuple:

"text_encoder": [
  "transformers",
  "CLIPTextModel"
],

In modular_model_index.json, each component entry contains three elements: (library, class, loading_specs)

  • library and class: Information about the actual component loaded in the pipeline at the time of saving (can be None if not loaded)
  • loading_specs: A dictionary containing all information required to load this component, including repo, revision, subfolder, variant, and type_hint
"text_encoder": [
  null,  # library (same as model_index.json)
  null,  # class (same as model_index.json)
  {      # loading specs map (unique to modular_model_index.json)
    "repo": "stabilityai/stable-diffusion-xl-base-1.0",  # can be a different repo
    "revision": null,
    "subfolder": "text_encoder",
    "type_hint": [  # (library, class) for the expected component class
      "transformers",  
      "CLIPTextModel"
    ],
    "variant": null
  }
],
  2. Cross-Repository Component Loading

Unlike standard repositories, where components must live in subfolders of the same repo, modular repositories can fetch components from different repositories based on the loading_specs dictionary. For example, the text_encoder component is fetched from the "text_encoder" subfolder of stabilityai/stable-diffusion-xl-base-1.0, while other components can come from entirely different repositories.
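Reading the loading_specs out of a modular_model_index.json is plain dictionary work. The helper below is a hypothetical sketch (the unet repo name is made up for illustration) showing how each component can resolve to its own repository:

```python
import json

# Two entries in the style of modular_model_index.json; each component
# carries its own loading specs, so repos can differ per component.
modular_index = json.loads("""
{
  "text_encoder": [null, null, {
    "repo": "stabilityai/stable-diffusion-xl-base-1.0",
    "subfolder": "text_encoder",
    "revision": null, "variant": null,
    "type_hint": ["transformers", "CLIPTextModel"]
  }],
  "unet": [null, null, {
    "repo": "some-org/custom-sdxl-unet",
    "subfolder": "unet",
    "revision": null, "variant": null,
    "type_hint": ["diffusers", "UNet2DConditionModel"]
  }]
}
""")

def loading_plan(index):
    # Map each component name to the (repo, subfolder) it would load from.
    return {
        name: (specs["repo"], specs["subfolder"])
        for name, (_library, _cls, specs) in index.items()
    }

plan = loading_plan(modular_index)
```

Here the two components come from two different repos, which a standard model_index.json cannot express.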

Create a ModularPipeline from ModularPipelineBlocks

Each ModularPipelineBlocks has an init_pipeline method that can initialize a ModularPipeline object based on its component and configuration specifications.

>>> pipeline = blocks.init_pipeline(pretrained_model_name_or_path)

💡 We recommend using ModularPipeline with the ComponentsManager by passing a components_manager argument:

>>> from diffusers import ComponentsManager
>>> components = ComponentsManager()
>>> pipeline = blocks.init_pipeline(pretrained_model_name_or_path, components_manager=components)

This helps you to:

  1. Detect and manage duplicated models (warns when trying to register an existing model)
  2. Easily reuse components across different pipelines
  3. Apply offloading strategies across multiple pipelines
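The duplicate-detection behavior in point 1 can be sketched with a toy registry. This is a simplification for illustration, not the real ComponentsManager:

```python
import warnings

class TinyComponentsManager:
    """Toy registry that warns on duplicate component registration."""

    def __init__(self):
        self.components = {}

    def add(self, name, component):
        if name in self.components:
            # The real manager detects duplicates so the same weights are
            # not held twice across pipelines; here we just warn and reuse.
            warnings.warn(f"'{name}' is already registered; reusing it.")
            return self.components[name]
        self.components[name] = component
        return component

manager = TinyComponentsManager()
first = manager.add("unet", object())
second = manager.add("unet", object())  # warns and returns the first object
```

Because the second registration returns the already-registered object, two pipelines sharing the manager end up sharing one copy of the component.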

You can read more about Components Manager here

Unlike DiffusionPipeline, you need to explicitly load model components using load_components:

>>> import torch
>>> pipeline.load_components(torch_dtype=torch.float16)
>>> pipeline.to(device)

You can partially load specific components using the component_names argument, for example to only load unet and vae:

>>> pipeline.load_components(component_names=["unet", "vae"])

💡 You can inspect the pipeline's config attribute (which has the same structure as the modular_model_index.json we just walked through) to check the pipeline's loading status: which components the pipeline expects to load and their loading specs, which components are already loaded, and their actual classes and loading specs.
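For example, a helper like the following (hypothetical, operating on a config dict shaped like modular_model_index.json) could summarize which components are loaded and which are still pending:

```python
def loading_status(config):
    """Split expected components into loaded vs. pending.

    Each entry is (library, class, loading_specs); library and class
    stay None until the component has actually been loaded.
    """
    loaded, pending = [], []
    for name, (library, cls, _specs) in config.items():
        (loaded if library is not None and cls is not None else pending).append(name)
    return {"loaded": sorted(loaded), "pending": sorted(pending)}

# unet is loaded (library/class filled in); vae is still pending.
config = {
    "unet": ["diffusers", "UNet2DConditionModel", {"repo": "..."}],
    "vae": [None, None, {"repo": "..."}],
}
status = loading_status(config)
```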

Load a ModularPipeline from the Hub

You can create a ModularPipeline from a Hugging Face Hub repository with the from_pretrained method, as long as it is a modular repo:

pipeline = ModularPipeline.from_pretrained(repo_id, components_manager=..., collection=...)

Loading custom code is also supported:

diffdiff_pipeline = ModularPipeline.from_pretrained(repo_id, trust_remote_code=True, ...)

As with the init_pipeline method, from_pretrained does not load any components automatically, so you must call load_components to explicitly load the components you need.

Execute a ModularPipeline

The API to run the ModularPipeline is very similar to how you would run a regular DiffusionPipeline:

>>> image = pipeline(prompt="a cat", num_inference_steps=15, output="images")[0]

There are a few key differences though:

  1. You can pass a PipelineState object directly to the pipeline instead of individual arguments
  2. If you do not specify the output argument, it returns the PipelineState object
  3. You can pass a list as output, e.g. pipeline(..., output=["images", "latents"]) will return a dictionary containing both the generated images and the final denoised latents

Under the hood, ModularPipeline's __call__ method is a wrapper around the pipeline blocks' __call__ method: it creates a PipelineState object and populates it with user inputs, then returns the output to the user based on the output argument. It also ensures that all pipeline-level config and components are exposed to all pipeline blocks by preparing and passing a components input.
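The output handling described above can be sketched as follows. This is a simplified stand-in for the __call__ wrapper, applied to an already-finished state, not the real diffusers code:

```python
class TinyState:
    """Stand-in for PipelineState: just a named container of values."""

    def __init__(self, values):
        self.values = values

def run_pipeline(state_values, output=None):
    """Mimic the output argument semantics on a finished pipeline state."""
    state = TinyState(state_values)
    if output is None:
        return state                      # no output arg: the whole state object
    if isinstance(output, str):
        return state.values[output]       # single name: just that value
    return {name: state.values[name] for name in output}  # list: dict of values

final = {"images": ["<image>"], "latents": ["<latents>"]}
whole_state = run_pipeline(final)                      # a TinyState
images = run_pipeline(final, output="images")          # just the images
both = run_pipeline(final, output=["images", "latents"])  # dict of both
```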

Save a ModularPipeline

To save a ModularPipeline and push it to the Hub:

pipeline.save_pretrained("YiYiXu/modular-loader-t2i", push_to_hub=True) 

Custom code is not automatically saved and shared on the Hub for you. Read more about how to share your custom pipeline on the Hub [here](TODO: ModularPipeline/CustomCode)