add a guide on components manager

2026-01-29 07:22:12 +03:00 · 2025-07-08 06:16:26 +02:00
parent 863c7df543
commit a2da0004ee
3 changed files with 506 additions and 398 deletions
--- a/docs/source/en/_toctree.yml
+++ b/docs/source/en/_toctree.yml
@@ -96,6 +96,8 @@
 - sections:
  - local: modular_diffusers/getting_started
    title: Getting Started
+  - local: modular_diffusers/components_manager
+    title: Components Manager
  - local: modular_diffusers/write_own_pipeline_block
    title: Write your own pipeline block
  - local: modular_diffusers/end_to_end_guide
--- a/docs/source/en/modular_diffusers/components_manager.md
+++ b/docs/source/en/modular_diffusers/components_manager.md
@@ -0,0 +1,504 @@
+<!--Copyright 2025 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# Components Manager
+
+The Components Manager is a central model registry and management system in diffusers. It lets you add models and then reuse them across multiple pipelines and workflows. It tracks all models in one place with useful metadata such as model size, device placement, and loaded adapters (LoRA, IP-Adapter). It has mechanisms to prevent duplicate model instances and to enable memory-efficient sharing. Most significantly, it offers offloading that works across pipelines: unlike regular `DiffusionPipeline` offloading, which is limited to one pipeline with a predefined sequence, the Components Manager automatically manages device memory across all your models and workflows.
+
+
+## Basic Operations
+
+Let's start with the fundamental operations. First, create a Components Manager:
+
+```py
+from diffusers import ComponentsManager
+comp = ComponentsManager()
+```
+
+Use the `add(name, component)` method to register a component. It returns a unique ID that combines the component name with the object's unique identifier (using Python's `id()` function):
+
+```py
+from diffusers import AutoModel
+text_encoder = AutoModel.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", subfolder="text_encoder")
+# Returns component_id like 'text_encoder_139917733042864'
+component_id = comp.add("text_encoder", text_encoder)
+```
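+
+The ID format described above can be reproduced with plain Python. This is a minimal sketch, not the library's code: `make_component_id` and `DummyModel` are hypothetical names used only for illustration, and the only assumption is that the ID is the name joined to `id(component)` with an underscore, as the example output suggests.
+
+```python
+def make_component_id(name: str, component: object) -> str:
+    """Join a readable name with Python's id() of the object (assumed format)."""
+    return f"{name}_{id(component)}"
+
+
+class DummyModel:
+    """Hypothetical stand-in for a real model, so no weights are downloaded."""
+
+
+text_encoder = DummyModel()
+first_id = make_component_id("text_encoder", text_encoder)
+alias_id = make_component_id("clip", text_encoder)
+
+# Both IDs end in the same number because they point at the same object.
+same_suffix = first_id.rsplit("_", 1)[1] == alias_id.rsplit("_", 1)[1]
+print(same_suffix)  # True
+```
+
+Because the suffix comes from the object rather than the name, registering one object under two names yields two IDs that share the same number.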
+
+You can view all registered components and their metadata:
+
+```py
+>>> comp
+Components:
+===============================================================================================================================================
+Models:
+-----------------------------------------------------------------------------------------------------------------------------------------------
+Name_ID                      | Class                     | Device: act(exec)    | Dtype           | Size (GB)  | Load ID         | Collection
+-----------------------------------------------------------------------------------------------------------------------------------------------
+text_encoder_139917733042864 | CLIPTextModel             | cpu                  | torch.float32   | 0.46       | N/A             | N/A
+-----------------------------------------------------------------------------------------------------------------------------------------------
+
+Additional Component Info:
+==================================================
+```
+
+And remove components using their unique ID:
+
+```py
+comp.remove("text_encoder_139917733042864")
+```
+
+## Duplicate Detection
+
+The Components Manager automatically detects and prevents duplicate model instances to save memory and avoid confusion. Let's walk through how this works in practice.
+
+When you try to add the same object twice, the manager will warn you and return the existing ID:
+
+```py
+>>> comp.add("text_encoder", text_encoder)
+'text_encoder_139917733042864'
+>>> comp.add("text_encoder", text_encoder)
+ComponentsManager: component 'text_encoder' already exists as 'text_encoder_139917733042864'
+'text_encoder_139917733042864'
+```
+
+Even if you add the same object under a different name, it will still be detected as a duplicate:
+
+```py
+>>> comp.add("clip", text_encoder)
+ComponentsManager: adding component 'clip' as 'clip_139917733042864', but it is duplicate of 'text_encoder_139917733042864'
+To remove a duplicate, call `components_manager.remove('<component_id>')`.
+'clip_139917733042864'
+```
+
+However, there's a more subtle case where duplicate detection becomes tricky. When you load the same model into different objects, the manager can't detect duplicates unless you use `ComponentSpec`. For example:
+
+```py
+>>> text_encoder_2 = AutoModel.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", subfolder="text_encoder")
+>>> comp.add("text_encoder", text_encoder_2)
+'text_encoder_139917732983664'
+```
+
+This creates a problem - you now have two copies of the same model consuming double the memory:
+
+```py
+>>> comp
+Components:
+===============================================================================================================================================
+Models:
+-----------------------------------------------------------------------------------------------------------------------------------------------
+Name_ID                      | Class                     | Device: act(exec)    | Dtype           | Size (GB)  | Load ID         | Collection
+-----------------------------------------------------------------------------------------------------------------------------------------------
+text_encoder_139917733042864 | CLIPTextModel             | cpu                  | torch.float32   | 0.46       | N/A             | N/A
+clip_139917733042864         | CLIPTextModel             | cpu                  | torch.float32   | 0.46       | N/A             | N/A
+text_encoder_139917732983664 | CLIPTextModel             | cpu                  | torch.float32   | 0.46       | N/A             | N/A
+-----------------------------------------------------------------------------------------------------------------------------------------------
+
+Additional Component Info:
+==================================================
+```
+
+We recommend using `ComponentSpec` to load your models. Models loaded with `ComponentSpec` get tagged with a unique ID that encodes their loading parameters, allowing the Components Manager to detect when different objects represent the same underlying checkpoint:
+
+```py
+from diffusers import AutoModel, ComponentSpec, ComponentsManager
+from transformers import CLIPTextModel
+comp = ComponentsManager()
+
+# Create ComponentSpec for the first text encoder
+spec = ComponentSpec(name="text_encoder", repo="stabilityai/stable-diffusion-xl-base-1.0", subfolder="text_encoder", type_hint=AutoModel)
+# Create ComponentSpec for a duplicate text encoder (the same checkpoint, from the same repo and subfolder)
+spec_duplicated = ComponentSpec(name="text_encoder_duplicated", repo="stabilityai/stable-diffusion-xl-base-1.0", subfolder="text_encoder", type_hint=CLIPTextModel)
+
+# Load and add both components - the manager will detect they're the same model
+comp.add("text_encoder", spec.load())
+comp.add("text_encoder_duplicated", spec_duplicated.load())
+```
+
+Now the manager detects the duplicate and warns you:
+
+```out
+ComponentsManager: adding component 'text_encoder_duplicated_139917580682672', but it has duplicate load_id 'stabilityai/stable-diffusion-xl-base-1.0|text_encoder|null|null' with existing components: text_encoder_139918506246832. To remove a duplicate, call `components_manager.remove('<component_id>')`.
+'text_encoder_duplicated_139917580682672'
+```
+
+Both models now show the same `load_id`, making it clear they're the same model:
+
+```py
+>>> comp
+Components:
+======================================================================================================================================================================================================
+Models:
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+Name_ID                             | Class                     | Device: act(exec)    | Dtype           | Size (GB)  | Load ID                                                         | Collection
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+text_encoder_139918506246832        | CLIPTextModel             | cpu                  | torch.float32   | 0.46       | stabilityai/stable-diffusion-xl-base-1.0|text_encoder|null|null | N/A
+text_encoder_duplicated_139917580682672 | CLIPTextModel             | cpu                  | torch.float32   | 0.46       | stabilityai/stable-diffusion-xl-base-1.0|text_encoder|null|null | N/A
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+
+Additional Component Info:
+==================================================
+```
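+
+The `load_id` strings in the output above look like loading parameters joined with `|`, with `null` standing in for anything left unset. The sketch below rebuilds that string in plain Python. It is an illustration only: `make_load_id` is a hypothetical helper, and the field order (repo, subfolder, variant, revision) is inferred from the printed IDs in this guide; in particular, the meaning of the fourth field is an assumption.
+
+```python
+from typing import Optional
+
+
+def make_load_id(
+    repo: str,
+    subfolder: Optional[str] = None,
+    variant: Optional[str] = None,
+    revision: Optional[str] = None,  # assumed meaning of the fourth field
+) -> str:
+    """Join loading parameters with '|', writing 'null' for unset values."""
+    parts = [repo, subfolder, variant, revision]
+    return "|".join("null" if part is None else part for part in parts)
+
+
+first = make_load_id("stabilityai/stable-diffusion-xl-base-1.0", subfolder="text_encoder")
+second = make_load_id("stabilityai/stable-diffusion-xl-base-1.0", subfolder="text_encoder")
+
+# Two separately loaded objects from the same checkpoint share one load_id.
+print(first == second)  # True
+print(first)  # stabilityai/stable-diffusion-xl-base-1.0|text_encoder|null|null
+```
+
+Comparing these strings, rather than object identities, is what lets two distinct Python objects be recognized as the same checkpoint.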
+
+## Collections
+
+Collections are labels you can assign to components for better organization and management. You add a component under a collection by passing the `collection=` parameter when you add the component to the manager, i.e. `add(name, component, collection=...)`. Within each collection, only one component per name is allowed - if you add a second component with the same name, the first one is automatically removed.
+
+Here's how collections work in practice:
+
+```py
+comp = ComponentsManager()
+# Create ComponentSpec for the first UNet (SDXL base)
+spec = ComponentSpec(name="unet", repo="stabilityai/stable-diffusion-xl-base-1.0", subfolder="unet", type_hint=AutoModel)
+# Create ComponentSpec for a different UNet (Juggernaut-XL)
+spec2 = ComponentSpec(name="unet", repo="RunDiffusion/Juggernaut-XL-v9", subfolder="unet", type_hint=AutoModel, variant="fp16")
+
+# Add both UNets to the same collection - the second one will replace the first
+comp.add("unet", spec.load(), collection="sdxl")
+comp.add("unet", spec2.load(), collection="sdxl")
+```
+
+The manager automatically removes the old UNet and adds the new one:
+
+```out
+ComponentsManager: removing existing unet from collection 'sdxl': unet_139917723891888
+'unet_139917723893136'
+```
+
+Only one UNet remains in the collection:
+
+```py
+>>> comp
+Components:
+====================================================================================================================================================================
+Models:
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------
+Name_ID              | Class                     | Device: act(exec)    | Dtype           | Size (GB)  | Load ID                                      | Collection
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------
+unet_139917723893136 | UNet2DConditionModel      | cpu                  | torch.float32   | 9.56       | RunDiffusion/Juggernaut-XL-v9|unet|fp16|null | sdxl
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------
+
+Additional Component Info:
+==================================================
+```
+
+For example, in node-based systems you can mark all models loaded from one node with the same collection label, automatically replace models when the user loads a new checkpoint under the same name, and batch delete all models in a collection when a node is removed.
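+
+The one-component-per-name rule can be mimicked with a small dictionary-based registry. This is a simplified sketch and not how `ComponentsManager` is implemented: `ToyRegistry` is a hypothetical class, plain `object()` instances stand in for models, and the sketch ignores the case where a component belongs to several collections at once.
+
+```python
+class ToyRegistry:
+    """Hypothetical stand-in that mimics only the collection replacement rule."""
+
+    def __init__(self):
+        self.components = {}   # component_id -> object
+        self.collections = {}  # collection name -> set of component_ids
+
+    def add(self, name, component, collection=None):
+        component_id = f"{name}_{id(component)}"
+        if collection is not None:
+            members = self.collections.setdefault(collection, set())
+            # Find other components already stored under this name in the collection.
+            stale = [m for m in members if m.rsplit("_", 1)[0] == name and m != component_id]
+            for old_id in stale:
+                members.discard(old_id)
+                self.components.pop(old_id, None)
+            members.add(component_id)
+        self.components[component_id] = component
+        return component_id
+
+
+registry = ToyRegistry()
+base_unet, finetuned_unet = object(), object()
+old_id = registry.add("unet", base_unet, collection="sdxl")
+new_id = registry.add("unet", finetuned_unet, collection="sdxl")
+
+# Only the most recently added "unet" is left in the "sdxl" collection.
+print(registry.collections["sdxl"] == {new_id})  # True
+```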
+
+## Retrieving Components
+
+The Components Manager provides several methods to retrieve registered components.
+
+The `get_one()` method returns a single component and supports pattern matching for the `name` parameter. You can use:
+- exact matches like `comp.get_one(name="unet")`
+- wildcards like `comp.get_one(name="unet*")` for components starting with "unet"
+- exclusion patterns like `comp.get_one(name="!unet")` to exclude components named "unet"
+- OR patterns like `comp.get_one(name="unet|vae")` to match either "unet" OR "vae". 
+
+You can also filter by collection with `comp.get_one(name="unet", collection="sdxl")` or by `load_id`. If multiple components match, `get_one()` raises an error.
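+
+To make the four pattern styles concrete, here is a small sketch that applies them to a list of names using Python's standard `fnmatch` module. It is an approximation for illustration, not the matching code used by `get_one()`: `name_matches`, `select`, and the sample names are hypothetical, and edge cases in the real method may behave differently.
+
+```python
+from fnmatch import fnmatchcase
+
+
+def name_matches(name: str, pattern: str) -> bool:
+    """Approximate the exact, wildcard, exclusion, and OR pattern styles."""
+    # A leading "!" inverts the result of the rest of the pattern.
+    if pattern.startswith("!"):
+        return not name_matches(name, pattern[1:])
+    # "|" separates alternatives; matching any one of them is enough.
+    return any(fnmatchcase(name, alternative) for alternative in pattern.split("|"))
+
+
+def select(names, pattern):
+    return [name for name in names if name_matches(name, pattern)]
+
+
+names = ["unet", "unet_refiner", "vae", "text_encoder"]
+
+print(select(names, "unet"))      # ['unet']
+print(select(names, "unet*"))     # ['unet', 'unet_refiner']
+print(select(names, "!unet"))     # ['unet_refiner', 'vae', 'text_encoder']
+print(select(names, "unet|vae"))  # ['unet', 'vae']
+```
+
+Note that the wildcard pattern returns two names here; with `get_one()`, a pattern that matches more than one component results in an error, so narrow the pattern or add a `collection` filter.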
+
+Another useful method is `get_components_by_names()`, which takes a list of names and returns a dictionary mapping names to components. This is particularly helpful with modular pipelines since they provide lists of required component names, and the returned dictionary can be directly passed to `pipeline.update_components()`.
+
+```py
+# Get components by name list
+component_dict = comp.get_components_by_names(names=["text_encoder", "unet", "vae"])
+# Returns: {"text_encoder": component1, "unet": component2, "vae": component3}
+```
+
+## Using Components Manager with Modular Pipelines
+
+The Components Manager integrates seamlessly with Modular Pipelines. All you need to do is pass a Components Manager instance to `from_pretrained()` or `init_pipeline()` with an optional `collection` parameter:
+
+```py
+from diffusers import ModularPipeline, ComponentsManager
+comp = ComponentsManager()
+pipe = ModularPipeline.from_pretrained("YiYiXu/modular-demo-auto", components_manager=comp, collection="test1")
+```
+
+By default, modular pipelines don't load components immediately, so both the pipeline and Components Manager start empty:
+
+```py
+>>> comp
+Components:
+==================================================
+No components registered.
+==================================================
+```
+
+When you load components on the pipeline, they are automatically registered in the Components Manager:
+
+```py
+>>> pipe.load_components(names="unet")
+>>> comp
+Components:
+==============================================================================================================================================================
+Models:
+--------------------------------------------------------------------------------------------------------------------------------------------------------------
+Name_ID              | Class                     | Device: act(exec)    | Dtype           | Size (GB)  | Load ID                                | Collection
+--------------------------------------------------------------------------------------------------------------------------------------------------------------
+unet_139917726686304 | UNet2DConditionModel      | cpu                  | torch.float32   | 9.56       | SG161222/RealVisXL_V4.0|unet|null|null | test1
+--------------------------------------------------------------------------------------------------------------------------------------------------------------
+
+Additional Component Info:
+==================================================
+```
+
+Now let's load all default components and then create a second pipeline that reuses all components from the first one. We pass the same Components Manager to the second pipeline but with a different collection:
+
+```py
+# Load all default components 
+>>> pipe.load_default_components()
+
+# Create a second pipeline using the same Components Manager but with a different collection
+>>> pipe2 = ModularPipeline.from_pretrained("YiYiXu/modular-demo-auto", components_manager=comp, collection="test2")
+```
+
+`ModularPipeline` has a `null_component_names` property that returns the names of the components it still needs to load. We can conveniently pass this list to the `get_components_by_names` method on the Components Manager:
+
+```py
+# Get the list of components that pipe2 needs to load
+>>> pipe2.null_component_names 
+['text_encoder', 'text_encoder_2', 'tokenizer', 'tokenizer_2', 'image_encoder', 'unet', 'vae', 'scheduler', 'controlnet']
+
+# Retrieve all required components from the Components Manager
+>>> comp_dict = comp.get_components_by_names(names=pipe2.null_component_names)
+
+# Update the pipeline with the retrieved components
+>>> pipe2.update_components(**comp_dict)
+```
+
+The warnings that follow are expected and indicate that the Components Manager is correctly identifying that these components already exist and will be reused rather than creating duplicates:
+
+```
+ComponentsManager: component 'text_encoder' already exists as 'text_encoder_139917586016400'
+ComponentsManager: component 'text_encoder_2' already exists as 'text_encoder_2_139917699973424'
+ComponentsManager: component 'tokenizer' already exists as 'tokenizer_139917580599504'
+ComponentsManager: component 'tokenizer_2' already exists as 'tokenizer_2_139915763443904'
+ComponentsManager: component 'image_encoder' already exists as 'image_encoder_139917722468304'
+ComponentsManager: component 'unet' already exists as 'unet_139917580609632'
+ComponentsManager: component 'vae' already exists as 'vae_139917722459040'
+ComponentsManager: component 'scheduler' already exists as 'scheduler_139916266559408'
+ComponentsManager: component 'controlnet' already exists as 'controlnet_139917722454432'
+```
+
+The pipeline is now fully loaded:
+
+```py
+# null_component_names returns an empty list, meaning everything is loaded
+>>> pipe2.null_component_names
+[]
+```
+
+No new components were added to the Components Manager - we're reusing everything. All models are now associated with both `test1` and `test2` collections, showing that these components are shared across multiple pipelines:
+
+```py
+>>> comp
+Components:
+========================================================================================================================================================================================
+Models:
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+Name_ID                        | Class                         | Device: act(exec)    | Dtype           | Size (GB)  | Load ID                                            | Collection
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+text_encoder_139917586016400   | CLIPTextModel                 | cpu                  | torch.float32   | 0.46       | SG161222/RealVisXL_V4.0|text_encoder|null|null     | test1
+                               |                               |                      |                 |            |                                                    | test2
+text_encoder_2_139917699973424 | CLIPTextModelWithProjection   | cpu                  | torch.float32   | 2.59       | SG161222/RealVisXL_V4.0|text_encoder_2|null|null   | test1
+                               |                               |                      |                 |            |                                                    | test2
+unet_139917580609632           | UNet2DConditionModel          | cpu                  | torch.float32   | 9.56       | SG161222/RealVisXL_V4.0|unet|null|null             | test1
+                               |                               |                      |                 |            |                                                    | test2
+controlnet_139917722454432     | ControlNetModel               | cpu                  | torch.float32   | 4.66       | diffusers/controlnet-canny-sdxl-1.0|null|null|null | test1
+                               |                               |                      |                 |            |                                                    | test2
+vae_139917722459040            | AutoencoderKL                 | cpu                  | torch.float32   | 0.31       | SG161222/RealVisXL_V4.0|vae|null|null              | test1
+                               |                               |                      |                 |            |                                                    | test2
+image_encoder_139917722468304  | CLIPVisionModelWithProjection | cpu                  | torch.float32   | 6.87       | h94/IP-Adapter|sdxl_models/image_encoder|null|null | test1
+                               |                               |                      |                 |            |                                                    | test2
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+
+Other Components:
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ID                             | Class                         | Collection
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+tokenizer_139917580599504      | CLIPTokenizer                 | test1
+                               |                               | test2
+scheduler_139916266559408      | EulerDiscreteScheduler        | test1
+                               |                               | test2
+tokenizer_2_139915763443904    | CLIPTokenizer                 | test1
+                               |                               | test2
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+
+Additional Component Info:
+==================================================
+```
+
+
+## Automatic Memory Management
+
+The Components Manager provides a global offloading strategy across all models, regardless of which pipeline is using them:
+
+```py
+comp.enable_auto_cpu_offload(device="cuda")
+```
+
+When enabled, all models start on CPU. The manager moves models to the device right before they're used and moves other models back to CPU when GPU memory runs low. You can set your own rules for which models to offload first. This works smoothly as you add or remove components. Once it's on, you don't need to worry about device placement - you can focus on your workflow.
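+
+One way to picture the offloading decision is as a small planning function. The sketch below is an illustration under stated assumptions, not the manager's actual strategy: `plan_offload`, the 12 GB budget, and the largest-first rule are all hypothetical, while the model sizes are taken from the tables earlier in this guide.
+
+```python
+def plan_offload(on_device: dict, incoming_size: float, budget: float) -> list:
+    """Pick models to move back to CPU so the incoming model fits (sizes in GB).
+
+    Assumed rule: offload the largest resident models first.
+    """
+    used = sum(on_device.values())
+    to_offload = []
+    for name, size in sorted(on_device.items(), key=lambda item: item[1], reverse=True):
+        if used + incoming_size <= budget:
+            break  # enough room already, stop offloading
+        to_offload.append(name)
+        used -= size
+    return to_offload
+
+
+# Models currently on the GPU, with sizes in GB.
+resident = {"text_encoder": 0.46, "text_encoder_2": 2.59, "vae": 0.31}
+
+# The 9.56 GB UNet is about to run on a device with a hypothetical 12 GB budget.
+print(plan_offload(resident, incoming_size=9.56, budget=12.0))  # ['text_encoder_2']
+```
+
+With a larger budget the function returns an empty list and nothing moves, which matches the description above: models only leave the device when memory runs low.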
+
+
+
+## Practical Example: Building Modular Workflows with Component Reuse
+
+Now that we've covered the basics of the Components Manager, let's walk through a practical example that shows how to build workflows in a modular setting and use the Components Manager to reuse components across multiple pipelines. This example demonstrates the true power of Modular Diffusers by working with multiple pipelines that can share components. 
+
+In this example, we'll generate latents from a text-to-image pipeline, then refine them with an image-to-image pipeline. We will also use LoRA and IP-Adapter.
+
+Let's create a modular text-to-image workflow by separating it into three sets of blocks: `text_blocks` for encoding prompts, `t2i_blocks` for generating latents, and `decoder_blocks` for creating final images.
+
+```py
+import torch
+from diffusers.modular_pipelines import SequentialPipelineBlocks
+from diffusers.modular_pipelines.stable_diffusion_xl import ALL_BLOCKS
+
+# Create modular blocks and separate text encoding and decoding steps
+t2i_blocks = SequentialPipelineBlocks.from_blocks_dict(ALL_BLOCKS["text2img"])
+text_blocks = t2i_blocks.sub_blocks.pop("text_encoder")
+decoder_blocks = t2i_blocks.sub_blocks.pop("decode")
+```
+
+Now we will convert them into runnable pipelines, set up the Components Manager with auto offloading, and organize components under a "t2i" collection:
+
+```py
+from diffusers import ComponentsManager, ModularPipeline
+
+# Set up Components Manager with auto offloading
+components = ComponentsManager()
+components.enable_auto_cpu_offload(device="cuda")
+
+# Create pipelines and load components
+t2i_repo = "YiYiXu/modular-demo-auto"
+t2i_loader_pipe = ModularPipeline.from_pretrained(t2i_repo, components_manager=components, collection="t2i")
+
+text_node = text_blocks.init_pipeline(t2i_repo, components_manager=components)
+decoder_node = decoder_blocks.init_pipeline(t2i_repo, components_manager=components)
+t2i_pipe = t2i_blocks.init_pipeline(t2i_repo, components_manager=components)
+```
+
+Load all components into the Components Manager under the "t2i" collection:
+
+```py
+# Load all components (including IP-Adapter and ControlNet for later use)
+t2i_loader_pipe.load_components(names=t2i_loader_pipe.pretrained_component_names, torch_dtype=torch.float16)
+```
+
+Now distribute the loaded components to each pipeline:
+
+```py
+# Get VAE for decoder (using get_one since there's only one)
+vae = components.get_one(load_id="SG161222/RealVisXL_V4.0|vae|null|null")
+decoder_node.update_components(vae=vae)
+
+# Get text components for text node (using get_components_by_names for multiple components)
+text_components = components.get_components_by_names(text_node.null_component_names)
+text_node.update_components(**text_components)
+
+# Get remaining components for t2i pipeline
+t2i_components = components.get_components_by_names(t2i_pipe.null_component_names)
+t2i_pipe.update_components(**t2i_components)
+```
+
+Now we can generate images using our modular workflow:
+
+```py
+# Generate text embeddings
+prompt = "an astronaut"
+text_embeddings = text_node(prompt=prompt, output=["prompt_embeds", "negative_prompt_embeds", "pooled_prompt_embeds", "negative_pooled_prompt_embeds"])
+
+# Generate latents and decode to image
+generator = torch.Generator(device="cuda").manual_seed(0)
+latents_t2i = t2i_pipe(**text_embeddings, num_inference_steps=25, generator=generator, output="latents")
+image = decoder_node(latents=latents_t2i, output="images")[0]
+image.save("modular_part2_t2i.png")
+```
+
+Let's add a LoRA:
+
+```py
+# Load LoRA weights - only the UNet gets the adapter
+>>> t2i_loader_pipe.load_lora_weights("CiroN2022/toy-face", weight_name="toy_face_sdxl.safetensors", adapter_name="toy_face")
+>>> components
+Components:
+============================================================================================================================================================
+...
+Additional Component Info:
+==================================================
+
+unet:
+  Adapters: ['toy_face']
+```
+
+You can see that the Components Manager tracks adapter metadata for all models it manages. In our case, only the UNet has a LoRA loaded, which means we can reuse the existing text embeddings.
+
+```py
+# Generate with LoRA (reusing existing text embeddings)
+generator = torch.Generator(device="cuda").manual_seed(0)
+latents_lora = t2i_pipe(**text_embeddings, num_inference_steps=25, generator=generator, output="latents")
+image = decoder_node(latents=latents_lora, output="images")[0]
+image.save("modular_part2_lora.png")
+```
+
+
+Now let's create a refiner pipeline that reuses components from our text-to-image workflow:
+
+```py
+# Create refiner blocks (removing image_encoder and decode since we work with latents)
+refiner_blocks = SequentialPipelineBlocks.from_blocks_dict(ALL_BLOCKS["img2img"])
+refiner_blocks.sub_blocks.pop("image_encoder")
+refiner_blocks.sub_blocks.pop("decode")
+
+# Create refiner pipeline with different repo and collection
+refiner_repo = "YiYiXu/modular_refiner"
+refiner_pipe = refiner_blocks.init_pipeline(refiner_repo, components_manager=components, collection="refiner")
+```
+
+We pass the **same Components Manager** (`components`) to the refiner pipeline, but with a **different collection** (`"refiner"`). This allows the refiner to access and reuse components from the "t2i" collection while organizing its own components (like the refiner UNet) under the "refiner" collection. 
+
+```py
+# Load only the refiner UNet (different from t2i UNet)
+refiner_pipe.load_components(names="unet", torch_dtype=torch.float16)
+
+# Reuse components from t2i pipeline using pattern matching
+reuse_components = components.search_components("text_encoder_2|scheduler|vae|tokenizer_2")
+refiner_pipe.update_components(**reuse_components)
+```
+
+When we reuse components from the "t2i" collection, they automatically get added to the "refiner" collection as well. You can verify this by checking the Components Manager - you'll see components like `vae`, `scheduler`, etc. listed under both collections, indicating they're shared between workflows.
+
+Now we can refine any of our generated latents:
+
+```py
+# Refine all our different latents
+refined_latents = refiner_pipe(image_latents=latents_t2i, prompt=prompt, num_inference_steps=10, output="latents")
+refined_image = decoder_node(latents=refined_latents, output="images")[0]
+refined_image.save("modular_part2_t2i_refine_out.png")
+
+refined_latents = refiner_pipe(image_latents=latents_lora, prompt=prompt, num_inference_steps=10, output="latents")
+refined_image = decoder_node(latents=refined_latents, output="images")[0]
+refined_image.save("modular_part2_lora_refine_out.png")
+```
+
+
+Here are the results from our modular pipeline examples.
+
+#### Base Text-to-Image Generation
+| Base Text-to-Image | Base Text-to-Image (Refined) |
+|-------------------|------------------------------|
+| ![Base T2I](https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/modular_quicktour/modular_part2_t2i.png) | ![Base T2I Refined](https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/modular_quicktour/modular_part2_t2i_refine_out.png) |
+
+#### LoRA
+| LoRA              | LoRA               (Refined) |
+|-------------------|------------------------------|
+| ![LoRA](https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/modular_quicktour/modular_part2_lora.png) | ![LoRA Refined](https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/modular_quicktour/modular_part2_lora_refine_out.png) |
+
--- a/docs/source/en/modular_diffusers/getting_started.md
+++ b/docs/source/en/modular_diffusers/getting_started.md
@@ -1219,402 +1219,4 @@ image = pipeline(
 image.save("modular_ipa_out.png")
 ```

-## Building Advanced Workflows: The Modular Way
-
-We've learned the basic components of the Modular Diffusers System. Now let's tie everything together with more practical example that demonstrates the true power of Modular Diffusers: working between with multiple pipelines that can share components. 
-
-In this example, we'll generate latents from a text-to-image pipeline, then refine them with an image-to-image pipeline. We will use IP-adapter, LoRA, and ControlNet.
-
-### Base Text-to-Image
-
-Let's setup the text-to-image workflow. Instead of putting all blocks into one complete pipeline, we'll create separate `text_blocks` for encoding prompts, `t2i_blocks` for generating latents, and `decoder_blocks` for creating final images.
-
-
-```py
-import torch
-from diffusers.modular_pipelines import SequentialPipelineBlocks
-from diffusers.modular_pipelines.stable_diffusion_xl import ALL_BLOCKS
-
-# create t2i blocks and then pop out the text_encoder step and decoder step so that we can use them in standalone manner
-t2i_blocks = SequentialPipelineBlocks.from_blocks_dict(ALL_BLOCKS["text2img"])
-text_blocks = t2i_blocks.sub_blocks.pop("text_encoder")
-decoder_blocks = t2i_blocks.sub_blocks.pop("decode")
-```
-
-Next, convert them into runnable pipelines. We'll use a Components Manager with an auto offloading strategy.
-
-**Components Manager**: Create one manager and pass it to `init_pipeline` along with a collection name. All models loaded by that pipeline will be added to the manager under that collection.
-
-**Auto Offloading**: All components are placed on CPU and only moved to device right before their forward pass. The manager monitors device memory and may move components off-device to make space for new ones. Unlike `DiffusionPipeline.enable_model_cpu_offload()`, this works across all components in the manager and all your workflows.
-
-
-```py
-from diffusers import ComponentsManager
-# Set up component manager and turn on the offloading
-components = ComponentsManager()
-components.enable_auto_cpu_offload(device="cuda")
-```
-
-Since we have a modular setup where different pipelines may share components, we recommend using a separate `ModularPipeline` to load all components at once and then adding them to each pipeline with `update_components()`.
-
-
-```py
-from diffusers import ModularPipeline
-t2i_repo = "YiYiXu/modular-demo-auto"
-t2i_loader_pipe = ModularPipeline.from_pretrained(t2i_repo, components_manager=components, collection="t2i")
-
-text_node = text_blocks.init_pipeline(t2i_repo, components_manager=components)
-decoder_node = decoder_blocks.init_pipeline(t2i_repo, components_manager=components)
-t2i_pipe = t2i_blocks.init_pipeline(t2i_repo, components_manager=components)
-```
-
-We'll load components in `t2i_loader_pipe`. You can get the list of all loadable components from the loader's `pretrained_component_names` property.
-
-```py
->>> t2i_loader_pipe.pretrained_component_names
-['controlnet', 'image_encoder', 'scheduler', 'text_encoder', 'text_encoder_2', 'tokenizer', 'tokenizer_2', 'unet', 'vae']
-```
-
-The list includes `controlnet` and `image_encoder` (for IP-Adapter), which we don't need yet. We'll load them anyway since they stay on CPU and may be useful later, but you can choose what to load with the `names` argument.
-
-```py
-import torch
-# inspect before you load
-# t2i_loader
-t2i_loader_pipe.load_components(names=t2i_loader_pipe.pretrained_component_names, torch_dtype=torch.float16)
-```
-All the models are registered to the components manager under the collection "t2i".
-
-```py
->>> components
-Components:
-============================================================================================================================================================
-Models:
------------------------------------------------------------------------------------------------------------------------------------------------------------
-Name           | Class                        | Device: act(exec)| Dtype        | Size (GB)| Load ID                                            | Collection
------------------------------------------------------------------------------------------------------------------------------------------------------------
-vae            | AutoencoderKL                | cpu(cuda:0)      | torch.float16| 0.16     | SG161222/RealVisXL_V4.0|vae|null|null              | t2i
-image_encoder  | CLIPVisionModelWithProjection| cpu(cuda:0)      | torch.float16| 3.44     | h94/IP-Adapter|sdxl_models/image_encoder|null|null | t2i
-text_encoder   | CLIPTextModel                | cpu(cuda:0)      | torch.float16| 0.23     | SG161222/RealVisXL_V4.0|text_encoder|null|null     | t2i
-unet           | UNet2DConditionModel         | cpu(cuda:0)      | torch.float16| 4.78     | SG161222/RealVisXL_V4.0|unet|null|null             | t2i
-text_encoder_2 | CLIPTextModelWithProjection  | cpu(cuda:0)      | torch.float16| 1.29     | SG161222/RealVisXL_V4.0|text_encoder_2|null|null   | t2i
-controlnet     | ControlNetModel              | cpu(cuda:0)      | torch.float16| 2.33     | diffusers/controlnet-canny-sdxl-1.0|null|null|null | t2i
------------------------------------------------------------------------------------------------------------------------------------------------------------
-
-Other Components:
------------------------------------------------------------------------------------------------------------------------------------------------------------
-Name           | Class                        | Collection
------------------------------------------------------------------------------------------------------------------------------------------------------------
-tokenizer_2    | CLIPTokenizer                | t2i
-tokenizer      | CLIPTokenizer                | t2i
-scheduler      | EulerDiscreteScheduler       | t2i
------------------------------------------------------------------------------------------------------------------------------------------------------------
-
-Additional Component Info:
-==================================================
-```
-
-Let's add the loaded components to each pipeline. We'll follow this pattern for each pipeline:
-1. Check what components the pipeline needs: inspect `pipeline` or use `pipeline.null_component_names`
-2. Get them from the components manager: use its `search_components()`/`get_one()`/`get_components_by_names()` methods
-3. Update the pipeline: `pipeline.update_components()`
-4. Verify the components are loaded correctly: inspect `pipeline` as well as components manager
-
-We will start with `decoder_node`. First, check what components it needs:
-
-```py
->>> decoder_node.null_component_names
-['vae']
-```
-The pipeline only needs a `vae`. Looking at the components manager table, there's only one VAE available:
-
-```
-Name | Class        | Device: act(exec)| Dtype        | Size (GB)| Load ID                               | Collection
----------------------------------------------------------------------------------------------------------------------
-vae  | AutoencoderKL| cpu(cuda:0)      | torch.float16| 0.16     | SG161222/RealVisXL_V4.0|vae|null|null | t2i
-```
-Since there's only one VAE, we can get it using its unique Load ID:
-
-```py
-vae = components.get_one(load_id="SG161222/RealVisXL_V4.0|vae|null|null")
-decoder_node.update_components(vae=vae)
-```
-
-Verify it's correctly loaded:
-
-```py
-decoder_node
-```
-Now let's do the same for `text_node`. Get the list of components the pipeline needs to load:
-
-```py
->>> text_node.null_component_names
-['text_encoder', 'text_encoder_2', 'tokenizer', 'tokenizer_2']
-```
-Pass the list directly to the components manager to get the components, then add them to the pipeline:
-
-```py
-text_components = components.get_components_by_names(text_node.null_component_names)
-# Add components to pipeline
-text_node.update_components(**text_components)
-
-# Verify components are loaded
-assert not text_node.null_component_names
-text_node
-```
-
-Finally, let's set up `t2i_pipe`:
-
-```py
-
-# Get unet & scheduler from components manager and add to pipeline
-comps = components.get_components_by_names(t2i_pipe.null_component_names)
-t2i_pipe.update_components(**comps)
-
-# Verify everything is loaded
-assert not t2i_pipe.null_component_names
-t2i_pipe
-
-# Verify components manager hasn't changed (we only reused existing components)
-components
-```
-
-We can start to generate an image with the t2i pipeline.
-
-First, run the prompt through `text_node` to get the prompt embeddings.
-
-<Tip>
-
-💡 Don't forget to check `text_node.doc` to find out what outputs are available, and set the `output` argument accordingly.
-
-</Tip>
-
-```py
-prompt = "an astronaut"
-text_embeddings = text_node(prompt=prompt, output=["prompt_embeds","negative_prompt_embeds", "pooled_prompt_embeds", "negative_pooled_prompt_embeds"])
-```
-
-Now generate latents with the t2i pipeline, then decode them with the decoder.
-
-
-```py
-generator = torch.Generator(device="cuda").manual_seed(0)
-latents_t2i = t2i_pipe(**text_embeddings, num_inference_steps=25, generator=generator, output="latents")
-image = decoder_node(latents=latents_t2i, output="images")[0]
-image.save("modular_part2_t2i.png")
-
-```
-
-### LoRA
-
-Now let's add a LoRA to our pipeline. With the modular approach, we can reuse intermediate outputs from blocks that would otherwise need to be re-run. Let's load the LoRA weights and see what happens:
-
-```py
-t2i_loader_pipe.load_lora_weights("CiroN2022/toy-face", weight_name="toy_face_sdxl.safetensors", adapter_name="toy_face")
-components
-```
-Notice that the "Additional Component Info" section shows that only the `unet` component has the LoRA adapter loaded. This means we can skip the text encoding step and reuse the existing embeddings, making the generation much faster.
-
-```out
-Components:
-============================================================================================================================================================
-...
-Additional Component Info:
-==================================================
-
-unet:
-  Adapters: ['toy_face']
-```
-
-
-<Tip>
-
-🔍 Alternatively, you can find a component's ID and then use `get_model_info` to get detailed metadata about that component:
-
-```py
-model_id = components.get_ids("unet")[0]
-components.get_model_info(model_id)
-# {'model_id': 'unet_6c2b839d-ec39-4ce9-8741-333ba6d25932', 'added_time': 1751101289.203884, 'collection': 't2i', 'class_name': 'UNet2DConditionModel', 'size_gb': 4.940812595188618, 'adapters': ['toy_face'], 'has_hook': True, 'execution_device': device(type='cuda', index=0)}
-```
-</Tip>
-
-
-```py
-generator = torch.Generator(device="cuda").manual_seed(0)
-latents_lora = t2i_pipe(**text_embeddings, num_inference_steps=25, generator=generator, output="latents")
-image = decoder_node(latents=latents_lora, output="images")[0]
-image.save("modular_part2_lora.png")
-```
-
-### IP-Adapter
-
-IP-Adapter can also be used as a standalone pipeline. We can generate the embeddings once and reuse them for different workflows.
-
-```py
-from diffusers.utils import load_image
-
-ipa_blocks = ALL_BLOCKS["ip_adapter"]["ip_adapter"]()
-ipa_node = ipa_blocks.init_pipeline(t2i_repo, components_manager=components)
-comps = components.get_components_by_names(ipa_node.null_component_names)
-ipa_node.update_components(**comps)
-
-t2i_loader_pipe.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin")
-t2i_loader_pipe.set_ip_adapter_scale(0.6)
-
-# check it's correctly loaded
-assert not ipa_node.null_component_names
-ipa_node
-# find out inputs/outputs 
-print(ipa_node.doc)
-
-ip_adapter_image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/style_ziggy/img5.png")
-ipa_embeddings = ipa_node(ip_adapter_image=ip_adapter_image, output=["ip_adapter_embeds","negative_ip_adapter_embeds"])
-
-generator = torch.Generator(device="cuda").manual_seed(0)
-latents_ipa = t2i_pipe(**text_embeddings, **ipa_embeddings, num_inference_steps=25, generator=generator, output="latents")
-
-image = decoder_node(latents=latents_ipa, output="images")[0]
-image.save("modular_part2_lora_ipa.png")
-```
-
-### ControlNet
-
-We can create a new ControlNet workflow by modifying the pipeline blocks, reusing components as much as possible, and see how it affects the generation.
-
-We want to use a different ControlNet from the one that's already loaded.
-
-```py
-from diffusers import ComponentSpec, ControlNetModel
-control_blocks = ALL_BLOCKS["controlnet"]["denoise"]()
-# update the t2i_blocks and create pipeline
-t2i_blocks.sub_blocks["denoise"] = control_blocks
-t2i_control_pipe = t2i_blocks.init_pipeline(t2i_repo, components_manager=components)
-
-# fetch the controlnet_pose separately since we need to change its name when adding it to the pipeline
-controlnet_spec = ComponentSpec(name="controlnet_pose", type_hint=ControlNetModel, repo="thibaud/controlnet-openpose-sdxl-1.0")
-controlnet = controlnet_spec.load(torch_dtype=torch.float16)
-t2i_control_pipe.update_components(controlnet=controlnet)
-
-# fetch the rest of the components from the components manager
-comps = components.get_components_by_names(t2i_control_pipe.null_component_names)
-t2i_control_pipe.update_components(**comps)
-
-control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/controlnet/person_pose.png")
-generator = torch.Generator(device="cuda").manual_seed(0)
-latents_control = t2i_control_pipe(**text_embeddings, **ipa_embeddings, control_image=control_image, num_inference_steps=25, generator=generator, output="latents")
-
-image = decoder_node(latents=latents_control, output="images")[0]
-image.save("modular_part2_lora_ipa_control.png")
-```
-
-
-### Refiner
-
-Now set up the refiner workflow. For the refiner blocks, we remove `image_encoder` since the refiner works with latents directly, and `decode` since we already have a dedicated decoder. We keep `text_encoder` because the SDXL refiner encodes text prompts differently from the text-to-image pipeline, so we cannot share it.
-
-```py
-# Create the refiner blocks
-# - remove image_encoder since we'll use latents from t2i
-# - remove decode since we already created separate decoder_blocks
-refiner_blocks = SequentialPipelineBlocks.from_blocks_dict(ALL_BLOCKS["img2img"])
-refiner_blocks.sub_blocks.pop("image_encoder")
-refiner_blocks.sub_blocks.pop("decode")
-```
-
-Create the refiner pipeline. The refiner has a different unet and uses only one text encoder, so it is hosted in a different repo. We pass the same components manager to the refiner pipeline, along with a unique "refiner" collection.
-
-```py
-refiner_repo = "YiYiXu/modular_refiner"
-refiner_pipe = refiner_blocks.init_pipeline(refiner_repo, components_manager=components, collection="refiner")
-```
-
-
-We want to reuse components from the t2i pipeline in the refiner as much as possible. First, let's check the loading status of the refiner pipeline to understand what components are needed:
-
-```py
->>> refiner_pipe
-```
-
-Looking at the loader output, you can see that `text_encoder` and `tokenizer` have empty loading spec maps (their `repo` fields are `null`). This is because the refiner pipeline does not use these two components, so they are not listed in the `modular_model_index.json` in `refiner_repo`. The `unet` is different from the one we loaded for text-to-image. The remaining components (`vae`, `text_encoder_2`, `tokenizer_2`, and `scheduler`) are already available in the t2i collection, so we can reuse them instead of loading duplicates.
-
-```py
-refiner_pipe.load_components(names="unet", torch_dtype=torch.float16)
-
-# verify loaded correctly
-refiner_pipe
-
-# verify it is registered to the components manager under the "refiner" collection
-components
-```
-
-Now let's reuse the components from the t2i pipeline in the refiner. We use `|` to select multiple components from the components manager at once:
-
-```py
-# Reuse components from t2i pipeline (select everything at once)
-reuse_components = components.search_components("text_encoder_2|scheduler|vae|tokenizer_2")
-refiner_pipe.update_components(**reuse_components)
-```
-
-You'll see warnings indicating that these components already exist in the components manager:
-
-```out
-component 'text_encoder_2' already exists as 'text_encoder_2_238ae9a7-c864-4837-a8a2-f58ed753b2d0'
-component 'tokenizer_2' already exists as 'tokenizer_2_b795af3d-f048-4b07-a770-9e8237a2be2d'
-component 'scheduler' already exists as 'scheduler_e3435f63-266a-4427-9383-eb812e830fe8'
-component 'vae' already exists as 'vae_357eee6a-4a06-46f1-be83-494f7d60ca69'
-```
-
-These warnings are expected and indicate that the components manager is correctly identifying that these components are already loaded. The system will reuse the existing components rather than creating duplicates.
-
-Let's check the components manager again to see the updated state. You should see `text_encoder_2`, `vae`, `tokenizer_2`, and `scheduler` now appear under both "t2i" and "refiner" collections.
-
-Now let's refine! 
-
-```py
-# refine the latents from base text-to-image workflow
-refined_latents = refiner_pipe(image_latents=latents_t2i, prompt=prompt, num_inference_steps=10, output="latents")
-refined_image = decoder_node(latents=refined_latents, output="images")[0]
-refined_image.save("modular_part2_t2i_refine_out.png")
-
-# refine the latents from the text-to-image lora workflow
-refined_latents = refiner_pipe(image_latents=latents_lora, prompt=prompt, num_inference_steps=10, output="latents")
-refined_image = decoder_node(latents=refined_latents, output="images")[0]
-refined_image.save("modular_part2_lora_refine_out.png")
-
-# refine the latents from the text-to-image + lora + ip-adapter workflow
-refined_latents = refiner_pipe(image_latents=latents_ipa, prompt=prompt, num_inference_steps=10, output="latents")
-refined_image = decoder_node(latents=refined_latents, output="images")[0]
-refined_image.save("modular_part2_ipa_refine_out.png")
-
-# refine the latents from the text-to-image + lora + ip-adapter + controlnet workflow
-refined_latents = refiner_pipe(image_latents=latents_control, prompt=prompt, num_inference_steps=10, output="latents")
-refined_image = decoder_node(latents=refined_latents, output="images")[0]
-refined_image.save("modular_part2_control_refine_out.png")
-```
-
-
-### Results
-
-Here are the results from our modular pipeline examples.
-
-#### Base Text-to-Image Generation
-| Base Text-to-Image | Base Text-to-Image (Refined) |
-|-------------------|------------------------------|
-| ![Base T2I](https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/modular_quicktour/modular_part2_t2i.png) | ![Base T2I Refined](https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/modular_quicktour/modular_part2_t2i_refine_out.png) |
-
-#### LoRA
-| LoRA              | LoRA               (Refined) |
-|-------------------|------------------------------|
-| ![LoRA](https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/modular_quicktour/modular_part2_lora.png) | ![LoRA Refined](https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/modular_quicktour/modular_part2_lora_refine_out.png) |
-
-#### LoRA + IP-Adapter
-| LoRA + IP-Adapter | LoRA + IP-Adapter (Refined) |
-|-------------------|------------------------------|
-| ![LoRA + IP-Adapter](https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/modular_quicktour/modular_part2_ipa.png) | ![LoRA + IP-Adapter Refined](https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/modular_quicktour/modular_part2_ipa_refine_out.png) |
-
-#### ControlNet + LoRA + IP-Adapter
-| ControlNet + LoRA + IP-Adapter | ControlNet + LoRA + IP-Adapter (Refined) |
-|-------------------|------------------------------|
-| ![ControlNet + LoRA + IP-Adapter](https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/modular_quicktour/modular_part2_control.png) | ![ControlNet + LoRA + IP-Adapter Refined](https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/modular_quicktour/modular_part2_control_refine_out.png) |
-