diff --git a/docs/source/en/api/pipelines/pia.md b/docs/source/en/api/pipelines/pia.md
index 4793829aed..7bd480b49a 100644
--- a/docs/source/en/api/pipelines/pia.md
+++ b/docs/source/en/api/pipelines/pia.md
@@ -1,4 +1,4 @@
-
+> [!WARNING]
+> This pipeline is deprecated but it can still be used. However, we won't test the pipeline anymore and won't accept any changes to it. If you run into any issues, reinstall the last Diffusers version that supported this model.
+
# Image-to-Video Generation with PIA (Personalized Image Animator)
diff --git a/docs/source/en/api/pipelines/pix2pix.md b/docs/source/en/api/pipelines/pix2pix.md
index d0b3bf32b8..20a74577c1 100644
--- a/docs/source/en/api/pipelines/pix2pix.md
+++ b/docs/source/en/api/pipelines/pix2pix.md
@@ -1,4 +1,4 @@
-
+
+# QwenImage
+
+Qwen-Image from the Qwen team is an image generation foundation model in the Qwen series that achieves significant advances in complex text rendering and precise image editing. Experiments show strong general capabilities in both image generation and editing, with exceptional performance in text rendering, especially for Chinese.
+
+Check out the model card [here](https://huggingface.co/Qwen/Qwen-Image) to learn more.
+
+
+
+Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
+
+
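+A minimal text-to-image sketch is shown below. It assumes the [Qwen/Qwen-Image](https://huggingface.co/Qwen/Qwen-Image) checkpoint linked above and only uses the most common arguments (`prompt`, `num_inference_steps`); see the API reference below for the full signature.
+
+```py
+import torch
+from diffusers import QwenImagePipeline
+
+pipeline = QwenImagePipeline.from_pretrained(
+    "Qwen/Qwen-Image", torch_dtype=torch.bfloat16
+).to("cuda")
+
+prompt = "A cozy coffee shop storefront with a chalkboard sign that reads 'Qwen Coffee'"
+image = pipeline(prompt=prompt, num_inference_steps=50).images[0]
+image.save("qwen_image.png")
+```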
+
+## QwenImagePipeline
+
+[[autodoc]] QwenImagePipeline
+ - all
+ - __call__
+
+## QwenImagePipelineOutput
+
+[[autodoc]] pipelines.qwenimage.pipeline_output.QwenImagePipelineOutput
diff --git a/docs/source/en/api/pipelines/sana.md b/docs/source/en/api/pipelines/sana.md
index a66267c094..7491689fd8 100644
--- a/docs/source/en/api/pipelines/sana.md
+++ b/docs/source/en/api/pipelines/sana.md
@@ -1,4 +1,4 @@
-
+> [!WARNING]
+> This pipeline is deprecated but it can still be used. However, we won't test the pipeline anymore and won't accept any changes to it. If you run into any issues, reinstall the last Diffusers version that supported this model.
+
# Self-Attention Guidance
[Improving Sample Quality of Diffusion Models Using Self-Attention Guidance](https://huggingface.co/papers/2210.00939) is by Susung Hong et al.
diff --git a/docs/source/en/api/pipelines/semantic_stable_diffusion.md b/docs/source/en/api/pipelines/semantic_stable_diffusion.md
index b9aacd3518..1ce44cf2de 100644
--- a/docs/source/en/api/pipelines/semantic_stable_diffusion.md
+++ b/docs/source/en/api/pipelines/semantic_stable_diffusion.md
@@ -1,4 +1,4 @@
-
+> [!WARNING]
+> This pipeline is deprecated but it can still be used. However, we won't test the pipeline anymore and won't accept any changes to it. If you run into any issues, reinstall the last Diffusers version that supported this model.
+
# Semantic Guidance
Semantic Guidance for Diffusion Models was proposed in [SEGA: Instructing Text-to-Image Models using Semantic Guidance](https://huggingface.co/papers/2301.12247) and provides strong semantic control over image generation.
diff --git a/docs/source/en/api/pipelines/shap_e.md b/docs/source/en/api/pipelines/shap_e.md
index 3c1f939c1f..5e5af0656a 100644
--- a/docs/source/en/api/pipelines/shap_e.md
+++ b/docs/source/en/api/pipelines/shap_e.md
@@ -1,4 +1,4 @@
-
+
+
+
+# SkyReels-V2: Infinite-length Film Generative model
+
+[SkyReels-V2](https://huggingface.co/papers/2504.13074) by the SkyReels Team.
+
+*Recent advances in video generation have been driven by diffusion models and autoregressive frameworks, yet critical challenges persist in harmonizing prompt adherence, visual quality, motion dynamics, and duration: compromises in motion dynamics to enhance temporal visual quality, constrained video duration (5-10 seconds) to prioritize resolution, and inadequate shot-aware generation stemming from general-purpose MLLMs' inability to interpret cinematic grammar, such as shot composition, actor expressions, and camera motions. These intertwined limitations hinder realistic long-form synthesis and professional film-style generation. To address these limitations, we propose SkyReels-V2, an Infinite-length Film Generative Model, that synergizes Multi-modal Large Language Model (MLLM), Multi-stage Pretraining, Reinforcement Learning, and Diffusion Forcing Framework. Firstly, we design a comprehensive structural representation of video that combines the general descriptions by the Multi-modal LLM and the detailed shot language by sub-expert models. Aided with human annotation, we then train a unified Video Captioner, named SkyCaptioner-V1, to efficiently label the video data. Secondly, we establish progressive-resolution pretraining for the fundamental video generation, followed by a four-stage post-training enhancement: Initial concept-balanced Supervised Fine-Tuning (SFT) improves baseline quality; Motion-specific Reinforcement Learning (RL) training with human-annotated and synthetic distortion data addresses dynamic artifacts; Our diffusion forcing framework with non-decreasing noise schedules enables long-video synthesis in an efficient search space; Final high-quality SFT refines visual fidelity. All the code and models are available at [this https URL](https://github.com/SkyworkAI/SkyReels-V2).*
+
+You can find all the original SkyReels-V2 checkpoints under the [Skywork](https://huggingface.co/collections/Skywork/skyreels-v2-6801b1b93df627d441d0d0d9) organization.
+
+The following SkyReels-V2 models are supported in Diffusers:
+- [SkyReels-V2 DF 1.3B - 540P](https://huggingface.co/Skywork/SkyReels-V2-DF-1.3B-540P-Diffusers)
+- [SkyReels-V2 DF 14B - 540P](https://huggingface.co/Skywork/SkyReels-V2-DF-14B-540P-Diffusers)
+- [SkyReels-V2 DF 14B - 720P](https://huggingface.co/Skywork/SkyReels-V2-DF-14B-720P-Diffusers)
+- [SkyReels-V2 T2V 14B - 540P](https://huggingface.co/Skywork/SkyReels-V2-T2V-14B-540P-Diffusers)
+- [SkyReels-V2 T2V 14B - 720P](https://huggingface.co/Skywork/SkyReels-V2-T2V-14B-720P-Diffusers)
+- [SkyReels-V2 I2V 1.3B - 540P](https://huggingface.co/Skywork/SkyReels-V2-I2V-1.3B-540P-Diffusers)
+- [SkyReels-V2 I2V 14B - 540P](https://huggingface.co/Skywork/SkyReels-V2-I2V-14B-540P-Diffusers)
+- [SkyReels-V2 I2V 14B - 720P](https://huggingface.co/Skywork/SkyReels-V2-I2V-14B-720P-Diffusers)
+- [SkyReels-V2 FLF2V 1.3B - 540P](https://huggingface.co/Skywork/SkyReels-V2-FLF2V-1.3B-540P-Diffusers)
+
+> [!TIP]
+> Click on the SkyReels-V2 models in the right sidebar for more examples of video generation.
+
+### A _Visual_ Demonstration
+
+ An example with these parameters:
+ base_num_frames=97, num_frames=97, num_inference_steps=30, ar_step=5, causal_block_size=5
+
+ vae_scale_factor_temporal -> 4
+ num_latent_frames: (97-1)//vae_scale_factor_temporal+1 = 25 frames -> 5 blocks of 5 frames each
+
+ base_num_latent_frames = (97-1)//vae_scale_factor_temporal+1 = 25 → blocks = 25//5 = 5 blocks
+    These 5 blocks mean that the maximum context length of the model is 25 frames in the latent space.
+
+ Asynchronous Processing Timeline:
+ ┌─────────────────────────────────────────────────────────────────┐
+ │ Steps: 1 6 11 16 21 26 31 36 41 46 50 │
+ │ Block 1: [■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■] │
+ │ Block 2: [■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■] │
+ │ Block 3: [■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■] │
+ │ Block 4: [■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■] │
+ │ Block 5: [■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■] │
+ └─────────────────────────────────────────────────────────────────┘
+
+ For Long Videos (num_frames > base_num_frames):
+ base_num_frames acts as the "sliding window size" for processing long videos.
+
+ Example: 257-frame video with base_num_frames=97, overlap_history=17
+ ┌──── Iteration 1 (frames 1-97) ────┐
+ │ Processing window: 97 frames │ → 5 blocks, async processing
+ │ Generates: frames 1-97 │
+ └───────────────────────────────────┘
+ ┌────── Iteration 2 (frames 81-177) ──────┐
+ │ Processing window: 97 frames │
+ │ Overlap: 17 frames (81-97) from prev │ → 5 blocks, async processing
+ │ Generates: frames 98-177 │
+ └─────────────────────────────────────────┘
+ ┌────── Iteration 3 (frames 161-257) ──────┐
+ │ Processing window: 97 frames │
+ │ Overlap: 17 frames (161-177) from prev │ → 5 blocks, async processing
+ │ Generates: frames 178-257 │
+ └──────────────────────────────────────────┘
+
+ Each iteration independently runs the asynchronous processing with its own 5 blocks.
+ base_num_frames controls:
+ 1. Memory usage (larger window = more VRAM)
+ 2. Model context length (must match training constraints)
+ 3. Number of blocks per iteration (base_num_latent_frames // causal_block_size)
+
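+The sliding-window iterations above can be reproduced with a short standalone sketch (illustrative only, not the pipeline's internal code):
+
+```py
+def sliding_windows(num_frames=257, base_num_frames=97, overlap_history=17):
+    # Each window covers `base_num_frames` frames and reuses the last
+    # `overlap_history` frames of the previous window as conditioning.
+    windows = []
+    start = 1
+    while True:
+        end = start + base_num_frames - 1
+        windows.append((start, end))
+        if end >= num_frames:
+            break
+        start = end - overlap_history + 1
+    return windows
+
+print(sliding_windows())  # [(1, 97), (81, 177), (161, 257)]
+```
+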
+ Each block takes 30 steps to complete denoising.
+ Block N starts at step: 1 + (N-1) x ar_step
+ Total steps: 30 + (5-1) x 5 = 50 steps
+
+
+ Synchronous mode (ar_step=0) would process all blocks/frames simultaneously:
+ ┌──────────────────────────────────────────────┐
+ │ Steps: 1 ... 30 │
+ │ All blocks: [■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■] │
+ └──────────────────────────────────────────────┘
+ Total steps: 30 steps
+
+
+ An example on how the step matrix is constructed for asynchronous processing:
+ Given the parameters: (num_inference_steps=30, flow_shift=8, num_frames=97, ar_step=5, causal_block_size=5)
+ - num_latent_frames = (97 frames - 1) // (4 temporal downsampling) + 1 = 25
+ - step_template = [999, 995, 991, 986, 980, 975, 969, 963, 956, 948,
+ 941, 932, 922, 912, 901, 888, 874, 859, 841, 822,
+ 799, 773, 743, 708, 666, 615, 551, 470, 363, 216]
+
+ The algorithm creates a 50x25 step_matrix where:
+ - Row 1: [999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999]
+ - Row 2: [995, 995, 995, 995, 995, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999]
+ - Row 3: [991, 991, 991, 991, 991, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999]
+ - ...
+ - Row 7: [969, 969, 969, 969, 969, 995, 995, 995, 995, 995, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999]
+ - ...
+ - Row 21: [799, 799, 799, 799, 799, 888, 888, 888, 888, 888, 941, 941, 941, 941, 941, 975, 975, 975, 975, 975, 999, 999, 999, 999, 999]
+ - ...
+ - Row 35: [ 0, 0, 0, 0, 0, 216, 216, 216, 216, 216, 666, 666, 666, 666, 666, 822, 822, 822, 822, 822, 901, 901, 901, 901, 901]
+ - ...
+ - Row 42: [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 551, 551, 551, 551, 551, 773, 773, 773, 773, 773]
+ - ...
+ - Row 50: [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 216, 216, 216, 216, 216]
+
+ Detailed Row 6 Analysis:
+ - step_matrix[5]: [ 975, 975, 975, 975, 975, 999, 999, 999, 999, 999, 999, ..., 999]
+ - step_index[5]: [ 6, 6, 6, 6, 6, 1, 1, 1, 1, 1, 0, ..., 0]
+ - step_update_mask[5]: [True,True,True,True,True,True,True,True,True,True,False, ...,False]
+ - valid_interval[5]: (0, 25)
+
+ Key Pattern: Block i lags behind Block i-1 by exactly ar_step=5 timesteps, creating the
+ staggered "diffusion forcing" effect where later blocks condition on cleaner earlier blocks.
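+
+The staggered schedule above can be reproduced with a short standalone sketch. This is illustrative only, not the pipeline's internal implementation; the timestep template is copied from the example values above:
+
+```py
+import numpy as np
+
+num_inference_steps = 30
+ar_step = 5
+causal_block_size = 5
+num_latent_frames = (97 - 1) // 4 + 1                  # 25
+num_blocks = num_latent_frames // causal_block_size    # 5
+total_steps = num_inference_steps + (num_blocks - 1) * ar_step  # 30 + 4 x 5 = 50
+
+step_template = np.array([999, 995, 991, 986, 980, 975, 969, 963, 956, 948,
+                          941, 932, 922, 912, 901, 888, 874, 859, 841, 822,
+                          799, 773, 743, 708, 666, 615, 551, 470, 363, 216])
+
+step_matrix = np.zeros((total_steps, num_latent_frames), dtype=int)
+for row in range(total_steps):
+    for block in range(num_blocks):
+        local_step = row - block * ar_step   # each block starts ar_step rows after the previous one
+        if local_step < 0:
+            value = step_template[0]         # not started yet: still at maximum noise
+        elif local_step >= num_inference_steps:
+            value = 0                        # finished: fully denoised
+        else:
+            value = step_template[local_step]
+        cols = slice(block * causal_block_size, (block + 1) * causal_block_size)
+        step_matrix[row, cols] = value
+
+print(step_matrix.shape)  # (50, 25)
+print(step_matrix[20])    # matches "Row 21" above
+```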
+
+### Text-to-Video Generation
+
+The example below demonstrates how to generate a video from text.
+
+
+
+
+Refer to the [Reduce memory usage](../../optimization/memory) guide for more details about the various memory saving techniques.
+
+From the original repo:
+>You can use --ar_step 5 to enable asynchronous inference. When asynchronous inference, --causal_block_size 5 is recommended while it is not supposed to be set for synchronous generation... Asynchronous inference will take more steps to diffuse the whole sequence which means it will be SLOWER than synchronous mode. In our experiments, asynchronous inference may improve the instruction following and visual consistent performance.
+
+```py
+# pip install ftfy
+import torch
+from diffusers import AutoModel, SkyReelsV2DiffusionForcingPipeline, UniPCMultistepScheduler
+from diffusers.utils import export_to_video
+
+vae = AutoModel.from_pretrained("Skywork/SkyReels-V2-DF-14B-540P-Diffusers", subfolder="vae", torch_dtype=torch.float32)
+transformer = AutoModel.from_pretrained("Skywork/SkyReels-V2-DF-14B-540P-Diffusers", subfolder="transformer", torch_dtype=torch.bfloat16)
+
+pipeline = SkyReelsV2DiffusionForcingPipeline.from_pretrained(
+ "Skywork/SkyReels-V2-DF-14B-540P-Diffusers",
+ vae=vae,
+ transformer=transformer,
+ torch_dtype=torch.bfloat16
+)
+flow_shift = 8.0 # 8.0 for T2V, 5.0 for I2V
+pipeline.scheduler = UniPCMultistepScheduler.from_config(pipeline.scheduler.config, flow_shift=flow_shift)
+pipeline = pipeline.to("cuda")
+
+prompt = "A cat and a dog baking a cake together in a kitchen. The cat is carefully measuring flour, while the dog is stirring the batter with a wooden spoon. The kitchen is cozy, with sunlight streaming through the window."
+
+output = pipeline(
+ prompt=prompt,
+ num_inference_steps=30,
+ height=544, # 720 for 720P
+ width=960, # 1280 for 720P
+ num_frames=97,
+ base_num_frames=97, # 121 for 720P
+ ar_step=5, # Controls asynchronous inference (0 for synchronous mode)
+ causal_block_size=5, # Number of frames in each block for asynchronous processing
+ overlap_history=None, # Number of frames to overlap for smooth transitions in long videos; 17 for long video generations
+ addnoise_condition=20, # Improves consistency in long video generation
+).frames[0]
+export_to_video(output, "T2V.mp4", fps=24, quality=8)
+```
+
+
+
+
+### First-Last-Frame-to-Video Generation
+
+The example below demonstrates how to use the image-to-video pipeline to generate a video using a text description, a starting frame, and an ending frame.
+
+
+
+
+```python
+import numpy as np
+import torch
+import torchvision.transforms.functional as TF
+from diffusers import AutoencoderKLWan, SkyReelsV2DiffusionForcingImageToVideoPipeline, UniPCMultistepScheduler
+from diffusers.utils import export_to_video, load_image
+
+
+model_id = "Skywork/SkyReels-V2-DF-14B-720P-Diffusers"
+vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
+pipeline = SkyReelsV2DiffusionForcingImageToVideoPipeline.from_pretrained(
+ model_id, vae=vae, torch_dtype=torch.bfloat16
+)
+flow_shift = 5.0 # 8.0 for T2V, 5.0 for I2V
+pipeline.scheduler = UniPCMultistepScheduler.from_config(pipeline.scheduler.config, flow_shift=flow_shift)
+pipeline.to("cuda")
+
+first_frame = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/flf2v_input_first_frame.png")
+last_frame = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/flf2v_input_last_frame.png")
+
+def aspect_ratio_resize(image, pipeline, max_area=720 * 1280):
+ aspect_ratio = image.height / image.width
+ mod_value = pipeline.vae_scale_factor_spatial * pipeline.transformer.config.patch_size[1]
+ height = round(np.sqrt(max_area * aspect_ratio)) // mod_value * mod_value
+ width = round(np.sqrt(max_area / aspect_ratio)) // mod_value * mod_value
+ image = image.resize((width, height))
+ return image, height, width
+
+def center_crop_resize(image, height, width):
+    # Calculate the resize ratio needed to cover the target (first frame) dimensions
+    resize_ratio = max(width / image.width, height / image.height)
+
+    # Resize so the image covers the target size, then center-crop to it
+    image = image.resize((round(image.width * resize_ratio), round(image.height * resize_ratio)))
+    image = TF.center_crop(image, [height, width])  # center_crop expects (height, width)
+
+    return image, height, width
+
+first_frame, height, width = aspect_ratio_resize(first_frame, pipeline)
+if last_frame.size != first_frame.size:
+ last_frame, _, _ = center_crop_resize(last_frame, height, width)
+
+prompt = "CG animation style, a small blue bird takes off from the ground, flapping its wings. The bird's feathers are delicate, with a unique pattern on its chest. The background shows a blue sky with white clouds under bright sunshine. The camera follows the bird upward, capturing its flight and the vastness of the sky from a close-up, low-angle perspective."
+
+output = pipeline(
+ image=first_frame, last_image=last_frame, prompt=prompt, height=height, width=width, guidance_scale=5.0
+).frames[0]
+export_to_video(output, "output.mp4", fps=24, quality=8)
+```
+
+
+
+
+
+### Video-to-Video Generation
+
+
+
+
+`SkyReelsV2DiffusionForcingVideoToVideoPipeline` extends a given video.
+
+```python
+import numpy as np
+import torch
+import torchvision.transforms.functional as TF
+from diffusers import AutoencoderKLWan, SkyReelsV2DiffusionForcingVideoToVideoPipeline, UniPCMultistepScheduler
+from diffusers.utils import export_to_video, load_video
+
+
+model_id = "Skywork/SkyReels-V2-DF-14B-540P-Diffusers"
+vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
+pipeline = SkyReelsV2DiffusionForcingVideoToVideoPipeline.from_pretrained(
+ model_id, vae=vae, torch_dtype=torch.bfloat16
+)
+flow_shift = 5.0 # 8.0 for T2V, 5.0 for I2V
+pipeline.scheduler = UniPCMultistepScheduler.from_config(pipeline.scheduler.config, flow_shift=flow_shift)
+pipeline.to("cuda")
+
+video = load_video("input_video.mp4")
+
+prompt = "CG animation style, a small blue bird takes off from the ground, flapping its wings. The bird's feathers are delicate, with a unique pattern on its chest. The background shows a blue sky with white clouds under bright sunshine. The camera follows the bird upward, capturing its flight and the vastness of the sky from a close-up, low-angle perspective."
+
+output = pipeline(
+ video=video, prompt=prompt, height=544, width=960, guidance_scale=5.0,
+    num_inference_steps=30, num_frames=257, base_num_frames=97,  # optionally: ar_step=5, causal_block_size=5
+).frames[0]
+export_to_video(output, "output.mp4", fps=24, quality=8)
+# Total frames will be the number of frames in the given video + 257
+```
+
+
+
+
+
+## Notes
+
+- SkyReels-V2 supports LoRAs with [`~loaders.SkyReelsV2LoraLoaderMixin.load_lora_weights`].
+
+ ```py
+ # pip install ftfy
+ import torch
+ from diffusers import AutoModel, SkyReelsV2DiffusionForcingPipeline
+ from diffusers.utils import export_to_video
+
+ vae = AutoModel.from_pretrained(
+ "Skywork/SkyReels-V2-DF-1.3B-540P-Diffusers", subfolder="vae", torch_dtype=torch.float32
+ )
+ pipeline = SkyReelsV2DiffusionForcingPipeline.from_pretrained(
+ "Skywork/SkyReels-V2-DF-1.3B-540P-Diffusers", vae=vae, torch_dtype=torch.bfloat16
+ )
+ pipeline.to("cuda")
+
+ pipeline.load_lora_weights("benjamin-paine/steamboat-willie-1.3b", adapter_name="steamboat-willie")
+ pipeline.set_adapters("steamboat-willie")
+
+ pipeline.enable_model_cpu_offload()
+
+ # use "steamboat willie style" to trigger the LoRA
+ prompt = """
+ steamboat willie style, golden era animation, The camera rushes from far to near in a low-angle shot,
+ revealing a white ferret on a log. It plays, leaps into the water, and emerges, as the camera zooms in
+ for a close-up. Water splashes berry bushes nearby, while moss, snow, and leaves blanket the ground.
+ Birch trees and a light blue sky frame the scene, with ferns in the foreground. Side lighting casts dynamic
+ shadows and warm highlights. Medium composition, front view, low angle, with depth of field.
+ """
+
+ output = pipeline(
+ prompt=prompt,
+ num_frames=97,
+ guidance_scale=6.0,
+ ).frames[0]
+ export_to_video(output, "output.mp4", fps=24)
+ ```
+
+
+
+
+## SkyReelsV2DiffusionForcingPipeline
+
+[[autodoc]] SkyReelsV2DiffusionForcingPipeline
+ - all
+ - __call__
+
+## SkyReelsV2DiffusionForcingImageToVideoPipeline
+
+[[autodoc]] SkyReelsV2DiffusionForcingImageToVideoPipeline
+ - all
+ - __call__
+
+## SkyReelsV2DiffusionForcingVideoToVideoPipeline
+
+[[autodoc]] SkyReelsV2DiffusionForcingVideoToVideoPipeline
+ - all
+ - __call__
+
+## SkyReelsV2Pipeline
+
+[[autodoc]] SkyReelsV2Pipeline
+ - all
+ - __call__
+
+## SkyReelsV2ImageToVideoPipeline
+
+[[autodoc]] SkyReelsV2ImageToVideoPipeline
+ - all
+ - __call__
+
+## SkyReelsV2PipelineOutput
+
+[[autodoc]] pipelines.skyreels_v2.pipeline_output.SkyReelsV2PipelineOutput
\ No newline at end of file
diff --git a/docs/source/en/api/pipelines/stable_audio.md b/docs/source/en/api/pipelines/stable_audio.md
index 3f689ba0ad..82763a52a9 100644
--- a/docs/source/en/api/pipelines/stable_audio.md
+++ b/docs/source/en/api/pipelines/stable_audio.md
@@ -1,4 +1,4 @@
-
+> [!WARNING]
+> This pipeline is deprecated but it can still be used. However, we won't test the pipeline anymore and won't accept any changes to it. If you run into any issues, reinstall the last Diffusers version that supported this model.
+
# GLIGEN (Grounded Language-to-Image Generation)
The GLIGEN model was created by researchers and engineers from [University of Wisconsin-Madison, Columbia University, and Microsoft](https://github.com/gligen/GLIGEN). The [`StableDiffusionGLIGENPipeline`] and [`StableDiffusionGLIGENTextImagePipeline`] can generate photorealistic images conditioned on grounding inputs. Along with text and bounding boxes with [`StableDiffusionGLIGENPipeline`], if input images are given, [`StableDiffusionGLIGENTextImagePipeline`] can insert objects described by text at the region defined by bounding boxes. Otherwise, it'll generate an image described by the caption/prompt and insert objects described by text at the region defined by bounding boxes. It's trained on COCO2014D and COCO2014CD datasets, and the model uses a frozen CLIP ViT-L/14 text encoder to condition itself on grounding inputs.
diff --git a/docs/source/en/api/pipelines/stable_diffusion/image_variation.md b/docs/source/en/api/pipelines/stable_diffusion/image_variation.md
index 57dd2f0d5b..7a50971fdf 100644
--- a/docs/source/en/api/pipelines/stable_diffusion/image_variation.md
+++ b/docs/source/en/api/pipelines/stable_diffusion/image_variation.md
@@ -1,4 +1,4 @@
-
+> [!WARNING]
+> This pipeline is deprecated but it can still be used. However, we won't test the pipeline anymore and won't accept any changes to it. If you run into any issues, reinstall the last Diffusers version that supported this model.
+
# K-Diffusion
[k-diffusion](https://github.com/crowsonkb/k-diffusion) is a popular library created by [Katherine Crowson](https://github.com/crowsonkb/). We provide `StableDiffusionKDiffusionPipeline` and `StableDiffusionXLKDiffusionPipeline` that allow you to run Stable DIffusion with samplers from k-diffusion.
diff --git a/docs/source/en/api/pipelines/stable_diffusion/latent_upscale.md b/docs/source/en/api/pipelines/stable_diffusion/latent_upscale.md
index 9abccd6e13..d5a15cb002 100644
--- a/docs/source/en/api/pipelines/stable_diffusion/latent_upscale.md
+++ b/docs/source/en/api/pipelines/stable_diffusion/latent_upscale.md
@@ -1,4 +1,4 @@
-
+> [!WARNING]
+> This pipeline is deprecated but it can still be used. However, we won't test the pipeline anymore and won't accept any changes to it. If you run into any issues, reinstall the last Diffusers version that supported this model.
+
# Text-to-(RGB, depth)
diff --git a/docs/source/en/api/pipelines/stable_diffusion/overview.md b/docs/source/en/api/pipelines/stable_diffusion/overview.md
index 2598409121..7e6e16c347 100644
--- a/docs/source/en/api/pipelines/stable_diffusion/overview.md
+++ b/docs/source/en/api/pipelines/stable_diffusion/overview.md
@@ -1,4 +1,4 @@
-
+> [!WARNING]
+> This pipeline is deprecated but it can still be used. However, we won't test the pipeline anymore and won't accept any changes to it. If you run into any issues, reinstall the last Diffusers version that supported this model.
+
# Safe Stable Diffusion
Safe Stable Diffusion was proposed in [Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models](https://huggingface.co/papers/2211.05105) and mitigates inappropriate degeneration from Stable Diffusion models because they're trained on unfiltered web-crawled datasets. For instance Stable Diffusion may unexpectedly generate nudity, violence, images depicting self-harm, and otherwise offensive content. Safe Stable Diffusion is an extension of Stable Diffusion that drastically reduces this type of content.
diff --git a/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_xl.md b/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_xl.md
index bf7ce9d793..30e4379066 100644
--- a/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_xl.md
+++ b/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_xl.md
@@ -1,4 +1,4 @@
-
-
-
-🧪 This pipeline is for research purposes only.
-
-
+> [!WARNING]
+> This pipeline is deprecated but it can still be used. However, we won't test the pipeline anymore and won't accept any changes to it. If you run into any issues, reinstall the last Diffusers version that supported this model.
# Text-to-video
diff --git a/docs/source/en/api/pipelines/text_to_video_zero.md b/docs/source/en/api/pipelines/text_to_video_zero.md
index a84ce0be11..5fe3789d82 100644
--- a/docs/source/en/api/pipelines/text_to_video_zero.md
+++ b/docs/source/en/api/pipelines/text_to_video_zero.md
@@ -1,4 +1,4 @@
-
+> [!WARNING]
+> This pipeline is deprecated but it can still be used. However, we won't test the pipeline anymore and won't accept any changes to it. If you run into any issues, reinstall the last Diffusers version that supported this model.
+
# Text2Video-Zero
diff --git a/docs/source/en/api/pipelines/unclip.md b/docs/source/en/api/pipelines/unclip.md
index 943cebdb28..8011a4b533 100644
--- a/docs/source/en/api/pipelines/unclip.md
+++ b/docs/source/en/api/pipelines/unclip.md
@@ -1,4 +1,4 @@
-
+> [!WARNING]
+> This pipeline is deprecated but it can still be used. However, we won't test the pipeline anymore and won't accept any changes to it. If you run into any issues, reinstall the last Diffusers version that supported this model.
+
# unCLIP
[Hierarchical Text-Conditional Image Generation with CLIP Latents](https://huggingface.co/papers/2204.06125) is by Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen. The unCLIP model in 🤗 Diffusers comes from kakaobrain's [karlo](https://github.com/kakaobrain/karlo).
diff --git a/docs/source/en/api/pipelines/unidiffuser.md b/docs/source/en/api/pipelines/unidiffuser.md
index 802aefea6b..7d767f2db5 100644
--- a/docs/source/en/api/pipelines/unidiffuser.md
+++ b/docs/source/en/api/pipelines/unidiffuser.md
@@ -1,4 +1,4 @@
-
+> [!WARNING]
+> This pipeline is deprecated but it can still be used. However, we won't test the pipeline anymore and won't accept any changes to it. If you run into any issues, reinstall the last Diffusers version that supported this model.
+
# UniDiffuser
diff --git a/docs/source/en/api/pipelines/value_guided_sampling.md b/docs/source/en/api/pipelines/value_guided_sampling.md
index 5aaee9090c..797847ee47 100644
--- a/docs/source/en/api/pipelines/value_guided_sampling.md
+++ b/docs/source/en/api/pipelines/value_guided_sampling.md
@@ -1,4 +1,4 @@
-
+
+# AutoPipelineBlocks
+
+
+
+🧪 **Experimental Feature**: Modular Diffusers is an experimental feature we are actively developing. The API may be subject to breaking changes.
+
+
+
+`AutoPipelineBlocks` is a subclass of `ModularPipelineBlocks`. It is a multi-block that automatically selects which sub-blocks to run based on the inputs provided at runtime, creating conditional workflows that adapt to different scenarios. Its main purpose is convenience and portability: as a developer, you can package several workflows into a single block, making them easier to share and use.
+
+In this tutorial, we will show you how to create an `AutoPipelineBlocks` and learn more about how the conditional selection works.
+
+
+
+Other types of multi-blocks include [SequentialPipelineBlocks](sequential_pipeline_blocks.md) (for linear workflows) and [LoopSequentialPipelineBlocks](loop_sequential_pipeline_blocks.md) (for iterative workflows). For information on creating individual blocks, see the [PipelineBlock guide](pipeline_block.md).
+
+Additionally, like all `ModularPipelineBlocks`, `AutoPipelineBlocks` are definitions/specifications, not runnable pipelines. You need to convert them into a `ModularPipeline` to actually execute them. For information on creating and running pipelines, see the [Modular Pipeline guide](modular_pipeline.md).
+
+
+
+For example, you might want to support text-to-image and image-to-image tasks. Instead of creating two separate pipelines, you can create an `AutoPipelineBlocks` that automatically chooses the workflow based on whether an `image` input is provided.
+
+Let's see an example. We'll use the helper function from the [PipelineBlock guide](./pipeline_block.md) to create our blocks:
+
+**Helper Function**
+
+```py
+from diffusers.modular_pipelines import PipelineBlock, InputParam, OutputParam
+import torch
+
+def make_block(inputs=[], intermediate_inputs=[], intermediate_outputs=[], block_fn=None, description=None):
+ class TestBlock(PipelineBlock):
+ model_name = "test"
+
+ @property
+ def inputs(self):
+ return inputs
+
+ @property
+ def intermediate_inputs(self):
+ return intermediate_inputs
+
+ @property
+ def intermediate_outputs(self):
+ return intermediate_outputs
+
+ @property
+ def description(self):
+ return description if description is not None else ""
+
+ def __call__(self, components, state):
+ block_state = self.get_block_state(state)
+ if block_fn is not None:
+ block_state = block_fn(block_state, state)
+ self.set_block_state(state, block_state)
+ return components, state
+
+ return TestBlock
+```
+
+Now let's create a dummy `AutoPipelineBlocks` that includes dummy text-to-image, image-to-image, and inpaint pipelines.
+
+
+```py
+from diffusers.modular_pipelines import AutoPipelineBlocks
+
+# These are dummy blocks and we only focus on "inputs" for our purpose
+inputs = [InputParam(name="prompt")]
+# block_fn prints out which workflow is running so we can see the execution order at runtime
+block_fn = lambda x, y: print("running the text-to-image workflow")
+block_t2i_cls = make_block(inputs=inputs, block_fn=block_fn, description="I'm a text-to-image workflow!")
+
+inputs = [InputParam(name="prompt"), InputParam(name="image")]
+block_fn = lambda x, y: print("running the image-to-image workflow")
+block_i2i_cls = make_block(inputs=inputs, block_fn=block_fn, description="I'm a image-to-image workflow!")
+
+inputs = [InputParam(name="prompt"), InputParam(name="image"), InputParam(name="mask")]
+block_fn = lambda x, y: print("running the inpaint workflow")
+block_inpaint_cls = make_block(inputs=inputs, block_fn=block_fn, description="I'm a inpaint workflow!")
+
+class AutoImageBlocks(AutoPipelineBlocks):
+ # List of sub-block classes to choose from
+ block_classes = [block_inpaint_cls, block_i2i_cls, block_t2i_cls]
+ # Names for each block in the same order
+ block_names = ["inpaint", "img2img", "text2img"]
+ # Trigger inputs that determine which block to run
+ # - "mask" triggers inpaint workflow
+ # - "image" triggers img2img workflow (but only if mask is not provided)
+ # - if none of above, runs the text2img workflow (default)
+ block_trigger_inputs = ["mask", "image", None]
+ # Description is extremely important for AutoPipelineBlocks
+ @property
+ def description(self):
+ return (
+ "Pipeline generates images given different types of conditions!\n"
+ + "This is an auto pipeline block that works for text2img, img2img and inpainting tasks.\n"
+ + " - inpaint workflow is run when `mask` is provided.\n"
+ + " - img2img workflow is run when `image` is provided (but only when `mask` is not provided).\n"
+ + " - text2img workflow is run when neither `image` nor `mask` is provided.\n"
+ )
+
+# Create the blocks
+auto_blocks = AutoImageBlocks()
+# convert to pipeline
+auto_pipeline = auto_blocks.init_pipeline()
+```
+
+Now we have created an `AutoPipelineBlocks` that contains 3 sub-blocks. Notice the warning message at the top of its printed representation below - it automatically appears in every `ModularPipelineBlocks` that contains `AutoPipelineBlocks` to remind end users that dynamic block selection happens at runtime.
+
+```py
+AutoImageBlocks(
+ Class: AutoPipelineBlocks
+
+ ====================================================================================================
+ This pipeline contains blocks that are selected at runtime based on inputs.
+ Trigger Inputs: ['mask', 'image']
+ ====================================================================================================
+
+
+ Description: Pipeline generates images given different types of conditions!
+ This is an auto pipeline block that works for text2img, img2img and inpainting tasks.
+ - inpaint workflow is run when `mask` is provided.
+ - img2img workflow is run when `image` is provided (but only when `mask` is not provided).
+ - text2img workflow is run when neither `image` nor `mask` is provided.
+
+
+
+ Sub-Blocks:
+ • inpaint [trigger: mask] (TestBlock)
+ Description: I'm a inpaint workflow!
+
+ • img2img [trigger: image] (TestBlock)
+ Description: I'm a image-to-image workflow!
+
+ • text2img [default] (TestBlock)
+ Description: I'm a text-to-image workflow!
+
+)
+```
+
+Check out the documentation with `print(auto_pipeline.doc)`:
+
+```py
+>>> print(auto_pipeline.doc)
+class AutoImageBlocks
+
+ Pipeline generates images given different types of conditions!
+ This is an auto pipeline block that works for text2img, img2img and inpainting tasks.
+ - inpaint workflow is run when `mask` is provided.
+ - img2img workflow is run when `image` is provided (but only when `mask` is not provided).
+ - text2img workflow is run when neither `image` nor `mask` is provided.
+
+ Inputs:
+
+ prompt (`None`, *optional*):
+
+ image (`None`, *optional*):
+
+ mask (`None`, *optional*):
+```
+
+There is a fundamental trade-off with `AutoPipelineBlocks`: it trades clarity for convenience. While it makes packaging multiple workflows really easy, it can become confusing without proper documentation. For example, if we handed you a pipeline and only told you that it contains 3 sub-blocks and takes the 3 inputs `prompt`, `image`, and `mask`, then asked you to run an image-to-image workflow, you would be fairly clueless without prior knowledge of how these pipelines work.
+
+The pipeline we just made, however, has a docstring that lists all available inputs and workflows and explains how to use each with different inputs, so it's genuinely helpful for users. For example, it's clear that you need to pass `image` to run img2img. This is why the description field is absolutely critical for `AutoPipelineBlocks`. We highly recommend explaining the conditional logic clearly for each `AutoPipelineBlocks` you create, and always testing the individual pipelines before packaging them into an `AutoPipelineBlocks`.
+
+Let's run this auto pipeline with different inputs to see if the conditional logic works as described. Remember that we have added `print` in each `PipelineBlock`'s `__call__` method to print out its workflow name, so it should be easy to tell which one is running:
+
+```py
+>>> _ = auto_pipeline(image="image", mask="mask")
+running the inpaint workflow
+>>> _ = auto_pipeline(image="image")
+running the image-to-image workflow
+>>> _ = auto_pipeline(prompt="prompt")
+running the text-to-image workflow
+>>> _ = auto_pipeline(image="prompt", mask="mask")
+running the inpaint workflow
+```
+
+However, even with documentation, it can become very confusing when AutoPipelineBlocks are combined with other blocks. The complexity grows quickly when you have nested AutoPipelineBlocks or use them as sub-blocks in larger pipelines.
+
+Let's make another `AutoPipelineBlocks` - this one contains only one block and does not include `None` in its `block_trigger_inputs` (`None` corresponds to a default block that runs when none of the trigger inputs are provided). This means the block is skipped entirely if its trigger input (`ip_adapter_image`) is not provided at runtime.
+
+```py
+from diffusers.modular_pipelines import SequentialPipelineBlocks, InsertableDict
+inputs = [InputParam(name="ip_adapter_image")]
+block_fn = lambda x, y: print("running the ip-adapter workflow")
+block_ipa_cls = make_block(inputs=inputs, block_fn=block_fn, description="I'm a IP-adapter workflow!")
+
+class AutoIPAdapter(AutoPipelineBlocks):
+ block_classes = [block_ipa_cls]
+ block_names = ["ip-adapter"]
+ block_trigger_inputs = ["ip_adapter_image"]
+ @property
+ def description(self):
+ return "Run IP Adapter step if `ip_adapter_image` is provided."
+```
+
+Now let's combine these 2 auto blocks together into a `SequentialPipelineBlocks`:
+
+```py
+auto_ipa_blocks = AutoIPAdapter()
+blocks_dict = InsertableDict()
+blocks_dict["ip-adapter"] = auto_ipa_blocks
+blocks_dict["image-generation"] = auto_blocks
+all_blocks = SequentialPipelineBlocks.from_blocks_dict(blocks_dict)
+pipeline = all_blocks.init_pipeline()
+```
+
+Let's take a look: now things get more confusing. In this particular example you could still explain the conditional logic in the `description` field - there are only a handful of possible execution paths, so it's doable. However, since a `SequentialPipelineBlocks` can contain many more blocks, the complexity quickly gets out of hand as the number of blocks increases.
+
+```py
+>>> all_blocks
+SequentialPipelineBlocks(
+ Class: ModularPipelineBlocks
+
+ ====================================================================================================
+ This pipeline contains blocks that are selected at runtime based on inputs.
+ Trigger Inputs: ['image', 'mask', 'ip_adapter_image']
+ Use `get_execution_blocks()` with input names to see selected blocks (e.g. `get_execution_blocks('image')`).
+ ====================================================================================================
+
+
+ Description:
+
+
+ Sub-Blocks:
+ [0] ip-adapter (AutoIPAdapter)
+ Description: Run IP Adapter step if `ip_adapter_image` is provided.
+
+
+ [1] image-generation (AutoImageBlocks)
+ Description: Pipeline generates images given different types of conditions!
+ This is an auto pipeline block that works for text2img, img2img and inpainting tasks.
+ - inpaint workflow is run when `mask` is provided.
+ - img2img workflow is run when `image` is provided (but only when `mask` is not provided).
+ - text2img workflow is run when neither `image` nor `mask` is provided.
+
+
+)
+
+```
+
+This is when the `get_execution_blocks()` method comes in handy - it basically extracts a `SequentialPipelineBlocks` that only contains the blocks that are actually run based on your inputs.
+
+Let's try some examples:
+
+`mask`: we expect it to skip the first ip-adapter block since `ip_adapter_image` is not provided, and then run the inpaint workflow for the second block.
+
+```py
+>>> all_blocks.get_execution_blocks('mask')
+SequentialPipelineBlocks(
+ Class: ModularPipelineBlocks
+
+ Description:
+
+
+ Sub-Blocks:
+ [0] image-generation (TestBlock)
+ Description: I'm a inpaint workflow!
+
+)
+```
+
+Let's also actually run the pipeline to confirm:
+
+```py
+>>> _ = pipeline(mask="mask")
+skipping auto block: AutoIPAdapter
+running the inpaint workflow
+```
+
+Try a few more:
+
+```py
+print("inputs: ip_adapter_image:")
+blocks_select = all_blocks.get_execution_blocks('ip_adapter_image')
+print(f"expected_execution_blocks: {blocks_select}")
+print("actual execution blocks:")
+_ = pipeline(ip_adapter_image="ip_adapter_image", prompt="prompt")
+# expect to see ip-adapter + text2img
+
+print("inputs: image:")
+blocks_select = all_blocks.get_execution_blocks('image')
+print(f"expected_execution_blocks: {blocks_select}")
+print("actual execution blocks:")
+_ = pipeline(image="image", prompt="prompt")
+# expect to see img2img
+
+print("inputs: prompt:")
+blocks_select = all_blocks.get_execution_blocks('prompt')
+print(f"expected_execution_blocks: {blocks_select}")
+print("actual execution blocks:")
+_ = pipeline(prompt="prompt")
+# expect to see text2img (prompt is not a trigger input so fallback to default)
+
+print("inputs: mask + ip_adapter_image:")
+blocks_select = all_blocks.get_execution_blocks('mask', 'ip_adapter_image')
+print(f"expected_execution_blocks: {blocks_select}")
+print("actual execution blocks:")
+_ = pipeline(mask="mask", ip_adapter_image="ip_adapter_image")
+# expect to see ip-adapter + inpaint
+```
+
+In summary, `AutoPipelineBlocks` is a good tool for packaging multiple workflows into a single, convenient interface and it can greatly simplify the user experience. However, always provide clear descriptions explaining the conditional logic, test individual pipelines first before combining them, and use `get_execution_blocks()` to understand runtime behavior in complex compositions.
\ No newline at end of file
diff --git a/docs/source/en/modular_diffusers/components_manager.md b/docs/source/en/modular_diffusers/components_manager.md
new file mode 100644
index 0000000000..15b6c66b9b
--- /dev/null
+++ b/docs/source/en/modular_diffusers/components_manager.md
@@ -0,0 +1,514 @@
+
+
+# Components Manager
+
+
+
+🧪 **Experimental Feature**: This is an experimental feature we are actively developing. The API may be subject to breaking changes.
+
+
+
+The Components Manager is a central model registry and management system in Diffusers. It lets you add models and then reuse them across multiple pipelines and workflows. It tracks all models in one place with useful metadata such as model size, device placement, and loaded adapters (LoRA, IP-Adapter). It has mechanisms to prevent duplicate model instances and enables memory-efficient sharing. Most significantly, it offers offloading that works across pipelines: unlike regular `DiffusionPipeline` offloading (i.e. `enable_model_cpu_offload` and `enable_sequential_cpu_offload`), which is limited to one pipeline with a predefined sequence, the Components Manager automatically manages device memory across all your models and workflows.
+
+
+## Basic Operations
+
+Let's start with the most basic operations. First, create a Components Manager:
+
+```py
+from diffusers import ComponentsManager
+comp = ComponentsManager()
+```
+
+Use the `add(name, component)` method to register a component. It returns a unique ID that combines the component name with the object's unique identifier (using Python's `id()` function):
+
+```py
+from diffusers import AutoModel
+text_encoder = AutoModel.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", subfolder="text_encoder")
+# Returns component_id like 'text_encoder_139917733042864'
+component_id = comp.add("text_encoder", text_encoder)
+```
+
+You can view all registered components and their metadata:
+
+```py
+>>> comp
+Components:
+===============================================================================================================================================
+Models:
+-----------------------------------------------------------------------------------------------------------------------------------------------
+Name_ID | Class | Device: act(exec) | Dtype | Size (GB) | Load ID | Collection
+-----------------------------------------------------------------------------------------------------------------------------------------------
+text_encoder_139917733042864 | CLIPTextModel | cpu | torch.float32 | 0.46 | N/A | N/A
+-----------------------------------------------------------------------------------------------------------------------------------------------
+
+Additional Component Info:
+==================================================
+```
+
+And remove components using their unique ID:
+
+```py
+comp.remove("text_encoder_139917733042864")
+```
+
+## Duplicate Detection
+
+The Components Manager automatically detects and prevents duplicate model instances to save memory and avoid confusion. Let's walk through how this works in practice.
+
+When you try to add the same object twice, the manager will warn you and return the existing ID:
+
+```py
+>>> comp.add("text_encoder", text_encoder)
+'text_encoder_139917733042864'
+>>> comp.add("text_encoder", text_encoder)
+ComponentsManager: component 'text_encoder' already exists as 'text_encoder_139917733042864'
+'text_encoder_139917733042864'
+```
+
+Even if you add the same object under a different name, it will still be detected as a duplicate:
+
+```py
+>>> comp.add("clip", text_encoder)
+ComponentsManager: adding component 'clip' as 'clip_139917733042864', but it is duplicate of 'text_encoder_139917733042864'
+To remove a duplicate, call `components_manager.remove('<component_id>')`.
+'clip_139917733042864'
+```
+
+However, there's a more subtle case where duplicate detection becomes tricky. When you load the same model into different objects, the manager can't detect duplicates unless you use `ComponentSpec`. For example:
+
+```py
+>>> text_encoder_2 = AutoModel.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", subfolder="text_encoder")
+>>> comp.add("text_encoder", text_encoder_2)
+'text_encoder_139917732983664'
+```
+
+This creates a problem - you now have two copies of the same model consuming double the memory:
+
+```py
+>>> comp
+Components:
+===============================================================================================================================================
+Models:
+-----------------------------------------------------------------------------------------------------------------------------------------------
+Name_ID | Class | Device: act(exec) | Dtype | Size (GB) | Load ID | Collection
+-----------------------------------------------------------------------------------------------------------------------------------------------
+text_encoder_139917733042864 | CLIPTextModel | cpu | torch.float32 | 0.46 | N/A | N/A
+clip_139917733042864 | CLIPTextModel | cpu | torch.float32 | 0.46 | N/A | N/A
+text_encoder_139917732983664 | CLIPTextModel | cpu | torch.float32 | 0.46 | N/A | N/A
+-----------------------------------------------------------------------------------------------------------------------------------------------
+
+Additional Component Info:
+==================================================
+```
+
+We recommend using `ComponentSpec` to load your models. Models loaded with `ComponentSpec` get tagged with a unique ID that encodes their loading parameters, allowing the Components Manager to detect when different objects represent the same underlying checkpoint:
+
+```py
+from diffusers import AutoModel, ComponentSpec, ComponentsManager
+from transformers import CLIPTextModel
+comp = ComponentsManager()
+
+# Create ComponentSpec for the first text encoder
+spec = ComponentSpec(name="text_encoder", repo="stabilityai/stable-diffusion-xl-base-1.0", subfolder="text_encoder", type_hint=AutoModel)
+# Create ComponentSpec for a duplicate text encoder (it is same checkpoint, from same repo/subfolder)
+spec_duplicated = ComponentSpec(name="text_encoder_duplicated", repo="stabilityai/stable-diffusion-xl-base-1.0", subfolder="text_encoder", type_hint=CLIPTextModel)
+
+# Load and add both components - the manager will detect they're the same model
+comp.add("text_encoder", spec.load())
+comp.add("text_encoder_duplicated", spec_duplicated.load())
+```
+
+Now the manager detects the duplicate and warns you:
+
+```out
+ComponentsManager: adding component 'text_encoder_duplicated_139917580682672', but it has duplicate load_id 'stabilityai/stable-diffusion-xl-base-1.0|text_encoder|null|null' with existing components: text_encoder_139918506246832. To remove a duplicate, call `components_manager.remove('<component_id>')`.
+'text_encoder_duplicated_139917580682672'
+```
+
+Both models now show the same `load_id`, making it clear they're the same model:
+
+```py
+>>> comp
+Components:
+======================================================================================================================================================================================================
+Models:
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+Name_ID | Class | Device: act(exec) | Dtype | Size (GB) | Load ID | Collection
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+text_encoder_139918506246832 | CLIPTextModel | cpu | torch.float32 | 0.46 | stabilityai/stable-diffusion-xl-base-1.0|text_encoder|null|null | N/A
+text_encoder_duplicated_139917580682672 | CLIPTextModel | cpu | torch.float32 | 0.46 | stabilityai/stable-diffusion-xl-base-1.0|text_encoder|null|null | N/A
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+
+Additional Component Info:
+==================================================
+```
+
+## Collections
+
+Collections are labels you can assign to components for better organization and management. You add a component under a collection by passing the `collection=` parameter when you add the component to the manager, i.e. `add(name, component, collection=...)`. Within each collection, only one component per name is allowed - if you add a second component with the same name, the first one is automatically removed.
+
+Here's how collections work in practice:
+
+```py
+comp = ComponentsManager()
+# Create ComponentSpec for the first UNet (SDXL base)
+spec = ComponentSpec(name="unet", repo="stabilityai/stable-diffusion-xl-base-1.0", subfolder="unet", type_hint=AutoModel)
+# Create ComponentSpec for a different UNet (Juggernaut-XL)
+spec2 = ComponentSpec(name="unet", repo="RunDiffusion/Juggernaut-XL-v9", subfolder="unet", type_hint=AutoModel, variant="fp16")
+
+# Add both UNets to the same collection - the second one will replace the first
+comp.add("unet", spec.load(), collection="sdxl")
+comp.add("unet", spec2.load(), collection="sdxl")
+```
+
+The manager automatically removes the old UNet and adds the new one:
+
+```out
+ComponentsManager: removing existing unet from collection 'sdxl': unet_139917723891888
+'unet_139917723893136'
+```
+
+Only one UNet remains in the collection:
+
+```py
+>>> comp
+Components:
+====================================================================================================================================================================
+Models:
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------
+Name_ID | Class | Device: act(exec) | Dtype | Size (GB) | Load ID | Collection
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------
+unet_139917723893136 | UNet2DConditionModel | cpu | torch.float32 | 9.56 | RunDiffusion/Juggernaut-XL-v9|unet|fp16|null | sdxl
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------
+
+Additional Component Info:
+==================================================
+```
+
+For example, in node-based systems, you can mark all models loaded from one node with the same collection label, automatically replace models when a user loads new checkpoints under the same name, and batch-delete all models in a collection when a node is removed.
+
+## Retrieving Components
+
+The Components Manager provides several methods to retrieve registered components.
+
+The `get_one()` method returns a single component and supports pattern matching for the `name` parameter. You can use:
+- exact matches like `comp.get_one(name="unet")`
+- wildcards like `comp.get_one(name="unet*")` for components starting with "unet"
+- exclusion patterns like `comp.get_one(name="!unet")` to exclude components named "unet"
+- OR patterns like `comp.get_one(name="unet|vae")` to match either "unet" OR "vae".
+
+Optionally, you can add `collection` or `load_id` as filters, e.g. `comp.get_one(name="unet", collection="sdxl")`. If multiple components match, `get_one()` throws an error.
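+
+For illustration, the patterns above look like this in code (a sketch assuming a manager `comp` where exactly one component matches each call, otherwise `get_one()` raises an error):
+
+```py
+unet = comp.get_one(name="unet")                           # exact match
+unet_like = comp.get_one(name="unet*")                     # wildcard: names starting with "unet"
+not_unet = comp.get_one(name="!unet")                      # exclusion: anything except "unet"
+unet_or_vae = comp.get_one(name="unet|vae")                # OR pattern: "unet" or "vae"
+sdxl_unet = comp.get_one(name="unet", collection="sdxl")   # extra collection filter
+```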
+
+Another useful method is `get_components_by_names()`, which takes a list of names and returns a dictionary mapping names to components. This is particularly helpful with modular pipelines since they provide lists of required component names, and the returned dictionary can be directly passed to `pipeline.update_components()`.
+
+```py
+# Get components by name list
+component_dict = comp.get_components_by_names(names=["text_encoder", "unet", "vae"])
+# Returns: {"text_encoder": component1, "unet": component2, "vae": component3}
+```
+
+## Using Components Manager with Modular Pipelines
+
+The Components Manager integrates seamlessly with Modular Pipelines. All you need to do is pass a Components Manager instance to `from_pretrained()` or `init_pipeline()` with an optional `collection` parameter:
+
+```py
+from diffusers import ModularPipeline, ComponentsManager
+comp = ComponentsManager()
+pipe = ModularPipeline.from_pretrained("YiYiXu/modular-demo-auto", components_manager=comp, collection="test1")
+```
+
+By default, modular pipelines don't load components immediately, so both the pipeline and Components Manager start empty:
+
+```py
+>>> comp
+Components:
+==================================================
+No components registered.
+==================================================
+```
+
+When you load components on the pipeline, they are automatically registered in the Components Manager:
+
+```py
+>>> pipe.load_components(names="unet")
+>>> comp
+Components:
+==============================================================================================================================================================
+Models:
+--------------------------------------------------------------------------------------------------------------------------------------------------------------
+Name_ID | Class | Device: act(exec) | Dtype | Size (GB) | Load ID | Collection
+--------------------------------------------------------------------------------------------------------------------------------------------------------------
+unet_139917726686304 | UNet2DConditionModel | cpu | torch.float32 | 9.56 | SG161222/RealVisXL_V4.0|unet|null|null | test1
+--------------------------------------------------------------------------------------------------------------------------------------------------------------
+
+Additional Component Info:
+==================================================
+```
+
+Now let's load all default components and then create a second pipeline that reuses all components from the first one. We pass the same Components Manager to the second pipeline but with a different collection:
+
+```py
+# Load all default components
+>>> pipe.load_default_components()
+
+# Create a second pipeline using the same Components Manager but with a different collection
+>>> pipe2 = ModularPipeline.from_pretrained("YiYiXu/modular-demo-auto", components_manager=comp, collection="test2")
+```
+
+`ModularPipeline` has a `null_component_names` property that returns the list of component names it still needs to load. We can pass this list directly to the `get_components_by_names` method on the Components Manager:
+
+```py
+# Get the list of components that pipe2 needs to load
+>>> pipe2.null_component_names
+['text_encoder', 'text_encoder_2', 'tokenizer', 'tokenizer_2', 'image_encoder', 'unet', 'vae', 'scheduler', 'controlnet']
+
+# Retrieve all required components from the Components Manager
+>>> comp_dict = comp.get_components_by_names(names=pipe2.null_component_names)
+
+# Update the pipeline with the retrieved components
+>>> pipe2.update_components(**comp_dict)
+```
+
+The warnings that follow are expected and indicate that the Components Manager is correctly identifying that these components already exist and will be reused rather than creating duplicates:
+
+```out
+ComponentsManager: component 'text_encoder' already exists as 'text_encoder_139917586016400'
+ComponentsManager: component 'text_encoder_2' already exists as 'text_encoder_2_139917699973424'
+ComponentsManager: component 'tokenizer' already exists as 'tokenizer_139917580599504'
+ComponentsManager: component 'tokenizer_2' already exists as 'tokenizer_2_139915763443904'
+ComponentsManager: component 'image_encoder' already exists as 'image_encoder_139917722468304'
+ComponentsManager: component 'unet' already exists as 'unet_139917580609632'
+ComponentsManager: component 'vae' already exists as 'vae_139917722459040'
+ComponentsManager: component 'scheduler' already exists as 'scheduler_139916266559408'
+ComponentsManager: component 'controlnet' already exists as 'controlnet_139917722454432'
+```
+
+
+The pipeline is now fully loaded:
+
+```py
+# null_component_names returns an empty list, meaning everything is loaded
+>>> pipe2.null_component_names
+[]
+```
+
+No new components were added to the Components Manager - we're reusing everything. All models are now associated with both `test1` and `test2` collections, showing that these components are shared across multiple pipelines:
+```py
+>>> comp
+Components:
+========================================================================================================================================================================================
+Models:
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+Name_ID | Class | Device: act(exec) | Dtype | Size (GB) | Load ID | Collection
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+text_encoder_139917586016400 | CLIPTextModel | cpu | torch.float32 | 0.46 | SG161222/RealVisXL_V4.0|text_encoder|null|null | test1
+ | | | | | | test2
+text_encoder_2_139917699973424 | CLIPTextModelWithProjection | cpu | torch.float32 | 2.59 | SG161222/RealVisXL_V4.0|text_encoder_2|null|null | test1
+ | | | | | | test2
+unet_139917580609632 | UNet2DConditionModel | cpu | torch.float32 | 9.56 | SG161222/RealVisXL_V4.0|unet|null|null | test1
+ | | | | | | test2
+controlnet_139917722454432 | ControlNetModel | cpu | torch.float32 | 4.66 | diffusers/controlnet-canny-sdxl-1.0|null|null|null | test1
+ | | | | | | test2
+vae_139917722459040 | AutoencoderKL | cpu | torch.float32 | 0.31 | SG161222/RealVisXL_V4.0|vae|null|null | test1
+ | | | | | | test2
+image_encoder_139917722468304 | CLIPVisionModelWithProjection | cpu | torch.float32 | 6.87 | h94/IP-Adapter|sdxl_models/image_encoder|null|null | test1
+ | | | | | | test2
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+
+Other Components:
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ID | Class | Collection
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+tokenizer_139917580599504 | CLIPTokenizer | test1
+ | | test2
+scheduler_139916266559408 | EulerDiscreteScheduler | test1
+ | | test2
+tokenizer_2_139915763443904 | CLIPTokenizer | test1
+ | | test2
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+
+Additional Component Info:
+==================================================
+```
+
+
+## Automatic Memory Management
+
+The Components Manager provides a global offloading strategy across all models, regardless of which pipeline is using them:
+
+```py
+comp.enable_auto_cpu_offload(device="cuda")
+```
+
+When enabled, all models start on the CPU. The manager moves a model to the device right before it's used and moves other models back to the CPU when GPU memory runs low. You can set your own rules for which models to offload first, and this keeps working as you add or remove components. Once it's enabled, you don't need to worry about device placement - you can focus on your workflow.
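+
+Once offloading is enabled, you can run any pipeline whose components are registered with this manager without calling `.to("cuda")` yourself. Here is a minimal sketch reusing the `pipe` we loaded above (the prompt is illustrative):
+
+```py
+# No manual device placement needed: the manager moves each model to "cuda"
+# right before it runs and offloads other models back to CPU if memory runs low.
+image = pipe(prompt="an astronaut riding a horse on mars", output="images")[0]
+image.save("offload_demo.png")
+```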
+
+
+
+## Practical Example: Building Modular Workflows with Component Reuse
+
+Now that we've covered the basics of the Components Manager, let's walk through a practical example that shows how to build workflows in a modular setting and use the Components Manager to reuse components across multiple pipelines. This example demonstrates the true power of Modular Diffusers by working with multiple pipelines that can share components.
+
+In this example, we'll generate latents from a text-to-image pipeline, then refine them with an image-to-image pipeline.
+
+Let's create a modular text-to-image workflow by separating it into three workflows: `text_blocks` for encoding prompts, `t2i_blocks` for generating latents, and `decoder_blocks` for creating final images.
+
+```py
+import torch
+from diffusers.modular_pipelines import SequentialPipelineBlocks
+from diffusers.modular_pipelines.stable_diffusion_xl import ALL_BLOCKS
+
+# Create modular blocks and separate text encoding and decoding steps
+t2i_blocks = SequentialPipelineBlocks.from_blocks_dict(ALL_BLOCKS["text2img"])
+text_blocks = t2i_blocks.sub_blocks.pop("text_encoder")
+decoder_blocks = t2i_blocks.sub_blocks.pop("decode")
+```
+
+Now we will convert them into runnable pipelines, set up the Components Manager with auto offloading, and organize components under a "t2i" collection.
+
+Since we now have 3 different workflows that share components, we create a separate pipeline that serves as a dedicated loader to load all the components, register them to the component manager, and then reuse them across different workflows.
+
+```py
+from diffusers import ComponentsManager, ModularPipeline
+
+# Set up Components Manager with auto offloading
+components = ComponentsManager()
+components.enable_auto_cpu_offload(device="cuda")
+
+# Create a new pipeline to load the components
+t2i_repo = "YiYiXu/modular-demo-auto"
+t2i_loader_pipe = ModularPipeline.from_pretrained(t2i_repo, components_manager=components, collection="t2i")
+
+# convert the 3 blocks into pipelines and attach the same components manager to all 3
+text_node = text_blocks.init_pipeline(t2i_repo, components_manager=components)
+decoder_node = decoder_blocks.init_pipeline(t2i_repo, components_manager=components)
+t2i_pipe = t2i_blocks.init_pipeline(t2i_repo, components_manager=components)
+```
+
+Load all components into the loader pipeline; they are automatically registered in the Components Manager under the "t2i" collection:
+
+```py
+# Load all components (including IP-Adapter and ControlNet for later use)
+t2i_loader_pipe.load_default_components(torch_dtype=torch.float16)
+```
+
+Now distribute the loaded components to each pipeline:
+
+```py
+# Get VAE for decoder (using get_one since there's only one)
+vae = components.get_one(load_id="SG161222/RealVisXL_V4.0|vae|null|null")
+decoder_node.update_components(vae=vae)
+
+# Get text components for text node (using get_components_by_names for multiple components)
+text_components = components.get_components_by_names(text_node.null_component_names)
+text_node.update_components(**text_components)
+
+# Get remaining components for t2i pipeline
+t2i_components = components.get_components_by_names(t2i_pipe.null_component_names)
+t2i_pipe.update_components(**t2i_components)
+```
+
+Now we can generate images using our modular workflow:
+
+```py
+# Generate text embeddings
+prompt = "an astronaut"
+text_embeddings = text_node(prompt=prompt, output=["prompt_embeds","negative_prompt_embeds", "pooled_prompt_embeds", "negative_pooled_prompt_embeds"])
+
+# Generate latents and decode to image
+generator = torch.Generator(device="cuda").manual_seed(0)
+latents_t2i = t2i_pipe(**text_embeddings, num_inference_steps=25, generator=generator, output="latents")
+image = decoder_node(latents=latents_t2i, output="images")[0]
+image.save("modular_part2_t2i.png")
+```
+
+Let's add a LoRA:
+
+```py
+# Load LoRA weights
+>>> t2i_loader_pipe.load_lora_weights("CiroN2022/toy-face", weight_name="toy_face_sdxl.safetensors", adapter_name="toy_face")
+>>> components
+Components:
+============================================================================================================================================================
+...
+Additional Component Info:
+==================================================
+
+unet:
+ Adapters: ['toy_face']
+```
+
+You can see that the Components Manager tracks adapter metadata for all models it manages; in our case, only the UNet has a LoRA loaded. This means we can reuse the existing text embeddings.
+
+```py
+# Generate with LoRA (reusing existing text embeddings)
+generator = torch.Generator(device="cuda").manual_seed(0)
+latents_lora = t2i_pipe(**text_embeddings, num_inference_steps=25, generator=generator, output="latents")
+image = decoder_node(latents=latents_lora, output="images")[0]
+image.save("modular_part2_lora.png")
+```
+
+
+Now let's create a refiner pipeline that reuses components from our text-to-image workflow:
+
+```py
+# Create refiner blocks (removing image_encoder and decode since we work with latents)
+refiner_blocks = SequentialPipelineBlocks.from_blocks_dict(ALL_BLOCKS["img2img"])
+refiner_blocks.sub_blocks.pop("image_encoder")
+refiner_blocks.sub_blocks.pop("decode")
+
+# Create refiner pipeline with different repo and collection,
+# Attach the same component manager to it
+refiner_repo = "YiYiXu/modular_refiner"
+refiner_pipe = refiner_blocks.init_pipeline(refiner_repo, components_manager=components, collection="refiner")
+```
+
+We pass the **same Components Manager** (`components`) to the refiner pipeline, but with a **different collection** (`"refiner"`). This allows the refiner to access and reuse components from the "t2i" collection while organizing its own components (like the refiner UNet) under the "refiner" collection.
+
+```py
+# Load only the refiner UNet (different from t2i UNet)
+refiner_pipe.load_components(names="unet", torch_dtype=torch.float16)
+
+# Reuse components from t2i pipeline using pattern matching
+reuse_components = components.search_components("text_encoder_2|scheduler|vae|tokenizer_2")
+refiner_pipe.update_components(**reuse_components)
+```
+
+When we reuse components from the "t2i" collection, they automatically get added to the "refiner" collection as well. You can verify this by checking the Components Manager - you'll see components like `vae`, `scheduler`, etc. listed under both collections, indicating they're shared between workflows.
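+
+For example, printing the manager again is a quick way to confirm this (full table output omitted):
+
+```py
+>>> components
+# the Collection column for the reused components (vae, scheduler, text_encoder_2, tokenizer_2)
+# now lists both "t2i" and "refiner"
+```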
+
+Now we can refine any of our generated latents:
+
+```py
+# Refine all our different latents
+refined_latents = refiner_pipe(image_latents=latents_t2i, prompt=prompt, num_inference_steps=10, output="latents")
+refined_image = decoder_node(latents=refined_latents, output="images")[0]
+refined_image.save("modular_part2_t2i_refine_out.png")
+
+refined_latents = refiner_pipe(image_latents=latents_lora, prompt=prompt, num_inference_steps=10, output="latents")
+refined_image = decoder_node(latents=refined_latents, output="images")[0]
+refined_image.save("modular_part2_lora_refine_out.png")
+```
+
+
+Here are the results from our modular pipeline examples.
+
+#### Base Text-to-Image Generation
+| Base Text-to-Image | Base Text-to-Image (Refined) |
+|-------------------|------------------------------|
+|  |  |
+
+#### LoRA
+| LoRA | LoRA (Refined) |
+|-------------------|------------------------------|
+|  |  |
+
diff --git a/docs/source/en/modular_diffusers/end_to_end_guide.md b/docs/source/en/modular_diffusers/end_to_end_guide.md
new file mode 100644
index 0000000000..cb7b87552a
--- /dev/null
+++ b/docs/source/en/modular_diffusers/end_to_end_guide.md
@@ -0,0 +1,648 @@
+
+
+# End-to-End Developer Guide: Building with Modular Diffusers
+
+
+
+🧪 **Experimental Feature**: Modular Diffusers is an experimental feature we are actively developing. The API may be subject to breaking changes.
+
+
+
+
+In this tutorial we will walk through the process of adding a new pipeline to the modular framework using differential diffusion as our example. We'll cover the complete workflow from implementation to deployment: implementing the new pipeline, ensuring compatibility with existing tools, sharing the code on Hugging Face Hub, and deploying it as a UI node.
+
+We'll also demonstrate the 4-step framework process we use for implementing new basic pipelines in the modular system.
+
+1. **Start with an existing pipeline as a base**
+ - Identify which existing pipeline is most similar to the one you want to implement
+ - Determine what part of the pipeline needs modification
+
+2. **Build a working pipeline structure first**
+ - Assemble the complete pipeline structure
+ - Use existing blocks wherever possible
+ - For new blocks, create placeholders (e.g. you can copy from similar blocks and change the name) without implementing custom logic just yet
+
+3. **Set up an example**
+ - Create a simple inference script with expected inputs/outputs
+
+4. **Implement your custom logic and test incrementally**
+ - Add the custom logic to the blocks you want to change
+ - Test incrementally, inspecting pipeline states and debugging as needed
+
+Let's see how this works with the Differential Diffusion example.
+
+
+## Differential Diffusion Pipeline
+
+### Start with an existing pipeline
+
+Differential diffusion (https://differential-diffusion.github.io/) is an image-to-image workflow, so it makes sense to start with the preset of pipeline blocks used to build the img2img pipeline (`IMAGE2IMAGE_BLOCKS`) and see how we can build this new pipeline from them.
+
+```py
+>>> from diffusers.modular_pipelines.stable_diffusion_xl import IMAGE2IMAGE_BLOCKS
+>>> IMAGE2IMAGE_BLOCKS = InsertableDict([
+... ("text_encoder", StableDiffusionXLTextEncoderStep),
+... ("image_encoder", StableDiffusionXLVaeEncoderStep),
+... ("input", StableDiffusionXLInputStep),
+... ("set_timesteps", StableDiffusionXLImg2ImgSetTimestepsStep),
+... ("prepare_latents", StableDiffusionXLImg2ImgPrepareLatentsStep),
+... ("prepare_add_cond", StableDiffusionXLImg2ImgPrepareAdditionalConditioningStep),
+... ("denoise", StableDiffusionXLDenoiseStep),
+... ("decode", StableDiffusionXLDecodeStep)
+... ])
+```
+
+Note that "denoise" (`StableDiffusionXLDenoiseStep`) is a `LoopSequentialPipelineBlocks` that contains 3 loop blocks (more on LoopSequentialPipelineBlocks [here](https://huggingface.co/docs/diffusers/modular_diffusers/write_own_pipeline_block#loopsequentialpipelineblocks))
+
+```py
+>>> denoise_blocks = IMAGE2IMAGE_BLOCKS["denoise"]()
+>>> print(denoise_blocks)
+```
+
+```out
+StableDiffusionXLDenoiseStep(
+ Class: StableDiffusionXLDenoiseLoopWrapper
+
+ Description: Denoise step that iteratively denoise the latents.
+ Its loop logic is defined in `StableDiffusionXLDenoiseLoopWrapper.__call__` method
+ At each iteration, it runs blocks defined in `sub_blocks` sequencially:
+ - `StableDiffusionXLLoopBeforeDenoiser`
+ - `StableDiffusionXLLoopDenoiser`
+ - `StableDiffusionXLLoopAfterDenoiser`
+ This block supports both text2img and img2img tasks.
+
+
+ Components:
+ scheduler (`EulerDiscreteScheduler`)
+ guider (`ClassifierFreeGuidance`)
+ unet (`UNet2DConditionModel`)
+
+ Sub-Blocks:
+ [0] before_denoiser (StableDiffusionXLLoopBeforeDenoiser)
+ Description: step within the denoising loop that prepare the latent input for the denoiser. This block should be used to compose the `sub_blocks` attribute of a `LoopSequentialPipelineBlocks` object (e.g. `StableDiffusionXLDenoiseLoopWrapper`)
+
+ [1] denoiser (StableDiffusionXLLoopDenoiser)
+ Description: Step within the denoising loop that denoise the latents with guidance. This block should be used to compose the `sub_blocks` attribute of a `LoopSequentialPipelineBlocks` object (e.g. `StableDiffusionXLDenoiseLoopWrapper`)
+
+ [2] after_denoiser (StableDiffusionXLLoopAfterDenoiser)
+ Description: step within the denoising loop that update the latents. This block should be used to compose the `sub_blocks` attribute of a `LoopSequentialPipelineBlocks` object (e.g. `StableDiffusionXLDenoiseLoopWrapper`)
+
+)
+```
+
+Let's compare standard image-to-image and differential diffusion! The key algorithmic difference is that standard image-to-image diffusion applies uniform noise across all pixels based on a single `strength` parameter, while differential diffusion uses a change map where each pixel value determines when that region starts denoising. Regions with lower values get "frozen" earlier by replacing them with noised original latents, preserving more of the original image.
+
+Therefore, the key differences when it comes to pipeline implementation would be:
+1. The `prepare_latents` step (which prepares the change map and pre-computes noised latents for all timesteps)
+2. The `denoise` step (which selectively applies denoising based on the change map)
+3. Since differential diffusion doesn't use the `strength` parameter, we'll use the text-to-image `set_timesteps` step instead of the image-to-image version
+
+To implement differential diffusion, we can reuse most blocks from the image-to-image and text-to-image workflows, only modifying the `prepare_latents` step and the first part of the `denoise` step (i.e. `before_denoiser (StableDiffusionXLLoopBeforeDenoiser)`).
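+
+To make the change-map idea concrete, here is a small self-contained sketch of the masking logic (the tensor names and shapes are illustrative, not the exact variables used by the blocks below):
+
+```py
+import torch
+
+num_inference_steps = 25
+# change map in [0, 1]: higher values mean a region starts denoising earlier (i.e. changes more)
+diffdiff_map = torch.rand(1, 128, 128)
+
+# one threshold per step; a region is denoised at step i only if its map value exceeds the threshold
+thresholds = torch.arange(num_inference_steps) / num_inference_steps
+masks = diffdiff_map > thresholds.view(-1, 1, 1)  # shape: (num_inference_steps, 128, 128)
+
+# inside the denoising loop (pseudocode):
+#   latents = original_latents[i] * masks[i] + latents * (1 - masks[i].float())
+```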
+
+Here's a flowchart showing the pipeline structure and the changes we need to make:
+
+
+
+
+
+### Build a Working Pipeline Structure
+
+Now that we've identified the blocks to modify, let's build the pipeline skeleton first. At this stage, our goal is to get the pipeline structure working end-to-end (even though it still just does the img2img behavior). We can simply create placeholder blocks by copying from existing ones:
+
+```py
+>>> # Copy existing blocks as placeholders
+>>> class SDXLDiffDiffPrepareLatentsStep(PipelineBlock):
+... """Copied from StableDiffusionXLImg2ImgPrepareLatentsStep - will modify later"""
+... # ... same implementation as StableDiffusionXLImg2ImgPrepareLatentsStep
+...
+>>> class SDXLDiffDiffLoopBeforeDenoiser(PipelineBlock):
+... """Copied from StableDiffusionXLLoopBeforeDenoiser - will modify later"""
+... # ... same implementation as StableDiffusionXLLoopBeforeDenoiser
+```
+
+`SDXLDiffDiffLoopBeforeDenoiser` is the part of the denoise loop we need to change. Let's use it to assemble an `SDXLDiffDiffDenoiseStep`.
+
+```py
+>>> class SDXLDiffDiffDenoiseStep(StableDiffusionXLDenoiseLoopWrapper):
+... block_classes = [SDXLDiffDiffLoopBeforeDenoiser, StableDiffusionXLLoopDenoiser, StableDiffusionXLLoopAfterDenoiser]
+... block_names = ["before_denoiser", "denoiser", "after_denoiser"]
+```
+
+Now we can put together our differential diffusion pipeline.
+
+```py
+>>> DIFFDIFF_BLOCKS = IMAGE2IMAGE_BLOCKS.copy()
+>>> DIFFDIFF_BLOCKS["set_timesteps"] = TEXT2IMAGE_BLOCKS["set_timesteps"]
+>>> DIFFDIFF_BLOCKS["prepare_latents"] = SDXLDiffDiffPrepareLatentsStep
+>>> DIFFDIFF_BLOCKS["denoise"] = SDXLDiffDiffDenoiseStep
+>>>
+>>> dd_blocks = SequentialPipelineBlocks.from_blocks_dict(DIFFDIFF_BLOCKS)
+>>> print(dd_blocks)
+>>> # At this point, the pipeline works exactly like img2img since our blocks are just copies
+```
+
+### Set up an example
+
+Now that our blocks compile without errors, we can move on to the next step. Let's set up a simple example so we can run the pipeline as we build it. Differential diffusion uses the same model checkpoints as SDXL, so we can fetch the models from a regular SDXL repo.
+
+```py
+>>> dd_pipeline = dd_blocks.init_pipeline("YiYiXu/modular-demo-auto", collection="diffdiff")
+>>> dd_pipeline.load_default_components(torch_dtype=torch.float16)
+>>> dd_pipeline.to("cuda")
+```
+
+We will use this example script:
+
+```py
+>>> image = load_image("https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/differential/20240329211129_4024911930.png?download=true")
+>>> mask = load_image("https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/differential/gradient_mask.png?download=true")
+>>>
+>>> prompt = "a green pear"
+>>> negative_prompt = "blurry"
+>>>
+>>> image = dd_pipeline(
+... prompt=prompt,
+... negative_prompt=negative_prompt,
+... num_inference_steps=25,
+... diffdiff_map=mask,
+... image=image,
+... output="images"
+... )[0]
+>>>
+>>> image.save("diffdiff_out.png")
+```
+
+If you run the script right now, you will get a complaint about the unexpected input `diffdiff_map`, and the result will be the same as the original img2img pipeline.
+
+### Implement your custom logic and test incrementally
+
+Let's modify the pipeline so that we get the expected result with this example script.
+
+We'll start with the `prepare_latents` step. The main changes are:
+- Requires a new user input `diffdiff_map`
+- Requires a new component `mask_processor` to process the `diffdiff_map`
+- Requires new intermediate inputs:
+  - `timesteps` instead of `latent_timestep`, to precompute the noised latents for all timesteps
+  - `num_inference_steps`, to create the `diffdiff_masks`
+- Creates new outputs `diffdiff_masks` and `original_latents`
+
+
+
+💡 Use `print(dd_pipeline.doc)` to check the compiled inputs and outputs of the built pipeline.
+
+For example, after we add `diffdiff_map` as an input in this step, we can run `print(dd_pipeline.doc)` to verify that it shows up in the docstring as a user input.
+
+
+
+Once we make sure all the variables we need are available in the block state, we can implement the diff-diff logic inside `__call__`. We create two new variables: the change map `diffdiff_masks` and the pre-computed noised latents for all timesteps, `original_latents`.
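+
+For the pre-computed noised latents, one possible approach (a rough sketch; the actual block implementation may differ) is to add the same noise sample to the image latents once per timestep with the scheduler's `add_noise` method:
+
+```py
+import torch
+from diffusers import EulerDiscreteScheduler
+
+# illustrative setup: inside the block, the scheduler, timesteps, and latents come from the pipeline state
+scheduler = EulerDiscreteScheduler.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", subfolder="scheduler")
+scheduler.set_timesteps(25)
+timesteps = scheduler.timesteps                 # shape: (25,)
+latents = torch.randn(1, 4, 128, 128)           # stand-in for the encoded image latents
+noise = torch.randn_like(latents)
+
+# one noised copy of the latents per timestep -> shape (25, 4, 128, 128)
+original_latents = scheduler.add_noise(
+    latents.expand(timesteps.shape[0], -1, -1, -1),
+    noise.expand(timesteps.shape[0], -1, -1, -1),
+    timesteps,
+)
+```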
+
+
+
+💡 Implement incrementally! Run the example script as you go, and insert `print(state)` and `print(block_state)` everywhere inside the `__call__` method to inspect the intermediate results. This helps you understand what's going on and what each line you just added does.
+
+
+
+Here are the key changes we made to implement differential diffusion:
+
+**1. Modified `prepare_latents` step:**
+```diff
+class SDXLDiffDiffPrepareLatentsStep(PipelineBlock):
+ @property
+ def expected_components(self) -> List[ComponentSpec]:
+ return [
+ ComponentSpec("vae", AutoencoderKL),
+ ComponentSpec("scheduler", EulerDiscreteScheduler),
++ ComponentSpec("mask_processor", VaeImageProcessor, config=FrozenDict({"do_normalize": False, "do_convert_grayscale": True}))
+ ]
+
+ @property
+ def inputs(self) -> List[Tuple[str, Any]]:
+ return [
++ InputParam("diffdiff_map", required=True),
+ ]
+
+ @property
+ def intermediate_inputs(self) -> List[InputParam]:
+ return [
+ InputParam("generator"),
+- InputParam("latent_timestep", required=True, type_hint=torch.Tensor),
++ InputParam("timesteps", type_hint=torch.Tensor),
++ InputParam("num_inference_steps", type_hint=int),
+ ]
+
+ @property
+ def intermediate_outputs(self) -> List[OutputParam]:
+ return [
++ OutputParam("original_latents", type_hint=torch.Tensor),
++ OutputParam("diffdiff_masks", type_hint=torch.Tensor),
+ ]
+
+ def __call__(self, components, state: PipelineState):
+ # ... existing logic ...
++ # Process change map and create masks
++ diffdiff_map = components.mask_processor.preprocess(block_state.diffdiff_map, height=latent_height, width=latent_width)
++ thresholds = torch.arange(block_state.num_inference_steps, dtype=diffdiff_map.dtype) / block_state.num_inference_steps
++ block_state.diffdiff_masks = diffdiff_map > (thresholds + (block_state.denoising_start or 0))
++ block_state.original_latents = block_state.latents
+```
+
+**2. Modified `before_denoiser` step:**
+```diff
+class SDXLDiffDiffLoopBeforeDenoiser(PipelineBlock):
+ @property
+ def description(self) -> str:
+ return (
+ "Step within the denoising loop for differential diffusion that prepare the latent input for the denoiser"
+ )
+
++ @property
++ def inputs(self) -> List[Tuple[str, Any]]:
++ return [
++ InputParam("denoising_start"),
++ ]
+
+ @property
+ def intermediate_inputs(self) -> List[str]:
+ return [
+ InputParam("latents", required=True, type_hint=torch.Tensor),
++ InputParam("original_latents", type_hint=torch.Tensor),
++ InputParam("diffdiff_masks", type_hint=torch.Tensor),
+ ]
+
+ def __call__(self, components, block_state, i, t):
++ # Apply differential diffusion logic
++ if i == 0 and block_state.denoising_start is None:
++ block_state.latents = block_state.original_latents[:1]
++ else:
++ block_state.mask = block_state.diffdiff_masks[i].unsqueeze(0).unsqueeze(1)
++ block_state.latents = block_state.original_latents[i] * block_state.mask + block_state.latents * (1 - block_state.mask)
+
+ # ... rest of existing logic ...
+```
+
+That's all there is to it! We've just created a simple sequential pipeline by mixing and matching some existing and new pipeline blocks.
+
+Now we use the setup we prepared earlier to build the pipeline and inspect it.
+
+
+```py
+>> dd_pipeline
+SequentialPipelineBlocks(
+ Class: ModularPipelineBlocks
+
+ Description:
+
+
+ Components:
+ text_encoder (`CLIPTextModel`)
+ text_encoder_2 (`CLIPTextModelWithProjection`)
+ tokenizer (`CLIPTokenizer`)
+ tokenizer_2 (`CLIPTokenizer`)
+ guider (`ClassifierFreeGuidance`)
+ vae (`AutoencoderKL`)
+ image_processor (`VaeImageProcessor`)
+ scheduler (`EulerDiscreteScheduler`)
+ mask_processor (`VaeImageProcessor`)
+ unet (`UNet2DConditionModel`)
+
+ Configs:
+ force_zeros_for_empty_prompt (default: True)
+ requires_aesthetics_score (default: False)
+
+ Blocks:
+ [0] text_encoder (StableDiffusionXLTextEncoderStep)
+ Description: Text Encoder step that generate text_embeddings to guide the image generation
+
+ [1] image_encoder (StableDiffusionXLVaeEncoderStep)
+ Description: Vae Encoder step that encode the input image into a latent representation
+
+ [2] input (StableDiffusionXLInputStep)
+ Description: Input processing step that:
+ 1. Determines `batch_size` and `dtype` based on `prompt_embeds`
+ 2. Adjusts input tensor shapes based on `batch_size` (number of prompts) and `num_images_per_prompt`
+
+ All input tensors are expected to have either batch_size=1 or match the batch_size
+ of prompt_embeds. The tensors will be duplicated across the batch dimension to
+ have a final batch_size of batch_size * num_images_per_prompt.
+
+ [3] set_timesteps (StableDiffusionXLSetTimestepsStep)
+ Description: Step that sets the scheduler's timesteps for inference
+
+ [4] prepare_latents (SDXLDiffDiffPrepareLatentsStep)
+ Description: Step that prepares the latents for the differential diffusion generation process
+
+ [5] prepare_add_cond (StableDiffusionXLImg2ImgPrepareAdditionalConditioningStep)
+ Description: Step that prepares the additional conditioning for the image-to-image/inpainting generation process
+
+ [6] denoise (SDXLDiffDiffDenoiseStep)
+ Description: Pipeline block that iteratively denoise the latents over `timesteps`. The specific steps with each iteration can be customized with `sub_blocks` attributes
+
+ [7] decode (StableDiffusionXLDecodeStep)
+ Description: Step that decodes the denoised latents into images
+
+)
+```
+
+Run the example now and you should see an apple with its right half transformed into a green pear.
+
+
+
+
+## Adding IP-adapter
+
+We provide an auto IP-Adapter block that you can plug into your modular workflow. It's an `AutoPipelineBlocks`, so it will only run when the user passes an IP-Adapter image. In this tutorial, we'll focus on how to package it into your differential diffusion workflow. To learn more about `AutoPipelineBlocks`, see [here](./auto_pipeline_blocks.md).
+
+We talked about how to add IP-Adapter to your workflow in the [Modular Pipeline Guide](./modular_pipeline.md). Let's go ahead and create the IP-Adapter block.
+
+```py
+>>> from diffusers.modular_pipelines.stable_diffusion_xl.encoders import StableDiffusionXLAutoIPAdapterStep
+>>> ip_adapter_block = StableDiffusionXLAutoIPAdapterStep()
+```
+
+We can directly add the IP-Adapter block instance to the `dd_blocks` we created before. The `sub_blocks` attribute is an `InsertableDict`, so we can insert it at a specific position (index `0` here).
+
+```py
+>>> dd_blocks.sub_blocks.insert("ip_adapter", ip_adapter_block, 0)
+```
+
+Take a look at the new diff-diff pipeline with ip-adapter!
+
+```py
+>>> print(dd_blocks)
+```
+
+The pipeline now lists ip_adapter as its first block and tells you that it will run only if `ip_adapter_image` is provided. It also includes the two new components from IP-Adapter: `image_encoder` and `feature_extractor`.
+
+```out
+SequentialPipelineBlocks(
+ Class: ModularPipelineBlocks
+
+ ====================================================================================================
+ This pipeline contains blocks that are selected at runtime based on inputs.
+ Trigger Inputs: {'ip_adapter_image'}
+ Use `get_execution_blocks()` with input names to see selected blocks (e.g. `get_execution_blocks('ip_adapter_image')`).
+ ====================================================================================================
+
+
+ Description:
+
+
+ Components:
+ image_encoder (`CLIPVisionModelWithProjection`)
+ feature_extractor (`CLIPImageProcessor`)
+ unet (`UNet2DConditionModel`)
+ guider (`ClassifierFreeGuidance`)
+ text_encoder (`CLIPTextModel`)
+ text_encoder_2 (`CLIPTextModelWithProjection`)
+ tokenizer (`CLIPTokenizer`)
+ tokenizer_2 (`CLIPTokenizer`)
+ vae (`AutoencoderKL`)
+ image_processor (`VaeImageProcessor`)
+ scheduler (`EulerDiscreteScheduler`)
+ mask_processor (`VaeImageProcessor`)
+
+ Configs:
+ force_zeros_for_empty_prompt (default: True)
+ requires_aesthetics_score (default: False)
+
+ Blocks:
+ [0] ip_adapter (StableDiffusionXLAutoIPAdapterStep)
+ Description: Run IP Adapter step if `ip_adapter_image` is provided.
+
+ [1] text_encoder (StableDiffusionXLTextEncoderStep)
+ Description: Text Encoder step that generate text_embeddings to guide the image generation
+
+ [2] image_encoder (StableDiffusionXLVaeEncoderStep)
+ Description: Vae Encoder step that encode the input image into a latent representation
+
+ [3] input (StableDiffusionXLInputStep)
+ Description: Input processing step that:
+ 1. Determines `batch_size` and `dtype` based on `prompt_embeds`
+ 2. Adjusts input tensor shapes based on `batch_size` (number of prompts) and `num_images_per_prompt`
+
+ All input tensors are expected to have either batch_size=1 or match the batch_size
+ of prompt_embeds. The tensors will be duplicated across the batch dimension to
+ have a final batch_size of batch_size * num_images_per_prompt.
+
+ [4] set_timesteps (StableDiffusionXLSetTimestepsStep)
+ Description: Step that sets the scheduler's timesteps for inference
+
+ [5] prepare_latents (SDXLDiffDiffPrepareLatentsStep)
+ Description: Step that prepares the latents for the differential diffusion generation process
+
+ [6] prepare_add_cond (StableDiffusionXLImg2ImgPrepareAdditionalConditioningStep)
+ Description: Step that prepares the additional conditioning for the image-to-image/inpainting generation process
+
+ [7] denoise (SDXLDiffDiffDenoiseStep)
+ Description: Pipeline block that iteratively denoise the latents over `timesteps`. The specific steps with each iteration can be customized with `sub_blocks` attributes
+
+ [8] decode (StableDiffusionXLDecodeStep)
+ Description: Step that decodes the denoised latents into images
+
+)
+```
+
+Let's test it out. We use an orange image to condition the generation via IP-Adapter, and you can see a slight orange color and texture in the final output.
+
+
+```py
+>>> ip_adapter_block = StableDiffusionXLAutoIPAdapterStep()
+>>> dd_blocks.sub_blocks.insert("ip_adapter", ip_adapter_block, 0)
+>>>
+>>> dd_pipeline = dd_blocks.init_pipeline("YiYiXu/modular-demo-auto", collection="diffdiff")
+>>> dd_pipeline.load_default_components(torch_dtype=torch.float16)
+>>> dd_pipeline.loader.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin")
+>>> dd_pipeline.loader.set_ip_adapter_scale(0.6)
+>>> dd_pipeline = dd_pipeline.to(device)
+>>>
+>>> ip_adapter_image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/diffdiff_orange.jpeg")
+>>> image = load_image("https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/differential/20240329211129_4024911930.png?download=true")
+>>> mask = load_image("https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/differential/gradient_mask.png?download=true")
+>>>
+>>> prompt = "a green pear"
+>>> negative_prompt = "blurry"
+>>> generator = torch.Generator(device=device).manual_seed(42)
+>>>
+>>> image = dd_pipeline(
+... prompt=prompt,
+... negative_prompt=negative_prompt,
+... num_inference_steps=25,
+... generator=generator,
+... ip_adapter_image=ip_adapter_image,
+... diffdiff_map=mask,
+... image=image,
+... output="images"
+... )[0]
+```
+
+## Working with ControlNets
+
+What about controlnet? Can differential diffusion work with controlnet? The key differences between a regular pipeline and a ControlNet pipeline are:
+1. A ControlNet input step that prepares the control condition
+2. Inside the denoising loop, a modified denoiser step where the control image is first processed through ControlNet, then control information is injected into the UNet
+
+From looking at the code workflow: differential diffusion only modifies the "before denoiser" step, while ControlNet operates within the "denoiser" itself. Since they intervene at different points in the pipeline, they should work together without conflicts.
+
+Intuitively, these two techniques are orthogonal and should combine naturally: differential diffusion controls how much the inference process can deviate from the original in each region, while ControlNet controls in what direction that change occurs.
+
+With this understanding, let's assemble the diffdiff-controlnet loop by combining the diffdiff before-denoiser step and controlnet denoiser step.
+
+```py
+>>> class SDXLDiffDiffControlNetDenoiseStep(StableDiffusionXLDenoiseLoopWrapper):
+... block_classes = [SDXLDiffDiffLoopBeforeDenoiser, StableDiffusionXLControlNetLoopDenoiser, StableDiffusionXLLoopAfterDenoiser]
+... block_names = ["before_denoiser", "denoiser", "after_denoiser"]
+>>>
+>>> controlnet_denoise_block = SDXLDiffDiffControlNetDenoiseStep()
+>>> # print(controlnet_denoise_block)
+```
+
+We provide an auto ControlNet input block that you can directly put into your workflow to process the `control_image`. Similar to the auto IP-Adapter block, this step will only run if a `control_image` input is passed by the user. It works with both ControlNet and ControlNet Union.
+
+
+```py
+>>> from diffusers.modular_pipelines.stable_diffusion_xl.modular_blocks import StableDiffusionXLAutoControlNetInputStep
+>>> control_input_block = StableDiffusionXLAutoControlNetInputStep()
+>>> print(control_input_block)
+```
+
+```out
+StableDiffusionXLAutoControlNetInputStep(
+ Class: AutoPipelineBlocks
+
+ ====================================================================================================
+ This pipeline contains blocks that are selected at runtime based on inputs.
+ Trigger Inputs: ['control_image', 'control_mode']
+ ====================================================================================================
+
+
+ Description: Controlnet Input step that prepare the controlnet input.
+ This is an auto pipeline block that works for both controlnet and controlnet_union.
+ (it should be called right before the denoise step) - `StableDiffusionXLControlNetUnionInputStep` is called to prepare the controlnet input when `control_mode` and `control_image` are provided.
+ - `StableDiffusionXLControlNetInputStep` is called to prepare the controlnet input when `control_image` is provided. - if neither `control_mode` nor `control_image` is provided, step will be skipped.
+
+
+ Components:
+ controlnet (`ControlNetUnionModel`)
+ control_image_processor (`VaeImageProcessor`)
+
+ Sub-Blocks:
+ • controlnet_union [trigger: control_mode] (StableDiffusionXLControlNetUnionInputStep)
+ Description: step that prepares inputs for the ControlNetUnion model
+
+ • controlnet [trigger: control_image] (StableDiffusionXLControlNetInputStep)
+ Description: step that prepare inputs for controlnet
+
+)
+
+```
+
+Let's assemble the blocks and run an example using ControlNet + differential diffusion. We use a tomato canny image as the `control_image`, so in the output you can see that the right half that transformed into a pear has a tomato-like shape.
+
+```py
+>>> dd_blocks.sub_blocks.insert("controlnet_input", control_input_block, 7)
+>>> dd_blocks.sub_blocks["denoise"] = controlnet_denoise_block
+>>>
+>>> dd_pipeline = dd_blocks.init_pipeline("YiYiXu/modular-demo-auto", collection="diffdiff")
+>>> dd_pipeline.load_default_components(torch_dtype=torch.float16)
+>>> dd_pipeline = dd_pipeline.to(device)
+>>>
+>>> control_image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/diffdiff_tomato_canny.jpeg")
+>>> image = load_image("https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/differential/20240329211129_4024911930.png?download=true")
+>>> mask = load_image("https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/differential/gradient_mask.png?download=true")
+>>>
+>>> prompt = "a green pear"
+>>> negative_prompt = "blurry"
+>>> generator = torch.Generator(device=device).manual_seed(42)
+>>>
+>>> image = dd_pipeline(
+... prompt=prompt,
+... negative_prompt=negative_prompt,
+... num_inference_steps=25,
+... generator=generator,
+... control_image=control_image,
+... controlnet_conditioning_scale=0.5,
+... diffdiff_map=mask,
+... image=image,
+... output="images"
+... )[0]
+```
+
+Optionally, we can combine `SDXLDiffDiffControlNetDenoiseStep` and `SDXLDiffDiffDenoiseStep` into an `AutoPipelineBlocks` so that the same workflow can work with or without ControlNet.
+
+
+```py
+>>> class SDXLDiffDiffAutoDenoiseStep(AutoPipelineBlocks):
+... block_classes = [SDXLDiffDiffControlNetDenoiseStep, SDXLDiffDiffDenoiseStep]
+... block_names = ["controlnet_denoise", "denoise"]
+... block_trigger_inputs = ["controlnet_cond", None]
+```
+
+`SDXLDiffDiffAutoDenoiseStep` will run the ControlNet denoise step if the `control_image` input is provided; otherwise it will run the regular denoise step.
+
+
+
+ Note that it's perfectly fine not to use `AutoPipelineBlocks`. In fact, we recommend only using `AutoPipelineBlocks` to package your workflow at the end once you've verified all your pipelines work as expected.
+
+
+
+Now you can create the differential diffusion preset that works with ip-adapter & controlnet.
+
+```py
+>>> DIFFDIFF_AUTO_BLOCKS = IMAGE2IMAGE_BLOCKS.copy()
+>>> DIFFDIFF_AUTO_BLOCKS["prepare_latents"] = SDXLDiffDiffPrepareLatentsStep
+>>> DIFFDIFF_AUTO_BLOCKS["set_timesteps"] = TEXT2IMAGE_BLOCKS["set_timesteps"]
+>>> DIFFDIFF_AUTO_BLOCKS["denoise"] = SDXLDiffDiffAutoDenoiseStep
+>>> DIFFDIFF_AUTO_BLOCKS.insert("ip_adapter", StableDiffusionXLAutoIPAdapterStep, 0)
+>>> DIFFDIFF_AUTO_BLOCKS.insert("controlnet_input",StableDiffusionXLControlNetAutoInput, 7)
+>>>
+>>> print(DIFFDIFF_AUTO_BLOCKS)
+```
+
+To use it:
+
+```py
+>>> dd_auto_blocks = SequentialPipelineBlocks.from_blocks_dict(DIFFDIFF_AUTO_BLOCKS)
+>>> dd_pipeline = dd_auto_blocks.init_pipeline(...)
+```
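+
+You can then check which sub-blocks would actually be selected for a given set of trigger inputs with the `get_execution_blocks()` helper mentioned in the printed block summaries above (a quick sanity check; the exact output depends on your assembled blocks):
+
+```py
+>>> # e.g. see which blocks run when an IP-Adapter image is part of the inputs
+>>> print(dd_auto_blocks.get_execution_blocks("ip_adapter_image"))
+```
+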
+## Creating a Modular Repo
+
+You can easily share your differential diffusion workflow on the Hub by creating a modular repo. Here is one created using the code we just wrote together: https://huggingface.co/YiYiXu/modular-diffdiff
+
+To create a Modular Repo and share it on the Hub, you just need to run `save_pretrained()` with the `push_to_hub=True` flag. Note that if your pipeline contains custom blocks, you need to manually upload the code to the Hub, but we are working on a command-line tool to make this easier.
+
+```py
+dd_pipeline.save_pretrained("YiYiXu/test_modular_doc", push_to_hub=True)
+```
+
+With a modular repo, it is very easy for the community to use the workflow you just created! Here is an example of how to use the differential diffusion pipeline we just created and shared.
+
+```py
+>>> from diffusers.modular_pipelines import ModularPipeline, ComponentsManager
+>>> import torch
+>>> from diffusers.utils import load_image
+>>>
+>>> repo_id = "YiYiXu/modular-diffdiff-0704"
+>>>
+>>> components = ComponentsManager()
+>>>
+>>> diffdiff_pipeline = ModularPipeline.from_pretrained(repo_id, trust_remote_code=True, components_manager=components, collection="diffdiff")
+>>> diffdiff_pipeline.load_default_components(torch_dtype=torch.float16)
+>>> components.enable_auto_cpu_offload()
+```
+
+See more usage examples on the model card.
+
+## Deploying a Mellon node
+
+For now, here is an example of a Mellon node: https://huggingface.co/YiYiXu/diff-diff-mellon
diff --git a/docs/source/en/modular_diffusers/loop_sequential_pipeline_blocks.md b/docs/source/en/modular_diffusers/loop_sequential_pipeline_blocks.md
new file mode 100644
index 0000000000..e95cdc7163
--- /dev/null
+++ b/docs/source/en/modular_diffusers/loop_sequential_pipeline_blocks.md
@@ -0,0 +1,194 @@
+
+
+# LoopSequentialPipelineBlocks
+
+
+
+🧪 **Experimental Feature**: Modular Diffusers is an experimental feature we are actively developing. The API may be subject to breaking changes.
+
+
+
+`LoopSequentialPipelineBlocks` is a subclass of `ModularPipelineBlocks`. It is a multi-block that composes other blocks together in a loop, creating iterative workflows where blocks run multiple times with evolving state. It's particularly useful for denoising loops requiring repeated execution of the same blocks.
+
+
+
+Other types of multi-blocks include [SequentialPipelineBlocks](./sequential_pipeline_blocks.md) (for linear workflows) and [AutoPipelineBlocks](./auto_pipeline_blocks.md) (for conditional block selection). For information on creating individual blocks, see the [PipelineBlock guide](./pipeline_block.md).
+
+Additionally, like all `ModularPipelineBlocks`, `LoopSequentialPipelineBlocks` are definitions/specifications, not runnable pipelines. You need to convert them into a `ModularPipeline` to actually execute them. For information on creating and running pipelines, see the [Modular Pipeline guide](modular_pipeline.md).
+
+
+
+You could create a loop using `PipelineBlock` like this:
+
+```python
+class DenoiseLoop(PipelineBlock):
+ def __call__(self, components, state):
+ block_state = self.get_block_state(state)
+ for t in range(block_state.num_inference_steps):
+ # ... loop logic here
+ pass
+ self.set_block_state(state, block_state)
+ return components, state
+```
+
+But in this tutorial, we will focus on how to use `LoopSequentialPipelineBlocks` to create a "composable" denoising loop where you can add or remove blocks within the loop or reuse the same loop structure with different block combinations.
+
+It involves two parts: a **loop wrapper** and **loop blocks**
+
+* The **loop wrapper** (`LoopSequentialPipelineBlocks`) defines the loop structure, e.g. the iteration variables and loop configuration such as the progress bar.
+
+* The **loop blocks** are basically standard pipeline blocks you add to the loop wrapper.
+ - they run sequentially for each iteration of the loop
+ - they receive the current iteration index as an additional parameter
+ - they share the same block_state throughout the entire loop
+
+Unlike regular `SequentialPipelineBlocks` where each block gets its own state, loop blocks share a single state that persists and evolves across iterations.
+
+We will build a simple loop block to demonstrate these concepts. Creating a loop block involves three steps:
+1. defining the loop wrapper class
+2. creating the loop blocks
+3. adding the loop blocks to the loop wrapper class to create the loop wrapper instance
+
+**Step 1: Define the Loop Wrapper**
+
+To create a `LoopSequentialPipelineBlocks` class, you need to define:
+
+* `loop_inputs`: User input variables (equivalent to `PipelineBlock.inputs`)
+* `loop_intermediate_inputs`: Intermediate variables needed from the mutable pipeline state (equivalent to `PipelineBlock.intermediate_inputs`)
+* `loop_intermediate_outputs`: New intermediate variables this block will add to the mutable pipeline state (equivalent to `PipelineBlock.intermediate_outputs`)
+* `__call__` method: Defines the loop structure and iteration logic
+
+Here is an example of a loop wrapper:
+
+```py
+import torch
+from diffusers.modular_pipelines import LoopSequentialPipelineBlocks, PipelineBlock, InputParam, OutputParam
+
+class LoopWrapper(LoopSequentialPipelineBlocks):
+ model_name = "test"
+ @property
+ def description(self):
+ return "I'm a loop!!"
+ @property
+ def loop_inputs(self):
+ return [InputParam(name="num_steps")]
+ @torch.no_grad()
+ def __call__(self, components, state):
+ block_state = self.get_block_state(state)
+ # Loop structure - can be customized to your needs
+ for i in range(block_state.num_steps):
+ # loop_step executes all registered blocks in sequence
+ components, block_state = self.loop_step(components, block_state, i=i)
+ self.set_block_state(state, block_state)
+ return components, state
+```
+
+**Step 2: Create Loop Blocks**
+
+Loop blocks are standard `PipelineBlock`s, but their `__call__` method works differently:
+* It receives the iteration variable (e.g., `i`) passed by the loop wrapper
+* It works directly with `block_state` instead of pipeline state
+* No need to call `self.get_block_state()` or `self.set_block_state()`
+
+```py
+class LoopBlock(PipelineBlock):
+ # this is used to identify the model family, we won't worry about it in this example
+ model_name = "test"
+ @property
+ def inputs(self):
+ return [InputParam(name="x")]
+ @property
+ def intermediate_outputs(self):
+ # outputs produced by this block
+ return [OutputParam(name="x")]
+ @property
+ def description(self):
+ return "I'm a block used inside the `LoopWrapper` class"
+ def __call__(self, components, block_state, i: int):
+ block_state.x += 1
+ return components, block_state
+```
+
+**Step 3: Combine Everything**
+
+Finally, assemble your loop by adding the block(s) to the wrapper:
+
+```py
+loop = LoopWrapper.from_blocks_dict({"block1": LoopBlock})
+```
+
+Now you've created a loop with one step:
+
+```py
+>>> loop
+LoopWrapper(
+ Class: LoopSequentialPipelineBlocks
+
+ Description: I'm a loop!!
+
+ Sub-Blocks:
+ [0] block1 (LoopBlock)
+ Description: I'm a block used inside the `LoopWrapper` class
+
+)
+```
+
+It has two inputs: `x` (used at each step within the loop) and `num_steps` (used to define the loop).
+
+```py
+>>> print(loop.doc)
+class LoopWrapper
+
+ I'm a loop!!
+
+ Inputs:
+
+ x (`None`, *optional*):
+
+ num_steps (`None`, *optional*):
+
+ Outputs:
+
+ x (`None`):
+```
+
+**Running the Loop:**
+
+```py
+# run the loop
+loop_pipeline = loop.init_pipeline()
+x = loop_pipeline(num_steps=10, x=0, output="x")
+assert x == 10
+```
+
+**Adding Multiple Blocks:**
+
+We can add multiple blocks to run within each iteration. Let's run the loop block twice within each iteration:
+
+```py
+loop = LoopWrapper.from_blocks_dict({"block1": LoopBlock(), "block2": LoopBlock})
+loop_pipeline = loop.init_pipeline()
+x = loop_pipeline(num_steps=10, x=0, output="x")
+assert x == 20 # Each iteration runs 2 blocks, so 10 iterations * 2 = 20
+```
+
+**Key Differences from SequentialPipelineBlocks:**
+
+The main difference is that loop blocks share the same `block_state` across all iterations, allowing values to accumulate and evolve throughout the loop. Loop blocks could receive additional arguments (like the current iteration index) depending on the loop wrapper's implementation, since the wrapper defines how loop blocks are called. You can easily add, remove, or reorder blocks within the loop without changing the loop logic itself.
+
+The officially supported denoising loops in Modular Diffusers are implemented using `LoopSequentialPipelineBlocks`. You can explore the actual implementation to see how these concepts work in practice:
+
+```py
+from diffusers.modular_pipelines.stable_diffusion_xl.denoise import StableDiffusionXLDenoiseStep
+StableDiffusionXLDenoiseStep()
+```
\ No newline at end of file
diff --git a/docs/source/en/modular_diffusers/modular_diffusers_states.md b/docs/source/en/modular_diffusers/modular_diffusers_states.md
new file mode 100644
index 0000000000..744089fcf6
--- /dev/null
+++ b/docs/source/en/modular_diffusers/modular_diffusers_states.md
@@ -0,0 +1,59 @@
+
+
+# PipelineState and BlockState
+
+
+
+🧪 **Experimental Feature**: Modular Diffusers is an experimental feature we are actively developing. The API may be subject to breaking changes.
+
+
+
+In Modular Diffusers, `PipelineState` and `BlockState` are the core data structures that enable blocks to communicate and share data. Understanding them is fundamental to understanding how blocks interact with each other and with the pipeline system.
+
+In the modular diffusers system, `PipelineState` acts as the global state container that all pipeline blocks operate on. It maintains the complete runtime state of the pipeline and provides a structured way for blocks to read from and write to shared data.
+
+A `PipelineState` consists of two distinct states:
+
+- **The immutable state** (i.e. the `inputs` dict) contains a copy of values provided by users. Once a value is added to the immutable state, it cannot be changed. Blocks can read from the immutable state but cannot write to it.
+
+- **The mutable state** (i.e. the `intermediates` dict) contains variables that are passed between blocks and can be modified by them.
+
+Here's an example of what a `PipelineState` looks like:
+
+```py
+PipelineState(
+ inputs={
+ 'prompt': 'a cat'
+ 'guidance_scale': 7.0
+ 'num_inference_steps': 25
+ },
+ intermediates={
+ 'prompt_embeds': Tensor(dtype=torch.float32, shape=torch.Size([1, 1, 1, 1]))
+ 'negative_prompt_embeds': None
+ },
+)
+```
+
+Each pipeline block defines what parts of that state it can read from and write to through its `inputs`, `intermediate_inputs`, and `intermediate_outputs` properties. At runtime, it gets a local view (`BlockState`) of the relevant variables it needs from `PipelineState`, performs its operations, and then updates `PipelineState` with any changes.
+
+For example, if a block defines an input `image`, inside the block's `__call__` method, the `BlockState` would contain:
+
+```py
+BlockState(
+    image: <PIL.Image.Image>
+)
+```
+
+You can access the variables directly as attributes: `block_state.image`.
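+
+Putting it together, a block's `__call__` typically reads its local view, works with these attributes, and writes the updates back. Below is a minimal sketch following the pattern used throughout these docs; the block and variable names are purely illustrative:
+
+```py
+from diffusers.modular_pipelines import PipelineBlock, InputParam, OutputParam
+
+class ResizeImageBlock(PipelineBlock):
+    model_name = "test"
+
+    @property
+    def inputs(self):
+        return [InputParam(name="image")]
+
+    @property
+    def intermediate_outputs(self):
+        return [OutputParam(name="image")]
+
+    def __call__(self, components, state):
+        block_state = self.get_block_state(state)                    # local BlockState view of declared variables
+        block_state.image = block_state.image.resize((1024, 1024))   # access variables directly as attributes
+        self.set_block_state(state, block_state)                     # write the updates back to PipelineState
+        return components, state
+```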
+
+To explore more about how blocks interact with the pipeline state through their `inputs`, `intermediate_inputs`, and `intermediate_outputs` properties, see the [PipelineBlock guide](./pipeline_block.md).
\ No newline at end of file
diff --git a/docs/source/en/modular_diffusers/modular_pipeline.md b/docs/source/en/modular_diffusers/modular_pipeline.md
new file mode 100644
index 0000000000..55182b921f
--- /dev/null
+++ b/docs/source/en/modular_diffusers/modular_pipeline.md
@@ -0,0 +1,1237 @@
+
+
+# ModularPipeline
+
+
+
+🧪 **Experimental Feature**: Modular Diffusers is an experimental feature we are actively developing. The API may be subject to breaking changes.
+
+
+
+`ModularPipeline` is the main interface for end users to run pipelines in Modular Diffusers. It takes pipeline blocks and converts them into a runnable pipeline that can load models and execute the computation steps.
+
+In this guide, we will focus on how to build pipelines using the blocks we officially support at diffusers 🧨. We'll cover how to use predefined blocks and convert them into a `ModularPipeline` for execution.
+
+
+
+This guide shows you how to use predefined blocks. If you want to learn how to create your own pipeline blocks, see the [PipelineBlock guide](pipeline_block.md) for creating individual blocks, and the multi-block guides for connecting them together:
+- [SequentialPipelineBlocks](sequential_pipeline_blocks.md) (for linear workflows)
+- [LoopSequentialPipelineBlocks](loop_sequential_pipeline_blocks.md) (for iterative workflows)
+- [AutoPipelineBlocks](auto_pipeline_blocks.md) (for conditional workflows)
+
+For information on how data flows through pipelines, see the [PipelineState and BlockState guide](modular_diffusers_states.md).
+
+
+
+
+## Create ModularPipelineBlocks
+
+In the Modular Diffusers system, you build pipelines using pipeline blocks. Pipeline blocks are the fundamental building blocks - they define what components, inputs/outputs, and computation logic are needed. They are designed to be assembled into workflows for tasks such as image generation, video creation, and inpainting. But they are just definitions and don't actually run anything. To execute blocks, you need to put them into a `ModularPipeline`. We'll first learn how to create predefined blocks here before talking about how to run them using a `ModularPipeline`.
+
+All pipeline blocks inherit from the base class `ModularPipelineBlocks`, including:
+
+- [`PipelineBlock`]: The most granular block - you define the input/output/components requirements and computation logic.
+- [`SequentialPipelineBlocks`]: A multi-block composed of multiple blocks that run sequentially, passing outputs as inputs to the next block.
+- [`LoopSequentialPipelineBlocks`]: A special type of `SequentialPipelineBlocks` that runs the same sequence of blocks multiple times (loops), typically used for iterative processes like denoising steps in diffusion models.
+- [`AutoPipelineBlocks`]: A multi-block composed of multiple blocks that are selected at runtime based on the inputs.
+
+It is very easy to use a `ModularPipelineBlocks` officially supported in 🧨 Diffusers:
+
+```py
+from diffusers.modular_pipelines.stable_diffusion_xl import StableDiffusionXLTextEncoderStep
+
+text_encoder_block = StableDiffusionXLTextEncoderStep()
+```
+
+This is a single `PipelineBlock`. You'll see that this text encoder block uses two text encoders and two tokenizers, as well as a guider component. It takes user inputs such as `prompt` and `negative_prompt`, and returns text embedding outputs such as `prompt_embeds` and `negative_prompt_embeds`.
+
+```py
+>>> text_encoder_block
+StableDiffusionXLTextEncoderStep(
+ Class: PipelineBlock
+ Description: Text Encoder step that generate text_embeddings to guide the image generation
+ Components:
+ text_encoder (`CLIPTextModel`)
+ text_encoder_2 (`CLIPTextModelWithProjection`)
+ tokenizer (`CLIPTokenizer`)
+ tokenizer_2 (`CLIPTokenizer`)
+ guider (`ClassifierFreeGuidance`)
+ Configs:
+ force_zeros_for_empty_prompt (default: True)
+ Inputs:
+ prompt=None, prompt_2=None, negative_prompt=None, negative_prompt_2=None, cross_attention_kwargs=None, clip_skip=None
+ Intermediates:
+ - outputs: prompt_embeds, negative_prompt_embeds, pooled_prompt_embeds, negative_pooled_prompt_embeds
+)
+```
+
+More commonly, you need multiple blocks to build your workflow. You can create a `SequentialPipelineBlocks` using block class presets from 🧨 Diffusers. `TEXT2IMAGE_BLOCKS` is a dict containing all the blocks needed for text-to-image generation.
+
+```py
+from diffusers.modular_pipelines import SequentialPipelineBlocks
+from diffusers.modular_pipelines.stable_diffusion_xl import TEXT2IMAGE_BLOCKS
+t2i_blocks = SequentialPipelineBlocks.from_blocks_dict(TEXT2IMAGE_BLOCKS)
+```
+
+This creates a `SequentialPipelineBlocks`. Unlike the `text_encoder_block` we saw earlier, this is a multi-block and its `sub_blocks` attribute contains a list of other blocks (text_encoder, input, set_timesteps, prepare_latents, prepare_add_cond, denoise, decode). Its requirements for components, inputs, and intermediate inputs are combined from the blocks that compose it. At runtime, it executes its sub-blocks sequentially and passes the pipeline state from one block to another.
+
+```py
+>>> t2i_blocks
+SequentialPipelineBlocks(
+ Class: ModularPipelineBlocks
+
+ Description:
+
+
+ Components:
+ text_encoder (`CLIPTextModel`)
+ text_encoder_2 (`CLIPTextModelWithProjection`)
+ tokenizer (`CLIPTokenizer`)
+ tokenizer_2 (`CLIPTokenizer`)
+ guider (`ClassifierFreeGuidance`)
+ scheduler (`EulerDiscreteScheduler`)
+ unet (`UNet2DConditionModel`)
+ vae (`AutoencoderKL`)
+ image_processor (`VaeImageProcessor`)
+
+ Configs:
+ force_zeros_for_empty_prompt (default: True)
+
+ Sub-Blocks:
+ [0] text_encoder (StableDiffusionXLTextEncoderStep)
+ Description: Text Encoder step that generate text_embeddings to guide the image generation
+
+ [1] input (StableDiffusionXLInputStep)
+ Description: Input processing step that:
+ 1. Determines `batch_size` and `dtype` based on `prompt_embeds`
+ 2. Adjusts input tensor shapes based on `batch_size` (number of prompts) and `num_images_per_prompt`
+
+ All input tensors are expected to have either batch_size=1 or match the batch_size
+ of prompt_embeds. The tensors will be duplicated across the batch dimension to
+ have a final batch_size of batch_size * num_images_per_prompt.
+
+ [2] set_timesteps (StableDiffusionXLSetTimestepsStep)
+ Description: Step that sets the scheduler's timesteps for inference
+
+ [3] prepare_latents (StableDiffusionXLPrepareLatentsStep)
+ Description: Prepare latents step that prepares the latents for the text-to-image generation process
+
+ [4] prepare_add_cond (StableDiffusionXLPrepareAdditionalConditioningStep)
+ Description: Step that prepares the additional conditioning for the text-to-image generation process
+
+ [5] denoise (StableDiffusionXLDenoiseStep)
+ Description: Denoise step that iteratively denoise the latents.
+ Its loop logic is defined in `StableDiffusionXLDenoiseLoopWrapper.__call__` method
+ At each iteration, it runs blocks defined in `sub_blocks` sequencially:
+ - `StableDiffusionXLLoopBeforeDenoiser`
+ - `StableDiffusionXLLoopDenoiser`
+ - `StableDiffusionXLLoopAfterDenoiser`
+ This block supports both text2img and img2img tasks.
+
+ [6] decode (StableDiffusionXLDecodeStep)
+ Description: Step that decodes the denoised latents into images
+
+)
+```
+
+This is the block classes preset (`TEXT2IMAGE_BLOCKS`) we used. It is simply a dictionary that maps names to `ModularPipelineBlocks` classes:
+
+```py
+>>> TEXT2IMAGE_BLOCKS
+InsertableDict([
+ 0: ('text_encoder', ),
+ 1: ('input', ),
+ 2: ('set_timesteps', ),
+ 3: ('prepare_latents', ),
+ 4: ('prepare_add_cond', ),
+ 5: ('denoise', ),
+ 6: ('decode', )
+])
+```
+
+When we create a `SequentialPipelineBlocks` from this preset, it instantiates each block class into actual block objects. Its `sub_blocks` attribute now contains these instantiated objects:
+
+```py
+>>> t2i_blocks.sub_blocks
+InsertableDict([
+ 0: ('text_encoder', ),
+ 1: ('input', ),
+ 2: ('set_timesteps', ),
+ 3: ('prepare_latents', ),
+ 4: ('prepare_add_cond', ),
+ 5: ('denoise', ),
+ 6: ('decode', )
+])
+```
+
+Note that both the block classes preset and the `sub_blocks` attribute are `InsertableDict` objects. This is a custom dictionary that extends `OrderedDict` with the ability to insert items at specific positions. You can perform all standard dictionary operations (get, set, delete) plus insert items at any index, which is particularly useful for reordering or inserting blocks in the middle of a pipeline.
+
+**Add a block:**
+```py
+# BLOCKS is a dict of block classes, so you add a class to it
+BLOCKS.insert("block_name", BlockClass, index)
+# the sub_blocks attribute contains instances, so you add a block instance to it
+t2i_blocks.sub_blocks.insert("block_name", block_instance, index)
+```
+
+**Remove a block:**
+```py
+# remove a block class from preset
+BLOCKS.pop("text_encoder")
+# split out a block instance on its own
+text_encoder_block = t2i_blocks.sub_blocks.pop("text_encoder")
+```
+
+**Swap block:**
+```py
+# Replace block class in preset
+BLOCKS["prepare_latents"] = CustomPrepareLatents
+# Replace in the sub_blocks attribute using a block instance
+t2i_blocks.sub_blocks["prepare_latents"] = CustomPrepareLatents()
+```
+
+This means you can mix-and-match blocks in very flexible ways. Let's see some real examples:
+
+**Example 1: Adding IP-Adapter to the Block Classes Preset**
+Let's make a new block classes preset by inserting IP-Adapter at index 0 (before the text_encoder block), and create a text-to-image pipeline with IP-Adapter support:
+
+```py
+from diffusers.modular_pipelines.stable_diffusion_xl import StableDiffusionXLAutoIPAdapterStep
+CUSTOM_BLOCKS = TEXT2IMAGE_BLOCKS.copy()
+# insert the IP-Adapter block class at index 0; CUSTOM_BLOCKS is now a preset that includes ip_adapter
+CUSTOM_BLOCKS.insert("ip_adapter", StableDiffusionXLAutoIPAdapterStep, 0)
+# create a blocks instance from the preset
+custom_blocks = SequentialPipelineBlocks.from_blocks_dict(CUSTOM_BLOCKS)
+```
+
+**Example 2: Extracting a block from a multi-block**
+You can extract a block instance from the multi-block to use it independently. A common pattern is to use text_encoder to process prompts once, then reuse the text embedding outputs to generate multiple images with different settings (schedulers, seeds, inference steps). We can do this by simply extracting the text_encoder block from the pipeline.
+
+```py
+# this gives you StableDiffusionXLTextEncoderStep()
+>>> text_encoder_blocks = t2i_blocks.sub_blocks.pop("text_encoder")
+>>> text_encoder_blocks
+```
+
+The multi-block now has fewer components and no longer has the `text_encoder` block. If you check its docstring `t2i_blocks.doc`, you will see that it no longer accepts `prompt` as input - you will need to pass the embeddings instead.
+
+```py
+>>> t2i_blocks
+SequentialPipelineBlocks(
+ Class: ModularPipelineBlocks
+
+ Description:
+
+ Components:
+ scheduler (`EulerDiscreteScheduler`)
+ guider (`ClassifierFreeGuidance`)
+ unet (`UNet2DConditionModel`)
+ vae (`AutoencoderKL`)
+ image_processor (`VaeImageProcessor`)
+
+ Blocks:
+ [0] input (StableDiffusionXLInputStep)
+ Description: Input processing step that:
+ 1. Determines `batch_size` and `dtype` based on `prompt_embeds`
+ 2. Adjusts input tensor shapes based on `batch_size` (number of prompts) and `num_images_per_prompt`
+
+ All input tensors are expected to have either batch_size=1 or match the batch_size
+ of prompt_embeds. The tensors will be duplicated across the batch dimension to
+ have a final batch_size of batch_size * num_images_per_prompt.
+
+ [1] set_timesteps (StableDiffusionXLSetTimestepsStep)
+ Description: Step that sets the scheduler's timesteps for inference
+
+ [2] prepare_latents (StableDiffusionXLPrepareLatentsStep)
+ Description: Prepare latents step that prepares the latents for the text-to-image generation process
+
+ [3] prepare_add_cond (StableDiffusionXLPrepareAdditionalConditioningStep)
+ Description: Step that prepares the additional conditioning for the text-to-image generation process
+
+ [4] denoise (StableDiffusionXLDenoiseLoop)
+ Description: Denoise step that iteratively denoise the latents.
+ Its loop logic is defined in `StableDiffusionXLDenoiseLoopWrapper.__call__` method
+ At each iteration, it runs blocks defined in `blocks` sequencially:
+ - `StableDiffusionXLLoopBeforeDenoiser`
+ - `StableDiffusionXLLoopDenoiser`
+ - `StableDiffusionXLLoopAfterDenoiser`
+
+
+ [5] decode (StableDiffusionXLDecodeStep)
+ Description: Step that decodes the denoised latents into images
+
+)
+```
+
+
+
+💡 You can find all the block classes presets we support for each model in `ALL_BLOCKS`.
+
+```py
+# For Stable Diffusion XL
+from diffusers.modular_pipelines.stable_diffusion_xl import ALL_BLOCKS
+ALL_BLOCKS
+# For other models, import from the corresponding module, e.g.:
+# from diffusers.modular_pipelines.<model_name> import ALL_BLOCKS
+```
+
+Each model provides a dictionary that maps all supported tasks and techniques to their corresponding block classes presets. For SDXL, it is:
+
+```py
+ALL_BLOCKS = {
+ "text2img": TEXT2IMAGE_BLOCKS,
+ "img2img": IMAGE2IMAGE_BLOCKS,
+ "inpaint": INPAINT_BLOCKS,
+ "controlnet": CONTROLNET_BLOCKS,
+ "ip_adapter": IP_ADAPTER_BLOCKS,
+ "auto": AUTO_BLOCKS,
+}
+```
+
+
+
+This covers the essentials of pipeline blocks! As we've already mentioned, **pipeline blocks are not runnable by themselves**. They are essentially **"definitions"**: they define the specifications and computational steps for a pipeline, but they do not contain any model state. To actually run them, you need to convert them into a `ModularPipeline` object.
+
+
+## Modular Repo
+
+To convert blocks into a runnable pipeline, you may need a repository if your blocks contain **pretrained components** (models with checkpoints that need to be loaded from the Hub). Pipeline blocks define which components they need (a UNet, text encoders, etc.) as well as how to create them: components are either created with the **from_pretrained** method (loaded from checkpoints) or with **from_config** (initialized from scratch with a default configuration, usually for stateless components like a guider or scheduler).
+
+If your pipeline contains **pretrained components**, you typically need to use a repository to provide the loading specifications and metadata.
+
+`ModularPipeline` works specifically with modular repositories, which offer more flexibility in component loading compared to traditional repositories. You can find an example modular repo [here](https://huggingface.co/YiYiXu/modular-diffdiff).
+
+A `DiffusionPipeline` defines `model_index.json` to configure its components. However, repositories for Modular Diffusers work with `modular_model_index.json`. Let's walk through the differences here.
+
+In standard `model_index.json`, each component entry is a `(library, class)` tuple:
+```py
+"text_encoder": [
+ "transformers",
+ "CLIPTextModel"
+],
+```
+
+In `modular_model_index.json`, each component entry contains 3 elements: `(library, class, loading_specs_dict)`
+
+- `library` and `class`: Information about the actual component loaded in the pipeline at the time of saving (will be `null` if not loaded)
+- `loading_specs_dict`: A dictionary containing all information required to load this component, including `repo`, `revision`, `subfolder`, `variant`, and `type_hint`.
+
+```py
+"text_encoder": [
+ null, # library of actual loaded component (same as in model_index.json)
+  null, # class of actual loaded component (same as in model_index.json)
+ { # loading specs map (unique to modular_model_index.json)
+ "repo": "stabilityai/stable-diffusion-xl-base-1.0", # can be a different repo
+ "revision": null,
+ "subfolder": "text_encoder",
+ "type_hint": [ # (library, class) for the expected component
+ "transformers",
+ "CLIPTextModel"
+ ],
+ "variant": null
+ }
+],
+```
+
+Unlike standard repositories, where components must live in subfolders of the same repo, modular repositories can fetch components from different repositories based on the `loading_specs_dict`. For example, the `text_encoder` component can be fetched from the "text_encoder" folder in `stabilityai/stable-diffusion-xl-base-1.0` while other components come from different repositories.
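+
+For illustration, here is a trimmed sketch of how two entries in the same `modular_model_index.json` can point at completely different repositories. The values are borrowed from the example pipeline printed later in this guide; treat it as a sketch rather than a complete file:
+
+```json
+{
+  "unet": [
+    null, null,
+    {
+      "repo": "RunDiffusion/Juggernaut-XL-v9",
+      "subfolder": "unet",
+      "variant": "fp16",
+      "revision": null,
+      "type_hint": ["diffusers", "UNet2DConditionModel"]
+    }
+  ],
+  "vae": [
+    null, null,
+    {
+      "repo": "madebyollin/sdxl-vae-fp16-fix",
+      "subfolder": null,
+      "variant": null,
+      "revision": null,
+      "type_hint": ["diffusers", "AutoencoderKL"]
+    }
+  ]
+}
+```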
+
+
+## Creating a `ModularPipeline` from `ModularPipelineBlocks`
+
+Each `ModularPipelineBlocks` has an `init_pipeline` method that can initialize a `ModularPipeline` object based on its component and configuration specifications.
+
+Let's convert our `t2i_blocks` (which we created earlier) into a runnable `ModularPipeline`. We'll use a `ComponentsManager` to handle device placement, memory management, and component reuse automatically:
+
+```py
+# We already have this from earlier
+t2i_blocks = SequentialPipelineBlocks.from_blocks_dict(TEXT2IMAGE_BLOCKS)
+
+# Now convert it to a ModularPipeline
+from diffusers import ComponentsManager
+modular_repo_id = "YiYiXu/modular-loader-t2i-0704"
+components = ComponentsManager()
+t2i_pipeline = t2i_blocks.init_pipeline(modular_repo_id, components_manager=components)
+```
+
+
+
+💡 **ComponentsManager** is the model registry and management system in diffusers. It tracks all the models in one place and lets you add, remove, and reuse them across different workflows in the most efficient way. Without it, you'd need to manually manage GPU memory, device placement, and component sharing between workflows. See the [Components Manager guide](components_manager.md) for detailed information.
+
+
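+For example, here is a minimal sketch (using the same block presets and example repo as the rest of this guide) of two pipelines built on one shared `ComponentsManager`, so their common models can be reused instead of being loaded twice:
+
+```py
+from diffusers import ComponentsManager
+from diffusers.modular_pipelines import SequentialPipelineBlocks
+from diffusers.modular_pipelines.stable_diffusion_xl import TEXT2IMAGE_BLOCKS, IMAGE2IMAGE_BLOCKS
+
+components = ComponentsManager()
+repo_id = "YiYiXu/modular-loader-t2i-0704"
+
+# both pipelines register their components with the same manager
+t2i = SequentialPipelineBlocks.from_blocks_dict(TEXT2IMAGE_BLOCKS).init_pipeline(repo_id, components_manager=components)
+i2i = SequentialPipelineBlocks.from_blocks_dict(IMAGE2IMAGE_BLOCKS).init_pipeline(repo_id, components_manager=components)
+```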
+
+The `init_pipeline()` method creates a ModularPipeline and loads component specifications from the repository's `modular_model_index.json` file, but doesn't load the actual models yet.
+
+
+## Creating a `ModularPipeline` with `from_pretrained`
+
+You can create a `ModularPipeline` from a Hugging Face Hub repository with the `from_pretrained` method, as long as it's a modular repo:
+
+```py
+from diffusers import ModularPipeline, ComponentsManager
+components = ComponentsManager()
+pipeline = ModularPipeline.from_pretrained("YiYiXu/modular-loader-t2i-0704", components_manager=components)
+```
+
+Loading custom code is also supported:
+
+```py
+from diffusers import ModularPipeline, ComponentsManager
+components = ComponentsManager()
+modular_repo_id = "YiYiXu/modular-diffdiff-0704"
+diffdiff_pipeline = ModularPipeline.from_pretrained(modular_repo_id, trust_remote_code=True, components_manager=components)
+```
+
+This modular repository contains custom code. The folder contains these files:
+
+```
+modular-diffdiff-0704/
+├── block.py # Custom pipeline blocks implementation
+├── config.json # Pipeline configuration and auto_map
+└── modular_model_index.json # Component loading specifications
+```
+
+The [`config.json`](https://huggingface.co/YiYiXu/modular-diffdiff-0704/blob/main/config.json) file defines a custom `DiffDiffBlocks` class and points to its implementation:
+
+```json
+{
+ "_class_name": "DiffDiffBlocks",
+ "auto_map": {
+ "ModularPipelineBlocks": "block.DiffDiffBlocks"
+ }
+}
+```
+
+The `auto_map` tells the pipeline where to find the custom blocks definition - in this case, it's looking for `DiffDiffBlocks` in the `block.py` file. The actual `DiffDiffBlocks` class is defined in [`block.py`](https://huggingface.co/YiYiXu/modular-diffdiff-0704/blob/main/block.py) within the repository.
+
+When `diffdiff_pipeline.blocks` is created, it's based on the `DiffDiffBlocks` definition from the custom code in the repository, allowing you to use specialized blocks that aren't part of the standard diffusers library.
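+
+A quick way to confirm what was loaded (a hedged sketch; the class name comes from the repo's `config.json` shown above):
+
+```py
+# the blocks attribute should now be an instance of the custom class from block.py
+print(type(diffdiff_pipeline.blocks).__name__)  # expected: "DiffDiffBlocks"
+# its docstring describes the custom workflow's inputs and outputs
+print(diffdiff_pipeline.blocks.doc)
+```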
+
+## Loading components into a `ModularPipeline`
+
+Unlike `DiffusionPipeline`, when you create a `ModularPipeline` instance (whether with `from_pretrained` or by converting pipeline blocks), its components aren't loaded automatically. You need to explicitly load model components using `load_default_components` or `load_components(names=...)`:
+
+```py
+# This will load ALL the expected components into pipeline
+import torch
+t2i_pipeline.load_default_components(torch_dtype=torch.float16)
+t2i_pipeline.to("cuda")
+```
+
+All expected components are now loaded into the pipeline. You can also partially load specific components using the `names` argument. For example, to load only the unet and vae:
+
+```py
+>>> t2i_pipeline.load_components(names=["unet", "vae"], torch_dtype=torch.float16)
+```
+
+You can inspect the pipeline's loading status by simply printing the pipeline itself. This shows which components are expected, which ones are already loaded, how they were loaded, and what loading specs are available. Let's print out the `t2i_pipeline`:
+
+```py
+>>> t2i_pipeline
+StableDiffusionXLModularPipeline {
+ "_blocks_class_name": "SequentialPipelineBlocks",
+ "_class_name": "StableDiffusionXLModularPipeline",
+ "_diffusers_version": "0.35.0.dev0",
+ "force_zeros_for_empty_prompt": true,
+ "scheduler": [
+ null,
+ null,
+ {
+ "repo": "stabilityai/stable-diffusion-xl-base-1.0",
+ "revision": null,
+ "subfolder": "scheduler",
+ "type_hint": [
+ "diffusers",
+ "EulerDiscreteScheduler"
+ ],
+ "variant": null
+ }
+ ],
+ "text_encoder": [
+ null,
+ null,
+ {
+ "repo": "stabilityai/stable-diffusion-xl-base-1.0",
+ "revision": null,
+ "subfolder": "text_encoder",
+ "type_hint": [
+ "transformers",
+ "CLIPTextModel"
+ ],
+ "variant": null
+ }
+ ],
+ "text_encoder_2": [
+ null,
+ null,
+ {
+ "repo": "stabilityai/stable-diffusion-xl-base-1.0",
+ "revision": null,
+ "subfolder": "text_encoder_2",
+ "type_hint": [
+ "transformers",
+ "CLIPTextModelWithProjection"
+ ],
+ "variant": null
+ }
+ ],
+ "tokenizer": [
+ null,
+ null,
+ {
+ "repo": "stabilityai/stable-diffusion-xl-base-1.0",
+ "revision": null,
+ "subfolder": "tokenizer",
+ "type_hint": [
+ "transformers",
+ "CLIPTokenizer"
+ ],
+ "variant": null
+ }
+ ],
+ "tokenizer_2": [
+ null,
+ null,
+ {
+ "repo": "stabilityai/stable-diffusion-xl-base-1.0",
+ "revision": null,
+ "subfolder": "tokenizer_2",
+ "type_hint": [
+ "transformers",
+ "CLIPTokenizer"
+ ],
+ "variant": null
+ }
+ ],
+ "unet": [
+ "diffusers",
+ "UNet2DConditionModel",
+ {
+ "repo": "RunDiffusion/Juggernaut-XL-v9",
+ "revision": null,
+ "subfolder": "unet",
+ "type_hint": [
+ "diffusers",
+ "UNet2DConditionModel"
+ ],
+ "variant": "fp16"
+ }
+ ],
+ "vae": [
+ "diffusers",
+ "AutoencoderKL",
+ {
+ "repo": "madebyollin/sdxl-vae-fp16-fix",
+ "revision": null,
+ "subfolder": null,
+ "type_hint": [
+ "diffusers",
+ "AutoencoderKL"
+ ],
+ "variant": null
+ }
+ ]
+}
+```
+
+All the **pretrained components** that will be loaded with the `from_pretrained` method are listed as entries. Each entry contains 3 elements: `(library, class, loading_specs_dict)`:
+
+- **`library` and `class`**: Show the actual loaded component info. If `null`, the component is not loaded yet.
+- **`loading_specs_dict`**: Contains all the information needed to load the component (repo, subfolder, variant, etc.)
+
+In this example:
+- **Loaded components**: `vae` and `unet` (their `library` and `class` fields show the actual loaded models)
+- **Not loaded yet**: `scheduler`, `text_encoder`, `text_encoder_2`, `tokenizer`, `tokenizer_2` (their `library` and `class` fields are `null`, but you can see their loading specs to know where they'll be loaded from when you call `load_components()`)
+
+You're looking at essentially the pipeline's config dict, which is synced with the `modular_model_index.json` from the repository you used during `init_pipeline()` - it picks up the loading specs that match the pipeline's component requirements.
+
+For example, if your pipeline needs a `text_encoder` component, it will include the loading spec for `text_encoder` from the modular repo during the `init_pipeline`. If the pipeline doesn't need a component (like `controlnet` in a basic text-to-image pipeline), that component won't be included even if it exists in the modular repo.
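+
+A quick sanity check (a sketch, reusing the `t2i_pipeline` from above):
+
+```py
+# components the blocks require are present in the config...
+assert "text_encoder" in t2i_pipeline.config
+# ...while components the blocks never asked for (e.g. a controlnet) are not,
+# even if the modular repo contains a loading spec for them
+assert "controlnet" not in t2i_pipeline.config
+```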
+
+There are also a few properties that can provide a quick summary of component loading status:
+
+```py
+# All components expected by the pipeline
+>>> t2i_pipeline.component_names
+['text_encoder', 'text_encoder_2', 'tokenizer', 'tokenizer_2', 'guider', 'scheduler', 'unet', 'vae', 'image_processor']
+
+# Components that are not loaded yet (will be loaded with from_pretrained)
+>>> t2i_pipeline.null_component_names
+['text_encoder', 'text_encoder_2', 'tokenizer', 'tokenizer_2', 'scheduler']
+
+# Components that will be loaded from pretrained models
+>>> t2i_pipeline.pretrained_component_names
+['text_encoder', 'text_encoder_2', 'tokenizer', 'tokenizer_2', 'scheduler', 'unet', 'vae']
+
+# Components that are created with default config (no repo needed)
+>>> t2i_pipeline.config_component_names
+['guider', 'image_processor']
+```
+
+From-config components (like `guider` and `image_processor`) are not included in the pipeline output above because they don't need loading specs - they're already initialized during pipeline creation. You can see this because they're not listed in `null_component_names`.
+
+## Modifying Loading Specs
+
+When you call `pipeline.load_components(names=)` or `pipeline.load_default_components()`, it uses the loading specs from the modular repository's `modular_model_index.json`. You can change where components are loaded from by modifying the `modular_model_index.json` in the repository. Just find the file on the Hub and click edit - you can change any field in the loading specs: `repo`, `subfolder`, `variant`, `revision`, etc.
+
+```py
+# Original spec in modular_model_index.json
+"unet": [
+ null, null,
+ {
+ "repo": "stabilityai/stable-diffusion-xl-base-1.0",
+ "subfolder": "unet",
+ "variant": "fp16"
+ }
+]
+
+# Modified spec - changed the repo
+"unet": [
+ null, null,
+ {
+ "repo": "RunDiffusion/Juggernaut-XL-v9",
+ "subfolder": "unet",
+ "variant": "fp16"
+ }
+]
+```
+
+Now if you create a pipeline using the same blocks and updated repository, it will by default load from the new repository.
+
+```py
+pipeline = ModularPipeline.from_pretrained("YiYiXu/modular-loader-t2i-0704", components_manager=components)
+pipeline.load_components(names="unet")
+```
+
+
+## Updating components in a `ModularPipeline`
+
+Similar to `DiffusionPipeline`, you can load components separately to replace the default ones in the pipeline. In Modular Diffusers, the approach depends on the component type:
+
+- **Pretrained components** (`default_creation_method='from_pretrained'`): you must load them with `ComponentSpec` in order to update the existing one.
+- **Config components** (`default_creation_method='from_config'`): these components don't need loading specs - they're created during pipeline initialization with a default config. To update them, you can either pass the object directly or pass a `ComponentSpec`.
+
+
+
+💡 **Component Type Changes**: The component type (pretrained vs config-based) can change when you update components. These types are initially defined in pipeline blocks' `expected_components` field using `ComponentSpec` with `default_creation_method`. See the [Customizing Guidance Techniques](#customizing-guidance-techniques) section for examples of how this works in practice.
+
+
+
+`ComponentSpec` defines how to create or load components and can actually create them using its `create()` method (for ConfigMixin objects) or `load()` method (wrapper around `from_pretrained()`). When a component is loaded with a ComponentSpec, it gets tagged with a unique ID that encodes its creation parameters, allowing you to always extract the original specification using `ComponentSpec.from_component()`.
+
+Now let's look at how to update pretrained components in practice. Instead of loading your model like this:
+
+```py
+from diffusers import UNet2DConditionModel
+import torch
+unet = UNet2DConditionModel.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", subfolder="unet", variant="fp16", torch_dtype=torch.float16)
+```
+you should load it with a `ComponentSpec`:
+
+```py
+from diffusers import ComponentSpec, UNet2DConditionModel
+import torch
+
+unet_spec = ComponentSpec(name="unet", type_hint=UNet2DConditionModel, repo="stabilityai/stable-diffusion-xl-base-1.0", subfolder="unet", variant="fp16")
+unet2 = unet_spec.load(torch_dtype=torch.float16)
+```
+
+The key difference is that the second unet retains its loading specs, so you can extract the spec and recreate the unet:
+
+```py
+# component -> spec
+>>> spec = ComponentSpec.from_component("unet", unet2)
+>>> spec
+ComponentSpec(name='unet', type_hint=, description=None, config=None, repo='stabilityai/stable-diffusion-xl-base-1.0', subfolder='unet', variant='fp16', revision=None, default_creation_method='from_pretrained')
+# spec -> component
+>>> unet2_recreated = spec.load(torch_dtype=torch.float16)
+```
+
+To replace the unet in the pipeline:
+
+```py
+t2i_pipeline.update_components(unet=unet2)
+```
+
+Not only is the `unet` component swapped, its loading specs are also updated from "RunDiffusion/Juggernaut-XL-v9" to "stabilityai/stable-diffusion-xl-base-1.0" in the pipeline config. This means that if you save the pipeline now and load it back with `from_pretrained`, the new pipeline will load the original SDXL unet by default.
+
+```py
+>>> t2i_pipeline
+StableDiffusionXLModularPipeline {
+ ...
+ "unet": [
+ "diffusers",
+ "UNet2DConditionModel",
+ {
+ "repo": "stabilityai/stable-diffusion-xl-base-1.0",
+ "revision": null,
+ "subfolder": "unet",
+ "type_hint": [
+ "diffusers",
+ "UNet2DConditionModel"
+ ],
+ "variant": "fp16"
+ }
+ ],
+ ...
+}
+```
+
+
+💡 **Modifying Component Specs**: You can get a copy of the current component spec from the pipeline using `get_component_spec()`. This makes it easy to modify the spec and update components.
+
+```py
+>>> unet_spec = t2i_pipeline.get_component_spec("unet")
+>>> unet_spec
+ComponentSpec(
+ name='unet',
+ type_hint=,
+ repo='RunDiffusion/Juggernaut-XL-v9',
+ subfolder='unet',
+ variant='fp16',
+ default_creation_method='from_pretrained'
+)
+
+# Modify the spec to load from a different repository
+>>> unet_spec.repo = "stabilityai/stable-diffusion-xl-base-1.0"
+
+# Load the component with the modified spec
+>>> unet = unet_spec.load(torch_dtype=torch.float16)
+```
+
+
+
+## Customizing Guidance Techniques
+
+Guiders are implementations of different [classifier-free guidance](https://huggingface.co/papers/2207.12598) techniques that can be applied during the denoising process to improve generation quality, control, and adherence to prompts. They work by steering the model predictions towards desired directions and away from undesired directions. In diffusers, guiders are implemented as subclasses of `BaseGuidance`. They can easily be integrated into modular pipelines and provide a flexible way to enhance generation quality without modifying the underlying diffusion models.
+
+**ClassifierFreeGuidance (CFG)** is the first and most common guidance technique, used in all our standard pipelines. We also offer many other guidance techniques from the latest research in this area - **PerturbedAttentionGuidance (PAG)**, **SkipLayerGuidance (SLG)**, **SmoothedEnergyGuidance (SEG)**, and others that can provide better results for specific use cases.
+
+This section demonstrates how to use guiders with the component-updating methods we just learned. Since `BaseGuidance` components are stateless (similar to schedulers), they are typically created with default configurations during pipeline initialization using `default_creation_method='from_config'`. This means they don't require loading specs from the repository - you won't see the guider listed in `modular_model_index.json` files.
+
+Let's take a look at the default guider configuration:
+
+```py
+>>> t2i_pipeline.get_component_spec("guider")
+ComponentSpec(name='guider', type_hint=, description=None, config=FrozenDict([('guidance_scale', 7.5), ('guidance_rescale', 0.0), ('use_original_formulation', False), ('start', 0.0), ('stop', 1.0), ('_use_default_values', ['start', 'guidance_rescale', 'stop', 'use_original_formulation'])]), repo=None, subfolder=None, variant=None, revision=None, default_creation_method='from_config')
+```
+
+As you can see, the guider is configured to use `ClassifierFreeGuidance` with default parameters and `default_creation_method='from_config'`, meaning it's created during pipeline initialization rather than loaded from a repository. Let's verify this: here we run `init_pipeline()` without a modular repo, and there it is, a guider with the default configuration we just saw.
+
+
+```py
+>>> pipeline = t2i_blocks.init_pipeline()
+>>> pipeline.guider
+ClassifierFreeGuidance {
+ "_class_name": "ClassifierFreeGuidance",
+ "_diffusers_version": "0.35.0.dev0",
+ "guidance_rescale": 0.0,
+ "guidance_scale": 7.5,
+ "start": 0.0,
+ "stop": 1.0,
+ "use_original_formulation": false
+}
+```
+
+#### Modify Parameters of the Same Guider Type
+
+To change parameters of the same guider type (e.g., adjusting the `guidance_scale` for CFG), you have two options:
+
+**Option 1: Use ComponentSpec.create() method**
+
+You just need to pass the parameter with the new value to override the default one.
+
+```python
+>>> guider_spec = t2i_pipeline.get_component_spec("guider")
+>>> guider = guider_spec.create(guidance_scale=10)
+>>> t2i_pipeline.update_components(guider=guider)
+```
+
+**Option 2: Pass ComponentSpec directly**
+
+Update the spec directly and pass it to `update_components()`.
+
+```python
+>>> guider_spec = t2i_pipeline.get_component_spec("guider")
+>>> guider_spec.config["guidance_scale"] = 10
+>>> t2i_pipeline.update_components(guider=guider_spec)
+```
+
+Both approaches produce the same result:
+```python
+>>> t2i_pipeline.guider
+ClassifierFreeGuidance {
+ "_class_name": "ClassifierFreeGuidance",
+ "_diffusers_version": "0.35.0.dev0",
+ "guidance_rescale": 0.0,
+ "guidance_scale": 10,
+ "start": 0.0,
+ "stop": 1.0,
+ "use_original_formulation": false
+}
+```
+
+#### Switch to a Different Guider Type
+
+Switching between guidance techniques is as simple as passing a guider object of that technique:
+
+```py
+from diffusers import LayerSkipConfig, PerturbedAttentionGuidance
+config = LayerSkipConfig(indices=[2, 9], fqn="mid_block.attentions.0.transformer_blocks", skip_attention=False, skip_attention_scores=True, skip_ff=False)
+guider = PerturbedAttentionGuidance(
+ guidance_scale=5.0, perturbed_guidance_scale=2.5, perturbed_guidance_config=config
+)
+t2i_pipeline.update_components(guider=guider)
+```
+
+Note that you will get a warning about changing the guider type, which is expected:
+
+```
+ModularPipeline.update_components: adding guider with new type: PerturbedAttentionGuidance, previous type: ClassifierFreeGuidance
+```
+
+
+
+- For `from_config` components (like guiders, schedulers): You can pass an object of required type OR pass a ComponentSpec directly (which calls `create()` under the hood)
+- For `from_pretrained` components (like models): You must use ComponentSpec to ensure proper tagging and loading
+
+
+
+Let's verify that the guider has been updated:
+
+```py
+>>> t2i_pipeline.guider
+PerturbedAttentionGuidance {
+ "_class_name": "PerturbedAttentionGuidance",
+ "_diffusers_version": "0.35.0.dev0",
+ "guidance_rescale": 0.0,
+ "guidance_scale": 5.0,
+ "perturbed_guidance_config": {
+ "dropout": 1.0,
+ "fqn": "mid_block.attentions.0.transformer_blocks",
+ "indices": [
+ 2,
+ 9
+ ],
+ "skip_attention": false,
+ "skip_attention_scores": true,
+ "skip_ff": false
+ },
+ "perturbed_guidance_layers": null,
+ "perturbed_guidance_scale": 2.5,
+ "perturbed_guidance_start": 0.01,
+ "perturbed_guidance_stop": 0.2,
+ "start": 0.0,
+ "stop": 1.0,
+ "use_original_formulation": false
+}
+
+```
+
+The component spec has also been updated to reflect the new guider type:
+
+```py
+>>> t2i_pipeline.get_component_spec("guider")
+ComponentSpec(name='guider', type_hint=, description=None, config=FrozenDict([('guidance_scale', 5.0), ('perturbed_guidance_scale', 2.5), ('perturbed_guidance_start', 0.01), ('perturbed_guidance_stop', 0.2), ('perturbed_guidance_layers', None), ('perturbed_guidance_config', LayerSkipConfig(indices=[2, 9], fqn='mid_block.attentions.0.transformer_blocks', skip_attention=False, skip_attention_scores=True, skip_ff=False, dropout=1.0)), ('guidance_rescale', 0.0), ('use_original_formulation', False), ('start', 0.0), ('stop', 1.0), ('_use_default_values', ['perturbed_guidance_start', 'use_original_formulation', 'perturbed_guidance_layers', 'stop', 'start', 'guidance_rescale', 'perturbed_guidance_stop']), ('_class_name', 'PerturbedAttentionGuidance'), ('_diffusers_version', '0.35.0.dev0')]), repo=None, subfolder=None, variant=None, revision=None, default_creation_method='from_config')
+```
+
+The "guider" is still a `from_config` component: is still not included in the pipeline config and will not be saved into the `modular_model_index.json`.
+
+```py
+>>> assert "guider" not in t2i_pipeline.config
+```
+
+However, you can change it to a `from_pretrained` component, which allows you to upload your customized guider to the Hub and load it into your pipeline.
+
+#### Loading Custom Guiders from Hub
+
+If you already have a guider saved on the Hub and a `modular_model_index.json` with the loading spec for that guider, it will automatically be changed to a `from_pretrained` component during pipeline initialization.
+
+For example, this `modular_model_index.json` includes loading specs for the guider:
+
+```json
+{
+ "guider": [
+ null,
+ null,
+ {
+ "repo": "YiYiXu/modular-loader-t2i-guider",
+ "revision": null,
+ "subfolder": "pag_guider",
+ "type_hint": [
+ "diffusers",
+ "PerturbedAttentionGuidance"
+ ],
+ "variant": null
+ }
+ ]
+}
+```
+
+When you use this repository to create a pipeline with the same blocks (that originally configured guider as a `from_config` component), the guider becomes a `from_pretrained` component. This means it doesn't get created during initialization, and after you call `load_default_components()`, it loads based on the spec - resulting in the PAG guider instead of the default CFG.
+
+```py
+t2i_pipeline = t2i_blocks.init_pipeline("YiYiXu/modular-doc-guider")
+assert t2i_pipeline.guider is None # Not created during init
+t2i_pipeline.load_default_components()
+t2i_pipeline.guider # Now loaded as PAG guider
+```
+
+#### Upload Custom Guider to Hub for Easy Loading & Sharing
+
+Now let's see how we can share the guider on the Hub and change it to a `from_pretrained` component.
+
+```py
+guider.push_to_hub("YiYiXu/modular-loader-t2i-guider", subfolder="pag_guider")
+```
+
+Voilà! Now you have a subfolder called `pag_guider` on that repository.
+
+You have a few options to make this guider available in your pipeline:
+
+1. **Directly modify the `modular_model_index.json`** to add a loading spec for the guider by pointing to a folder containing the desired guider config.
+
+2. **Use the `update_components` method** to change it to a `from_pretrained` component for your pipeline. This is easier if you just want to try it out with different repositories.
+
+Let's use the second approach and change our guider_spec to use `from_pretrained` as the default creation method and update the loading spec to use this subfolder we just created:
+
+```python
+guider_spec = t2i_pipeline.get_component_spec("guider")
+guider_spec.default_creation_method="from_pretrained"
+guider_spec.repo="YiYiXu/modular-loader-t2i-guider"
+guider_spec.subfolder="pag_guider"
+pag_guider = guider_spec.load()
+t2i_pipeline.update_components(guider=pag_guider)
+```
+
+You will get a warning about changing the creation method:
+
+```
+ModularPipeline.update_components: changing the default_creation_method of guider from from_config to from_pretrained.
+```
+
+Now not only are the `guider` component and its component spec updated, but so is the pipeline config.
+
+If you want to change the default behavior for future pipelines, you can push the updated pipeline to the Hub. This way, when others use your repository, they'll get the PAG guider by default. However, this is optional - you don't have to do this if you just want to experiment locally.
+
+```py
+t2i_pipeline.push_to_hub("YiYiXu/modular-doc-guider")
+```
+
+
+
+
+Experiment with different techniques and parameters to find what works best for your specific use case! You can find all the guider classes we support [here](TODO: API doc).
+
+Additionally, you can write your own guider implementations, for example, CFG Zero* combined with Skip Layer Guidance, and they should be compatible out-of-the-box with modular diffusers!
+
+
+
+## Running a `ModularPipeline`
+
+The API to run the `ModularPipeline` is very similar to how you would run a regular `DiffusionPipeline`:
+
+```py
+>>> image = pipeline(prompt="a cat", num_inference_steps=15, output="images")[0]
+```
+
+There are a few key differences though:
+1. You can pass a `PipelineState` object directly to the pipeline instead of individual arguments
+2. If you do not specify the `output` argument, the pipeline returns the `PipelineState` object
+3. You can pass a list as `output`, e.g. `pipeline(..., output=["images", "latents"])` will return a dictionary containing both the generated images and the final denoised latents
+
+Under the hood, `ModularPipeline`'s `__call__` method is a wrapper around the pipeline blocks' `__call__` method: it creates a `PipelineState` object and populates it with user inputs, then returns the output to the user based on the `output` argument. It also ensures that all pipeline-level config and components are exposed to all pipeline blocks by preparing and passing a `components` input.
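+
+Here's a minimal sketch of these output options (assuming the `t2i_pipeline` from earlier, with its components loaded):
+
+```py
+# no `output` -> the full PipelineState is returned
+state = t2i_pipeline(prompt="a cat", num_inference_steps=15)
+
+# a single name -> just that value
+image = t2i_pipeline(prompt="a cat", num_inference_steps=15, output="images")[0]
+
+# a list of names -> a dictionary with one entry per requested output
+result = t2i_pipeline(prompt="a cat", num_inference_steps=15, output=["images", "latents"])
+image, latents = result["images"][0], result["latents"]
+```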
+
+
+
+You can inspect the docstring of a `ModularPipeline` to check what arguments the pipeline accepts and how to specify the `output` you want. It will list all available outputs (basically everything in the intermediate pipeline state) so you can choose from the list.
+
+```py
+t2i_pipeline.doc
+```
+
+**Important**: Always check the docstring, because arguments can differ from the standard pipelines you're familiar with. For example, in Modular Diffusers we standardized the controlnet image input as `control_image`, but regular pipelines are inconsistent about the name, e.g. controlnet text-to-image uses `image` while SDXL controlnet img2img uses `control_image`.
+
+**Note**: The `output` list might be longer than you expected - it includes everything in the intermediate state that you can choose to return. Most of the time, you'll just want `output="images"` or `output="latents"`.
+
+
+
+#### Text-to-Image, Image-to-Image, and Inpainting
+
+These are minimal inference examples for the basic tasks: text-to-image, image-to-image, and inpainting. The process to create the different pipelines is the same; the only difference is the block classes preset. Inference is also more or less the same as with standard pipelines, but always check `.doc` for the correct input names and remember to pass `output="images"`.
+
+
+
+
+
+```py
+import torch
+from diffusers.modular_pipelines import SequentialPipelineBlocks
+from diffusers.modular_pipelines.stable_diffusion_xl import TEXT2IMAGE_BLOCKS
+
+# create pipeline from official blocks preset
+blocks = SequentialPipelineBlocks.from_blocks_dict(TEXT2IMAGE_BLOCKS)
+
+modular_repo_id = "YiYiXu/modular-loader-t2i-0704"
+pipeline = blocks.init_pipeline(modular_repo_id)
+
+pipeline.load_default_components(torch_dtype=torch.float16)
+pipeline.to("cuda")
+
+# run the pipeline; remember to pass output="images"
+image = pipeline(prompt="Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", output="images")[0]
+image.save("modular_t2i_out.png")
+```
+
+
+
+
+```py
+import torch
+from diffusers.modular_pipelines import SequentialPipelineBlocks
+from diffusers.modular_pipelines.stable_diffusion_xl import IMAGE2IMAGE_BLOCKS
+from diffusers.utils import load_image
+
+# create pipeline from blocks preset
+blocks = SequentialPipelineBlocks.from_blocks_dict(IMAGE2IMAGE_BLOCKS)
+
+modular_repo_id = "YiYiXu/modular-loader-t2i-0704"
+pipeline = blocks.init_pipeline(modular_repo_id)
+
+pipeline.load_default_components(torch_dtype=torch.float16)
+pipeline.to("cuda")
+
+url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sdxl-text2img.png"
+init_image = load_image(url)
+prompt = "a dog catching a frisbee in the jungle"
+image = pipeline(prompt=prompt, image=init_image, strength=0.8, output="images")[0]
+image.save("modular_i2i_out.png")
+```
+
+
+
+
+```py
+import torch
+from diffusers.modular_pipelines import SequentialPipelineBlocks
+from diffusers.modular_pipelines.stable_diffusion_xl import INPAINT_BLOCKS
+from diffusers.utils import load_image
+
+# create pipeline from blocks preset
+blocks = SequentialPipelineBlocks.from_blocks_dict(INPAINT_BLOCKS)
+
+modular_repo_id = "YiYiXu/modular-loader-t2i-0704"
+pipeline = blocks.init_pipeline(modular_repo_id)
+
+pipeline.load_default_components(torch_dtype=torch.float16)
+pipeline.to("cuda")
+
+img_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sdxl-text2img.png"
+mask_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sdxl-inpaint-mask.png"
+
+init_image = load_image(img_url)
+mask_image = load_image(mask_url)
+
+prompt = "A deep sea diver floating"
+image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, strength=0.85, output="images")[0]
+image.save("moduar_inpaint_out.png")
+```
+
+
+
+
+#### ControlNet
+
+For ControlNet, we provide one auto block you can place at the `denoise` step. Let's create it and inspect it to see what it tells us.
+
+
+
+💡 **How to explore new tasks**: When you want to figure out how to do a specific task in Modular Diffusers, it is a good idea to start by checking what block classes presets we offer in `ALL_BLOCKS`. Then create the block instance and inspect it - it will show you the required components, description, and sub-blocks. This is crucial for understanding what each block does and what it needs.
+
+
+
+```py
+>>> from diffusers.modular_pipelines.stable_diffusion_xl import ALL_BLOCKS
+>>> ALL_BLOCKS["controlnet"]
+InsertableDict([
+ 0: ('denoise', )
+])
+>>> controlnet_blocks = ALL_BLOCKS["controlnet"]["denoise"]()
+>>> controlnet_blocks
+StableDiffusionXLAutoControlnetStep(
+ Class: SequentialPipelineBlocks
+
+ ====================================================================================================
+ This pipeline contains blocks that are selected at runtime based on inputs.
+ Trigger Inputs: {'mask', 'control_mode', 'control_image', 'controlnet_cond'}
+ Use `get_execution_blocks()` with input names to see selected blocks (e.g. `get_execution_blocks('mask')`).
+ ====================================================================================================
+
+
+ Description: Controlnet auto step that prepare the controlnet input and denoise the latents. It works for both controlnet and controlnet_union and supports text2img, img2img and inpainting tasks. (it should be replace at 'denoise' step)
+
+
+ Components:
+ controlnet (`ControlNetUnionModel`)
+ control_image_processor (`VaeImageProcessor`)
+ scheduler (`EulerDiscreteScheduler`)
+ unet (`UNet2DConditionModel`)
+ guider (`ClassifierFreeGuidance`)
+
+ Sub-Blocks:
+ [0] controlnet_input (StableDiffusionXLAutoControlNetInputStep)
+ Description: Controlnet Input step that prepare the controlnet input.
+ This is an auto pipeline block that works for both controlnet and controlnet_union.
+ (it should be called right before the denoise step) - `StableDiffusionXLControlNetUnionInputStep` is called to prepare the controlnet input when `control_mode` and `control_image` are provided.
+ - `StableDiffusionXLControlNetInputStep` is called to prepare the controlnet input when `control_image` is provided. - if neither `control_mode` nor `control_image` is provided, step will be skipped.
+
+ [1] controlnet_denoise (StableDiffusionXLAutoControlNetDenoiseStep)
+ Description: Denoise step that iteratively denoise the latents with controlnet. This is a auto pipeline block that using controlnet for text2img, img2img and inpainting tasks.This block should not be used without a controlnet_cond input - `StableDiffusionXLInpaintControlNetDenoiseStep` (inpaint_controlnet_denoise) is used when mask is provided. - `StableDiffusionXLControlNetDenoiseStep` (controlnet_denoise) is used when mask is not provided but controlnet_cond is provided. - If neither mask nor controlnet_cond are provided, step will be skipped.
+
+)
+```
+
+
+
+💡 **Auto Blocks**: This is the first time we've met an Auto Block! `AutoPipelineBlocks` automatically adapt to your inputs by combining multiple workflows with conditional logic. This is why one convenient block can work for all tasks and controlnet types. See the [Auto Blocks Guide](./auto_pipeline_blocks.md) for more details.
+
+
+
+The block shows us it has two steps (prepare inputs + denoise) and supports all tasks with both controlnet and controlnet union. Most importantly, it tells us to place it at the 'denoise' step. Let's do exactly that:
+
+```py
+import torch
+from diffusers.modular_pipelines import SequentialPipelineBlocks
+from diffusers.modular_pipelines.stable_diffusion_xl import TEXT2IMAGE_BLOCKS, StableDiffusionXLAutoControlnetStep
+from diffusers.utils import load_image
+
+# create pipeline from blocks preset
+blocks = SequentialPipelineBlocks.from_blocks_dict(TEXT2IMAGE_BLOCKS)
+
+# these two lines apply controlnet by replacing the denoise step
+controlnet_blocks = StableDiffusionXLAutoControlnetStep()
+blocks.sub_blocks["denoise"] = controlnet_blocks
+```
+
+Before we convert the blocks into a pipeline and load their components, let's inspect the blocks and their docs again to make sure everything was assembled correctly. You should see that `controlnet` and `control_image_processor` are now listed under `Components`, so we need to initialize the pipeline with a repo that contains the desired loading specs for these two components.
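+
+For example (a quick sketch, output omitted):
+
+```py
+# `controlnet` and `control_image_processor` should now appear under Components
+print(blocks)
+# the docstring shows the full call signature, including the new `control_image` input
+print(blocks.doc)
+```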
+
+```py
+# make sure to use a modular repo that includes a controlnet loading spec
+modular_repo_id = "YiYiXu/modular-demo-auto"
+pipeline = blocks.init_pipeline(modular_repo_id)
+pipeline.load_default_components(torch_dtype=torch.float16)
+pipeline.to("cuda")
+
+# generate
+canny_image = load_image(
+ "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/bird_canny.png"
+)
+image = pipeline(
+ prompt="a bird", controlnet_conditioning_scale=0.5, control_image=canny_image, output="images"
+)[0]
+image.save("modular_control_out.png")
+```
+
+#### IP-Adapter
+
+**Challenge time!** Before we show you how to apply IP-adapter, try doing it yourself! Use the same process we just walked you through with ControlNet: check the official blocks preset, inspect the block instance and docstring `.doc`, and adapt a regular IP-adapter example to modular.
+
+Let's walk through the steps:
+
+1. Check the blocks preset
+
+```py
+>>> from diffusers.modular_pipelines.stable_diffusion_xl import ALL_BLOCKS
+>>> ALL_BLOCKS["ip_adapter"]
+InsertableDict([
+ 0: ('ip_adapter', )
+])
+```
+
+2. Inspect the block and its doc
+
+```py
+>>> from diffusers.modular_pipelines.stable_diffusion_xl import StableDiffusionXLAutoIPAdapterStep
+>>> ip_adapter_blocks = StableDiffusionXLAutoIPAdapterStep()
+>>> ip_adapter_blocks
+StableDiffusionXLAutoIPAdapterStep(
+ Class: AutoPipelineBlocks
+
+ ====================================================================================================
+ This pipeline contains blocks that are selected at runtime based on inputs.
+ Trigger Inputs: {'ip_adapter_image'}
+ Use `get_execution_blocks()` with input names to see selected blocks (e.g. `get_execution_blocks('ip_adapter_image')`).
+ ====================================================================================================
+
+
+ Description: Run IP Adapter step if `ip_adapter_image` is provided. This step should be placed before the 'input' step.
+
+
+
+ Components:
+ image_encoder (`CLIPVisionModelWithProjection`)
+ feature_extractor (`CLIPImageProcessor`)
+ unet (`UNet2DConditionModel`)
+ guider (`ClassifierFreeGuidance`)
+
+ Sub-Blocks:
+ • ip_adapter [trigger: ip_adapter_image] (StableDiffusionXLIPAdapterStep)
+ Description: IP Adapter step that prepares ip adapter image embeddings.
+ Note that this step only prepares the embeddings - in order for it to work correctly, you need to load ip adapter weights into unet via ModularPipeline.load_ip_adapter() and pipeline.set_ip_adapter_scale().
+ See [ModularIPAdapterMixin](https://huggingface.co/docs/diffusers/api/loaders/ip_adapter#diffusers.loaders.ModularIPAdapterMixin) for more details
+
+)
+```
+
+3. Follow the instructions to build
+
+```py
+import torch
+from diffusers.modular_pipelines import SequentialPipelineBlocks
+from diffusers.modular_pipelines.stable_diffusion_xl import TEXT2IMAGE_BLOCKS
+
+# create pipeline from official blocks preset
+blocks = SequentialPipelineBlocks.from_blocks_dict(TEXT2IMAGE_BLOCKS)
+
+# insert ip_adapter_blocks before the input step as instructed
+blocks.sub_blocks.insert("ip_adapter", ip_adapter_blocks, 1)
+
+# inspect the blocks before you convert them into a pipeline,
+# and make sure to use a repo that contains the loading specs for all components
+# (for IP-Adapter, you need image_encoder & feature_extractor)
+modular_repo_id = "YiYiXu/modular-demo-auto"
+pipeline = blocks.init_pipeline(modular_repo_id)
+
+pipeline.load_default_components(torch_dtype=torch.float16)
+pipeline.load_ip_adapter(
+ "h94/IP-Adapter",
+ subfolder="sdxl_models",
+ weight_name="ip-adapter_sdxl.bin"
+)
+pipeline.set_ip_adapter_scale(0.8)
+pipeline.to("cuda")
+```
+
+4. Adapt an example to modular
+
+We are using [this one](https://huggingface.co/docs/diffusers/using-diffusers/ip_adapter?ipadapter-variants=IP-Adapter+Plus#ip-adapter) from our IP-Adapter doc!
+
+
+```py
+from diffusers.utils import load_image
+image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_diner.png")
+image = pipeline(
+ prompt="a polar bear sitting in a chair drinking a milkshake",
+ ip_adapter_image=image,
+ negative_prompt="deformed, ugly, wrong proportion, low res, bad anatomy, worst quality, low quality",
+ output="images"
+)[0]
+image.save("modular_ipa_out.png")
+```
+
+
diff --git a/docs/source/en/modular_diffusers/overview.md b/docs/source/en/modular_diffusers/overview.md
new file mode 100644
index 0000000000..9702cea063
--- /dev/null
+++ b/docs/source/en/modular_diffusers/overview.md
@@ -0,0 +1,42 @@
+
+
+# Getting Started with Modular Diffusers
+
+
+
+🧪 **Experimental Feature**: Modular Diffusers is an experimental feature we are actively developing. The API may be subject to breaking changes.
+
+
+
+With Modular Diffusers, we introduce a unified pipeline system that simplifies how you work with diffusion models. Instead of creating separate pipelines for each task, Modular Diffusers lets you:
+
+**Write Only What's New**: You won't need to write an entire pipeline from scratch every time you have a new use case. You can create pipeline blocks just for your new workflow's unique aspects and reuse existing blocks for existing functionalities.
+
+**Assemble Like LEGO®**: You can mix and match blocks in flexible ways. This allows you to write dedicated blocks unique to specific workflows, and then assemble different blocks into a pipeline that can be used more conveniently for multiple workflows.
+
+
+Here's how our guides are organized to help you navigate the Modular Diffusers documentation:
+
+### 🚀 Running Pipelines
+- **[Modular Pipeline Guide](./modular_pipeline.md)** - How to use predefined blocks to build a pipeline and run it
+- **[Components Manager Guide](./components_manager.md)** - How to manage and reuse components across multiple pipelines
+
+### 📚 Creating PipelineBlocks
+- **[Pipeline and Block States](./modular_diffusers_states.md)** - Understanding PipelineState and BlockState
+- **[Pipeline Block](./pipeline_block.md)** - How to write custom PipelineBlocks
+- **[SequentialPipelineBlocks](sequential_pipeline_blocks.md)** - Connecting blocks in sequence
+- **[LoopSequentialPipelineBlocks](./loop_sequential_pipeline_blocks.md)** - Creating iterative workflows
+- **[AutoPipelineBlocks](./auto_pipeline_blocks.md)** - Conditional block selection
+
+### 🎯 Practical Examples
+- **[End-to-End Example](./end_to_end_guide.md)** - Complete end-to-end examples, including sharing your workflow on the Hugging Face Hub and deploying UI nodes
diff --git a/docs/source/en/modular_diffusers/pipeline_block.md b/docs/source/en/modular_diffusers/pipeline_block.md
new file mode 100644
index 0000000000..17a819732f
--- /dev/null
+++ b/docs/source/en/modular_diffusers/pipeline_block.md
@@ -0,0 +1,292 @@
+
+
+# PipelineBlock
+
+
+
+🧪 **Experimental Feature**: Modular Diffusers is an experimental feature we are actively developing. The API may be subject to breaking changes.
+
+
+
+In Modular Diffusers, you build your workflow using `ModularPipelineBlocks`. We support 4 different types of blocks: `PipelineBlock`, `SequentialPipelineBlocks`, `LoopSequentialPipelineBlocks`, and `AutoPipelineBlocks`. Among them, `PipelineBlock` is the most fundamental building block of the whole system - it's like a brick in a Lego system. These blocks are designed to easily connect with each other, allowing for modular construction of creative and potentially very complex workflows.
+
+
+
+**Important**: `PipelineBlock`s are definitions/specifications, not runnable pipelines. They define what a block should do and what data it needs, but you need to convert them into a `ModularPipeline` to actually execute them. For information on creating and running pipelines, see the [Modular Pipeline guide](./modular_pipeline.md).
+
+
+
+In this tutorial, we will focus on how to write a basic `PipelineBlock` and how it interacts with the pipeline state.
+
+## PipelineState
+
+Before we dive into creating `PipelineBlock`s, make sure you have a basic understanding of `PipelineState`. It acts as the global state container that all blocks operate on - each block gets a local view (`BlockState`) of the relevant variables it needs from `PipelineState`, performs its operations, and then updates `PipelineState` with any changes. See the [PipelineState and BlockState guide](./modular_diffusers_states.md) for more details.
+
+## Define a `PipelineBlock`
+
+To write a `PipelineBlock` class, you need to define a few properties that determine how your block interacts with the pipeline state. Understanding these properties is crucial - they define what data your block can access and what it can produce.
+
+The three main properties you need to define are:
+- `inputs`: Immutable values from the user that cannot be modified
+- `intermediate_inputs`: Mutable values from previous blocks that can be read and modified
+- `intermediate_outputs`: New values your block creates for subsequent blocks and user access
+
+Let's explore each one and understand how they work with the pipeline state.
+
+**Inputs: Immutable User Values**
+
+Inputs are variables your block needs from the immutable pipeline state - these are user-provided values that cannot be modified by any block. You define them using `InputParam`:
+
+```py
+user_inputs = [
+ InputParam(name="image", type_hint="PIL.Image", description="raw input image to process")
+]
+```
+
+When you list something as an input, you're saying "I need this value directly from the end user, and I will talk to them directly, telling them what I need in the 'description' field. They will provide it and it will come to me unchanged."
+
+This is especially useful for raw values that serve as the "source of truth" in your workflow. For example, with a raw image, many workflows require preprocessing steps like resizing that a previous block might have performed. But in many cases, you also want the raw PIL image. In some inpainting workflows, you need the original image to overlay with the generated result for better control and consistency.
+
+**Intermediate Inputs: Mutable Values from Previous Blocks, or Users**
+
+Intermediate inputs are variables your block needs from the mutable pipeline state - these are values that can be read and modified. They're typically created by previous blocks, but they can also be provided directly by the user if no previous block creates them:
+
+```py
+user_intermediate_inputs = [
+ InputParam(name="processed_image", type_hint="torch.Tensor", description="image that has been preprocessed and normalized"),
+]
+```
+
+When you list something as an intermediate input, you're saying "I need this value, and I expect to work with another block that has already created it. I know for sure that I can get it from that other block, but it's okay if other developers want to use something different."
+
+**Intermediate Outputs: New Values for Subsequent Blocks and User Access**
+
+Intermediate outputs are new variables your block creates and adds to the mutable pipeline state. They serve two purposes:
+
+1. **For subsequent blocks**: They can be used as intermediate inputs by other blocks in the pipeline
+2. **For users**: They become available as final outputs that users can access when running the pipeline
+
+```py
+user_intermediate_outputs = [
+ OutputParam(name="image_latents", description="latents representing the image")
+]
+```
+
+Intermediate inputs and intermediate outputs work together like Lego studs and anti-studs - they're the connection points that make blocks modular. When one block produces an intermediate output, it becomes available as an intermediate input for subsequent blocks. This is where the "modular" nature of the system really shines - blocks can be connected and reconnected in different ways as long as their inputs and outputs match.
+
+Additionally, all intermediate outputs are accessible to users when they run the pipeline. Typically you only need the final images, but users can also access intermediate results like latents, embeddings, or the outputs of other processing steps.
+
+**The `__call__` Method Structure**
+
+Your `PipelineBlock`'s `__call__` method should follow this structure:
+
+```py
+def __call__(self, components, state):
+ # Get a local view of the state variables this block needs
+ block_state = self.get_block_state(state)
+
+ # Your computation logic here
+ # block_state contains all your inputs and intermediate_inputs
+ # You can access them like: block_state.image, block_state.processed_image
+
+ # Update the pipeline state with your updated block_states
+ self.set_block_state(state, block_state)
+ return components, state
+```
+
+The `block_state` object contains all the variables you defined in `inputs` and `intermediate_inputs`, making them easily accessible for your computation.
+
+**Components and Configs**
+
+You can define the components and pipeline-level configs your block needs using `ComponentSpec` and `ConfigSpec`:
+
+```py
+from diffusers import ComponentSpec, ConfigSpec, UNet2DConditionModel, EulerDiscreteScheduler
+
+# Define components your block needs
+expected_components = [
+ ComponentSpec(name="unet", type_hint=UNet2DConditionModel),
+ ComponentSpec(name="scheduler", type_hint=EulerDiscreteScheduler)
+]
+
+# Define pipeline-level configs
+expected_config = [
+ ConfigSpec("force_zeros_for_empty_prompt", True)
+]
+```
+
+**Components**: In the `ComponentSpec`, you must provide a `name` and ideally a `type_hint`. You can also specify a `default_creation_method` to indicate whether the component should be loaded from a pretrained model or created with default configurations. The actual loading details (`repo`, `subfolder`, `variant` and `revision` fields) are typically specified when creating the pipeline, as we covered in the [Modular Pipeline Guide](./modular_pipeline.md).
+
+**Configs**: Pipeline-level settings that control behavior across all blocks.
+
+When you convert your blocks into a pipeline using `blocks.init_pipeline()`, the pipeline collects all component requirements from the blocks and fetches the loading specs from the modular repository. The components are then made available to your block as the first argument of the `__call__` method. You can access any component you need using dot notation:
+
+```py
+def __call__(self, components, state):
+ # Access components using dot notation
+ unet = components.unet
+ vae = components.vae
+ scheduler = components.scheduler
+```
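+
+Putting these pieces together, here is a minimal, illustrative sketch of a complete custom block. The `ImageEncoderBlock` name, the `expected_components` property, and the VAE-encoding logic are assumptions made for this example rather than an existing Diffusers block:
+
+```py
+from diffusers import AutoencoderKL, ComponentSpec
+from diffusers.modular_pipelines import PipelineBlock, InputParam, OutputParam
+
+class ImageEncoderBlock(PipelineBlock):
+    model_name = "test"
+
+    @property
+    def expected_components(self):
+        # the pipeline loads the VAE for us and exposes it on `components`
+        return [ComponentSpec(name="vae", type_hint=AutoencoderKL)]
+
+    @property
+    def inputs(self):
+        return [InputParam(name="image", type_hint="torch.Tensor", description="preprocessed image tensor")]
+
+    @property
+    def intermediate_outputs(self):
+        return [OutputParam(name="image_latents", description="latents representing the image")]
+
+    @property
+    def description(self):
+        return "Encode a preprocessed image into its latent representation"
+
+    def __call__(self, components, state):
+        block_state = self.get_block_state(state)
+        # the actual computation happens between get_block_state and set_block_state
+        block_state.image_latents = components.vae.encode(block_state.image).latent_dist.sample()
+        self.set_block_state(state, block_state)
+        return components, state
+```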
+
+That's all you need to define in order to create a `PipelineBlock` - there is no hidden complexity. In fact, we are going to create a helper function that takes exactly these variables as input and returns a pipeline block. We will use this helper function throughout the tutorial to create test blocks.
+
+Note that in the `__call__` method, the only part that changes from block to block is the code between `self.get_block_state()` and `self.set_block_state()`. It can be abstracted into a simple function that takes `block_state` and returns the updated state, and our helper function accepts a `block_fn` argument that does exactly that.
+
+**Helper Function**
+
+```py
+from diffusers.modular_pipelines import PipelineBlock, InputParam, OutputParam
+import torch
+
+def make_block(inputs=[], intermediate_inputs=[], intermediate_outputs=[], block_fn=None, description=None):
+ class TestBlock(PipelineBlock):
+ model_name = "test"
+
+ @property
+ def inputs(self):
+ return inputs
+
+ @property
+ def intermediate_inputs(self):
+ return intermediate_inputs
+
+ @property
+ def intermediate_outputs(self):
+ return intermediate_outputs
+
+ @property
+ def description(self):
+ return description if description is not None else ""
+
+ def __call__(self, components, state):
+ block_state = self.get_block_state(state)
+ if block_fn is not None:
+ block_state = block_fn(block_state, state)
+ self.set_block_state(state, block_state)
+ return components, state
+
+ return TestBlock
+```
+
+## Example: Creating a Simple Pipeline Block
+
+Let's create a simple block to see how these definitions interact with the pipeline state. To better understand what's happening, we'll print out the states before and after updates to inspect them:
+
+```py
+inputs = [
+ InputParam(name="image", type_hint="PIL.Image", description="raw input image to process")
+]
+
+intermediate_inputs = [InputParam(name="batch_size", type_hint=int)]
+
+intermediate_outputs = [
+ OutputParam(name="image_latents", description="latents representing the image")
+]
+
+def image_encoder_block_fn(block_state, pipeline_state):
+ print(f"pipeline_state (before update): {pipeline_state}")
+ print(f"block_state (before update): {block_state}")
+
+ # Simulate processing the image
+ block_state.image = torch.randn(1, 3, 512, 512)
+ block_state.batch_size = block_state.batch_size * 2
+ block_state.processed_image = [torch.randn(1, 3, 512, 512)] * block_state.batch_size
+ block_state.image_latents = torch.randn(1, 4, 64, 64)
+
+ print(f"block_state (after update): {block_state}")
+ return block_state
+
+# Create a block with our definitions
+image_encoder_block_cls = make_block(
+ inputs=inputs,
+ intermediate_inputs=intermediate_inputs,
+ intermediate_outputs=intermediate_outputs,
+ block_fn=image_encoder_block_fn,
+    description="Encode raw image into its latent representation"
+)
+image_encoder_block = image_encoder_block_cls()
+pipe = image_encoder_block.init_pipeline()
+```
+
+Let's check the pipeline's docstring to see what inputs it expects:
+```py
+>>> print(pipe.doc)
+class TestBlock
+
+  Encode raw image into its latent representation
+
+ Inputs:
+
+ image (`PIL.Image`, *optional*):
+ raw input image to process
+
+ batch_size (`int`, *optional*):
+
+ Outputs:
+
+ image_latents (`None`):
+ latents representing the image
+```
+
+Notice that `batch_size` appears as an input even though we defined it as an intermediate input. This happens because no previous block provided it, so the pipeline makes it available as a user input. However, unlike regular inputs, this value goes directly into the mutable intermediate state.
+
+Now let's run the pipeline:
+
+```py
+from diffusers.utils import load_image
+
+image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/image_of_squirrel_painting.png")
+state = pipe(image=image, batch_size=2)
+print(f"pipeline_state (after update): {state}")
+```
+```out
+pipeline_state (before update): PipelineState(
+ inputs={
+ image:
+ },
+ intermediates={
+ batch_size: 2
+ },
+)
+block_state (before update): BlockState(
+ image:
+ batch_size: 2
+)
+
+block_state (after update): BlockState(
+ image: Tensor(dtype=torch.float32, shape=torch.Size([1, 3, 512, 512]))
+ batch_size: 4
+ processed_image: List[4] of Tensors with shapes [torch.Size([1, 3, 512, 512]), torch.Size([1, 3, 512, 512]), torch.Size([1, 3, 512, 512]), torch.Size([1, 3, 512, 512])]
+ image_latents: Tensor(dtype=torch.float32, shape=torch.Size([1, 4, 64, 64]))
+)
+pipeline_state (after update): PipelineState(
+ inputs={
+ image:
+ },
+ intermediates={
+ batch_size: 4
+ image_latents: Tensor(dtype=torch.float32, shape=torch.Size([1, 4, 64, 64]))
+ },
+)
+```
+
+**Key Observations:**
+
+1. **Before the update**: `image` (the input) goes to the immutable inputs dict, while `batch_size` (the intermediate_input) goes to the mutable intermediates dict, and both are available in `block_state`.
+
+2. **After the update**:
+   - **`image` (inputs)** changed in `block_state` but not in `pipeline_state` - this change is local to the block only.
+   - **`batch_size` (intermediate_inputs)** was updated in both `block_state` and `pipeline_state` - this change affects subsequent blocks (we didn't need to declare it as an intermediate output since it was already in the intermediates dict)
+   - **`image_latents` (intermediate_outputs)** was added to `pipeline_state` because it was declared as an intermediate output
+   - **`processed_image`** was not added to `pipeline_state` because it wasn't declared as an intermediate output
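+
+If you also want `processed_image` to be available to other blocks and to users, declare it as an additional intermediate output, for example:
+
+```py
+intermediate_outputs = [
+    OutputParam(name="image_latents", description="latents representing the image"),
+    OutputParam(name="processed_image", description="image that has been preprocessed and normalized"),
+]
+```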
\ No newline at end of file
diff --git a/docs/source/en/modular_diffusers/sequential_pipeline_blocks.md b/docs/source/en/modular_diffusers/sequential_pipeline_blocks.md
new file mode 100644
index 0000000000..a683f0d065
--- /dev/null
+++ b/docs/source/en/modular_diffusers/sequential_pipeline_blocks.md
@@ -0,0 +1,189 @@
+
+
+# SequentialPipelineBlocks
+
+
+
+🧪 **Experimental Feature**: Modular Diffusers is an experimental feature we are actively developing. The API may be subject to breaking changes.
+
+
+
+`SequentialPipelineBlocks` is a subclass of `ModularPipelineBlocks`. Unlike `PipelineBlock`, it is a multi-block that composes other blocks together in sequence, creating modular workflows where data flows from one block to the next. It's one of the most common ways to build complex pipelines by combining simpler building blocks.
+
+
+
+Other types of multi-blocks include [AutoPipelineBlocks](auto_pipeline_blocks.md) (for conditional block selection) and [LoopSequentialPipelineBlocks](loop_sequential_pipeline_blocks.md) (for iterative workflows). For information on creating individual blocks, see the [PipelineBlock guide](pipeline_block.md).
+
+Additionally, like all `ModularPipelineBlocks`, `SequentialPipelineBlocks` are definitions/specifications, not runnable pipelines. You need to convert them into a `ModularPipeline` to actually execute them. For information on creating and running pipelines, see the [Modular Pipeline guide](modular_pipeline.md).
+
+
+
+In this tutorial, we will focus on how to create `SequentialPipelineBlocks` and how blocks connect and work together.
+
+The key insight is that blocks connect through their intermediate inputs and outputs - the "studs and anti-studs" we discussed in the [PipelineBlock guide](pipeline_block.md). When one block produces an intermediate output, it becomes available as an intermediate input for subsequent blocks.
+
+Let's explore this through an example. We will use the same helper function from the PipelineBlock guide to create blocks.
+
+```py
+from diffusers.modular_pipelines import PipelineBlock, InputParam, OutputParam
+import torch
+
+def make_block(inputs=[], intermediate_inputs=[], intermediate_outputs=[], block_fn=None, description=None):
+ class TestBlock(PipelineBlock):
+ model_name = "test"
+
+ @property
+ def inputs(self):
+ return inputs
+
+ @property
+ def intermediate_inputs(self):
+ return intermediate_inputs
+
+ @property
+ def intermediate_outputs(self):
+ return intermediate_outputs
+
+ @property
+ def description(self):
+ return description if description is not None else ""
+
+ def __call__(self, components, state):
+ block_state = self.get_block_state(state)
+ if block_fn is not None:
+ block_state = block_fn(block_state, state)
+ self.set_block_state(state, block_state)
+ return components, state
+
+ return TestBlock
+```
+
+Let's create a block that produces `batch_size`, which we'll call "input_block":
+
+```py
+def input_block_fn(block_state, pipeline_state):
+
+ batch_size = len(block_state.prompt)
+ block_state.batch_size = batch_size * block_state.num_images_per_prompt
+
+ return block_state
+
+input_block_cls = make_block(
+ inputs=[
+ InputParam(name="prompt", type_hint=list, description="list of text prompts"),
+ InputParam(name="num_images_per_prompt", type_hint=int, description="number of images per prompt")
+ ],
+ intermediate_outputs=[
+ OutputParam(name="batch_size", description="calculated batch size")
+ ],
+ block_fn=input_block_fn,
+ description="A block that determines batch_size based on the number of prompts and num_images_per_prompt argument."
+)
+input_block = input_block_cls()
+```
+
+Now let's create a second block that uses the `batch_size` from the first block:
+
+```py
+def image_encoder_block_fn(block_state, pipeline_state):
+ # Simulate processing the image
+ block_state.image = torch.randn(1, 3, 512, 512)
+ block_state.batch_size = block_state.batch_size * 2
+ block_state.image_latents = torch.randn(1, 4, 64, 64)
+ return block_state
+
+image_encoder_block_cls = make_block(
+ inputs=[
+ InputParam(name="image", type_hint="PIL.Image", description="raw input image to process")
+ ],
+ intermediate_inputs=[
+ InputParam(name="batch_size", type_hint=int)
+ ],
+ intermediate_outputs=[
+ OutputParam(name="image_latents", description="latents representing the image")
+ ],
+ block_fn=image_encoder_block_fn,
+    description="Encode raw image into its latent representation"
+)
+image_encoder_block = image_encoder_block_cls()
+```
+
+Now let's connect these blocks to create a `SequentialPipelineBlocks`:
+
+```py
+from diffusers.modular_pipelines import SequentialPipelineBlocks, InsertableDict
+
+# Define a dict mapping block names to block instances
+blocks_dict = InsertableDict()
+blocks_dict["input"] = input_block
+blocks_dict["image_encoder"] = image_encoder_block
+
+# Create the SequentialPipelineBlocks
+blocks = SequentialPipelineBlocks.from_blocks_dict(blocks_dict)
+```
+
+Now you have a `SequentialPipelineBlocks` with 2 blocks:
+
+```py
+>>> blocks
+SequentialPipelineBlocks(
+ Class: ModularPipelineBlocks
+
+ Description:
+
+
+ Sub-Blocks:
+ [0] input (TestBlock)
+ Description: A block that determines batch_size based on the number of prompts and num_images_per_prompt argument.
+
+ [1] image_encoder (TestBlock)
+       Description: Encode raw image into its latent representation
+
+)
+```
+
+When you inspect `blocks.doc`, you can see that `batch_size` is not listed as an input. The pipeline automatically detects that the `input_block` can produce `batch_size` for the `image_encoder_block`, so it doesn't ask the user to provide it.
+
+```py
+>>> print(blocks.doc)
+class SequentialPipelineBlocks
+
+ Inputs:
+
+ prompt (`None`, *optional*):
+
+ num_images_per_prompt (`None`, *optional*):
+
+ image (`PIL.Image`, *optional*):
+ raw input image to process
+
+ Outputs:
+
+ batch_size (`None`):
+
+ image_latents (`None`):
+ latents representing the image
+```
+
+At runtime, the data flows like this:
+
+
+
+**How SequentialPipelineBlocks Works:**
+
+1. Blocks are executed in the order they're registered in the `blocks_dict`
+2. Outputs from one block become available as intermediate inputs to all subsequent blocks
+3. The pipeline automatically figures out which values need to be provided by the user and which will be generated by previous blocks
+4. Each block maintains its own behavior and operates through its defined interface, while collectively these interfaces determine what the entire pipeline accepts and produces
+
+What happens within each block follows the same pattern we described earlier: each block gets its own `block_state` with the relevant inputs and intermediate inputs, performs its computation, and updates the pipeline state with its intermediate outputs.
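+
+To see how this runs end to end, here is a minimal sketch that reuses the `blocks` object built above, converts it into a runnable pipeline, and inspects the resulting state. The prompt values are placeholders for illustration:
+
+```py
+# convert the composed blocks into a runnable ModularPipeline
+pipe = blocks.init_pipeline()
+
+# "prompt" and "num_images_per_prompt" feed the input block, which produces
+# "batch_size" for the image_encoder block; "image_latents" is added at the end
+state = pipe(prompt=["a cat", "a dog"], num_images_per_prompt=2)
+print(state)
+```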
\ No newline at end of file
diff --git a/docs/source/en/optimization/cache.md b/docs/source/en/optimization/cache.md
index ea510aed66..881529b27f 100644
--- a/docs/source/en/optimization/cache.md
+++ b/docs/source/en/optimization/cache.md
@@ -1,4 +1,4 @@
-
+
+# Compile and offloading quantized models
+
+Optimizing models often involves trade-offs between [inference speed](./fp16) and [memory usage](./memory). For instance, while [caching](./cache) can boost inference speed, it also increases memory consumption because it needs to store the outputs of intermediate attention layers. A more balanced optimization strategy combines quantization, [torch.compile](./fp16#torchcompile), and various [offloading methods](./memory#offloading).
+
+> [!TIP]
+> Check the [torch.compile](./fp16#torchcompile) guide to learn more about compilation and how it can be applied here. For example, regional compilation can significantly reduce compilation time without giving up any speedups.
+
+For image generation, combining quantization and [model offloading](./memory#model-offloading) often gives the best trade-off between quality, speed, and memory. Group offloading is less effective for image generation because the compute kernels usually finish faster than the data transfers, so the transfers cannot be *fully* overlapped with computation and some communication overhead between the CPU and GPU remains.
+
+For video generation, combining quantization and [group-offloading](./memory#group-offloading) tends to be better because video models are more compute-bound.
+
+The table below compares combinations of optimization strategies and their impact on latency and memory usage for Flux.
+
+| combination | latency (s) | memory-usage (GB) |
+|---|---|---|
+| quantization | 32.602 | 14.9453 |
+| quantization, torch.compile | 25.847 | 14.9448 |
+| quantization, torch.compile, model CPU offloading | 32.312 | 12.2369 |
+
+These results are benchmarked on Flux with an RTX 4090. The transformer and text_encoder components are quantized. Refer to the [benchmarking script](https://gist.github.com/sayakpaul/0db9d8eeeb3d2a0e5ed7cf0d9ca19b7d) if you're interested in evaluating your own model.
+
+This guide will show you how to compile and offload a quantized model with [bitsandbytes](../quantization/bitsandbytes#torchcompile). Make sure you are using [PyTorch nightly](https://pytorch.org/get-started/locally/) and the latest version of bitsandbytes.
+
+```bash
+pip install -U bitsandbytes
+```
+
+## Quantization and torch.compile
+
+Start by [quantizing](../quantization/overview) a model to reduce the memory required for storage and [compiling](./fp16#torchcompile) it to accelerate inference.
+
+Set the [Dynamo](https://docs.pytorch.org/docs/stable/torch.compiler_dynamo_overview.html) flag `capture_dynamic_output_shape_ops = True` to handle dynamic outputs when compiling bitsandbytes models.
+
+```py
+import torch
+from diffusers import DiffusionPipeline
+from diffusers.quantizers import PipelineQuantizationConfig
+
+torch._dynamo.config.capture_dynamic_output_shape_ops = True
+
+# quantize
+pipeline_quant_config = PipelineQuantizationConfig(
+ quant_backend="bitsandbytes_4bit",
+ quant_kwargs={"load_in_4bit": True, "bnb_4bit_quant_type": "nf4", "bnb_4bit_compute_dtype": torch.bfloat16},
+ components_to_quantize=["transformer", "text_encoder_2"],
+)
+pipeline = DiffusionPipeline.from_pretrained(
+ "black-forest-labs/FLUX.1-dev",
+ quantization_config=pipeline_quant_config,
+ torch_dtype=torch.bfloat16,
+).to("cuda")
+
+# compile
+pipeline.transformer.to(memory_format=torch.channels_last)
+pipeline.transformer.compile(mode="max-autotune", fullgraph=True)
+pipeline("""
+ cinematic film still of a cat sipping a margarita in a pool in Palm Springs, California
+ highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain
+"""
+).images[0]
+```
+
+## Quantization, torch.compile, and offloading
+
+In addition to quantization and torch.compile, try offloading if you need to reduce memory usage further. Offloading keeps model components or layers on the CPU and only moves them to the GPU when they are needed for computation.
+
+When offloading, increase the [Dynamo](https://docs.pytorch.org/docs/stable/torch.compiler_dynamo_overview.html) `cache_size_limit` to avoid excessive recompilation, and set `capture_dynamic_output_shape_ops = True` to handle dynamic outputs when compiling bitsandbytes models.
+
+
+
+
+[Model CPU offloading](./memory#model-offloading) moves an individual pipeline component, like the transformer model, to the GPU when it is needed for computation. Otherwise, it is offloaded to the CPU.
+
+```py
+import torch
+from diffusers import DiffusionPipeline
+from diffusers.quantizers import PipelineQuantizationConfig
+
+torch._dynamo.config.cache_size_limit = 1000
+torch._dynamo.config.capture_dynamic_output_shape_ops = True
+
+# quantize
+pipeline_quant_config = PipelineQuantizationConfig(
+ quant_backend="bitsandbytes_4bit",
+ quant_kwargs={"load_in_4bit": True, "bnb_4bit_quant_type": "nf4", "bnb_4bit_compute_dtype": torch.bfloat16},
+ components_to_quantize=["transformer", "text_encoder_2"],
+)
+pipeline = DiffusionPipeline.from_pretrained(
+ "black-forest-labs/FLUX.1-dev",
+ quantization_config=pipeline_quant_config,
+ torch_dtype=torch.bfloat16,
+).to("cuda")
+
+# model CPU offloading
+pipeline.enable_model_cpu_offload()
+
+# compile
+pipeline.transformer.compile()
+pipeline(
+ "cinematic film still of a cat sipping a margarita in a pool in Palm Springs, California, highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain"
+).images[0]
+```
+
+
+
+
+[Group offloading](./memory#group-offloading) moves the internal layers of an individual pipeline component, like the transformer model, to the GPU for computation and offloads them when they're not required. At the same time, it uses the [CUDA stream](./memory#cuda-stream) feature to prefetch the next layer for execution.
+
+By overlapping computation and data transfer, it is faster than model CPU offloading while also saving memory.
+
+```py
+# pip install ftfy
+import torch
+from diffusers import AutoModel, DiffusionPipeline
+from diffusers.hooks import apply_group_offloading
+from diffusers.utils import export_to_video
+from diffusers.quantizers import PipelineQuantizationConfig
+
+torch._dynamo.config.cache_size_limit = 1000
+torch._dynamo.config.capture_dynamic_output_shape_ops = True
+
+# quantize
+pipeline_quant_config = PipelineQuantizationConfig(
+ quant_backend="bitsandbytes_4bit",
+ quant_kwargs={"load_in_4bit": True, "bnb_4bit_quant_type": "nf4", "bnb_4bit_compute_dtype": torch.bfloat16},
+ components_to_quantize=["transformer", "text_encoder"],
+)
+
+pipeline = DiffusionPipeline.from_pretrained(
+ "Wan-AI/Wan2.1-T2V-14B-Diffusers",
+ quantization_config=pipeline_quant_config,
+ torch_dtype=torch.bfloat16,
+).to("cuda")
+
+# group offloading
+onload_device = torch.device("cuda")
+offload_device = torch.device("cpu")
+
+pipeline.transformer.enable_group_offload(
+ onload_device=onload_device,
+ offload_device=offload_device,
+ offload_type="leaf_level",
+ use_stream=True,
+ non_blocking=True
+)
+pipeline.vae.enable_group_offload(
+ onload_device=onload_device,
+ offload_device=offload_device,
+ offload_type="leaf_level",
+ use_stream=True,
+ non_blocking=True
+)
+apply_group_offloading(
+ pipeline.text_encoder,
+ onload_device=onload_device,
+ offload_type="leaf_level",
+ use_stream=True,
+ non_blocking=True
+)
+
+# compile
+pipeline.transformer.compile()
+
+prompt = """
+The camera rushes from far to near in a low-angle shot,
+revealing a white ferret on a log. It plays, leaps into the water, and emerges, as the camera zooms in
+for a close-up. Water splashes berry bushes nearby, while moss, snow, and leaves blanket the ground.
+Birch trees and a light blue sky frame the scene, with ferns in the foreground. Side lighting casts dynamic
+shadows and warm highlights. Medium composition, front view, low angle, with depth of field.
+"""
+negative_prompt = """
+Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality,
+low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured,
+misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards
+"""
+
+output = pipeline(
+ prompt=prompt,
+ negative_prompt=negative_prompt,
+ num_frames=81,
+ guidance_scale=5.0,
+).frames[0]
+export_to_video(output, "output.mp4", fps=16)
+```
+
+
+
\ No newline at end of file
diff --git a/docs/source/en/optimization/tome.md b/docs/source/en/optimization/tome.md
index f379bc97f4..ab368c9ccb 100644
--- a/docs/source/en/optimization/tome.md
+++ b/docs/source/en/optimization/tome.md
@@ -1,4 +1,4 @@
-
-# Quantization
+# Getting started
Quantization focuses on representing data with fewer bits while also trying to preserve the precision of the original data. This often means converting a data type to represent the same information with fewer bits. For example, if your model weights are stored as 32-bit floating points and they're quantized to 16-bit floating points, this halves the model size which makes it easier to store and reduces memory usage. Lower precision can also speedup inference because it takes less time to perform calculations with fewer bits.
@@ -19,19 +19,25 @@ Diffusers supports multiple quantization backends to make large diffusion models
## Pipeline-level quantization
-There are two ways you can use [`~quantizers.PipelineQuantizationConfig`] depending on the level of control you want over the quantization specifications of each model in the pipeline.
+There are two ways to use [`~quantizers.PipelineQuantizationConfig`] depending on how much customization you want to apply to the quantization configuration.
-- for more basic and simple use cases, you only need to define the `quant_backend`, `quant_kwargs`, and `components_to_quantize`
-- for more granular quantization control, provide a `quant_mapping` that provides the quantization specifications for the individual model components
+- for basic use cases, define the `quant_backend`, `quant_kwargs`, and `components_to_quantize` arguments
+- for granular quantization control, define a `quant_mapping` that provides the quantization configuration for individual model components
-### Simple quantization
+### Basic quantization
Initialize [`~quantizers.PipelineQuantizationConfig`] with the following parameters.
- `quant_backend` specifies which quantization backend to use. Currently supported backends include: `bitsandbytes_4bit`, `bitsandbytes_8bit`, `gguf`, `quanto`, and `torchao`.
-- `quant_kwargs` contains the specific quantization arguments to use.
+- `quant_kwargs` specifies the quantization arguments to use.
+
+> [!TIP]
+> These `quant_kwargs` arguments are different for each backend. Refer to the [Quantization API](../api/quantization) docs to view the arguments for each backend.
+
- `components_to_quantize` specifies which components of the pipeline to quantize. Typically, you should quantize the most compute intensive components like the transformer. The text encoder is another component to consider quantizing if a pipeline has more than one such as [`FluxPipeline`]. The example below quantizes the T5 text encoder in [`FluxPipeline`] while keeping the CLIP model intact.
+The example below loads the bitsandbytes backend with the `load_in_4bit`, `bnb_4bit_quant_type`, and `bnb_4bit_compute_dtype` arguments from [`~quantizers.quantization_config.BitsAndBytesConfig`].
+
```py
import torch
from diffusers import DiffusionPipeline
@@ -56,13 +62,13 @@ pipe = DiffusionPipeline.from_pretrained(
image = pipe("photo of a cute dog").images[0]
```
-### quant_mapping
+### Advanced quantization
-The `quant_mapping` argument provides more flexible options for how to quantize each individual component in a pipeline, like combining different quantization backends.
+The `quant_mapping` argument provides more options for how to quantize each individual component in a pipeline, like combining different quantization backends.
Initialize [`~quantizers.PipelineQuantizationConfig`] and pass a `quant_mapping` to it. The `quant_mapping` allows you to specify the quantization options for each component in the pipeline such as the transformer and text encoder.
-The example below uses two quantization backends, [`~quantizers.QuantoConfig`] and [`transformers.BitsAndBytesConfig`], for the transformer and text encoder.
+The example below uses two quantization backends, [`~quantizers.quantization_config.QuantoConfig`] and [`transformers.BitsAndBytesConfig`], for the transformer and text encoder.
```py
import torch
@@ -85,7 +91,7 @@ pipeline_quant_config = PipelineQuantizationConfig(
There is a separate bitsandbytes backend in [Transformers](https://huggingface.co/docs/transformers/main_classes/quantization#transformers.BitsAndBytesConfig). You need to import and use [`transformers.BitsAndBytesConfig`] for components that come from Transformers. For example, `text_encoder_2` in [`FluxPipeline`] is a [`~transformers.T5EncoderModel`] from Transformers so you need to use [`transformers.BitsAndBytesConfig`] instead of [`diffusers.BitsAndBytesConfig`].
> [!TIP]
-> Use the [simple quantization](#simple-quantization) method above if you don't want to manage these distinct imports or aren't sure where each pipeline component comes from.
+> Use the [basic quantization](#basic-quantization) method above if you don't want to manage these distinct imports or aren't sure where each pipeline component comes from.
```py
import torch
@@ -129,4 +135,4 @@ Check out the resources below to learn more about quantization.
- The Transformers quantization [Overview](https://huggingface.co/docs/transformers/quantization/overview#when-to-use-what) provides an overview of the pros and cons of different quantization backends.
-- Read the [Exploring Quantization Backends in Diffusers](https://huggingface.co/blog/diffusers-quantization) blog post for a brief introduction to each quantization backend, how to choose a backend, and combining quantization with other memory optimizations.
\ No newline at end of file
+- Read the [Exploring Quantization Backends in Diffusers](https://huggingface.co/blog/diffusers-quantization) blog post for a brief introduction to each quantization backend, how to choose a backend, and combining quantization with other memory optimizations.
diff --git a/docs/source/en/quantization/torchao.md b/docs/source/en/quantization/torchao.md
index 95b30a6e01..5c7578dcbb 100644
--- a/docs/source/en/quantization/torchao.md
+++ b/docs/source/en/quantization/torchao.md
@@ -1,4 +1,4 @@
-
-
-# Overview
-
-Welcome to 🧨 Diffusers! If you're new to diffusion models and generative AI, and want to learn more, then you've come to the right place. These beginner-friendly tutorials are designed to provide a gentle introduction to diffusion models and help you understand the library fundamentals - the core components and how 🧨 Diffusers is meant to be used.
-
-You'll learn how to use a pipeline for inference to rapidly generate things, and then deconstruct that pipeline to really understand how to use the library as a modular toolbox for building your own diffusion systems. In the next lesson, you'll learn how to train your own diffusion model to generate what you want.
-
-After completing the tutorials, you'll have gained the necessary skills to start exploring the library on your own and see how to use it for your own projects and applications.
-
-Feel free to join our community on [Discord](https://discord.com/invite/JfAtkvEtRb) or the [forums](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63) to connect and collaborate with other users and developers!
-
-Let's start diffusing! 🧨
diff --git a/docs/source/en/tutorials/using_peft_for_inference.md b/docs/source/en/tutorials/using_peft_for_inference.md
index 7199361d5e..5cd47f8674 100644
--- a/docs/source/en/tutorials/using_peft_for_inference.md
+++ b/docs/source/en/tutorials/using_peft_for_inference.md
@@ -1,4 +1,4 @@
-
+
+# Batch inference
+
+Batch inference processes multiple prompts at a time to increase throughput. It is more efficient because processing several prompts at once keeps the GPU fully utilized, whereas generating from a single prompt often leaves it underutilized.
+
+The downside is increased latency because you must wait for the entire batch to complete, and more GPU memory is required for large batches.
+
+
+
+
+For text-to-image, pass a list of prompts to the pipeline.
+
+```py
+import torch
+import matplotlib.pyplot as plt
+from diffusers import DiffusionPipeline
+
+pipeline = DiffusionPipeline.from_pretrained(
+ "stabilityai/stable-diffusion-xl-base-1.0",
+ torch_dtype=torch.float16
+).to("cuda")
+
+prompts = [
+ "cinematic photo of A beautiful sunset over mountains, 35mm photograph, film, professional, 4k, highly detailed",
+ "cinematic film still of a cat basking in the sun on a roof in Turkey, highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain",
+ "pixel-art a cozy coffee shop interior, low-res, blocky, pixel art style, 8-bit graphics"
+]
+
+images = pipeline(
+ prompt=prompts,
+).images
+
+fig, axes = plt.subplots(2, 2, figsize=(12, 12))
+axes = axes.flatten()
+
+for i, image in enumerate(images):
+ axes[i].imshow(image)
+ axes[i].set_title(f"Image {i+1}")
+ axes[i].axis('off')
+
+plt.tight_layout()
+plt.show()
+```
+
+To generate multiple variations of one prompt, use the `num_images_per_prompt` argument.
+
+```py
+import torch
+import matplotlib.pyplot as plt
+from diffusers import DiffusionPipeline
+
+pipeline = DiffusionPipeline.from_pretrained(
+ "stabilityai/stable-diffusion-xl-base-1.0",
+ torch_dtype=torch.float16
+).to("cuda")
+
+images = pipeline(
+ prompt="pixel-art a cozy coffee shop interior, low-res, blocky, pixel art style, 8-bit graphics",
+ num_images_per_prompt=4
+).images
+
+fig, axes = plt.subplots(2, 2, figsize=(12, 12))
+axes = axes.flatten()
+
+for i, image in enumerate(images):
+ axes[i].imshow(image)
+ axes[i].set_title(f"Image {i+1}")
+ axes[i].axis('off')
+
+plt.tight_layout()
+plt.show()
+```
+
+Combine both approaches to generate different variations of different prompts.
+
+```py
+images = pipeline(
+ prompt=prompts,
+ num_images_per_prompt=2,
+).images
+
+fig, axes = plt.subplots(2, 2, figsize=(12, 12))
+axes = axes.flatten()
+
+for i, image in enumerate(images):
+ axes[i].imshow(image)
+ axes[i].set_title(f"Image {i+1}")
+ axes[i].axis('off')
+
+plt.tight_layout()
+plt.show()
+```
+
+
+
+
+For image-to-image, pass a list of input images and prompts to the pipeline.
+
+```py
+import torch
+import matplotlib.pyplot as plt
+from diffusers.utils import load_image
+from diffusers import AutoPipelineForImage2Image
+
+pipeline = AutoPipelineForImage2Image.from_pretrained(
+ "stabilityai/stable-diffusion-xl-base-1.0",
+ torch_dtype=torch.float16
+).to("cuda")
+
+input_images = [
+ load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png"),
+ load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png"),
+ load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/detail-prompt.png")
+]
+
+prompts = [
+ "cinematic photo of a beautiful sunset over mountains, 35mm photograph, film, professional, 4k, highly detailed",
+ "cinematic film still of a cat basking in the sun on a roof in Turkey, highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain",
+ "pixel-art a cozy coffee shop interior, low-res, blocky, pixel art style, 8-bit graphics"
+]
+
+images = pipeline(
+ prompt=prompts,
+ image=input_images,
+ guidance_scale=8.0,
+ strength=0.5
+).images
+
+fig, axes = plt.subplots(2, 2, figsize=(12, 12))
+axes = axes.flatten()
+
+for i, image in enumerate(images):
+ axes[i].imshow(image)
+ axes[i].set_title(f"Image {i+1}")
+ axes[i].axis('off')
+
+plt.tight_layout()
+plt.show()
+```
+
+To generate multiple variations of one prompt, use the `num_images_per_prompt` argument.
+
+```py
+import torch
+import matplotlib.pyplot as plt
+from diffusers.utils import load_image
+from diffusers import AutoPipelineForImage2Image
+
+pipeline = AutoPipelineForImage2Image.from_pretrained(
+ "stabilityai/stable-diffusion-xl-base-1.0",
+ torch_dtype=torch.float16
+).to("cuda")
+
+input_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/detail-prompt.png")
+
+images = pipeline(
+ prompt="pixel-art a cozy coffee shop interior, low-res, blocky, pixel art style, 8-bit graphics",
+ image=input_image,
+ num_images_per_prompt=4
+).images
+
+fig, axes = plt.subplots(2, 2, figsize=(12, 12))
+axes = axes.flatten()
+
+for i, image in enumerate(images):
+ axes[i].imshow(image)
+ axes[i].set_title(f"Image {i+1}")
+ axes[i].axis('off')
+
+plt.tight_layout()
+plt.show()
+```
+
+Combine both approaches to generate different variations of different prompts.
+
+```py
+input_images = [
+ load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png"),
+ load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/detail-prompt.png")
+]
+
+prompts = [
+ "cinematic film still of a cat basking in the sun on a roof in Turkey, highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain",
+ "pixel-art a cozy coffee shop interior, low-res, blocky, pixel art style, 8-bit graphics"
+]
+
+images = pipeline(
+ prompt=prompts,
+ image=input_images,
+ num_images_per_prompt=2,
+).images
+
+fig, axes = plt.subplots(2, 2, figsize=(12, 12))
+axes = axes.flatten()
+
+for i, image in enumerate(images):
+ axes[i].imshow(image)
+ axes[i].set_title(f"Image {i+1}")
+ axes[i].axis('off')
+
+plt.tight_layout()
+plt.show()
+```
+
+
+
+
+## Deterministic generation
+
+Enable reproducible batch generation by passing a list of [Generators](https://pytorch.org/docs/stable/generated/torch.Generator.html) to the pipeline and tying each `Generator` to a seed so you can reuse it.
+
+Use a list comprehension to iterate over the batch size specified in `range()` to create a unique `Generator` object for each image in the batch.
+
+Don't multiply a single `Generator` by the batch size. That only creates one `Generator` object, which is reused sequentially for every image in the batch.
+
+```py
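+# avoid this: all three list entries reference the same Generator object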
+generator = [torch.Generator(device="cuda").manual_seed(0)] * 3
+```
+
+Pass the `generator` to the pipeline.
+
+```py
+import torch
+import matplotlib.pyplot as plt
+from diffusers import DiffusionPipeline
+
+pipeline = DiffusionPipeline.from_pretrained(
+ "stabilityai/stable-diffusion-xl-base-1.0",
+ torch_dtype=torch.float16
+).to("cuda")
+
+generator = [torch.Generator(device="cuda").manual_seed(i) for i in range(3)]
+prompts = [
+ "cinematic photo of A beautiful sunset over mountains, 35mm photograph, film, professional, 4k, highly detailed",
+ "cinematic film still of a cat basking in the sun on a roof in Turkey, highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain",
+ "pixel-art a cozy coffee shop interior, low-res, blocky, pixel art style, 8-bit graphics"
+]
+
+images = pipeline(
+ prompt=prompts,
+ generator=generator
+).images
+
+fig, axes = plt.subplots(2, 2, figsize=(12, 12))
+axes = axes.flatten()
+
+for i, image in enumerate(images):
+ axes[i].imshow(image)
+ axes[i].set_title(f"Image {i+1}")
+ axes[i].axis('off')
+
+plt.tight_layout()
+plt.show()
+```
+
+You can use this to iteratively select an image associated with a seed and then improve on it by crafting a more detailed prompt.
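+
+For example, if the image generated with seed `2` looks the most promising, reuse that seed on its own and refine the prompt (the more detailed prompt below is only an illustration):
+
+```py
+# reuse the seed of the most promising image (seed 2 in this example)
+generator = torch.Generator(device="cuda").manual_seed(2)
+
+image = pipeline(
+    prompt="pixel-art a cozy coffee shop interior with warm lighting, wooden tables, and plants by the window, low-res, blocky, pixel art style, 8-bit graphics",
+    generator=generator,
+).images[0]
+```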
\ No newline at end of file
diff --git a/docs/source/en/using-diffusers/callback.md b/docs/source/en/using-diffusers/callback.md
index 2462fed1a3..e0fa885784 100644
--- a/docs/source/en/using-diffusers/callback.md
+++ b/docs/source/en/using-diffusers/callback.md
@@ -1,4 +1,4 @@
-
-
-# Overview
-
-The inference pipeline supports and enables a wide range of techniques that are divided into two categories:
-
-* Pipeline functionality: these techniques modify the pipeline or extend it for other applications. For example, pipeline callbacks add new features to a pipeline and a pipeline can also be extended for distributed inference.
-* Improve inference quality: these techniques increase the visual quality of the generated images. For example, you can enhance your prompts with GPT2 to create better images with lower effort.
diff --git a/docs/source/en/using-diffusers/pag.md b/docs/source/en/using-diffusers/pag.md
index 1af690f86a..46d716bcf8 100644
--- a/docs/source/en/using-diffusers/pag.md
+++ b/docs/source/en/using-diffusers/pag.md
@@ -1,4 +1,4 @@
-