add some visuals
@@ -984,13 +984,14 @@ image = pipeline(
image.save("modular_ipa_out.png")
```
## Building Advanced Workflows: The Modular Way

We've learned the basic components of the Modular Diffusers system. Now let's tie everything together with a more practical example that demonstrates the true power of Modular Diffusers: working with multiple pipelines that share components.

In this example, we'll generate latents with a text-to-image pipeline, then refine them with an image-to-image pipeline, using IP-Adapter, LoRA, and ControlNet along the way.

### Base Text-to-Image
Let's set up the text-to-image workflow. Instead of putting all the blocks into one complete pipeline, we'll create separate `text_blocks` for encoding prompts, `t2i_blocks` for generating latents, and `decoder_blocks` for decoding latents into final images.

@@ -1179,6 +1180,8 @@ image.save("modular_part2_t2i.png")
```
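
To make the split concrete, here is a minimal sketch of the whole flow. It leans on the `SequentialPipelineBlocks`, `TEXT2IMAGE_BLOCKS`, and `ComponentsManager` helpers from the Modular Diffusers docs; the block keys (`"text_encoder"`, `"decode"`), the checkpoint id, and the `.intermediates` plumbing are assumptions for illustration, not the exact code above.

```py
import torch
from diffusers.modular_pipelines import ComponentsManager, SequentialPipelineBlocks
from diffusers.modular_pipelines.stable_diffusion_xl import TEXT2IMAGE_BLOCKS

repo_id = "stabilityai/stable-diffusion-xl-base-1.0"  # assumption: any SDXL checkpoint works
components = ComponentsManager()
components.enable_auto_cpu_offload(device="cuda")  # assumption: offloading hook on the manager

# Assemble the full text-to-image preset, then split off the prompt-encoding
# and decoding sub-blocks so each stage can run (and be reused) on its own.
t2i_blocks = SequentialPipelineBlocks.from_blocks_dict(TEXT2IMAGE_BLOCKS)
text_blocks = t2i_blocks.sub_blocks.pop("text_encoder")  # assumption: block key
decoder_blocks = t2i_blocks.sub_blocks.pop("decode")

# Each group of blocks becomes its own runnable pipeline. Registering them all
# with one components manager means shared models are loaded only once.
text_node = text_blocks.init_pipeline(repo_id, components_manager=components, collection="t2i")
t2i_node = t2i_blocks.init_pipeline(repo_id, components_manager=components, collection="t2i")
decoder_node = decoder_blocks.init_pipeline(repo_id, components_manager=components, collection="t2i")
for node in (text_node, t2i_node, decoder_node):
    node.load_default_components(torch_dtype=torch.float16)

# Run the three stages in sequence. Calling a node without `output=` returns
# its full pipeline state; `.intermediates` is an assumption about how the
# cached embeddings are handed to the next stage.
prompt = "an astronaut riding a horse on the moon, cinematic lighting"
text_state = text_node(prompt=prompt)
latents = t2i_node(**text_state.intermediates, output="latents")
image = decoder_node(latents=latents, output="images")[0]
image.save("modular_part2_t2i.png")
```

The point of the split is that `text_state` and `decoder_node` stay valid across the variations below, so each new workflow only re-runs the stages whose inputs actually changed.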
### LoRA

Now let's add a LoRA to our pipeline. With the modular approach, we can reuse intermediate outputs from blocks that would otherwise need to be re-run. Let's load the LoRA weights and see what happens:

```py
@@ -1218,6 +1221,8 @@ image = decoder_node(latents=latents_lora, output="images")[0]
image.save("modular_part2_lora.png")
```
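
To make the reuse concrete, here is a sketch continuing from the setup above. It assumes modular pipelines expose the same `load_lora_weights` helper as the standard pipelines; the LoRA checkpoint is just an example.

```py
# Load LoRA weights into the denoising node only. The prompt embeddings cached
# in `text_state` are unaffected, so we feed them back in instead of re-running
# the text encoders.
t2i_node.load_lora_weights("CiroN2022/toy-face", weight_name="toy_face_sdxl.safetensors")

latents_lora = t2i_node(**text_state.intermediates, output="latents")
image = decoder_node(latents=latents_lora, output="images")[0]
image.save("modular_part2_lora.png")
```

One caveat: if a LoRA also targets the text encoders, the cached prompt embeddings would no longer match and the `text_blocks` stage would need to be re-run.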
### IP-Adapter

IP-Adapter can also be used as a standalone pipeline. We can generate its image embeddings once and reuse them across different workflows.

```py
@@ -1247,6 +1252,8 @@ image = decoder_node(latents=latents_ipa, output="images")[0]
image.save("modular_part2_lora_ipa.png")
```
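
The pattern might look roughly like the sketch below. The step class name `StableDiffusionXLAutoIPAdapterStep`, the `ip_adapter_embeds` output name, and the reference image URL are assumptions; the IP-Adapter checkpoint and loading arguments follow the standard `h94/IP-Adapter` SDXL release.

```py
from diffusers.modular_pipelines.stable_diffusion_xl import StableDiffusionXLAutoIPAdapterStep
from diffusers.utils import load_image

# Wrap the IP-Adapter step as a tiny standalone pipeline that shares the same
# components manager as everything else.
ip_adapter_node = StableDiffusionXLAutoIPAdapterStep().init_pipeline(
    repo_id, components_manager=components, collection="t2i"
)
ip_adapter_node.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin")
ip_adapter_node.set_ip_adapter_scale(0.6)

# Compute the image embeddings once...
ip_image = load_image("https://example.com/style_reference.png")  # placeholder image
ip_adapter_embeds = ip_adapter_node(ip_adapter_image=ip_image, output="ip_adapter_embeds")

# ...and reuse them in any workflow that accepts them, here on top of the
# LoRA-enabled denoiser and the cached prompt embeddings.
latents_ipa = t2i_node(**text_state.intermediates, ip_adapter_embeds=ip_adapter_embeds, output="latents")
image = decoder_node(latents=latents_ipa, output="images")[0]
image.save("modular_part2_lora_ipa.png")
```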
### ControlNet
We can create a new ControlNet workflow by modifying the pipeline blocks, reusing components as much as possible, and then seeing how the change affects the generation.

We want to use a different ControlNet from the one that's already loaded.
@@ -1287,6 +1294,8 @@ refiner_blocks.sub_blocks.pop("image_encoder")
refiner_blocks.sub_blocks.pop("decode")
```
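
A sketch of the swap: the new ControlNet loads through the regular `ControlNetModel` API and is handed to the existing node. The `update_components` call and the `control_image`/`controlnet_conditioning_scale` input names are assumptions about the modular API; the depth ControlNet checkpoint is a real one.

```py
import torch
from diffusers import ControlNetModel
from diffusers.utils import load_image

# Load a different ControlNet (depth-conditioned here) and swap it in for the
# one the pipeline is already holding.
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-depth-sdxl-1.0", torch_dtype=torch.float16
)
t2i_node.update_components(controlnet=controlnet)  # assumption: replaces the loaded model

# Re-run only the denoising stage with the new conditioning; the prompt and
# IP-Adapter embeddings are reused as-is.
control_image = load_image("https://example.com/depth_map.png")  # placeholder
latents_control = t2i_node(
    **text_state.intermediates,
    ip_adapter_embeds=ip_adapter_embeds,
    control_image=control_image,
    controlnet_conditioning_scale=0.5,
    output="latents",
)
```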
### Refiner
Let's create the refiner pipeline. The refiner has a different UNet and uses only one text encoder, so it is hosted in a different repo. We pass the same components manager to the refiner pipeline, along with a unique "refiner" collection.

```py
@@ -1358,3 +1367,29 @@ refined_image = decoder_node(latents=refined_latents, output="images")[0]
refined_image.save("modular_part2_control_refine_out.png")
```
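
Concretely, the refiner setup might look like this sketch. `IMAGE2IMAGE_BLOCKS` and the `image_latents` input name are assumptions; the two `sub_blocks.pop` calls mirror the ones shown above, and the checkpoint is the standard SDXL refiner repo.

```py
import torch
from diffusers.modular_pipelines import SequentialPipelineBlocks
from diffusers.modular_pipelines.stable_diffusion_xl import IMAGE2IMAGE_BLOCKS

# Build an image-to-image workflow that operates on latents directly: drop the
# image encoder (we pass latents in, not images) and the decoder (we already
# share decoder_node with the other workflows).
refiner_blocks = SequentialPipelineBlocks.from_blocks_dict(IMAGE2IMAGE_BLOCKS)
refiner_blocks.sub_blocks.pop("image_encoder")
refiner_blocks.sub_blocks.pop("decode")

# Same components manager, separate "refiner" collection: the refiner's own
# UNet and single text encoder are tracked apart from the base models.
refiner_node = refiner_blocks.init_pipeline(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    components_manager=components,
    collection="refiner",
)
refiner_node.load_default_components(torch_dtype=torch.float16)

refined_latents = refiner_node(image_latents=latents_control, prompt=prompt, output="latents")
refined_image = decoder_node(latents=refined_latents, output="images")[0]
refined_image.save("modular_part2_control_refine_out.png")
```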
### Results
Here are the results from our modular pipeline examples. You can find all the generated images in the [Hugging Face dataset](https://huggingface.co/datasets/YiYiXu/testing-images/tree/main/modular_quicktour).
#### Base Text-to-Image Generation
| Base Text-to-Image | Base Text-to-Image (Refined) |
|-------------------|------------------------------|
|  |  |
#### LoRA
| LoRA | LoRA (Refined) |
|-------------------|------------------------------|
|  |  |
#### LoRA + IP-Adapter
| LoRA + IP-Adapter | LoRA + IP-Adapter (Refined) |
|-------------------|------------------------------|
|  |  |
#### ControlNet + LoRA + IP-Adapter
| ControlNet + LoRA + IP-Adapter | ControlNet + LoRA + IP-Adapter (Refined) |
|-------------------|------------------------------|
|  |  |