mirror of
https://github.com/huggingface/diffusers.git
synced 2026-01-27 17:22:53 +03:00
add sdxl to prompt weighting (#4439)
* add sdxl to prompt weighting * Update docs/source/en/using-diffusers/weighted_prompts.md * Update docs/source/en/using-diffusers/weighted_prompts.md * add sdxl to prompt weighting * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Apply suggestions from code review * Update docs/source/en/using-diffusers/weighted_prompts.md * Apply suggestions from code review * correct --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
This commit is contained in:
committed by
GitHub
parent
e391b789ac
commit
1a8843f93e
@@ -24,13 +24,16 @@ This is called "prompt-weighting" and has been a highly demanded feature by the
|
||||
|
||||
## How to do prompt-weighting in Diffusers
|
||||
|
||||
We believe the role of `diffusers` is to be a toolbox that provides essential features that enable other projects, such as [InvokeAI](https://github.com/invoke-ai/InvokeAI) or [diffuzers](https://github.com/abhishekkrthakur/diffuzers), to build powerful UIs. In order to support arbitrary methods to manipulate prompts, `diffusers` exposes a [`prompt_embeds`](https://huggingface.co/docs/diffusers/v0.14.0/en/api/pipelines/stable_diffusion/text2img#diffusers.StableDiffusionPipeline.__call__.prompt_embeds) function argument to many pipelines such as [`StableDiffusionPipeline`], allowing to directly pass the "prompt-weighted"/scaled text embeddings to the pipeline.
|
||||
We believe the role of `diffusers` is to be a toolbox that provides essential features that enable other projects, such as [InvokeAI](https://github.com/invoke-ai/InvokeAI) or [diffuzers](https://github.com/abhishekkrthakur/diffuzers), to build powerful UIs. In order to support arbitrary methods to manipulate prompts, `diffusers` exposes a [`prompt_embeds`](https://huggingface.co/docs/diffusers/en/api/pipelines/stable_diffusion/text2img#diffusers.StableDiffusionPipeline.__call__.prompt_embeds) function argument and an optional [`negative_prompt_embeds`](https://huggingface.co/docs/diffusers/en/api/pipelines/stable_diffusion/text2img#diffusers.StableDiffusionPipeline.__call__.negative_prompt_embeds) function argument to many pipelines such as [`StableDiffusionPipeline`], [`StableDiffusionControlNetPipeline`], [`StableDiffusionXLPipeline`], allowing to directly pass the "prompt-weighted"/scaled text embeddings to the pipeline.
|
||||
|
||||
The [compel library](https://github.com/damian0815/compel) provides an easy way to emphasize or de-emphasize portions of the prompt for you. We strongly recommend it instead of preparing the embeddings yourself.
|
||||
|
||||
Let's look at a simple example. Imagine you want to generate an image of `"a red cat playing with a ball"` as
|
||||
follows:
|
||||
|
||||
|
||||
### StableDiffusionPipeline
|
||||
|
||||
```py
|
||||
from diffusers import StableDiffusionPipeline, UniPCMultistepScheduler
|
||||
|
||||
@@ -53,8 +56,8 @@ As you can see, there is no "ball" in the image. Let's emphasize this part!
|
||||
|
||||
For this we should install the `compel` library:
|
||||
|
||||
```
|
||||
pip install compel
|
||||
```py
|
||||
pip install compel --upgrade
|
||||
```
|
||||
|
||||
and then create a `Compel` object:
|
||||
@@ -108,3 +111,59 @@ compel = Compel(
|
||||
|
||||
Also, please check out the documentation of the [compel](https://github.com/damian0815/compel) library for
|
||||
more information.
|
||||
|
||||
### StableDiffusionXLPipeline
|
||||
|
||||
For StableDiffusionXL we need to not only pass `prompt_embeds` (and optionally `negative_prompt_embeds`), but also [`pooled_prompt_embeds`](https://huggingface.co/docs/diffusers/en/api/pipelines/stable_diffusion/stable_diffusion_xl#diffusers.StableDiffusionXLInpaintPipeline.__call__.pooled_prompt_embeds) and optionally [`negative_pooled_prompt_embeds`](https://huggingface.co/docs/diffusers/en/api/pipelines/stable_diffusion/stable_diffusion_xl#diffusers.StableDiffusionXLInpaintPipeline.__call__.negative_pooled_prompt_embeds).
|
||||
In addition, [`StableDiffusionXLPipeline`] has two tokenizers and two text encoders which both need to be used to weight the prompt.
|
||||
Luckily, [`compel`](https://github.com/damian0815/compel) takes care of SDXL's special needs - all we have to do is to pass both tokenizers and text encoders to the `Compel` class.
|
||||
|
||||
|
||||
```py
|
||||
from compel import Compel, ReturnedEmbeddingsType
|
||||
from diffusers import DiffusionPipeline
|
||||
|
||||
pipeline = DiffusionPipeline.from_pretrained(
|
||||
"stabilityai/stable-diffusion-xl-base-1.0",
|
||||
variant="fp16",
|
||||
use_safetensors=True,
|
||||
torch_dtype=torch.float16
|
||||
).to("cuda")
|
||||
|
||||
compel = Compel(
|
||||
tokenizer=[pipeline.tokenizer, pipeline.tokenizer_2] ,
|
||||
text_encoder=[pipeline.text_encoder, pipeline.text_encoder_2],
|
||||
returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED,
|
||||
requires_pooled=[False, True]
|
||||
)
|
||||
```
|
||||
|
||||
Let's try our example from above again. We use the same seed for both prompts and upweight ball by a factor of 1.5 for the first
|
||||
prompt and downweight ball by 40% for the second prompt.
|
||||
|
||||
```py
|
||||
# upweight "ball"
|
||||
prompt = ["a red cat playing with a (ball)1.5", "a red cat playing with a (ball)0.6"]
|
||||
conditioning, pooled = compel(prompt)
|
||||
|
||||
|
||||
# generate image
|
||||
generator = [torch.Generator().manual_seed(33) for _ in range(len(prompt))]
|
||||
images = pipeline(prompt_embeds=conditioning, pooled_prompt_embeds=pooled, generator=generator, num_inference_steps=30).images
|
||||
```
|
||||
|
||||
Let's have a look at the result.
|
||||
|
||||
<div class="flex gap-4">
|
||||
<div>
|
||||
<img class="rounded-xl" src="https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/compel/sdxl_ball1.png"/>
|
||||
<figcaption class="mt-2 text-center text-sm text-gray-500">"a red cat playing with a (ball)1.5"</figcaption>
|
||||
</div>
|
||||
<div>
|
||||
<img class="rounded-xl" src="https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/compel/sdxl_ball2.png"/>
|
||||
<figcaption class="mt-2 text-center text-sm text-gray-500">a red cat playing with a (ball)0.6</figcaption>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
We can see that the ball is almost completely gone on the right image while it's clearly visible on the left image.
|
||||
For more information and more tricks you can use `compel` with, please have a look at the [compel docs](https://github.com/damian0815/compel/blob/main/doc/syntax.md) as well.
|
||||
|
||||
Reference in New Issue
Block a user