From 1a8843f93ec88585df18c895f0ec3d0914df8d10 Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Thu, 3 Aug 2023 21:41:48 +0200 Subject: [PATCH] add sdxl to prompt weighting (#4439) * add sdxl to prompt weighting * Update docs/source/en/using-diffusers/weighted_prompts.md * Update docs/source/en/using-diffusers/weighted_prompts.md * add sdxl to prompt weighting * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Apply suggestions from code review * Update docs/source/en/using-diffusers/weighted_prompts.md * Apply suggestions from code review * correct --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --- .../en/using-diffusers/weighted_prompts.md | 65 ++++++++++++++++++- 1 file changed, 62 insertions(+), 3 deletions(-) diff --git a/docs/source/en/using-diffusers/weighted_prompts.md b/docs/source/en/using-diffusers/weighted_prompts.md index 5e6371d011..f0a705d38f 100644 --- a/docs/source/en/using-diffusers/weighted_prompts.md +++ b/docs/source/en/using-diffusers/weighted_prompts.md @@ -24,13 +24,16 @@ This is called "prompt-weighting" and has been a highly demanded feature by the ## How to do prompt-weighting in Diffusers -We believe the role of `diffusers` is to be a toolbox that provides essential features that enable other projects, such as [InvokeAI](https://github.com/invoke-ai/InvokeAI) or [diffuzers](https://github.com/abhishekkrthakur/diffuzers), to build powerful UIs. In order to support arbitrary methods to manipulate prompts, `diffusers` exposes a [`prompt_embeds`](https://huggingface.co/docs/diffusers/v0.14.0/en/api/pipelines/stable_diffusion/text2img#diffusers.StableDiffusionPipeline.__call__.prompt_embeds) function argument to many pipelines such as [`StableDiffusionPipeline`], allowing to directly pass the "prompt-weighted"/scaled text embeddings to the pipeline. 
+We believe the role of `diffusers` is to be a toolbox that provides essential features that enable other projects, such as [InvokeAI](https://github.com/invoke-ai/InvokeAI) or [diffuzers](https://github.com/abhishekkrthakur/diffuzers), to build powerful UIs. In order to support arbitrary methods to manipulate prompts, `diffusers` exposes a [`prompt_embeds`](https://huggingface.co/docs/diffusers/en/api/pipelines/stable_diffusion/text2img#diffusers.StableDiffusionPipeline.__call__.prompt_embeds) function argument and an optional [`negative_prompt_embeds`](https://huggingface.co/docs/diffusers/en/api/pipelines/stable_diffusion/text2img#diffusers.StableDiffusionPipeline.__call__.negative_prompt_embeds) function argument to many pipelines such as [`StableDiffusionPipeline`], [`StableDiffusionControlNetPipeline`], [`StableDiffusionXLPipeline`], allowing to directly pass the "prompt-weighted"/scaled text embeddings to the pipeline. The [compel library](https://github.com/damian0815/compel) provides an easy way to emphasize or de-emphasize portions of the prompt for you. We strongly recommend it instead of preparing the embeddings yourself. Let's look at a simple example. Imagine you want to generate an image of `"a red cat playing with a ball"` as follows: + +### StableDiffusionPipeline + ```py from diffusers import StableDiffusionPipeline, UniPCMultistepScheduler @@ -53,8 +56,8 @@ As you can see, there is no "ball" in the image. Let's emphasize this part! For this we should install the `compel` library: -``` -pip install compel +```py +pip install compel --upgrade ``` and then create a `Compel` object: @@ -108,3 +111,59 @@ compel = Compel( Also, please check out the documentation of the [compel](https://github.com/damian0815/compel) library for more information. 
+
+### StableDiffusionXLPipeline
+
+For StableDiffusionXL we need to not only pass `prompt_embeds` (and optionally `negative_prompt_embeds`), but also [`pooled_prompt_embeds`](https://huggingface.co/docs/diffusers/en/api/pipelines/stable_diffusion/stable_diffusion_xl#diffusers.StableDiffusionXLInpaintPipeline.__call__.pooled_prompt_embeds) and optionally [`negative_pooled_prompt_embeds`](https://huggingface.co/docs/diffusers/en/api/pipelines/stable_diffusion/stable_diffusion_xl#diffusers.StableDiffusionXLInpaintPipeline.__call__.negative_pooled_prompt_embeds).
+In addition, [`StableDiffusionXLPipeline`] has two tokenizers and two text encoders which both need to be used to weight the prompt.
+Luckily, [`compel`](https://github.com/damian0815/compel) takes care of SDXL's special needs - all we have to do is to pass both tokenizers and text encoders to the `Compel` class.
+
+
+```py
+from compel import Compel, ReturnedEmbeddingsType
+from diffusers import DiffusionPipeline
+
+pipeline = DiffusionPipeline.from_pretrained(
+    "stabilityai/stable-diffusion-xl-base-1.0",
+    variant="fp16",
+    use_safetensors=True,
+    torch_dtype=torch.float16
+).to("cuda")
+
+compel = Compel(
+    tokenizer=[pipeline.tokenizer, pipeline.tokenizer_2] ,
+    text_encoder=[pipeline.text_encoder, pipeline.text_encoder_2],
+    returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED,
+    requires_pooled=[False, True]
+)
+```
+
+Let's try our example from above again. We use the same seed for both prompts and upweight ball by a factor of 1.5 for the first
+prompt and downweight ball by 40% for the second prompt.
+
+```py
+# upweight "ball"
+prompt = ["a red cat playing with a (ball)1.5", "a red cat playing with a (ball)0.6"]
+conditioning, pooled = compel(prompt)
+
+
+# generate image
+generator = [torch.Generator().manual_seed(33) for _ in range(len(prompt))]
+images = pipeline(prompt_embeds=conditioning, pooled_prompt_embeds=pooled, generator=generator, num_inference_steps=30).images
+```
+
+Let's have a look at the result.
+
+[Figure: two generated images side by side. Left: "a red cat playing with a (ball)1.5". Right: "a red cat playing with a (ball)0.6".]
+
+We can see that the ball is almost completely gone on the right image while it's clearly visible on the left image.
+For more information and more tricks you can use `compel` with, please have a look at the [compel docs](https://github.com/damian0815/compel/blob/main/doc/syntax.md) as well.