From 3dd4168d4c96c429d2b74c2baaee0678c57578da Mon Sep 17 00:00:00 2001
From: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Date: Sun, 25 Feb 2024 09:38:02 -0800
Subject: [PATCH] [docs] Minor updates (#7063)

* updates

* feedback
---
 docs/source/en/api/attnprocessor.md           |  6 --
 docs/source/en/optimization/fp16.md           |  6 ++
 docs/source/en/optimization/torch2.0.md       |  3 +
 docs/source/en/training/lora.md               | 58 ++++++++++++-------
 .../stable_diffusion_jax_how_to.md            |  6 ++
 5 files changed, 51 insertions(+), 28 deletions(-)

diff --git a/docs/source/en/api/attnprocessor.md b/docs/source/en/api/attnprocessor.md
index 3c0ee0563f..ab89d4d260 100644
--- a/docs/source/en/api/attnprocessor.md
+++ b/docs/source/en/api/attnprocessor.md
@@ -41,12 +41,6 @@ An attention processor is a class for applying different types of attention mech
 ## FusedAttnProcessor2_0
 [[autodoc]] models.attention_processor.FusedAttnProcessor2_0

-## LoRAAttnProcessor
-[[autodoc]] models.attention_processor.LoRAAttnProcessor
-
-## LoRAAttnProcessor2_0
-[[autodoc]] models.attention_processor.LoRAAttnProcessor2_0
-
 ## LoRAAttnAddedKVProcessor
 [[autodoc]] models.attention_processor.LoRAAttnAddedKVProcessor

diff --git a/docs/source/en/optimization/fp16.md b/docs/source/en/optimization/fp16.md
index 72e881e4f0..7a2cf93498 100644
--- a/docs/source/en/optimization/fp16.md
+++ b/docs/source/en/optimization/fp16.md
@@ -66,3 +66,9 @@ image = pipe(prompt).images[0]
 Don't use [`torch.autocast`](https://pytorch.org/docs/stable/amp.html#torch.autocast) in any of the pipelines as it can lead to black images and is always slower than pure float16 precision.

+
+## Distilled model
+
+You could also use a distilled Stable Diffusion model and autoencoder to speed up inference. During distillation, many of the UNet's residual and attention blocks are shed to reduce the model size. The distilled model is faster and uses less memory while generating images of comparable quality to the full Stable Diffusion model.
+
+Learn more in the [Distilled Stable Diffusion inference](../using-diffusers/distilled_sd) guide!
diff --git a/docs/source/en/optimization/torch2.0.md b/docs/source/en/optimization/torch2.0.md
index 9c6febb06f..2475bb525d 100644
--- a/docs/source/en/optimization/torch2.0.md
+++ b/docs/source/en/optimization/torch2.0.md
@@ -75,6 +75,9 @@ Compilation requires some time to complete, so it is best suited for situations

 For more information and different options about `torch.compile`, refer to the [`torch_compile`](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) tutorial.

+> [!TIP]
+> Learn more about other ways PyTorch 2.0 can help optimize your model in the [Accelerate inference of text-to-image diffusion models](../tutorials/fast_diffusion) tutorial.
+
 ## Benchmark

 We conducted a comprehensive benchmark with PyTorch 2.0's efficient attention implementation and `torch.compile` across different GPUs and batch sizes for five of our most used pipelines. The code is benchmarked on 🤗 Diffusers v0.17.0.dev0 to optimize `torch.compile` usage (see [here](https://github.com/huggingface/diffusers/pull/3313) for more details).
diff --git a/docs/source/en/training/lora.md b/docs/source/en/training/lora.md
index b4f9e14bbc..9e87c224f4 100644
--- a/docs/source/en/training/lora.md
+++ b/docs/source/en/training/lora.md
@@ -113,36 +113,50 @@ The dataset preprocessing code and training loop are found in the [`main()`](htt

 As with the script parameters, a walkthrough of the training script is provided in the [Text-to-image](text2image#training-script) training guide. Instead, this guide takes a look at the LoRA relevant parts of the script.

-The script begins by adding the [new LoRA weights](https://github.com/huggingface/diffusers/blob/dd9a5caf61f04d11c0fa9f3947b69ab0010c9a0f/examples/text_to_image/train_text_to_image_lora.py#L447) to the attention layers. This involves correctly configuring the weight size for each block in the UNet. You'll see the `rank` parameter is used to create the [`~models.attention_processor.LoRAAttnProcessor`]:
+
+
+
+Diffusers uses [`~peft.LoraConfig`] from the [PEFT](https://hf.co/docs/peft) library to set up the parameters of the LoRA adapter such as the rank, alpha, and which modules to insert the LoRA weights into. The adapter is added to the UNet, and only the LoRA layers are filtered for optimization in `lora_layers`.

 ```py
-lora_attn_procs = {}
-for name in unet.attn_processors.keys():
-    cross_attention_dim = None if name.endswith("attn1.processor") else unet.config.cross_attention_dim
-    if name.startswith("mid_block"):
-        hidden_size = unet.config.block_out_channels[-1]
-    elif name.startswith("up_blocks"):
-        block_id = int(name[len("up_blocks.")])
-        hidden_size = list(reversed(unet.config.block_out_channels))[block_id]
-    elif name.startswith("down_blocks"):
-        block_id = int(name[len("down_blocks.")])
-        hidden_size = unet.config.block_out_channels[block_id]
+unet_lora_config = LoraConfig(
+    r=args.rank,
+    lora_alpha=args.rank,
+    init_lora_weights="gaussian",
+    target_modules=["to_k", "to_q", "to_v", "to_out.0"],
+)

-    lora_attn_procs[name] = LoRAAttnProcessor(
-        hidden_size=hidden_size,
-        cross_attention_dim=cross_attention_dim,
-        rank=args.rank,
-    )
-
-unet.set_attn_processor(lora_attn_procs)
-lora_layers = AttnProcsLayers(unet.attn_processors)
+unet.add_adapter(unet_lora_config)
+lora_layers = filter(lambda p: p.requires_grad, unet.parameters())
 ```

-The [optimizer](https://github.com/huggingface/diffusers/blob/dd9a5caf61f04d11c0fa9f3947b69ab0010c9a0f/examples/text_to_image/train_text_to_image_lora.py#L519) is initialized with the `lora_layers` because these are the only weights that'll be optimized:
+
+
+
+Diffusers also supports finetuning the text encoder with LoRA from the [PEFT](https://hf.co/docs/peft) library when necessary such as finetuning Stable Diffusion XL (SDXL). The [`~peft.LoraConfig`] is used to configure the parameters of the LoRA adapter which are then added to the text encoder, and only the LoRA layers are filtered for training.
+
+```py
+text_lora_config = LoraConfig(
+    r=args.rank,
+    lora_alpha=args.rank,
+    init_lora_weights="gaussian",
+    target_modules=["q_proj", "k_proj", "v_proj", "out_proj"],
+)
+
+text_encoder_one.add_adapter(text_lora_config)
+text_encoder_two.add_adapter(text_lora_config)
+text_lora_parameters_one = list(filter(lambda p: p.requires_grad, text_encoder_one.parameters()))
+text_lora_parameters_two = list(filter(lambda p: p.requires_grad, text_encoder_two.parameters()))
+```
+
+
+
+
+The [optimizer](https://github.com/huggingface/diffusers/blob/e4b8f173b97731686e290b2eb98e7f5df2b1b322/examples/text_to_image/train_text_to_image_lora.py#L529) is initialized with the `lora_layers` because these are the only weights that'll be optimized:

 ```py
 optimizer = optimizer_cls(
-    lora_layers.parameters(),
+    lora_layers,
     lr=args.learning_rate,
     betas=(args.adam_beta1, args.adam_beta2),
     weight_decay=args.adam_weight_decay,
diff --git a/docs/source/en/using-diffusers/stable_diffusion_jax_how_to.md b/docs/source/en/using-diffusers/stable_diffusion_jax_how_to.md
index 2de2af96df..5b2c68853d 100644
--- a/docs/source/en/using-diffusers/stable_diffusion_jax_how_to.md
+++ b/docs/source/en/using-diffusers/stable_diffusion_jax_how_to.md
@@ -217,3 +217,9 @@ Check your image dimensions to see if they're correct:
 images.shape
 # (8, 1, 512, 512, 3)
 ```
+
+## Resources
+
+To learn more about how JAX works with Stable Diffusion, you may be interested in reading:
+
+* [Accelerating Stable Diffusion XL Inference with JAX on Cloud TPU v5e](https://hf.co/blog/sdxl_jax)
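
Reviewer note on the `docs/source/en/training/lora.md` change: the `LoraConfig` values in that diff (`r=args.rank`, `lora_alpha=args.rank`, `init_lora_weights="gaussian"`) may be easier to follow with a toy calculation. The block below is a minimal pure-Python sketch, assuming the standard LoRA formulation in which the update added to a frozen weight is `(lora_alpha / r) * B @ A`, and assuming that the "gaussian" option draws `A` from a normal distribution while `B` starts at zero. The helper names, the layer sizes, and the standard deviation are hypothetical and chosen only for illustration; this is not PEFT's or Diffusers' implementation.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_delta(lora_a, lora_b, rank, lora_alpha):
    """Return the low-rank update (lora_alpha / rank) * B @ A."""
    scaling = lora_alpha / rank
    return [[scaling * v for v in row] for row in matmul(lora_b, lora_a)]


# Hypothetical sizes for illustration only (not taken from any real UNet).
rank, in_features, out_features = 4, 64, 64
rng = random.Random(0)

# Assumed "gaussian"-style init: A is random, B starts at zero.
lora_a = [[rng.gauss(0.0, 1.0 / rank) for _ in range(in_features)] for _ in range(rank)]
lora_b = [[0.0] * rank for _ in range(out_features)]

# With lora_alpha == r, as in the config in the diff, the scaling factor is 1.
scaling = rank / rank

# Because B is zero at the start, the adapter initially changes nothing.
delta_at_init = lora_delta(lora_a, lora_b, rank, lora_alpha=rank)

# Only A and B would be trained; the frozen weight has far more entries.
trainable = rank * in_features + out_features * rank
frozen = in_features * out_features
print(f"scaling={scaling}, trainable={trainable}, frozen={frozen}")
```

Under these assumptions, setting `lora_alpha` equal to the rank leaves the low-rank update unscaled, and the zero-initialized `B` means training starts from the behavior of the original layer. The parameter counts show why only the parameters with `requires_grad=True` are handed to the optimizer.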