diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 59d3915595..16acc87dde 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -245,7 +245,7 @@ The official training examples are maintained by the Diffusers' core maintainers This is because of the same reasons put forward in [6. Contribute a community pipeline](#6-contribute-a-community-pipeline) for official pipelines vs. community pipelines: It is not feasible for the core maintainers to maintain all possible training methods for diffusion models. If the Diffusers core maintainers and the community consider a certain training paradigm to be too experimental or not popular enough, the corresponding training code should be put in the `research_projects` folder and maintained by the author. -Both official training and research examples consist of a directory that contains one or more training scripts, a requirements.txt file, and a README.md file. In order for the user to make use of the +Both official training and research examples consist of a directory that contains one or more training scripts, a `requirements.txt` file, and a `README.md` file. In order for the user to make use of the training examples, it is required to clone the repository: ```bash @@ -255,7 +255,8 @@ git clone https://github.com/huggingface/diffusers as well as to install all additional dependencies required for training: ```bash -pip install -r /examples//requirements.txt +cd diffusers +pip install -r examples//requirements.txt ``` Therefore when adding an example, the `requirements.txt` file shall define all pip dependencies required for your training example so that once all those are installed, the user can run the example's training script. See, for example, the [DreamBooth `requirements.txt` file](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/requirements.txt). @@ -502,4 +503,4 @@ $ git push --set-upstream origin your-branch-for-syncing ### Style guide -For documentation strings, 🧨 Diffusers follows the [Google style](https://google.github.io/styleguide/pyguide.html). +For documentation strings, 🧨 Diffusers follows the [Google style](https://google.github.io/styleguide/pyguide.html). \ No newline at end of file diff --git a/docs/source/en/conceptual/contribution.md b/docs/source/en/conceptual/contribution.md index cc2e0ae07b..b4d33cb5e3 100644 --- a/docs/source/en/conceptual/contribution.md +++ b/docs/source/en/conceptual/contribution.md @@ -22,14 +22,13 @@ We enormously value feedback from the community, so please do not be afraid to s ## Overview -You can contribute in many ways ranging from answering questions on issues to adding new diffusion models to -the core library. +You can contribute in many ways ranging from answering questions on issues and discussions to adding new diffusion models to the core library. In the following, we give an overview of different ways to contribute, ranked by difficulty in ascending order. All of them are valuable to the community. * 1. Asking and answering questions on [the Diffusers discussion forum](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers) or on [Discord](https://discord.gg/G7tWnz98XR). -* 2. Opening new issues on [the GitHub Issues tab](https://github.com/huggingface/diffusers/issues/new/choose). -* 3. Answering issues on [the GitHub Issues tab](https://github.com/huggingface/diffusers/issues). +* 2. Opening new issues on [the GitHub Issues tab](https://github.com/huggingface/diffusers/issues/new/choose) or new discussions on [the GitHub Discussions tab](https://github.com/huggingface/diffusers/discussions/new/choose). +* 3. Answering issues on [the GitHub Issues tab](https://github.com/huggingface/diffusers/issues) or discussions on [the GitHub Discussions tab](https://github.com/huggingface/diffusers/discussions). * 4. Fix a simple issue, marked by the "Good first issue" label, see [here](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22). * 5. Contribute to the [documentation](https://github.com/huggingface/diffusers/tree/main/docs/source). * 6. Contribute a [Community Pipeline](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3Acommunity-examples). @@ -63,7 +62,7 @@ In the same spirit, you are of immense help to the community by answering such q **Please** keep in mind that the more effort you put into asking or answering a question, the higher the quality of the publicly documented knowledge. In the same way, well-posed and well-answered questions create a high-quality knowledge database accessible to everybody, while badly posed questions or answers reduce the overall quality of the public knowledge database. -In short, a high quality question or answer is *precise*, *concise*, *relevant*, *easy-to-understand*, *accessible*, and *well-formated/well-posed*. For more information, please have a look through the [How to write a good issue](#how-to-write-a-good-issue) section. +In short, a high quality question or answer is *precise*, *concise*, *relevant*, *easy-to-understand*, *accessible*, and *well-formatted/well-posed*. For more information, please have a look through the [How to write a good issue](#how-to-write-a-good-issue) section. **NOTE about channels**: [*The forum*](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63) is much better indexed by search engines, such as Google. Posts are ranked by popularity rather than chronologically. Hence, it's easier to look up questions and answers that we posted some time ago. @@ -99,7 +98,7 @@ This means in more detail: - Format your code. - Do not include any external libraries except for Diffusers depending on them. - **Always** provide all necessary information about your environment; for this, you can run: `diffusers-cli env` in your shell and copy-paste the displayed information to the issue. -- Explain the issue. If the reader doesn't know what the issue is and why it is an issue, she cannot solve it. +- Explain the issue. If the reader doesn't know what the issue is and why it is an issue, (s)he cannot solve it. - **Always** make sure the reader can reproduce your issue with as little effort as possible. If your code snippet cannot be run because of missing libraries or undefined variables, the reader cannot help you. Make sure your reproducible code snippet is as minimal as possible and can be copy-pasted into a simple Python shell. - If in order to reproduce your issue a model and/or dataset is required, make sure the reader has access to that model or dataset. You can always upload your model or dataset to the [Hub](https://huggingface.co) to make it easily downloadable. Try to keep your model and dataset as small as possible, to make the reproduction of your issue as effortless as possible. @@ -288,7 +287,7 @@ The official training examples are maintained by the Diffusers' core maintainers This is because of the same reasons put forward in [6. Contribute a community pipeline](#6-contribute-a-community-pipeline) for official pipelines vs. community pipelines: It is not feasible for the core maintainers to maintain all possible training methods for diffusion models. If the Diffusers core maintainers and the community consider a certain training paradigm to be too experimental or not popular enough, the corresponding training code should be put in the `research_projects` folder and maintained by the author. -Both official training and research examples consist of a directory that contains one or more training scripts, a requirements.txt file, and a README.md file. In order for the user to make use of the +Both official training and research examples consist of a directory that contains one or more training scripts, a `requirements.txt` file, and a `README.md` file. In order for the user to make use of the training examples, it is required to clone the repository: ```bash @@ -298,7 +297,8 @@ git clone https://github.com/huggingface/diffusers as well as to install all additional dependencies required for training: ```bash -pip install -r /examples//requirements.txt +cd diffusers +pip install -r examples//requirements.txt ``` Therefore when adding an example, the `requirements.txt` file shall define all pip dependencies required for your training example so that once all those are installed, the user can run the example's training script. See, for example, the [DreamBooth `requirements.txt` file](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/requirements.txt). @@ -316,7 +316,7 @@ Once an example script works, please make sure to add a comprehensive `README.md - A link to some training results (logs, models, etc.) that show what the user can expect as shown [here](https://api.wandb.ai/report/patrickvonplaten/xm6cd5q5). - If you are adding a non-official/research training example, **please don't forget** to add a sentence that you are maintaining this training example which includes your git handle as shown [here](https://github.com/huggingface/diffusers/tree/main/examples/research_projects/intel_opts#diffusers-examples-with-intel-optimizations). -If you are contributing to the official training examples, please also make sure to add a test to [examples/test_examples.py](https://github.com/huggingface/diffusers/blob/main/examples/test_examples.py). This is not necessary for non-official training examples. +If you are contributing to the official training examples, please also make sure to add a test to its folder such as [examples/dreambooth/test_dreambooth.py](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/test_dreambooth.py). This is not necessary for non-official training examples. ### 8. Fixing a "Good second issue" @@ -418,7 +418,7 @@ You will need basic `git` proficiency to be able to contribute to manual. Type `git --help` in a shell and enjoy. If you prefer books, [Pro Git](https://git-scm.com/book/en/v2) is a very good reference. -Follow these steps to start contributing ([supported Python versions](https://github.com/huggingface/diffusers/blob/main/setup.py#L244)): +Follow these steps to start contributing ([supported Python versions](https://github.com/huggingface/diffusers/blob/83bc6c94eaeb6f7704a2a428931cf2d9ad973ae9/setup.py#L270)): 1. Fork the [repository](https://github.com/huggingface/diffusers) by clicking on the 'Fork' button on the repository's page. This creates a copy of the code @@ -565,4 +565,4 @@ $ git push --set-upstream origin your-branch-for-syncing ### Style guide -For documentation strings, 🧨 Diffusers follows the [Google style](https://google.github.io/styleguide/pyguide.html). +For documentation strings, 🧨 Diffusers follows the [Google style](https://google.github.io/styleguide/pyguide.html). \ No newline at end of file diff --git a/examples/advanced_diffusion_training/README.md b/examples/advanced_diffusion_training/README.md index a13ae719cf..7f0e82173c 100644 --- a/examples/advanced_diffusion_training/README.md +++ b/examples/advanced_diffusion_training/README.md @@ -11,16 +11,16 @@ In a nutshell, LoRA allows to adapt pretrained models by adding pairs of rank-de - Previous pretrained weights are kept frozen so that the model is not prone to [catastrophic forgetting](https://www.pnas.org/doi/10.1073/pnas.1611835114) - Rank-decomposition matrices have significantly fewer parameters than the original model, which means that trained LoRA weights are easily portable. - LoRA attention layers allow to control to which extent the model is adapted towards new training images via a `scale` parameter. -[cloneofsimo](https://github.com/cloneofsimo) was the first to try out LoRA training for Stable Diffusion in +[cloneofsimo](https://github.com/cloneofsimo) was the first to try out LoRA training for Stable Diffusion in the popular [lora](https://github.com/cloneofsimo/lora) GitHub repository. -The `train_dreambooth_lora_sdxl_advanced.py` script shows how to implement dreambooth-LoRA, combining the training process shown in `train_dreambooth_lora_sdxl.py`, with -advanced features and techniques, inspired and built upon contributions by [Nataniel Ruiz](https://twitter.com/natanielruizg): [Dreambooth](https://dreambooth.github.io), [Rinon Gal](https://twitter.com/RinonGal): [Textual Inversion](https://textual-inversion.github.io), [Ron Mokady](https://twitter.com/MokadyRon): [Pivotal Tuning](https://arxiv.org/abs/2106.05744), [Simo Ryu](https://twitter.com/cloneofsimo): [cog-sdxl](https://github.com/replicate/cog-sdxl), +The `train_dreambooth_lora_sdxl_advanced.py` script shows how to implement dreambooth-LoRA, combining the training process shown in `train_dreambooth_lora_sdxl.py`, with +advanced features and techniques, inspired and built upon contributions by [Nataniel Ruiz](https://twitter.com/natanielruizg): [Dreambooth](https://dreambooth.github.io), [Rinon Gal](https://twitter.com/RinonGal): [Textual Inversion](https://textual-inversion.github.io), [Ron Mokady](https://twitter.com/MokadyRon): [Pivotal Tuning](https://arxiv.org/abs/2106.05744), [Simo Ryu](https://twitter.com/cloneofsimo): [cog-sdxl](https://github.com/replicate/cog-sdxl), [Kohya](https://twitter.com/kohya_tech/): [sd-scripts](https://github.com/kohya-ss/sd-scripts), [The Last Ben](https://twitter.com/__TheBen): [fast-stable-diffusion](https://github.com/TheLastBen/fast-stable-diffusion) ❤️ > [!NOTE] -> 💡If this is your first time training a Dreambooth LoRA, congrats!🥳 -> You might want to familiarize yourself more with the techniques: [Dreambooth blog](https://huggingface.co/blog/dreambooth), [Using LoRA for Efficient Stable Diffusion Fine-Tuning blog](https://huggingface.co/blog/lora) +> 💡If this is your first time training a Dreambooth LoRA, congrats!🥳 +> You might want to familiarize yourself more with the techniques: [Dreambooth blog](https://huggingface.co/blog/dreambooth), [Using LoRA for Efficient Stable Diffusion Fine-Tuning blog](https://huggingface.co/blog/lora) 📚 Read more about the advanced features and best practices in this community derived blog post: [LoRA training scripts of the world, unite!](https://huggingface.co/blog/sdxl_lora_advanced_script) @@ -64,19 +64,19 @@ from accelerate.utils import write_basic_config write_basic_config() ``` -When running `accelerate config`, if we specify torch compile mode to True there can be dramatic speedups. +When running `accelerate config`, if we specify torch compile mode to True there can be dramatic speedups. Note also that we use PEFT library as backend for LoRA training, make sure to have `peft>=0.6.0` installed in your environment. ### Pivotal Tuning **Training with text encoder(s)** -Alongside the UNet, LoRA fine-tuning of the text encoders is also supported. In addition to the text encoder optimization +Alongside the UNet, LoRA fine-tuning of the text encoders is also supported. In addition to the text encoder optimization available with `train_dreambooth_lora_sdxl_advanced.py`, in the advanced script **pivotal tuning** is also supported. -[pivotal tuning](https://huggingface.co/blog/sdxl_lora_advanced_script#pivotal-tuning) combines Textual Inversion with regular diffusion fine-tuning - -we insert new tokens into the text encoders of the model, instead of reusing existing ones. -We then optimize the newly-inserted token embeddings to represent the new concept. +[pivotal tuning](https://huggingface.co/blog/sdxl_lora_advanced_script#pivotal-tuning) combines Textual Inversion with regular diffusion fine-tuning - +we insert new tokens into the text encoders of the model, instead of reusing existing ones. +We then optimize the newly-inserted token embeddings to represent the new concept. -To do so, just specify `--train_text_encoder_ti` while launching training (for regular text encoder optimizations, use `--train_text_encoder`). +To do so, just specify `--train_text_encoder_ti` while launching training (for regular text encoder optimizations, use `--train_text_encoder`). Please keep the following points in mind: * SDXL has two text encoders. So, we fine-tune both using LoRA. @@ -101,7 +101,7 @@ snapshot_download( Let's review some of the advanced features we're going to be using for this example: - **custom captions**: -To use custom captioning, first ensure that you have the datasets library installed, otherwise you can install it by +To use custom captioning, first ensure that you have the datasets library installed, otherwise you can install it by ```bash pip install datasets ``` @@ -113,11 +113,11 @@ Now we'll simply specify the name of the dataset and caption column (in this cas --caption_column=prompt ``` -You can also load a dataset straight from by specifying it's name in `dataset_name`. -Look [here](https://huggingface.co/blog/sdxl_lora_advanced_script#custom-captioning) for more info on creating/loadin your own caption dataset. +You can also load a dataset straight from by specifying it's name in `dataset_name`. +Look [here](https://huggingface.co/blog/sdxl_lora_advanced_script#custom-captioning) for more info on creating/loading your own caption dataset. - **optimizer**: for this example, we'll use [prodigy](https://huggingface.co/blog/sdxl_lora_advanced_script#adaptive-optimizers) - an adaptive optimizer -- **pivotal tuning** +- **pivotal tuning** - **min SNR gamma** **Now, we can launch training:** @@ -161,7 +161,7 @@ accelerate launch train_dreambooth_lora_sdxl_advanced.py \ To better track our training experiments, we're using the following flags in the command above: * `report_to="wandb` will ensure the training runs are tracked on Weights and Biases. To use it, be sure to install `wandb` with `pip install wandb`. -* `validation_prompt` and `validation_epochs` to allow the script to do a few validation inference runs. This allows us to qualitatively check if the training is progressing as expected. +* `validation_prompt` and `validation_epochs` to allow the script to do a few validation inference runs. This allows us to qualitatively check if the training is progressing as expected. Our experiments were conducted on a single 40GB A100 GPU. @@ -204,11 +204,11 @@ pipe.load_textual_inversion(state_dict["clip_l"], token=["", ""], text_e pipe.load_textual_inversion(state_dict["clip_g"], token=["", ""], text_encoder=pipe.text_encoder_2, tokenizer=pipe.tokenizer_2) ``` -3. let's generate images +3. let's generate images ```python instance_token = "" -prompt = f"a {instance_token} icon of an orange llama eating ramen, in the style of {instance_token}" +prompt = f"a {instance_token} icon of an orange llama eating ramen, in the style of {instance_token}" image = pipe(prompt=prompt, num_inference_steps=25, cross_attention_kwargs={"scale": 1.0}).images[0] image.save("llama.png") @@ -218,37 +218,37 @@ image.save("llama.png") The new script fully supports textual inversion loading with Comfy UI and AUTOMATIC1111 formats! **AUTOMATIC1111 / SD.Next** \ -In AUTOMATIC1111/SD.Next we will load a LoRA and a textual embedding at the same time. -- *LoRA*: Besides the diffusers format, the script will also train a WebUI compatible LoRA. It is generated as `{your_lora_name}.safetensors`. You can then include it in your `models/Lora` directory. -- *Embedding*: the embedding is the same for diffusers and WebUI. You can download your `{lora_name}_emb.safetensors` file from a trained model, and include it in your `embeddings` directory. +In AUTOMATIC1111/SD.Next we will load a LoRA and a textual embedding at the same time. +- *LoRA*: Besides the diffusers format, the script will also train a WebUI compatible LoRA. It is generated as `{your_lora_name}.safetensors`. You can then include it in your `models/Lora` directory. +- *Embedding*: the embedding is the same for diffusers and WebUI. You can download your `{lora_name}_emb.safetensors` file from a trained model, and include it in your `embeddings` directory. -You can then run inference by prompting `a y2k_emb webpage about the movie Mean Girls `. You can use the `y2k_emb` token normally, including increasing its weight by doing `(y2k_emb:1.2)`. +You can then run inference by prompting `a y2k_emb webpage about the movie Mean Girls `. You can use the `y2k_emb` token normally, including increasing its weight by doing `(y2k_emb:1.2)`. **ComfyUI** \ -In ComfyUI we will load a LoRA and a textual embedding at the same time. +In ComfyUI we will load a LoRA and a textual embedding at the same time. - *LoRA*: Besides the diffusers format, the script will also train a ComfyUI compatible LoRA. It is generated as `{your_lora_name}.safetensors`. You can then include it in your `models/Lora` directory. Then you will load the LoRALoader node and hook that up with your model and CLIP. [Official guide for loading LoRAs](https://comfyanonymous.github.io/ComfyUI_examples/lora/) -- *Embedding*: the embedding is the same for diffusers and WebUI. You can download your `{lora_name}_emb.safetensors` file from a trained model, and include it in your `models/embeddings` directory and use it in your prompts like `embedding:y2k_emb`. [Official guide for loading embeddings](https://comfyanonymous.github.io/ComfyUI_examples/textual_inversion_embeddings/). -- +- *Embedding*: the embedding is the same for diffusers and WebUI. You can download your `{lora_name}_emb.safetensors` file from a trained model, and include it in your `models/embeddings` directory and use it in your prompts like `embedding:y2k_emb`. [Official guide for loading embeddings](https://comfyanonymous.github.io/ComfyUI_examples/textual_inversion_embeddings/). +- ### Specifying a better VAE SDXL's VAE is known to suffer from numerical instability issues. This is why we also expose a CLI argument namely `--pretrained_vae_model_name_or_path` that lets you specify the location of a better VAE (such as [this one](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix)). -### DoRA training +### DoRA training The advanced script supports DoRA training too! -> Proposed in [DoRA: Weight-Decomposed Low-Rank Adaptation](https://arxiv.org/abs/2402.09353), -**DoRA** is very similar to LoRA, except it decomposes the pre-trained weight into two components, **magnitude** and **direction** and employs LoRA for _directional_ updates to efficiently minimize the number of trainable parameters. -The authors found that by using DoRA, both the learning capacity and training stability of LoRA are enhanced without any additional overhead during inference. +> Proposed in [DoRA: Weight-Decomposed Low-Rank Adaptation](https://arxiv.org/abs/2402.09353), +**DoRA** is very similar to LoRA, except it decomposes the pre-trained weight into two components, **magnitude** and **direction** and employs LoRA for _directional_ updates to efficiently minimize the number of trainable parameters. +The authors found that by using DoRA, both the learning capacity and training stability of LoRA are enhanced without any additional overhead during inference. > [!NOTE] -> 💡DoRA training is still _experimental_ +> 💡DoRA training is still _experimental_ > and is likely to require different hyperparameter values to perform best compared to a LoRA. -> Specifically, we've noticed 2 differences to take into account your training: +> Specifically, we've noticed 2 differences to take into account your training: > 1. **LoRA seem to converge faster than DoRA** (so a set of parameters that may lead to overfitting when training a LoRA may be working well for a DoRA) -> 2. **DoRA quality superior to LoRA especially in lower ranks** the difference in quality of DoRA of rank 8 and LoRA of rank 8 appears to be more significant than when training ranks of 32 or 64 for example. -> This is also aligned with some of the quantitative analysis shown in the paper. +> 2. **DoRA quality superior to LoRA especially in lower ranks** the difference in quality of DoRA of rank 8 and LoRA of rank 8 appears to be more significant than when training ranks of 32 or 64 for example. +> This is also aligned with some of the quantitative analysis shown in the paper. **Usage** -1. To use DoRA you need to install `peft` from main: +1. To use DoRA you need to install `peft` from main: ```bash pip install git+https://github.com/huggingface/peft.git ``` @@ -256,12 +256,12 @@ pip install git+https://github.com/huggingface/peft.git ```bash --use_dora ``` -**Inference** +**Inference** The inference is the same as if you train a regular LoRA 🤗 ## Conducting EDM-style training -It's now possible to perform EDM-style training as proposed in [Elucidating the Design Space of Diffusion-Based Generative Models](https://arxiv.org/abs/2206.00364). +It's now possible to perform EDM-style training as proposed in [Elucidating the Design Space of Diffusion-Based Generative Models](https://arxiv.org/abs/2206.00364). simply set: @@ -304,12 +304,12 @@ accelerate launch train_dreambooth_lora_sdxl_advanced.py \ > [!CAUTION] > Min-SNR gamma is not supported with the EDM-style training yet. When training with the PlaygroundAI model, it's recommended to not pass any "variant". -### B-LoRA training +### B-LoRA training The advanced script now supports B-LoRA training too! -> Proposed in [Implicit Style-Content Separation using B-LoRA](https://arxiv.org/abs/2403.14572), +> Proposed in [Implicit Style-Content Separation using B-LoRA](https://arxiv.org/abs/2403.14572), B-LoRA is a method that leverages LoRA to implicitly separate the style and content components of a **single** image. -It was shown that learning the LoRA weights of two specific blocks (referred to as B-LoRAs) -achieves style-content separation that cannot be achieved by training each B-LoRA independently. +It was shown that learning the LoRA weights of two specific blocks (referred to as B-LoRAs) +achieves style-content separation that cannot be achieved by training each B-LoRA independently. Once trained, the two B-LoRAs can be used as independent components to allow various image stylization tasks **Usage** @@ -336,7 +336,7 @@ You can train a B-LoRA with as little as 1 image, and 1000 steps. Try this defau --gradient_checkpointing \ --mixed_precision="fp16" ``` -**Inference** +**Inference** The inference is a bit different: 1. we need load *specific* unet layers (as opposed to a regular LoRA/DoRA) 2. the trained layers we load, changes based on our objective (e.g. style/content) @@ -354,8 +354,8 @@ def is_belong_to_blocks(key, blocks): return False except Exception as e: raise type(e)(f'failed to is_belong_to_block, due to: {e}') - -def lora_lora_unet_blocks(lora_path, alpha, target_blocks): + +def lora_lora_unet_blocks(lora_path, alpha, target_blocks): state_dict, _ = pipeline.lora_state_dict(lora_path) filtered_state_dict = {k: v * alpha for k, v in state_dict.items() if is_belong_to_blocks(k, target_blocks)} return filtered_state_dict @@ -367,7 +367,7 @@ pipeline = StableDiffusionXLPipeline.from_pretrained( torch_dtype=torch.float16, ).to("cuda") -# pick a blora for content/style (you can also set one to None) +# pick a blora for content/style (you can also set one to None) content_B_lora_path = "lora-library/B-LoRA-teddybear" style_B_lora_path= "lora-library/B-LoRA-pen_sketch" @@ -384,28 +384,28 @@ prompt = "a [v18] in [v30] style" pipeline(prompt, num_images_per_prompt=4).images ``` ### LoRA training of Targeted U-net Blocks -The advanced script now supports custom choice of U-net blocks to train during Dreambooth LoRA tuning. +The advanced script now supports custom choice of U-net blocks to train during Dreambooth LoRA tuning. > [!NOTE] > This feature is still experimental -> Recently, works like B-LoRA showed the potential advantages of learning the LoRA weights of specific U-net blocks, not only in speed & memory, -> but also in reducing the amount of needed data, improving style manipulation and overcoming overfitting issues. -> In light of this, we're introducing a new feature to the advanced script to allow for configurable U-net learned blocks. +> Recently, works like B-LoRA showed the potential advantages of learning the LoRA weights of specific U-net blocks, not only in speed & memory, +> but also in reducing the amount of needed data, improving style manipulation and overcoming overfitting issues. +> In light of this, we're introducing a new feature to the advanced script to allow for configurable U-net learned blocks. **Usage** -Configure LoRA learned U-net blocks adding a `lora_unet_blocks` flag, with a comma seperated string specifying the targeted blocks. +Configure LoRA learned U-net blocks adding a `lora_unet_blocks` flag, with a comma separated string specifying the targeted blocks. e.g: ```bash --lora_unet_blocks="unet.up_blocks.0.attentions.0,unet.up_blocks.0.attentions.1" ``` > [!NOTE] -> if you specify both `--use_blora` and `--lora_unet_blocks`, values given in --lora_unet_blocks will be ignored. -> When enabling --use_blora, targeted U-net blocks are automatically set to be "unet.up_blocks.0.attentions.0,unet.up_blocks.0.attentions.1" as discussed in the paper. +> if you specify both `--use_blora` and `--lora_unet_blocks`, values given in --lora_unet_blocks will be ignored. +> When enabling --use_blora, targeted U-net blocks are automatically set to be "unet.up_blocks.0.attentions.0,unet.up_blocks.0.attentions.1" as discussed in the paper. > If you wish to experiment with different blocks, specify `--lora_unet_blocks` only. -**Inference** -Inference is the same as for B-LoRAs, except the input targeted blocks should be modified based on your training configuration. +**Inference** +Inference is the same as for B-LoRAs, except the input targeted blocks should be modified based on your training configuration. ```python import torch from diffusers import StableDiffusionXLPipeline, AutoencoderKL @@ -419,8 +419,8 @@ def is_belong_to_blocks(key, blocks): return False except Exception as e: raise type(e)(f'failed to is_belong_to_block, due to: {e}') - -def lora_lora_unet_blocks(lora_path, alpha, target_blocks): + +def lora_lora_unet_blocks(lora_path, alpha, target_blocks): state_dict, _ = pipeline.lora_state_dict(lora_path) filtered_state_dict = {k: v * alpha for k, v in state_dict.items() if is_belong_to_blocks(k, target_blocks)} return filtered_state_dict @@ -436,7 +436,7 @@ lora_path = "lora-library/B-LoRA-pen_sketch" state_dict = lora_lora_unet_blocks(content_B_lora_path,alpha=1,target_blocks=["unet.up_blocks.0.attentions.0"]) -# Load traine dlora layers into the unet +# Load trained lora layers into the unet pipeline.load_lora_into_unet(state_dict, None, pipeline.unet) #generate diff --git a/examples/advanced_diffusion_training/train_dreambooth_lora_sd15_advanced.py b/examples/advanced_diffusion_training/train_dreambooth_lora_sd15_advanced.py index 1cab12ac5d..4d442b6233 100644 --- a/examples/advanced_diffusion_training/train_dreambooth_lora_sd15_advanced.py +++ b/examples/advanced_diffusion_training/train_dreambooth_lora_sd15_advanced.py @@ -326,7 +326,7 @@ def parse_args(input_args=None): type=str, default="TOK", help="identifier specifying the instance(or instances) as used in instance_prompt, validation prompt, " - "captions - e.g. TOK. To use multiple identifiers, please specify them in a comma seperated string - e.g. " + "captions - e.g. TOK. To use multiple identifiers, please specify them in a comma separated string - e.g. " "'TOK,TOK2,TOK3' etc.", ) @@ -559,7 +559,7 @@ def parse_args(input_args=None): "--prodigy_beta3", type=float, default=None, - help="coefficients for computing the Prodidy stepsize using running averages. If set to None, " + help="coefficients for computing the Prodigy stepsize using running averages. If set to None, " "uses the value of square root of beta2. Ignored if optimizer is adamW", ) parser.add_argument("--prodigy_decouple", type=bool, default=True, help="Use AdamW style decoupled weight decay") @@ -736,7 +736,7 @@ class TokenEmbeddingsHandler: # random initialization of new tokens std_token_embedding = text_encoder.text_model.embeddings.token_embedding.weight.data.std() - print(f"{idx} text encodedr's std_token_embedding: {std_token_embedding}") + print(f"{idx} text encoder's std_token_embedding: {std_token_embedding}") text_encoder.text_model.embeddings.token_embedding.weight.data[self.train_ids] = ( torch.randn(len(self.train_ids), text_encoder.text_model.config.hidden_size) @@ -948,7 +948,7 @@ class DreamBoothDataset(Dataset): else: example["instance_prompt"] = self.instance_prompt - else: # costum prompts were provided, but length does not match size of image dataset + else: # custom prompts were provided, but length does not match size of image dataset example["instance_prompt"] = self.instance_prompt if self.class_data_root: @@ -1967,7 +1967,7 @@ def main(args): } ) - # Conver to WebUI format + # Convert to WebUI format lora_state_dict = load_file(f"{args.output_dir}/pytorch_lora_weights.safetensors") peft_state_dict = convert_all_state_dict_to_peft(lora_state_dict) kohya_state_dict = convert_state_dict_to_kohya(peft_state_dict) diff --git a/examples/advanced_diffusion_training/train_dreambooth_lora_sdxl_advanced.py b/examples/advanced_diffusion_training/train_dreambooth_lora_sdxl_advanced.py index ca311128e0..64fd0a6986 100644 --- a/examples/advanced_diffusion_training/train_dreambooth_lora_sdxl_advanced.py +++ b/examples/advanced_diffusion_training/train_dreambooth_lora_sdxl_advanced.py @@ -348,7 +348,7 @@ def parse_args(input_args=None): type=str, default="TOK", help="identifier specifying the instance(or instances) as used in instance_prompt, validation prompt, " - "captions - e.g. TOK. To use multiple identifiers, please specify them in a comma seperated string - e.g. " + "captions - e.g. TOK. To use multiple identifiers, please specify them in a comma separated string - e.g. " "'TOK,TOK2,TOK3' etc.", ) @@ -591,7 +591,7 @@ def parse_args(input_args=None): "--prodigy_beta3", type=float, default=None, - help="coefficients for computing the Prodidy stepsize using running averages. If set to None, " + help="coefficients for computing the Prodigy stepsize using running averages. If set to None, " "uses the value of square root of beta2. Ignored if optimizer is adamW", ) parser.add_argument("--prodigy_decouple", type=bool, default=True, help="Use AdamW style decoupled weight decay") @@ -824,7 +824,7 @@ class TokenEmbeddingsHandler: # random initialization of new tokens std_token_embedding = text_encoder.text_model.embeddings.token_embedding.weight.data.std() - print(f"{idx} text encodedr's std_token_embedding: {std_token_embedding}") + print(f"{idx} text encoder's std_token_embedding: {std_token_embedding}") text_encoder.text_model.embeddings.token_embedding.weight.data[self.train_ids] = ( torch.randn(len(self.train_ids), text_encoder.text_model.config.hidden_size) @@ -1097,7 +1097,7 @@ class DreamBoothDataset(Dataset): else: example["instance_prompt"] = self.instance_prompt - else: # costum prompts were provided, but length does not match size of image dataset + else: # custom prompts were provided, but length does not match size of image dataset example["instance_prompt"] = self.instance_prompt if self.class_data_root: @@ -1794,7 +1794,7 @@ def main(args): if args.with_prior_preservation: prompt_embeds = torch.cat([prompt_embeds, class_prompt_hidden_states], dim=0) unet_add_text_embeds = torch.cat([unet_add_text_embeds, class_pooled_prompt_embeds], dim=0) - # if we're optmizing the text encoder (both if instance prompt is used for all images or custom prompts) we need to tokenize and encode the + # if we're optimizing the text encoder (both if instance prompt is used for all images or custom prompts) we need to tokenize and encode the # batch prompts on all training steps else: tokens_one = tokenize_prompt(tokenizer_one, args.instance_prompt, add_special_tokens) @@ -2411,7 +2411,7 @@ def main(args): } ) - # Conver to WebUI format + # Convert to WebUI format lora_state_dict = load_file(f"{args.output_dir}/pytorch_lora_weights.safetensors") peft_state_dict = convert_all_state_dict_to_peft(lora_state_dict) kohya_state_dict = convert_state_dict_to_kohya(peft_state_dict) diff --git a/examples/community/README.md b/examples/community/README.md index 600761aae7..012ea535ea 100755 --- a/examples/community/README.md +++ b/examples/community/README.md @@ -3595,7 +3595,7 @@ This pipeline provides drag-and-drop image editing using stochastic differential ![SDE Drag Image](https://github.com/huggingface/diffusers/assets/75928535/bd54f52f-f002-4951-9934-b2a4592771a5) -See [paper](https://arxiv.org/abs/2311.01410), [paper page](https://ml-gsai.github.io/SDE-Drag-demo/), [original repo](https://github.com/ML-GSAI/SDE-Drag) for more infomation. +See [paper](https://arxiv.org/abs/2311.01410), [paper page](https://ml-gsai.github.io/SDE-Drag-demo/), [original repo](https://github.com/ML-GSAI/SDE-Drag) for more information. ```py import PIL diff --git a/examples/community/pipeline_demofusion_sdxl.py b/examples/community/pipeline_demofusion_sdxl.py index 34b69ddb98..e02682dff5 100644 --- a/examples/community/pipeline_demofusion_sdxl.py +++ b/examples/community/pipeline_demofusion_sdxl.py @@ -795,10 +795,10 @@ class DemoFusionSDXLPipeline( Control the strength of dilated sampling. For specific impacts, please refer to Appendix C in the DemoFusion paper. cosine_scale_3 (`float`, defaults to 1): - Control the strength of the gaussion filter. For specific impacts, please refer to Appendix C + Control the strength of the gaussian filter. For specific impacts, please refer to Appendix C in the DemoFusion paper. sigma (`float`, defaults to 1): - The standerd value of the gaussian filter. + The standard value of the gaussian filter. show_image (`bool`, defaults to False): Determine whether to show intermediate results during generation. diff --git a/examples/dreambooth/train_dreambooth_lora_sd3.py b/examples/dreambooth/train_dreambooth_lora_sd3.py index 6006e54461..c8ee825ccc 100644 --- a/examples/dreambooth/train_dreambooth_lora_sd3.py +++ b/examples/dreambooth/train_dreambooth_lora_sd3.py @@ -517,7 +517,7 @@ def parse_args(input_args=None): "--prodigy_beta3", type=float, default=None, - help="coefficients for computing the Prodidy stepsize using running averages. If set to None, " + help="coefficients for computing the Prodigy stepsize using running averages. If set to None, " "uses the value of square root of beta2. Ignored if optimizer is adamW", ) parser.add_argument("--prodigy_decouple", type=bool, default=True, help="Use AdamW style decoupled weight decay") @@ -788,7 +788,7 @@ class DreamBoothDataset(Dataset): else: example["instance_prompt"] = self.instance_prompt - else: # costum prompts were provided, but length does not match size of image dataset + else: # custom prompts were provided, but length does not match size of image dataset example["instance_prompt"] = self.instance_prompt if self.class_data_root: @@ -1359,7 +1359,7 @@ def main(args): if args.with_prior_preservation: prompt_embeds = torch.cat([prompt_embeds, class_prompt_hidden_states], dim=0) pooled_prompt_embeds = torch.cat([pooled_prompt_embeds, class_pooled_prompt_embeds], dim=0) - # if we're optmizing the text encoder (both if instance prompt is used for all images or custom prompts) we need to tokenize and encode the + # if we're optimizing the text encoder (both if instance prompt is used for all images or custom prompts) we need to tokenize and encode the # batch prompts on all training steps else: tokens_one = tokenize_prompt(tokenizer_one, args.instance_prompt) diff --git a/examples/dreambooth/train_dreambooth_lora_sdxl.py b/examples/dreambooth/train_dreambooth_lora_sdxl.py index 0c03584068..f5b6e5f65d 100644 --- a/examples/dreambooth/train_dreambooth_lora_sdxl.py +++ b/examples/dreambooth/train_dreambooth_lora_sdxl.py @@ -562,7 +562,7 @@ def parse_args(input_args=None): "--prodigy_beta3", type=float, default=None, - help="coefficients for computing the Prodidy stepsize using running averages. If set to None, " + help="coefficients for computing the Prodigy stepsize using running averages. If set to None, " "uses the value of square root of beta2. Ignored if optimizer is adamW", ) parser.add_argument("--prodigy_decouple", type=bool, default=True, help="Use AdamW style decoupled weight decay") @@ -861,7 +861,7 @@ class DreamBoothDataset(Dataset): else: example["instance_prompt"] = self.instance_prompt - else: # costum prompts were provided, but length does not match size of image dataset + else: # custom prompts were provided, but length does not match size of image dataset example["instance_prompt"] = self.instance_prompt if self.class_data_root: @@ -1488,7 +1488,7 @@ def main(args): if args.with_prior_preservation: prompt_embeds = torch.cat([prompt_embeds, class_prompt_hidden_states], dim=0) unet_add_text_embeds = torch.cat([unet_add_text_embeds, class_pooled_prompt_embeds], dim=0) - # if we're optmizing the text encoder (both if instance prompt is used for all images or custom prompts) we need to tokenize and encode the + # if we're optimizing the text encoder (both if instance prompt is used for all images or custom prompts) we need to tokenize and encode the # batch prompts on all training steps else: tokens_one = tokenize_prompt(tokenizer_one, args.instance_prompt) diff --git a/examples/dreambooth/train_dreambooth_sd3.py b/examples/dreambooth/train_dreambooth_sd3.py index 124d7b48ee..c8f2fb1ac6 100644 --- a/examples/dreambooth/train_dreambooth_sd3.py +++ b/examples/dreambooth/train_dreambooth_sd3.py @@ -512,7 +512,7 @@ def parse_args(input_args=None): "--prodigy_beta3", type=float, default=None, - help="coefficients for computing the Prodidy stepsize using running averages. If set to None, " + help="coefficients for computing the Prodigy stepsize using running averages. If set to None, " "uses the value of square root of beta2. Ignored if optimizer is adamW", ) parser.add_argument("--prodigy_decouple", type=bool, default=True, help="Use AdamW style decoupled weight decay") @@ -783,7 +783,7 @@ class DreamBoothDataset(Dataset): else: example["instance_prompt"] = self.instance_prompt - else: # costum prompts were provided, but length does not match size of image dataset + else: # custom prompts were provided, but length does not match size of image dataset example["instance_prompt"] = self.instance_prompt if self.class_data_root: @@ -1388,7 +1388,7 @@ def main(args): if args.with_prior_preservation: prompt_embeds = torch.cat([prompt_embeds, class_prompt_hidden_states], dim=0) pooled_prompt_embeds = torch.cat([pooled_prompt_embeds, class_pooled_prompt_embeds], dim=0) - # if we're optmizing the text encoder (both if instance prompt is used for all images or custom prompts) we need to tokenize and encode the + # if we're optimizing the text encoder (both if instance prompt is used for all images or custom prompts) we need to tokenize and encode the # batch prompts on all training steps else: tokens_one = tokenize_prompt(tokenizer_one, args.instance_prompt) diff --git a/examples/research_projects/scheduled_huber_loss_training/dreambooth/train_dreambooth_lora_sdxl.py b/examples/research_projects/scheduled_huber_loss_training/dreambooth/train_dreambooth_lora_sdxl.py index 00f95509be..8af6462202 100644 --- a/examples/research_projects/scheduled_huber_loss_training/dreambooth/train_dreambooth_lora_sdxl.py +++ b/examples/research_projects/scheduled_huber_loss_training/dreambooth/train_dreambooth_lora_sdxl.py @@ -561,7 +561,7 @@ def parse_args(input_args=None): "--prodigy_beta3", type=float, default=None, - help="coefficients for computing the Prodidy stepsize using running averages. If set to None, " + help="coefficients for computing the Prodigy stepsize using running averages. If set to None, " "uses the value of square root of beta2. Ignored if optimizer is adamW", ) parser.add_argument("--prodigy_decouple", type=bool, default=True, help="Use AdamW style decoupled weight decay") @@ -880,7 +880,7 @@ class DreamBoothDataset(Dataset): else: example["instance_prompt"] = self.instance_prompt - else: # costum prompts were provided, but length does not match size of image dataset + else: # custom prompts were provided, but length does not match size of image dataset example["instance_prompt"] = self.instance_prompt if self.class_data_root: @@ -1561,7 +1561,7 @@ def main(args): if args.with_prior_preservation: prompt_embeds = torch.cat([prompt_embeds, class_prompt_hidden_states], dim=0) unet_add_text_embeds = torch.cat([unet_add_text_embeds, class_pooled_prompt_embeds], dim=0) - # if we're optmizing the text encoder (both if instance prompt is used for all images or custom prompts) we need to tokenize and encode the + # if we're optimizing the text encoder (both if instance prompt is used for all images or custom prompts) we need to tokenize and encode the # batch prompts on all training steps else: tokens_one = tokenize_prompt(tokenizer_one, args.instance_prompt) diff --git a/src/diffusers/configuration_utils.py b/src/diffusers/configuration_utils.py index be74ae0619..132d223e77 100644 --- a/src/diffusers/configuration_utils.py +++ b/src/diffusers/configuration_utils.py @@ -716,7 +716,7 @@ class LegacyConfigMixin(ConfigMixin): @classmethod def from_config(cls, config: Union[FrozenDict, Dict[str, Any]] = None, return_unused_kwargs=False, **kwargs): - # To prevent depedency import problem. + # To prevent dependency import problem. from .models.model_loading_utils import _fetch_remapped_cls_from_config # resolve remapping diff --git a/src/diffusers/models/controlnet.py b/src/diffusers/models/controlnet.py index b618ad7e08..8fb49d7b54 100644 --- a/src/diffusers/models/controlnet.py +++ b/src/diffusers/models/controlnet.py @@ -54,7 +54,7 @@ class ControlNetOutput(BaseOutput): be of shape `(batch_size, channel * resolution, height //resolution, width // resolution)`. Output can be used to condition the original UNet's downsampling activations. mid_down_block_re_sample (`torch.Tensor`): - The activation of the midde block (the lowest sample resolution). Each tensor should be of shape + The activation of the middle block (the lowest sample resolution). Each tensor should be of shape `(batch_size, channel * lowest_resolution, height // lowest_resolution, width // lowest_resolution)`. Output can be used to condition the original UNet's middle block activation. """ diff --git a/src/diffusers/models/embeddings.py b/src/diffusers/models/embeddings.py index a951021698..cb64bc61f3 100644 --- a/src/diffusers/models/embeddings.py +++ b/src/diffusers/models/embeddings.py @@ -980,7 +980,7 @@ class GLIGENTextBoundingboxProjection(nn.Module): objs = self.linears(torch.cat([positive_embeddings, xyxy_embedding], dim=-1)) - # positionet with text and image infomation + # positionet with text and image information else: phrases_masks = phrases_masks.unsqueeze(-1) image_masks = image_masks.unsqueeze(-1) @@ -1252,7 +1252,7 @@ class MultiIPAdapterImageProjection(nn.Module): if not isinstance(image_embeds, list): deprecation_message = ( "You have passed a tensor as `image_embeds`.This is deprecated and will be removed in a future release." - " Please make sure to update your script to pass `image_embeds` as a list of tensors to supress this warning." + " Please make sure to update your script to pass `image_embeds` as a list of tensors to suppress this warning." ) deprecate("image_embeds not a list", "1.0.0", deprecation_message, standard_warn=False) image_embeds = [image_embeds.unsqueeze(1)] diff --git a/src/diffusers/models/modeling_utils.py b/src/diffusers/models/modeling_utils.py index d4851ab403..3fca24c0fc 100644 --- a/src/diffusers/models/modeling_utils.py +++ b/src/diffusers/models/modeling_utils.py @@ -1169,7 +1169,7 @@ class LegacyModelMixin(ModelMixin): @classmethod @validate_hf_hub_args def from_pretrained(cls, pretrained_model_name_or_path: Optional[Union[str, os.PathLike]], **kwargs): - # To prevent depedency import problem. + # To prevent dependency import problem. from .model_loading_utils import _fetch_remapped_cls_from_config # Create a copy of the kwargs so that we don't mess with the keyword arguments in the downstream calls. diff --git a/src/diffusers/models/unets/unet_kandinsky3.py b/src/diffusers/models/unets/unet_kandinsky3.py index ff8ce25fd2..f611e7d82b 100644 --- a/src/diffusers/models/unets/unet_kandinsky3.py +++ b/src/diffusers/models/unets/unet_kandinsky3.py @@ -61,7 +61,7 @@ class Kandinsky3UNet(ModelMixin, ConfigMixin): ): super().__init__() - # TOOD(Yiyi): Give better name and put into config for the following 4 parameters + # TODO(Yiyi): Give better name and put into config for the following 4 parameters expansion_ratio = 4 compression_ratio = 2 add_cross_attention = (False, True, True, True) diff --git a/tests/models/unets/test_models_unet_2d.py b/tests/models/unets/test_models_unet_2d.py index 9732989899..5f827f2742 100644 --- a/tests/models/unets/test_models_unet_2d.py +++ b/tests/models/unets/test_models_unet_2d.py @@ -164,7 +164,7 @@ class UNetLDMModelTests(ModelTesterMixin, UNetTesterMixin, unittest.TestCase): @require_torch_accelerator def test_from_pretrained_accelerate_wont_change_results(self): - # by defautl model loading will use accelerate as `low_cpu_mem_usage=True` + # by default model loading will use accelerate as `low_cpu_mem_usage=True` model_accelerate, _ = UNet2DModel.from_pretrained("fusing/unet-ldm-dummy-update", output_loading_info=True) model_accelerate.to(torch_device) model_accelerate.eval()