diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml index ae0007dbf8..01ff852f57 100644 --- a/docs/source/en/_toctree.yml +++ b/docs/source/en/_toctree.yml @@ -82,7 +82,7 @@ - local: training/text_inversion title: Textual Inversion - local: training/dreambooth - title: Dreambooth + title: DreamBooth - local: training/text2image title: Text-to-image - local: training/lora diff --git a/docs/source/en/training/dreambooth.mdx b/docs/source/en/training/dreambooth.mdx index 6604028e9b..206164ef74 100644 --- a/docs/source/en/training/dreambooth.mdx +++ b/docs/source/en/training/dreambooth.mdx @@ -10,55 +10,67 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o specific language governing permissions and limitations under the License. --> -# DreamBooth fine-tuning example +# DreamBooth -[DreamBooth](https://arxiv.org/abs/2208.12242) is a method to personalize text-to-image models like stable diffusion given just a few (3~5) images of a subject. +[[open-in-colab]] + +[DreamBooth](https://arxiv.org/abs/2208.12242) is a method to personalize text-to-image models like Stable Diffusion given just a few (3-5) images of a subject. It allows the model to generate contextualized images of the subject in different scenes, poses, and views. ![Dreambooth examples from the project's blog](https://dreambooth.github.io/DreamBooth_files/teaser_static.jpg) -_Dreambooth examples from the [project's blog](https://dreambooth.github.io)._ +Dreambooth examples from the project's blog. -The [Dreambooth training script](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth) shows how to implement this training procedure on a pre-trained Stable Diffusion model. +This guide will show you how to finetune DreamBooth with the [`CompVis/stable-diffusion-v1-4`](https://huggingface.co/CompVis/stable-diffusion-v1-4) model for various GPU sizes, and with Flax. All the training scripts for DreamBooth used in this guide can be found [here](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth) if you're interested in digging deeper and seeing how things work. - - -Dreambooth fine-tuning is very sensitive to hyperparameters and easy to overfit. We recommend you take a look at our [in-depth analysis](https://huggingface.co/blog/dreambooth) with recommended settings for different subjects, and go from there. - - - -## Training locally - -### Installing the dependencies - -Before running the scripts, make sure to install the library's training dependencies. We also recommend to install `diffusers` from the `main` github branch. +Before running the scripts, make sure you install the library's training dependencies. We also recommend installing ๐Ÿงจ Diffusers from the `main` GitHub branch: ```bash pip install git+https://github.com/huggingface/diffusers pip install -U -r diffusers/examples/dreambooth/requirements.txt ``` -xFormers is not part of the training requirements, but [we recommend you install it if you can](../optimization/xformers). It could make your training faster and less memory intensive. +xFormers is not part of the training requirements, but we recommend you [install](../optimization/xformers) it if you can because it could make your training faster and less memory intensive. -After all dependencies have been set up you can configure a [๐Ÿค— Accelerate](https://github.com/huggingface/accelerate/) environment with: +After all the dependencies have been set up, initialize a [๐Ÿค— Accelerate](https://github.com/huggingface/accelerate/) environment with: ```bash accelerate config ``` -In this example we'll use model version `v1-4`, so please visit [its card](https://huggingface.co/CompVis/stable-diffusion-v1-4) and carefully read the license before proceeding. +To setup a default ๐Ÿค— Accelerate environment without choosing any configurations: -The command below will download and cache the model weights from the Hub because we use the model's Hub id `CompVis/stable-diffusion-v1-4`. You may also clone the repo locally and use the local path in your system where the checkout was saved. +```bash +accelerate config default +``` -### Dog toy example +Or if your environment doesn't support an interactive shell like a notebook, you can use: -In this example we'll use [these images](https://drive.google.com/drive/folders/1BO_dyz-p65qhBRRMRA4TbZ8qW4rB99JZ) to add a new concept to Stable Diffusion using the Dreambooth process. They will be our training data. Please, download them and place them somewhere in your system. +```py +from accelerate.utils import write_basic_config -Then you can launch the training script using: +write_basic_config() +``` + +## Finetuning + + + +DreamBooth finetuning is very sensitive to hyperparameters and easy to overfit. We recommend you take a look at our [in-depth analysis](https://huggingface.co/blog/dreambooth) with recommended settings for different subjects to help you choose the appropriate hyperparameters. + + + + + +Let's try DreamBooth with a [few images of a dog](https://drive.google.com/drive/folders/1BO_dyz-p65qhBRRMRA4TbZ8qW4rB99JZ); download and save them to a directory and then set the `INSTANCE_DIR` environment variable to that path: ```bash export MODEL_NAME="CompVis/stable-diffusion-v1-4" export INSTANCE_DIR="path_to_training_images" export OUTPUT_DIR="path_to_saved_model" +``` +Then you can launch the training script (you can find the full training script [here](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth.py)) with the following command: + +```bash accelerate launch train_dreambooth.py \ --pretrained_model_name_or_path=$MODEL_NAME \ --instance_data_dir=$INSTANCE_DIR \ @@ -72,13 +84,44 @@ accelerate launch train_dreambooth.py \ --lr_warmup_steps=0 \ --max_train_steps=400 ``` + + +If you have access to TPUs or want to train even faster, you can try out the [Flax training script](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth_flax.py). The Flax training script doesn't support gradient checkpointing or gradient accumulation, so you'll need a GPU with at least 30GB of memory. -### Training with a prior-preserving loss +Before running the script, make sure you have the requirements installed: -Prior preservation is used to avoid overfitting and language-drift. Please, refer to the paper to learn more about it if you are interested. For prior preservation, we use other images of the same class as part of the training process. The nice thing is that we can generate those images using the Stable Diffusion model itself! The training script will save the generated images to a local path we specify. +```bash +pip install -U -r requirements.txt +``` -According to the paper, it's recommended to generate `num_epochs * num_samples` images for prior preservation. 200-300 works well for most cases. +Now you can launch the training script with the following command: +```bash +export MODEL_NAME="duongna/stable-diffusion-v1-4-flax" +export INSTANCE_DIR="path-to-instance-images" +export OUTPUT_DIR="path-to-save-model" + +python train_dreambooth_flax.py \ + --pretrained_model_name_or_path=$MODEL_NAME \ + --instance_data_dir=$INSTANCE_DIR \ + --output_dir=$OUTPUT_DIR \ + --instance_prompt="a photo of sks dog" \ + --resolution=512 \ + --train_batch_size=1 \ + --learning_rate=5e-6 \ + --max_train_steps=400 +``` + + + +## Finetuning with prior-preserving loss + +Prior preservation is used to avoid overfitting and language-drift (check out the [paper](https://arxiv.org/abs/2208.12242) to learn more if you're interested). For prior preservation, you use other images of the same class as part of the training process. The nice thing is that you can generate those images using the Stable Diffusion model itself! The training script will save the generated images to a local path you specify. + +The author's recommend generating `num_epochs * num_samples` images for prior preservation. In most cases, 200-300 images work well. + + + ```bash export MODEL_NAME="CompVis/stable-diffusion-v1-4" export INSTANCE_DIR="path_to_training_images" @@ -102,32 +145,125 @@ accelerate launch train_dreambooth.py \ --num_class_images=200 \ --max_train_steps=800 ``` + + +```bash +export MODEL_NAME="duongna/stable-diffusion-v1-4-flax" +export INSTANCE_DIR="path-to-instance-images" +export CLASS_DIR="path-to-class-images" +export OUTPUT_DIR="path-to-save-model" -### Saving checkpoints while training +python train_dreambooth_flax.py \ + --pretrained_model_name_or_path=$MODEL_NAME \ + --instance_data_dir=$INSTANCE_DIR \ + --class_data_dir=$CLASS_DIR \ + --output_dir=$OUTPUT_DIR \ + --with_prior_preservation --prior_loss_weight=1.0 \ + --instance_prompt="a photo of sks dog" \ + --class_prompt="a photo of dog" \ + --resolution=512 \ + --train_batch_size=1 \ + --learning_rate=5e-6 \ + --num_class_images=200 \ + --max_train_steps=800 +``` + + -It's easy to overfit while training with Dreambooth, so sometimes it's useful to save regular checkpoints during the process. One of the intermediate checkpoints might work better than the final model! To use this feature you need to pass the following argument to the training script: +## Finetuning the text encoder and UNet + +The script also allows you to finetune the `text_encoder` along with the `unet`. In our experiments (check out the [Training Stable Diffusion with DreamBooth using ๐Ÿงจ Diffusers](https://huggingface.co/blog/dreambooth) post for more details), this yields much better results, especially when generating images of faces. + + + +Training the text encoder requires additional memory and it won't fit on a 16GB GPU. You'll need at least 24GB VRAM to use this option. + + + +Pass the `--train_text_encoder` argument to the training script to enable finetuning the `text_encoder` and `unet`: + + + +```bash +export MODEL_NAME="CompVis/stable-diffusion-v1-4" +export INSTANCE_DIR="path_to_training_images" +export CLASS_DIR="path_to_class_images" +export OUTPUT_DIR="path_to_saved_model" + +accelerate launch train_dreambooth.py \ + --pretrained_model_name_or_path=$MODEL_NAME \ + --train_text_encoder \ + --instance_data_dir=$INSTANCE_DIR \ + --class_data_dir=$CLASS_DIR \ + --output_dir=$OUTPUT_DIR \ + --with_prior_preservation --prior_loss_weight=1.0 \ + --instance_prompt="a photo of sks dog" \ + --class_prompt="a photo of dog" \ + --resolution=512 \ + --train_batch_size=1 \ + --use_8bit_adam + --gradient_checkpointing \ + --learning_rate=2e-6 \ + --lr_scheduler="constant" \ + --lr_warmup_steps=0 \ + --num_class_images=200 \ + --max_train_steps=800 +``` + + +```bash +export MODEL_NAME="duongna/stable-diffusion-v1-4-flax" +export INSTANCE_DIR="path-to-instance-images" +export CLASS_DIR="path-to-class-images" +export OUTPUT_DIR="path-to-save-model" + +python train_dreambooth_flax.py \ + --pretrained_model_name_or_path=$MODEL_NAME \ + --train_text_encoder \ + --instance_data_dir=$INSTANCE_DIR \ + --class_data_dir=$CLASS_DIR \ + --output_dir=$OUTPUT_DIR \ + --with_prior_preservation --prior_loss_weight=1.0 \ + --instance_prompt="a photo of sks dog" \ + --class_prompt="a photo of dog" \ + --resolution=512 \ + --train_batch_size=1 \ + --learning_rate=2e-6 \ + --num_class_images=200 \ + --max_train_steps=800 +``` + + + +## Finetuning with LoRA + +You can also use Low-Rank Adaptation of Large Language Models (LoRA), a fine-tuning technique for accelerating training large models, on DreamBooth. For more details, take a look at the [LoRA training](training/lora#dreambooth) guide. + +## Saving checkpoints while training + +It's easy to overfit while training with Dreambooth, so sometimes it's useful to save regular checkpoints during the training process. One of the intermediate checkpoints might actually work better than the final model! Pass the following argument to the training script to enable saving checkpoints: ```bash --checkpointing_steps=500 ``` -This will save the full training state in subfolders of your `output_dir`. Subfolder names begin with the prefix `checkpoint-`, and then the number of steps performed so far; for example: `checkpoint-1500` would be a checkpoint saved after 1500 training steps. +This saves the full training state in subfolders of your `output_dir`. Subfolder names begin with the prefix `checkpoint-`, followed by the number of steps performed so far; for example, `checkpoint-1500` would be a checkpoint saved after 1500 training steps. -#### Resuming training from a saved checkpoint +### Resume training from a saved checkpoint -If you want to resume training from any of the saved checkpoints, you can pass the argument `--resume_from_checkpoint` and then indicate the name of the checkpoint you want to use. You can also use the special string `"latest"` to resume from the last checkpoint saved (i.e., the one with the largest number of steps). For example, the following would resume training from the checkpoint saved after 1500 steps: +If you want to resume training from any of the saved checkpoints, you can pass the argument `--resume_from_checkpoint` to the script and specify the name of the checkpoint you want to use. You can also use the special string `"latest"` to resume from the last saved checkpoint (the one with the largest number of steps). For example, the following would resume training from the checkpoint saved after 1500 steps: ```bash --resume_from_checkpoint="checkpoint-1500" ``` -This would be a good opportunity to tweak some of your hyperparameters if you wish. +This is a good opportunity to tweak some of your hyperparameters if you wish. -#### Performing inference using a saved checkpoint +### Inference from a saved checkpoint -Saved checkpoints are stored in a format suitable for resuming training. They not only include the model weights, but also the state of the optimizer, data loaders and learning rate. +Saved checkpoints are stored in a format suitable for resuming training. They not only include the model weights, but also the state of the optimizer, data loaders, and learning rate. -**Note**: If you have installed `"accelerate>=0.16.0"` you can use the following code to run +If you have **`"accelerate>=0.16.0"`** installed, use the following code to run inference from an intermediate checkpoint. ```python @@ -150,7 +286,7 @@ pipeline.to("cuda") pipeline.save_pretrained("dreambooth-pipeline") ``` -If you have installed `"accelerate<0.16.0"` you need to first convert it to an inference pipeline. This is how you could do it: +If you have **`"accelerate<0.16.0"`** installed, you need to convert it to an inference pipeline first: ```python from accelerate import Accelerator @@ -179,15 +315,37 @@ pipeline = DiffusionPipeline.from_pretrained( pipeline.save_pretrained("dreambooth-pipeline") ``` -### Training on a 16GB GPU +## Optimizations for different GPU sizes -With the help of gradient checkpointing and the 8-bit optimizer from [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), it's possible to train dreambooth on a 16GB GPU. +Depending on your hardware, there are a few different ways to optimize DreamBooth on GPUs from 16GB to just 8GB! + +### xFormers + +[xFormers](https://github.com/facebookresearch/xformers) is a toolbox for optimizing Transformers, and it include a [memory-efficient attention](https://facebookresearch.github.io/xformers/components/ops.html#module-xformers.ops) mechanism that is used in ๐Ÿงจ Diffusers. You'll need to [install xFormers](./optimization/xformers) and then add the following argument to your training script: + +```bash + --enable_xformers_memory_efficient_attention +``` + +xFormers is not available in Flax. + +### Set gradients to none + +Another way you can lower your memory footprint is to [set the gradients](https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.zero_grad.html) to `None` instead of zero. However, this may change certain behaviors, so if you run into any issues, try removing this argument. Add the following argument to your training script to set the gradients to `None`: + +```bash + --set_grads_to_none +``` + +### 16GB GPU + +With the help of gradient checkpointing and [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) 8-bit optimizer, it's possible to train DreamBooth on a 16GB GPU. Make sure you have bitsandbytes installed: ```bash pip install bitsandbytes ``` -Then pass the `--use_8bit_adam` option to the training script. +Then pass the `--use_8bit_adam` option to the training script: ```bash export MODEL_NAME="CompVis/stable-diffusion-v1-4" @@ -214,25 +372,18 @@ accelerate launch train_dreambooth.py \ --max_train_steps=800 ``` -### Fine-tune the text encoder in addition to the UNet +### 12GB GPU -The script also allows to fine-tune the `text_encoder` along with the `unet`. It has been observed experimentally that this gives much better results, especially on faces. Please, refer to [our blog](https://huggingface.co/blog/dreambooth) for more details. - -To enable this option, pass the `--train_text_encoder` argument to the training script. - - -Training the text encoder requires additional memory, so training won't fit on a 16GB GPU. You'll need at least 24GB VRAM to use this option. - +To run DreamBooth on a 12GB GPU, you'll need to enable gradient checkpointing, the 8-bit optimizer, xFormers, and set the gradients to `None`: ```bash export MODEL_NAME="CompVis/stable-diffusion-v1-4" -export INSTANCE_DIR="path_to_training_images" -export CLASS_DIR="path_to_class_images" -export OUTPUT_DIR="path_to_saved_model" +export INSTANCE_DIR="path-to-instance-images" +export CLASS_DIR="path-to-class-images" +export OUTPUT_DIR="path-to-save-model" accelerate launch train_dreambooth.py \ --pretrained_model_name_or_path=$MODEL_NAME \ - --train_text_encoder \ --instance_data_dir=$INSTANCE_DIR \ --class_data_dir=$CLASS_DIR \ --output_dir=$OUTPUT_DIR \ @@ -241,8 +392,10 @@ accelerate launch train_dreambooth.py \ --class_prompt="a photo of dog" \ --resolution=512 \ --train_batch_size=1 \ - --use_8bit_adam - --gradient_checkpointing \ + --gradient_accumulation_steps=1 --gradient_checkpointing \ + --use_8bit_adam \ + --enable_xformers_memory_efficient_attention \ + --set_grads_to_none \ --learning_rate=2e-6 \ --lr_scheduler="constant" \ --lr_warmup_steps=0 \ @@ -250,19 +403,25 @@ accelerate launch train_dreambooth.py \ --max_train_steps=800 ``` -### Training on a 8 GB GPU: +### 8 GB GPU -Using [DeepSpeed](https://www.deepspeed.ai/) it's even possible to offload some -tensors from VRAM to either CPU or NVME, allowing training to proceed with less GPU memory. +For 8GB GPUs, you'll need the help of [DeepSpeed](https://www.deepspeed.ai/) to offload some +tensors from the VRAM to either the CPU or NVME, enabling training with less GPU memory. -DeepSpeed needs to be enabled with `accelerate config`. During configuration, -answer yes to "Do you want to use DeepSpeed?". Combining DeepSpeed stage 2, fp16 -mixed precision, and offloading both the model parameters and the optimizer state to CPU, it's -possible to train on under 8 GB VRAM. The drawback is that this requires more system RAM (about 25 GB). See [the DeepSpeed documentation](https://huggingface.co/docs/accelerate/usage_guides/deepspeed) for more configuration options. +Run the following command to configure your ๐Ÿค— Accelerate environment: -Changing the default Adam optimizer to DeepSpeed's special version of Adam -`deepspeed.ops.adam.DeepSpeedCPUAdam` gives a substantial speedup, but enabling -it requires the system's CUDA toolchain version to be the same as the one installed with PyTorch. 8-bit optimizers don't seem to be compatible with DeepSpeed at the moment. +```bash +accelerate config +``` + +During configuration, confirm that you want to use DeepSpeed. Now it's possible to train on under 8GB VRAM by combining DeepSpeed stage 2, fp16 mixed precision, and offloading the model parameters and the optimizer state to the CPU. The drawback is that this requires more system RAM, about 25 GB. See [the DeepSpeed documentation](https://huggingface.co/docs/accelerate/usage_guides/deepspeed) for more configuration options. + +You should also change the default Adam optimizer to DeepSpeed's optimized version of Adam +[`deepspeed.ops.adam.DeepSpeedCPUAdam`](https://deepspeed.readthedocs.io/en/latest/optimizers.html#adam-cpu) for a substantial speedup. Enabling `DeepSpeedCPUAdam` requires your system's CUDA toolchain version to be the same as the one installed with PyTorch. + +8-bit optimizers don't seem to be compatible with DeepSpeed at the moment. + +Launch training with the following command: ```bash export MODEL_NAME="CompVis/stable-diffusion-v1-4" @@ -292,11 +451,10 @@ accelerate launch train_dreambooth.py \ ## Inference -Once you have trained a model, inference can be done using the `StableDiffusionPipeline`, by simply indicating the path where the model was saved. Make sure that your prompts include the special `identifier` used during training (`sks` in the previous examples). - -**Note**: If you have installed `"accelerate>=0.16.0"` you can use the following code to run -inference from an intermediate checkpoint. +Once you have trained a model, specify the path to where the model is saved, and use it for inference in the [`StableDiffusionPipeline`]. Make sure your prompts include the special `identifier` used during training (`sks` in the previous examples). +If you have **`"accelerate>=0.16.0"`** installed, you can use the following code to run +inference from an intermediate checkpoint: ```python from diffusers import StableDiffusionPipeline @@ -311,4 +469,4 @@ image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0] image.save("dog-bucket.png") ``` -You may also run inference from [any of the saved training checkpoints](#performing-inference-using-a-saved-checkpoint). +You may also run inference from any of the [saved training checkpoints](#inference-from-a-saved-checkpoint). \ No newline at end of file