From a6fb9407fd45e76ccd47d13f08f0dd835967d620 Mon Sep 17 00:00:00 2001
From: Pedro Cuenca
Date: Tue, 20 Dec 2022 08:39:16 +0100
Subject: [PATCH] Dreambooth docs: minor fixes (#1758)

* Section header for in-painting, inference from checkpoint.
* Inference: link to section to perform inference from checkpoint.
* Move Dreambooth in-painting instructions to the proper place.
---
 docs/source/training/dreambooth.mdx |  2 +
 examples/dreambooth/README.md       | 99 +------------------
 .../dreambooth_inpaint/README.md    | 94 +++++++++++++++++-
 3 files changed, 100 insertions(+), 95 deletions(-)

diff --git a/docs/source/training/dreambooth.mdx b/docs/source/training/dreambooth.mdx
index f8d7a025ff..7f9de798c4 100644
--- a/docs/source/training/dreambooth.mdx
+++ b/docs/source/training/dreambooth.mdx
@@ -283,3 +283,5 @@ image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
 image.save("dog-bucket.png")
 ```
+
+You may also run inference from [any of the saved training checkpoints](#performing-inference-using-a-saved-checkpoint).
\ No newline at end of file
diff --git a/examples/dreambooth/README.md b/examples/dreambooth/README.md
index d34bc0a973..2bbdb7a5da 100644
--- a/examples/dreambooth/README.md
+++ b/examples/dreambooth/README.md
@@ -232,8 +232,11 @@ image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
 image.save("dog-bucket.png")
 ```
 
+### Inference from a training checkpoint
 
-## Running with Flax/JAX
+You can also perform inference from one of the checkpoints saved during the training process, if you used the `--checkpointing_steps` argument. Please refer to [the documentation](https://huggingface.co/docs/diffusers/main/en/training/dreambooth#performing-inference-using-a-saved-checkpoint) to see how to do it.
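+
+Checkpoints saved this way contain the Accelerate training state rather than a full pipeline, so they have to be restored before use. As a rough sketch of that flow (paths are placeholders, and `CompVis/stable-diffusion-v1-4` stands in for whatever `--pretrained_model_name_or_path` you trained from):
+
+```python
+from accelerate import Accelerator
+from diffusers import DiffusionPipeline
+
+# Load the pipeline using the same base model that training started from.
+model_id = "CompVis/stable-diffusion-v1-4"
+pipeline = DiffusionPipeline.from_pretrained(model_id)
+
+# Prepare the trained model the same way the training script did, then restore
+# its weights from the checkpoint (an absolute path may be required). Prepare
+# pipeline.text_encoder as well if you trained with --train_text_encoder.
+accelerator = Accelerator()
+unet = accelerator.prepare(pipeline.unet)
+accelerator.load_state("path-to-save-model/checkpoint-400")
+
+# Rebuild the pipeline around the restored UNet and run inference.
+pipeline = DiffusionPipeline.from_pretrained(model_id, unet=accelerator.unwrap_model(unet))
+image = pipeline("a photo of sks dog in a bucket", num_inference_steps=50, guidance_scale=7.5).images[0]
+image.save("dog-bucket.png")
+```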
+
+## Training with Flax/JAX
 
 For faster training on TPUs and GPUs you can leverage the flax training example. Follow the instructions above to get the model and dataset before running the script.
 
@@ -314,96 +317,4 @@ python train_dreambooth_flax.py \
   --max_train_steps=800
 ```
 
-### Training with prior-preservation loss
-
-Prior-preservation is used to avoid overfitting and language-drift. Refer to the paper to learn more about it. For prior-preservation we first generate images using the model with a class prompt and then use those during training along with our data.
-According to the paper, it's recommended to generate `num_epochs * num_samples` images for prior-preservation. 200-300 works well for most cases.
-
-```bash
-export MODEL_NAME="runwayml/stable-diffusion-inpainting"
-export INSTANCE_DIR="path-to-instance-images"
-export CLASS_DIR="path-to-class-images"
-export OUTPUT_DIR="path-to-save-model"
-
-accelerate launch train_dreambooth_inpaint.py \
-  --pretrained_model_name_or_path=$MODEL_NAME \
-  --instance_data_dir=$INSTANCE_DIR \
-  --class_data_dir=$CLASS_DIR \
-  --output_dir=$OUTPUT_DIR \
-  --with_prior_preservation --prior_loss_weight=1.0 \
-  --instance_prompt="a photo of sks dog" \
-  --class_prompt="a photo of dog" \
-  --resolution=512 \
-  --train_batch_size=1 \
-  --gradient_accumulation_steps=1 \
-  --learning_rate=5e-6 \
-  --lr_scheduler="constant" \
-  --lr_warmup_steps=0 \
-  --num_class_images=200 \
-  --max_train_steps=800
-```
-
-
-### Training with gradient checkpointing and 8-bit optimizer:
-
-With the help of gradient checkpointing and the 8-bit optimizer from bitsandbytes it's possible to run train dreambooth on a 16GB GPU.
-
-To install `bitandbytes` please refer to this [readme](https://github.com/TimDettmers/bitsandbytes#requirements--installation).
-
-```bash
-export MODEL_NAME="runwayml/stable-diffusion-inpainting"
-export INSTANCE_DIR="path-to-instance-images"
-export CLASS_DIR="path-to-class-images"
-export OUTPUT_DIR="path-to-save-model"
-
-accelerate launch train_dreambooth_inpaint.py \
-  --pretrained_model_name_or_path=$MODEL_NAME \
-  --instance_data_dir=$INSTANCE_DIR \
-  --class_data_dir=$CLASS_DIR \
-  --output_dir=$OUTPUT_DIR \
-  --with_prior_preservation --prior_loss_weight=1.0 \
-  --instance_prompt="a photo of sks dog" \
-  --class_prompt="a photo of dog" \
-  --resolution=512 \
-  --train_batch_size=1 \
-  --gradient_accumulation_steps=2 --gradient_checkpointing \
-  --use_8bit_adam \
-  --learning_rate=5e-6 \
-  --lr_scheduler="constant" \
-  --lr_warmup_steps=0 \
-  --num_class_images=200 \
-  --max_train_steps=800
-```
-
-### Fine-tune text encoder with the UNet.
-
-The script also allows to fine-tune the `text_encoder` along with the `unet`. It's been observed experimentally that fine-tuning `text_encoder` gives much better results especially on faces.
-Pass the `--train_text_encoder` argument to the script to enable training `text_encoder`.
-
-___Note: Training text encoder requires more memory, with this option the training won't fit on 16GB GPU. It needs at least 24GB VRAM.___
-
-```bash
-export MODEL_NAME="runwayml/stable-diffusion-inpainting"
-export INSTANCE_DIR="path-to-instance-images"
-export CLASS_DIR="path-to-class-images"
-export OUTPUT_DIR="path-to-save-model"
-
-accelerate launch train_dreambooth_inpaint.py \
-  --pretrained_model_name_or_path=$MODEL_NAME \
-  --train_text_encoder \
-  --instance_data_dir=$INSTANCE_DIR \
-  --class_data_dir=$CLASS_DIR \
-  --output_dir=$OUTPUT_DIR \
-  --with_prior_preservation --prior_loss_weight=1.0 \
-  --instance_prompt="a photo of sks dog" \
-  --class_prompt="a photo of dog" \
-  --resolution=512 \
-  --train_batch_size=1 \
-  --use_8bit_adam \
-  --gradient_checkpointing \
-  --learning_rate=2e-6 \
-  --lr_scheduler="constant" \
-  --lr_warmup_steps=0 \
-  --num_class_images=200 \
-  --max_train_steps=800
-```
+You can also use Dreambooth to train the specialized in-painting model. See [the script in the research folder](https://github.com/huggingface/diffusers/tree/main/examples/research_projects/dreambooth_inpaint) for details.
\ No newline at end of file
diff --git a/examples/research_projects/dreambooth_inpaint/README.md b/examples/research_projects/dreambooth_inpaint/README.md
index 6830d1e2cb..dec9195879 100644
--- a/examples/research_projects/dreambooth_inpaint/README.md
+++ b/examples/research_projects/dreambooth_inpaint/README.md
@@ -23,4 +23,96 @@ accelerate launch train_dreambooth_inpaint.py \
   --max_train_steps=400
 ```
 
-The script is also compatible with prior preservation loss and gradient checkpointing
+### Training with prior-preservation loss
+
+Prior-preservation is used to avoid overfitting and language drift. Refer to the paper to learn more about it. For prior-preservation, we first generate images using the model with a class prompt, and then use those images during training along with our data.
+According to the paper, it's recommended to generate `num_epochs * num_samples` images for prior-preservation. 200-300 images work well for most cases.
+
+```bash
+export MODEL_NAME="runwayml/stable-diffusion-inpainting"
+export INSTANCE_DIR="path-to-instance-images"
+export CLASS_DIR="path-to-class-images"
+export OUTPUT_DIR="path-to-save-model"
+
+accelerate launch train_dreambooth_inpaint.py \
+  --pretrained_model_name_or_path=$MODEL_NAME \
+  --instance_data_dir=$INSTANCE_DIR \
+  --class_data_dir=$CLASS_DIR \
+  --output_dir=$OUTPUT_DIR \
+  --with_prior_preservation --prior_loss_weight=1.0 \
+  --instance_prompt="a photo of sks dog" \
+  --class_prompt="a photo of dog" \
+  --resolution=512 \
+  --train_batch_size=1 \
+  --gradient_accumulation_steps=1 \
+  --learning_rate=5e-6 \
+  --lr_scheduler="constant" \
+  --lr_warmup_steps=0 \
+  --num_class_images=200 \
+  --max_train_steps=800
+```
+
+
+### Training with gradient checkpointing and 8-bit optimizer
+
+With the help of gradient checkpointing and the 8-bit optimizer from bitsandbytes, it's possible to train Dreambooth on a 16GB GPU.
+
+To install `bitsandbytes`, please refer to this [readme](https://github.com/TimDettmers/bitsandbytes#requirements--installation).
+
+```bash
+export MODEL_NAME="runwayml/stable-diffusion-inpainting"
+export INSTANCE_DIR="path-to-instance-images"
+export CLASS_DIR="path-to-class-images"
+export OUTPUT_DIR="path-to-save-model"
+
+accelerate launch train_dreambooth_inpaint.py \
+  --pretrained_model_name_or_path=$MODEL_NAME \
+  --instance_data_dir=$INSTANCE_DIR \
+  --class_data_dir=$CLASS_DIR \
+  --output_dir=$OUTPUT_DIR \
+  --with_prior_preservation --prior_loss_weight=1.0 \
+  --instance_prompt="a photo of sks dog" \
+  --class_prompt="a photo of dog" \
+  --resolution=512 \
+  --train_batch_size=1 \
+  --gradient_accumulation_steps=2 --gradient_checkpointing \
+  --use_8bit_adam \
+  --learning_rate=5e-6 \
+  --lr_scheduler="constant" \
+  --lr_warmup_steps=0 \
+  --num_class_images=200 \
+  --max_train_steps=800
+```
+
+### Fine-tune the text encoder with the UNet
+
+The script also allows you to fine-tune the `text_encoder` along with the `unet`. It's been observed experimentally that fine-tuning the `text_encoder` gives much better results, especially on faces.
+Pass the `--train_text_encoder` argument to the script to enable training the `text_encoder`.
+
+___Note: Training the text encoder requires more memory; with this option, training won't fit on a 16GB GPU. It needs at least 24GB of VRAM.___
+
+```bash
+export MODEL_NAME="runwayml/stable-diffusion-inpainting"
+export INSTANCE_DIR="path-to-instance-images"
+export CLASS_DIR="path-to-class-images"
+export OUTPUT_DIR="path-to-save-model"
+
+accelerate launch train_dreambooth_inpaint.py \
+  --pretrained_model_name_or_path=$MODEL_NAME \
+  --train_text_encoder \
+  --instance_data_dir=$INSTANCE_DIR \
+  --class_data_dir=$CLASS_DIR \
+  --output_dir=$OUTPUT_DIR \
+  --with_prior_preservation --prior_loss_weight=1.0 \
+  --instance_prompt="a photo of sks dog" \
+  --class_prompt="a photo of dog" \
+  --resolution=512 \
+  --train_batch_size=1 \
+  --use_8bit_adam \
+  --gradient_checkpointing \
+  --learning_rate=2e-6 \
+  --lr_scheduler="constant" \
+  --lr_warmup_steps=0 \
+  --num_class_images=200 \
+  --max_train_steps=800
+```
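+
+### Inference
+
+Once training is done, you can try the fine-tuned in-painting model with a minimal sketch like the one below. It assumes the model was saved to `path-to-save-model` as in the commands above; `dog.png` and `dog_mask.png` are placeholder files you provide, where the mask is white in the region that should be repainted.
+
+```python
+import torch
+from PIL import Image
+
+from diffusers import StableDiffusionInpaintPipeline
+
+# Load the fine-tuned in-painting pipeline from the training output directory.
+pipe = StableDiffusionInpaintPipeline.from_pretrained(
+    "path-to-save-model", torch_dtype=torch.float16
+).to("cuda")
+
+# Placeholder inputs: an image of the subject and a mask of the region to repaint.
+image = Image.open("dog.png").convert("RGB").resize((512, 512))
+mask_image = Image.open("dog_mask.png").convert("RGB").resize((512, 512))
+
+prompt = "a photo of sks dog"
+result = pipe(prompt=prompt, image=image, mask_image=mask_image, num_inference_steps=50, guidance_scale=7.5).images[0]
+result.save("dog-inpainted.png")
+```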