diff --git a/.github/workflows/build_documentation.yml b/.github/workflows/build_documentation.yml index 79d2cdec06..8fdae99883 100644 --- a/.github/workflows/build_documentation.yml +++ b/.github/workflows/build_documentation.yml @@ -11,17 +11,13 @@ on: jobs: build: steps: - - name: Install dependencies - run: | - apt-get update && apt-get install libsndfile1-dev libgl1 -y - - - name: Build doc - uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@main - with: - commit_sha: ${{ github.sha }} - package: diffusers - notebook_folder: diffusers_doc - languages: en ko zh + uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@main + with: + commit_sha: ${{ github.sha }} + install_libgl1: true + package: diffusers + notebook_folder: diffusers_doc + languages: en ko zh secrets: token: ${{ secrets.HUGGINGFACE_PUSH }} diff --git a/.github/workflows/build_pr_documentation.yml b/.github/workflows/build_pr_documentation.yml index 248644b7e9..18b606ca75 100644 --- a/.github/workflows/build_pr_documentation.yml +++ b/.github/workflows/build_pr_documentation.yml @@ -9,15 +9,10 @@ concurrency: jobs: build: - steps: - - name: Install dependencies - run: | - apt-get update && apt-get install libsndfile1-dev libgl1 -y - - - name: Build doc - uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@main - with: - commit_sha: ${{ github.event.pull_request.head.sha }} - pr_number: ${{ github.event.number }} - package: diffusers - languages: en ko zh + uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@main + with: + commit_sha: ${{ github.event.pull_request.head.sha }} + pr_number: ${{ github.event.number }} + install_libgl1: true + package: diffusers + languages: en ko zh diff --git a/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_xl.mdx b/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_xl.mdx index b87d51af23..64abb9eef8 100644 --- a/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_xl.mdx +++ b/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_xl.mdx @@ -12,22 +12,134 @@ specific language governing permissions and limitations under the License. # Stable diffusion XL -Stable Diffusion 2 is a text-to-image _latent diffusion_ model built upon the work of [Stable Diffusion 1](https://stability.ai/blog/stable-diffusion-public-release). -The project to train Stable Diffusion 2 was led by Robin Rombach and Katherine Crowson from [Stability AI](https://stability.ai/) and [LAION](https://laion.ai/). +Stable Diffusion XL was proposed in [SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis](https://arxiv.org/abs/2307.01952) by Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, Robin Rombach -*The Stable Diffusion 2.0 release includes robust text-to-image models trained using a brand new text encoder (OpenCLIP), developed by LAION with support from Stability AI, which greatly improves the quality of the generated images compared to earlier V1 releases. The text-to-image models in this release can generate images with default resolutions of both 512x512 pixels and 768x768 pixels. -These models are trained on an aesthetic subset of the [LAION-5B dataset](https://laion.ai/blog/laion-5b/) created by the DeepFloyd team at Stability AI, which is then further filtered to remove adult content using [LAION’s NSFW filter](https://openreview.net/forum?id=M3Y74vmsMcY).* +The abstract of the paper is the following: -For more details about how Stable Diffusion 2 works and how it differs from Stable Diffusion 1, please refer to the official [launch announcement post](https://stability.ai/blog/stable-diffusion-v2-release). +*We present SDXL, a latent diffusion model for text-to-image synthesis. Compared to previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone: The increase of model parameters is mainly due to more attention blocks and a larger cross-attention context as SDXL uses a second text encoder. We design multiple novel conditioning schemes and train SDXL on multiple aspect ratios. We also introduce a refinement model which is used to improve the visual fidelity of samples generated by SDXL using a post-hoc image-to-image technique. We demonstrate that SDXL shows drastically improved performance compared the previous versions of Stable Diffusion and achieves results competitive with those of black-box state-of-the-art image generators.* ## Tips +- Stable Diffusion XL works especially well with images between 768 and 1024. +- Stable Diffusion XL output image can be improved by making use of a refiner as shown below + ### Available checkpoints: - *Text-to-Image (1024x1024 resolution)*: [stabilityai/stable-diffusion-xl-base-0.9](https://huggingface.co/stabilityai/stable-diffusion-xl-base-0.9) with [`StableDiffusionXLPipeline`] - *Image-to-Image / Refiner (1024x1024 resolution)*: [stabilityai/stable-diffusion-xl-refiner-0.9](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-0.9) with [`StableDiffusionXLImg2ImgPipeline`] -TODO +## Usage Example + +Before using SDXL make sure to have `transformers`, `accelerate`, `safetensors` and `invisible_watermark` installed. +You can install the libraries as follows: + +``` +pip install transformers +pip install accelerate +pip install safetensors +pip install invisible-watermark>=2.0 +``` + +### *Text-to-Image* + +You can use SDXL as follows for *text-to-image*: + +```py +from diffusers import StableDiffusionXLPipeline +import torch + +pipe = StableDiffusionXLPipeline.from_pretrained( + "stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True +) +pipe.to("cuda") + +prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k" +image = pipe(prompt=prompt).images[0] +``` + +### Refining the image output + +The image can be refined by making use of [stabilityai/stable-diffusion-xl-refiner-0.9](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-0.9). +In this case, you only have to output the `latents` from the base model. + +```py +from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline +import torch + +pipe = StableDiffusionXLPipeline.from_pretrained( + "stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True +) +pipe.to("cuda") + +refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained( + "stabilityai/stable-diffusion-xl-refiner-0.9", torch_dtype=torch.float16, use_safetensors=True, variant="fp16" +) +refiner.to("cuda") + +prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k" + +image = pipe(prompt=prompt, output_type="latent" if use_refiner else "pil").images[0] +image = refiner(prompt=prompt, image=image[None, :]).images[0] +``` + +### Loading single file checkpoitns / original file format + +By making use of [`~diffusers.loaders.FromSingleFileMixin.from_single_file`] you can also load the +original file format into `diffusers`: + +```py +from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline +import torch + +pipe = StableDiffusionXLPipeline.from_pretrained( + "stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True +) +pipe.to("cuda") + +refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained( + "stabilityai/stable-diffusion-xl-refiner-0.9", torch_dtype=torch.float16, use_safetensors=True, variant="fp16" +) +refiner.to("cuda") +``` + +### Memory optimization via model offloading + +If you are seeing out-of-memory errors, we recommend making use of [`StableDiffusionXLPipeline.enable_model_cpu_offload`]. + +```diff +- pipe.to("cuda") ++ pipe.enable_model_cpu_offload() +``` + +and + +```diff +- refiner.to("cuda") ++ refiner.enable_model_cpu_offload() +``` + +### Speed-up inference with `torch.compile` + +You can speed up inference by making use of `torch.compile`. This should give you **ca.** 20% speed-up. + +```diff ++ pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True) ++ refiner.unet = torch.compile(refiner.unet, mode="reduce-overhead", fullgraph=True) +``` + +### Running with `torch` < 2.0 + +**Note** that if you want to run Stable Diffusion XL with `torch` < 2.0, please make sure to enable xformers +attention: + +``` +pip install xformers +``` + +```diff ++pipe.enable_xformers_memory_efficient_attention() ++refiner.enable_xformers_memory_efficient_attention() +``` ## StableDiffusionXLPipeline diff --git a/src/diffusers/utils/import_utils.py b/src/diffusers/utils/import_utils.py index 287992207e..3a7539cfb0 100644 --- a/src/diffusers/utils/import_utils.py +++ b/src/diffusers/utils/import_utils.py @@ -504,7 +504,7 @@ TORCHSDE_IMPORT_ERROR = """ # docstyle-ignore INVISIBLE_WATERMARK_IMPORT_ERROR = """ -{0} requires the invisible-watermark library but it was not found in your environment. You can install it with pip: `pip install git+https://github.com/patrickvonplaten/invisible-watermark.git@remove_onnxruntime_depedency` +{0} requires the invisible-watermark library but it was not found in your environment. You can install it with pip: `pip install invisible-watermark>=2.0` """