Fix SD XL Docs (#3971)

* finish sd xl docs * make style * Apply suggestions from code review * uP * uP * Correct
2026-01-27 17:22:53 +03:00 · 2023-07-06 19:21:03 +02:00
parent b8f089c5a3
commit 38e563d0c7
4 changed files with 133 additions and 30 deletions
--- a/.github/workflows/build_documentation.yml
+++ b/.github/workflows/build_documentation.yml
@@ -11,17 +11,13 @@ on:
 jobs:
  build:
    steps:
-      - name: Install dependencies
-        run: |
-          apt-get update && apt-get install libsndfile1-dev libgl1 -y
-
-      - name: Build doc
-        uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@main
-        with:
-          commit_sha: ${{ github.sha }}
-          package: diffusers
-          notebook_folder: diffusers_doc
-          languages: en ko zh
+      uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@main
+      with:
+        commit_sha: ${{ github.sha }}
+        install_libgl1: true
+        package: diffusers
+        notebook_folder: diffusers_doc
+        languages: en ko zh

    secrets:
      token: ${{ secrets.HUGGINGFACE_PUSH }}
--- a/.github/workflows/build_pr_documentation.yml
+++ b/.github/workflows/build_pr_documentation.yml
@@ -9,15 +9,10 @@ concurrency:

 jobs:
  build:
-    steps:
-      - name: Install dependencies
-        run: |
-          apt-get update && apt-get install libsndfile1-dev libgl1 -y
-
-      - name: Build doc
-        uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@main
-        with:
-          commit_sha: ${{ github.event.pull_request.head.sha }}
-          pr_number: ${{ github.event.number }}
-          package: diffusers
-          languages: en ko zh
+    uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@main
+    with:
+      commit_sha: ${{ github.event.pull_request.head.sha }}
+      pr_number: ${{ github.event.number }}
+      install_libgl1: true
+      package: diffusers
+      languages: en ko zh
--- a/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_xl.mdx
+++ b/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_xl.mdx
@@ -12,22 +12,134 @@ specific language governing permissions and limitations under the License.

 # Stable diffusion XL

-Stable Diffusion 2 is a text-to-image _latent diffusion_ model built upon the work of [Stable Diffusion 1](https://stability.ai/blog/stable-diffusion-public-release). 
-The project to train Stable Diffusion 2 was led by Robin Rombach and Katherine Crowson from [Stability AI](https://stability.ai/) and [LAION](https://laion.ai/).
+Stable Diffusion XL was proposed in [SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis](https://arxiv.org/abs/2307.01952) by Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, Robin Rombach

-*The Stable Diffusion 2.0 release includes robust text-to-image models trained using a brand new text encoder (OpenCLIP), developed by LAION with support from Stability AI, which greatly improves the quality of the generated images compared to earlier V1 releases. The text-to-image models in this release can generate images with default resolutions of both 512x512 pixels and 768x768 pixels. 
-These models are trained on an aesthetic subset of the [LAION-5B dataset](https://laion.ai/blog/laion-5b/) created by the DeepFloyd team at Stability AI, which is then further filtered to remove adult content using [LAION’s NSFW filter](https://openreview.net/forum?id=M3Y74vmsMcY).*
+The abstract of the paper is the following:

-For more details about how Stable Diffusion 2 works and how it differs from Stable Diffusion 1, please refer to the official [launch announcement post](https://stability.ai/blog/stable-diffusion-v2-release).
+*We present SDXL, a latent diffusion model for text-to-image synthesis. Compared to previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone: The increase of model parameters is mainly due to more attention blocks and a larger cross-attention context as SDXL uses a second text encoder. We design multiple novel conditioning schemes and train SDXL on multiple aspect ratios. We also introduce a refinement model which is used to improve the visual fidelity of samples generated by SDXL using a post-hoc image-to-image technique. We demonstrate that SDXL shows drastically improved performance compared the previous versions of Stable Diffusion and achieves results competitive with those of black-box state-of-the-art image generators.*

 ## Tips

+- Stable Diffusion XL works especially well with images between 768 and 1024.
+- Stable Diffusion XL output image can be improved by making use of a refiner as shown below
+
 ### Available checkpoints:

 - *Text-to-Image (1024x1024 resolution)*: [stabilityai/stable-diffusion-xl-base-0.9](https://huggingface.co/stabilityai/stable-diffusion-xl-base-0.9) with [`StableDiffusionXLPipeline`]
 - *Image-to-Image / Refiner (1024x1024 resolution)*: [stabilityai/stable-diffusion-xl-refiner-0.9](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-0.9) with [`StableDiffusionXLImg2ImgPipeline`]

-TODO
+## Usage Example
+
+Before using SDXL make sure to have `transformers`, `accelerate`, `safetensors` and `invisible_watermark` installed. 
+You can install the libraries as follows:
+
+```
+pip install transformers
+pip install accelerate
+pip install safetensors
+pip install invisible-watermark>=2.0
+```
+
+### *Text-to-Image*
+
+You can use SDXL as follows for *text-to-image*:
+
+```py
+from diffusers import StableDiffusionXLPipeline
+import torch
+
+pipe = StableDiffusionXLPipeline.from_pretrained(
+    "stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
+)
+pipe.to("cuda")
+
+prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
+image = pipe(prompt=prompt).images[0]
+```
+
+### Refining the image output
+
+The image can be refined by making use of [stabilityai/stable-diffusion-xl-refiner-0.9](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-0.9).
+In this case, you only have to output the `latents` from the base model.
+
+```py
+from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline
+import torch
+
+pipe = StableDiffusionXLPipeline.from_pretrained(
+    "stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
+)
+pipe.to("cuda")
+
+refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
+    "stabilityai/stable-diffusion-xl-refiner-0.9", torch_dtype=torch.float16, use_safetensors=True, variant="fp16"
+)
+refiner.to("cuda")
+
+prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
+
+image = pipe(prompt=prompt, output_type="latent" if use_refiner else "pil").images[0]
+image = refiner(prompt=prompt, image=image[None, :]).images[0]
+```
+
+### Loading single file checkpoitns / original file format
+
+By making use of [`~diffusers.loaders.FromSingleFileMixin.from_single_file`] you can also load the 
+original file format into `diffusers`:
+
+```py
+from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline
+import torch
+
+pipe = StableDiffusionXLPipeline.from_pretrained(
+    "stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
+)
+pipe.to("cuda")
+
+refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
+    "stabilityai/stable-diffusion-xl-refiner-0.9", torch_dtype=torch.float16, use_safetensors=True, variant="fp16"
+)
+refiner.to("cuda")
+```
+
+### Memory optimization via model offloading 
+
+If you are seeing out-of-memory errors, we recommend making use of [`StableDiffusionXLPipeline.enable_model_cpu_offload`].
+
+```diff
+- pipe.to("cuda")
+ pipe.enable_model_cpu_offload()
+```
+
+and 
+
+```diff
+- refiner.to("cuda")
+ refiner.enable_model_cpu_offload()
+```
+
+### Speed-up inference with `torch.compile`
+
+You can speed up inference by making use of `torch.compile`. This should give you **ca.** 20% speed-up.
+
+```diff
+ pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
+ refiner.unet = torch.compile(refiner.unet, mode="reduce-overhead", fullgraph=True)
+```
+
+### Running with `torch` < 2.0
+
+**Note** that if you want to run Stable Diffusion XL with `torch` < 2.0, please make sure to enable xformers 
+attention:
+
+```
+pip install xformers
+```
+
+```diff
+pipe.enable_xformers_memory_efficient_attention()
+refiner.enable_xformers_memory_efficient_attention()
+```

 ## StableDiffusionXLPipeline

--- a/src/diffusers/utils/import_utils.py
+++ b/src/diffusers/utils/import_utils.py
@@ -504,7 +504,7 @@ TORCHSDE_IMPORT_ERROR = """

 # docstyle-ignore
 INVISIBLE_WATERMARK_IMPORT_ERROR = """
-{0} requires the invisible-watermark library but it was not found in your environment. You can install it with pip: `pip install git+https://github.com/patrickvonplaten/invisible-watermark.git@remove_onnxruntime_depedency`
+{0} requires the invisible-watermark library but it was not found in your environment. You can install it with pip: `pip install invisible-watermark>=2.0`
 """