Mirror of https://github.com/huggingface/diffusers.git, synced 2026-01-27 17:22:53 +03:00
[Docs] Fix typos and update files at API's Main Classes, Models, and Schedulers pages (#5720)
* Fix typos, update, add Copyright info, and trim trailing whitespaces
* Update docs/source/en/api/loaders.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/api/models/autoencoder_tiny.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/api/models/autoencoder_tiny.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
@@ -1,3 +1,15 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Activation functions

Customized activation functions for supporting various models in 🤗 Diffusers.

@@ -12,4 +24,4 @@ Customized activation functions for supporting various models in 🤗 Diffusers.

## ApproximateGELU

[[autodoc]] models.activations.ApproximateGELU
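For a rough sense of how this layer is used, here is a minimal sketch (not part of the original page; the dimensions are arbitrary example values):

```python
import torch
from diffusers.models.activations import ApproximateGELU

# project 320 input features to 1280 and apply the sigmoid-based GELU approximation
act = ApproximateGELU(dim_in=320, dim_out=1280)
hidden_states = torch.randn(2, 77, 320)
print(act(hidden_states).shape)  # torch.Size([2, 77, 1280])
```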
@@ -1,3 +1,15 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Attention Processor

An attention processor is a class for applying different types of attention mechanisms.
@@ -12,9 +12,9 @@ specific language governing permissions and limitations under the License.

# VAE Image Processor

The [`VaeImageProcessor`] provides a unified API for [`StableDiffusionPipeline`]s to prepare image inputs for VAE encoding and post-processing outputs once they're decoded. This includes transformations such as resizing, normalization, and conversion between PIL Image, PyTorch, and NumPy arrays.

All pipelines with [`VaeImageProcessor`] accept PIL Image, PyTorch tensor, or NumPy arrays as image inputs and return outputs based on the `output_type` argument by the user. You can pass encoded image latents directly to the pipeline and return latents from the pipeline as a specific output with the `output_type` argument (for example `output_type="latent"`). This allows you to take the generated latents from one pipeline and pass it to another pipeline as input without leaving the latent space. It also makes it much easier to use multiple pipelines together by passing PyTorch tensors directly between different pipelines.
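As a hedged illustration of that round trip (not from the original page; the image URL is only an assumed example), the processor can also be used on its own:

```python
from diffusers.image_processor import VaeImageProcessor
from diffusers.utils import load_image

# any RGB PIL image works; this URL is only an example
image = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/repaint/celeba_hq_256.png")

image_processor = VaeImageProcessor()
tensor = image_processor.preprocess(image)  # PIL -> normalized torch tensor of shape (1, 3, H, W)
pil_images = image_processor.postprocess(tensor, output_type="pil")  # tensor -> list of PIL images
```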
## VaeImageProcessor

@@ -24,4 +24,4 @@ All pipelines with [`VaeImageProcessor`] accepts PIL Image, PyTorch tensor, or N

The [`VaeImageProcessorLDM3D`] accepts RGB and depth inputs and returns RGB and depth outputs.

[[autodoc]] image_processor.VaeImageProcessorLDM3D
@@ -1,3 +1,15 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Overview

The APIs in this section are more experimental and prone to breaking changes. Most of them are used internally for development, but they may also be useful to you if you're interested in building a diffusion model with some custom parts or if you're interested in some of our helper utilities for working with 🤗 Diffusers.
@@ -12,11 +12,11 @@ specific language governing permissions and limitations under the License.

# Loaders

Adapters (textual inversion, LoRA, hypernetworks) allow you to modify a diffusion model to generate images in a specific style without training or finetuning the entire model. The adapter weights are very portable because they're typically only a tiny fraction of the pretrained model weights. 🤗 Diffusers provides an easy-to-use `LoaderMixin` API to load adapter weights.

<Tip warning={true}>

🧪 The `LoaderMixin`s are highly experimental and prone to future changes. To use private or [gated](https://huggingface.co/docs/hub/models-gated#gated-models) models, log-in with `huggingface-cli login`.

</Tip>
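To make the `LoaderMixin` idea concrete, here is a minimal sketch (not from the original page; the checkpoint and adapter repositories are assumed examples):

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# LoRA weights via LoraLoaderMixin
pipe.load_lora_weights("sayakpaul/sd-model-finetuned-lora-t4")

# a learned textual inversion embedding via TextualInversionLoaderMixin
pipe.load_textual_inversion("sd-concepts-library/cat-toy")
```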
@@ -51,7 +51,7 @@ logger.warning("WARN")

All methods of the logging module are documented below. The main methods are
[`logging.get_verbosity`] to get the current level of verbosity in the logger and
[`logging.set_verbosity`] to set the verbosity to the level of your choice.
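For instance, a hedged sketch of the two main calls (the logger name is just an example):

```python
from diffusers.utils import logging

logging.set_verbosity_warning()          # show only warnings and errors from 🤗 Diffusers
current_level = logging.get_verbosity()  # read the active verbosity back as an int

logger = logging.get_logger("diffusers")
logger.warning("WARN")
```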
In order from the least verbose to the most verbose:
@@ -1,3 +1,15 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# AsymmetricAutoencoderKL

Improved larger variational autoencoder (VAE) model with KL loss for inpainting task: [Designing a Better Asymmetric VQGAN for StableDiffusion](https://arxiv.org/abs/2306.04632) by Zixin Zhu, Xuelu Feng, Dongdong Chen, Jianmin Bao, Le Wang, Yinpeng Chen, Lu Yuan, Gang Hua.

@@ -6,7 +18,7 @@ The abstract from the paper is:

*StableDiffusion is a revolutionary text-to-image generator that is causing a stir in the world of image generation and editing. Unlike traditional methods that learn a diffusion model in pixel space, StableDiffusion learns a diffusion model in the latent space via a VQGAN, ensuring both efficiency and quality. It not only supports image generation tasks, but also enables image editing for real images, such as image inpainting and local editing. However, we have observed that the vanilla VQGAN used in StableDiffusion leads to significant information loss, causing distortion artifacts even in non-edited image regions. To this end, we propose a new asymmetric VQGAN with two simple designs. Firstly, in addition to the input from the encoder, the decoder contains a conditional branch that incorporates information from task-specific priors, such as the unmasked image region in inpainting. Secondly, the decoder is much heavier than the encoder, allowing for more detailed recovery while only slightly increasing the total inference cost. The training cost of our asymmetric VQGAN is cheap, and we only need to retrain a new asymmetric decoder while keeping the vanilla VQGAN encoder and StableDiffusion unchanged. Our asymmetric VQGAN can be widely used in StableDiffusion-based inpainting and local editing methods. Extensive experiments demonstrate that it can significantly improve the inpainting and editing performance, while maintaining the original text-to-image capability. The code is available at https://github.com/buxiangzhiren/Asymmetric_VQGAN*

Evaluation results can be found in section 4.1 of the original paper.

## Available checkpoints

@@ -16,30 +28,23 @@ Evaluation results can be found in section 4.1 of the original paper.
## Example Usage

```python
from diffusers import AsymmetricAutoencoderKL, StableDiffusionInpaintPipeline
from diffusers.utils import load_image, make_image_grid


prompt = "a photo of a person with beard"
img_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/repaint/celeba_hq_256.png"
mask_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/repaint/mask_256.png"

original_image = load_image(img_url).resize((512, 512))
mask_image = load_image(mask_url).resize((512, 512))

pipe = StableDiffusionInpaintPipeline.from_pretrained("runwayml/stable-diffusion-inpainting")
pipe.vae = AsymmetricAutoencoderKL.from_pretrained("cross-attention/asymmetric-autoencoder-kl-x-1-5")
pipe.to("cuda")

image = pipe(prompt=prompt, image=original_image, mask_image=mask_image).images[0]
make_image_grid([original_image, mask_image, image], rows=1, cols=3)
```

## AsymmetricAutoencoderKL
@@ -1,6 +1,18 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Tiny AutoEncoder

Tiny AutoEncoder for Stable Diffusion (TAESD) was introduced in [madebyollin/taesd](https://github.com/madebyollin/taesd) by Ollin Boer Bohan. It is a tiny distilled version of Stable Diffusion's VAE that can quickly decode the latents in a [`StableDiffusionPipeline`] or [`StableDiffusionXLPipeline`] almost instantly.

To use with Stable Diffusion v-2.1:

@@ -16,7 +28,7 @@ pipe = pipe.to("cuda")

prompt = "slice of delicious New York-style berry cheesecake"
image = pipe(prompt, num_inference_steps=25).images[0]
image
```

To use with Stable Diffusion XL 1.0

@@ -33,7 +45,7 @@ pipe = pipe.to("cuda")

prompt = "slice of delicious New York-style berry cheesecake"
image = pipe(prompt, num_inference_steps=25).images[0]
image
```

## AutoencoderTiny

@@ -42,4 +54,4 @@ image.save("cheesecake_sdxl.png")

## AutoencoderTinyOutput

[[autodoc]] models.autoencoder_tiny.AutoencoderTinyOutput
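The hunks above only show the tail of each snippet; a minimal sketch of the full v2.1 usage they come from looks roughly like this (checkpoint names are assumed examples):

```python
import torch
from diffusers import DiffusionPipeline, AutoencoderTiny

pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1-base", torch_dtype=torch.float16)
pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "slice of delicious New York-style berry cheesecake"
image = pipe(prompt, num_inference_steps=25).images[0]
image
```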
@@ -1,3 +1,15 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# AutoencoderKL

The variational autoencoder (VAE) model with KL loss was introduced in [Auto-Encoding Variational Bayes](https://arxiv.org/abs/1312.6114v11) by Diederik P. Kingma and Max Welling. The model is used in 🤗 Diffusers to encode images into latents and to decode latent representations into images.

@@ -14,7 +26,7 @@ from the original format using [`FromOriginalVAEMixin.from_single_file`] as foll

```py
from diffusers import AutoencoderKL

url = "https://huggingface.co/stabilityai/sd-vae-ft-mse-original/blob/main/vae-ft-mse-840000-ema-pruned.safetensors" # can also be a local file
model = AutoencoderKL.from_single_file(url)
```
@@ -1,10 +1,22 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# ControlNet

The ControlNet model was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, Maneesh Agrawala. It provides a greater degree of control over text-to-image generation by conditioning the model on additional inputs such as edge maps, depth maps, segmentation maps, and keypoints for pose detection.

The abstract from the paper is:

*We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, eg, edges, depth, segmentation, human pose, etc, with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.*
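As a hedged sketch of what this looks like in practice (not from the original page; the checkpoint names and conditioning image URL are assumed examples):

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# a ControlNet trained on Canny edge maps, paired with a Stable Diffusion base model
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

canny_image = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/bird_canny.png"
)
image = pipe("bird", image=canny_image).images[0]
```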
## Loading from the original format
@@ -1,8 +1,20 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Models

🤗 Diffusers provides pretrained models for popular algorithms and modules to create custom diffusion systems. The primary function of models is to denoise an input sample as modeled by the distribution \\(p_{\theta}(x_{t-1}|x_{t})\\).

All models are built from the base [`ModelMixin`] class which is a [`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) providing basic functionality for saving and loading models, locally and from the Hugging Face Hub.

## ModelMixin
[[autodoc]] ModelMixin
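For orientation, a minimal hedged sketch of the save/load round trip these mixins provide (the checkpoint below is just an assumed example):

```python
from diffusers import UNet2DConditionModel

# load a model from a subfolder of a Hub repository...
unet = UNet2DConditionModel.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="unet")

# ...and save it locally; push_to_hub() from PushToHubMixin would upload it instead
unet.save_pretrained("./local-unet")
```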
@@ -13,4 +25,4 @@ All models are built from the base [`ModelMixin`] class which is a [`torch.nn.mo

## PushToHubMixin

[[autodoc]] utils.PushToHubMixin
@@ -1,7 +1,18 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Prior Transformer

The Prior Transformer was originally introduced in [Hierarchical Text-Conditional Image Generation with CLIP Latents](https://huggingface.co/papers/2204.06125) by Ramesh et al. It is used to predict CLIP image embeddings from CLIP text embeddings; image embeddings are predicted through a denoising diffusion process.
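A minimal hedged sketch of loading this model on its own (the Kandinsky 2.1 prior repository is an assumed example that stores a `PriorTransformer` in its `prior` subfolder):

```python
from diffusers import PriorTransformer

prior = PriorTransformer.from_pretrained("kandinsky-community/kandinsky-2-1-prior", subfolder="prior")
print(prior.config.embedding_dim)  # dimensionality of the CLIP embeddings it predicts
```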
The abstract from the paper is:

@@ -13,4 +24,4 @@ The abstract from the paper is:

## PriorTransformerOutput

[[autodoc]] models.prior_transformer.PriorTransformerOutput
@@ -1,3 +1,15 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Transformer2D

A Transformer model for image-like data from [CompVis](https://huggingface.co/CompVis) that is based on the [Vision Transformer](https://huggingface.co/papers/2010.11929) introduced by Dosovitskiy et al. The [`Transformer2DModel`] accepts discrete (classes of vector embeddings) or continuous (actual embeddings) inputs.
@@ -1,3 +1,15 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Transformer Temporal

A Transformer model for video-like data.

@@ -8,4 +20,4 @@ A Transformer model for video-like data.

## TransformerTemporalModelOutput

[[autodoc]] models.transformer_temporal.TransformerTemporalModelOutput
@@ -1,3 +1,15 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# UNetMotionModel

The [UNet](https://huggingface.co/papers/1505.04597) model was originally introduced by Ronneberger et al for biomedical image segmentation, but it is also commonly used in 🤗 Diffusers because it outputs images that are the same size as the input. It is one of the most important components of a diffusion system because it facilitates the actual diffusion process. There are several variants of the UNet model in 🤗 Diffusers, depending on it's number of dimensions and whether it is a conditional model or not. This is a 2D UNet model.
@@ -1,6 +1,18 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# UNet1DModel

The [UNet](https://huggingface.co/papers/1505.04597) model was originally introduced by Ronneberger et al. for biomedical image segmentation, but it is also commonly used in 🤗 Diffusers because it outputs images that are the same size as the input. It is one of the most important components of a diffusion system because it facilitates the actual diffusion process. There are several variants of the UNet model in 🤗 Diffusers, depending on it's number of dimensions and whether it is a conditional model or not. This is a 1D UNet model.

The abstract from the paper is:

@@ -10,4 +22,4 @@ The abstract from the paper is:

[[autodoc]] UNet1DModel

## UNet1DOutput
[[autodoc]] models.unet_1d.UNet1DOutput
@@ -1,6 +1,18 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# UNet2DConditionModel

The [UNet](https://huggingface.co/papers/1505.04597) model was originally introduced by Ronneberger et al. for biomedical image segmentation, but it is also commonly used in 🤗 Diffusers because it outputs images that are the same size as the input. It is one of the most important components of a diffusion system because it facilitates the actual diffusion process. There are several variants of the UNet model in 🤗 Diffusers, depending on it's number of dimensions and whether it is a conditional model or not. This is a 2D UNet conditional model.

The abstract from the paper is:

@@ -16,4 +28,4 @@ The abstract from the paper is:

[[autodoc]] models.unet_2d_condition_flax.FlaxUNet2DConditionModel

## FlaxUNet2DConditionOutput
[[autodoc]] models.unet_2d_condition_flax.FlaxUNet2DConditionOutput
@@ -1,6 +1,18 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# UNet2DModel

The [UNet](https://huggingface.co/papers/1505.04597) model was originally introduced by Ronneberger et al. for biomedical image segmentation, but it is also commonly used in 🤗 Diffusers because it outputs images that are the same size as the input. It is one of the most important components of a diffusion system because it facilitates the actual diffusion process. There are several variants of the UNet model in 🤗 Diffusers, depending on it's number of dimensions and whether it is a conditional model or not. This is a 2D UNet model.
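To make the "same size as the input" point concrete, here is a minimal hedged sketch of a single forward pass (the checkpoint is just an assumed example):

```python
import torch
from diffusers import UNet2DModel

unet = UNet2DModel.from_pretrained("google/ddpm-cat-256")

noisy_sample = torch.randn(1, unet.config.in_channels, unet.config.sample_size, unet.config.sample_size)
with torch.no_grad():
    noisy_residual = unet(sample=noisy_sample, timestep=2).sample

print(noisy_residual.shape)  # same spatial size as the input sample
```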
The abstract from the paper is:

@@ -10,4 +22,4 @@ The abstract from the paper is:

[[autodoc]] UNet2DModel

## UNet2DOutput
[[autodoc]] models.unet_2d.UNet2DOutput
@@ -1,6 +1,18 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# UNet3DConditionModel

The [UNet](https://huggingface.co/papers/1505.04597) model was originally introduced by Ronneberger et al. for biomedical image segmentation, but it is also commonly used in 🤗 Diffusers because it outputs images that are the same size as the input. It is one of the most important components of a diffusion system because it facilitates the actual diffusion process. There are several variants of the UNet model in 🤗 Diffusers, depending on it's number of dimensions and whether it is a conditional model or not. This is a 3D UNet conditional model.

The abstract from the paper is:

@@ -10,4 +22,4 @@ The abstract from the paper is:

[[autodoc]] UNet3DConditionModel

## UNet3DConditionOutput
[[autodoc]] models.unet_3d_condition.UNet3DConditionOutput
@@ -1,3 +1,15 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# VQModel

The VQ-VAE model was introduced in [Neural Discrete Representation Learning](https://huggingface.co/papers/1711.00937) by Aaron van den Oord, Oriol Vinyals and Koray Kavukcuoglu. The model is used in 🤗 Diffusers to decode latent representations into images. Unlike [`AutoencoderKL`], the [`VQModel`] works in a quantized latent space.

@@ -12,4 +24,4 @@ The abstract from the paper is:

## VQEncoderOutput

[[autodoc]] models.vq_model.VQEncoderOutput
@@ -1,3 +1,15 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Normalization layers

Customized normalization layers for supporting various models in 🤗 Diffusers.
@@ -10,6 +22,10 @@ Customized normalization layers for supporting various models in 🤗 Diffusers.

[[autodoc]] models.normalization.AdaLayerNormZero

## AdaLayerNormSingle

[[autodoc]] models.normalization.AdaLayerNormSingle

## AdaGroupNorm

[[autodoc]] models.normalization.AdaGroupNorm
@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.

# Outputs

All model outputs are subclasses of [`~utils.BaseOutput`], data structures containing all the information returned by the model. The outputs can also be used as tuples or dictionaries.

For example:
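The example itself falls outside the hunks shown here; a rough hedged sketch of the kind of access it demonstrates, using an assumed example checkpoint:

```python
from diffusers import DDIMPipeline

pipeline = DDIMPipeline.from_pretrained("google/ddpm-cifar10-32")
outputs = pipeline()

image = outputs.images[0]     # attribute access
image = outputs["images"][0]  # dict-style access
image = outputs[0][0]         # tuple-style access; index 0 is the first field
```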
@@ -64,4 +64,4 @@ To check a specific pipeline or model output, refer to its corresponding API doc

## ImageTextPipelineOutput

[[autodoc]] ImageTextPipelineOutput
@@ -1,10 +1,22 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# CMStochasticIterativeScheduler

[Consistency Models](https://huggingface.co/papers/2303.01469) by Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever introduced a multistep and onestep scheduler (Algorithm 1) that is capable of generating good samples in one or a small number of steps.

The abstract from the paper is:

*Diffusion models have significantly advanced the fields of image, audio, and video generation, but they depend on an iterative sampling process that causes slow generation. To overcome this limitation, we propose consistency models, a new family of models that generate high quality samples by directly mapping noise to data. They support fast one-step generation by design, while still allowing multistep sampling to trade compute for sample quality. They also support zero-shot data editing, such as image inpainting, colorization, and super-resolution, without requiring explicit training on these tasks. Consistency models can be trained either by distilling pre-trained diffusion models, or as standalone generative models altogether. Through extensive experiments, we demonstrate that they outperform existing distillation techniques for diffusion models in one- and few-step sampling, achieving the new state-of-the-art FID of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64 for one-step generation. When trained in isolation, consistency models become a new family of generative models that can outperform existing one-step, non-adversarial generative models on standard benchmarks such as CIFAR-10, ImageNet 64x64 and LSUN 256x256.*

The original codebase can be found at [openai/consistency_models](https://github.com/openai/consistency_models).
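A hedged sketch of one-step sampling with this scheduler through [`ConsistencyModelPipeline`] (the checkpoint is an assumed example):

```python
import torch
from diffusers import ConsistencyModelPipeline

pipe = ConsistencyModelPipeline.from_pretrained("openai/diffusers-cd_imagenet64_lpips", torch_dtype=torch.float16).to("cuda")

# onestep sampling; increase num_inference_steps for multistep sampling
image = pipe(num_inference_steps=1).images[0]
```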
@@ -12,4 +24,4 @@ The original codebase can be found at [openai/consistency_models](https://github

[[autodoc]] CMStochasticIterativeScheduler

## CMStochasticIterativeSchedulerOutput
[[autodoc]] schedulers.scheduling_consistency_models.CMStochasticIterativeSchedulerOutput
@@ -16,13 +16,11 @@ specific language governing permissions and limitations under the License.

The abstract from the paper is:

*Denoising diffusion probabilistic models (DDPMs) have achieved high quality image generation without adversarial training, yet they require simulating a Markov chain for many steps to produce a sample.
To accelerate sampling, we present denoising diffusion implicit models (DDIMs), a more efficient class of iterative implicit probabilistic models
with the same training procedure as DDPMs. In DDPMs, the generative process is defined as the reverse of a Markovian diffusion process.
We construct a class of non-Markovian diffusion processes that lead to the same training objective, but whose reverse process can be much faster to sample from.
We empirically demonstrate that DDIMs can produce high quality samples 10× to 50× faster in terms of wall-clock time compared to DDPMs, allow us to trade off computation for sample quality, and can perform semantically meaningful image interpolation directly in the latent space.*

The original codebase of this paper can be found at [ermongroup/ddim](https://github.com/ermongroup/ddim), and you can contact the author on [tsong.me](https://tsong.me/).

@@ -57,13 +55,14 @@ pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config, timestep_spaci

4. rescale classifier-free guidance to prevent over-exposure

```py
image = pipe(prompt, guidance_rescale=0.7).images[0]
```

For example:

```py
from diffusers import DiffusionPipeline, DDIMScheduler
import torch

pipe = DiffusionPipeline.from_pretrained("ptx0/pseudo-journey-v2", torch_dtype=torch.float16)
pipe.scheduler = DDIMScheduler.from_config(
@@ -72,7 +71,8 @@ pipe.scheduler = DDIMScheduler.from_config(
pipe.to("cuda")

prompt = "A lion in galaxies, spirals, nebulae, stars, smoke, iridescent, intricate detail, octane render, 8k"
image = pipe(prompt, guidance_rescale=0.7).images[0]
image
```

## DDIMScheduler
@@ -13,7 +13,7 @@ specific language governing permissions and limitations under the License.

# DDIMInverseScheduler

`DDIMInverseScheduler` is the inverted scheduler from [Denoising Diffusion Implicit Models](https://huggingface.co/papers/2010.02502) (DDIM) by Jiaming Song, Chenlin Meng and Stefano Ermon.
The implementation is mostly based on the DDIM inversion definition from [Null-text Inversion for Editing Real Images using Guided Diffusion Models](https://huggingface.co/papers/2211.09794).

## DDIMInverseScheduler
[[autodoc]] DDIMInverseScheduler
@@ -16,10 +16,10 @@ specific language governing permissions and limitations under the License.

The abstract from the paper is:

*We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics. Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics, and our models naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding. On the unconditional CIFAR10 dataset, we obtain an Inception score of 9.46 and a state-of-the-art FID score of 3.17. On 256x256 LSUN, we obtain sample quality similar to ProgressiveGAN. Our implementation is available at [this https URL](https://github.com/hojonathanho/diffusion).*

## DDPMScheduler
[[autodoc]] DDPMScheduler

## DDPMSchedulerOutput
[[autodoc]] schedulers.scheduling_ddpm.DDPMSchedulerOutput
@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.

# DEISMultistepScheduler

Diffusion Exponential Integrator Sampler (DEIS) is proposed in [Fast Sampling of Diffusion Models with Exponential Integrator](https://huggingface.co/papers/2204.13902) by Qinsheng Zhang and Yongxin Chen. `DEISMultistepScheduler` is a fast high order solver for diffusion ordinary differential equations (ODEs).

This implementation modifies the polynomial fitting formula in log-rho space instead of the original linear `t` space in the DEIS paper. The modification enjoys closed-form coefficients for exponential multistep update instead of replying on the numerical solver.

@@ -20,8 +20,6 @@ The abstract from the paper is:

*The past few years have witnessed the great success of Diffusion models~(DMs) in generating high-fidelity samples in generative modeling tasks. A major limitation of the DM is its notoriously slow sampling procedure which normally requires hundreds to thousands of time discretization steps of the learned diffusion process to reach the desired accuracy. Our goal is to develop a fast sampling method for DMs with a much less number of steps while retaining high sample quality. To this end, we systematically analyze the sampling procedure in DMs and identify key factors that affect the sample quality, among which the method of discretization is most crucial. By carefully examining the learned diffusion process, we propose Diffusion Exponential Integrator Sampler~(DEIS). It is based on the Exponential Integrator designed for discretizing ordinary differential equations (ODEs) and leverages a semilinear structure of the learned diffusion process to reduce the discretization error. The proposed method can be applied to any DMs and can generate high-fidelity samples in as few as 10 steps. In our experiments, it takes about 3 minutes on one A6000 GPU to generate 50k images from CIFAR10. Moreover, by directly using pre-trained DMs, we achieve the state-of-art sampling performance when the number of score function evaluation~(NFE) is limited, e.g., 4.17 FID with 10 NFEs, 3.37 FID, and 9.74 IS with only 15 NFEs on CIFAR10. Code is available at [this https URL](https://github.com/qsh-zh/deis).*

## Tips

It is recommended to set `solver_order` to 2 or 3, while `solver_order=1` is equivalent to [`DDIMScheduler`].

@@ -33,4 +31,4 @@ diffusion models, you can set `thresholding=True` to use the dynamic thresholdin

[[autodoc]] DEISMultistepScheduler

## SchedulerOutput
[[autodoc]] schedulers.scheduling_utils.SchedulerOutput
@@ -20,4 +20,4 @@ The original codebase can be found at [crowsonkb/k-diffusion](https://github.com

[[autodoc]] KDPM2DiscreteScheduler

## SchedulerOutput
[[autodoc]] schedulers.scheduling_utils.SchedulerOutput
@@ -20,4 +20,4 @@ The original codebase can be found at [crowsonkb/k-diffusion](https://github.com

[[autodoc]] KDPM2AncestralDiscreteScheduler

## SchedulerOutput
[[autodoc]] schedulers.scheduling_utils.SchedulerOutput
@@ -18,4 +18,4 @@ The `DPMSolverSDEScheduler` is inspired by the stochastic sampler from the [Eluc

[[autodoc]] DPMSolverSDEScheduler

## SchedulerOutput
[[autodoc]] schedulers.scheduling_utils.SchedulerOutput
@@ -19,4 +19,4 @@ The Euler scheduler (Algorithm 2) is from the [Elucidating the Design Space of D

[[autodoc]] EulerDiscreteScheduler

## EulerDiscreteSchedulerOutput
[[autodoc]] schedulers.scheduling_euler_discrete.EulerDiscreteSchedulerOutput
@@ -18,4 +18,4 @@ A scheduler that uses ancestral sampling with Euler method steps. This is a fast

[[autodoc]] EulerAncestralDiscreteScheduler

## EulerAncestralDiscreteSchedulerOutput
[[autodoc]] schedulers.scheduling_euler_ancestral_discrete.EulerAncestralDiscreteSchedulerOutput
@@ -18,4 +18,4 @@ The Heun scheduler (Algorithm 1) is from the [Elucidating the Design Space of Di

[[autodoc]] HeunDiscreteScheduler

## SchedulerOutput
[[autodoc]] schedulers.scheduling_utils.SchedulerOutput
@@ -18,4 +18,4 @@ specific language governing permissions and limitations under the License.

[[autodoc]] IPNDMScheduler

## SchedulerOutput
[[autodoc]] schedulers.scheduling_utils.SchedulerOutput
@@ -1,3 +1,15 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Latent Consistency Model Multistep Scheduler

## Overview
@@ -18,4 +18,4 @@ specific language governing permissions and limitations under the License.

[[autodoc]] LMSDiscreteScheduler

## LMSDiscreteSchedulerOutput
[[autodoc]] schedulers.scheduling_lms_discrete.LMSDiscreteSchedulerOutput
@@ -21,7 +21,7 @@ samples, and it can generate quite good samples even in 10 steps.

It is recommended to set `solver_order` to 2 for guide sampling, and `solver_order=3` for unconditional sampling.

Dynamic thresholding from [Imagen](https://huggingface.co/papers/2205.11487) is supported, and for pixel-space
diffusion models, you can set both `algorithm_type="dpmsolver++"` and `thresholding=True` to use the dynamic
thresholding. This thresholding method is unsuitable for latent-space diffusion models such as
Stable Diffusion.
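For instance, a hedged sketch of plugging the scheduler into an existing pipeline with the recommended `solver_order` (the checkpoint is an assumed example):

```python
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, solver_order=2)

image = pipe("an astronaut riding a horse", num_inference_steps=20).images[0]
```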
@@ -32,4 +32,4 @@ The SDE variant of DPMSolver and DPM-Solver++ is also supported, but only for th

[[autodoc]] DPMSolverMultistepScheduler

## SchedulerOutput
[[autodoc]] schedulers.scheduling_utils.SchedulerOutput
@@ -14,11 +14,11 @@ specific language governing permissions and limitations under the License.
|
||||
|
||||
`DPMSolverMultistepInverse` is the inverted scheduler from [DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps](https://huggingface.co/papers/2206.00927) and [DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models](https://huggingface.co/papers/2211.01095) by Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu.

The implementation is mostly based on the DDIM inversion definition of [Null-text Inversion for Editing Real Images using Guided Diffusion Models](https://huggingface.co/papers/2211.09794.pdf) and notebook implementation of the [`DiffEdit`] latent inversion from [Xiang-cd/DiffEdit-stable-diffusion](https://github.com/Xiang-cd/DiffEdit-stable-diffusion/blob/main/diffedit.ipynb).
The implementation is mostly based on the DDIM inversion definition of [Null-text Inversion for Editing Real Images using Guided Diffusion Models](https://huggingface.co/papers/2211.09794) and notebook implementation of the [`DiffEdit`] latent inversion from [Xiang-cd/DiffEdit-stable-diffusion](https://github.com/Xiang-cd/DiffEdit-stable-diffusion/blob/main/diffedit.ipynb).

## Tips

Dynamic thresholding from Imagen (https://huggingface.co/papers/2205.11487) is supported, and for pixel-space
Dynamic thresholding from [Imagen](https://huggingface.co/papers/2205.11487) is supported, and for pixel-space
diffusion models, you can set both `algorithm_type="dpmsolver++"` and `thresholding=True` to use the dynamic
thresholding. This thresholding method is unsuitable for latent-space diffusion models such as
Stable Diffusion.
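
A minimal sketch of how the inverse scheduler is typically paired with the DiffEdit pipeline referenced above (the checkpoint is a placeholder):

```py
# Sketch only: use the inverse scheduler to recover latents from an image for DiffEdit.
import torch
from diffusers import (
    StableDiffusionDiffEditPipeline,
    DPMSolverMultistepScheduler,
    DPMSolverMultistepInverseScheduler,
)

pipe = StableDiffusionDiffEditPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
# the inverse scheduler runs the forward (noising) direction so an image can be inverted to latents
pipe.inverse_scheduler = DPMSolverMultistepInverseScheduler.from_config(pipe.scheduler.config)
```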

@@ -61,4 +61,4 @@ The different schedulers in this class, depending on the ordinary differential e

## PushToHubMixin

[[autodoc]] utils.PushToHubMixin
[[autodoc]] utils.PushToHubMixin
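
A minimal sketch of what the mixin enables; the repository id is a placeholder and you need to be logged in to the Hub for the upload to succeed:

```py
# Sketch only: any class that inherits from PushToHubMixin exposes `push_to_hub`.
from diffusers import DDIMScheduler

scheduler = DDIMScheduler(num_train_timesteps=1000)
scheduler.push_to_hub("my-username/my-ddim-scheduler")  # placeholder repo id
```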

@@ -18,4 +18,4 @@ specific language governing permissions and limitations under the License.
[[autodoc]] PNDMScheduler

## SchedulerOutput

[[autodoc]] schedulers.scheduling_utils.SchedulerOutput
[[autodoc]] schedulers.scheduling_utils.SchedulerOutput

@@ -16,7 +16,7 @@ specific language governing permissions and limitations under the License.

The abstract from the paper is:

*Free-form inpainting is the task of adding new content to an image in the regions specified by an arbitrary binary mask. Most existing approaches train for a certain distribution of masks, which limits their generalization capabilities to unseen mask types. Furthermore, training with pixel-wise and perceptual losses often leads to simple textural extensions towards the missing areas instead of semantically meaningful generation. In this work, we propose RePaint: A Denoising Diffusion Probabilistic Model (DDPM) based inpainting approach that is applicable to even extreme masks. We employ a pretrained unconditional DDPM as the generative prior. To condition the generation process, we only alter the reverse diffusion iterations by sampling the unmasked regions using the given image information. Since this technique does not modify or condition the original DDPM network itself, the model produces high-quality and diverse output images for any inpainting form. We validate our method for both faces and general-purpose image inpainting using standard and extreme masks. RePaint outperforms state-of-the-art Autoregressive, and GAN approaches for at least five out of six mask distributions. Github Repository: git.io/RePaint*.
*Free-form inpainting is the task of adding new content to an image in the regions specified by an arbitrary binary mask. Most existing approaches train for a certain distribution of masks, which limits their generalization capabilities to unseen mask types. Furthermore, training with pixel-wise and perceptual losses often leads to simple textural extensions towards the missing areas instead of semantically meaningful generation. In this work, we propose RePaint: A Denoising Diffusion Probabilistic Model (DDPM) based inpainting approach that is applicable to even extreme masks. We employ a pretrained unconditional DDPM as the generative prior. To condition the generation process, we only alter the reverse diffusion iterations by sampling the unmasked regions using the given image information. Since this technique does not modify or condition the original DDPM network itself, the model produces high-quality and diverse output images for any inpainting form. We validate our method for both faces and general-purpose image inpainting using standard and extreme masks. RePaint outperforms state-of-the-art Autoregressive, and GAN approaches for at least five out of six mask distributions. GitHub Repository: [this http URL](http://git.io/RePaint).*
The original implementation can be found at [andreas128/RePaint](https://github.com/andreas128/).

@@ -24,4 +24,4 @@ The original implementation can be found at [andreas128/RePaint](https://github.
[[autodoc]] RePaintScheduler

## RePaintSchedulerOutput

[[autodoc]] schedulers.scheduling_repaint.RePaintSchedulerOutput
[[autodoc]] schedulers.scheduling_repaint.RePaintSchedulerOutput

@@ -16,10 +16,10 @@ specific language governing permissions and limitations under the License.

The abstract from the paper is:

*Creating noise from data is easy; creating data from noise is generative modeling. We present a stochastic differential equation (SDE) that smoothly transforms a complex data distribution to a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distribution by slowly removing the noise. Crucially, the reverse-time SDE depends only on the time-dependent gradient field (\aka, score) of the perturbed data distribution. By leveraging advances in score-based generative modeling, we can accurately estimate these scores with neural networks, and use numerical SDE solvers to generate samples. We show that this framework encapsulates previous approaches in score-based generative modeling and diffusion probabilistic modeling, allowing for new sampling procedures and new modeling capabilities. In particular, we introduce a predictor-corrector framework to correct errors in the evolution of the discretized reverse-time SDE. We also derive an equivalent neural ODE that samples from the same distribution as the SDE, but additionally enables exact likelihood computation, and improved sampling efficiency. In addition, we provide a new way to solve inverse problems with score-based models, as demonstrated with experiments on class-conditional generation, image inpainting, and colorization. Combined with multiple architectural improvements, we achieve record-breaking performance for unconditional image generation on CIFAR-10 with an Inception score of 9.89 and FID of 2.20, a competitive likelihood of 2.99 bits/dim, and demonstrate high fidelity generation of 1024 x 1024 images for the first time from a score-based generative model*.
*Creating noise from data is easy; creating data from noise is generative modeling. We present a stochastic differential equation (SDE) that smoothly transforms a complex data distribution to a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distribution by slowly removing the noise. Crucially, the reverse-time SDE depends only on the time-dependent gradient field (\aka, score) of the perturbed data distribution. By leveraging advances in score-based generative modeling, we can accurately estimate these scores with neural networks, and use numerical SDE solvers to generate samples. We show that this framework encapsulates previous approaches in score-based generative modeling and diffusion probabilistic modeling, allowing for new sampling procedures and new modeling capabilities. In particular, we introduce a predictor-corrector framework to correct errors in the evolution of the discretized reverse-time SDE. We also derive an equivalent neural ODE that samples from the same distribution as the SDE, but additionally enables exact likelihood computation, and improved sampling efficiency. In addition, we provide a new way to solve inverse problems with score-based models, as demonstrated with experiments on class-conditional generation, image inpainting, and colorization. Combined with multiple architectural improvements, we achieve record-breaking performance for unconditional image generation on CIFAR-10 with an Inception score of 9.89 and FID of 2.20, a competitive likelihood of 2.99 bits/dim, and demonstrate high fidelity generation of 1024 x 1024 images for the first time from a score-based generative model.*

## ScoreSdeVeScheduler

[[autodoc]] ScoreSdeVeScheduler

## SdeVeOutput

[[autodoc]] schedulers.scheduling_sde_ve.SdeVeOutput
[[autodoc]] schedulers.scheduling_sde_ve.SdeVeOutput

@@ -16,7 +16,7 @@ specific language governing permissions and limitations under the License.

The abstract from the paper is:

*Creating noise from data is easy; creating data from noise is generative modeling. We present a stochastic differential equation (SDE) that smoothly transforms a complex data distribution to a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distribution by slowly removing the noise. Crucially, the reverse-time SDE depends only on the time-dependent gradient field (\aka, score) of the perturbed data distribution. By leveraging advances in score-based generative modeling, we can accurately estimate these scores with neural networks, and use numerical SDE solvers to generate samples. We show that this framework encapsulates previous approaches in score-based generative modeling and diffusion probabilistic modeling, allowing for new sampling procedures and new modeling capabilities. In particular, we introduce a predictor-corrector framework to correct errors in the evolution of the discretized reverse-time SDE. We also derive an equivalent neural ODE that samples from the same distribution as the SDE, but additionally enables exact likelihood computation, and improved sampling efficiency. In addition, we provide a new way to solve inverse problems with score-based models, as demonstrated with experiments on class-conditional generation, image inpainting, and colorization. Combined with multiple architectural improvements, we achieve record-breaking performance for unconditional image generation on CIFAR-10 with an Inception score of 9.89 and FID of 2.20, a competitive likelihood of 2.99 bits/dim, and demonstrate high fidelity generation of 1024 x 1024 images for the first time from a score-based generative model*.
*Creating noise from data is easy; creating data from noise is generative modeling. We present a stochastic differential equation (SDE) that smoothly transforms a complex data distribution to a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distribution by slowly removing the noise. Crucially, the reverse-time SDE depends only on the time-dependent gradient field (\aka, score) of the perturbed data distribution. By leveraging advances in score-based generative modeling, we can accurately estimate these scores with neural networks, and use numerical SDE solvers to generate samples. We show that this framework encapsulates previous approaches in score-based generative modeling and diffusion probabilistic modeling, allowing for new sampling procedures and new modeling capabilities. In particular, we introduce a predictor-corrector framework to correct errors in the evolution of the discretized reverse-time SDE. We also derive an equivalent neural ODE that samples from the same distribution as the SDE, but additionally enables exact likelihood computation, and improved sampling efficiency. In addition, we provide a new way to solve inverse problems with score-based models, as demonstrated with experiments on class-conditional generation, image inpainting, and colorization. Combined with multiple architectural improvements, we achieve record-breaking performance for unconditional image generation on CIFAR-10 with an Inception score of 9.89 and FID of 2.20, a competitive likelihood of 2.99 bits/dim, and demonstrate high fidelity generation of 1024 x 1024 images for the first time from a score-based generative model.*

<Tip warning={true}>

@@ -23,7 +23,7 @@ The original implementation can be found at [LuChengTHU/dpm-solver](https://gith

It is recommended to set `solver_order` to 2 for guided sampling, and `solver_order=3` for unconditional sampling.

Dynamic thresholding from Imagen (https://huggingface.co/papers/2205.11487) is supported, and for pixel-space
Dynamic thresholding from [Imagen](https://huggingface.co/papers/2205.11487) is supported, and for pixel-space
diffusion models, you can set both `algorithm_type="dpmsolver++"` and `thresholding=True` to use dynamic
thresholding. This thresholding method is unsuitable for latent-space diffusion models such as
Stable Diffusion.
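
A minimal configuration sketch under those assumptions (the values shown are illustrative, not prescriptions):

```py
# Sketch only: enable dynamic thresholding for a pixel-space diffusion model.
from diffusers import DPMSolverSinglestepScheduler

scheduler = DPMSolverSinglestepScheduler(
    algorithm_type="dpmsolver++",
    thresholding=True,                 # dynamic thresholding from Imagen (pixel-space models only)
    dynamic_thresholding_ratio=0.995,
    solver_order=2,
)
scheduler.set_timesteps(num_inference_steps=25)
print(scheduler.timesteps)
```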

@@ -32,4 +32,4 @@ Stable Diffusion.
[[autodoc]] DPMSolverSinglestepScheduler

## SchedulerOutput

[[autodoc]] schedulers.scheduling_utils.SchedulerOutput
[[autodoc]] schedulers.scheduling_utils.SchedulerOutput

@@ -12,10 +12,10 @@ specific language governing permissions and limitations under the License.

# KarrasVeScheduler

`KarrasVeScheduler` is a stochastic sampler tailored o variance-expanding (VE) models. It is based on the [Elucidating the Design Space of Diffusion-Based Generative Models](https://huggingface.co/papers/2206.00364) and [Score-based generative modeling through stochastic differential equations](https://huggingface.co/papers/2011.13456) papers.
`KarrasVeScheduler` is a stochastic sampler tailored to variance-expanding (VE) models. It is based on the [Elucidating the Design Space of Diffusion-Based Generative Models](https://huggingface.co/papers/2206.00364) and [Score-based generative modeling through stochastic differential equations](https://huggingface.co/papers/2011.13456) papers.

## KarrasVeScheduler

[[autodoc]] KarrasVeScheduler

## KarrasVeOutput

[[autodoc]] schedulers.scheduling_karras_ve.KarrasVeOutput
[[autodoc]] schedulers.scheduling_karras_ve.KarrasVeOutput

@@ -19,19 +19,17 @@ UniPC is by design model-agnostic, supporting pixel-space/latent-space DPMs on u

The abstract from the paper is:

*Diffusion probabilistic models (DPMs) have demonstrated a very promising ability in high-resolution image synthesis. However, sampling from a pre-trained DPM usually requires hundreds of model evaluations, which is computationally expensive. Despite recent progress in designing high-order solvers for DPMs, there still exists room for further speedup, especially in extremely few steps (e.g., 5~10 steps). Inspired by the predictor-corrector for ODE solvers, we develop a unified corrector (UniC) that can be applied after any existing DPM sampler to increase the order of accuracy without extra model evaluations, and derive a unified predictor (UniP) that supports arbitrary order as a byproduct. Combining UniP and UniC, we propose a unified predictor-corrector framework called UniPC for the fast sampling of DPMs, which has a unified analytical form for any order and can significantly improve the sampling quality over previous methods. We evaluate our methods through extensive experiments including both unconditional and conditional sampling using pixel-space and latent-space DPMs. Our UniPC can achieve 3.87 FID on CIFAR10 (unconditional) and 7.51 FID on ImageNet 256times256 (conditional) with only 10 function evaluations. Code is available at https://github.com/wl-zhao/UniPC*.

The original codebase can be found at [wl-zhao/UniPC](https://github.com/wl-zhao/UniPC).
*Diffusion probabilistic models (DPMs) have demonstrated a very promising ability in high-resolution image synthesis. However, sampling from a pre-trained DPM is time-consuming due to the multiple evaluations of the denoising network, making it more and more important to accelerate the sampling of DPMs. Despite recent progress in designing fast samplers, existing methods still cannot generate satisfying images in many applications where fewer steps (e.g., <10) are favored. In this paper, we develop a unified corrector (UniC) that can be applied after any existing DPM sampler to increase the order of accuracy without extra model evaluations, and derive a unified predictor (UniP) that supports arbitrary order as a byproduct. Combining UniP and UniC, we propose a unified predictor-corrector framework called UniPC for the fast sampling of DPMs, which has a unified analytical form for any order and can significantly improve the sampling quality over previous methods, especially in extremely few steps. We evaluate our methods through extensive experiments including both unconditional and conditional sampling using pixel-space and latent-space DPMs. Our UniPC can achieve 3.87 FID on CIFAR10 (unconditional) and 7.51 FID on ImageNet 256×256 (conditional) with only 10 function evaluations. Code is available at [this https URL](https://github.com/wl-zhao/UniPC).*

## Tips

It is recommended to set `solver_order` to 2 for guided sampling, and `solver_order=3` for unconditional sampling.

Dynamic thresholding from Imagen (https://huggingface.co/papers/2205.11487) is supported, and for pixel-space
Dynamic thresholding from [Imagen](https://huggingface.co/papers/2205.11487) is supported, and for pixel-space
diffusion models, you can set both `predict_x0=True` and `thresholding=True` to use dynamic thresholding. This thresholding method is unsuitable for latent-space diffusion models such as Stable Diffusion.
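
A rough sketch of these settings applied to UniPC (the checkpoint, prompt, and step count are placeholders):

```py
# Sketch only: run a pipeline with UniPCMultistepScheduler in very few steps.
from diffusers import DiffusionPipeline, UniPCMultistepScheduler

pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe.scheduler = UniPCMultistepScheduler.from_config(
    pipe.scheduler.config,
    solver_order=2,  # 2 for guided sampling, 3 for unconditional sampling
    # predict_x0=True, thresholding=True,  # pixel-space models only, not Stable Diffusion
)
image = pipe("a watercolor painting of a fox", num_inference_steps=10).images[0]
```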

## UniPCMultistepScheduler

[[autodoc]] UniPCMultistepScheduler

## SchedulerOutput

[[autodoc]] schedulers.scheduling_utils.SchedulerOutput
[[autodoc]] schedulers.scheduling_utils.SchedulerOutput

@@ -22,4 +22,4 @@ The abstract from the paper is:
[[autodoc]] VQDiffusionScheduler

## VQDiffusionSchedulerOutput

[[autodoc]] schedulers.scheduling_vq_diffusion.VQDiffusionSchedulerOutput
[[autodoc]] schedulers.scheduling_vq_diffusion.VQDiffusionSchedulerOutput

@@ -1,3 +1,15 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Utilities

Utility and helper functions for working with 🤗 Diffusers.

@@ -24,4 +36,4 @@ Utility and helper functions for working with 🤗 Diffusers.

## make_image_grid

[[autodoc]] utils.pil_utils.make_image_grid
[[autodoc]] utils.make_image_grid
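
A minimal usage sketch (the images below are synthetic placeholders; in practice they would come from a pipeline call):

```py
# Sketch only: tile a few images into a single grid for quick inspection.
from diffusers.utils import make_image_grid
from PIL import Image

images = [Image.new("RGB", (256, 256), color) for color in ("red", "green", "blue", "white")]
grid = make_image_grid(images, rows=2, cols=2)
grid.save("grid.png")
```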