<!--Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Schedulers
Diffusers contains multiple pre-built schedule functions for the diffusion process.
## What is a scheduler?
The schedule functions, denoted *Schedulers* in the library, take in the output of a trained model, a sample that the diffusion process is iterating on, and a timestep, and return a denoised sample.

- Schedulers define the methodology for iteratively adding noise to an image or for updating a sample based on model outputs.
  - Adding noise in different manners corresponds to the different algorithmic processes used to train a diffusion model.
  - For inference, the scheduler defines how to update a sample based on the output of a pretrained model.
- Schedulers are often defined by a *noise schedule* and an *update rule* to solve the differential equation.

The example below illustrates what a single scheduler step looks like in code.
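It is a minimal sketch using [`DDPMScheduler`], with random tensors standing in for a real model prediction and a real sample (the tensor shapes and the number of inference steps are arbitrary illustration choices):

```python
import torch
from diffusers import DDPMScheduler

scheduler = DDPMScheduler()
scheduler.set_timesteps(50)  # configure the noise schedule for 50 inference steps

sample = torch.randn(1, 3, 64, 64)        # the noisy sample the diffusion process iterates on
model_output = torch.randn(1, 3, 64, 64)  # stand-in for the noise predicted by a trained model
timestep = scheduler.timesteps[0]         # where we are in the diffusion process

# One application of the update rule: returns the sample for the previous (less noisy) timestep.
less_noisy_sample = scheduler.step(model_output, timestep, sample).prev_sample
```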
### Discrete versus continuous schedulers
All schedulers take in a timestep to predict the updated version of the sample being diffused.
The timesteps dictate where in the diffusion process the step is: data is generated by iterating forward in time, and inference is executed by propagating backwards through timesteps.
Different algorithms use timesteps that can be either discrete (accepting `int` inputs), such as the [`DDPMScheduler`] or [`PNDMScheduler`], or continuous (accepting `float` inputs), such as the score-based schedulers [`ScoreSdeVeScheduler`] or [`ScoreSdeVpScheduler`].
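The sketch below compares the two kinds of timesteps; the exact values depend on the scheduler defaults and the installed version of diffusers, so the printed output is only illustrative:

```python
from diffusers import DDPMScheduler, ScoreSdeVeScheduler

# Discrete scheduler: timesteps are integer indices into the training noise schedule.
ddpm = DDPMScheduler()
ddpm.set_timesteps(10)
print(ddpm.timesteps)    # integer-valued timesteps, counting down towards 0

# Continuous scheduler: timesteps are floating point values.
sde_ve = ScoreSdeVeScheduler()
sde_ve.set_timesteps(10)
print(sde_ve.timesteps)  # float-valued timesteps decreasing from 1.0 towards a small epsilon
```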
## Designing Re-usable schedulers
The core design principle of the schedule functions is to be model, system, and framework independent.
This allows for rapid experimentation and cleaner abstractions in the code, where the model prediction is separated from the sample update.
To this end, the design of schedulers is such that:

- Schedulers can be used interchangeably between diffusion models in inference to find the preferred trade-off between speed and generation quality.
- Schedulers are currently implemented in PyTorch by default, but are designed to be framework independent (partial JAX support currently exists).

As an illustration of this interchangeability, the snippet below swaps the scheduler of an existing pipeline.
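It assumes the `google/ddpm-cat-256` checkpoint purely as an example; any compatible pipeline and scheduler pair can be combined in the same way:

```python
from diffusers import DDIMScheduler, DDPMPipeline

# Load a pipeline; it ships with a default DDPM scheduler.
pipeline = DDPMPipeline.from_pretrained("google/ddpm-cat-256")

# Swap in a DDIM scheduler built from the same noise-schedule configuration.
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)

# DDIM needs far fewer steps than the 1000-step DDPM sampling loop.
image = pipeline(num_inference_steps=50).images[0]
```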
## API
The core API for any new scheduler must follow a limited structure:

- Schedulers should provide one or more `def step(...)` functions that are called to update the generated sample iteratively.
- Schedulers should provide a `set_timesteps(...)` method that configures the parameters of a schedule function for a specific inference task.
- Schedulers should be framework-specific.

A minimal denoising loop built only on `set_timesteps(...)` and `step(...)` is sketched below.
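It uses a randomly initialized [`UNet2DModel`] purely as a stand-in for a trained model; the image resolution and step count are arbitrary choices:

```python
import torch
from diffusers import DDPMScheduler, UNet2DModel

model = UNet2DModel()        # stand-in only; in practice, load a trained checkpoint
scheduler = DDPMScheduler()

# 1. `set_timesteps(...)` configures the schedule for this inference task.
scheduler.set_timesteps(num_inference_steps=25)

# 2. `step(...)` applies the update rule once per timestep, starting from pure noise.
sample = torch.randn(1, model.config.in_channels, 64, 64)
for t in scheduler.timesteps:
    with torch.no_grad():
        model_output = model(sample, t).sample                    # model prediction
    sample = scheduler.step(model_output, t, sample).prev_sample  # sample update
```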
The base class [`SchedulerMixin`] implements low-level utilities used by multiple schedulers.
### SchedulerMixin
[[autodoc]] SchedulerMixin
### SchedulerOutput
The class [`SchedulerOutput`] contains the outputs from any scheduler's `step(...)` call.

[[autodoc]] schedulers.scheduling_utils.SchedulerOutput
### Implemented Schedulers
#### Denoising diffusion implicit models (DDIM)

Original paper can be found [here](https://arxiv.org/abs/2010.02502).

[[autodoc]] DDIMScheduler
#### Denoising diffusion probabilistic models (DDPM)

Original paper can be found [here](https://arxiv.org/abs/2006.11239).

[[autodoc]] DDPMScheduler
#### Variance exploding, stochastic sampling from Karras et al.

Original paper can be found [here](https://arxiv.org/abs/2206.00364).

[[autodoc]] KarrasVeScheduler
#### Linear multistep scheduler for discrete beta schedules

Original implementation can be found [here](https://github.com/crowsonkb/k-diffusion/blob/481677d114f6ea445aa009cf5bd7a9cdee909e47/k_diffusion/sampling.py#L181).

[[autodoc]] LMSDiscreteScheduler
#### Pseudo numerical methods for diffusion models (PNDM)

Original implementation can be found [here](https://github.com/luping-liu/PNDM).

[[autodoc]] PNDMScheduler
#### variance exploding stochastic differential equation (VE-SDE) scheduler

Original paper can be found [here](https://arxiv.org/abs/2011.13456).

[[autodoc]] ScoreSdeVeScheduler
#### improved pseudo numerical methods for diffusion models (iPNDM)

Original implementation can be found [here](https://github.com/crowsonkb/v-diffusion-pytorch/blob/987f8985e38208345c1959b0ea767a625831cc9b/diffusion/sampling.py#L296).

[[autodoc]] IPNDMScheduler
#### variance preserving stochastic differential equation (VP-SDE) scheduler

Original paper can be found [here](https://arxiv.org/abs/2011.13456).

<Tip warning={true}>

Score SDE-VP is under construction.

</Tip>

[[autodoc]] schedulers.scheduling_sde_vp.ScoreSdeVpScheduler
#### Euler scheduler

Euler scheduler (Algorithm 2) from the paper [Elucidating the Design Space of Diffusion-Based Generative Models](https://arxiv.org/abs/2206.00364) by Karras et al. (2022). Based on the original [k-diffusion](https://github.com/crowsonkb/k-diffusion/blob/481677d114f6ea445aa009cf5bd7a9cdee909e47/k_diffusion/sampling.py#L51) implementation by Katherine Crowson.
Fast scheduler which often generates good outputs with 20-30 steps.
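For example, assuming a Stable Diffusion checkpoint such as `runwayml/stable-diffusion-v1-5` (any pipeline with a compatible scheduler configuration works the same way), the scheduler can be swapped in and run with 30 steps:

```python
from diffusers import EulerDiscreteScheduler, StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)

# Around 30 Euler steps is typically enough for good samples.
image = pipe("a photograph of an astronaut riding a horse", num_inference_steps=30).images[0]
```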
[[autodoc]] EulerDiscreteScheduler
#### Euler Ancestral scheduler

Ancestral sampling with Euler method steps. Based on the original [k-diffusion](https://github.com/crowsonkb/k-diffusion/blob/481677d114f6ea445aa009cf5bd7a9cdee909e47/k_diffusion/sampling.py#L72) implementation by Katherine Crowson.
Fast scheduler which often generates good outputs with 20-30 steps.
[[autodoc]] EulerAncestralDiscreteScheduler
#### VQDiffusionScheduler

Original paper can be found [here](https://arxiv.org/abs/2111.14822).

[[autodoc]] VQDiffusionScheduler
#### RePaint scheduler

DDPM-based inpainting scheduler for unsupervised inpainting with extreme masks.
Intended for use with [`RePaintPipeline`].
Based on the paper [RePaint: Inpainting using Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2201.09865)
and the original implementation by Andreas Lugmayr et al.: https://github.com/andreas128/RePaint

[[autodoc]] RePaintScheduler