<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Models

Diffusers contains pretrained models for popular algorithms and modules for creating the next set of diffusion models.
The primary function of these models is to denoise an input sample by modeling the distribution $p_\theta(\mathbf{x}_{t-1}|\mathbf{x}_t)$.
The models are built on the base class [`ModelMixin`], which is a `torch.nn.Module` with basic functionality for saving and loading models both locally and from the Hugging Face Hub.
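
A minimal usage sketch of what this enables, using `UNet2DModel` as a representative `ModelMixin` subclass (the `google/ddpm-cat-256` checkpoint and the local save path are illustrative assumptions, not part of this reference):

```python
import torch

from diffusers import UNet2DModel

# Load pretrained weights and config from the Hub (illustrative checkpoint).
model = UNet2DModel.from_pretrained("google/ddpm-cat-256")

# The model predicts the noise residual of a noisy sample at a given timestep;
# a scheduler then uses that prediction to estimate p_theta(x_{t-1} | x_t).
noisy_sample = torch.randn(
    1, model.config.in_channels, model.config.sample_size, model.config.sample_size
)
with torch.no_grad():
    noise_pred = model(noisy_sample, timestep=10).sample

# Save weights and config locally; `from_pretrained` can reload them from this path.
model.save_pretrained("./ddpm-cat-256-local")
```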

## ModelMixin
[[autodoc]] ModelMixin

## UNet2DOutput
[[autodoc]] models.unet_2d.UNet2DOutput

## UNet1DModel
[[autodoc]] UNet1DModel

## UNet2DModel
[[autodoc]] UNet2DModel

## UNet2DConditionOutput
[[autodoc]] models.unet_2d_condition.UNet2DConditionOutput

## UNet2DConditionModel
[[autodoc]] UNet2DConditionModel

## DecoderOutput
[[autodoc]] models.vae.DecoderOutput

## VQEncoderOutput
[[autodoc]] models.vae.VQEncoderOutput

## VQModel
[[autodoc]] VQModel

## AutoencoderKLOutput
[[autodoc]] models.vae.AutoencoderKLOutput

## AutoencoderKL
[[autodoc]] AutoencoderKL
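
A quick sketch of how the variational autoencoder classes above are typically used together (the checkpoint name and `subfolder` argument are assumptions for illustration only):

```python
import torch

from diffusers import AutoencoderKL

# Load the VAE weights from a Stable Diffusion checkpoint (illustrative names).
vae = AutoencoderKL.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="vae")

image = torch.randn(1, 3, 512, 512)
with torch.no_grad():
    # `encode` returns an AutoencoderKLOutput; its `latent_dist` can be sampled.
    latents = vae.encode(image).latent_dist.sample()
    # `decode` returns a DecoderOutput holding the reconstruction in `.sample`.
    reconstruction = vae.decode(latents).sample
```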

## Transformer2DModel
[[autodoc]] Transformer2DModel

## Transformer2DModelOutput
[[autodoc]] models.attention.Transformer2DModelOutput

## FlaxModelMixin
[[autodoc]] FlaxModelMixin

## FlaxUNet2DConditionOutput
[[autodoc]] models.unet_2d_condition_flax.FlaxUNet2DConditionOutput

## FlaxUNet2DConditionModel
[[autodoc]] FlaxUNet2DConditionModel

## FlaxDecoderOutput
[[autodoc]] models.vae_flax.FlaxDecoderOutput

## FlaxAutoencoderKLOutput
[[autodoc]] models.vae_flax.FlaxAutoencoderKLOutput

## FlaxAutoencoderKL
[[autodoc]] FlaxAutoencoderKL