1
0
mirror of https://github.com/huggingface/diffusers.git synced 2026-01-27 17:22:53 +03:00
Commit Graph

95 Commits

Author SHA1 Message Date
Daniel Gu
54cfa3b370 Use a CLIPVisionModelWithProjection instead of CLIPVisionModel for image_encoder to better match the original UniDiffuser implementation. 2023-05-11 16:19:49 -07:00
Daniel Gu
8a57342d12 Fix code quality via make style. 2023-05-10 23:00:46 -07:00
Daniel Gu
a7f50f4c9c Move the UniDiffusers convert_from_ckpy.py script to diffusers/scripts/convert_unidiffuser_to_diffusers.py 2023-05-10 22:56:36 -07:00
Sanchit Gandhi
abd86d1c17 [AudioLDM] Generalise conversion script (#3328)
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-05-06 12:00:42 +01:00
Patrick von Platen
e51f19aee8 add model (#3230)
* add

* clean

* up

* clean up more

* fix more tests

* Improve docs further

* improve

* more fixes docs

* Improve docs more

* Update src/diffusers/models/unet_2d_condition.py

* fix

* up

* update doc links

* make fix-copies

* add safety checker and watermarker to stage 3 doc page code snippets

* speed optimizations docs

* memory optimization docs

* make style

* add watermarking snippets to doc string examples

* make style

* use pt_to_pil helper functions in doc strings

* skip mps tests

* Improve safety

* make style

* new logic

* fix

* fix bad onnx design

* make new stable diffusion upscale pipeline model arguments optional

* define has_nsfw_concept when non-pil output type

* lowercase linked to notebook name

---------

Co-authored-by: William Berman <WLBberman@gmail.com>
2023-04-25 14:20:43 -07:00
Patrick von Platen
49609768b4 make style 2023-03-30 18:26:41 +02:00
Alon Burg
9062b2847d Support fp16 in conversion from original ckpt (#2733)
add --half to convert_original_stable_diffusion_to_diffusers.py
2023-03-30 17:26:18 +01:00
Pedro Cuenca
1d7b4b60b7 Ruff: apply same rules as in transformers (#2827)
* Apply same ruff settings as in transformers

See https://github.com/huggingface/transformers/blob/main/pyproject.toml
Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>

* Apply new style rules

* Style

Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>

* style

* remove list, ruff wouldn't auto fix.

---------

Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
2023-03-27 16:18:57 +02:00
Sanchit Gandhi
b94880e536 Add AudioLDM (#2232)
* Add AudioLDM

* up

* add vocoder

* start unet

* unconditional unet

* clap, vocoder and vae

* clean-up: conversion scripts

* fix: conversion script token_type_ids

* clean-up: pipeline docstring

* tests: from SD

* clean-up: cpu offload vocoder instead of safety checker

* feat: adapt tests to audioldm

* feat: add docs

* clean-up: amend pipeline docstrings

* clean-up: make style

* clean-up: make fix-copies

* fix: add doc path to toctree

* clean-up: args for conversion script

* clean-up: paths to checkpoints

* fix: use conditional unet

* clean-up: make style

* fix: type hints for UNet

* clean-up: docstring for UNet

* clean-up: make style

* clean-up: remove duplicate in docstring

* clean-up: make style

* clean-up: make fix-copies

* clean-up: move imports to start in code snippet

* fix: pass cross_attention_dim as a list/tuple to unet

* clean-up: make fix-copies

* fix: update checkpoint path

* fix: unet cross_attention_dim in tests

* film embeddings -> class embeddings

* Apply suggestions from code review

Co-authored-by: Will Berman <wlbberman@gmail.com>

* fix: unet film embed to use existing args

* fix: unet tests to use existing args

* fix: make style

* fix: transformers import and version in init

* clean-up: make style

* Revert "clean-up: make style"

This reverts commit 5d6d1f8b32.

* clean-up: make style

* clean-up: use pipeline tester mixin tests where poss

* clean-up: skip attn slicing test

* fix: add torch dtype to docs

* fix: remove conversion script out of src

* fix: remove .detach from 1d waveform

* fix: reduce default num inf steps

* fix: swap height/width -> audio_length_in_s

* clean-up: make style

* fix: remove nightly tests

* fix: imports in conversion script

* clean-up: slim-down to two slow tests

* clean-up: slim-down fast tests

* fix: batch consistent tests

* clean-up: make style

* clean-up: remove vae slicing fast test

* clean-up: propagate changes to doc

* fix: increase test tol to 1e-2

* clean-up: finish docs

* clean-up: make style

* feat: vocoder / VAE compatibility check

* feat: possibly expand / cut audio waveform

* fix: pipeline call signature test

* fix: slow tests output len

* clean-up: make style

* make style

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: William Berman <WLBberman@gmail.com>
2023-03-23 19:00:21 +01:00
Kashif Rasul
2ef9bdd76f Music Spectrogram diffusion pipeline (#1044)
* initial TokenEncoder and ContinuousEncoder

* initial modules

* added ContinuousContextTransformer

* fix copy paste error

* use numpy for get_sequence_length

* initial terminal relative positional encodings

* fix weights keys

* fix assert

* cross attend style: concat encodings

* make style

* concat once

* fix formatting

* Initial SpectrogramPipeline

* fix input_tokens

* make style

* added mel output

* ignore weights for config

* move mel to numpy

* import pipeline

* fix class names and import

* moved models to models folder

* import ContinuousContextTransformer and SpectrogramDiffusionPipeline

* initial spec diffusion converstion script

* renamed config to t5config

* added weight loading

* use arguments instead of t5config

* broadcast noise time to batch dim

* fix call

* added scale_to_features

* fix weights

* transpose laynorm weight

* scale is a vector

* scale the query outputs

* added comment

* undo scaling

* undo depth_scaling

* inital get_extended_attention_mask

* attention_mask is none in self-attention

* cleanup

* manually invert attention

* nn.linear need bias=False

* added T5LayerFFCond

* remove to fix conflict

* make style and dummy

* remove unsed variables

* remove predict_epsilon

* Move accelerate to a soft-dependency (#1134)

* finish

* finish

* Update src/diffusers/modeling_utils.py

* Update src/diffusers/pipeline_utils.py

Co-authored-by: Anton Lozhkov <anton@huggingface.co>

* more fixes

* fix

Co-authored-by: Anton Lozhkov <anton@huggingface.co>

* fix order

* added initial midi to note token data pipeline

* added int to int tokenizer

* remove duplicate

* added logic for segments

* add melgan to pipeline

* move autoregressive gen into pipeline

* added note_representation_processor_chain

* fix dtypes

* remove immutabledict req

* initial doc

* use np.where

* require note_seq

* fix typo

* update dependency

* added note-seq to test

* added is_note_seq_available

* fix import

* added toc

* added example usage

* undo for now

* moved docs

* fix merge

* fix imports

* predict first segment

* avoid un-needed copy to and from cpu

* make style

* Copyright

* fix style

* add test and fix inference steps

* remove bogus files

* reorder models

* up

* remove transformers dependency

* make work with diffusers cross attention

* clean more

* remove @

* improve further

* up

* uP

* Apply suggestions from code review

* Update tests/pipelines/spectrogram_diffusion/test_spectrogram_diffusion.py

* loop over all tokens

* make style

* Added a section on the model

* fix formatting

* grammer

* formatting

* make fix-copies

* Update src/diffusers/pipelines/__init__.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/pipelines/spectrogram_diffusion/pipeline_spectrogram_diffusion.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* added callback ad optional ionnx

* do not squeeze batch dim

* clean up more

* upload

* convert jax to nnumpy

* make style

* fix warning

* make fix-copies

* fix warning

* add initial fast tests

* add initial pipeline_params

* eval mode due to dropout

* skip batch tests as pipeline runs on a single file

* make style

* fix relative path

* fix doc tests

* Update src/diffusers/models/t5_film_transformer.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/models/t5_film_transformer.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update docs/source/en/api/pipelines/spectrogram_diffusion.mdx

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update tests/pipelines/spectrogram_diffusion/test_spectrogram_diffusion.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update tests/pipelines/spectrogram_diffusion/test_spectrogram_diffusion.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update tests/pipelines/spectrogram_diffusion/test_spectrogram_diffusion.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update tests/pipelines/spectrogram_diffusion/test_spectrogram_diffusion.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* add MidiProcessor

* format

* fix org

* Apply suggestions from code review

* Update tests/pipelines/spectrogram_diffusion/test_spectrogram_diffusion.py

* make style

* pin protobuf to <4

* fix formatting

* white space

* tensorboard needs protobuf

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Anton Lozhkov <anton@huggingface.co>
2023-03-23 14:06:17 +01:00
Naoki Ainoya
14e3a28c12 Rename 'CLIPFeatureExtractor' class to 'CLIPImageProcessor' (#2732)
The 'CLIPFeatureExtractor' class name has been renamed to 'CLIPImageProcessor' in order to comply with future deprecation. This commit includes the necessary changes to the affected files.
2023-03-23 13:49:22 +01:00
Patrick von Platen
ca1a22296d [MS Text To Video] Add first text to video (#2738)
* [MS Text To Video} Add first text to video

* upload

* make first model example

* match unet3d params

* make sure weights are correcctly converted

* improve

* forward pass works, but diff result

* make forward work

* fix more

* finish

* refactor video output class.

* feat: add support for a video export utility.

* fix: opencv availability check.

* run make fix-copies.

* add: docs for the model components.

* add: standalone pipeline doc.

* edit docstring of the pipeline.

* add: right path to TransformerTempModel

* add: first set of tests.

* complete fast tests for text to video.

* fix bug

* up

* three fast tests failing.

* add: note on slow tests

* make work with all schedulers

* apply styling.

* add slow tests

* change file name

* update

* more correction

* more fixes

* finish

* up

* Apply suggestions from code review

* up

* finish

* make copies

* fix pipeline tests

* fix more tests

* Apply suggestions from code review

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* apply suggestions

* up

* revert

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2023-03-22 18:39:33 +01:00
Will Berman
a28acb5dcc controlnet sd 2.1 checkpoint conversions (#2593)
* controlnet sd 2.1 checkpoint conversions

* remove global_step -> make config file mandatory
2023-03-10 08:22:02 -08:00
Patrick von Platen
d761b58bfc [From pretrained] Speed-up loading from cache (#2515)
* [From pretrained] Speed-up loading from cache

* up

* Fix more

* fix one more bug

* make style

* bigger refactor

* factor out function

* Improve more

* better

* deprecate return cache folder

* clean up

* improve tests

* up

* upload

* add nice tests

* simplify

* finish

* correct

* fix version

* rename

* Apply suggestions from code review

Co-authored-by: Lucain <lucainp@gmail.com>

* rename

* correct doc string

* correct more

* Apply suggestions from code review

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* apply code suggestions

* finish

---------

Co-authored-by: Lucain <lucainp@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2023-03-10 11:56:10 +01:00
Patrick von Platen
1598a57958 make style 2023-03-06 10:51:03 +00:00
Haofan Wang
63805f8af7 Support convert LoRA safetensors into diffusers format (#2403)
* add lora convertor

* Update convert_lora_safetensor_to_diffusers.py

* Update README.md

* Update convert_lora_safetensor_to_diffusers.py
2023-03-06 11:50:46 +01:00
Patrick von Platen
f38e3626cd make style 2023-03-06 10:40:18 +00:00
ForserX
5f826a35fb Add custom vae (diffusers type) to onnx converter (#2325) 2023-03-06 11:39:55 +01:00
Takuma Mori
8dfff7c015 Add a ControlNet model & pipeline (#2407)
* add scaffold
- copied convert_controlnet_to_diffusers.py from
convert_original_stable_diffusion_to_diffusers.py

* Add support to load ControlNet (WIP)
- this makes Missking Key error on ControlNetModel

* Update to convert ControlNet without error msg
- init impl for StableDiffusionControlNetPipeline
- init impl for ControlNetModel

* cleanup of commented out

* split create_controlnet_diffusers_config()
from create_unet_diffusers_config()

- add config: hint_channels

* Add input_hint_block, input_zero_conv and
middle_block_out
- this makes missing key error on loading model

* add unet_2d_blocks_controlnet.py
- copied from unet_2d_blocks.py as impl CrossAttnDownBlock2D,DownBlock2D
- this makes missing key error on loading model

* Add loading for input_hint_block, zero_convs
and middle_block_out

- this makes no error message on model loading

* Copy from UNet2DConditionalModel except __init__

* Add ultra primitive test for ControlNetModel
inference

* Support ControlNetModel inference
- without exceptions

* copy forward() from UNet2DConditionModel

* Impl ControlledUNet2DConditionModel inference
- test_controlled_unet_inference passed

* Frozen weight & biases for training

* Minimized version of ControlNet/ControlledUnet
- test_modules_controllnet.py passed

* make style

* Add support model loading for minimized ver

* Remove all previous version files

* from_pretrained and inference test passed

* copied from pipeline_stable_diffusion.py
except `__init__()`

* Impl pipeline, pixel match test (almost) passed.

* make style

* make fix-copies

* Fix to add import ControlNet blocks
for `make fix-copies`

* Remove einops dependency

* Support  np.ndarray, PIL.Image for controlnet_hint

* set default config file as lllyasviel's

* Add support grayscale (hw) numpy array

* Add and update docstrings

* add control_net.mdx

* add control_net.mdx to toctree

* Update copyright year

* Fix to add PIL.Image RGB->BGR conversion
- thanks @Mystfit

* make fix-copies

* add basic fast test for controlnet

* add slow test for controlnet/unet

* Ignore down/up_block len check on ControlNet

* add a copy from test_stable_diffusion.py

* Accept controlnet_hint is None

* merge pipeline_stable_diffusion.py diff

* Update class name to SDControlNetPipeline

* make style

* Baseline fast test almost passed (w long desc)

* still needs investigate.

Following didn't passed descriped in TODO comment:
- test_stable_diffusion_long_prompt
- test_stable_diffusion_no_safety_checker

Following didn't passed same as stable_diffusion_pipeline:
- test_attention_slicing_forward_pass
- test_inference_batch_single_identical
- test_xformers_attention_forwardGenerator_pass
these seems come from calc accuracy.

* Add note comment related vae_scale_factor

* add test_stable_diffusion_controlnet_ddim

* add assertion for vae_scale_factor != 8

* slow test of pipeline almost passed
Failed: test_stable_diffusion_pipeline_with_model_offloading
- ImportError: `enable_model_offload` requires `accelerate v0.17.0` or higher

but currently latest version == 0.16.0

* test_stable_diffusion_long_prompt passed

* test_stable_diffusion_no_safety_checker passed

- due to its model size, move to slow test

* remove PoC test files

* fix num_of_image, prompt length issue add add test

* add support List[PIL.Image] for controlnet_hint

* wip

* all slow test passed

* make style

* update for slow test

* RGB(PIL)->BGR(ctrlnet) conversion

* fixes

* remove manual num_images_per_prompt test

* add document

* add `image` argument docstring

* make style

* Add line to correct conversion

* add controlnet_conditioning_scale (aka control_scales
strength)

* rgb channel ordering by default

* image batching logic

* Add control image descriptions for each checkpoint

* Only save controlnet model in conversion script

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_controlnet.py

typo

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* add gerated image example

* a depth mask -> a depth map

* rename control_net.mdx to controlnet.mdx

* fix toc title

* add ControlNet abstruct and link

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_controlnet.py

Co-authored-by: dqueue <dbyqin@gmail.com>

* remove controlnet constructor arguments re: @patrickvonplaten

* [integration tests] test canny

* test_canny fixes

* [integration tests] test_depth

* [integration tests] test_hed

* [integration tests] test_mlsd

* add channel order config to controlnet

* [integration tests] test normal

* [integration tests] test_openpose test_scribble

* change height and width to default to conditioning image

* [integration tests] test seg

* style

* test_depth fix

* [integration tests] size fixes

* [integration tests] cpu offloading

* style

* generalize controlnet embedding

* fix conversion script

* Update docs/source/en/api/pipelines/stable_diffusion/controlnet.mdx

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update docs/source/en/api/pipelines/stable_diffusion/controlnet.mdx

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update docs/source/en/api/pipelines/stable_diffusion/controlnet.mdx

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update docs/source/en/api/pipelines/stable_diffusion/controlnet.mdx

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Style adapted to the documentation of pix2pix

* merge main by hand

* style

* [docs] controlling generation doc nits

* correct some things

* add: controlnetmodel to autodoc.

* finish docs

* finish

* finish 2

* correct images

* finish controlnet

* Apply suggestions from code review

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* uP

* upload model

* up

* up

---------

Co-authored-by: William Berman <WLBberman@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: dqueue <dbyqin@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-03-02 15:34:07 +01:00
Patrick von Platen
eadf0e2555 [Copyright] 2023 (#2524) 2023-03-01 10:31:00 +01:00
Will Berman
62b3c9e06a unCLIP variant (#2297)
* pipeline_variant

* Add docs for when clip_stats_path is specified

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_unclip.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_unclip.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_unclip_img2img.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_unclip_img2img.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* prepare_latents # Copied from re: @patrickvonplaten

* NoiseAugmentor->ImageNormalizer

* stable_unclip_prior default to None re: @patrickvonplaten

* prepare_prior_extra_step_kwargs

* prior denoising scale model input

* {DDIM,DDPM}Scheduler -> KarrasDiffusionSchedulers re: @patrickvonplaten

* docs

* Update docs/source/en/api/pipelines/stable_unclip.mdx

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-02-14 11:28:57 -08:00
Will Berman
fd5c3c09af misc fixes (#2282)
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-02-08 09:02:42 -08:00
Patrick von Platen
a7ca03aa85 Replace flake8 with ruff and update black (#2279)
* before running make style

* remove left overs from flake8

* finish

* make fix-copies

* final fix

* more fixes
2023-02-07 23:46:23 +01:00
Patrick von Platen
0f04e799dc fix vae pt script 2023-02-07 08:34:19 +00:00
YiYi Xu
1051ca81a6 Stable Diffusion Latent Upscaler (#2059)
* Modify UNet2DConditionModel

- allow skipping mid_block

- adding a norm_group_size argument so that we can set the `num_groups` for group norm using `num_channels//norm_group_size`

- allow user to set dimension for the timestep embedding (`time_embed_dim`)

- the kernel_size for `conv_in` and `conv_out` is now configurable

- add random fourier feature layer (`GaussianFourierProjection`) for `time_proj`

- allow user to add the time and class embeddings before passing through the projection layer together - `time_embedding(t_emb + class_label))`

- added 2 arguments `attn1_types` and `attn2_types`

  * currently we have argument `only_cross_attention`: when it's set to `True`, we will have a to the
`BasicTransformerBlock` block with 2 cross-attention , otherwise we
get a self-attention followed by a cross-attention; in k-upscaler, we need to have blocks that include just one cross-attention, or self-attention -> cross-attention;
so I added `attn1_types` and `attn2_types` to the unet's argument list to allow user specify the attention types for the 2 positions in each block;  note that I stil kept
the `only_cross_attention` argument for unet for easy configuration, but it will be converted to `attn1_type` and `attn2_type` when passing down to the down blocks

- the position of downsample layer and upsample layer is now configurable

- in k-upscaler unet, there is only one skip connection per each up/down block (instead of each layer in stable diffusion unet), added `skip_freq = "block"` to support
this use case

- if user passes attention_mask to unet, it will prepare the mask and pass a flag to cross attention processer to skip the `prepare_attention_mask` step
inside cross attention block

add up/down blocks for k-upscaler

modify CrossAttention class

- make the `dropout` layer in `to_out` optional

- `use_conv_proj` - use conv instead of linear for all projection layers (i.e. `to_q`, `to_k`, `to_v`, `to_out`) whenever possible. note that when it's used to do cross
attention, to_k, to_v has to be linear because the `encoder_hidden_states` is not 2d

- `cross_attention_norm` - add an optional layernorm on encoder_hidden_states

- `attention_dropout`: add an optional dropout on attention score

adapt BasicTransformerBlock

- add an ada groupnorm layer  to conditioning attention input with timestep embedding

- allow skipping the FeedForward layer in between the attentions

- replaced the only_cross_attention argument with attn1_type and attn2_type for more flexible configuration

update timestep embedding: add new act_fn  gelu and an optional act_2

modified ResnetBlock2D

- refactored with AdaGroupNorm class (the timestep scale shift normalization)

- add `mid_channel` argument - allow the first conv to have a different output dimension from the second conv

- add option to use input AdaGroupNorm on the input instead of groupnorm

- add options to add a dropout layer after each conv

- allow user to set the bias in conv_shortcut (needed for k-upscaler)

- add gelu

adding conversion script for k-upscaler unet

add pipeline

* fix attention mask

* fix a typo

* fix a bug

* make sure model can be used with GPU

* make pipeline work with fp16

* fix an error in BasicTransfomerBlock

* make style

* fix typo

* some more fixes

* uP

* up

* correct more

* some clean-up

* clean time proj

* up

* uP

* more changes

* remove the upcast_attention=True from unet config

* remove attn1_types, attn2_types etc

* fix

* revert incorrect changes up/down samplers

* make style

* remove outdated files

* Apply suggestions from code review

* attention refactor

* refactor cross attention

* Apply suggestions from code review

* update

* up

* update

* Apply suggestions from code review

* finish

* Update src/diffusers/models/cross_attention.py

* more fixes

* up

* up

* up

* finish

* more corrections of conversion state

* act_2 -> act_2_fn

* remove dropout_after_conv from ResnetBlock2D

* make style

* simplify KAttentionBlock

* add fast test for latent upscaler pipeline

* add slow test

* slow test fp16

* make style

* add doc string for pipeline_stable_diffusion_latent_upscale

* add api doc page for latent upscaler pipeline

* deprecate attention mask

* clean up embeddings

* simplify resnet

* up

* clean up resnet

* up

* correct more

* up

* up

* improve a bit more

* correct more

* more clean-ups

* Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* add docstrings for new unet config

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* # Copied from

* encode the image if not latent

* remove force casting vae to fp32

* fix

* add comments about preconditioning parameters from k-diffusion paper

* attn1_type, attn2_type -> add_self_attention

* clean up get_down_block and get_up_block

* fix

* fixed a typo(?) in ada group norm

* update slice attention processer for cross attention

* update slice

* fix fast test

* update the checkpoint

* finish tests

* fix-copies

* fix-copy for modeling_text_unet.py

* make style

* make style

* fix f-string

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fix import

* correct changes

* fix resnet

* make fix-copies

* correct euler scheduler

* add missing #copied from for preprocess

* revert

* fix

* fix copies

* Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update src/diffusers/models/cross_attention.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* clean up conversion script

* KDownsample2d,KUpsample2d -> KDownsample2D,KUpsample2D

* more

* Update src/diffusers/models/unet_2d_condition.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* remove prepare_extra_step_kwargs

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fix a typo in timestep embedding

* remove num_image_per_prompt

* fix fasttest

* make style + fix-copies

* fix

* fix xformer test

* fix style

* doc string

* make style

* fix-copies

* docstring for time_embedding_norm

* make style

* final finishes

* make fix-copies

* fix tests

---------

Co-authored-by: yiyixuxu <yixu@yis-macbook-pro.lan>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2023-02-07 09:11:57 +01:00
chavinlo
717a956a02 Create convert_vae_pt_to_diffusers.py (#2215)
* Create convert_vae_pt_to_diffusers.py

Just a simple script to convert VAE.pt files to diffusers format
Tested with: https://huggingface.co/WarriorMama777/OrangeMixs/blob/main/VAEs/orangemix.vae.pt

* Update convert_vae_pt_to_diffusers.py

Forgot to add the function call

* make style

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: chavinlo <example@example.com>
2023-02-07 09:10:34 +01:00
cafe+ai — かふぇあい
d75ad93ca7 Safetensors loading in "convert_diffusers_to_original_stable_diffusion" (#2054)
* Safetensors loading in "convert_diffusers_to_original_stable_diffusion"

Adds diffusers format saftetensors loading support

* Fix import sort order: convert_diffusers_to_original_stable_diffusion.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-01-23 09:44:55 +01:00
Damian Stewart
3d2f24b099 Module-ise "original stable diffusion to diffusers" conversion script (#2019)
* convert __main__ to a function call and call it

* add missing type hint

* make style check pass

* move loading to src/diffusers

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-01-20 17:30:44 +01:00
Kashif Rasul
37d113cce7 DiT Pipeline (#1806)
* added dit model

* import

* initial pipeline

* initial convert script

* initial pipeline

* make style

* raise valueerror

* single function

* rename classes

* use DDIMScheduler

* timesteps embedder

* samples to cpu

* fix var names

* fix numpy type

* use timesteps class for proj

* fix typo

* fix arg name

* flip_sin_to_cos and better var names

* fix C shape cal

* make style

* remove unused imports

* cleanup

* add back patch_size

* initial dit doc

* typo

* Update docs/source/api/pipelines/dit.mdx

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* added copyright license headers

* added example usage and toc

* fix variable names asserts

* remove comment

* added docs

* fix typo

* upstream changes

* set proper device for drop_ids

* added initial dit pipeline test

* update docs

* fix imports

* make fix-copies

* isort

* fix imports

* get rid of more magic numbers

* fix code when guidance is off

* remove block_kwargs

* cleanup script

* removed to_2tuple

* use FeedForward class instead of another MLP

* style

* work on mergint DiTBlock with BasicTransformerBlock

* added missing final_dropout and args to BasicTransformerBlock

* use norm from block

* fix arg

* remove unused arg

* fix call to class_embedder

* use timesteps

* make style

* attn_output gets multiplied

* removed commented code

* use Transformer2D

* use self.is_input_patches

* fix flags

* fixed conversion to use Transformer2DModel

* fixes for pipeline

* remove dit.py

* fix timesteps device

* use randn_tensor and fix fp16 inf.

* timesteps_emb already the right dtype

* fix dit test class

* fix test and style

* fix norm2 usage in vq-diffusion

* added author names to pipeline and lmagenet labels link

* fix tests

* use norm_type as string

* rename dit to transformer

* fix name

* fix test

* set  norm_type = "layer" by default

* fix tests

* do not skip common tests

* Update src/diffusers/models/attention.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* revert AdaLayerNorm API

* fix norm_type name

* make sure all components are in eval mode

* revert norm2 API

* compact

* finish deprecation

* add slow tests

* remove @

* refactor some stuff

* upload

* Update src/diffusers/pipelines/dit/pipeline_dit.py

* finish more

* finish docs

* improve docs

* finish docs

Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: William Berman <WLBberman@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-01-17 23:09:29 +01:00
Patrick von Platen
8a3f0c1f71 [Conversion] Improve safetensors (#1989) 2023-01-16 14:26:56 +01:00
蓝色的秋风
651c5adf8a [Conversion] Support convert diffusers to safetensors (#1996)
fix: support diffusers to safetensors
2023-01-16 12:58:01 +01:00
Katsuya
9147c4c954 Fix unused upcast_attn flag in convert_original_stable_diffusion_to_diffusers script (#1942)
Fix unused upcast_attn flag in sd to diffusers script
2023-01-12 19:55:40 +01:00
Patrick von Platen
beb932c5d1 [Conversion SD] Make sure weirdly sorted keys work as well (#1959) 2023-01-10 01:23:14 +01:00
Patrick von Platen
409387889d [Conversion] Make sure ema weights are extracted correctly (#1937)
* [Conversion] Make sure ema weights are extracted correctly

* up

* finish
2023-01-06 07:08:39 +01:00
Patrick von Platen
d67c305120 allow conversion from no state dict checkpoints 2023-01-03 19:48:13 +00:00
Patrick von Platen
8ed08e4270 [Deterministic torch randn] Allow tensors to be generated on CPU (#1902)
* [Deterministic torch randn] Allow tensors to be generated on CPU

* fix more

* up

* fix more

* up

* Update src/diffusers/utils/torch_utils.py

Co-authored-by: Anton Lozhkov <anton@huggingface.co>

* Apply suggestions from code review

* up

* up

* Apply suggestions from code review

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

Co-authored-by: Anton Lozhkov <anton@huggingface.co>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2023-01-03 18:22:40 +01:00
Patrick von Platen
29b2c93c90 Make repo structure consistent (#1862)
* move files a bit

* more refactors

* fix more

* more fixes

* fix more onnx

* make style

* upload

* fix

* up

* fix more

* up again

* up

* small fix

* Update src/diffusers/__init__.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* correct

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2022-12-30 11:51:08 +01:00
Will Berman
53c8147afe unCLIP image variation (#1781)
* unCLIP image variation

* remove prior comment re: @pcuenca

* stable diffusion -> unCLIP re: @pcuenca

* add copy froms re: @patil-suraj
2022-12-28 14:17:09 +01:00
camenduru
1f1b6c6544 Device to use (e.g. cpu, cuda:0, cuda:1, etc.) (#1844)
* Device to use (e.g. cpu, cuda:0, cuda:1, etc.)

* "cuda" if torch.cuda.is_available() else "cpu"
2022-12-27 14:42:56 +01:00
Mikołaj Siedlarek
8890758823 Correct help text for scheduler_type flag in scripts. (#1749) 2022-12-19 11:27:23 +01:00
Will Berman
2dcf64b72a kakaobrain unCLIP (#1428)
* [wip] attention block updates

* [wip] unCLIP unet decoder and super res

* [wip] unCLIP prior transformer

* [wip] scheduler changes

* [wip] text proj utility class

* [wip] UnCLIPPipeline

* [wip] kakaobrain unCLIP convert script

* [unCLIP pipeline] fixes re: @patrickvonplaten

remove callbacks

move denoising loops into call function

* UNCLIPScheduler re: @patrickvonplaten

Revert changes to DDPMScheduler. Make UNCLIPScheduler, a modified
DDPM scheduler with changes to support karlo

* mask -> attention_mask re: @patrickvonplaten

* [DDPMScheduler] remove leftover change

* [docs] PriorTransformer

* [docs] UNet2DConditionModel and UNet2DModel

* [nit] UNCLIPScheduler -> UnCLIPScheduler

matches existing unclip naming better

* [docs] SchedulingUnCLIP

* [docs] UnCLIPTextProjModel

* refactor

* finish licenses

* rename all to attention_mask and prep in models

* more renaming

* don't expose unused configs

* final renaming fixes

* remove x attn mask when not necessary

* configure kakao script to use new class embedding config

* fix copies

* [tests] UnCLIPScheduler

* finish x attn

* finish

* remove more

* rename condition blocks

* clean more

* Apply suggestions from code review

* up

* fix

* [tests] UnCLIPPipelineFastTests

* remove unused imports

* [tests] UnCLIPPipelineIntegrationTests

* correct

* make style

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-12-18 15:15:30 -08:00
apolinario
b417042291 Fix wrong type checking in convert_diffusers_to_original_stable_diffusion.py (#1681)
* Fix type checking remainders

* Remove IS_V20_MODEL flag always being True

Co-authored-by: apolinario <joaopaulo.passos+multimodal@gmail.com>
2022-12-13 12:44:20 +01:00
Patrick von Platen
3ce6380d3a [SD] Make sure scheduler is correct when converting (#1667) 2022-12-12 16:57:48 +01:00
Cyberes
d2dc4de303 Handle missing global_step key in scripts/convert_original_stable_diffusion_to_diffusers.py (#1612)
handle missing global_step key and don't download config if it already exists
2022-12-12 16:10:52 +01:00
lawfordp2017
31444f5790 Add text encoder conversion (#1559)
* Initial code for attempt at improving SD <--> diffusers conversions for v2.0

* Updates to support round-trip between orig. SD 2.0 and diffusers models

* Corrected formatting to Black standard

* Correcting import formatting

* Fixed imports (properly this time)

* add some corrections

* remove inference files

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-12-12 10:07:42 +01:00
Patrick von Platen
896c98a2ae Add paint by example (#1533)
* add paint by example

* mkae loading possibel

* up

* Update src/diffusers/models/attention.py

* up

* finalize weight structure

* make example work

* make it work

* up

* up

* fix

* del

* add

* update

* Apply suggestions from code review

* correct transformer 2d

* finish

* up

* up

* up

* up

* fix

* Apply suggestions from code review

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Apply suggestions from code review

* up

* finish

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2022-12-07 11:06:30 +01:00
Patrick von Platen
922d56a19c Correct type from int to str in conversion script sd 2022-12-05 18:51:29 +00:00
Patrick von Platen
f21415d1d9 Update conversion script to correctly handle SD 2 (#1511)
* Conversion SD 2

* finish
2022-12-02 12:28:01 +01:00
Anton Lozhkov
e65b71aba4 Add an explicit --image_size to the conversion script (#1509)
* Add an explicit `--image_size` to the conversion script

* style
2022-12-01 19:22:48 +01:00
Anton Lozhkov
86aa747da9 Fix ONNX conversion and inference (#1416) 2022-11-25 14:51:17 +01:00