mirror of https://github.com/huggingface/diffusers.git synced 2026-01-27 17:22:53 +03:00
Commit Graph

1769 Commits

Author SHA1 Message Date
Will Berman
fd5c3c09af misc fixes (#2282)
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-02-08 09:02:42 -08:00
Patrick von Platen
648090e26e fix pix2pix docs (#2290) 2023-02-08 16:38:18 +01:00
Patrick von Platen
1ed6b77781 [Examples] Test all examples on CPU (#2289)
* [Examples] Test all examples on CPU

* add

* correct

* Apply suggestions from code review
2023-02-08 15:59:13 +01:00
Chenguo Lin
9d0d070996 EMA: fix state_dict() and load_state_dict() & add cur_decay_value (#2146)
* EMA: fix `state_dict()` & add `cur_decay_value`

* EMA: fix a bug in `load_state_dict()`

'float' object (`state_dict["power"]`) has no attribute 'get'.

* del train_unconditional_ort.py
2023-02-08 10:44:50 +01:00
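The `'float' object has no attribute 'get'` error quoted in the commit above is what Python raises when `.get` is called on a value taken out of the state dict rather than on the dict itself. The following is only a minimal sketch of that failure class and a round-trip that avoids it; `TinyEMAState` and its fields are hypothetical stand-ins, not the diffusers EMA implementation.

```python
class TinyEMAState:
    """Hypothetical, stripped-down holder for EMA hyperparameters."""

    def __init__(self, decay=0.999, power=2 / 3):
        self.decay = decay
        self.power = power

    def state_dict(self):
        # Plain dict of floats, so it can be saved and reloaded as-is.
        return {"decay": self.decay, "power": self.power}

    def load_state_dict(self, state_dict):
        # Call .get on the dict. Writing state_dict["power"].get(...) would
        # call .get on a float and raise AttributeError.
        self.decay = state_dict.get("decay", self.decay)
        self.power = state_dict.get("power", self.power)


source = TinyEMAState(decay=0.9, power=0.5)
restored = TinyEMAState()
restored.load_state_dict(source.state_dict())
```

Falling back to the current attribute value when a key is absent keeps older checkpoints loadable in this sketch; whether the real class does the same is an assumption.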
Isamu Isozaki
c1971a53bc Textual inv save log memory (#2184)
* Quality check and adding tokenizer

* Adapted stable diffusion to mixed precision+finished up style fixes

* Fixed based on patrick's review

* Fixed oom from number of validation images

* Removed unnecessary np.array conversion

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-02-08 10:37:10 +01:00
Patrick von Platen
41db2dbf90 correct tests 2023-02-08 11:12:51 +02:00
Patrick von Platen
a7ca03aa85 Replace flake8 with ruff and update black (#2279)
* before running make style

* remove left overs from flake8

* finish

* make fix-copies

* final fix

* more fixes
2023-02-07 23:46:23 +01:00
Patrick von Platen
f5ccffecf7 Use accelerate save & loading hooks to have better checkpoint structure (#2048)
* better accelerated saving

* up

* finish

* finish

* uP

* up

* up

* fix

* Apply suggestions from code review

* correct ema

* Remove @

* up

* Apply suggestions from code review

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/training/dreambooth.mdx

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

---------

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2023-02-07 20:03:59 +01:00
Pedro Cuenca
e619db24be mps cross-attention hack: don't crash on fp16 (#2258)
* mps cross-attention hack: don't crash on fp16

* Make conversion explicit.
2023-02-07 19:51:33 +01:00
wfng92
111228cb39 Fix torchvision.transforms and transforms function naming clash (#2274)
* Fix torchvision.transforms and transforms function naming clash

* Update unconditional script for onnx

* Apply suggestions from code review

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2023-02-07 17:36:32 +01:00
Patrick von Platen
bbb46ad3d5 [Tests] Fix slow tests (#2271) 2023-02-07 14:42:12 +01:00
wfng92
b1dad2e9d3 Make center crop and random flip as args for unconditional image generation (#2259)
* Add center crop and horizontal flip to args

* Update command to use center crop and random flip

* Add center crop and horizontal flip to args

* Update command to use center crop and random flip
2023-02-07 11:58:31 +01:00
Patrick von Platen
cd52475560 [Examples] Remove datasets import that is not needed (#2267)
* [Examples] Remove datasets import that is not needed

* remove from lora as well
2023-02-07 11:55:34 +01:00
Patrick von Platen
0f04e799dc fix vae pt script 2023-02-07 08:34:19 +00:00
YiYi Xu
1051ca81a6 Stable Diffusion Latent Upscaler (#2059)
* Modify UNet2DConditionModel

- allow skipping mid_block

- adding a norm_group_size argument so that we can set the `num_groups` for group norm using `num_channels//norm_group_size`

- allow user to set dimension for the timestep embedding (`time_embed_dim`)

- the kernel_size for `conv_in` and `conv_out` is now configurable

- add random fourier feature layer (`GaussianFourierProjection`) for `time_proj`

- allow user to add the time and class embeddings before passing through the projection layer together - `time_embedding(t_emb + class_label)`

- added 2 arguments `attn1_types` and `attn2_types`

  * currently we have the argument `only_cross_attention`: when it's set to `True`, we get a
`BasicTransformerBlock` with 2 cross-attentions, otherwise we
get a self-attention followed by a cross-attention; in k-upscaler, we need blocks that include just one cross-attention, or self-attention -> cross-attention;
so I added `attn1_types` and `attn2_types` to the unet's argument list to let the user specify the attention types for the 2 positions in each block; note that I still kept
the `only_cross_attention` argument for the unet for easy configuration, but it will be converted to `attn1_type` and `attn2_type` when passing down to the down blocks

- the position of downsample layer and upsample layer is now configurable

- in k-upscaler unet, there is only one skip connection per each up/down block (instead of each layer in stable diffusion unet), added `skip_freq = "block"` to support
this use case

- if the user passes attention_mask to the unet, it will prepare the mask and pass a flag to the cross attention processor to skip the `prepare_attention_mask` step
inside the cross attention block

add up/down blocks for k-upscaler

modify CrossAttention class

- make the `dropout` layer in `to_out` optional

- `use_conv_proj` - use conv instead of linear for all projection layers (i.e. `to_q`, `to_k`, `to_v`, `to_out`) whenever possible. note that when it's used to do cross
attention, to_k, to_v has to be linear because the `encoder_hidden_states` is not 2d

- `cross_attention_norm` - add an optional layernorm on encoder_hidden_states

- `attention_dropout`: add an optional dropout on attention score

adapt BasicTransformerBlock

- add an ada groupnorm layer  to conditioning attention input with timestep embedding

- allow skipping the FeedForward layer in between the attentions

- replaced the only_cross_attention argument with attn1_type and attn2_type for more flexible configuration

update timestep embedding: add new act_fn  gelu and an optional act_2

modified ResnetBlock2D

- refactored with AdaGroupNorm class (the timestep scale shift normalization)

- add `mid_channel` argument - allow the first conv to have a different output dimension from the second conv

- add option to use input AdaGroupNorm on the input instead of groupnorm

- add options to add a dropout layer after each conv

- allow user to set the bias in conv_shortcut (needed for k-upscaler)

- add gelu

adding conversion script for k-upscaler unet

add pipeline

* fix attention mask

* fix a typo

* fix a bug

* make sure model can be used with GPU

* make pipeline work with fp16

* fix an error in BasicTransformerBlock

* make style

* fix typo

* some more fixes

* uP

* up

* correct more

* some clean-up

* clean time proj

* up

* uP

* more changes

* remove the upcast_attention=True from unet config

* remove attn1_types, attn2_types etc

* fix

* revert incorrect changes up/down samplers

* make style

* remove outdated files

* Apply suggestions from code review

* attention refactor

* refactor cross attention

* Apply suggestions from code review

* update

* up

* update

* Apply suggestions from code review

* finish

* Update src/diffusers/models/cross_attention.py

* more fixes

* up

* up

* up

* finish

* more corrections of conversion state

* act_2 -> act_2_fn

* remove dropout_after_conv from ResnetBlock2D

* make style

* simplify KAttentionBlock

* add fast test for latent upscaler pipeline

* add slow test

* slow test fp16

* make style

* add doc string for pipeline_stable_diffusion_latent_upscale

* add api doc page for latent upscaler pipeline

* deprecate attention mask

* clean up embeddings

* simplify resnet

* up

* clean up resnet

* up

* correct more

* up

* up

* improve a bit more

* correct more

* more clean-ups

* Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* add docstrings for new unet config

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* # Copied from

* encode the image if not latent

* remove force casting vae to fp32

* fix

* add comments about preconditioning parameters from k-diffusion paper

* attn1_type, attn2_type -> add_self_attention

* clean up get_down_block and get_up_block

* fix

* fixed a typo(?) in ada group norm

* update slice attention processor for cross attention

* update slice

* fix fast test

* update the checkpoint

* finish tests

* fix-copies

* fix-copy for modeling_text_unet.py

* make style

* make style

* fix f-string

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fix import

* correct changes

* fix resnet

* make fix-copies

* correct euler scheduler

* add missing #copied from for preprocess

* revert

* fix

* fix copies

* Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update src/diffusers/models/cross_attention.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* clean up conversion script

* KDownsample2d,KUpsample2d -> KDownsample2D,KUpsample2D

* more

* Update src/diffusers/models/unet_2d_condition.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* remove prepare_extra_step_kwargs

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fix a typo in timestep embedding

* remove num_image_per_prompt

* fix fasttest

* make style + fix-copies

* fix

* fix xformer test

* fix style

* doc string

* make style

* fix-copies

* docstring for time_embedding_norm

* make style

* final finishes

* make fix-copies

* fix tests

---------

Co-authored-by: yiyixuxu <yixu@yis-macbook-pro.lan>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2023-02-07 09:11:57 +01:00
Patrick von Platen
3b66cc0fc1 make style 2023-02-07 08:11:22 +00:00
chavinlo
717a956a02 Create convert_vae_pt_to_diffusers.py (#2215)
* Create convert_vae_pt_to_diffusers.py

Just a simple script to convert VAE.pt files to diffusers format
Tested with: https://huggingface.co/WarriorMama777/OrangeMixs/blob/main/VAEs/orangemix.vae.pt

* Update convert_vae_pt_to_diffusers.py

Forgot to add the function call

* make style

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: chavinlo <example@example.com>
2023-02-07 09:10:34 +01:00
Jorge C. Gomes
d43972ae71 Fixes prompt input checks in StableDiffusion img2img pipeline (#2206)
* Fixes prompt input checks in img2img

Allows providing prompt_embeds instead of the prompt, which is not currently possible as the first check fails.
This becomes the same as the function found in 8267c78445/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py (L393)

* Continues the fix

This also needs to be fixed. Becomes consistent with 8267c78445/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py (L558)

I've now tested this implementation, and it produces the expected results.
2023-02-07 09:10:24 +01:00
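The commit above describes letting callers supply `prompt_embeds` in place of `prompt`. A minimal sketch of such an input check follows, assuming exactly one of the two must be given; `check_prompt_inputs` is a hypothetical helper written for illustration and is not the pipeline's actual method.

```python
def check_prompt_inputs(prompt=None, prompt_embeds=None):
    """Hypothetical validator: accept a prompt or precomputed embeddings."""
    if prompt is not None and prompt_embeds is not None:
        raise ValueError("Pass either `prompt` or `prompt_embeds`, not both.")
    if prompt is None and prompt_embeds is None:
        raise ValueError("Pass one of `prompt` or `prompt_embeds`.")
    # Only validate the prompt type when a prompt was actually given, so a
    # call with embeddings alone is not rejected by the first check.
    if prompt is not None and not isinstance(prompt, (str, list)):
        raise ValueError(f"`prompt` must be str or list, got {type(prompt)}.")


check_prompt_inputs(prompt="a photo of a cat")
check_prompt_inputs(prompt_embeds=[[0.1, 0.2]])  # embeddings only is valid
```

Checking the type of `prompt` before checking whether it is `None` is the kind of ordering that would make an embeddings-only call fail, which matches the symptom the commit message reports.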
Fazzie-Maqianli
ffed2420c4 fix distributed init twice (#2252)
fix colossalai dreambooth
2023-02-07 08:55:39 +01:00
Pedro Cuenca
8178c840f2 Mention training problems with xFormers 0.0.16 (#2254) 2023-02-06 11:19:26 +01:00
nickkolok
3a0d3da66f Fix a typo: bfloa16 -> bfloat16 (#2243) 2023-02-06 09:14:08 +01:00
psychedelicious
22c1ba56c2 Fix k_dpm_2 & k_dpm_2_a on MPS (#2241)
Needed to convert `timesteps` to `float32` a bit sooner.

Fixes #1537
2023-02-05 23:45:15 +01:00
Pedro Cuenca
7386e7730c Show error when loading safety_checker from_flax (#2187)
* Show error when loading safety_checker `from_flax`

* fix style
2023-02-04 20:55:11 +01:00
Pedro Cuenca
154a7865fc [Flax DDPM] Make key optional so default pipelines don't fail (#2176)
Make `key` optional so default pipelines don't fail.
2023-02-04 20:45:20 +01:00
Robin Hutmacher
9baa29e9c0 Fix typo in StableDiffusionInpaintPipeline (#2197)
* Fix typo in StableDiffusionInpaintPipeline

* Add embedded prompt handling

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-02-03 19:03:15 +01:00
Jorge C. Gomes
58c416ab0c Fixes LoRAXFormersCrossAttnProcessor (#2207)
Related to #2124 
The current implementation is throwing a shape mismatch error, which makes sense, as this line is obviously missing compared to XFormersCrossAttnProcessor and LoRACrossAttnProcessor.

I don't have formal tests, but I compared `LoRACrossAttnProcessor` and `LoRAXFormersCrossAttnProcessor` ad-hoc, and they produce the same results with this fix.
2023-02-03 18:10:48 +01:00
Isamu Isozaki
d46d78c584 Hotfix textual inv logging (#2183) 2023-02-03 18:08:46 +01:00
Patrick von Platen
05168e5d83 make style 2023-02-03 19:03:13 +02:00
Justin Merrell
948022e1e8 fix: flagged_images implementation (#1947)
Flagged images would be set to the blank image instead of the original image that contained the NSFW concept for optional viewing.

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-02-03 18:02:56 +01:00
Patrick von Platen
2f9a70aa85 [LoRA] Make sure validation works in multi GPU setup (#2172)
* [LoRA] Make sure validation works in multi GPU setup

* more fixes

* up
2023-02-03 16:50:10 +01:00
Sayak Paul
e43e206dc7 removes ~s in favor of full-fledged links. (#2229)
remove ~ in favor of full-fledged links.
2023-02-03 20:18:39 +05:30
Will Berman
99c39b4012 [nit] negative_prompt typo (#2227)
* negative_prompt typo

* fix
2023-02-03 14:05:50 +01:00
dymil
7547f9b475 Fix timestep dtype in legacy inpaint (#2120)
* Fix timestep dtype in legacy inpaint

This matches the structure in the text2img, img2img, and inpaint ONNX pipelines

* Fix style in dtype patch
2023-02-03 13:04:21 +01:00
Prathik Rao
a87e87fcbe refactor onnxruntime integration (#2042)
* refactor onnxruntime integration

* fix requirements.txt bug

* make style

* add support for textual_inversion

* make style

* add readme

* cleanup README files

* 1/27/2023 update to training scripts

* make style

* 1/30 update to train_unconditional

* style with black-22.8.0

---------

Co-authored-by: Prathik Rao <prathikrao@microsoft.com>
Co-authored-by: anton- <anton@huggingface.co>
2023-02-03 12:04:59 +01:00
Dudu Moshe
ecadcdefe1 [Bug] scheduling_ddpm: fix variance in the case of learned_range type. (#2090)
scheduling_ddpm: fix variance in the case of learned_range type.

In the case of the learned_range variance type, the logs and the
exponent are missing compared to the theory (see "Improved Denoising Diffusion
Probabilistic Models" section 3.1 equation 15:
https://arxiv.org/pdf/2102.09672.pdf).
2023-02-03 09:42:42 +01:00
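Equation 15 of the paper cited in the commit above interpolates between the two variance bounds in log space and then exponentiates: exp(v log β_t + (1 - v) log β̃_t). A minimal sketch of that formula is shown here; the function name is hypothetical, and it assumes `v` has already been mapped into [0, 1], which is not something the commit message states.

```python
import math


def learned_range_variance(v, beta_t, beta_tilde_t):
    """Interpolate between log(beta_t) and log(beta_tilde_t), then exponentiate.

    v is assumed to lie in [0, 1]: v = 1 gives beta_t, v = 0 gives beta_tilde_t.
    """
    log_variance = v * math.log(beta_t) + (1.0 - v) * math.log(beta_tilde_t)
    return math.exp(log_variance)


upper = learned_range_variance(1.0, beta_t=0.02, beta_tilde_t=0.01)
lower = learned_range_variance(0.0, beta_t=0.02, beta_tilde_t=0.01)
middle = learned_range_variance(0.5, beta_t=0.02, beta_tilde_t=0.01)
```

Dropping the logs and the exponent would turn this into a linear mix of the two variances, which gives a different value for any `v` strictly between 0 and 1.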
Pedro Cuenca
2bbd532990 Docs: short section on changing the scheduler in Flax (#2181)
* Short doc on changing the scheduler in Flax.

* Apply fix from @patil-suraj

Co-authored-by: Suraj Patil <surajp815@gmail.com>

---------

Co-authored-by: Suraj Patil <surajp815@gmail.com>
2023-02-02 18:52:21 +01:00
Adalberto
68ef0666e2 Create train_dreambooth_inpaint_lora.py (#2205)
* Create train_dreambooth_inpaint_lora.py

* Update train_dreambooth_inpaint_lora.py

* Update train_dreambooth_inpaint_lora.py

* Update train_dreambooth_inpaint_lora.py

* Update train_dreambooth_inpaint_lora.py
2023-02-02 13:15:15 +01:00
Kashif Rasul
7ac95703cd add CITATION.cff (#2211)
add citation.cff
2023-02-02 12:46:44 +01:00
Pedro Cuenca
3816c9ad9f Update xFormers docs (#2208)
Update xFormers docs.
2023-02-01 19:56:32 +01:00
Patrick von Platen
8267c78445 [Loading] Better error message on missing keys (#2198)
* up

* finish
2023-02-01 14:22:39 +01:00
Muyang Li
4fc7084875 Fix a dimension bug in Transform2d (#2144)
The dimension does not match when `inner_dim` is not equal to `in_channels`.
2023-02-01 10:11:45 +01:00
Sayak Paul
9213d81bd0 add: guide on kerascv conversion tool. (#2169)
* add: guide on kerascv conversion tool.

* Apply suggestions from code review

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>

* address additional suggestions from review.

* change links to documentation-images.

* add separate links for training and inference goodies from diffusers.

* address Patrick's comments.

---------

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
2023-02-01 09:41:00 +01:00
Asad Memon
dd3cae3327 Pass LoRA rank to LoRALinearLayer (#2191) 2023-02-01 09:40:02 +01:00
Patrick von Platen
f73d0b6bec [Docs] remove license (#2188) 2023-01-31 22:11:32 +01:00
Patrick von Platen
d0d7ffffbd [Docs] Add components to docs (#2175) 2023-01-31 22:11:14 +01:00
Abhishek Varma
87cf88ed3d Use requests instead of wget in convert_from_ckpt.py (#2168)
-- This commit adopts `requests` in place of `wget` to fetch config `.yaml`
   files as part of `load_pipeline_from_original_stable_diffusion_ckpt` API.
-- This was done because in Windows PowerShell one needs to explicitly ensure
   that `wget` binary is part of the PATH variable. If not present, this leads
   to the code not being able to download the `.yaml` config file.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
Co-authored-by: Abhishek Varma <abhishek@nod-labs.com>
2023-01-31 14:35:45 +01:00
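The motivation in the commit above is that an in-process HTTP call does not depend on a `wget` binary being on the PATH. A minimal sketch of fetching a config file's text with `requests` follows; `fetch_text` and its injectable `get` parameter are hypothetical and exist only so the sketch can be exercised offline, and this is not the code used in `convert_from_ckpt.py`.

```python
def fetch_text(url, get=None, timeout=30):
    """Return the body of `url` as text, without shelling out to wget."""
    if get is None:
        # Imported lazily so the sketch can run with a stand-in `get`
        # when the third-party `requests` package is not installed.
        import requests

        get = requests.get
    response = get(url, timeout=timeout)
    response.raise_for_status()  # surface HTTP errors instead of parsing an error page
    return response.text


class FakeResponse:
    """Stand-in response object for offline use."""

    text = "model:\n  target: demo\n"

    def raise_for_status(self):
        pass


def fake_get(url, timeout):
    return FakeResponse()


config_text = fetch_text("https://example.invalid/config.yaml", get=fake_get)
```

In real use the `get` argument would be omitted so that `requests.get` performs the download.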
Patrick von Platen
60d915fbed make style 2023-01-31 11:46:48 +00:00
1lint
d1efefe15e [Breaking change] fix legacy inpaint noise and resize mask tensor (#2147)
* fix legacy inpaint noise and resize mask tensor

* updated legacy inpaint pipe test expected_slice
2023-01-31 12:44:35 +01:00
Sayak Paul
7d96b38b70 [examples] Fix CLI argument in the launch script command for text2image with LoRA (#2171)
* Update README.md

* Update README.md
2023-01-31 09:47:09 +01:00
Dudu Moshe
cedafb8600 [Bug]: fix DDPM scheduler arbitrary infer steps count. (#2076)
scheduling_ddpm: fix evaluate with lower timesteps count than train.

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-01-31 09:13:26 +01:00