diffusers

mirror of https://github.com/huggingface/diffusers.git synced 2026-01-27 17:22:53 +03:00

Author	SHA1	Message	Date
Suraj Patil	63f767ef15	Add SVD (#5895 ) * begin model * finish blocks * add_embedding * addition_time_embed_dim * use TimestepEmbedding * fix temporal res block * fix time_pos_embed * fix add_embedding * add conversion script * fix model * up * add new resnet blocks * make forward work * return sample in original shape * fix temb shape in TemporalResnetBlock * add spatio temporal transformers * add vae blocks * fix blocks * update * update * fix shapes in Alphablender and add time activation in res blcok * use new blocks * style * fix temb shape * fix SpatioTemporalResBlock * reuse TemporalBasicTransformerBlock * fix TemporalBasicTransformerBlock * use TransformerSpatioTemporalModel * fix TransformerSpatioTemporalModel * fix time_context dim * clean up * make temb optional * add blocks * rename model * update conversion script * remove UNetMidBlockSpatioTemporal * add in init * remove unused arg * remove unused arg * remove more unsed args * up * up * check for None * update vae * update up/mid blocks for decoder * begin pipeline * adapt scheduler * add guidance scalings * fix norm eps in temporal transformers * add temporal autoencoder * make pipeline run * fix frame decodig * decode in float32 * decode n frames at a time * pass decoding_t to decode_latents * fix decode_latents * vae encode/decode in fp32 * fix dtype in TransformerSpatioTemporalModel * type image_latents same as image_embeddings * allow using differnt eps in temporal block for video decoder * fix default values in vae * pass num frames in decode * switch spatial to temporal for mixing in VAE * fix num frames during split decoding * cast alpha to sample dtype * fix attention in MidBlockTemporalDecoder * fix typo * fix guidance_scales dtype * fix missing activation in TemporalDecoder * skip_post_quant_conv * add vae conversion * style * take guidance scale as input * up * allow passing PIL to export_video * accept fps as arg * add pipeline and vae in init * remove hack * use AutoencoderKLTemporalDecoder * don't scale image latents * add unet tests * clean up unet * clean TransformerSpatioTemporalModel * add slow svd test * clean up * make temb optional in Decoder mid block * fix norm eps in TransformerSpatioTemporalModel * clean up temp decoder * clean up * clean up * use c_noise values for timesteps * use math for log * update * fix copies * doc * upcast vae * update forward pass for gradient checkpointing * make added_time_ids is tensor * up * fix upcasting * remove post quant conv * add _resize_with_antialiasing * fix _compute_padding * cleanup model * more cleanup * more cleanup * more cleanup * remove freeu * remove attn slice * small clean * up * up * remove extra step kwargs * remove eta * remove dropout * remove callback * remove merge factor args * clean * clean up * move to dedicated folder * remove attention_head_dim * docstr and small fix * update unet doc strings * rename decoding_t * correct linting * store c_skip and c_out * cleanup * clean TemporalResnetBlock * more cleanup * clean up vae * clean up * begin doc * more cleanup * up * up * doc * Improve * better naming * better naming * better naming * better naming * better naming * better naming * better naming * better naming * Apply suggestions from code review * Default chunk size to None * add example * Better * Apply suggestions from code review * update doc * Update src/diffusers/pipelines/stable_diffusion_video/pipeline_stable_diffusion_video.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * style * Get torch compile working * up * rename * fix doc * add chunking * torch compile * torch compile * add modelling outputs * torch compile * Improve chunking * Apply suggestions from code review * Update docs/source/en/using-diffusers/svd.md * Close diff tag * remove slicing * resnet docstr * add docstr in resnet * rename * Apply suggestions from code review * update tests * Fix output type latents * fix more * fix more * Update docs/source/en/using-diffusers/svd.md * fix more * add pipeline tests * remove unused arg * clean up * make sure get_scaling receives tensors * fix euler scheduler * fix get_scalings * simply euler for now * remove old test file * use randn_tensor to create noise * fix device for rand tensor * increase expected_max_difference * fix test_inference_batch_single_identical * actually fix test_inference_batch_single_identical * disable test_save_load_float16 * skip test_float16_inference * skip test_inference_batch_single_identical * fix test_xformers_attention_forwardGenerator_pass * Apply suggestions from code review * update StableVideoDiffusionPipelineSlowTests * update image * add diffusers example * fix more --------- Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: apolinário <joaopaulo.passos@gmail.com>	2023-11-29 19:13:36 +01:00
Patrick von Platen	b978334d71	[@cene555][Kandinsky 3.0] Add Kandinsky 3.0 (#5913 ) * finalize * finalize * finalize * add slow test * add slow test * add slow test * Fix more * add slow test * fix more * fix more * fix more * fix more * fix more * fix more * fix more * fix more * fix more * Better * Fix more * Fix more * add slow test * Add auto pipelines * add slow test * Add all * add slow test * add slow test * add slow test * add slow test * add slow test * Apply suggestions from code review * add slow test * add slow test	2023-11-24 17:46:00 +01:00
Kashif Rasul	6b04d61cf6	[Styling] stylify using ruff (#5841 ) * ruff format * not need to use doc-builder's black styling as the doc is styled in ruff * make fix-copies * comment * use run_ruff	2023-11-20 11:48:34 +01:00
Lucain	c896b841e4	Set `usedforsecurity=False` in hashlib methods (FIPS compliance) (#5790 ) * Set usedforsecurity=False in hashlib methods (FIPS compliance) * update version dependency * bump hfh version * bump hfh version	2023-11-17 14:56:58 +01:00
Will Berman	2fd46405cd	consistency decoder (#5694 ) * consistency decoder * rename * Apply suggestions from code review Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * Update src/diffusers/pipelines/consistency_models/pipeline_consistency_models.py * uP * Apply suggestions from code review * uP * uP * uP --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>	2023-11-09 12:21:41 +01:00
Sayak Paul	d61889fc17	[Feat] PixArt-Alpha (#5642 ) * init pixart alpha pipeline * fix: import * script * script * script * add: vae to the pipeline * add: vae_scale_factor * add: checkpoint_path * clean conversion script a bit. * size embeddings. * fix: size embedding * update scrip * support for interpolation of position embedding. * support for conditioning. * .. * .. * .. * final layer * final layer * align if encode_prompt * support for caption embedding * refactor * refactor * refactor * start cross attention * start cross attention * cross_attention_dim * cross * cross * support for resolution and aspect_ratio * support for caption projection * refactor patch embeddings * batch_size * up * commit * commit * commit. * squeeze * squeeze * squeeze * squeeze * squeeze * squeeze * squeeze * squeeze * squeeze * squeeze * squeeze * squeeze. * squeeze. * fix final block./ * fix final block./ * fix final block./ * clean * fix: interpolation scale. * debugging' * debugging' * debugging' * debugging' * debugging' * debugging' * debugging' * debugging' * debugging' * debugging' * debugging' * debugging' * debugging' * debugging' * debugging' * debugging' * debugging' * debugging' * debugging' * debugging' * debugging' * debugging' * debugging' * debugging' * debugging' * debugging' * debugging' * debugging' * debugging' * debugging' * debugging' * debugging' * debugging' * debugging' * debugging' * debugging' * debugging' * debugging' * debugging' * debugging' * debugging' * debugging' * debugging * debugging * debugging * debugging * debugging * debugging * debugging * make --checkpoint_path non-required. * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * remove num_tokens * timesteps -> timestep * timesteps -> timestep * timesteps -> timestep * timesteps -> timestep * timesteps -> timestep * timesteps -> timestep * debug * debug * update conversion script. * update conversion script. * update conversion script. * debug * debug * debug * clean * debug * debug * debug * debug * debug * debug * debug * debug * deug * debug * debug * debug * fix * fix * fix * fix * fix * fix * fix * fix * fix * fix * fix * fix * fix * clean * fix * fix * boom * boom * some changes * boom * save * up * remove i * fix more tests * DPMSolverMultistepScheduler * fix * offloading * fix conversion script * fix conversion script * remove print * remove support for negative prompt embeds. * typo. * remove extra kwargs * bring conversion script to where it was * fix * trying mu luck * trying my luck again * again * again * again * clean up * up * up * update example * support for 512 * remove spacing * finalize docs. * test debug * fix: assertion values. * debug * debug * debug * fix: repeat * remove prints. * Apply suggestions from code review * Apply suggestions from code review * Correct more * Apply suggestions from code review * Change all * Clean more * fix more * Fix more * Fix more * Correct more * address patrick's comments. * remove unneeded args * clean up pipeline. * sty;e * make the use of additional conditions better conditioned. * None better * dtype * height and width validation * add a note about size brackets. * fix * spit out slow test outputs. * fix? * fix optional test * fix more * remove unneeded comment * debug --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2023-11-06 08:40:04 +01:00
dg845	cd1b8d7ca8	[WIP] Refactor UniDiffuser Pipeline and Tests (#4948 ) * Add VAE slicing and tiling methods. * Switch to using VaeImageProcessing for preprocessing and postprocessing of images. * Rename the VaeImageProcessor to vae_image_processor to avoid a name clash with the CLIPImageProcessor (image_processor). * Remove the postprocess() function because we're using a VaeImageProcessor instead. * Remove UniDiffuserPipeline.decode_image_latents because we're using VaeImageProcessor instead. * Refactor generating text from text latents into a decode_text_latents method. * Add enable_full_determinism() to UniDiffuser tests. * make style * Add PipelineLatentTesterMixin to UniDiffuserPipelineFastTests. * Remove enable_model_cpu_offload since it is now part of DiffusionPipeline. * Rename the VaeImageProcessor instance to self.image_processor for consistency with other pipelines and rename the CLIPImageProcessor instance to clip_image_processor to avoid a name clash. * Update UniDiffuser conversion script. * Make safe_serialization configurable in UniDiffuser conversion script. * Rename image_processor to clip_image_processor in UniDiffuser tests. * Add PipelineKarrasSchedulerTesterMixin to UniDiffuserPipelineFastTests. * Add initial test for compiling the UniDiffuser model (not tested yet). * Update encode_prompt and _encode_prompt to match that of StableDiffusionPipeline. * Turn off standard classifier-free guidance for now. * make style * make fix-copies * apply suggestions from review --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2023-10-02 18:24:55 +02:00
Ayush Mangal	157c9011d8	Add BLIP Diffusion (#4388 ) * Add BLIP Diffusion skeleton * Add other model components * Add BLIP2, need to change it for now * Fix pipeline imports * Load pretrained ViT * Make qformer fwd pass same * Replicate fwd passes * Fix device bug * Add accelerate functions * Remove extra functions from Blip2 * Minor bug * Integrate initial review changes * Refactoring * Refactoring * Refactor * Add controlnet * Refactor * Update conversion script * Add image processor * Shift postprocessing to ImageProcessor * Refactor * Fix device * Add fast tests * Update conversion script * Fix checkpoint conversion script * Integrate review changes * Integrate reivew changes * Remove unused functions from test * Reuse HF image processor in Cond image * Create new BlipImageProcessor based on transfomers * Fix image preprocessor * Minor * Minor * Add canny preprocessing * Fix controlnet preprocessing * Fix blip diffusion test * Add controlnet test * Add initial doc strings * Integrate review changes * Refactor * Update examples * Remove DDIM comments * Add copied from for prepare_latents * Add type anotations * Add docstrings * Do black formatting * Add batch support * Make tests pass * Make controlnet tests pass * Black formatting * Fix progress bar * Fix some licensing comments * Fix imports * Refactor controlnet * Make tests faster * Edit examples * Black formatting/Ruff * Add doc * Minor Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Move controlnet pipeline * Make tests faster * Fix imports * Fix formatting * Fix make errors * Fix make errors * Minor * Add suggested doc changes Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * Edit docs * Fix 16 bit loading * Update examples * Edit toctree * Update docs/source/en/api/pipelines/blip_diffusion.md Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * Minor * Add tips * Edit examples * Update model paths --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>	2023-09-21 17:05:35 +01:00
김태민	5b78141fd3	[FIX BUG] add config_files parser #5114 (#5115 ) * add config_files parser #5114 * add config_files parser_fix #5114	2023-09-20 16:17:47 +02:00
dg845	4c8a05f115	Fix Consistency Models UNet2DMidBlock2D Attention GroupNorm Bug (#4863 ) * Add attn_groups argument to UNet2DMidBlock2D to control theinternal Attention block's GroupNorm. * Add docstring for attn_norm_num_groups in UNet2DModel. * Since the test UNet config uses resnet_time_scale_shift == 'scale_shift', also set attn_norm_num_groups to 32. * Add test for attn_norm_num_groups to UNet2DModelTests. * Fix expected slices for slow tests. * Also fix tolerances for slow tests. --------- Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>	2023-09-15 11:27:51 +01:00
Vladimir Mandic	ef29b24fda	allow loading of sd models from safetensors without online lookups using local config files (#5019 ) finish config_files implementation	2023-09-14 12:30:15 +02:00
Kashif Rasul	16a056a7b5	Wuerstchen fixes (#4942 ) * fix arguments and make example code work * change arguments in combined test * Add default timesteps * style * fixed test * fix broken test * formatting * fix docstrings * fix num_images_per_prompt * fix doc styles * please dont change this * fix tests * rename to DEFAULT_STAGE_C_TIMESTEPS --------- Co-authored-by: Dominic Rampas <d6582533@gmail.com>	2023-09-11 15:47:53 +02:00
Kashif Rasul	541bb6ee63	Würstchen model (#3849 ) * initial * initial * added initial convert script for paella vqmodel * initial wuerstchen pipeline * add LayerNorm2d * added modules * fix typo * use model_v2 * embed clip caption amd negative_caption * fixed name of var * initial modules in one place * WuerstchenPriorPipeline * inital shape * initial denoising prior loop * fix output * add WuerstchenPriorPipeline to __init__.py * use the noise ratio in the Prior * try to save pipeline * save_pretrained working * Few additions * add _execution_device * shape is int * fix batch size * fix shape of ratio * fix shape of ratio * fix output dataclass * tests folder * fix formatting * fix float16 + started with generator * Update pipeline_wuerstchen.py * removed vqgan code * add WuerstchenGeneratorPipeline * fix WuerstchenGeneratorPipeline * fix docstrings * fix imports * convert generator pipeline * fix convert * Work on Generator Pipeline. WIP * Pipeline works with our diffuzz code * apply scale factor * removed vqgan.py * use cosine schedule * redo the denoising loop * Update src/diffusers/models/resnet.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * use torch.lerp * use warp-diffusion org * clip_sample=False, * some refactoring * use model_v3_stage_c * c_cond size * use clip-bigG * allow stage b clip to be None * add dummy * würstchen scheduler * minor changes * set clip=None in the pipeline * fix attention mask * add attention_masks to text_encoder * make fix-copies * add back clip * add text_encoder * gen_text_encoder and tokenizer * fix import * updated pipeline test * undo changes to pipeline test * nip * fix typo * fix output name * set guidance_scale=0 and remove diffuze * fix doc strings * make style * nip * removed unused * initial docs * rename * toc * cleanup * remvoe test script * fix-copies * fix multi images * remove dup * remove unused modules * undo changes for debugging * no new line * remove dup conversion script * fix doc string * cleanup * pass default args * dup permute * fix some tests * fix prepare_latents * move Prior class to modules * offload only the text encoder and vqgan * fix resolution calculation for prior * nip * removed testing script * fix shape * fix argument to set_timesteps * do not change .gitignore * fix resolution calculations + readme * resolution calculation fix + readme * small fixes * Add combined pipeline * rename generator -> decoder * Update .gitignore Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * removed efficient_net * create combined WuerstchenPipeline * make arguments consistent with VQ model * fix var names * no need to return text_encoder_hidden_states * add latent_dim_scale to config * split model into its own file * add WuerschenPipeline to docs * remove unused latent_size * register latent_dim_scale * update script * update docstring * use Attention preprocessor * concat with normed input * fix-copies * add docs * fix test * fix style * add to cpu_offloaded_model * updated type * remove 1-line func * updated type * initial decoder test * formatting * formatting * fix autodoc link * num_inference_steps is int * remove comments * fix example in docs * Update src/diffusers/pipelines/wuerstchen/diffnext.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * rename layernorm to WuerstchenLayerNorm * rename DiffNext to WuerstchenDiffNeXt * added comment about MixingResidualBlock * move paella vq-vae to pipelines' folder * initial decoder test * increased test_float16_inference expected diff * self_attn is always true * more passing decoder tests * batch image_embeds * fix failing tests * set the correct dtype * relax inference test * update prior * added combined pipeline test * faster test * faster test * Update src/diffusers/pipelines/wuerstchen/pipeline_wuerstchen_combined.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * fix issues from review * update wuerstchen.md + change generator name * resolve issues * fix copied from usage and add back batch_size * fix API * fix arguments * fix combined test * Added timesteps argument + fixes * Update tests/pipelines/test_pipelines_common.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update tests/pipelines/wuerstchen/test_wuerstchen_prior.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/diffusers/pipelines/wuerstchen/pipeline_wuerstchen_combined.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/diffusers/pipelines/wuerstchen/pipeline_wuerstchen_combined.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/diffusers/pipelines/wuerstchen/pipeline_wuerstchen_combined.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/diffusers/pipelines/wuerstchen/pipeline_wuerstchen_combined.py * up * Fix more * failing tests * up * up * correct naming * correct docs * correct docs * fix test params * correct docs * fix classifier free guidance * fix classifier free guidance * fix more * fix all * make tests faster --------- Co-authored-by: Dominic Rampas <d6582533@gmail.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Dominic Rampas <61938694+dome272@users.noreply.github.com>	2023-09-06 16:15:51 +02:00
Nguyễn Công Tú Anh	38466c369f	Add GLIGEN Text Image implementation (#4777 ) * Add GLIGEN Text Image implementation * add style transfer from image * fix check_repository_consistency * add convert script GLIGEN model to Diffusers * rename attention type * fix style code * remove PositionNetTextImage * Revert "fix check_repository_consistency" This reverts commit `15f098c96e`. * change attention type name * update docs for GLIGEN * change examples with hf-document-image * fix style * add CLIPImageProjection for GLIGEN * Add new encode_prompt, load project matrix in pipe init * move CLIPImageProjection to stable_diffusion * add comment	2023-09-01 15:48:01 +05:30
Alexsey Shestacov	3eeaf4e041	Fix convert_original_stable_diffusion_to_diffusers script (#4817 ) Fix stable diffusion conversion script	2023-08-29 09:14:45 +02:00
Sanchit Gandhi	b1290d3fb8	Convert MusicLDM (#4579 ) * from audioldm * fix vae * move to new pipeline * copied from audioldm * remove redundant control flow * iterate * fix docstring * finish pipeline * tests: from audioldm2 * iterate * finish fast tests * finish slow integration tests * add docs * remove dtype test * update toctree * "copied from" in conversion (where possible) * Update docs/source/en/api/pipelines/musicldm.md Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * fix docstring * make nightly * style * fix dtype test --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2023-08-25 13:31:00 +01:00
realliujiaxu	ecded50ad5	add convert diffuser pipeline of XL to original stable diffusion (#4596 ) convert diffuser pipeline of XL to original stable diffusion Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>	2023-08-22 19:11:06 +05:30
Sanchit Gandhi	7a24977ce3	Add AudioLDM 2 (#4549 ) * from audioldm * unet down + mid * vae, clap, flan-t5 * start sequence audio mae * iterate on audioldm encoder * finish encoder * finish weight conversion * text pre-processing * gpt2 pre-processing * fix projection model * working * unet equivalence * finish in base * add unet cond * finish unet * finish custom unet * start clean-up * revert base unet changes * refactor pre-processing * tests: from audioldm * fix some tests * more fixes * iterate on tests * make fix copies * harden fast tests * slow integration tests * finish tests * update checkpoint * update copyright * docs * remove outdated method * add docstring * make style * remove decode latents * enable cpu offload * (text_encoder_1, tokenizer_1) -> (text_encoder, tokenizer) * more clean up * more refactor * build pr docs * Update docs/source/en/api/pipelines/audioldm2.md Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * small clean * tidy conversion * update for large checkpoint * generate -> generate_language_model * full clap model * shrink clap-audio in tests * fix large integration test * fix fast tests * use generation config * make style * update docs * finish docs * finish doc * update tests * fix last test * syntax * finalise tests * refactor projection model in prep for TTS * fix fast tests * style --------- Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>	2023-08-21 12:34:21 +01:00
AisingioroHao	1b739e7344	Fixed invalid pipeline_class_name parameter. (#4590 ) * Fixed invalid pipeline_class_name parameter. * Fix the format	2023-08-14 17:21:17 +05:30
Abhipsha Das	c8d86e9f0a	Remove code snippets containing `is_safetensors_available()` (#4521 ) * [WIP] Remove code snippets containing `is_safetensors_available()` * Modifying `import_utils.py` * update pipeline tests for safetensor default * fix test related to cached requests * address import nits --------- Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>	2023-08-11 11:05:22 +05:30
dotieuthien	b28cd3fba0	Convert Stable Diffusion ControlNet to TensorRT (#4465 ) * convert tensorrt controlnet * Fix code quality * Fix code quality * Fix code quality * Fix code quality * Fix code quality * Fix code quality * Fix number controlnet condition * Add convert SD XL to onnx * Add convert SD XL to tensorrt * Add convert SD XL to tensorrt * Add examples in comments * Add examples in comments * Add test onnx controlnet * Add tensorrt test * Remove copied * Move file test to examples/community * Remove script * Remove script * Remove text --------- Co-authored-by: dotieuthien <thien.do@mservice.com.vn> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2023-08-11 08:12:26 +05:30
VV-A-VV	3fd45eb10f	fix some typo error (#4546 ) * fix some typo error * Undo changes to capitalization	2023-08-10 06:49:25 +05:30
YiYi Xu	aef11cbf66	add pipeline_class_name argument to Stable Diffusion conversion script (#4461 ) * add pipeline class * Apply suggestions from code review Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * style --------- Co-authored-by: yiyixuxu <yixu310@gmail,com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2023-08-07 06:44:31 -10:00
Sayak Paul	18fc40c169	[Feat] add tiny Autoencoder for (almost) instant decoding (#4384 ) * add: model implementation of tiny autoencoder. * add: inits. * push the latest devs. * add: conversion script and finish. * add: scaling factor args. * debugging * fix denormalization. * fix: positional argument. * handle use_torch_2_0_or_xformers. * handle post_quant_conv * handle dtype * fix: sdxl image processor for tiny ae. * fix: sdxl image processor for tiny ae. * unify upcasting logic. * copied from madness. * remove trailing whitespace. * set is_tiny_vae = False * address PR comments. * change to AutoencoderTiny * make act_fn an str throughout * fix: apply_forward_hook decorator call * get rid of the special is_tiny_vae flag. * directly scale the output. * fix dummies? * fix: act_fn. * get rid of the Clamp() layer. * bring back copied from. * movement of the blocks to appropriate modules. * add: docstrings to AutoencoderTiny * add: documentation. * changes to the conversion script. * add doc entry. * settle tests. * style * add one slow test. * fix * fix 2 * fix 2 * fix: 4 * fix: 5 * finish integration tests * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * style --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2023-08-02 23:58:05 +05:30
Xin Kong	615c04db15	[Pipelines] Add community pipeline for Zero123 (#4295 ) * add zero123 pipeline to community * add community doc * reformat * update zero123 pipeline, including cc_projection within diffusers; add convert ckpt scripts; support diffusers weights	2023-08-02 19:36:49 +02:00
YiYi Xu	47b3346422	Shap-E: add support for mesh output (#4062 ) * add output_type=mesh * update img2img * make style * add doc * make style * Apply suggestions from code review Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * add docstring for output_type * add a section in doc about hub mesh visualization/ rotation * update conversion script so default background is white * Apply suggestions from code review Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * Update src/diffusers/pipelines/shap_e/pipeline_shap_e_img2img.py Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * renderer -> shap_e_renderer * img2img renderer -> shap_e_renderer * fix tests --------- Co-authored-by: yiyixuxu <yixu310@gmail,com> Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>	2023-07-20 18:05:13 +02:00
Ruslan Vorovchenko	07f1fbb18e	Asymmetric vqgan (#3956 ) * added AsymmetricAutoencoderKL * fixed copies+dummy * added script to convert original asymmetric vqgan * added docs * updated docs * fixed style * fixes, added tests * update doc * fixed doc * fixed tests * naming Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * naming Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * udpated code example * updated doc * comments fixes * added docstring Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * comments fixes * added inpaint pipeline tests * comment suggestion: delete method * yet another fixes --------- Co-authored-by: Ruslan Vorovchenko <r.vorovchenko@prequelapp.com> Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2023-07-20 17:51:06 +02:00
Will Berman	a0597f33ac	t2i pipeline (#3932 ) * Quick implementation of t2i-adapter Load adapter module with from_pretrained Prototyping generalized adapter framework Writeup doc string for sideload framework(WIP) + some minor update on implementation Update adapter models Remove old adapter optional args in UNet Add StableDiffusionAdapterPipeline unit test Handle cpu offload in StableDiffusionAdapterPipeline Auto correct coding style Update model repo name to "RzZ/sd-v1-4-adapter-pipeline" Refactor MultiAdapter to better compatible with config system Export MultiAdapter Create pipeline document template from controlnet Create dummy objects Supproting new AdapterLight model Fix StableDiffusionAdapterPipeline common pipeline test [WIP] Update adapter pipeline document Handle num_inference_steps in StableDiffusionAdapterPipeline Update definition of Adapter "channels_in" Update documents Apply code style Fix doc typo and merge error Update doc string and example Quality of life improvement Remove redundant code and file from prototyping Remove unused pageage Remove comments Fix title Fix typo Add conditioning scale arg Bring back old implmentation Offload sideload Add supply info on document Update src/diffusers/models/adapter.py Co-authored-by: Will Berman <wlbberman@gmail.com> Update MultiAdapter constructor Swap out custom checkpoint and update pipeline constructor Update docment Apply suggestions from code review Co-authored-by: Will Berman <wlbberman@gmail.com> Correcting style Following single-file policy Update auto size in image preprocess func Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_adapter.py Co-authored-by: Will Berman <wlbberman@gmail.com> fix copies Update adapter pipeline behavior Add adapter_conditioning_scale doc string Add the missing doc string Apply suggestions from code review Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Fix few bugs from suggestion Handle L-mode PIL image as control image Rename to differentiate adapter resblock Update src/diffusers/models/adapter.py Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> Fix typo Update adapter parameter name Update test case and code style Fix copies Fix typo Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_adapter.py Co-authored-by: Will Berman <wlbberman@gmail.com> Update Adapter class name Add checkpoint converting script Fix style Fix-copies Remove dev script Apply suggestions from code review Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Updates for parameter rename Fix convert_adapter remove main fix diff more refactoring more more small fixes refactor tests more slow tests more tests Update docs/source/en/api/pipelines/overview.mdx Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> add community contributor to docs Update docs/source/en/api/pipelines/stable_diffusion/adapter.mdx Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> Update docs/source/en/api/pipelines/stable_diffusion/adapter.mdx Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> Update docs/source/en/api/pipelines/stable_diffusion/adapter.mdx Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> Update docs/source/en/api/pipelines/stable_diffusion/adapter.mdx Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> Update docs/source/en/api/pipelines/stable_diffusion/adapter.mdx Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> fix remove from_adapters license paper link docs more url fixes more docs fix fixes fix fix * fix sample inplace add * additional_kwargs -> additional_residuals * move t2i adapter pipeline to own module * preprocess -> _preprocess_adapter_image * add TencentArc to license * fix example code links * add image converter and fix example doc string * fix links * clearer additional residual application --------- Co-authored-by: HimariO <dsfhe49854@gmail.com>	2023-07-17 12:55:44 -07:00
YiYi Xu	45f6d52b10	Add Shap-E (#3742 ) * refactor prior_transformer adding conversion script add pipeline add step_index from pipeline, + remove permute add zero pad token remove copy from statement for betas_for_alpha_bar function * add * add * update conversion script for renderer model * refactor camera a little bit * clean up * style * fix copies * Update src/diffusers/schedulers/scheduling_heun_discrete.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/diffusers/pipelines/shap_e/pipeline_shap_e.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/diffusers/pipelines/shap_e/pipeline_shap_e.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * alpha_transform_type * remove step_index argument * remove get_sigmas_karras * remove _yiyi_sigma_to_t * move the rescale prompt_embeds from prior_transformer to pipeline * replace baddbmm with einsum to match origial repo * Revert "replace baddbmm with einsum to match origial repo" This reverts commit `3f6b435d65`. * add step_index to scale_model_input * Revert "move the rescale prompt_embeds from prior_transformer to pipeline" This reverts commit `5b5a8e6be9`. * move rescale from prior_transformer to pipeline * correct step_index in scale_model_input * remove print lines * refactor prior - reduce arguments * make style * add prior_image * arg embedding_proj_norm -> norm_embedding_proj * add pre-norm for proj_embedding * move rescale prompt from pipeline to _encode_prompt * add img2img pipeline * style * copies * Update src/diffusers/models/prior_transformer.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/diffusers/models/prior_transformer.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/diffusers/models/prior_transformer.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/diffusers/models/prior_transformer.py add arg: encoder_hid_proj Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/diffusers/models/prior_transformer.py add new config: norm_in_type Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/diffusers/models/prior_transformer.py add new config: added_emb_type Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/diffusers/models/prior_transformer.py rename out_dim -> clip_embed_dim Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/diffusers/models/prior_transformer.py rename config: out_dim -> clip_embed_dim Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/diffusers/models/prior_transformer.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/diffusers/models/prior_transformer.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * finish refactor prior_tranformer * make style * refactor renderer * fix * make style * refactor img2img * remove params_proj * add test * add upcast_softmax to prior_transformer * enable num_images_per_prompt, add save_gif utility * add * add fast test * make style * add slow test * style * add test for img2img * refactor * enable batching * style * refactor scheduler * update test * style * attempt to solve batch related tests timeout * add doc * Update src/diffusers/pipelines/shap_e/pipeline_shap_e.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/diffusers/pipelines/shap_e/pipeline_shap_e_img2img.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * hardcode rendering related config * update betas_for_alpha_bar on ddpm_scheduler * fix copies * fix * export_to_gif * style * second attempt to speed up batching tests * add doc page to index * Remove intermediate clipping * 3rd attempt to speed up batching tests * Remvoe time index * simplify scheduler * Fix more * Fix more * fix more * make style * fix schedulers * fix some more tests * finish * add one more test * Apply suggestions from code review Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * style * apply feedbacks * style * fix copies * add one example * style * add example for img2img * fix doc * fix more doc strings * size -> frame_size * style * update doc * style * fix on doc * update repo name * improve the usage example in shap-e img2img * add usage examples in the shap-e docs. * consolidate examples. * minor fix. * update doc * Apply suggestions from code review * Apply suggestions from code review * remove upcast * Make sure background is white * Update src/diffusers/pipelines/shap_e/pipeline_shap_e.py * Apply suggestions from code review * Finish * Apply suggestions from code review * Update src/diffusers/pipelines/shap_e/pipeline_shap_e.py * Make style --------- Co-authored-by: yiyixuxu <yixu310@gmail,com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> Co-authored-by: Pedro Cuenca <pedro@huggingface.co>	2023-07-06 15:20:42 +02:00
Patrick von Platen	bc9a8cef6f	[SD-XL] Add new pipelines (#3859 ) * Add new text encoder * add transformers depth * More * Correct conversion script * Fix more * Fix more * Correct more * correct text encoder * Finish all * proof that in works in run local xl * clean up * Get refiner to work * Add red castle * Fix batch size * Improve pipelines more * Finish text2image tests * Add img2img test * Fix more * fix import * Fix embeddings for classic models (#3888) Fix embeddings for classic SD models. * Allow multiple prompts to be passed to the refiner (#3895) * finish more * Apply suggestions from code review * add watermarker * Model offload (#3889) * Model offload. * Model offload for refiner / img2img * Hardcode encoder offload on img2img vae encode Saves some GPU RAM in img2img / refiner tasks so it remains below 8 GB. --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * correct * fix * clean print * Update install warning for `invisible-watermark` * add: missing docstrings. * fix and simplify the usage example in img2img. * fix setup for watermarking. * Revert "fix setup for watermarking." This reverts commit `491bc9f5a6`. * fix: watermarking setup. * fix: op. * run make fix-copies. * make sure tests pass * improve convert * make tests pass * make tests pass * better error message * fiinsh * finish * Fix final test --------- Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>	2023-07-06 13:37:27 +02:00
dg845	aed7499a8d	Add Consistency Models Pipeline (#3492 ) * initial commit * Improve consistency models sampling implementation. * Add CMStochasticIterativeScheduler, which implements the multi-step sampler (stochastic_iterative_sampler) in the original code, and make further improvements to sampling. * Add Unet blocks for consistency models * Add conversion script for Unet * Fix bug in new unet blocks * Fix attention weight loading * Make design improvements to ConsistencyModelPipeline and CMStochasticIterativeScheduler and add initial version of tests. * make style * Make small random test UNet class conditional and set resnet_time_scale_shift to 'scale_shift' to better match consistency model checkpoints. * Add support for converting a test UNet and non-class-conditional UNets to the consistency models conversion script. * make style * Change num_class_embeds to 1000 to better match the original consistency models implementation. * Add support for distillation in pipeline_consistency_models.py. * Improve consistency model tests: - Get small testing checkpoints from hub - Modify tests to take into account "distillation" parameter of ConsistencyModelPipeline - Add onestep, multistep tests for distillation and distillation + class conditional - Add expected image slices for onestep tests * make style * Improve ConsistencyModelPipeline: - Add initial support for class-conditional generation - Fix initial sigma for onestep generation - Fix some sigma shape issues * make style * Improve ConsistencyModelPipeline: - add latents __call__ argument and prepare_latents method - add check_inputs method - add initial docstrings for ConsistencyModelPipeline.__call__ * make style * Fix bug when randomly generating class labels for class-conditional generation. * Switch CMStochasticIterativeScheduler to configuring a sigma schedule and make related changes to the pipeline and tests. * Remove some unused code and make style. * Fix small bug in CMStochasticIterativeScheduler. * Add expected slices for multistep sampling tests and make them pass. * Work on consistency model fast tests: - in pipeline, call self.scheduler.scale_model_input before denoising - get expected slices for Euler and Heun scheduler tests - make Euler test pass - mark Heun test as expected fail because it doesn't support prediction_type "sample" yet - remove DPM and Euler Ancestral tests because they don't support use_karras_sigmas * make style * Refactor conversion script to make it easier to add more model architectures to convert in the future. * Work on ConsistencyModelPipeline tests: - Fix device bug when handling class labels in ConsistencyModelPipeline.__call__ - Add slow tests for onestep and multistep sampling and make them pass - Refactor fast tests - Refactor ConsistencyModelPipeline.__init__ * make style * Remove the add_noise and add_noise_to_input methods from CMStochasticIterativeScheduler for now. * Run python utils/check_copies.py --fix_and_overwrite python utils/check_dummies.py --fix_and_overwrite to make dummy objects for new pipeline and scheduler. * Make fast tests from PipelineTesterMixin pass. * make style * Refactor consistency models pipeline and scheduler: - Remove support for Karras schedulers (only support CMStochasticIterativeScheduler) - Move sigma manipulation, input scaling, denoising from pipeline to scheduler - Make corresponding changes to tests and ensure they pass * make style * Add docstrings and further refactor pipeline and scheduler. * make style * Add initial version of the consistency models documentation. * Refactor custom timesteps logic following DDPMScheduler/IFPipeline and temporarily add torch 2.0 SDPA kernel selection logic for debugging. * make style * Convert current slow tests to use fp16 and flash attention. * make style * Add slow tests for normal attention on cuda device. * make style * Fix attention weights loading * Update consistency model fast tests for new test checkpoints with attention fix. * make style * apply suggestions * Add add_noise method to CMStochasticIterativeScheduler (copied from EulerDiscreteScheduler). * Conversion script now outputs pipeline instead of UNet and add support for LSUN-256 models and different schedulers. * When both timesteps and num_inference_steps are supplied, raise warning instead of error (timesteps take precedence). * make style * Add remaining diffusers model checkpoints for models in the original consistency model release and update usage example. * apply suggestions from review * make style * fix attention naming * Add tests for CMStochasticIterativeScheduler. * make style * Make CMStochasticIterativeScheduler tests pass. * make style * Override test_step_shape in CMStochasticIterativeSchedulerTest instead of modifying it in SchedulerCommonTest. * make style * rename some models * Improve API * rename some models * Remove duplicated block * Add docstring and make torch compile work * More fixes * Fixes * Apply suggestions from code review * Apply suggestions from code review * add more docstring * update consistency conversion script --------- Co-authored-by: ayushmangal <ayushmangal@microsoft.com> Co-authored-by: Ayush Mangal <43698245+ayushtues@users.noreply.github.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2023-07-05 19:33:58 +02:00
Patrick von Platen	5df2acf7d2	[Conversion] Small fixes (#3848 ) * [Conversion] Small fixes * Update src/diffusers/pipelines/stable_diffusion/convert_from_ckpt.py	2023-06-22 13:52:59 +02:00
YiYi Xu	7761b89d7b	update conversion script for Kandinsky unet (#3766 ) * update kandinsky conversion script * style --------- Co-authored-by: yiyixuxu <yixu310@gmail,com>	2023-06-14 06:57:53 -10:00
Will Berman	462956be7b	small tweaks for parsing thibaudz controlnet checkpoints (#3657 )	2023-06-05 10:24:31 -07:00
dg845	352ca3198c	[WIP] Add UniDiffuser model and pipeline (#2963 ) * Fix a bug of pano when not doing CFG (#3030) * Fix a bug of pano when not doing CFG * enhance code quality * apply formatting. --------- Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * Text2video zero refinements (#3070) * fix progress bar issue in pipeline_text_to_video_zero.py. Copy scheduler after first backward * fix tensor loading in test_text_to_video_zero.py * make style && make quality * Release: v0.15.0 * [Tests] Speed up panorama tests (#3067) * fix: norm group test for UNet3D. * chore: speed up the panorama tests (fast). * set default value of _test_inference_batch_single_identical. * fix: batch_sizes default value. * [Post release] v0.16.0dev (#3072) * Adds profiling flags, computes train metrics average. (#3053) * WIP controlnet training - bugfix --streaming - bugfix running report_to!='wandb' - adds memory profile before validation * Adds final logging statement. * Sets train epochs to 11. Looking at a longer ~16ep run, we see only good validation images after ~11ep: https://wandb.ai/andsteing/controlnet_fill50k/runs/3j2hx6n8 * Removes --logging_dir (it's not used). * Adds --profile flags. * Updates --output_dir=runs/fill-circle-{timestamp}. * Compute mean of `train_metrics`. Previously `train_metrics[-1]` was logged, resulting in very bumpy train metrics. * Improves logging a bit. - adds l2_grads gradient norm logging - adds steps_per_sec - sets walltime as x coordinate of train/step - logs controlnet_params config * Adds --ccache (doesn't really help though). * minor fix in controlnet flax example (#2986) * fix the error when push_to_hub but not log validation * contronet_from_pt & controlnet_revision * add intermediate checkpointing to the guide * Bugfix --profile_steps * Sets `RACKER_PROJECT_NAME='controlnet_fill50k'`. * Logs fractional epoch. * Adds relative `walltime` metric. * Adds `StepTraceAnnotation` and uses `global_step` insetad of `step`. * Applied `black`. * Streamlines commands in README a bit. * Removes `--ccache`. This makes only a very small difference (~1 min) with this model size, so removing the option introduced in cdb3cc. * Re-ran `black`. * Update examples/controlnet/README.md Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * Converts spaces to tab. * Removes repeated args. * Skips first step (compilation) in profiling * Updates README with profiling instructions. * Unifies tabs/spaces in README. * Re-ran style & quality. --------- Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * [Pipelines] Make sure that None functions are correctly not saved (#3080) * doc string example remove from_pt (#3083) * [Tests] parallelize (#3078) * [Tests] parallelize * finish folder structuring * Parallelize tests more * Correct saving of pipelines * make sure logging level is correct * try again * Apply suggestions from code review Co-authored-by: Pedro Cuenca <pedro@huggingface.co> --------- Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Throw deprecation warning for return_cached_folder (#3092) Throw deprecation warning * Allow SD attend and excite pipeline to work with any size output images (#2835) Allow stable diffusion attend and excite pipeline to work with any size output image. Re: #2476, #2603 * [docs] Update community pipeline docs (#2989) * update community pipeline docs * fix formatting * explain sharing workflows * Add to support Guess Mode for StableDiffusionControlnetPipleline (#2998) * add guess mode (WIP) * fix uncond/cond order * support guidance_scale=1.0 and batch != 1 * remove magic coeff * add docstring * add intergration test * add document to controlnet.mdx * made the comments a bit more explanatory * fix table * fix default value for attend-and-excite (#3099) * fix default * remvoe one line as requested by gc team (#3077) remvoe one line * ddpm custom timesteps (#3007) add custom timesteps test add custom timesteps descending order check docs timesteps -> custom_timesteps can only pass one of num_inference_steps and timesteps * Fix breaking change in `pipeline_stable_diffusion_controlnet.py` (#3118) fix breaking change * Add global pooling to controlnet (#3121) * [Bug fix] Fix img2img processor with safety checker (#3127) Fix img2img processor with safety checker * [Bug fix] Make sure correct timesteps are chosen for img2img (#3128) Make sure correct timesteps are chosen for img2img * Improve deprecation warnings (#3131) * Fix config deprecation (#3129) * Better deprecation message * Better deprecation message * Better doc string * Fixes * fix more * fix more * Improve __getattr__ * correct more * fix more * fix * Improve more * more improvements * fix more * Apply suggestions from code review Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * make style * Fix all rest & add tests & remove old deprecation fns --------- Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * feat: verfication of multi-gpu support for select examples. (#3126) * feat: verfication of multi-gpu support for select examples. * add: multi-gpu training sections to the relvant doc pages. * speed up attend-and-excite fast tests (#3079) * Optimize log_validation in train_controlnet_flax (#3110) extract pipeline from log_validation * make style * Correct textual inversion readme (#3145) * Update README.md * Apply suggestions from code review * Add unet act fn to other model components (#3136) Adding act fn config to the unet timestep class embedding and conv activation. The custom activation defaults to silu which is the default activation function for both the conv act and the timestep class embeddings so default behavior is not changed. The only unet which use the custom activation is the stable diffusion latent upscaler https://huggingface.co/stabilityai/sd-x2-latent-upscaler/blob/main/unet/config.json (I ran a script against the hub to confirm). The latent upscaler does not use the conv activation nor the timestep class embeddings so we don't change its behavior. * class labels timestep embeddings projection dtype cast (#3137) This mimics the dtype cast for the standard time embeddings * [ckpt loader] Allow loading the Inpaint and Img2Img pipelines, while loading a ckpt model (#2705) * [ckpt loader] Allow loading the Inpaint and Img2Img pipelines, while loading a ckpt model * Address review comment from PR * PyLint formatting * Some more pylint fixes, unrelated to our change * Another pylint fix * Styling fix * add from_ckpt method as Mixin (#2318) * add mixin class for pipeline from original sd ckpt * Improve * make style * merge main into * Improve more * fix more * up * Apply suggestions from code review * finish docs * rename * make style --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Add TensorRT SD/txt2img Community Pipeline to diffusers along with TensorRT utils (#2974) * Add SD/txt2img Community Pipeline to diffusers along with TensorRT utils Signed-off-by: Asfiya Baig <asfiyab@nvidia.com> * update installation command Signed-off-by: Asfiya Baig <asfiyab@nvidia.com> * update tensorrt installation Signed-off-by: Asfiya Baig <asfiyab@nvidia.com> * changes 1. Update setting of cache directory 2. Address comments: merge utils and pipeline code. 3. Address comments: Add section in README Signed-off-by: Asfiya Baig <asfiyab@nvidia.com> * apply make style Signed-off-by: Asfiya Baig <asfiyab@nvidia.com> --------- Signed-off-by: Asfiya Baig <asfiyab@nvidia.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Correct `Transformer2DModel.forward` docstring (#3074) ⚙️chore(transformer_2d) update function signature for encoder_hidden_states * Update pipeline_stable_diffusion_inpaint_legacy.py (#2903) * Update pipeline_stable_diffusion_inpaint_legacy.py * fix preprocessing of Pil images with adequate batch size * revert map * add tests * reformat * Update test_stable_diffusion_inpaint_legacy.py * Update test_stable_diffusion_inpaint_legacy.py * Update test_stable_diffusion_inpaint_legacy.py * Update test_stable_diffusion_inpaint_legacy.py * next try to fix the style * wth is this * Update testing_utils.py * Update testing_utils.py * Update test_stable_diffusion_inpaint_legacy.py * Update test_stable_diffusion_inpaint_legacy.py * Update test_stable_diffusion_inpaint_legacy.py * Update test_stable_diffusion_inpaint_legacy.py * Update test_stable_diffusion_inpaint_legacy.py * Update test_stable_diffusion_inpaint_legacy.py --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Modified altdiffusion pipline to support altdiffusion-m18 (#2993) * Modified altdiffusion pipline to support altdiffusion-m18 * Modified altdiffusion pipline to support altdiffusion-m18 * Modified altdiffusion pipline to support altdiffusion-m18 * Modified altdiffusion pipline to support altdiffusion-m18 * Modified altdiffusion pipline to support altdiffusion-m18 * Modified altdiffusion pipline to support altdiffusion-m18 * Modified altdiffusion pipline to support altdiffusion-m18 --------- Co-authored-by: root <fulong_ye@163.com> * controlnet training resize inputs to multiple of 8 (#3135) controlnet training center crop input images to multiple of 8 The pipeline code resizes inputs to multiples of 8. Not doing this resizing in the training script is causing the encoded image to have different height/width dimensions than the encoded conditioning image (which uses a separate encoder that's part of the controlnet model). We resize and center crop the inputs to make sure they're the same size (as well as all other images in the batch). We also check that the initial resolution is a multiple of 8. * adding custom diffusion training to diffusers examples (#3031) * diffusers==0.14.0 update * custom diffusion update * custom diffusion update * custom diffusion update * custom diffusion update * custom diffusion update * custom diffusion update * custom diffusion * custom diffusion * custom diffusion * custom diffusion * custom diffusion * apply formatting and get rid of bare except. * refactor readme and other minor changes. * misc refactor. * fix: repo_id issue and loaders logging bug. * fix: save_model_card. * fix: save_model_card. * fix: save_model_card. * add: doc entry. * refactor doc,. * custom diffusion * custom diffusion * custom diffusion * apply style. * remove tralining whitespace. * fix: toctree entry. * remove unnecessary print. * custom diffusion * custom diffusion * custom diffusion test * custom diffusion xformer update * custom diffusion xformer update * custom diffusion xformer update --------- Co-authored-by: Nupur Kumari <nupurkumari@Nupurs-MacBook-Pro.local> Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Nupur Kumari <nupurkumari@nupurs-mbp.wifi.local.cmu.edu> * make style * Update custom_diffusion.mdx (#3165) Add missing newlines for rendering the links correctly * Added distillation for quantization example on textual inversion. (#2760) * Added distillation for quantization example on textual inversion. Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com> * refined readme and code style. Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com> * Update text2images.py * refined code of model load and added compatibility check. Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com> * fixed code style. Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com> * fix C403 [] Unnecessary `list` comprehension (rewrite as a `set` comprehension) Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com> --------- Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com> Update Noise Autocorrelation Loss Function for Pix2PixZero Pipeline (#2942) * Update Pix2PixZero Auto-correlation Loss * Add fast inversion tests * Clarify purpose and mark as deprecated Fix inversion prompt broadcasting * Register modules set to `None` in config for `test_save_load_optional_components` * Update new tests to coordinate with #2953 * [DreamBooth] add text encoder LoRA support in the DreamBooth training script (#3130) * add: LoRA text encoder support for DreamBooth example. * fix initialization. * fix: modification call. * add: entry in the readme. * use dog dataset from hub. * fix: params to clip. * add entry to the LoRA doc. * add: tests for lora. * remove unnecessary list comprehension./ * Update Habana Gaudi documentation (#3169) * Update Habana Gaudi doc * Fix tables * Add model offload to x4 upscaler (#3187) * Add model offload to x4 upscaler * fix * [docs] Deterministic algorithms (#3172) deterministic algos * Update custom_diffusion.mdx to credit the author (#3163) * Update custom_diffusion.mdx * fix: unnecessary list comprehension. * Fix TensorRT community pipeline device set function (#3157) pass silence_dtype_warnings as kwarg Signed-off-by: Asfiya Baig <asfiyab@nvidia.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * make `from_flax` work for controlnet (#3161) fix from_flax Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * [docs] Clarify training args (#3146) * clarify training arg * apply feedback * Multi Vector Textual Inversion (#3144) * Multi Vector * Improve * fix multi token * improve test * make style * Update examples/test_examples.py * Apply suggestions from code review Co-authored-by: Suraj Patil <surajp815@gmail.com> * update * Finish * Apply suggestions from code review --------- Co-authored-by: Suraj Patil <surajp815@gmail.com> * Add `Karras sigmas` to HeunDiscreteScheduler (#3160) * Add karras pattern to discrete heun scheduler * Add integration test * Fix failing CI on pytorch test on M1 (mps) --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * [AudioLDM] Fix dtype of returned waveform (#3189) * Fix bug in train_dreambooth_lora (#3183) * Update train_dreambooth_lora.py fix bug * Update train_dreambooth_lora.py * [Community Pipelines] Update lpw_stable_diffusion pipeline (#3197) * Update lpw_stable_diffusion.py * fix cpu offload * Make sure VAE attention works with Torch 2_0 (#3200) * Make sure attention works with Torch 2_0 * make style * Fix more * Revert "[Community Pipelines] Update lpw_stable_diffusion pipeline" (#3201) Revert "[Community Pipelines] Update lpw_stable_diffusion pipeline (#3197)" This reverts commit `9965cb50ea`. * [Bug fix] Fix batch size attention head size mismatch (#3214) * fix mixed precision training on train_dreambooth_inpaint_lora (#3138) cast to weight dtype * adding enable_vae_tiling and disable_vae_tiling functions (#3225) adding enable_vae_tiling and disable_val_tiling functions * Add ControlNet v1.1 docs (#3226) Add v1.1 docs * Fix issue in maybe_convert_prompt (#3188) When the token used for textual inversion does not have any special symbols (e.g. it is not surrounded by <>), the tokenizer does not properly split the replacement tokens. Adding a space for the padding tokens fixes this. * Sync cache version check from transformers (#3179) sync cache version check from transformers * Fix docs text inversion (#3166) * Fix docs text inversion * Apply suggestions from code review * add model (#3230) * add * clean * up * clean up more * fix more tests * Improve docs further * improve * more fixes docs * Improve docs more * Update src/diffusers/models/unet_2d_condition.py * fix * up * update doc links * make fix-copies * add safety checker and watermarker to stage 3 doc page code snippets * speed optimizations docs * memory optimization docs * make style * add watermarking snippets to doc string examples * make style * use pt_to_pil helper functions in doc strings * skip mps tests * Improve safety * make style * new logic * fix * fix bad onnx design * make new stable diffusion upscale pipeline model arguments optional * define has_nsfw_concept when non-pil output type * lowercase linked to notebook name --------- Co-authored-by: William Berman <WLBberman@gmail.com> * Allow return pt x4 (#3236) * Add all files * update * Allow fp16 attn for x4 upscaler (#3239) * Add all files * update * Make sure vae is memory efficient for PT 1 * make style * fix fast test (#3241) * Adds a document on token merging (#3208) * add document on token merging. * fix headline. * fix: headline. * add some samples for comparison. * [AudioLDM] Update docs to use updated ckpt (#3240) * [AudioLDM] Update docs to use updated ckpt * make style * Release: v0.16.0 * Post release for 0.16.0 (#3244) * Post release * fix more * [docs] only mention one stage (#3246) * [docs] only mention one stage * add blurb on auto accepting --------- Co-authored-by: William Berman <WLBberman@gmail.com> * Write model card in controlnet training script (#3229) Write model card in controlnet training script. * [2064]: Add stochastic sampler (sample_dpmpp_sde) (#3020) * [2064]: Add stochastic sampler * [2064]: Add stochastic sampler * [2064]: Add stochastic sampler * [2064]: Add stochastic sampler * [2064]: Add stochastic sampler * [2064]: Add stochastic sampler * [2064]: Add stochastic sampler * Review comments * [Review comment]: Add is_torchsde_available() * [Review comment]: Test and docs * [Review comment] * [Review comment] * [Review comment] * [Review comment] * [Review comment] --------- Co-authored-by: njindal <njindal@adobe.com> * [Stochastic Sampler][Slow Test]: Cuda test fixes (#3257) [Slow Test]: Cuda test fixes Co-authored-by: njindal <njindal@adobe.com> * Remove required from tracker_project_name (#3260) Remove required from tracker_project_name. As observed by https://github.com/off99555 in https://github.com/huggingface/diffusers/issues/2695#issuecomment-1470755050, it already has a default value. * adding required parameters while calling the get_up_block and get_down_block (#3210) * removed unnecessary parameters from get_up_block and get_down_block functions * adding resnet_skip_time_act, resnet_out_scale_factor and cross_attention_norm to get_up_block and get_down_block functions --------- Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * [docs] Update interface in repaint.mdx (#3119) Update repaint.mdx accomodate to #1701 * Update IF name to XL (#3262) Co-authored-by: multimodalart <joaopaulo.passos+multimodal@gmail.com> * fix typo in score sde pipeline (#3132) * Fix typo in textual inversion JAX training script (#3123) The pipeline is built as `pipe` but then used as `pipeline`. * AudioDiffusionPipeline - fix encode method after config changes (#3114) * config fixes * deprecate get_input_dims * Revert "Revert "[Community Pipelines] Update lpw_stable_diffusion pipeline"" (#3265) Revert "Revert "[Community Pipelines] Update lpw_stable_diffusion pipeline" (#3201)" This reverts commit `91a2a80eb2`. * Fix community pipelines (#3266) * update notebook (#3259) Co-authored-by: yiyixuxu <yixu@yis-macbook-pro.lan> * [docs] add notes for stateful model changes (#3252) * [docs] add notes for stateful model changes * Update docs/source/en/optimization/fp16.mdx Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * link to accelerate docs for discarding hooks --------- Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * [LoRA] quality of life improvements in the loading semantics and docs (#3180) * 👽 qol improvements for LoRA. * better function name? * fix: LoRA weight loading with the new format. * address Patrick's comments. * Apply suggestions from code review Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * change wording around encouraging the use of load_lora_weights(). * fix: function name. --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * [Community Pipelines] EDICT pipeline implementation (#3153) * EDICT pipeline initial commit - Starting point taking from https://github.com/Joqsan/edict-diffusion * refactor __init__() method * minor refactoring * refactor scheduler code - remove scheduler and move its methods to the EDICTPipeline class * make CFG optional - refactor encode_prompt(). - include optional generator for sampling with vae. - minor variable renaming * add EDICT pipeline description to README.md * replace preprocess() with VaeImageProcessor * run make style and make quality commands --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * [Docs]zh translated docs update (#3245) * zh translated docs update * update _toctree * Update logging.mdx (#2863) Fix typos * Add multiple conditions to StableDiffusionControlNetInpaintPipeline (#3125) * try multi controlnet inpaint * multi controlnet inpaint * multi controlnet inpaint * Let's make sure that dreambooth always uploads to the Hub (#3272) * Update Dreambooth README * Adapt all docs as well * automatically write model card * fix * make style * Diffedit Zero-Shot Inpainting Pipeline (#2837) * Update Pix2PixZero Auto-correlation Loss * Add Stable Diffusion DiffEdit pipeline * Add draft documentation and import code * Bugfixes and refactoring * Add option to not decode latents in the inversion process * Harmonize preprocessing * Revert "Update Pix2PixZero Auto-correlation Loss" This reverts commit `b218062fed`. * Update annotations * rename `compute_mask` to `generate_mask` * Update documentation * Update docs * Update Docs * Fix copy * Change shape of output latents to batch first * Update docs * Add first draft for tests * Bugfix and update tests * Add `cross_attention_kwargs` support for all pipeline methods * Fix Copies * Add support for PIL image latents Add support for mask broadcasting Update docs and tests Align `mask` argument to `mask_image` Remove height and width arguments * Enable MPS Tests * Move example docstrings * Fix test * Fix test * fix pipeline inheritance * Harmonize `prepare_image_latents` with StableDiffusionPix2PixZeroPipeline * Register modules set to `None` in config for `test_save_load_optional_components` * Move fixed logic to specific test class * Clean changes to other pipelines * Update new tests to coordinate with #2953 * Update slow tests for better results * Safety to avoid potential problems with torch.inference_mode * Add reference in SD Pipeline Overview * Fix tests again * Enforce determinism in noise for generate_mask * Fix copies * Widen test tolerance for fp16 based on `test_stable_diffusion_upscale_pipeline_fp16` * Add LoraLoaderMixin and update `prepare_image_latents` * clean up repeat and reg * bugfix * Remove invalid args from docs Suppress spurious warning by repeating image before latent to mask gen * add constant learning rate with custom rule (#3133) * add constant lr with rules * add constant with rules in TYPE_TO_SCHEDULER_FUNCTION * add constant lr rate with rule * hotfix code quality * fix doc style * change name constant_with_rules to piecewise constant * Allow disabling torch 2_0 attention (#3273) * Allow disabling torch 2_0 attention * make style * Update src/diffusers/models/attention.py * [doc] add link to training script (#3271) add link to training script Co-authored-by: yiyixuxu <yixu@yis-macbook-pro.lan> * temp disable spectogram diffusion tests (#3278) The note-seq package throws an error on import because the default installed version of Ipython is not compatible with python 3.8 which we run in the CI. https://github.com/huggingface/diffusers/actions/runs/4830121056/jobs/8605954838#step:7:9 * Changed sample[0] to images[0] (#3304) A pipeline object stores the results in `images` not in `sample`. Current code blocks don't work. * Typo in tutorial (#3295) * Torch compile graph fix (#3286) * fix more * Fix more * fix more * Apply suggestions from code review * fix * make style * make fix-copies * fix * make sure torch compile * Clean * fix test * Postprocessing refactor img2img (#3268) * refactor img2img VaeImageProcessor.postprocess * remove copy from for init, run_safety_checker, decode_latents Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> --------- Co-authored-by: yiyixuxu <yixu@yis-macbook-pro.lan> Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * [Torch 2.0 compile] Fix more torch compile breaks (#3313) * Fix more torch compile breaks * add tests * Fix all * fix controlnet * fix more * Add Horace He as co-author. > > Co-authored-by: Horace He <horacehe2007@yahoo.com> * Add Horace He as co-author. Co-authored-by: Horace He <horacehe2007@yahoo.com> --------- Co-authored-by: Horace He <horacehe2007@yahoo.com> * fix: scale_lr and sync example readme and docs. (#3299) * fix: scale_lr and sync example readme and docs. * fix doc link. * Update stable_diffusion.mdx (#3310) fixed import statement * Fix missing variable assign in DeepFloyd-IF-II (#3315) Fix missing variable assign lol * Correct doc build for patch releases (#3316) Update build_documentation.yml * Add Stable Diffusion RePaint to community pipelines (#3320) * Add Stable Diffsuion RePaint to community pipelines - Adds Stable Diffsuion RePaint to community pipelines - Add Readme enty for pipeline * Fix: Remove wrong import - Remove wrong import - Minor change in comments * Fix: Code formatting of stable_diffusion_repaint * Fix: ruff errors in stable_diffusion_repaint * Fix multistep dpmsolver for cosine schedule (suitable for deepfloyd-if) (#3314) * fix multistep dpmsolver for cosine schedule (deepfloy-if) * fix a typo * Update src/diffusers/schedulers/scheduling_dpmsolver_multistep.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/diffusers/schedulers/scheduling_dpmsolver_multistep.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/diffusers/schedulers/scheduling_dpmsolver_multistep.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/diffusers/schedulers/scheduling_dpmsolver_multistep.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/diffusers/schedulers/scheduling_dpmsolver_multistep.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * update all dpmsolver (singlestep, multistep, dpm, dpm++) for cosine noise schedule * add test, fix style --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * [docs] Improve LoRA docs (#3311) * update docs * add to toctree * apply feedback * Added input pretubation (#3292) * Added input pretubation * Fixed spelling * Update write_own_pipeline.mdx (#3323) * update controlling generation doc with latest goodies. (#3321) * [Quality] Make style (#3341) * Fix config dpm (#3343) * Add the SDE variant of DPM-Solver and DPM-Solver++ (#3344) * add SDE variant of DPM-Solver and DPM-Solver++ * add test * fix typo * fix typo * Add upsample_size to AttnUpBlock2D, AttnDownBlock2D (#3275) The argument `upsample_size` needs to be added to these modules to allow compatibility with other blocks that require this argument. * Add UniDiffuser classes to __init__ files, modify transformer block to support pre- and post-LN, add fast default tests, fix some bugs. * Update fast tests to use test checkpoints stored on the hub and to better match the reference UniDiffuser implementation. * Fix code with make style. * Revert "Fix code style with make style." This reverts commit `10a174a12c`. * Add self.image_encoder, self.text_decoder to list of models to offload to CPU in the enable_sequential_cpu_offload(...)/enable_model_cpu_offload(...) methods to make test_cpu_offload_forward_pass pass. * Fix code quality with make style. * Support using a data type embedding for UniDiffuser-v1. * Add fast test for checking UniDiffuser-v1 sampling. * Make changes so that the repository consistency tests pass. * Add UniDiffuser dummy objects via make fix-copies. * Fix bugs and make improvements to the UniDiffuser pipeline: - Improve batch size inference and fix bugs when num_images_per_prompt or num_prompts_per_image > 1 - Add tests for num_images_per_prompt, num_prompts_per_image > 1 - Improve check_inputs, especially regarding checking supplied latents - Add reset_mode method so that mode inference can be re-enabled after mode is set manually - Fix some warnings related to accessing class members directly instead of through their config - Small amount of refactoring in pipeline_unidiffuser.py * Fix code style with make style. * Add/edit docstrings for added classes and public pipeline methods. Also do some light refactoring. * Add documentation for UniDiffuser and fix some typos/formatting in docstrings. * Fix code with make style. * Refactor and improve the UniDiffuser convert_from_ckpt.py script. * Move the UniDiffusers convert_from_ckpy.py script to diffusers/scripts/convert_unidiffuser_to_diffusers.py * Fix code quality via make style. * Improve UniDiffuser slow tests. * make style * Fix some typos in the UniDiffuser docs. * Remove outdated logic based on transformers version in UniDiffuser pipeline __init__.py * Remove dependency on einops by refactoring einops operations to pure torch operations. * make style * Add slow test on full checkpoint for joint mode and correct expected image slices/text prefixes. * make style * Fix mixed precision issue by wrapping the offending code with the torch.autocast context manager. * Revert "Fix mixed precision issue by wrapping the offending code with the torch.autocast context manager." This reverts commit `1a58958ab4`. * Add fast test for CUDA/fp16 model behavior (currently failing). * Fix the mixed precision issue and add additional tests of the pipeline cuda/fp16 functionality. * make style * Use a CLIPVisionModelWithProjection instead of CLIPVisionModel for image_encoder to better match the original UniDiffuser implementation. * Make style and remove some testing code. * Fix shape errors for the 'joint' and 'img2text' modes. * Fix tests and remove some testing code. * Add option to use fixed latents for UniDiffuserPipelineSlowTests and fix issue in modeling_text_decoder.py. * Improve UniDiffuser docs, particularly the usage examples, and improve slow tests with new expected outputs. * make style * Fix examples to load model in float16. * In image-to-text mode, sample from the autoencoder moment distribution instead of always getting its mode. * make style * When encoding the image using the VAE, scale the image latents by the VAE's scaling factor. * make style * Clean up code and make slow tests pass. * make fix-copies * [docs] Fix docstring (#3334) fix docstring Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * if dreambooth lora (#3360) * update IF stage I pipelines add fixed variance schedulers and lora loading * added kv lora attn processor * allow loading into alternative lora attn processor * make vae optional * throw away predicted variance * allow loading into added kv lora layer * allow load T5 * allow pre compute text embeddings * set new variance type in schedulers * fix copies * refactor all prompt embedding code class prompts are now included in pre-encoding code max tokenizer length is now configurable embedding attention mask is now configurable * fix for when variance type is not defined on scheduler * do not pre compute validation prompt if not present * add example test for if lora dreambooth * add check for train text encoder and pre compute text embeddings * Postprocessing refactor all others (#3337) * add text2img * fix-copies * add * add all other pipelines * add * add * add * add * add * make style * style + fix copies --------- Co-authored-by: yiyixuxu <yixu310@gmail,com> * [docs] Improve safetensors docstring (#3368) * clarify safetensor docstring * fix typo * apply feedback * add: a warning message when using xformers in a PT 2.0 env. (#3365) * add: a warning message when using xformers in a PT 2.0 env. * Apply suggestions from code review Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * StableDiffusionInpaintingPipeline - resize image w.r.t height and width (#3322) * StableDiffusionInpaintingPipeline now resizes input images and masks w.r.t to passed input height and width. Default is already set to 512. This addresses the common tensor mismatch error. Also moved type check into relevant funciton to keep main pipeline body tidy. * Fixed StableDiffusionInpaintingPrepareMaskAndMaskedImageTests Due to previous commit these tests were failing as height and width need to be passed into the prepare_mask_and_masked_image function, I have updated the code and added a height/width variable per unit test as it seemed more appropriate than the current hard coded solution * Added a resolution test to StableDiffusionInpaintPipelineSlowTests this unit test simply gets the input and resizes it into some that would fail (e.g. would throw a tensor mismatch error/not a mult of 8). Then passes it through the pipeline and verifies it produces output with correct dims w.r.t the passed height and width --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * make style * [docs] Adapt a model (#3326) * first draft * apply feedback * conv_in.weight thrown away * [docs] Load safetensors (#3333) * safetensors * apply feedback * apply feedback * Apply suggestions from code review --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * make style * [Docs] Fix stable_diffusion.mdx typo (#3398) Fix typo in last code block. Correct "prommpts" to "prompt" * Support ControlNet v1.1 shuffle properly (#3340) * add inferring_controlnet_cond_batch * Revert "add inferring_controlnet_cond_batch" This reverts commit `abe8d6311d`. * set guess_mode to True whenever global_pool_conditions is True Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * nit * add integration test --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * [Tests] better determinism (#3374) * enable deterministic pytorch and cuda operations. * disable manual seeding. * make style && make quality for unet_2d tests. * enable determinism for the unet2dconditional model. * add CUBLAS_WORKSPACE_CONFIG for better reproducibility. * relax tolerance (very weird issue, though). * revert to torch manual_seed() where needed. * relax more tolerance. * better placement of the cuda variable and relax more tolerance. * enable determinism for 3d condition model. * relax tolerance. * add: determinism to alt_diffusion. * relax tolerance for alt diffusion. * dance diffusion. * dance diffusion is flaky. * test_dict_tuple_outputs_equivalent edit. * fix two more tests. * fix more ddim tests. * fix: argument. * change to diff in place of difference. * fix: test_save_load call. * test_save_load_float16 call. * fix: expected_max_diff * fix: paint by example. * relax tolerance. * add determinism to 1d unet model. * torch 2.0 regressions seem to be brutal * determinism to vae. * add reason to skipping. * up tolerance. * determinism to vq. * determinism to cuda. * determinism to the generic test pipeline file. * refactor general pipelines testing a bit. * determinism to alt diffusion i2i * up tolerance for alt diff i2i and audio diff * up tolerance. * determinism to audioldm * increase tolerance for audioldm lms. * increase tolerance for paint by paint. * increase tolerance for repaint. * determinism to cycle diffusion and sd 1. * relax tol for cycle diffusion 🚲 * relax tol for sd 1.0 * relax tol for controlnet. * determinism to img var. * relax tol for img variation. * tolerance to i2i sd * make style * determinism to inpaint. * relax tolerance for inpaiting. * determinism for inpainting legacy * relax tolerance. * determinism to instruct pix2pix * determinism to model editing. * model editing tolerance. * panorama determinism * determinism to pix2pix zero. * determinism to sag. * sd 2. determinism * sd. tolerance * disallow tf32 matmul. * relax tolerance is all you need. * make style and determinism to sd 2 depth * relax tolerance for depth. * tolerance to diffedit. * tolerance to sd 2 inpaint. * up tolerance. * determinism in upscaling. * tolerance in upscaler. * more tolerance relaxation. * determinism to v pred. * up tol for v_pred * unclip determinism * determinism to unclip img2img * determinism to text to video. * determinism to last set of tests * up tol. * vq cumsum doesn't have a deterministic kernel * relax tol * relax tol * [docs] Add transformers to install (#3388) add transformers to install * [deepspeed] partial ZeRO-3 support (#3076) * [deepspeed] partial ZeRO-3 support * cleanup * improve deepspeed fixes * Improve * make style --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Add omegaconf for tests (#3400) Add omegaconfg * Fix various bugs with LoRA Dreambooth and Dreambooth script (#3353) * Improve checkpointing lora * fix more * Improve doc string * Update src/diffusers/loaders.py * make stytle * Apply suggestions from code review * Update src/diffusers/loaders.py * Apply suggestions from code review * Apply suggestions from code review * better * Fix all * Fix multi-GPU dreambooth * Apply suggestions from code review Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Fix all * make style * make style --------- Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Fix docker file (#3402) * up * up * fix: deepseepd_plugin retrieval from accelerate state (#3410) * [Docs] Add `sigmoid` beta_scheduler to docstrings of relevant Schedulers (#3399) * Add `sigmoid` beta scheduler to `DDPMScheduler` docstring * Add `sigmoid` beta scheduler to `RePaintScheduler` docstring --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Don't install accelerate and transformers from source (#3415) * Don't install transformers and accelerate from source (#3414) * Improve fast tests (#3416) Update pr_tests.yml * attention refactor: the trilogy (#3387) * Replace `AttentionBlock` with `Attention` * use _from_deprecated_attn_block check re: @patrickvonplaten * [Docs] update the PT 2.0 optimization doc with latest findings (#3370) * add: benchmarking stats for A100 and V100. * Apply suggestions from code review Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * address patrick's comments. * add: rtx 4090 stats * ⚔ benchmark reports done * Apply suggestions from code review Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * 3313 pr link. * add: plots. Co-authored-by: Pedro <pedro@huggingface.co> * fix formattimg * update number percent. --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Fix style rendering (#3433) * Fix style rendering. * Fix typo * unCLIP scheduler do not use note (#3417) * Replace deprecated command with environment file (#3409) Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * fix warning message pipeline loading (#3446) * add stable diffusion tensorrt img2img pipeline (#3419) * add stable diffusion tensorrt img2img pipeline Signed-off-by: Asfiya Baig <asfiyab@nvidia.com> * update docstrings Signed-off-by: Asfiya Baig <asfiyab@nvidia.com> --------- Signed-off-by: Asfiya Baig <asfiyab@nvidia.com> * Refactor controlnet and add img2img and inpaint (#3386) * refactor controlnet and add img2img and inpaint * First draft to get pipelines to work * make style * Fix more * Fix more * More tests * Fix more * Make inpainting work * make style and more tests * Apply suggestions from code review * up * make style * Fix imports * Fix more * Fix more * Improve examples * add test * Make sure import is correctly deprecated * Make sure everything works in compile mode * make sure authorship is correctly attributed * [Scheduler] DPM-Solver (++) Inverse Scheduler (#3335) * Add DPM-Solver Multistep Inverse Scheduler * Add draft tests for DiffEdit * Add inverse sde-dpmsolver steps to tune image diversity from inverted latents * Fix tests --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * [Docs] Fix incomplete docstring for resnet.py (#3438) Fix incomplete docstrings for resnet.py * fix tiled vae blend extent range (#3384) fix tiled vae bleand extent range * Small update to "Next steps" section (#3443) Small update to "Next steps" section: - PyTorch 2 is recommended. - Updated improvement figures. * Allow arbitrary aspect ratio in IFSuperResolutionPipeline (#3298) * Update pipeline_if_superresolution.py Allow arbitrary aspect ratio in IFSuperResolutionPipeline by using the input image shape * IFSuperResolutionPipeline: allow the user to override the height and width through the arguments * update IFSuperResolutionPipeline width/height doc string to match StableDiffusionInpaintPipeline conventions --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Adding 'strength' parameter to StableDiffusionInpaintingPipeline (#3424) * Added explanation of 'strength' parameter * Added get_timesteps function which relies on new strength parameter * Added `strength` parameter which defaults to 1. * Swapped ordering so `noise_timestep` can be calculated before masking the image this is required when you aren't applying 100% noise to the masked region, e.g. strength < 1. * Added strength to check_inputs, throws error if out of range * Changed `prepare_latents` to initialise latents w.r.t strength inspired from the stable diffusion img2img pipeline, init latents are initialised by converting the init image into a VAE latent and adding noise (based upon the strength parameter passed in), e.g. random when strength = 1, or the init image at strength = 0. * WIP: Added a unit test for the new strength parameter in the StableDiffusionInpaintingPipeline still need to add correct regression values * Created a is_strength_max to initialise from pure random noise * Updated unit tests w.r.t new strength parameter + fixed new strength unit test * renamed parameter to avoid confusion with variable of same name * Updated regression values for new strength test - now passes * removed 'copied from' comment as this method is now different and divergent from the cpy * Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Ensure backwards compatibility for prepare_mask_and_masked_image created a return_image boolean and initialised to false * Ensure backwards compatibility for prepare_latents * Fixed copy check typo * Fixes w.r.t backward compibility changes * make style * keep function argument ordering same for backwards compatibility in callees with copied from statements * make fix-copies --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: William Berman <WLBberman@gmail.com> * [WIP] Bugfix - Pipeline.from_pretrained is broken when the pipeline is partially downloaded (#3448) Added bugfix using f strings. * Fix gradient checkpointing bugs in freezing part of models (requires_grad=False) (#3404) * gradient checkpointing bug fix * bug fix; changes for reviews * reformat * reformat --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Make dreambooth lora more robust to orig unet (#3462) * Make dreambooth lora more robust to orig unet * up * Reduce peak VRAM by releasing large attention tensors (as soon as they're unnecessary) (#3463) Release large tensors in attention (as soon as they're no longer required). Reduces peak VRAM by nearly 2 GB for 1024x1024 (even after slicing), and the savings scale up with image size. * Add min snr to text2img lora training script (#3459) add min snr to text2img lora training script * Add inpaint lora scale support (#3460) * add inpaint lora scale support * add inpaint lora scale test --------- Co-authored-by: yueyang.hyy <yueyang.hyy@alibaba-inc.com> * [From ckpt] Fix from_ckpt (#3466) * Correct from_ckpt * make style * Update full dreambooth script to work with IF (#3425) * Add IF dreambooth docs (#3470) * parameterize pass single args through tuple (#3477) * attend and excite tests disable determinism on the class level (#3478) * dreambooth docs torch.compile note (#3471) * dreambooth docs torch.compile note * Update examples/dreambooth/README.md Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * Update examples/dreambooth/README.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> --------- Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * add: if entry in the dreambooth training docs. (#3472) * [docs] Textual inversion inference (#3473) * add textual inversion inference to docs * add to toctree --------- Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * [docs] Distributed inference (#3376) * distributed inference * move to inference section * apply feedback * update with split_between_processes * apply feedback * [{Up,Down}sample1d] explicit view kernel size as number elements in flattened indices (#3479) explicit view kernel size as number elements in flattened indices * mps & onnx tests rework (#3449) * Remove ONNX tests from PR. They are already a part of push_tests.yml. * Remove mps tests from PRs. They are already performed on push. * Fix workflow name for fast push tests. * Extract mps tests to a workflow. For better control/filtering. * Remove --extra-index-url from mps tests * Increase tolerance of mps test This test passes in my Mac (Ventura 13.3) but fails in the CI hardware (Ventura 13.2). I ran the local tests following the same steps that exist in the CI workflow. * Temporarily run mps tests on pr So we can test. * Revert "Temporarily run mps tests on pr" Tests passed, go back to running on push. * [Attention processor] Better warning message when shifting to `AttnProcessor2_0` (#3457) * add: debugging to enabling memory efficient processing * add: better warning message. * [Docs] add note on local directory path. (#3397) add note on local directory path. Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Refactor full determinism (#3485) * up * fix more * Apply suggestions from code review * fix more * fix more * Check it * Remove 16:8 * fix more * fix more * fix more * up * up * Test only stable diffusion * Test only two files * up * Try out spinning up processes that can be killed * up * Apply suggestions from code review * up * up * Fix DPM single (#3413) * Fix DPM single * add test * fix one more bug * Apply suggestions from code review Co-authored-by: StAlKeR7779 <stalkek7779@yandex.ru> --------- Co-authored-by: StAlKeR7779 <stalkek7779@yandex.ru> * Add `use_Karras_sigmas` to DPMSolverSinglestepScheduler (#3476) * add use_karras_sigmas * add karras test * add doc * Adds local_files_only bool to prevent forced online connection (#3486) * make style * [Docs] Korean translation (optimization, training) (#3488) * feat) optimization kr translation * fix) typo, italic setting * feat) dreambooth, text2image kr * feat) lora kr * fix) LoRA * fix) fp16 fix * fix) doc-builder style * fix) fp16 일부 단어 수정 * fix) fp16 style fix * fix) opt, training docs update * feat) toctree update * feat) toctree update --------- Co-authored-by: Chanran Kim <seriousran@gmail.com> * DataLoader respecting EXIF data in Training Images (#3465) * DataLoader will now bake in any transforms or image manipulations contained in the EXIF Images may have rotations stored in EXIF. Training using such images will cause those transforms to be ignored while training and thus produce unexpected results * Fixed the Dataloading EXIF issue in main DreamBooth training as well * Run make style (black & isort) * make style * feat: allow disk offload for diffuser models (#3285) * allow disk offload for diffuser models * sort import * add max_memory argument * Changed sample[0] to images[0] (#3304) A pipeline object stores the results in `images` not in `sample`. Current code blocks don't work. * Typo in tutorial (#3295) * Torch compile graph fix (#3286) * fix more * Fix more * fix more * Apply suggestions from code review * fix * make style * make fix-copies * fix * make sure torch compile * Clean * fix test * Postprocessing refactor img2img (#3268) * refactor img2img VaeImageProcessor.postprocess * remove copy from for init, run_safety_checker, decode_latents Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> --------- Co-authored-by: yiyixuxu <yixu@yis-macbook-pro.lan> Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * [Torch 2.0 compile] Fix more torch compile breaks (#3313) * Fix more torch compile breaks * add tests * Fix all * fix controlnet * fix more * Add Horace He as co-author. > > Co-authored-by: Horace He <horacehe2007@yahoo.com> * Add Horace He as co-author. Co-authored-by: Horace He <horacehe2007@yahoo.com> --------- Co-authored-by: Horace He <horacehe2007@yahoo.com> * fix: scale_lr and sync example readme and docs. (#3299) * fix: scale_lr and sync example readme and docs. * fix doc link. * Update stable_diffusion.mdx (#3310) fixed import statement * Fix missing variable assign in DeepFloyd-IF-II (#3315) Fix missing variable assign lol * Correct doc build for patch releases (#3316) Update build_documentation.yml * Add Stable Diffusion RePaint to community pipelines (#3320) * Add Stable Diffsuion RePaint to community pipelines - Adds Stable Diffsuion RePaint to community pipelines - Add Readme enty for pipeline * Fix: Remove wrong import - Remove wrong import - Minor change in comments * Fix: Code formatting of stable_diffusion_repaint * Fix: ruff errors in stable_diffusion_repaint * Fix multistep dpmsolver for cosine schedule (suitable for deepfloyd-if) (#3314) * fix multistep dpmsolver for cosine schedule (deepfloy-if) * fix a typo * Update src/diffusers/schedulers/scheduling_dpmsolver_multistep.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/diffusers/schedulers/scheduling_dpmsolver_multistep.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/diffusers/schedulers/scheduling_dpmsolver_multistep.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/diffusers/schedulers/scheduling_dpmsolver_multistep.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/diffusers/schedulers/scheduling_dpmsolver_multistep.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * update all dpmsolver (singlestep, multistep, dpm, dpm++) for cosine noise schedule * add test, fix style --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * [docs] Improve LoRA docs (#3311) * update docs * add to toctree * apply feedback * Added input pretubation (#3292) * Added input pretubation * Fixed spelling * Update write_own_pipeline.mdx (#3323) * update controlling generation doc with latest goodies. (#3321) * [Quality] Make style (#3341) * Fix config dpm (#3343) * Add the SDE variant of DPM-Solver and DPM-Solver++ (#3344) * add SDE variant of DPM-Solver and DPM-Solver++ * add test * fix typo * fix typo * Add upsample_size to AttnUpBlock2D, AttnDownBlock2D (#3275) The argument `upsample_size` needs to be added to these modules to allow compatibility with other blocks that require this argument. * Rename --only_save_embeds to --save_as_full_pipeline (#3206) * Set --only_save_embeds to False by default Due to how the option is named, it makes more sense to behave like this. * Refactor only_save_embeds to save_as_full_pipeline * [AudioLDM] Generalise conversion script (#3328) Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Fix TypeError when using prompt_embeds and negative_prompt (#2982) * test: Added test case * fix: fixed type checking issue on _encode_prompt * fix: fixed copies consistency * fix: one copy was not sufficient * Fix pipeline class on README (#3345) Update README.md * Inpainting: typo in docs (#3331) Typo in docs Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Add `use_Karras_sigmas` to LMSDiscreteScheduler (#3351) * add karras sigma to lms discrete scheduler * add test for lms_scheduler karras * reformat test lms * Batched load of textual inversions (#3277) * Batched load of textual inversions - Only call resize_token_embeddings once per batch as it is the most expensive operation - Allow pretrained_model_name_or_path and token to be an optional list - Remove Dict from type annotation pretrained_model_name_or_path as it was not supported in this function - Add comment that single files (e.g. .pt/.safetensors) are supported - Add comment for token parameter - Convert token override log message from warning to info * Update src/diffusers/loaders.py Check for duplicate tokens Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update condition for None tokens --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * make fix-copies * [docs] Fix docstring (#3334) fix docstring Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * if dreambooth lora (#3360) * update IF stage I pipelines add fixed variance schedulers and lora loading * added kv lora attn processor * allow loading into alternative lora attn processor * make vae optional * throw away predicted variance * allow loading into added kv lora layer * allow load T5 * allow pre compute text embeddings * set new variance type in schedulers * fix copies * refactor all prompt embedding code class prompts are now included in pre-encoding code max tokenizer length is now configurable embedding attention mask is now configurable * fix for when variance type is not defined on scheduler * do not pre compute validation prompt if not present * add example test for if lora dreambooth * add check for train text encoder and pre compute text embeddings * Postprocessing refactor all others (#3337) * add text2img * fix-copies * add * add all other pipelines * add * add * add * add * add * make style * style + fix copies --------- Co-authored-by: yiyixuxu <yixu310@gmail,com> * [docs] Improve safetensors docstring (#3368) * clarify safetensor docstring * fix typo * apply feedback * add: a warning message when using xformers in a PT 2.0 env. (#3365) * add: a warning message when using xformers in a PT 2.0 env. * Apply suggestions from code review Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * StableDiffusionInpaintingPipeline - resize image w.r.t height and width (#3322) * StableDiffusionInpaintingPipeline now resizes input images and masks w.r.t to passed input height and width. Default is already set to 512. This addresses the common tensor mismatch error. Also moved type check into relevant funciton to keep main pipeline body tidy. * Fixed StableDiffusionInpaintingPrepareMaskAndMaskedImageTests Due to previous commit these tests were failing as height and width need to be passed into the prepare_mask_and_masked_image function, I have updated the code and added a height/width variable per unit test as it seemed more appropriate than the current hard coded solution * Added a resolution test to StableDiffusionInpaintPipelineSlowTests this unit test simply gets the input and resizes it into some that would fail (e.g. would throw a tensor mismatch error/not a mult of 8). Then passes it through the pipeline and verifies it produces output with correct dims w.r.t the passed height and width --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * make style * [docs] Adapt a model (#3326) * first draft * apply feedback * conv_in.weight thrown away * [docs] Load safetensors (#3333) * safetensors * apply feedback * apply feedback * Apply suggestions from code review --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * make style * [Docs] Fix stable_diffusion.mdx typo (#3398) Fix typo in last code block. Correct "prommpts" to "prompt" * Support ControlNet v1.1 shuffle properly (#3340) * add inferring_controlnet_cond_batch * Revert "add inferring_controlnet_cond_batch" This reverts commit `abe8d6311d`. * set guess_mode to True whenever global_pool_conditions is True Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * nit * add integration test --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * [Tests] better determinism (#3374) * enable deterministic pytorch and cuda operations. * disable manual seeding. * make style && make quality for unet_2d tests. * enable determinism for the unet2dconditional model. * add CUBLAS_WORKSPACE_CONFIG for better reproducibility. * relax tolerance (very weird issue, though). * revert to torch manual_seed() where needed. * relax more tolerance. * better placement of the cuda variable and relax more tolerance. * enable determinism for 3d condition model. * relax tolerance. * add: determinism to alt_diffusion. * relax tolerance for alt diffusion. * dance diffusion. * dance diffusion is flaky. * test_dict_tuple_outputs_equivalent edit. * fix two more tests. * fix more ddim tests. * fix: argument. * change to diff in place of difference. * fix: test_save_load call. * test_save_load_float16 call. * fix: expected_max_diff * fix: paint by example. * relax tolerance. * add determinism to 1d unet model. * torch 2.0 regressions seem to be brutal * determinism to vae. * add reason to skipping. * up tolerance. * determinism to vq. * determinism to cuda. * determinism to the generic test pipeline file. * refactor general pipelines testing a bit. * determinism to alt diffusion i2i * up tolerance for alt diff i2i and audio diff * up tolerance. * determinism to audioldm * increase tolerance for audioldm lms. * increase tolerance for paint by paint. * increase tolerance for repaint. * determinism to cycle diffusion and sd 1. * relax tol for cycle diffusion 🚲 * relax tol for sd 1.0 * relax tol for controlnet. * determinism to img var. * relax tol for img variation. * tolerance to i2i sd * make style * determinism to inpaint. * relax tolerance for inpaiting. * determinism for inpainting legacy * relax tolerance. * determinism to instruct pix2pix * determinism to model editing. * model editing tolerance. * panorama determinism * determinism to pix2pix zero. * determinism to sag. * sd 2. determinism * sd. tolerance * disallow tf32 matmul. * relax tolerance is all you need. * make style and determinism to sd 2 depth * relax tolerance for depth. * tolerance to diffedit. * tolerance to sd 2 inpaint. * up tolerance. * determinism in upscaling. * tolerance in upscaler. * more tolerance relaxation. * determinism to v pred. * up tol for v_pred * unclip determinism * determinism to unclip img2img * determinism to text to video. * determinism to last set of tests * up tol. * vq cumsum doesn't have a deterministic kernel * relax tol * relax tol * [docs] Add transformers to install (#3388) add transformers to install * [deepspeed] partial ZeRO-3 support (#3076) * [deepspeed] partial ZeRO-3 support * cleanup * improve deepspeed fixes * Improve * make style --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Add omegaconf for tests (#3400) Add omegaconfg * Fix various bugs with LoRA Dreambooth and Dreambooth script (#3353) * Improve checkpointing lora * fix more * Improve doc string * Update src/diffusers/loaders.py * make stytle * Apply suggestions from code review * Update src/diffusers/loaders.py * Apply suggestions from code review * Apply suggestions from code review * better * Fix all * Fix multi-GPU dreambooth * Apply suggestions from code review Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Fix all * make style * make style --------- Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Fix docker file (#3402) * up * up * fix: deepseepd_plugin retrieval from accelerate state (#3410) * [Docs] Add `sigmoid` beta_scheduler to docstrings of relevant Schedulers (#3399) * Add `sigmoid` beta scheduler to `DDPMScheduler` docstring * Add `sigmoid` beta scheduler to `RePaintScheduler` docstring --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Don't install accelerate and transformers from source (#3415) * Don't install transformers and accelerate from source (#3414) * Improve fast tests (#3416) Update pr_tests.yml * attention refactor: the trilogy (#3387) * Replace `AttentionBlock` with `Attention` * use _from_deprecated_attn_block check re: @patrickvonplaten * [Docs] update the PT 2.0 optimization doc with latest findings (#3370) * add: benchmarking stats for A100 and V100. * Apply suggestions from code review Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * address patrick's comments. * add: rtx 4090 stats * ⚔ benchmark reports done * Apply suggestions from code review Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * 3313 pr link. * add: plots. Co-authored-by: Pedro <pedro@huggingface.co> * fix formattimg * update number percent. --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Fix style rendering (#3433) * Fix style rendering. * Fix typo * unCLIP scheduler do not use note (#3417) * Replace deprecated command with environment file (#3409) Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * fix warning message pipeline loading (#3446) * add stable diffusion tensorrt img2img pipeline (#3419) * add stable diffusion tensorrt img2img pipeline Signed-off-by: Asfiya Baig <asfiyab@nvidia.com> * update docstrings Signed-off-by: Asfiya Baig <asfiyab@nvidia.com> --------- Signed-off-by: Asfiya Baig <asfiyab@nvidia.com> * Refactor controlnet and add img2img and inpaint (#3386) * refactor controlnet and add img2img and inpaint * First draft to get pipelines to work * make style * Fix more * Fix more * More tests * Fix more * Make inpainting work * make style and more tests * Apply suggestions from code review * up * make style * Fix imports * Fix more * Fix more * Improve examples * add test * Make sure import is correctly deprecated * Make sure everything works in compile mode * make sure authorship is correctly attributed * [Scheduler] DPM-Solver (++) Inverse Scheduler (#3335) * Add DPM-Solver Multistep Inverse Scheduler * Add draft tests for DiffEdit * Add inverse sde-dpmsolver steps to tune image diversity from inverted latents * Fix tests --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * [Docs] Fix incomplete docstring for resnet.py (#3438) Fix incomplete docstrings for resnet.py * fix tiled vae blend extent range (#3384) fix tiled vae bleand extent range * Small update to "Next steps" section (#3443) Small update to "Next steps" section: - PyTorch 2 is recommended. - Updated improvement figures. * Allow arbitrary aspect ratio in IFSuperResolutionPipeline (#3298) * Update pipeline_if_superresolution.py Allow arbitrary aspect ratio in IFSuperResolutionPipeline by using the input image shape * IFSuperResolutionPipeline: allow the user to override the height and width through the arguments * update IFSuperResolutionPipeline width/height doc string to match StableDiffusionInpaintPipeline conventions --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Adding 'strength' parameter to StableDiffusionInpaintingPipeline (#3424) * Added explanation of 'strength' parameter * Added get_timesteps function which relies on new strength parameter * Added `strength` parameter which defaults to 1. * Swapped ordering so `noise_timestep` can be calculated before masking the image this is required when you aren't applying 100% noise to the masked region, e.g. strength < 1. * Added strength to check_inputs, throws error if out of range * Changed `prepare_latents` to initialise latents w.r.t strength inspired from the stable diffusion img2img pipeline, init latents are initialised by converting the init image into a VAE latent and adding noise (based upon the strength parameter passed in), e.g. random when strength = 1, or the init image at strength = 0. * WIP: Added a unit test for the new strength parameter in the StableDiffusionInpaintingPipeline still need to add correct regression values * Created a is_strength_max to initialise from pure random noise * Updated unit tests w.r.t new strength parameter + fixed new strength unit test * renamed parameter to avoid confusion with variable of same name * Updated regression values for new strength test - now passes * removed 'copied from' comment as this method is now different and divergent from the cpy * Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Ensure backwards compatibility for prepare_mask_and_masked_image created a return_image boolean and initialised to false * Ensure backwards compatibility for prepare_latents * Fixed copy check typo * Fixes w.r.t backward compibility changes * make style * keep function argument ordering same for backwards compatibility in callees with copied from statements * make fix-copies --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: William Berman <WLBberman@gmail.com> * [WIP] Bugfix - Pipeline.from_pretrained is broken when the pipeline is partially downloaded (#3448) Added bugfix using f strings. * Fix gradient checkpointing bugs in freezing part of models (requires_grad=False) (#3404) * gradient checkpointing bug fix * bug fix; changes for reviews * reformat * reformat --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Make dreambooth lora more robust to orig unet (#3462) * Make dreambooth lora more robust to orig unet * up * Reduce peak VRAM by releasing large attention tensors (as soon as they're unnecessary) (#3463) Release large tensors in attention (as soon as they're no longer required). Reduces peak VRAM by nearly 2 GB for 1024x1024 (even after slicing), and the savings scale up with image size. * Add min snr to text2img lora training script (#3459) add min snr to text2img lora training script * Add inpaint lora scale support (#3460) * add inpaint lora scale support * add inpaint lora scale test --------- Co-authored-by: yueyang.hyy <yueyang.hyy@alibaba-inc.com> * [From ckpt] Fix from_ckpt (#3466) * Correct from_ckpt * make style * Update full dreambooth script to work with IF (#3425) * Add IF dreambooth docs (#3470) * parameterize pass single args through tuple (#3477) * attend and excite tests disable determinism on the class level (#3478) * dreambooth docs torch.compile note (#3471) * dreambooth docs torch.compile note * Update examples/dreambooth/README.md Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * Update examples/dreambooth/README.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> --------- Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * add: if entry in the dreambooth training docs. (#3472) * [docs] Textual inversion inference (#3473) * add textual inversion inference to docs * add to toctree --------- Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * [docs] Distributed inference (#3376) * distributed inference * move to inference section * apply feedback * update with split_between_processes * apply feedback * [{Up,Down}sample1d] explicit view kernel size as number elements in flattened indices (#3479) explicit view kernel size as number elements in flattened indices * mps & onnx tests rework (#3449) * Remove ONNX tests from PR. They are already a part of push_tests.yml. * Remove mps tests from PRs. They are already performed on push. * Fix workflow name for fast push tests. * Extract mps tests to a workflow. For better control/filtering. * Remove --extra-index-url from mps tests * Increase tolerance of mps test This test passes in my Mac (Ventura 13.3) but fails in the CI hardware (Ventura 13.2). I ran the local tests following the same steps that exist in the CI workflow. * Temporarily run mps tests on pr So we can test. * Revert "Temporarily run mps tests on pr" Tests passed, go back to running on push. --------- Signed-off-by: Asfiya Baig <asfiyab@nvidia.com> Co-authored-by: Ilia Larchenko <41329713+IliaLarchenko@users.noreply.github.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: YiYi Xu <yixu310@gmail.com> Co-authored-by: yiyixuxu <yixu@yis-macbook-pro.lan> Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> Co-authored-by: Horace He <horacehe2007@yahoo.com> Co-authored-by: Umar <55330742+mu94-csl@users.noreply.github.com> Co-authored-by: Mylo <36931363+gitmylo@users.noreply.github.com> Co-authored-by: Markus Pobitzer <markuspobitzer@gmail.com> Co-authored-by: Cheng Lu <lucheng.lc15@gmail.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: Isamu Isozaki <isamu.website@gmail.com> Co-authored-by: Cesar Aybar <csaybar@gmail.com> Co-authored-by: Will Rice <will@spokestack.io> Co-authored-by: Adrià Arrufat <1671644+arrufat@users.noreply.github.com> Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> Co-authored-by: At-sushi <dkahw210@kyoto.zaq.ne.jp> Co-authored-by: Lucca Zenóbio <luccazen@gmail.com> Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Isotr0py <41363108+Isotr0py@users.noreply.github.com> Co-authored-by: pdoane <pdoane2@gmail.com> Co-authored-by: Will Berman <wlbberman@gmail.com> Co-authored-by: yiyixuxu <yixu310@gmail,com> Co-authored-by: Rupert Menneer <71332436+rupertmenneer@users.noreply.github.com> Co-authored-by: sudowind <wfpkueecs@163.com> Co-authored-by: Takuma Mori <takuma104@gmail.com> Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: Laureηt <laurentfainsin@protonmail.com> Co-authored-by: Jongwoo Han <jongwooo.han@gmail.com> Co-authored-by: asfiyab-nvidia <117682710+asfiyab-nvidia@users.noreply.github.com> Co-authored-by: clarencechen <clarencechenct@gmail.com> Co-authored-by: Laureηt <laurent@fainsin.bzh> Co-authored-by: superlabs-dev <133080491+superlabs-dev@users.noreply.github.com> Co-authored-by: Dev Aggarwal <devxpy@gmail.com> Co-authored-by: Vimarsh Chaturvedi <vimarsh.c@gmail.com> Co-authored-by: 7eu7d7 <31194890+7eu7d7@users.noreply.github.com> Co-authored-by: cmdr2 <shashank.shekhar.global@gmail.com> Co-authored-by: wfng92 <43742196+wfng92@users.noreply.github.com> Co-authored-by: Glaceon-Hyy <ffheyy0017@gmail.com> Co-authored-by: yueyang.hyy <yueyang.hyy@alibaba-inc.com> * [Community] reference only control (#3435) * add reference only control * add reference only control * add reference only control * fix lint * fix lint * reference adain * bugfix EulerAncestralDiscreteScheduler * fix style fidelity rule * fix default output size * del unused line * fix deterministic * Support for cross-attention bias / mask (#2634) * Cross-attention masks prefer qualified symbol, fix accidental Optional prefer qualified symbol in AttentionProcessor prefer qualified symbol in embeddings.py qualified symbol in transformed_2d qualify FloatTensor in unet_2d_blocks move new transformer_2d params attention_mask, encoder_attention_mask to the end of the section which is assumed (e.g. by functions such as checkpoint()) to have a stable positional param interface. regard return_dict as a special-case which is assumed to be injected separately from positional params (e.g. by create_custom_forward()). move new encoder_attention_mask param to end of CrossAttn block interfaces and Unet2DCondition interface, to maintain positional param interface. regenerate modeling_text_unet.py remove unused import unet_2d_condition encoder_attention_mask docs Co-authored-by: Pedro Cuenca <pedro@huggingface.co> versatile_diffusion/modeling_text_unet.py encoder_attention_mask docs Co-authored-by: Pedro Cuenca <pedro@huggingface.co> transformer_2d encoder_attention_mask docs Co-authored-by: Pedro Cuenca <pedro@huggingface.co> unet_2d_blocks.py: add parameter name comments Co-authored-by: Pedro Cuenca <pedro@huggingface.co> revert description. bool-to-bias treatment happens in unet_2d_condition only. comment parameter names fix copies, style * encoder_attention_mask for SimpleCrossAttnDownBlock2D, SimpleCrossAttnUpBlock2D * encoder_attention_mask for UNetMidBlock2DSimpleCrossAttn * support attention_mask, encoder_attention_mask in KCrossAttnDownBlock2D, KCrossAttnUpBlock2D, KAttentionBlock. fix binding of attention_mask, cross_attention_kwargs params in KCrossAttnDownBlock2D, KCrossAttnUpBlock2D checkpoint invocations. * fix mistake made during merge conflict resolution * regenerate versatile_diffusion * pass time embedding into checkpointed attention invocation * always assume encoder_attention_mask is a mask (i.e. not a bias). * style, fix-copies * add tests for cross-attention masks * add test for padding of attention mask * explain mask's query_tokens dim. fix explanation about broadcasting over channels; we actually broadcast over query tokens * support both masks and biases in Transformer2DModel#forward. document behaviour * fix-copies * delete attention_mask docs on the basis I never tested self-attention masking myself. not comfortable explaining it, since I don't actually understand how a self-attn mask can work in its current form: the key length will be different in every ResBlock (we don't downsample the mask when we downsample the image). * review feedback: the standard Unet blocks shouldn't pass temb to attn (only to resnet). remove from KCrossAttnDownBlock2D,KCrossAttnUpBlock2D#forward. * remove encoder_attention_mask param from SimpleCrossAttn{Up,Down}Block2D,UNetMidBlock2DSimpleCrossAttn, and mask-choice in those blocks' #forward, on the basis that they only do one type of attention, so the consumer can pass whichever type of attention_mask is appropriate. * put attention mask padding back to how it was (since the SD use-case it enabled wasn't important, and it breaks the original unclip use-case). disable the test which was added. * fix-copies * style * fix-copies * put encoder_attention_mask param back into Simple block forward interfaces, to ensure consistency of forward interface. * restore passing of emb to KAttentionBlock#forward, on the basis that removal caused test failures. restore also the passing of emb to checkpointed calls to KAttentionBlock#forward. * make simple unet2d blocks use encoder_attention_mask, but only when attention_mask is None. this should fix UnCLIP compatibility. * fix copies * do not scale the initial global step by gradient accumulation steps when loading from checkpoint (#3506) * Remove CPU latents logic for UniDiffuserPipelineFastTests. * make style * Revert "Clean up code and make slow tests pass." This reverts commit `ec7fb8735b`. * Revert bad commit and clean up code. * add: contributor note. * Batched load of textual inversions (#3277) * Batched load of textual inversions - Only call resize_token_embeddings once per batch as it is the most expensive operation - Allow pretrained_model_name_or_path and token to be an optional list - Remove Dict from type annotation pretrained_model_name_or_path as it was not supported in this function - Add comment that single files (e.g. .pt/.safetensors) are supported - Add comment for token parameter - Convert token override log message from warning to info * Update src/diffusers/loaders.py Check for duplicate tokens Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update condition for None tokens --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Revert "add: contributor note." This reverts commit `302fde9409`. * Re-add contributor note and refactored fast tests fixed latents code to remove CPU specific logic. * make style * Refactored the code: - Updated the checkpoint ids to the new ids where appropriate - Refactored the UniDiffuserTextDecoder methods to return only tensors (and made other changes to support this) - Cleaned up the code following suggestions by patrickvonplaten * make style * Remove padding logic from UniDiffuserTextDecoder.generate_beam since the inputs are already padded to a consistent length. * Update checkpoint id for small test v1 checkpoint to hf-internal-testing/unidiffuser-test-v1. * make style * Make improvements to the documentation. * Move ImageTextPipelineOutput documentation from /api/pipelines/unidiffuser.mdx to /api/diffusion_pipeline.mdx. * Change order of arguments for UniDiffuserTextDecoder.generate_beam. * make style * Update docs/source/en/api/pipelines/unidiffuser.mdx --------- Signed-off-by: Asfiya Baig <asfiyab@nvidia.com> Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com> Co-authored-by: Ernie Chu <51432514+ernestchu@users.noreply.github.com> Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> Co-authored-by: Andranik Movsisyan <48154088+19and99@users.noreply.github.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Andreas Steiner <andstein@google.com> Co-authored-by: YiYi Xu <yixu310@gmail.com> Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: Joseph Coffland <github@joe.coffland.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: Takuma Mori <takuma104@gmail.com> Co-authored-by: Will Berman <wlbberman@gmail.com> Co-authored-by: Tommaso De Rossi <beats.by.morse@gmail.com> Co-authored-by: Cristian Garcia <cgarcia.e88@gmail.com> Co-authored-by: cmdr2 <secondary.cmdr2@gmail.com> Co-authored-by: 1lint <105617163+1lint@users.noreply.github.com> Co-authored-by: asfiyab-nvidia <117682710+asfiyab-nvidia@users.noreply.github.com> Co-authored-by: Chanchana Sornsoontorn <off9955555@gmail.com> Co-authored-by: hwuebben <wbben123@yahoo.de> Co-authored-by: superhero-7 <57797766+superhero-7@users.noreply.github.com> Co-authored-by: root <fulong_ye@163.com> Co-authored-by: nupurkmr9 <nupurkmr9@gmail.com> Co-authored-by: Nupur Kumari <nupurkumari@Nupurs-MacBook-Pro.local> Co-authored-by: Nupur Kumari <nupurkumari@nupurs-mbp.wifi.local.cmu.edu> Co-authored-by: Mishig <mishig.davaadorj@coloradocollege.edu> Co-authored-by: XinyuYe-Intel <xinyu.ye@intel.com> Co-authored-by: clarencechen <clarencechenct@gmail.com> Co-authored-by: regisss <15324346+regisss@users.noreply.github.com> Co-authored-by: Suraj Patil <surajp815@gmail.com> Co-authored-by: Youssef Adarrab <104783077+youssefadr@users.noreply.github.com> Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> Co-authored-by: Chengrui Wang <80876977+crywang@users.noreply.github.com> Co-authored-by: SkyTNT <SKYTNT@outlook.com> Co-authored-by: Lucca Zenóbio <luccazen@gmail.com> Co-authored-by: Isaac <34376531+init-22@users.noreply.github.com> Co-authored-by: pdoane <pdoane2@gmail.com> Co-authored-by: Yuchen Fan <fyc0624@gmail.com> Co-authored-by: Nipun Jindal <jindal.nipun@gmail.com> Co-authored-by: njindal <njindal@adobe.com> Co-authored-by: apolinário <joaopaulo.passos@gmail.com> Co-authored-by: multimodalart <joaopaulo.passos+multimodal@gmail.com> Co-authored-by: Xie Zejian <xiezej@gmail.com> Co-authored-by: Jair Trejo <jairtrejo@gmail.com> Co-authored-by: Robert Dargavel Smith <teticio@gmail.com> Co-authored-by: yiyixuxu <yixu@yis-macbook-pro.lan> Co-authored-by: Joqsan <6027118+Joqsan@users.noreply.github.com> Co-authored-by: NimenDavid <312648004@qq.com> Co-authored-by: M. Tolga Cangöz <46008593+standardAI@users.noreply.github.com> Co-authored-by: timegate <timegate@kaist.ac.kr> Co-authored-by: Jason Kuan <jason9075@users.noreply.github.com> Co-authored-by: Ilia Larchenko <41329713+IliaLarchenko@users.noreply.github.com> Co-authored-by: Horace He <horacehe2007@yahoo.com> Co-authored-by: Umar <55330742+mu94-csl@users.noreply.github.com> Co-authored-by: Mylo <36931363+gitmylo@users.noreply.github.com> Co-authored-by: Markus Pobitzer <markuspobitzer@gmail.com> Co-authored-by: Cheng Lu <lucheng.lc15@gmail.com> Co-authored-by: Isamu Isozaki <isamu.website@gmail.com> Co-authored-by: Cesar Aybar <csaybar@gmail.com> Co-authored-by: Will Rice <will@spokestack.io> Co-authored-by: yiyixuxu <yixu310@gmail,com> Co-authored-by: Rupert Menneer <71332436+rupertmenneer@users.noreply.github.com> Co-authored-by: sudowind <wfpkueecs@163.com> Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> Co-authored-by: Laureηt <laurentfainsin@protonmail.com> Co-authored-by: Jongwoo Han <jongwooo.han@gmail.com> Co-authored-by: Laureηt <laurent@fainsin.bzh> Co-authored-by: superlabs-dev <133080491+superlabs-dev@users.noreply.github.com> Co-authored-by: Dev Aggarwal <devxpy@gmail.com> Co-authored-by: Vimarsh Chaturvedi <vimarsh.c@gmail.com> Co-authored-by: 7eu7d7 <31194890+7eu7d7@users.noreply.github.com> Co-authored-by: cmdr2 <shashank.shekhar.global@gmail.com> Co-authored-by: wfng92 <43742196+wfng92@users.noreply.github.com> Co-authored-by: Glaceon-Hyy <ffheyy0017@gmail.com> Co-authored-by: yueyang.hyy <yueyang.hyy@alibaba-inc.com> Co-authored-by: StAlKeR7779 <stalkek7779@yandex.ru> Co-authored-by: Isotr0py <41363108+Isotr0py@users.noreply.github.com> Co-authored-by: w4ffl35 <w4ffl35@ml1.net> Co-authored-by: Seongsu Park <tjdtnsu@gmail.com> Co-authored-by: Chanran Kim <seriousran@gmail.com> Co-authored-by: Ambrosiussen <paul@ambrosiussen.com> Co-authored-by: Hari Krishna <37787894+hari10599@users.noreply.github.com> Co-authored-by: Adrià Arrufat <1671644+arrufat@users.noreply.github.com> Co-authored-by: At-sushi <dkahw210@kyoto.zaq.ne.jp> Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: takuoko <to78314910@gmail.com> Co-authored-by: Birch-san <Birch-san@users.noreply.github.com>	2023-05-26 17:27:30 +05:30
YiYi Xu	03b7a84cbe	Add Kandinsky 2.1 (#3308 ) add kandinsky2.1 --------- Co-authored-by: yiyixuxu <yixu310@gmail,com> Co-authored-by: Ayush Mangal <43698245+ayushtues@users.noreply.github.com> Co-authored-by: ayushmangal <ayushmangal@microsoft.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>	2023-05-25 11:28:34 -10:00
Sanchit Gandhi	abd86d1c17	[AudioLDM] Generalise conversion script (#3328 ) Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2023-05-06 12:00:42 +01:00
Patrick von Platen	e51f19aee8	add model (#3230 ) * add * clean * up * clean up more * fix more tests * Improve docs further * improve * more fixes docs * Improve docs more * Update src/diffusers/models/unet_2d_condition.py * fix * up * update doc links * make fix-copies * add safety checker and watermarker to stage 3 doc page code snippets * speed optimizations docs * memory optimization docs * make style * add watermarking snippets to doc string examples * make style * use pt_to_pil helper functions in doc strings * skip mps tests * Improve safety * make style * new logic * fix * fix bad onnx design * make new stable diffusion upscale pipeline model arguments optional * define has_nsfw_concept when non-pil output type * lowercase linked to notebook name --------- Co-authored-by: William Berman <WLBberman@gmail.com>	2023-04-25 14:20:43 -07:00
Patrick von Platen	49609768b4	make style	2023-03-30 18:26:41 +02:00
Alon Burg	9062b2847d	Support fp16 in conversion from original ckpt (#2733 ) add --half to convert_original_stable_diffusion_to_diffusers.py	2023-03-30 17:26:18 +01:00
Pedro Cuenca	1d7b4b60b7	Ruff: apply same rules as in transformers (#2827 ) * Apply same ruff settings as in transformers See https://github.com/huggingface/transformers/blob/main/pyproject.toml Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com> * Apply new style rules * Style Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com> * style * remove list, ruff wouldn't auto fix. --------- Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>	2023-03-27 16:18:57 +02:00
Sanchit Gandhi	b94880e536	Add AudioLDM (#2232 ) * Add AudioLDM * up * add vocoder * start unet * unconditional unet * clap, vocoder and vae * clean-up: conversion scripts * fix: conversion script token_type_ids * clean-up: pipeline docstring * tests: from SD * clean-up: cpu offload vocoder instead of safety checker * feat: adapt tests to audioldm * feat: add docs * clean-up: amend pipeline docstrings * clean-up: make style * clean-up: make fix-copies * fix: add doc path to toctree * clean-up: args for conversion script * clean-up: paths to checkpoints * fix: use conditional unet * clean-up: make style * fix: type hints for UNet * clean-up: docstring for UNet * clean-up: make style * clean-up: remove duplicate in docstring * clean-up: make style * clean-up: make fix-copies * clean-up: move imports to start in code snippet * fix: pass cross_attention_dim as a list/tuple to unet * clean-up: make fix-copies * fix: update checkpoint path * fix: unet cross_attention_dim in tests * film embeddings -> class embeddings * Apply suggestions from code review Co-authored-by: Will Berman <wlbberman@gmail.com> * fix: unet film embed to use existing args * fix: unet tests to use existing args * fix: make style * fix: transformers import and version in init * clean-up: make style * Revert "clean-up: make style" This reverts commit `5d6d1f8b32`. * clean-up: make style * clean-up: use pipeline tester mixin tests where poss * clean-up: skip attn slicing test * fix: add torch dtype to docs * fix: remove conversion script out of src * fix: remove .detach from 1d waveform * fix: reduce default num inf steps * fix: swap height/width -> audio_length_in_s * clean-up: make style * fix: remove nightly tests * fix: imports in conversion script * clean-up: slim-down to two slow tests * clean-up: slim-down fast tests * fix: batch consistent tests * clean-up: make style * clean-up: remove vae slicing fast test * clean-up: propagate changes to doc * fix: increase test tol to 1e-2 * clean-up: finish docs * clean-up: make style * feat: vocoder / VAE compatibility check * feat: possibly expand / cut audio waveform * fix: pipeline call signature test * fix: slow tests output len * clean-up: make style * make style --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: William Berman <WLBberman@gmail.com>	2023-03-23 19:00:21 +01:00
Kashif Rasul	2ef9bdd76f	Music Spectrogram diffusion pipeline (#1044 ) * initial TokenEncoder and ContinuousEncoder * initial modules * added ContinuousContextTransformer * fix copy paste error * use numpy for get_sequence_length * initial terminal relative positional encodings * fix weights keys * fix assert * cross attend style: concat encodings * make style * concat once * fix formatting * Initial SpectrogramPipeline * fix input_tokens * make style * added mel output * ignore weights for config * move mel to numpy * import pipeline * fix class names and import * moved models to models folder * import ContinuousContextTransformer and SpectrogramDiffusionPipeline * initial spec diffusion converstion script * renamed config to t5config * added weight loading * use arguments instead of t5config * broadcast noise time to batch dim * fix call * added scale_to_features * fix weights * transpose laynorm weight * scale is a vector * scale the query outputs * added comment * undo scaling * undo depth_scaling * inital get_extended_attention_mask * attention_mask is none in self-attention * cleanup * manually invert attention * nn.linear need bias=False * added T5LayerFFCond * remove to fix conflict * make style and dummy * remove unsed variables * remove predict_epsilon * Move accelerate to a soft-dependency (#1134) * finish * finish * Update src/diffusers/modeling_utils.py * Update src/diffusers/pipeline_utils.py Co-authored-by: Anton Lozhkov <anton@huggingface.co> * more fixes * fix Co-authored-by: Anton Lozhkov <anton@huggingface.co> * fix order * added initial midi to note token data pipeline * added int to int tokenizer * remove duplicate * added logic for segments * add melgan to pipeline * move autoregressive gen into pipeline * added note_representation_processor_chain * fix dtypes * remove immutabledict req * initial doc * use np.where * require note_seq * fix typo * update dependency * added note-seq to test * added is_note_seq_available * fix import * added toc * added example usage * undo for now * moved docs * fix merge * fix imports * predict first segment * avoid un-needed copy to and from cpu * make style * Copyright * fix style * add test and fix inference steps * remove bogus files * reorder models * up * remove transformers dependency * make work with diffusers cross attention * clean more * remove @ * improve further * up * uP * Apply suggestions from code review * Update tests/pipelines/spectrogram_diffusion/test_spectrogram_diffusion.py * loop over all tokens * make style * Added a section on the model * fix formatting * grammer * formatting * make fix-copies * Update src/diffusers/pipelines/__init__.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/diffusers/pipelines/spectrogram_diffusion/pipeline_spectrogram_diffusion.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * added callback ad optional ionnx * do not squeeze batch dim * clean up more * upload * convert jax to nnumpy * make style * fix warning * make fix-copies * fix warning * add initial fast tests * add initial pipeline_params * eval mode due to dropout * skip batch tests as pipeline runs on a single file * make style * fix relative path * fix doc tests * Update src/diffusers/models/t5_film_transformer.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/diffusers/models/t5_film_transformer.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update docs/source/en/api/pipelines/spectrogram_diffusion.mdx Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update tests/pipelines/spectrogram_diffusion/test_spectrogram_diffusion.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update tests/pipelines/spectrogram_diffusion/test_spectrogram_diffusion.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update tests/pipelines/spectrogram_diffusion/test_spectrogram_diffusion.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update tests/pipelines/spectrogram_diffusion/test_spectrogram_diffusion.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * add MidiProcessor * format * fix org * Apply suggestions from code review * Update tests/pipelines/spectrogram_diffusion/test_spectrogram_diffusion.py * make style * pin protobuf to <4 * fix formatting * white space * tensorboard needs protobuf --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Anton Lozhkov <anton@huggingface.co>	2023-03-23 14:06:17 +01:00
Naoki Ainoya	14e3a28c12	Rename 'CLIPFeatureExtractor' class to 'CLIPImageProcessor' (#2732 ) The 'CLIPFeatureExtractor' class name has been renamed to 'CLIPImageProcessor' in order to comply with future deprecation. This commit includes the necessary changes to the affected files.	2023-03-23 13:49:22 +01:00
Patrick von Platen	ca1a22296d	[MS Text To Video] Add first text to video (#2738 ) * [MS Text To Video} Add first text to video * upload * make first model example * match unet3d params * make sure weights are correcctly converted * improve * forward pass works, but diff result * make forward work * fix more * finish * refactor video output class. * feat: add support for a video export utility. * fix: opencv availability check. * run make fix-copies. * add: docs for the model components. * add: standalone pipeline doc. * edit docstring of the pipeline. * add: right path to TransformerTempModel * add: first set of tests. * complete fast tests for text to video. * fix bug * up * three fast tests failing. * add: note on slow tests * make work with all schedulers * apply styling. * add slow tests * change file name * update * more correction * more fixes * finish * up * Apply suggestions from code review * up * finish * make copies * fix pipeline tests * fix more tests * Apply suggestions from code review Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * apply suggestions * up * revert --------- Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> Co-authored-by: Pedro Cuenca <pedro@huggingface.co>	2023-03-22 18:39:33 +01:00
Will Berman	a28acb5dcc	controlnet sd 2.1 checkpoint conversions (#2593 ) * controlnet sd 2.1 checkpoint conversions * remove global_step -> make config file mandatory	2023-03-10 08:22:02 -08:00
Patrick von Platen	d761b58bfc	[From pretrained] Speed-up loading from cache (#2515 ) * [From pretrained] Speed-up loading from cache * up * Fix more * fix one more bug * make style * bigger refactor * factor out function * Improve more * better * deprecate return cache folder * clean up * improve tests * up * upload * add nice tests * simplify * finish * correct * fix version * rename * Apply suggestions from code review Co-authored-by: Lucain <lucainp@gmail.com> * rename * correct doc string * correct more * Apply suggestions from code review Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * apply code suggestions * finish --------- Co-authored-by: Lucain <lucainp@gmail.com> Co-authored-by: Pedro Cuenca <pedro@huggingface.co>	2023-03-10 11:56:10 +01:00
Patrick von Platen	1598a57958	make style	2023-03-06 10:51:03 +00:00
Haofan Wang	63805f8af7	Support convert LoRA safetensors into diffusers format (#2403 ) * add lora convertor * Update convert_lora_safetensor_to_diffusers.py * Update README.md * Update convert_lora_safetensor_to_diffusers.py	2023-03-06 11:50:46 +01:00
Patrick von Platen	f38e3626cd	make style	2023-03-06 10:40:18 +00:00

1 2 3

128 Commits