* update
* fix
* non_blocking; handle parameters and buffers
* update
* Group offloading with cuda stream prefetching (#10516)
* cuda stream prefetch
* remove breakpoints
* update
* copy model hook implementation from pab
* update; ~very workaround based implementation but it seems to work as expected; needs cleanup and rewrite
* more workarounds to make it actually work
* cleanup
* rewrite
* update
* make sure to sync current stream before overwriting with pinned params
not doing so will lead to erroneous computations on the GPU and cause bad results
* better check
* update
* remove hook implementation to not deal with merge conflict
* re-add hook changes
* why use more memory when less memory do trick
* why still use slightly more memory when less memory do trick
* optimise
* add model tests
* add pipeline tests
* update docs
* add layernorm and groupnorm
* address review comments
* improve tests; add docs
* improve docs
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* apply suggestions from code review
* update tests
* apply suggestions from review
* enable_group_offloading -> enable_group_offload for naming consistency
* raise errors if multiple offloading strategies used; add relevant tests
* handle .to() when group offload applied
* refactor some repeated code
* remove unintentional change from merge conflict
* handle .cuda()
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* update
* update
* make style
* remove dynamo disable
* add coauthor
Co-Authored-By: Dhruv Nair <dhruv.nair@gmail.com>
* update
* update
* update
* update mixin
* add some basic tests
* update
* update
* non_blocking
* improvements
* update
* norm.* -> norm
* apply suggestions from review
* add example
* update hook implementation to the latest changes from pyramid attention broadcast
* deinitialize should raise an error
* update doc page
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* update docs
* update
* refactor
* fix _always_upcast_modules for asym ae and vq_model
* fix lumina embedding forward to not depend on weight dtype
* refactor tests
* add simple lora inference tests
* _always_upcast_modules -> _precision_sensitive_module_patterns
* remove todo comments about review; revert changes to self.dtype in unets because .dtype on ModelMixin should be able to handle fp8 weight case
* check layer dtypes in lora test
* fix UNet1DModelTests::test_layerwise_upcasting_inference
* _precision_sensitive_module_patterns -> _skip_layerwise_casting_patterns based on feedback
* skip test in NCSNppModelTests
* skip tests for AutoencoderTinyTests
* skip tests for AutoencoderOobleckTests
* skip tests for UNet1DModelTests - unsupported pytorch operations
* layerwise_upcasting -> layerwise_casting
* skip tests for UNetRLModelTests; needs next pytorch release for currently unimplemented operation support
* add layerwise fp8 pipeline test
* use xfail
* Apply suggestions from code review
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* add assertion with fp32 comparison; add tolerance to fp8-fp32 vs fp32-fp32 comparison (required for a few models' test to pass)
* add note about memory consumption on tesla CI runner for failing test
---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* remove 2 shapes from SDFunctionTesterMixin::test_vae_tiling
* combine freeu enable/disable test to reduce many inference runs
* remove low signal unet test for signature
* remove low signal embeddings test
* remove low signal progress bar test from PipelineTesterMixin
* combine ip-adapter single and multi tests to save many inferences
* fix broken tests
* Update tests/pipelines/test_pipelines_common.py
* Update tests/pipelines/test_pipelines_common.py
* add progress bar tests
* speed up test_vae_slicing in animatediff
* speed up test_karras_schedulers_shape for attend and excite.
* style.
* get the static slices out.
* specify torch print options.
* modify
* test run with controlnet
* specify kwarg
* fix: things
* not None
* flatten
* controlnet img2img
* complete controlet sd
* finish more
* finish more
* finish more
* finish more
* finish the final batch
* add cpu check for expected_pipe_slice.
* finish the rest
* remove print
* style
* fix ssd1b controlnet test
* checking ssd1b
* disable the test.
* make the test_ip_adapter_single controlnet test more robust
* fix: simple inpaint
* multi
* disable panorama
* enable again
* panorama is shaky so leave it for now
* remove print
* raise tolerance.
* Bug fix for controlnetpipeline check_image
Bug fix for controlnetpipeline check_image when using multicontrolnet and prompt list
* Update test_inference_multiple_prompt_input function
* Update test_controlnet.py
add test for multiple prompts and multiple image conditioning
* Update test_controlnet.py
Fix format error
---------
Co-authored-by: Lvkesheng Shen <45848260+Fantast416@users.noreply.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Add properties and `IPAdapterTesterMixin` tests for `StableDiffusionPanoramaPipeline`
* Fix variable name typo and update comments
* Update deprecated `output_type="numpy"` to "np" in test files
* Discard changes to src/diffusers/pipelines/stable_diffusion_panorama/pipeline_stable_diffusion_panorama.py
* Update test_stable_diffusion_panorama.py
* Update numbers in README.md
* Update get_guidance_scale_embedding method to use timesteps instead of w
* Update number of checkpoints in README.md
* Add type hints and fix var name
* Fix PyTorch's convention for inplace functions
* Fix a typo
* Revert "Fix PyTorch's convention for inplace functions"
This reverts commit 74350cf65b.
* Fix typos
* Indent
* Refactor get_guidance_scale_embedding method in LEditsPPPipelineStableDiffusionXL class
* [Fix] Multiple image conditionings in a single batch for `StableDiffusionControlNetPipeline`.
* Refactor `check_inputs` in `StableDiffusionControlNetPipeline` to avoid redundant codes.
* Make the behavior of MultiControlNetModel to be the same to the original ControlNetModel
* Keep the code change minimum for nested list support
* Add fast test `test_inference_nested_image_input`
* Remove redundant check for nested image condition in `check_inputs`
Remove `len(image) == len(prompt)` check out of `check_image()`
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Better `ValueError` message for incompatible nested image list size
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Fix syntax error in `check_inputs`
* Remove warning message for multi-ControlNets with multiple prompts
* Fix a typo in test_controlnet.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Add test case for multiple prompts, single image conditioning in `StableDiffusionMultiControlNetPipelineFastTests`
* Improved `ValueError` message for nested `controlnet_conditioning_scale`
* Documenting the behavior of image list as `StableDiffusionControlNetPipeline` input
---------
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Add custom timesteps support to LCMScheduler.
* Add custom timesteps support to StableDiffusionPipeline.
* Add custom timesteps support to StableDiffusionXLPipeline.
* Add custom timesteps support to remaining Stable Diffusion pipelines which support LCMScheduler (img2img, inpaint).
* Add custom timesteps support to remaining Stable Diffusion XL pipelines which support LCMScheduler (img2img, inpaint).
* Add custom timesteps support to StableDiffusionControlNetPipeline.
* Add custom timesteps support to T21 Stable Diffusion (XL) Adapters.
* Clean up Stable Diffusion inpaint tests.
* Manually add support for custom timesteps to AltDiffusion pipelines since make fix-copies doesn't appear to work correctly (it deletes the whole pipeline).
* make style
* Refactor pipeline timestep handling into the retrieve_timesteps function.
* decrease UNet2DConditionModel & ControlNetModel blocks
* decrease UNet2DConditionModel & ControlNetModel blocks
* decrease even more blocks & number of norm groups
* decrease vae block out channels and n of norm goups
* fix code style
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Fix bug in ControlNetPipelines with MultiControlNetModel of length 1
* Add tests for varying number of ControlNet models
* Fix missing indexing for control_guidance_start and control_guidance_end
* Fix code quality
* Separate test for MultiControlNet with one model
* Revert formatting of earlier test
* Add controlnet from single file
* Updates
* make style
* finish
* Apply suggestions from code review
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Correct controlnet out of list error
* Apply suggestions from code review
* correct tests
* correct tests
* fix
* test all
* Apply suggestions from code review
* test all
* test all
* Apply suggestions from code review
* Apply suggestions from code review
* fix more tests
* Fix more
* Apply suggestions from code review
* finish
* Apply suggestions from code review
* Update src/diffusers/schedulers/scheduling_k_dpm_2_ancestral_discrete.py
* finish
* Add guidance start/stop
* Add guidance start/stop to inpaint class
* Black formatting
* Add support for guidance for multicontrolnet
* Add inclusive end
* Improve design
* correct imports
* Finish
* Finish all
* Correct more
* make style
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
VaeImageProcessor.preprocess refactor
* refactored VaeImageProcessor
- allow passing optional height and width argument to resize()
- add convert_to_rgb
* refactored prepare_latents method for img2img pipelines so that if we pass latents directly as image input, it will not encode it again
* added a test in test_pipelines_common.py to test latents as image inputs
* refactored img2img pipelines that accept latents as image:
- controlnet img2img, stable diffusion img2img , instruct_pix2pix
---------
Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Run ControlNet compile test in a separate subprocess
`torch.compile()` spawns several subprocesses and the GPU memory used
was not reclaimed after the test ran. This approach was taken from
`transformers`.
* Style
* Prepare a couple more compile tests to run in subprocess.
* Use require_torch_2 decorator.
* Test inpaint_compile in subprocess.
* Run img2img compile test in subprocess.
* Run stable diffusion compile test in subprocess.
* style
* Temporarily trigger on pr to test.
* Revert "Temporarily trigger on pr to test."
This reverts commit 82d76868dd.
* up
* fix more
* Apply suggestions from code review
* fix more
* fix more
* Check it
* Remove 16:8
* fix more
* fix more
* fix more
* up
* up
* Test only stable diffusion
* Test only two files
* up
* Try out spinning up processes that can be killed
* up
* Apply suggestions from code review
* up
* up
* refactor controlnet and add img2img and inpaint
* First draft to get pipelines to work
* make style
* Fix more
* Fix more
* More tests
* Fix more
* Make inpainting work
* make style and more tests
* Apply suggestions from code review
* up
* make style
* Fix imports
* Fix more
* Fix more
* Improve examples
* add test
* Make sure import is correctly deprecated
* Make sure everything works in compile mode
* make sure authorship is correctly attributed