* update
* update
* make style
* remove dynamo disable
* add coauthor
Co-Authored-By: Dhruv Nair <dhruv.nair@gmail.com>
* update
* update
* update
* update mixin
* add some basic tests
* update
* update
* non_blocking
* improvements
* update
* norm.* -> norm
* apply suggestions from review
* add example
* update hook implementation to the latest changes from pyramid attention broadcast
* deinitialize should raise an error
* update doc page
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* update docs
* update
* refactor
* fix _always_upcast_modules for asym ae and vq_model
* fix lumina embedding forward to not depend on weight dtype
* refactor tests
* add simple lora inference tests
* _always_upcast_modules -> _precision_sensitive_module_patterns
* remove todo comments about review; revert changes to self.dtype in unets because .dtype on ModelMixin should be able to handle fp8 weight case
* check layer dtypes in lora test
* fix UNet1DModelTests::test_layerwise_upcasting_inference
* _precision_sensitive_module_patterns -> _skip_layerwise_casting_patterns based on feedback
* skip test in NCSNppModelTests
* skip tests for AutoencoderTinyTests
* skip tests for AutoencoderOobleckTests
* skip tests for UNet1DModelTests - unsupported pytorch operations
* layerwise_upcasting -> layerwise_casting
* skip tests for UNetRLModelTests; needs next pytorch release for currently unimplemented operation support
* add layerwise fp8 pipeline test
* use xfail
* Apply suggestions from code review
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* add assertion with fp32 comparison; add tolerance to fp8-fp32 vs fp32-fp32 comparison (required for a few models' tests to pass)
* add note about memory consumption on tesla CI runner for failing test
---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Motion Model / Adapter versatility
- allow a different number of layers per block
- allow a different number of transformer layers per block
- allow a different number of motion attention heads per block
- use dropout argument in get_down/up_block in 3d blocks
* Motion Model: rename added arguments and refactor
* Add test for asymmetric UNetMotionModel
* Fix sharding when no device_map is passed
* style
* add tests
* align
* add docstring
* format
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* feat: support saving a model in sharded checkpoints.
* feat: make loading of sharded checkpoints work.
* add tests
* cleanse the loading logic a bit more.
* more resilience while loading from the Hub.
* parallelize shard downloads by using snapshot_download().
* default to a shard size.
* more fix
* Empty-Commit
* debug
* fix
* quality
* more debugging
* fix more
* initial comments from Benjamin
* move certain methods to loading_utils
* add a test to check if the correct number of shards is present.
* add a test to check if loading of sharded checkpoints from the Hub is okay
* clarify the unit when passed as an int.
* use hf_hub for sharding.
* remove unnecessary code
* remove unnecessary function
* Lucain's comments.
* fixes
* address high-level comments.
* fix test
* subfolder shenanigans.
* Update src/diffusers/utils/hub_utils.py
Co-authored-by: Lucain <lucainp@gmail.com>
* Apply suggestions from code review
Co-authored-by: Lucain <lucainp@gmail.com>
* remove _huggingface_hub_version as not needed.
* address more feedback.
* add a test for local_files_only=True.
* need hf hub to be at least 0.23.2
* style
* final comment.
* clean up subfolder.
* deal with suffixes in code.
* _add_variant default.
* use weights_name_pattern
* remove add_suffix_keyword
* clean up downloading of sharded ckpts.
* don't return something special when using index.json
* fix more
* don't use bare except
* remove comments and catch the errors better
* fix a couple of things when using is_file()
* empty
---------
Co-authored-by: Lucain <lucainp@gmail.com>
* reduce block sizes for unet1d.
* reduce blocks for unet_2d.
* reduce block size for unet_motion
* increase channels.
* correctly increase channels.
* reduce number of layers in unet2dconditionmodel tests.
* reduce block sizes for unet2dconditionmodel tests
* reduce block sizes for unet3dconditionmodel.
* fix: test_feed_forward_chunking
* fix: test_forward_with_norm_groups
* skip spatiotemporal tests on MPS.
* reduce block size in AutoencoderKL.
* reduce block sizes for vqmodel.
* further reduce block size.
* make style.
* Empty-Commit
* reduce sizes for ConsistencyDecoderVAETests
* further reduction.
* further block reductions in AutoencoderKL and AsymmetricAutoencoderKL.
* massively reduce the block size in unet2dconditionmodel.
* reduce sizes for unet3d
* fix tests in unet3d.
* reduce blocks further in motion unet.
* fix: output shape
* add attention_head_dim to the test configuration.
* remove unexpected keyword arg
* up a bit.
* groups.
* up again
* fix