* add scheduled pseudo-huber loss training scripts
See #7488
* add reduction modes to huber loss
* [DB Lora] *2 multiplier to huber loss cause of 1/2 a^2 conv.
pairing of c6495def1f
* [DB Lora] add option for smooth l1 (huber / delta)
Pairing of dd22958caa
* [DB Lora] unify huber scheduling
Pairing of 19a834c3ab
* [DB Lora] add snr huber scheduler
Pairing of 47fb1a6854
* fixup examples link
* use snr schedule by default in DB
* update all huber scripts with snr
* code quality
* huber: make style && make quality
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Initialize target_unet from unet rather than teacher_unet so that we correctly add time_embedding.cond_proj if necessary.
* Use UNet2DConditionModel.from_config to initialize target_unet from unet's config.
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* give it a shot.
* print.
* correct assertion.
* gather results from the rest of the tests.
* change the assertion values where needed.
* remove print statements.
* get device <-> component mapping when using multiple gpus.
* condition the device_map bits.
* relax condition
* device_map progress.
* device_map enhancement
* some cleaning up and debugging
* Apply suggestions from code review
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* incorporate suggestions from PR.
* remove multi-gpu condition for now.
* guard check the component -> device mapping
* fix: device_memory variable
* dispatching transformers model to have force_hooks=True
* better guarding for transformers device_map
* introduce support balanced_low_memory and balanced_ultra_low_memory.
* remove device_map patch.
* fix: intermediate variable scoping.
* fix: condition in cpu offload.
* fix: flax class restrictions.
* remove modifications from cpu_offload and model_offload
* incorporate changes.
* add a simple forward pass test
* add: torch_device in get_inputs()
* add: tests
* remove print
* safe-guard to(), model offloading and cpu offloading when balanced is used as a device_map.
* style
* remove .
* safeguard device_map with more checks and remove invalid device_mapping strategues.
* make a class attribute and adjust tests accordingly.
* fix device_map check
* fix test
* adjust comment
* fix: device_map attribute
* fix: dispatching.
* max_memory test for pipeline
* version guard the tests
* fix guard.
* address review feedback.
* reset_device_map method.
* add: test for reset_hf_device_map
* fix a couple things.
* add reset_device_map() in the error message.
* add tests for checking reset_device_map doesn't have unintended consequences.
* fix reset_device_map and offloading tests.
* create _get_final_device_map utility.
* hf_device_map -> _hf_device_map
* add documentation
* add notes suggested by Marc.
* styling.
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* move updates within gpu condition.
* other docs related things
* note on ignore a device not specified in .
* provide a suggestion if device mapping errors out.
* fix: typo.
* _hf_device_map -> hf_device_map
* Empty-Commit
* add: example hf_device_map.
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* remove libsndfile1-dev and libgl1 from workflows and ensure that re present in the respective dockerfiles.
* change to self-hosted runner; let's see 🤞
* add libsndfile1-dev libgl1 for now
* use self-hosted runners for building and push too.
* Restore unet params back to normal from EMA when validation call is finished
* empty commit
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Allow safety and feature extractor arguments to be passed to convert_from_ckpt
Allows management of safety checker and feature extractor
from outside of the convert ckpt class.
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* reduce block sizes for unet1d.
* reduce blocks for unet_2d.
* reduce block size for unet_motion
* increase channels.
* correctly increase channels.
* reduce number of layers in unet2dconditionmodel tests.
* reduce block sizes for unet2dconditionmodel tests
* reduce block sizes for unet3dconditionmodel.
* fix: test_feed_forward_chunking
* fix: test_forward_with_norm_groups
* skip spatiotemporal tests on MPS.
* reduce block size in AutoencoderKL.
* reduce block sizes for vqmodel.
* further reduce block size.
* make style.
* Empty-Commit
* reduce sizes for ConsistencyDecoderVAETests
* further reduction.
* further block reductions in AutoencoderKL and AssymetricAutoencoderKL.
* massively reduce the block size in unet2dcontionmodel.
* reduce sizes for unet3d
* fix tests in unet3d.
* reduce blocks further in motion unet.
* fix: output shape
* add attention_head_dim to the test configuration.
* remove unexpected keyword arg
* up a bit.
* groups.
* up again
* fix
* Skip `test_freeu_enabled ` on MPS
* Small fixes
- import skip_mps correctly
- disable all instances of test_freeu_enabled
* Empty commit to trigger tests
* Empty commit to trigger CI
* increase number of workers for the tests.
* move to beefier runner.
* improve the fast push tests too.
* use a beefy machine for pytorch pipeline tests
* up the number of workers further.
* UniPC Multistep add `rescale_betas_zero_snr`
Same patch as DPM and Euler with the patched final alpha cumprod
BF16 doesn't seem to break down, I think cause UniPC upcasts during some
phases already? We could still force an upcast since it only
loses ≈ 0.005 it/s for me but the difference in output is very small. A
better endeavor might upcasting in step() and removing all the other
upcasts elsewhere?
* UniPC ZSNR UT
* Re-add `rescale_betas_zsnr` doc oops
* UniPC UTs iterate solvers on FP16
It wasn't catching errs on order==3. Might be excessive?
* UniPC Multistep fix tensor dtype/device on order=3
* UniPC UTs Add v_pred to fp16 test iter
For completions sake. Probably overkill?