* Fix QwenImage txt_seq_lens handling
* formatting
* formatting
* remove txt_seq_lens and use bool mask
* use compute_text_seq_len_from_mask
* add seq_lens to dispatch_attention_fn
* use joint_seq_lens
* remove unused index_block
* WIP: Remove seq_lens parameter and use mask-based approach
- Remove seq_lens parameter from dispatch_attention_fn
- Update varlen backends to extract seqlens from masks
- Update QwenImage to pass 2D joint_attention_mask
- Fix native backend to handle 2D boolean masks
- Fix sage_varlen seqlens_q to match seqlens_k for self-attention
Note: sage_varlen is still producing black images and needs further investigation (the mask-based approach is sketched below)
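A minimal sketch of the mask-based approach, assuming a 2D boolean mask of shape (batch, seq_len) where True marks real tokens; the helper name is hypothetical, not the actual diffusers implementation:

```python
import torch

def seqlens_from_bool_mask(attn_mask: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    # attn_mask: (batch, seq_len) bool, True = real token, False = padding.
    seqlens = attn_mask.sum(dim=-1, dtype=torch.int32)  # valid tokens per sample
    # Cumulative (prefix-sum) lengths in the layout varlen attention kernels expect.
    cu_seqlens = torch.nn.functional.pad(seqlens.cumsum(0, dtype=torch.int32), (1, 0))
    return seqlens, cu_seqlens
```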
* fix formatting
* undo sage changes
* xformers support
* hub fix
* fix torch compile issues
* fix tests
* use _prepare_attn_mask_native
* proper deprecation notice
* add deprecate to txt_seq_lens
* Update src/diffusers/models/transformers/transformer_qwenimage.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Update src/diffusers/models/transformers/transformer_qwenimage.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Only create the mask if there's actual padding
* fix order of docstrings
* Adds performance benchmarks and optimization details for QwenImage
Enhances the documentation with comprehensive performance insights for the QwenImage pipeline
* rope_text_seq_len = text_seq_len
* rename to max_txt_seq_len
* removed deprecated args
* undo unrelated change
* Updates QwenImage performance documentation
Removes detailed attention backend benchmarks and simplifies the torch.compile performance description
Focuses on the key performance improvement with torch.compile, highlighting the specific speedup from 4.70s to 1.93s on an A100 GPU
Streamlines the documentation to provide more concise and actionable performance insights
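The speedup quoted above comes from compiling the transformer; a hedged usage sketch (model id, dtype, and compile settings are illustrative, not the exact benchmark configuration):

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image", torch_dtype=torch.bfloat16).to("cuda")
# Compile only the transformer; the first call pays the compilation cost,
# later calls run the optimized graph.
pipe.transformer = torch.compile(pipe.transformer, mode="max-autotune", fullgraph=True)
image = pipe("a cat holding a sign that says hello", num_inference_steps=50).images[0]
```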
* Updates deprecation warnings for txt_seq_lens parameter
Extends deprecation timeline for txt_seq_lens from version 0.37.0 to 0.39.0 across multiple Qwen image-related models
Adds a new unit test to verify the deprecation warning behavior for the txt_seq_lens parameter
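A sketch of what such a test might look like, assuming diffusers' `deprecate` helper emits a `FutureWarning` that mentions the argument name (the fixtures below are hypothetical):

```python
import pytest

def test_txt_seq_lens_deprecation(model, model_inputs):
    # Passing the deprecated argument should warn until its removal in 0.39.0.
    with pytest.warns(FutureWarning, match="txt_seq_lens"):
        model(**model_inputs, txt_seq_lens=[model_inputs["encoder_hidden_states"].shape[1]])
```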
* fix compile
* formatting
* fix compile tests
* rename helper
* remove duplicate
* smaller values
* removed
* use torch.cond for torch compile
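For context, the general `torch.cond` pattern keeps a data-dependent branch inside the compiled graph instead of forcing a graph break; this is an illustrative toy, not the actual change:

```python
import torch

def true_fn(x):
    return x * 2

def false_fn(x):
    return x + 1

@torch.compile(fullgraph=True)
def branch(pred, x):
    # pred is a 0-dim boolean tensor; both branches must return tensors of the
    # same shape and dtype so the graph stays shape-stable.
    return torch.cond(pred, true_fn, false_fn, (x,))

out = branch(torch.tensor(True), torch.randn(4))
```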
* Construct joint attention mask once
* test different backends
* construct joint attention mask once to avoid reconstructing in every block
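A sketch of building the joint mask once before the block loop, assuming a (batch, text_seq_len) padding mask for the text stream and a fully visible image stream (the segment order is illustrative):

```python
import torch

def build_joint_attention_mask(text_mask: torch.Tensor, image_seq_len: int) -> torch.Tensor:
    # text_mask: (batch, text_seq_len) bool, True = real token.
    # Returns a (batch, text_seq_len + image_seq_len) bool mask that every
    # transformer block reuses instead of rebuilding it per block.
    image_mask = torch.ones(
        text_mask.shape[0], image_seq_len, dtype=torch.bool, device=text_mask.device
    )
    return torch.cat([text_mask.bool(), image_mask], dim=1)
```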
* Update src/diffusers/models/attention_dispatch.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* formatting
* raising an error from the EditPlus pipeline when batch_size > 1
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: cdutr <dutra_carlos@hotmail.com>
* Basic implementation of request scheduling
* Basic editing in SD and Flux Pipelines
* Small Fix
* Fix
* Update for more pipelines
* Add examples/server-async
* Add examples/server-async
* Updated RequestScopedPipeline to use a single tokenizer lock to avoid race conditions
* Fix
* Fix _TokenizerLockWrapper
* Fix _TokenizerLockWrapper
* Delete _TokenizerLockWrapper
* Fix tokenizer
* Update examples/server-async
* Fix server-async
* Optimizations in examples/server-async
* We keep the implementation simple in examples/server-async
* Update examples/server-async/README.md
* Update examples/server-async/README.md for changes to tokenizer locks and backward-compatible retrieve_timesteps
* The changes to the diffusers core have been undone and all logic is being moved to examples/server-async
* Update examples/server-async/utils/*
* Fix BaseAsyncScheduler
* Rollback in the core of the diffusers
* Update examples/server-async/README.md
* Complete rollback of diffusers core files
* Simple implementation of an asynchronous server compatible with SD3-3.5 and Flux Pipelines
* Update examples/server-async/README.md
* Fixed import errors in 'examples/server-async/serverasync.py'
* Flux Pipeline Discard
* Update examples/server-async/README.md
* Apply style fixes
* Add thread-safe wrappers for components in pipeline
Refactor requestscopedpipeline.py to add thread-safe wrappers for tokenizer, VAE, and image processor. Introduce locking mechanisms to ensure thread safety during concurrent access.
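A minimal sketch of such a lock-guarded wrapper (the class name and structure are hypothetical, not the actual wrappers.py):

```python
import threading

class ThreadSafeComponent:
    """Serialize concurrent access to a shared component (tokenizer, VAE, image processor)."""

    def __init__(self, component):
        self._component = component
        self._lock = threading.Lock()

    def __getattr__(self, name):
        attr = getattr(self._component, name)
        if not callable(attr):
            return attr

        def locked(*args, **kwargs):
            # Only one request at a time may call into the wrapped component.
            with self._lock:
                return attr(*args, **kwargs)

        return locked
```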
* Add wrappers.py
* Apply style fixes
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Store vae.config.scaling_factor to prevent a missing attribute reference
In the SDXL advanced DreamBooth training script, vae.config.scaling_factor
becomes inaccessible after `del vae` when --cache_latents is enabled and no
--validation_prompt is given.
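A hedged sketch of the fix, capturing the scalar before the VAE is deleted (function and variable names are illustrative):

```python
import torch

def cache_latents_and_free_vae(vae, pixel_batches):
    # Encode the cached latents, stash the config value still needed later,
    # and only then free the VAE so no later code touches the deleted module.
    latents = [vae.encode(px).latent_dist.sample() for px in pixel_batches]
    vae_scaling_factor = vae.config.scaling_factor
    del vae
    torch.cuda.empty_cache()
    return [lat * vae_scaling_factor for lat in latents]
```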
Co-authored-by: Teriks <Teriks@users.noreply.github.com>
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Community Pipeline: Add z-image differential img2img
* add pipeline for z-image differential img2img to diffusion examples: run make style, make quality, and fix whitespace in the example docstring.
---------
Co-authored-by: r4inm4ker <jefri.yeh@gmail.com>
Use `T5Tokenizer` instead of `MT5Tokenizer`
Given that the `MT5Tokenizer` in `transformers` is just a "re-export" of
`T5Tokenizer` as per
https://github.com/huggingface/transformers/blob/v4.57.3/src/transformers/models/mt5/tokenization_mt5.py
(on the latest available stable Transformers, i.e. v4.57.3), this commit
updates the imports to point to `T5Tokenizer` instead, so that they
still work with Transformers v5.0.0rc0 onwards.
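In practice the change is just an import swap; an mT5 checkpoint loads fine with the plain `T5Tokenizer` because the two classes are identical:

```python
# Before (breaks with Transformers v5.0.0rc0 onwards):
# from transformers import MT5Tokenizer
from transformers import T5Tokenizer

# MT5Tokenizer was only a re-export of T5Tokenizer, so mT5 checkpoints load unchanged.
tokenizer = T5Tokenizer.from_pretrained("google/mt5-small")
```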
* Fix examples not loading LoRA adapter weights from checkpoint
* Updated lora saving logic with accelerate save_model_hook and load_model_hook
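A hedged sketch of the accelerate hook pattern used in the training scripts, simplified to a single model and a placeholder file name:

```python
import torch
from accelerate import Accelerator
from peft import get_peft_model_state_dict, set_peft_model_state_dict

accelerator = Accelerator()

def save_model_hook(models, weights, output_dir):
    # Persist only the LoRA adapter weights; pop `weights` so accelerate does
    # not also serialize the full model state.
    for model in models:
        torch.save(get_peft_model_state_dict(model), f"{output_dir}/lora_weights.pt")
        if weights:
            weights.pop()

def load_model_hook(models, input_dir):
    # Restore the LoRA adapter weights when resuming from a checkpoint.
    while models:
        model = models.pop()
        set_peft_model_state_dict(model, torch.load(f"{input_dir}/lora_weights.pt"))

accelerator.register_save_state_pre_hook(save_model_hook)
accelerator.register_load_state_pre_hook(load_model_hook)
```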
* Formatted the changes using ruff
* import and upcasting changed
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* add vae
* Initial commit for Flux 2 Transformer implementation
* add pipeline part
* small edits to the pipeline and conversion
* update conversion script
* fix
* up up
* finish pipeline
* Remove Flux IP Adapter logic for now
* Remove deprecated 3D id logic
* Remove ControlNet logic for now
* Add link to ViT-22B paper as reference for parallel transformer blocks such as the Flux 2 single stream block
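For reference, a ViT-22B-style parallel block computes attention and the MLP from the same normalized input and sums both residuals instead of running them sequentially; a minimal sketch, not the actual Flux 2 single stream block:

```python
import torch
import torch.nn as nn

class ParallelTransformerBlock(nn.Module):
    def __init__(self, dim: int, num_heads: int, mlp_ratio: float = 4.0):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(dim, int(dim * mlp_ratio)),
            nn.GELU(),
            nn.Linear(int(dim * mlp_ratio), dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)  # one shared normalization for both branches
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        return x + attn_out + self.mlp(h)  # parallel residual branches
```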
* update pipeline
* Don't use biases for input projs and output AdaNorm
* up
* Remove bias for double stream block text QKV projections
* Add script to convert Flux 2 transformer to diffusers
* make style and make quality
* fix a few things.
* allow sft files to go.
* fix image processor
* fix batch
* style a bit
* Fix some bugs in Flux 2 transformer implementation
* Fix dummy input preparation and fix some test bugs
* fix dtype casting in timestep guidance module.
* resolve conflicts.
* remove ip adapter stuff.
* Fix Flux 2 transformer consistency test
* Fix bug in Flux2TransformerBlock (double stream block)
* Get remaining Flux 2 transformer tests passing
* make style; make quality; make fix-copies
* remove stuff.
* fix type annotation.
* remove unneeded stuff from tests
* tests
* up
* up
* add sf support
* Remove unused IP Adapter and ControlNet logic from transformer (#9)
* copied from
* Apply suggestions from code review
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: apolinário <joaopaulo.passos@gmail.com>
* up
* up
* up
* up
* up
* Refactor Flux2Attention into separate classes for double stream and single stream attention
* Add _supports_qkv_fusion to AttentionModuleMixin to allow subclasses to disable QKV fusion
* Have Flux2ParallelSelfAttention inherit from AttentionModuleMixin with _supports_qkv_fusion=False
* Log debug message when calling fuse_projections on an AttentionModuleMixin subclass that does not support QKV fusion
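A sketch of the opt-out mechanism described in the three commits above (class and method are simplified, not the actual mixin):

```python
import logging

logger = logging.getLogger(__name__)

class AttentionWithOptionalFusion:
    # Subclasses that cannot merge their Q/K/V projections (e.g. a parallel
    # self-attention block) set this flag to False.
    _supports_qkv_fusion = True

    def fuse_projections(self):
        if not self._supports_qkv_fusion:
            logger.debug("%s does not support QKV fusion; skipping.", type(self).__name__)
            return
        # ... fuse the separate q/k/v linear layers into a single projection ...
```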
* Address review comments
* Update src/diffusers/pipelines/flux2/pipeline_flux2.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* up
* Remove maybe_allow_in_graph decorators for Flux 2 transformer blocks (#12)
* up
* support ostris loras. (#13)
* up
* update schedule
* up
* up (#17)
* add training scripts (#16)
* add training scripts
Co-authored-by: Linoy Tsaban <linoytsaban@gmail.com>
* model cpu offload in validation.
* add flux.2 readme
* add img2img and tests
* cpu offload in log validation
* Apply suggestions from code review
* fix
* up
* fixes
* remove i2i training tests for now.
---------
Co-authored-by: Linoy Tsaban <linoytsaban@gmail.com>
Co-authored-by: linoytsaban <linoy@huggingface.co>
* up
---------
Co-authored-by: yiyixuxu <yixu310@gmail.com>
Co-authored-by: Daniel Gu <dgu8957@gmail.com>
Co-authored-by: yiyi@huggingface.co <yiyi@ip-10-53-87-203.ec2.internal>
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: apolinário <joaopaulo.passos@gmail.com>
Co-authored-by: yiyi@huggingface.co <yiyi@ip-26-0-160-103.ec2.internal>
Co-authored-by: Linoy Tsaban <linoytsaban@gmail.com>
Co-authored-by: linoytsaban <linoy@huggingface.co>
* Fix: update type hints for Tuple parameters across multiple files to support variable-length tuples
* Apply style fixes
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* fix bug when offload and cache_latents both enabled
* Upgrade huggingface-hub to version 0.35.0
Updated huggingface-hub version from 0.26.1 to 0.35.0.
* Add uvicorn and accelerate to requirements
* Fix install instructions for server
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* feat: support lora in qwen image and training script
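Once trained, the adapter can be loaded back into the pipeline; a usage sketch (repository id and weight name are placeholders):

```python
import torch
from diffusers import QwenImagePipeline

pipe = QwenImagePipeline.from_pretrained("Qwen/Qwen-Image", torch_dtype=torch.bfloat16).to("cuda")
# Load the LoRA adapter produced by the training script.
pipe.load_lora_weights("your-namespace/qwen-image-lora", weight_name="pytorch_lora_weights.safetensors")
image = pipe("a photo of sks dog in a bucket", num_inference_steps=50).images[0]
```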
* up
* up
* up
* up
* up
* up
* add lora tests
* fix
* add tests
* fix
* reviewer feedback
* up
* Apply suggestions from code review
Co-authored-by: Aryan <aryan@huggingface.co>
---------
Co-authored-by: Aryan <aryan@huggingface.co>