mirror of https://github.com/huggingface/diffusers.git synced 2026-01-27 17:22:53 +03:00
Commit Graph

6145 Commits

Author SHA1 Message Date
Sayak Paul
8470ce3d06 Merge branch 'main' into cache-docs-fixes 2026-01-10 09:13:39 +05:30
sayakpaul
73601980c2 up 2026-01-10 09:09:44 +05:30
Sayak Paul
25795856e0 Update docs/source/en/optimization/cache.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2026-01-10 09:07:46 +05:30
Jay Wu
02c7adc356 [ChronoEdit] support multiple loras (#12679)
Co-authored-by: wjay <wjay@nvidia.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2026-01-09 15:50:16 -10:00
Sayak Paul
a3cc0e7a52 [modular] error early in enable_auto_cpu_offload (#12578)
error early in auto_cpu_offload
2026-01-09 15:30:52 -10:00
Daniel Socek
2a6cdc0b3e Fix ftfy name error in Wan pipeline (#12314)
Signed-off-by: Daniel Socek <daniel.socek@intel.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2026-01-09 14:02:40 -10:00
SahilCarterr
1791306739 [Fix] syntax in QwenImageEditPlusPipeline (#12371)
* Fixes syntax for consistency among pipelines

* Update test_qwenimage_edit_plus.py
2026-01-09 13:55:42 -10:00
Samu Tamminen
df6516a716 Align HunyuanVideoConditionEmbedding with CombinedTimestepGuidanceTextProjEmbeddings (#12316)
conditioning additions inline with CombinedTimestepGuidanceTextProjEmbeddings

Co-authored-by: Samu Tamminen <samutamm@users.noreply.github.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2026-01-09 13:51:04 -10:00
Steven Liu
5794ffffbe [docs] Remote inference (#12372)
* init

* fix
2026-01-09 13:32:14 -10:00
Titong Jiang
4fb44bdf91 Fix wrong param types, docs, and handles noise=None in scale_noise of FlowMatching schedulers (#11669)
* Bug: Fix wrong params, docs, and handles noise=None

* make noise a required arg

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2026-01-09 11:42:33 -10:00
Linoy Tsaban
b7a81582ae [LoRA] add lora_alpha to sana README (#11780)
add lora alpha to readme
2026-01-09 11:28:39 -10:00
Bhavya Bahl
4b64b5603f Change timestep device to cpu for xla (#11501)
* Change timestep device to cpu for xla

* Add all pipelines

* ruff format

* Apply style fixes

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2026-01-09 11:22:51 -10:00
Kashif Rasul
2bb640f8ea [Research] Latent Perceptual Loss (LPL) for Stable Diffusion XL (#11573)
* initial

* added readme

* fix formatting

* added logging

* formatting

* use config

* debug

* better

* handle SNR

* floats have no item()

* remove debug

* formatting

* add paper link

* acknowledge reference source

* rename script

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2026-01-09 10:24:21 -10:00
Fredy Rivera
2dc9d2af50 Add thread-safe wrappers for components in pipeline (examples/server-async/utils/requestscopedpipeline.py) (#12515)
* Basic implementation of request scheduling

* Basic editing in SD and Flux Pipelines

* Small Fix

* Fix

* Update for more pipelines

* Add examples/server-async

* Add examples/server-async

* Updated RequestScopedPipeline to handle a single tokenizer lock to avoid race conditions

* Fix

* Fix _TokenizerLockWrapper

* Fix _TokenizerLockWrapper

* Delete _TokenizerLockWrapper

* Fix tokenizer

* Update examples/server-async

* Fix server-async

* Optimizations in examples/server-async

* We keep the implementation simple in examples/server-async

* Update examples/server-async/README.md

* Update examples/server-async/README.md for changes to tokenizer locks and backward-compatible retrieve_timesteps

* The changes to the diffusers core have been undone and all logic is being moved to examples/server-async

* Update examples/server-async/utils/*

* Fix BaseAsyncScheduler

* Rollback in the core of the diffusers

* Update examples/server-async/README.md

* Complete rollback of diffusers core files

* Simple implementation of an asynchronous server compatible with SD3-3.5 and Flux Pipelines

* Update examples/server-async/README.md

* Fixed import errors in 'examples/server-async/serverasync.py'

* Flux Pipeline Discard

* Update examples/server-async/README.md

* Apply style fixes

* Add thread-safe wrappers for components in pipeline

Refactor requestscopedpipeline.py to add thread-safe wrappers for tokenizer, VAE, and image processor. Introduce locking mechanisms to ensure thread safety during concurrent access.

* Add wrappers.py

* Apply style fixes

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2026-01-09 09:43:14 -10:00
Teriks
57e57cfae0 Store vae.config.scaling_factor to prevent missing attr reference (sdxl advanced dreambooth training script) (#12346)
Store vae.config.scaling_factor to prevent missing attr reference

In sdxl advanced dreambooth training script

vae.config.scaling_factor becomes inaccessible after: del vae

when: --cache_latents, and no --validation_prompt

Co-authored-by: Teriks <Teriks@users.noreply.github.com>
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2026-01-09 09:42:30 -10:00
gapatron
644169433f Laplace Scheduler for DDPM (#11320)
* Add Laplace scheduler that samples more around mid-range noise levels (around log SNR=0), increasing performance (lower FID) with faster convergence speed, and robust to resolution and objective. Reference:  https://arxiv.org/pdf/2407.03297.

* Fix copies.

* Apply style fixes

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2026-01-09 09:16:02 -10:00
Ishan Modi
632765a5ee [Feature] MultiControlNet support for SD3Inpainting (#11251)
* update

* update

* addressed PR comments

* update

* Apply suggestions from code review

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2026-01-09 08:55:16 -10:00
David El Malih
d36564f06a Improve docstrings and type hints in scheduling_consistency_models.py (#12931)
docs: improve docstring scheduling_consistency_models.py
2026-01-09 09:56:56 -08:00
Sayak Paul
441b69eabf [core] Handle progress bar and logging in distributed environments (#12806)
* disable progressbar in distributed.

* up

* up

* up

* up

* up

* up
2026-01-09 22:23:13 +05:30
Sayak Paul
d568c9773f [chore] remove controlnet implementations outside controlnet module. (#12152)
* remove controlnet implementations outside controlnet module.

* fix

* fix

* fix
2026-01-09 21:22:45 +05:30
Sayak Paul
3981c955ce [modular] Tests for custom blocks in modular diffusers (#12557)
* start custom block testing.

* simplify modular workflow ci.

* up

* style.

* up

* up

* up

* up

* up

* up

* Apply suggestions from code review

* up
2026-01-09 15:57:23 +05:30
YiYi Xu
1903383e94 [Modular] qwen refactor (#12872)
* 3 files

* add conditional pipeline

* refactor qwen modular

* add layered

* up up

* u p

* add to import

* more refactor, make layer work

* clean up a bit git add src

* more

* style

* style
2026-01-08 23:38:49 -10:00
Leo Jiang
08f8b7af9a Bugfix for dreambooth flux2 img2img2 (#12825)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
2026-01-09 12:34:44 +05:30
Howard Zhang
2f66edc880 Torchao floatx version guard (#12923)
* Adding torchao version guard for floatx usage

Summary: TorchAO removing floatx support, added version guard in quantization_config.py

* Adding torchao version guard for floatx usage

Summary: TorchAO removing floatx support, added version guard in quantization_config.py
Altered tests in test_torchao.py to version guard floatx
Created new test to verify version guard of floatx support

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2026-01-09 10:51:53 +05:30
TmacAaron
be38f41f9f [NPU] npu attention enable ulysses (#12919)
* npu attention enable ulysses

* clean the format

* register _native_npu_attention to _supports_context_parallel

Signed-off-by: yyt <yangyit139@gmail.com>

* change npu_fusion_attention's input_layout to BSND to eliminate redundant transpose

Signed-off-by: yyt <yangyit139@gmail.com>

* Update format

---------

Signed-off-by: yyt <yangyit139@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2026-01-09 10:11:49 +05:30
MSD
91e5134175 fix the warning torch_dtype is deprecated (#12841)
* fix the warning torch_dtype is deprecated

* Add transformers version check (>= 4.56.0) for dtype parameter

* Fix linting errors
2026-01-09 08:35:26 +05:30
Salman Chishti
a812c87465 Upgrade GitHub Actions for Node 24 compatibility (#12865)
Signed-off-by: Salman Muin Kayser Chishti <13schishti@gmail.com>
2026-01-09 08:28:58 +05:30
Aditya Borate
8b9f817ef5 Fix: Remove hardcoded CUDA autocast in Kandinsky 5 to fix import warning (#12814)
* Fix: Remove hardcoded CUDA autocast in Kandinsky 5 to fix import warning

* Apply style fixes

* Fix: Remove import-time autocast in Kandinsky to prevent warnings

- Removed @torch.autocast decorator from Kandinsky classes.
- Implemented manual F.linear casting to ensure numerical parity with FP32.
- Verified bit-exact output matches main branch.

Co-authored-by: hlky <hlky@hlky.ac>

* Used _keep_in_fp32_modules to align with standards

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: hlky <hlky@hlky.ac>
2026-01-08 09:00:58 -10:00
David El Malih
b1f06b780a Improve docstrings and type hints in scheduling_consistency_decoder.py (#12928)
docs: improve docstring scheduling_consistency_decoder.py
2026-01-08 09:45:38 -08:00
Pauline Bailly-Masson
8600b4c10d Add environment variables to checkout step (#12927) 2026-01-08 13:38:06 +05:30
dg845
c10bdd9b73 Add LTX 2.0 Video Pipelines (#12915)
* Initial LTX 2.0 transformer implementation

* Add tests for LTX 2 transformer model

* Get LTX 2 transformer tests working

* Rename LTX 2 compile test class to have LTX2

* Remove RoPE debug print statements

* Get LTX 2 transformer compile tests passing

* Fix LTX 2 transformer shape errors

* Initial script to convert LTX 2 transformer to diffusers

* Add more LTX 2 transformer audio arguments

* Allow LTX 2 transformer to be loaded from local path for conversion

* Improve dummy inputs and add test for LTX 2 transformer consistency

* Fix LTX 2 transformer bugs so consistency test passes

* Initial implementation of LTX 2.0 video VAE

* Explicitly specify temporal and spatial VAE scale factors when converting

* Add initial LTX 2.0 video VAE tests

* Add initial LTX 2.0 video VAE tests (part 2)

* Get diffusers implementation on par with official LTX 2.0 video VAE implementation

* Initial LTX 2.0 vocoder implementation

* Use RMSNorm implementation closer to original for LTX 2.0 video VAE

* start audio decoder.

* init registration.

* up

* simplify and clean up

* up

* Initial LTX 2.0 text encoder implementation

* Rough initial LTX 2.0 pipeline implementation

* up

* up

* up

* up

* Add imports for LTX 2.0 Audio VAE

* Conversion script for LTX 2.0 Audio VAE Decoder

* Add Audio VAE logic to T2V pipeline

* Duplicate scheduler for audio latents

* Support num_videos_per_prompt for prompt embeddings

* LTX 2.0 scheduler and full pipeline conversion

* Add script to test full LTX2Pipeline T2V inference

* Fix pipeline return bugs

* Add LTX 2 text encoder and vocoder to ltx2 subdirectory __init__

* Fix more bugs in LTX2Pipeline.__call__

* Improve CPU offload support

* Fix pipeline audio VAE decoding dtype bug

* Fix video shape error in full pipeline test script

* Get LTX 2 T2V pipeline to produce reasonable outputs

* Make LTX 2.0 scheduler more consistent with original code

* Fix typo when applying scheduler fix in T2V inference script

* Refactor Audio VAE to be simpler and remove helpers (#7)

* remove resolve causality axes stuff.

* remove a bunch of helpers.

* remove adjust output shape helper.

* remove the use of audiolatentshape.

* move normalization and patchify out of pipeline.

* fix

* up

* up

* Remove unpatchify and patchify ops before audio latents denormalization (#9)

---------

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Add support for I2V (#8)

* start i2v.

* up

* up

* up

* up

* up

* remove uniform strategy code.

* remove unneeded code.

* Denormalize audio latents in I2V pipeline (analogous to T2V change) (#11)

* test i2v.

* Move Video and Audio Text Encoder Connectors to Transformer (#12)

* Denormalize audio latents in I2V pipeline (analogous to T2V change)

* Initial refactor to put video and audio text encoder connectors in transformer

* Get LTX 2 transformer tests working after connector refactor

* precompute run_connectors.

* fixes

* Address review comments

* Calculate RoPE double precisions freqs using torch instead of np

* Further simplify LTX 2 RoPE freq calc

* Make connectors a separate module (#18)

* remove text_encoder.py

* address yiyi's comments.

* up

* up

* up

* up

---------

Co-authored-by: sayakpaul <spsayakpaul@gmail.com>

* up (#19)

* address initial feedback from lightricks team (#16)

* cross_attn_timestep_scale_multiplier to 1000

* implement split rope type.

* up

* propagate rope_type to rope embed classes as well.

* up

* When using split RoPE, make sure that the output dtype is same as input dtype

* Fix apply split RoPE shape error when reshaping x to 4D

* Add export_utils file for exporting LTX 2.0 videos with audio

* Tests for T2V and I2V (#6)

* add ltx2 pipeline tests.

* up

* up

* up

* up

* remove content

* style

* Denormalize audio latents in I2V pipeline (analogous to T2V change)

* Initial refactor to put video and audio text encoder connectors in transformer

* Get LTX 2 transformer tests working after connector refactor

* up

* up

* i2v tests.

* up

* Address review comments

* Calculate RoPE double precisions freqs using torch instead of np

* Further simplify LTX 2 RoPE freq calc

* revert unneeded changes.

* up

* up

* update to split style rope.

* up

---------

Co-authored-by: Daniel Gu <dgu8957@gmail.com>

* up

* use export util funcs.

* Point original checkpoint to LTX 2.0 official checkpoint

* Allow the I2V pipeline to accept image URLs

* make style and make quality

* remove function map.

* remove args.

* update docs.

* update doc entries.

* disable ltx2_consistency test

* Simplify LTX 2 RoPE forward by removing coords is None logic

* make style and make quality

* Support LTX 2.0 audio VAE encoder

* Apply suggestions from code review

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Remove print statement in audio VAE

* up

* Fix bug when calculating audio RoPE coords

* Ltx 2 latent upsample pipeline (#12922)

* Initial implementation of LTX 2.0 latent upsampling pipeline

* Add new LTX 2.0 spatial latent upsampler logic

* Add test script for LTX 2.0 latent upsampling

* Add option to enable VAE tiling in upsampling test script

* Get latent upsampler working with video latents

* Fix typo in BlurDownsample

* Add latent upsample pipeline docstring and example

* Remove deprecated pipeline VAE slicing/tiling methods

* make style and make quality

* When returning latents, return unpacked and denormalized latents for T2V and I2V

* Add model_cpu_offload_seq for latent upsampling pipeline

---------

Co-authored-by: Daniel Gu <dgu8957@gmail.com>

* Fix latent upsampler filename in LTX 2 conversion script

* Add latent upsample pipeline to LTX 2 docs

* Add dummy objects for LTX 2 latent upsample pipeline

* Set default FPS to official LTX 2 ckpt default of 24.0

* Set default CFG scale to official LTX 2 ckpt default of 4.0

* Update LTX 2 pipeline example docstrings

* make style and make quality

* Remove LTX 2 test scripts

* Fix LTX 2 upsample pipeline example docstring

* Add logic to convert and save a LTX 2 upsampling pipeline

* Document LTX2VideoTransformer3DModel forward pass

---------

Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
2026-01-07 21:24:27 -08:00
Álvaro Somoza
dab000e88b [Modular] Video for Mellon (#12924)
num_frames and videos
2026-01-07 12:35:59 -10:00
David El Malih
9fb6b89d49 Improve docstrings and type hints in scheduling_edm_euler.py (#12871)
* docs: add comprehensive docstrings and refine type hints for EDM scheduler methods and config parameters.

* refactor: Add type hints to DPM-Solver scheduler methods.
2026-01-07 11:18:00 -08:00
Sayak Paul
6fb4c99f5a Update wan.md to remove unneeded hfoptions (#12890) 2026-01-07 09:47:19 -08:00
Sayak Paul
961b9b27d3 [docs] fix torchao typo. (#12883)
fix torchao typo.
2026-01-07 09:43:02 -08:00
Tolga Cangöz
8f30bfff1f Add transformer cache context for SkyReels-V2 pipelines & Update docs (#12837)
* feat: Add transformer cache context for conditional and unconditional predictions for skyreels-v2 pipes.

* docs: Remove SkyReels-V2 FLF2V model link and add contributor attribution.
2026-01-06 22:30:30 -10:00
Leo Jiang
b4be29bda2 Add FSDP option for Flux2 (#12860)
* Add FSDP option for Flux2

* Apply style fixes

* Add FSDP option for Flux2

* Add FSDP option for Flux2

* Add FSDP option for Flux2

* Add FSDP option for Flux2

* Add FSDP option for Flux2

* Update examples/dreambooth/README_flux2.md

* guard accelerate import.

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2026-01-07 13:17:46 +05:30
Hu Yaoqi
98479a94c2 LTX Video 0.9.8 long multi prompt (#12614)
* LTX Video 0.9.8  long multi prompt

* Further align comfyui

- Added the “LTXEulerAncestralRFScheduler” scheduler, aligned with [sample_euler_ancestral_RF](7d6103325e/comfy/k_diffusion/sampling.py (L234))

- Updated the LTXI2VLongMultiPromptPipeline.from_pretrained() method:
  - Now uses LTXEulerAncestralRFScheduler by default, for better compatibility with the ComfyUI LTXV workflow.

- Changed the default value of cond_strength from 1.0 to 0.5, aligning with ComfyUI’s default.

- Optimized cross-window overlap blending: moved the latent-space guidance injection to before the UNet and after each step, aligned with [KSamplerX0Inpaint](https://github.com/comfyanonymous/ComfyUI/blob/master/comfy/samplers.py#L391)

- Adjusted the default value of skip_steps_sigma_threshold to 1.

* align with diffusers contribute rule

* Add new pipelines and update imports

* Enhance LTXI2VLongMultiPromptPipeline with noise rescaling

Refactor LTXI2VLongMultiPromptPipeline to improve documentation and add noise rescaling functionality.

* Clean up comments in scheduling_ltx_euler_ancestral_rf.py

Removed design notes and limitations from the implementation.

* Enhance video generation example with scheduler

Updated LTXI2VLongMultiPromptPipeline example to include LTXEulerAncestralRFScheduler for ComfyUI parity.

* clean up

* style

* copies

* import ltx scheduler

* copies

* fix

* fix more

* up up

* up up up

* up upup

* Apply suggestions from code review

* Update docs/source/en/api/pipelines/ltx_video.md

* Update docs/source/en/api/pipelines/ltx_video.md

---------

Co-authored-by: yiyixuxu <yixu310@gmail.com>
2026-01-06 18:18:04 -10:00
zhangtao0408
ade1059ae2 [Flux.1] improve pos embed for ascend npu by computing on npu (#12897)
* [Flux.1] improve pos embed for ascend npu by setting it back to npu computation.

* [Flux.2] improve pos embed for ascend npu by setting it back to npu computation.

* [LongCat-Image] improve pos embed for ascend npu by setting it back to npu computation.

* [Ovis-Image] improve pos embed for ascend npu by setting it back to npu computation.

* Remove unused import of is_torch_npu_available

---------

Co-authored-by: zhangtao <zhangtao529@huawei.com>
2026-01-06 08:48:04 -10:00
dxqb
41a6e86faf Check for attention mask in backends that don't support it (#12892)
* check attention mask

* Apply style fixes

* bugfix

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2026-01-06 22:52:12 +05:30
Pauline Bailly-Masson
9b5a244653 CodeQL workflow for security analysis 2026-01-06 17:26:08 +01:00
Pauline Bailly-Masson
417f6b2d33 Delete .github/workflows/codeql.yml 2026-01-06 17:25:38 +01:00
Pauline Bailly-Masson
e46354d2d0 Add codeQL workflow (#12917)
Updated CodeQL workflow to use reusable workflow from Hugging Face and simplified language matrix.
2026-01-06 17:19:48 +01:00
Pauline Bailly-Masson
db37140474 Refactor environment variable assignments in workflow (#12916) 2026-01-06 13:39:18 +01:00
hlky
88ffb00139 Detect 2.0 vs 2.1 ZImageControlNetModel (#12861)
* Detect 2.0 vs 2.1 ZImageControlNetModel

* Possibility of control_noise_refiner being removed
2026-01-05 20:28:52 -10:00
Sayak Paul
b6098ca006 [core] remove unneeded autoencoder methods when subclassing from AutoencoderMixin (#12873)
up
2026-01-05 19:43:54 -10:00
Sayak Paul
7c6d314549 fix the use of device_map in CP docs (#12902)
up
2026-01-05 19:42:32 -10:00
DefTruth
3138e37fe6 Fix wan 2.1 i2v context parallel (#12909)
* fix wan 2.1 i2v context parallel

* fix wan 2.1 i2v context parallel

* fix wan 2.1 i2v context parallel

* format
2026-01-06 07:42:53 +05:30
Miguel Martin
0da1aa90b5 Fix typo in src/diffusers/pipelines/cosmos/pipeline_cosmos2_5_predict.py (#12914) 2026-01-05 15:44:39 -10:00
Jefri Haryono
5ffb65803d Community Pipeline: Add z-image differential img2img (#12882)
* Community Pipeline: Add z-image differential img2img

* add pipeline for z-image differential img2img diffusion examples : run make style , make quality, and fix white spaces in example doc string.

---------

Co-authored-by: r4inm4ker <jefri.yeh@gmail.com>
2026-01-05 09:53:52 -03:00