mirror of https://github.com/huggingface/diffusers.git synced 2026-01-29 07:22:12 +03:00
Commit Graph

6172 Commits

Author SHA1 Message Date
Daniel Gu
aa9b65d0fc When returning latents, return unpacked and denormalized latents for T2V and I2V 2026-01-07 09:04:34 +01:00
Daniel Gu
e6e7e7b26f make style and make quality 2026-01-07 08:07:24 +01:00
Daniel Gu
5e48a114b5 Remove deprecated pipeline VAE slicing/tiling methods 2026-01-07 08:06:07 +01:00
Daniel Gu
32df138fef Add latent upsample pipeline docstring and example 2026-01-07 08:03:41 +01:00
Daniel Gu
0637b549a0 Fix typo in BlurDownsample 2026-01-07 03:36:19 +01:00
Daniel Gu
8f1ddb1b1e Get latent upsampler working with video latents 2026-01-07 01:58:25 +01:00
Daniel Gu
245d056c7d Add option to enable VAE tiling in upsampling test script 2026-01-06 08:07:33 +01:00
Daniel Gu
a7d6916afc Add test script for LTX 2.0 latent upsampling 2026-01-06 05:58:31 +01:00
Daniel Gu
84c0b2fb84 Merge branch 'ltx-2-transformer' into ltx-2-latent-upsample-pipeline 2026-01-06 04:53:42 +01:00
Daniel Gu
d97fd2dd35 Add new LTX 2.0 spatial latent upsampler logic 2026-01-06 04:47:06 +01:00
sayakpaul
550eca3530 use export util funcs. 2026-01-06 09:14:38 +05:30
sayakpaul
c039c87b99 up 2026-01-06 08:09:59 +05:30
sayakpaul
9b8788cc98 resolve conflicts. 2026-01-06 08:09:37 +05:30
Sayak Paul
93a417f24a Tests for T2V and I2V (#6)
* add ltx2 pipeline tests.

* up

* up

* up

* up

* remove content

* style

* Denormalize audio latents in I2V pipeline (analogous to T2V change)

* Initial refactor to put video and audio text encoder connectors in transformer

* Get LTX 2 transformer tests working after connector refactor

* up

* up

* i2v tests.

* up

* Address review comments

* Calculate RoPE double precisions freqs using torch instead of np

* Further simplify LTX 2 RoPE freq calc

* revert unneeded changes.

* up

* up

* update to split style rope.

* up

---------

Co-authored-by: Daniel Gu <dgu8957@gmail.com>
2026-01-06 08:05:30 +05:30
Daniel Gu
084490cd98 Merge branch 'ltx-2-transformer' into ltx-2-latent-upsample-pipeline 2026-01-06 03:29:38 +01:00
dg845
ce9da5d472 Merge pull request #20 from huggingface/video-export-utils-file
Add export_utils file for exporting LTX 2.0 videos with audio
2026-01-05 18:25:29 -08:00
Daniel Gu
90516804e0 Merge branch 'ltx-2-transformer' into ltx-2-latent-upsample-pipeline 2026-01-06 03:18:51 +01:00
Daniel Gu
cb50cacba5 Add export_utils file for exporting LTX 2.0 videos with audio 2026-01-06 02:17:39 +01:00
Daniel Gu
bff989110c Fix apply split RoPE shape error when reshaping x to 4D 2026-01-06 01:22:05 +01:00
Daniel Gu
2fa4f8471f When using split RoPE, make sure that the output dtype is same as input dtype 2026-01-06 00:19:39 +01:00
Sayak Paul
c5b52d6c9f address initial feedback from lightricks team (#16)
* cross_attn_timestep_scale_multiplier to 1000

* implement split rope type.

* up

* propagate rope_type to rope embed classes as well.

* up
2026-01-05 21:13:10 +05:30
Sayak Paul
0be4f31620 up (#19) 2026-01-05 21:13:01 +05:30
dg845
caae16768a Move Video and Audio Text Encoder Connectors to Transformer (#12)
* Denormalize audio latents in I2V pipeline (analogous to T2V change)

* Initial refactor to put video and audio text encoder connectors in transformer

* Get LTX 2 transformer tests working after connector refactor

* precompute run_connectors.

* fixes

* Address review comments

* Calculate RoPE double precisions freqs using torch instead of np

* Further simplify LTX 2 RoPE freq calc

* Make connectors a separate module (#18)

* remove text_encoder.py

* address yiyi's comments.

* up

* up

* up

* up

---------

Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
2026-01-05 20:11:13 +05:30
Daniel Gu
fe3ba3b698 Initial implementation of LTX 2.0 latent upsampling pipeline 2026-01-02 20:18:32 +01:00
hlky
47378066c0 Z-Image-Turbo from_single_file fix (#12888) 2026-01-02 22:29:24 +05:30
Maxim Balabanski
208cda8f6d fix Qwen Image Transformer single file loading mapping function to be consistent with other loader APIs (#12894)
fix Qwen single file loading to be consistent with other loader API
2026-01-02 12:59:11 +05:30
dg845
aae70b90db Merge pull request #10 from huggingface/make-scheduler-consistent
Make LTX 2.0 Scheduler `sigmas` Consistent with Original Code
2025-12-31 13:46:47 -08:00
sayakpaul
d3f10fe54e test i2v. 2025-12-31 09:36:48 +05:30
dg845
bd607b97a8 Denormalize audio latents in I2V pipeline (analogous to T2V change) (#11) 2025-12-31 09:23:35 +05:30
Daniel Gu
6a236a27fb Merge branch 'ltx-2-transformer' into make-scheduler-consistent 2025-12-30 20:25:59 +01:00
Vasiliy Kuznetsov
1cdb8723b8 fix torchao quantizer for new torchao versions (#12901)
* fix torchao quantizer for new torchao versions

Summary:

`torchao==0.16.0` (not yet released) has some bc-breaking changes, this
PR fixes the diffusers repo with those changes. Specifics on the
changes:
1. `UInt4Tensor` is removed: https://github.com/pytorch/ao/pull/3536
2. old float8 tensors v1 are removed: https://github.com/pytorch/ao/pull/3510

In this PR:
1. move the logger variable up (not sure why it was in the middle of the
   file before) to get better error messages
2. gate the old torchao objects by torchao version
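
The version gating described in item 2 can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code from the PR: the helper names `parse_release` and `legacy_torchao_objects_available` are hypothetical, and the `0.16.0` cutoff is taken from the summary above.

```python
import re


def parse_release(version: str) -> tuple[int, ...]:
    """Return the leading numeric release part of a version string.

    Suffixes such as ".dev20251229+cu129" are ignored, so a dev build of
    0.16.0 is treated like 0.16.0 itself (a simplification, assumed here).
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def legacy_torchao_objects_available(installed: str, removed_in: str = "0.16.0") -> bool:
    """Hypothetical gate: True if the installed version predates the removal."""
    return parse_release(installed) < parse_release(removed_in)


# Only reference the removed objects (e.g. `UInt4Tensor`) when the gate is True.
for candidate in ("0.15.0", "0.16.0.dev20251229+cu129"):
    print(candidate, legacy_torchao_objects_available(candidate))
```

Treating a dev build as its release number is a deliberate choice in this sketch, since the nightly shown in the test plan below already contains the removals.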

Test Plan:

import diffusers objects with new versions of torchao works:

```bash
> python -c "import torchao; print(torchao.__version__); from diffusers import StableDiffusionPipeline"
0.16.0.dev20251229+cu129
```

Reviewers:

Subscribers:

Tasks:

Tags:

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-12-30 10:04:54 +05:30
Sayak Paul
46822c43db Add support for I2V (#8)
* start i2v.

* up

* up

* up

* up

* up

* remove uniform strategy code.

* remove unneeded code.
2025-12-30 09:06:07 +05:30
Sayak Paul
280e347814 Refactor Audio VAE to be simpler and remove helpers (#7)
* remove resolve causality axes stuff.

* remove a bunch of helpers.

* remove adjust output shape helper.

* remove the use of audiolatentshape.

* move normalization and patchify out of pipeline.

* fix

* up

* up

* Remove unpatchify and patchify ops before audio latents denormalization (#9)

---------

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
2025-12-30 08:05:56 +05:30
Daniel Gu
e1f0b7e255 Fix typo when applying scheduler fix in T2V inference script 2025-12-30 00:38:51 +01:00
Daniel Gu
581f21c431 Make LTX 2.0 scheduler more consistent with original code 2025-12-29 23:44:52 +01:00
RuoyiDu
f6b6a7181e Add z-image-omni-base implementation (#12857)
* Add z-image-omni-base implementation

* Merged into one transformer for Z-Image.

* Fix bugs for controlnet after merging the main branch new feature.

* Fix for auto_pipeline, Add Styling.

* Refactor noise handling and modulation

- Add select_per_token function for per-token value selection
- Separate adaptive modulation logic
- Cleanify t_noisy/clean variable naming
- Move image_noise_mask handler from forward to pipeline

* Styling & Formatting.

* Rewrite code with more non-forward func & clean forward.

1.Change to one forward with shorter code with omni code (None).
2.Split out non-forward funcs: _build_unified_sequence, _prepare_sequence, patchify, pad.

* Styling & Formatting.

* Manual check fix-copies in controlnet, Add select_per_token, _patchify_image, _pad_with_ids; Styling.

* Add Import in pipeline __init__.py.

---------

Co-authored-by: Jerry Qilong Wu <xinglong.wql@alibaba-inc.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2025-12-23 23:45:35 -10:00
dg845
0c41297453 Merge pull request #4 from huggingface/ltx-2-t2v-pipeline
LTX 2.0 Text-to-Video (T2V) Pipeline
2025-12-23 21:29:25 -08:00
Daniel Gu
b5891b19b1 Get LTX 2 T2V pipeline to produce reasonable outputs 2025-12-24 06:07:38 +01:00
Alvaro Bartolome
52766e6a69 Use T5Tokenizer instead of MT5Tokenizer (removed in Transformers v5.0+) (#12877)
Use `T5Tokenizer` instead of `MT5Tokenizer`

Given that the `MT5Tokenizer` in `transformers` is just a "re-export" of
`T5Tokenizer` as per
https://github.com/huggingface/transformers/blob/v4.57.3/src/transformers/models/mt5/tokenization_mt5.py
(on the latest available stable Transformers, i.e., v4.57.3), this commit
updates the imports to point to `T5Tokenizer` instead, so that those
still work with Transformers v5.0.0rc0 onwards.
2025-12-23 06:57:41 -10:00
Daniel Gu
e89d9c1951 Fix video shape error in full pipeline test script 2025-12-23 11:14:05 +01:00
Daniel Gu
f9b947651f Fix pipeline audio VAE decoding dtype bug 2025-12-23 11:03:19 +01:00
Daniel Gu
1484c43183 Improve CPU offload support 2025-12-23 10:56:32 +01:00
Daniel Gu
90edc6abc9 Fix more bugs in LTX2Pipeline.__call__ 2025-12-23 10:41:27 +01:00
Daniel Gu
a56cf23483 Add LTX 2 text encoder and vocoder to ltx2 subdirectory __init__ 2025-12-23 10:40:56 +01:00
Daniel Gu
fa7d9f77f1 Fix pipeline return bugs 2025-12-23 08:49:11 +01:00
Daniel Gu
3bf736979f Add script to test full LTX2Pipeline T2V inference 2025-12-23 08:43:37 +01:00
Daniel Gu
595f485ad8 LTX 2.0 scheduler and full pipeline conversion 2025-12-23 07:41:28 +01:00
Daniel Gu
cbb10b8dca Support num_videos_per_prompt for prompt embeddings 2025-12-23 07:01:17 +01:00
Daniel Gu
6e6ce20595 Duplicate scheduler for audio latents 2025-12-23 06:40:35 +01:00
Daniel Gu
54bfc5d617 Add Audio VAE logic to T2V pipeline 2025-12-23 03:51:22 +01:00