diffusers

mirror of https://github.com/huggingface/diffusers.git synced 2026-01-29 07:22:12 +03:00

Author	SHA1	Message	Date
Daniel Gu	aa9b65d0fc	When returning latents, return unpacked and denormalized latents for T2V and I2V	2026-01-07 09:04:34 +01:00
Daniel Gu	e6e7e7b26f	make style and make quality	2026-01-07 08:07:24 +01:00
Daniel Gu	5e48a114b5	Remove deprecated pipeline VAE slicing/tiling methods	2026-01-07 08:06:07 +01:00
Daniel Gu	32df138fef	Add latent upsample pipeline docstring and example	2026-01-07 08:03:41 +01:00
Daniel Gu	0637b549a0	Fix typo in BlurDownsample	2026-01-07 03:36:19 +01:00
Daniel Gu	8f1ddb1b1e	Get latent upsampler working with video latents	2026-01-07 01:58:25 +01:00
Daniel Gu	245d056c7d	Add option to enable VAE tiling in upsampling test script	2026-01-06 08:07:33 +01:00
Daniel Gu	a7d6916afc	Add test script for LTX 2.0 latent upsampling	2026-01-06 05:58:31 +01:00
Daniel Gu	84c0b2fb84	Merge branch 'ltx-2-transformer' into ltx-2-latent-upsample-pipeline	2026-01-06 04:53:42 +01:00
Daniel Gu	d97fd2dd35	Add new LTX 2.0 spatial latent upsampler logic	2026-01-06 04:47:06 +01:00
sayakpaul	550eca3530	use export util funcs.	2026-01-06 09:14:38 +05:30
sayakpaul	c039c87b99	up	2026-01-06 08:09:59 +05:30
sayakpaul	9b8788cc98	resolve conflicts.	2026-01-06 08:09:37 +05:30
Sayak Paul	93a417f24a	Tests for T2V and I2V (#6) * add ltx2 pipeline tests. * up * up * up * up * remove content * style * Denormalize audio latents in I2V pipeline (analogous to T2V change) * Initial refactor to put video and audio text encoder connectors in transformer * Get LTX 2 transformer tests working after connector refactor * up * up * i2v tests. * up * Address review comments * Calculate RoPE double precisions freqs using torch instead of np * Further simplify LTX 2 RoPE freq calc * revert unneeded changes. * up * up * update to split style rope. * up --------- Co-authored-by: Daniel Gu <dgu8957@gmail.com>	2026-01-06 08:05:30 +05:30
Daniel Gu	084490cd98	Merge branch 'ltx-2-transformer' into ltx-2-latent-upsample-pipeline	2026-01-06 03:29:38 +01:00
dg845	ce9da5d472	Merge pull request #20 from huggingface/video-export-utils-file Add export_utils file for exporting LTX 2.0 videos with audio	2026-01-05 18:25:29 -08:00
Daniel Gu	90516804e0	Merge branch 'ltx-2-transformer' into ltx-2-latent-upsample-pipeline	2026-01-06 03:18:51 +01:00
Daniel Gu	cb50cacba5	Add export_utils file for exporting LTX 2.0 videos with audio	2026-01-06 02:17:39 +01:00
Daniel Gu	bff989110c	Fix apply split RoPE shape error when reshaping x to 4D	2026-01-06 01:22:05 +01:00
Daniel Gu	2fa4f8471f	When using split RoPE, make sure that the output dtype is same as input dtype	2026-01-06 00:19:39 +01:00
Sayak Paul	c5b52d6c9f	address initial feedback from lightricks team (#16) * cross_attn_timestep_scale_multiplier to 1000 * implement split rope type. * up * propagate rope_type to rope embed classes as well. * up	2026-01-05 21:13:10 +05:30
Sayak Paul	0be4f31620	up (#19)	2026-01-05 21:13:01 +05:30
dg845	caae16768a	Move Video and Audio Text Encoder Connectors to Transformer (#12) * Denormalize audio latents in I2V pipeline (analogous to T2V change) * Initial refactor to put video and audio text encoder connectors in transformer * Get LTX 2 transformer tests working after connector refactor * precompute run_connectors,. * fixes * Address review comments * Calculate RoPE double precisions freqs using torch instead of np * Further simplify LTX 2 RoPE freq calc * Make connectors a separate module (#18) * remove text_encoder.py * address yiyi's comments. * up * up * up * up --------- Co-authored-by: sayakpaul <spsayakpaul@gmail.com>	2026-01-05 20:11:13 +05:30
Daniel Gu	fe3ba3b698	Initial implementation of LTX 2.0 latent upsampling pipeline	2026-01-02 20:18:32 +01:00
hlky	47378066c0	Z-Image-Turbo from_single_file fix (#12888)	2026-01-02 22:29:24 +05:30
Maxim Balabanski	208cda8f6d	fix Qwen Image Transformer single file loading mapping function to be consistent with other loader APIs (#12894) fix Qwen single file loading to be consistent with other loader API	2026-01-02 12:59:11 +05:30
dg845	aae70b90db	Merge pull request #10 from huggingface/make-scheduler-consistent Make LTX 2.0 Scheduler `sigmas` Consistent with Original Code	2025-12-31 13:46:47 -08:00
sayakpaul	d3f10fe54e	test i2v.	2025-12-31 09:36:48 +05:30
dg845	bd607b97a8	Denormalize audio latents in I2V pipeline (analogous to T2V change) (#11)	2025-12-31 09:23:35 +05:30
Daniel Gu	6a236a27fb	Merge branch 'ltx-2-transformer' into make-scheduler-consistent	2025-12-30 20:25:59 +01:00
Vasiliy Kuznetsov	1cdb8723b8	fix torchao quantizer for new torchao versions (#12901) * fix torchao quantizer for new torchao versions Summary: `torchao==0.16.0` (not yet released) has some bc-breaking changes, this PR fixes the diffusers repo with those changes. Specifics on the changes: 1. `UInt4Tensor` is removed: https://github.com/pytorch/ao/pull/3536 2. old float8 tensors v1 are removed: https://github.com/pytorch/ao/pull/3510 In this PR: 1. move the logger variable up (not sure why it was in the middle of the file before) to get better error messages 2. gate the old torchao objects by torchao version Test Plan: import diffusers objects with new versions of torchao works: ```bash > python -c "import torchao; print(torchao.__version__); from diffusers import StableDiffusionPipeline" 0.16.0.dev20251229+cu129 ``` Reviewers: Subscribers: Tasks: Tags: * Apply style fixes --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2025-12-30 10:04:54 +05:30
Sayak Paul	46822c43db	Add support for I2V (#8) * start i2v. * up * up * up * up * up * remove uniform strategy code. * remove unneeded code.	2025-12-30 09:06:07 +05:30
Sayak Paul	280e347814	Refactor Audio VAE to be simpler and remove helpers (#7) * remove resolve causality axes stuff. * remove a bunch of helpers. * remove adjust output shape helper. * remove the use of audiolatentshape. * move normalization and patchify out of pipeline. * fix * up * up * Remove unpatchify and patchify ops before audio latents denormalization (#9) --------- Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>	2025-12-30 08:05:56 +05:30
Daniel Gu	e1f0b7e255	Fix typo when applying scheduler fix in T2V inference script	2025-12-30 00:38:51 +01:00
Daniel Gu	581f21c431	Make LTX 2.0 scheduler more consistent with original code	2025-12-29 23:44:52 +01:00
RuoyiDu	f6b6a7181e	Add z-image-omni-base implementation (#12857) * Add z-image-omni-base implementation * Merged into one transformer for Z-Image. * Fix bugs for controlnet after merging the main branch new feature. * Fix for auto_pipeline, Add Styling. * Refactor noise handling and modulation - Add select_per_token function for per-token value selection - Separate adaptive modulation logic - Cleanify t_noisy/clean variable naming - Move image_noise_mask handler from forward to pipeline * Styling & Formatting. * Rewrite code with more non-forward func & clean forward. 1.Change to one forward with shorter code with omni code (None). 2.Split out non-forward funcs: _build_unified_sequence, _prepare_sequence, patchify, pad. * Styling & Formatting. * Manual check fix-copies in controlnet, Add select_per_token, _patchify_image, _pad_with_ids; Styling. * Add Import in pipeline __init__.py. --------- Co-authored-by: Jerry Qilong Wu <xinglong.wql@alibaba-inc.com> Co-authored-by: YiYi Xu <yixu310@gmail.com>	2025-12-23 23:45:35 -10:00
dg845	0c41297453	Merge pull request #4 from huggingface/ltx-2-t2v-pipeline LTX 2.0 Text-to-Video (T2V) Pipeline	2025-12-23 21:29:25 -08:00
Daniel Gu	b5891b19b1	Get LTX 2 T2V pipeline to produce reasonable outputs	2025-12-24 06:07:38 +01:00
Alvaro Bartolome	52766e6a69	Use `T5Tokenizer` instead of `MT5Tokenizer` (removed in Transformers v5.0+) (#12877) Use `T5Tokenizer` instead of `MT5Tokenizer` Given that the `MT5Tokenizer` in `transformers` is just a "re-export" of `T5Tokenizer` as per https://github.com/huggingface/transformers/blob/v4.57.3/src/transformers/models/mt5/tokenization_mt5.py (on latest available stable Transformers i.e., v4.57.3), this commit updates the imports to point to `T5Tokenizer` instead, so that those still work with Transformers v5.0.0rc0 onwards.	2025-12-23 06:57:41 -10:00
Daniel Gu	e89d9c1951	Fix video shape error in full pipeline test script	2025-12-23 11:14:05 +01:00
Daniel Gu	f9b947651f	Fix pipeline audio VAE decoding dtype bug	2025-12-23 11:03:19 +01:00
Daniel Gu	1484c43183	Improve CPU offload support	2025-12-23 10:56:32 +01:00
Daniel Gu	90edc6abc9	Fix more bugs in LTX2Pipeline.__call__	2025-12-23 10:41:27 +01:00
Daniel Gu	a56cf23483	Add LTX 2 text encoder and vocoder to ltx2 subdirectory __init__	2025-12-23 10:40:56 +01:00
Daniel Gu	fa7d9f77f1	Fix pipeline return bugs	2025-12-23 08:49:11 +01:00
Daniel Gu	3bf736979f	Add script to test full LTX2Pipeline T2V inference	2025-12-23 08:43:37 +01:00
Daniel Gu	595f485ad8	LTX 2.0 scheduler and full pipeline conversion	2025-12-23 07:41:28 +01:00
Daniel Gu	cbb10b8dca	Support num_videos_per_prompt for prompt embeddings	2025-12-23 07:01:17 +01:00
Daniel Gu	6e6ce20595	Duplicate scheduler for audio latents	2025-12-23 06:40:35 +01:00
Daniel Gu	54bfc5d617	Add Audio VAE logic to T2V pipeline	2025-12-23 03:51:22 +01:00

6172 Commits