diffusers

mirror of https://github.com/huggingface/diffusers.git synced 2026-01-27 17:22:53 +03:00

Author	SHA1	Message	Date
Daniel Gu	dd81242eba	make style and make quality	2026-01-06 06:42:24 +01:00
Daniel Gu	ace2ee93fb	Allow the I2V pipeline to accept image URLs	2026-01-06 06:40:42 +01:00
Daniel Gu	ef199118e2	Point original checkpoint to LTX 2.0 official checkpoint	2026-01-06 06:35:51 +01:00
sayakpaul	550eca3530	use export util funcs.	2026-01-06 09:14:38 +05:30
sayakpaul	9b8788cc98	resolve conflicts.	2026-01-06 08:09:37 +05:30
Sayak Paul	93a417f24a	Tests for T2V and I2V (#6) * add ltx2 pipeline tests. * up * up * up * up * remove content * style * Denormalize audio latents in I2V pipeline (analogous to T2V change) * Initial refactor to put video and audio text encoder connectors in transformer * Get LTX 2 transformer tests working after connector refactor * up * up * i2v tests. * up * Address review comments * Calculate RoPE double precisions freqs using torch instead of np * Further simplify LTX 2 RoPE freq calc * revert unneded changes. * up * up * update to split style rope. * up --------- Co-authored-by: Daniel Gu <dgu8957@gmail.com>	2026-01-06 08:05:30 +05:30
Sayak Paul	c5b52d6c9f	address initial feedback from lightricks team (#16) * cross_attn_timestep_scale_multiplier to 1000 * implement split rope type. * up * propagate rope_type to rope embed classes as well. * up	2026-01-05 21:13:10 +05:30
Sayak Paul	0be4f31620	up (#19)	2026-01-05 21:13:01 +05:30
dg845	caae16768a	Move Video and Audio Text Encoder Connectors to Transformer (#12) * Denormalize audio latents in I2V pipeline (analogous to T2V change) * Initial refactor to put video and audio text encoder connectors in transformer * Get LTX 2 transformer tests working after connector refactor * precompute run_connectors,. * fixes * Address review comments * Calculate RoPE double precisions freqs using torch instead of np * Further simplify LTX 2 RoPE freq calc * Make connectors a separate module (#18) * remove text_encoder.py * address yiyi's comments. * up * up * up * up --------- Co-authored-by: sayakpaul <spsayakpaul@gmail.com>	2026-01-05 20:11:13 +05:30
dg845	aae70b90db	Merge pull request #10 from huggingface/make-scheduler-consistent Make LTX 2.0 Scheduler `sigmas` Consistent with Original Code	2025-12-31 13:46:47 -08:00
sayakpaul	d3f10fe54e	test i2v.	2025-12-31 09:36:48 +05:30
Daniel Gu	6a236a27fb	Merge branch 'ltx-2-transformer' into make-scheduler-consistent	2025-12-30 20:25:59 +01:00
Sayak Paul	280e347814	Refactor Audio VAE to be simpler and remove helpers (#7) * remove resolve causality axes stuff. * remove a bunch of helpers. * remove adjust output shape helper. * remove the use of audiolatentshape. * move normalization and patchify out of pipeline. * fix * up * up * Remove unpatchify and patchify ops before audio latents denormalization (#9) --------- Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>	2025-12-30 08:05:56 +05:30
Daniel Gu	e1f0b7e255	Fix typo when applying scheduler fix in T2V inference script	2025-12-30 00:38:51 +01:00
Daniel Gu	581f21c431	Make LTX 2.0 scheduler more consistent with original code	2025-12-29 23:44:52 +01:00
Daniel Gu	b5891b19b1	Get LTX 2 T2V pipeline to produce reasonable outputs	2025-12-24 06:07:38 +01:00
Daniel Gu	e89d9c1951	Fix video shape error in full pipeline test script	2025-12-23 11:14:05 +01:00
Daniel Gu	1484c43183	Improve CPU offload support	2025-12-23 10:56:32 +01:00
Daniel Gu	fa7d9f77f1	Fix pipeline return bugs	2025-12-23 08:49:11 +01:00
Daniel Gu	3bf736979f	Add script to test full LTX2Pipeline T2V inference	2025-12-23 08:43:37 +01:00
Daniel Gu	595f485ad8	LTX 2.0 scheduler and full pipeline conversion	2025-12-23 07:41:28 +01:00
Daniel Gu	ae3b6e7cc2	Merge branch 'ltx-2-transformer' into ltx-2-t2v-pipeline	2025-12-23 02:59:33 +01:00
Daniel Gu	d303e2a6ff	Conversion script for LTX 2.0 Audio VAE Decoder	2025-12-23 02:48:15 +01:00
Miguel Martin	973a077c6a	Cosmos Predict2.5 14b Conversion (#12863) 14b conversion	2025-12-22 08:02:06 -10:00
sayakpaul	409d651bab	resolve conflicts.	2025-12-22 15:59:31 +05:30
Sayak Paul	059999a3f7	up	2025-12-22 10:24:55 +00:00
sayakpaul	58257eb0e0	up	2025-12-22 15:45:56 +05:30
Sayak Paul	5f0f2a03f7	up	2025-12-22 10:06:39 +00:00
Daniel Gu	0028955c37	Initial LTX 2.0 text encoder implementation	2025-12-22 10:06:01 +01:00
sayakpaul	4904fd6fa5	up	2025-12-22 13:46:58 +05:30
sayakpaul	907896d533	simplify and clean up	2025-12-22 13:41:41 +05:30
sayakpaul	e54cd6bb1d	up	2025-12-22 13:03:40 +05:30
Daniel Gu	c6a11a5530	Initial LTX 2.0 vocoder implementation	2025-12-19 12:17:10 +01:00
Daniel Gu	a748975a7c	Get diffusers implementation on par with official LTX 2.0 video VAE implementation	2025-12-19 07:02:38 +01:00
Miguel Martin	b5309683cb	Cosmos Predict2.5 Base: inference pipeline, scheduler & chkpt conversion (#12852) * cosmos predict2.5 base: convert chkpt & pipeline - New scheduler: scheduling_flow_unipc_multistep.py - Changes to TransformerCosmos for text embeddings via crossattn_proj * scheduler cleanup * simplify inference pipeline * cleanup scheduler + tests * Basic tests for flow unipc * working b2b inference * Rename everything * Tests for pipeline present, but not working (predict2 also not working) * docstring update * wrapper pipelines + make style * remove unnecessary files * UniPCMultistep: support use_karras_sigmas=True and use_flow_sigmas=True * use UniPCMultistepScheduler + fix tests for pipeline * Remove FlowUniPCMultistepScheduler * UniPCMultistepScheduler for use_flow_sigmas=True & use_karras_sigmas=True * num_inference_steps=36 due to bug in scheduler used by predict2.5 * Address comments * make style + make fix-copies * fix tests + remove references to old pipelines * address comments * add revision in from_pretrained call * fix tests	2025-12-19 05:38:18 +05:30
Daniel Gu	baf23e2da3	Explicitly specify temporal and spatial VAE scale factors when converting	2025-12-17 11:14:45 +01:00
Daniel Gu	269cf7b40d	Initial implementation of LTX 2.0 video VAE	2025-12-17 10:51:34 +01:00
Daniel Gu	57a8b9c330	Allow LTX 2 transformer to be loaded from local path for conversion	2025-12-16 10:38:03 +01:00
Daniel Gu	a5f2d2da6c	Initial script to convert LTX 2 transformer to diffusers	2025-12-15 07:09:42 +01:00
YiYi Xu	671149e036	[HunyuanVideo1.5] support step-distilled (#12802) * support step-distilled * style	2025-12-07 21:50:36 -10:00
Guo-Hua Wang	4f136f842c	Add support for Ovis-Image (#12740) * add ovis_image * fix code quality * optimize pipeline_ovis_image.py according to the feedbacks * optimize imports * add docs * make style * make style * add ovis to toctree * oops --------- Co-authored-by: YiYi Xu <yixu310@gmail.com>	2025-12-02 11:48:07 -10:00
YiYi Xu	6156cf8f22	Hunyuanvideo15 (#12696) * add --------- Co-authored-by: yiyi@huggingface.co <yiyi@ip-26-0-161-123.ec2.internal> Co-authored-by: yiyi@huggingface.co <yiyi@ip-26-0-160-103.ec2.internal> Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2025-11-30 20:27:59 -10:00
Sayak Paul	5ffb73d4ae	let's go Flux2 🚀 (#12711) * add vae * Initial commit for Flux 2 Transformer implementation * add pipeline part * small edits to the pipeline and conversion * update conversion script * fix * up up * finish pipeline * Remove Flux IP Adapter logic for now * Remove deprecated 3D id logic * Remove ControlNet logic for now * Add link to ViT-22B paper as reference for parallel transformer blocks such as the Flux 2 single stream block * update pipeline * Don't use biases for input projs and output AdaNorm * up * Remove bias for double stream block text QKV projections * Add script to convert Flux 2 transformer to diffusers * make style and make quality * fix a few things. * allow sft files to go. * fix image processor * fix batch * style a bit * Fix some bugs in Flux 2 transformer implementation * Fix dummy input preparation and fix some test bugs * fix dtype casting in timestep guidance module. * resolve conflicts., * remove ip adapter stuff. * Fix Flux 2 transformer consistency test * Fix bug in Flux2TransformerBlock (double stream block) * Get remaining Flux 2 transformer tests passing * make style; make quality; make fix-copies * remove stuff. * fix type annotaton. * remove unneeded stuff from tests * tests * up * up * add sf support * Remove unused IP Adapter and ControlNet logic from transformer (#9) * copied from * Apply suggestions from code review Co-authored-by: YiYi Xu <yixu310@gmail.com> Co-authored-by: apolinário <joaopaulo.passos@gmail.com> * up * up * up * up * up * Refactor Flux2Attention into separate classes for double stream and single stream attention * Add _supports_qkv_fusion to AttentionModuleMixin to allow subclasses to disable QKV fusion * Have Flux2ParallelSelfAttention inherit from AttentionModuleMixin with _supports_qkv_fusion=False * Log debug message when calling fuse_projections on a AttentionModuleMixin subclass that does not support QKV fusion * Address review comments * Update src/diffusers/pipelines/flux2/pipeline_flux2.py Co-authored-by: YiYi Xu <yixu310@gmail.com> * up * Remove maybe_allow_in_graph decorators for Flux 2 transformer blocks (#12) * up * support ostris loras. (#13) * up * update schdule * up * up (#17) * add training scripts (#16) * add training scripts Co-authored-by: Linoy Tsaban <linoytsaban@gmail.com> * model cpu offload in validation. * add flux.2 readme * add img2img and tests * cpu offload in log validation * Apply suggestions from code review * fix * up * fixes * remove i2i training tests for now. --------- Co-authored-by: Linoy Tsaban <linoytsaban@gmail.com> Co-authored-by: linoytsaban <linoy@huggingface.co> * up --------- Co-authored-by: yiyixuxu <yixu310@gmail.com> Co-authored-by: Daniel Gu <dgu8957@gmail.com> Co-authored-by: yiyi@huggingface.co <yiyi@ip-10-53-87-203.ec2.internal> Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com> Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com> Co-authored-by: apolinário <joaopaulo.passos@gmail.com> Co-authored-by: yiyi@huggingface.co <yiyi@ip-26-0-160-103.ec2.internal> Co-authored-by: Linoy Tsaban <linoytsaban@gmail.com> Co-authored-by: linoytsaban <linoy@huggingface.co>	2025-11-25 21:49:04 +05:30
Junsong Chen	1afc21855e	SANA-Video Image to Video pipeline `SanaImageToVideoPipeline` support (#12634) * move sana-video to a new dir and add `SanaImageToVideoPipeline` with no modify; * fix bug and run text/image-to-vidoe success; * make style; quality; fix-copies; * add sana image-to-video pipeline in markdown; * add test case for sana image-to-video; * make style; * add a init file in sana-video test dir; * Update src/diffusers/pipelines/sana_video/pipeline_sana_video_i2v.py Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com> * Update tests/pipelines/sana_video/test_sana_video_i2v.py Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com> * Update src/diffusers/pipelines/sana_video/pipeline_sana_video_i2v.py Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com> * Update src/diffusers/pipelines/sana_video/pipeline_sana_video_i2v.py Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com> * Update tests/pipelines/sana_video/test_sana_video_i2v.py Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com> * minor update; * fix bug and skip fp16 save test; Co-authored-by: Yuyang Zhao <43061147+HeliosZhao@users.noreply.github.com> * Update src/diffusers/pipelines/sana_video/pipeline_sana_video_i2v.py Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com> * Update src/diffusers/pipelines/sana_video/pipeline_sana_video_i2v.py Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com> * Update src/diffusers/pipelines/sana_video/pipeline_sana_video_i2v.py Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com> * Update src/diffusers/pipelines/sana_video/pipeline_sana_video_i2v.py Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com> * add copied from for `encode_prompt` * Apply style fixes --------- Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com> Co-authored-by: Yuyang Zhao <43061147+HeliosZhao@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2025-11-17 00:23:34 -08:00
dg845	d8e4805816	[WIP]Add Wan2.2 Animate Pipeline (Continuation of #12442 by tolgacangoz) (#12526) --------- Co-authored-by: Tolga Cangöz <mtcangoz@gmail.com> Co-authored-by: Tolga Cangöz <46008593+tolgacangoz@users.noreply.github.com>	2025-11-12 16:52:31 -10:00
Yashwant Bezawada	0fd58c7706	fix: correct import path for load_model_dict_into_meta in conversion scripts (#12616) The function load_model_dict_into_meta was moved from modeling_utils.py to model_loading_utils.py but the imports in the conversion scripts were not updated, causing ImportError when running these scripts. This fixes the import in 6 conversion scripts: - scripts/convert_sd3_to_diffusers.py - scripts/convert_stable_cascade_lite.py - scripts/convert_stable_cascade.py - scripts/convert_stable_audio.py - scripts/convert_sana_to_diffusers.py - scripts/convert_sana_controlnet_to_diffusers.py Fixes #12606	2025-11-10 14:47:18 +05:30
Junsong Chen	b3e9dfced7	[SANA-Video] Adding 5s pre-trained 480p SANA-Video inference (#12584) * 1. add `SanaVideoTransformer3DModel` in transformer_sana_video.py 2. add `SanaVideoPipeline` in pipeline_sana_video.py 3. add all code we need for import `SanaVideoPipeline` * add a sample about how to use sana-video; * code update; * update hf model path; * update code; * sana-video can run now; * 1. add aspect ratio in sana-video-pipeline; 2. add reshape function in sana-video-processor; 3. fix convert pth to safetensor bugs; * default to use `use_resolution_binning`; * make style; * remove unused code; * Update src/diffusers/models/transformers/transformer_sana_video.py Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com> * Update src/diffusers/models/transformers/transformer_sana_video.py Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com> * Update src/diffusers/models/transformers/transformer_sana_video.py Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com> * Update src/diffusers/pipelines/sana/pipeline_sana_video.py Co-authored-by: YiYi Xu <yixu310@gmail.com> * Update src/diffusers/models/transformers/transformer_sana_video.py Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com> * Update src/diffusers/models/transformers/transformer_sana_video.py Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com> * Update src/diffusers/models/transformers/transformer_sana_video.py * Update src/diffusers/pipelines/sana/pipeline_sana_video.py Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com> * Update src/diffusers/models/transformers/transformer_sana_video.py Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com> * Update src/diffusers/pipelines/sana/pipeline_sana_video.py Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com> * support `dispatch_attention_fn` * 1. add sana-video markdown; 2. fix typos; * add two test case for sana-video (need check) * fix text-encoder in test-sana-video; * Update tests/pipelines/sana/test_sana_video.py * Update tests/pipelines/sana/test_sana_video.py Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com> * Update tests/pipelines/sana/test_sana_video.py Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com> * Update tests/pipelines/sana/test_sana_video.py Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com> * Update tests/pipelines/sana/test_sana_video.py Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com> * Update tests/pipelines/sana/test_sana_video.py Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com> * Update src/diffusers/pipelines/sana/pipeline_sana_video.py Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com> * Update src/diffusers/video_processor.py Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com> * make style make quality make fix-copies * toctree yaml update; * add sana-video-transformer3d markdown; * Apply style fixes --------- Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com> Co-authored-by: YiYi Xu <yixu310@gmail.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2025-11-05 21:08:47 -08:00
YiYi Xu	a138d71ec1	HunyuanImage21 (#12333) * add hunyuanimage2.1 --------- Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>	2025-10-23 22:31:12 -10:00
David Bertoin	dd07b19e27	Prx (#12525) * rename photon to prx * rename photon into prx * Revert .gitignore to state before commit `b7fb0fe9d6` * rename photon to prx * rename photon into prx * Revert .gitignore to state before commit `b7fb0fe9d6` * make fix-copies	2025-10-21 17:09:22 -07:00
David Bertoin	cefc2cf82d	Add Photon model and pipeline support (#12456) * Add Photon model and pipeline support This commit adds support for the Photon image generation model: - PhotonTransformer2DModel: Core transformer architecture - PhotonPipeline: Text-to-image generation pipeline - Attention processor updates for Photon-specific attention mechanism - Conversion script for loading Photon checkpoints - Documentation and tests * just store the T5Gemma encoder * enhance_vae_properties if vae is provided only * remove autocast for text encoder forwad * BF16 example * conditioned CFG * remove enhance vae and use vae.config directly when possible * move PhotonAttnProcessor2_0 in transformer_photon * remove einops dependency and now inherits from AttentionMixin * unify the structure of the forward block * update doc * update doc * fix T5Gemma loading from hub * fix timestep shift * remove lora support from doc * Rename EmbedND for PhotoEmbedND * remove modulation dataclass * put _attn_forward and _ffn_forward logic in PhotonBlock's forward * renam LastLayer for FinalLayer * remove lora related code * rename vae_spatial_compression_ratio for vae_scale_factor * support prompt_embeds in call * move xattention conditionning out computation out of the denoising loop * add negative prompts * Use _import_structure for lazy loading * make quality + style * add pipeline test + corresponding fixes * utility function that determines the default resolution given the VAE * Refactor PhotonAttention to match Flux pattern * built-in RMSNorm * Revert accidental .gitignore change * parameter names match the standard diffusers conventions * renaming and remove unecessary attributes setting * Update docs/source/en/api/pipelines/photon.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * quantization example * added doc to toctree * Update docs/source/en/api/pipelines/photon.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/api/pipelines/photon.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/api/pipelines/photon.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * use dispatch_attention_fn for multiple attention backend support * naming changes * make fix copy * Update docs/source/en/api/pipelines/photon.md Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com> * Add PhotonTransformer2DModel to TYPE_CHECKING imports * make fix-copies * Use Tuple instead of tuple Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com> * restrict the version of transformers Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com> * Update tests/pipelines/photon/test_pipeline_photon.py Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com> * Update tests/pipelines/photon/test_pipeline_photon.py Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com> * change | for Optional * fix nits. * use typing Dict --------- Co-authored-by: davidb <davidb@worker-10.soperator-worker-svc.soperator.svc.cluster.local> Co-authored-by: David Briand <david@photoroom.com> Co-authored-by: davidb <davidb@worker-8.soperator-worker-svc.soperator.svc.cluster.local> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com> Co-authored-by: sayakpaul <spsayakpaul@gmail.com>	2025-10-21 20:55:55 +05:30


267 Commits