* Initial LTX 2.0 transformer implementation
* Add tests for LTX 2 transformer model
* Get LTX 2 transformer tests working
* Rename LTX 2 compile test class to include LTX2 in its name
* Remove RoPE debug print statements
* Get LTX 2 transformer compile tests passing
* Fix LTX 2 transformer shape errors
* Initial script to convert LTX 2 transformer to diffusers
* Add more LTX 2 transformer audio arguments
* Allow LTX 2 transformer to be loaded from local path for conversion
* Improve dummy inputs and add test for LTX 2 transformer consistency
* Fix LTX 2 transformer bugs so consistency test passes
* Initial implementation of LTX 2.0 video VAE
* Explicitly specify temporal and spatial VAE scale factors when converting
* Add initial LTX 2.0 video VAE tests
* Add initial LTX 2.0 video VAE tests (part 2)
* Get diffusers implementation on par with official LTX 2.0 video VAE implementation
* Initial LTX 2.0 vocoder implementation
* Use RMSNorm implementation closer to original for LTX 2.0 video VAE
* start audio decoder.
* init registration.
* up
* simplify and clean up
* up
* Initial LTX 2.0 text encoder implementation
* Rough initial LTX 2.0 pipeline implementation
* up
* up
* up
* up
* Add imports for LTX 2.0 Audio VAE
* Conversion script for LTX 2.0 Audio VAE Decoder
* Add Audio VAE logic to T2V pipeline
* Duplicate scheduler for audio latents
* Support num_videos_per_prompt for prompt embeddings
* LTX 2.0 scheduler and full pipeline conversion
* Add script to test full LTX2Pipeline T2V inference
* Fix pipeline return bugs
* Add LTX 2 text encoder and vocoder to ltx2 subdirectory __init__
* Fix more bugs in LTX2Pipeline.__call__
* Improve CPU offload support
* Fix pipeline audio VAE decoding dtype bug
* Fix video shape error in full pipeline test script
* Get LTX 2 T2V pipeline to produce reasonable outputs
* Make LTX 2.0 scheduler more consistent with original code
* Fix typo when applying scheduler fix in T2V inference script
* Refactor Audio VAE to be simpler and remove helpers (#7)
* remove the resolve-causality-axes logic.
* remove a bunch of helpers.
* remove adjust output shape helper.
* remove the use of AudioLatentShape.
* move normalization and patchify out of pipeline.
* fix
* up
* up
* Remove unpatchify and patchify ops before audio latents denormalization (#9)
---------
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
* Add support for I2V (#8)
* start i2v.
* up
* up
* up
* up
* up
* remove uniform strategy code.
* remove unneeded code.
* Denormalize audio latents in I2V pipeline (analogous to T2V change) (#11)
* test i2v.
* Move Video and Audio Text Encoder Connectors to Transformer (#12)
* Denormalize audio latents in I2V pipeline (analogous to T2V change)
* Initial refactor to put video and audio text encoder connectors in transformer
* Get LTX 2 transformer tests working after connector refactor
* precompute run_connectors.
* fixes
* Address review comments
* Calculate RoPE double-precision freqs using torch instead of np
* Further simplify LTX 2 RoPE freq calc
* Make connectors a separate module (#18)
* remove text_encoder.py
* address yiyi's comments.
* up
* up
* up
* up
---------
Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
* up (#19)
* address initial feedback from lightricks team (#16)
* Set cross_attn_timestep_scale_multiplier to 1000
* implement split rope type.
* up
* propagate rope_type to rope embed classes as well.
* up
* When using split RoPE, make sure that the output dtype is the same as the input dtype
* Fix apply split RoPE shape error when reshaping x to 4D
* Add export_utils file for exporting LTX 2.0 videos with audio
* Tests for T2V and I2V (#6)
* add ltx2 pipeline tests.
* up
* up
* up
* up
* remove content
* style
* Denormalize audio latents in I2V pipeline (analogous to T2V change)
* Initial refactor to put video and audio text encoder connectors in transformer
* Get LTX 2 transformer tests working after connector refactor
* up
* up
* i2v tests.
* up
* Address review comments
* Calculate RoPE double-precision freqs using torch instead of np
* Further simplify LTX 2 RoPE freq calc
* revert unneeded changes.
* up
* up
* update to split style rope.
* up
---------
Co-authored-by: Daniel Gu <dgu8957@gmail.com>
* up
* use export util funcs.
* Point original checkpoint to LTX 2.0 official checkpoint
* Allow the I2V pipeline to accept image URLs
* make style and make quality
* remove function map.
* remove args.
* update docs.
* update doc entries.
* disable ltx2_consistency test
* Simplify LTX 2 RoPE forward by removing coords is None logic
* make style and make quality
* Support LTX 2.0 audio VAE encoder
* Apply suggestions from code review
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Remove print statement in audio VAE
* up
* Fix bug when calculating audio RoPE coords
* LTX 2 latent upsample pipeline (#12922)
* Initial implementation of LTX 2.0 latent upsampling pipeline
* Add new LTX 2.0 spatial latent upsampler logic
* Add test script for LTX 2.0 latent upsampling
* Add option to enable VAE tiling in upsampling test script
* Get latent upsampler working with video latents
* Fix typo in BlurDownsample
* Add latent upsample pipeline docstring and example
* Remove deprecated pipeline VAE slicing/tiling methods
* make style and make quality
* When returning latents, return unpacked and denormalized latents for T2V and I2V
* Add model_cpu_offload_seq for latent upsampling pipeline
---------
Co-authored-by: Daniel Gu <dgu8957@gmail.com>
* Fix latent upsampler filename in LTX 2 conversion script
* Add latent upsample pipeline to LTX 2 docs
* Add dummy objects for LTX 2 latent upsample pipeline
* Set default FPS to official LTX 2 ckpt default of 24.0
* Set default CFG scale to official LTX 2 ckpt default of 4.0
* Update LTX 2 pipeline example docstrings
* make style and make quality
* Remove LTX 2 test scripts
* Fix LTX 2 upsample pipeline example docstring
* Add logic to convert and save a LTX 2 upsampling pipeline
* Document LTX2VideoTransformer3DModel forward pass
---------
Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
* begin transformer conversion
* refactor
* refactor
* refactor
* refactor
* refactor
* refactor
* update
* add conversion script
* add pipeline
* make fix-copies
* remove einops
* update docs
* gradient checkpointing
* add transformer test
* update
* debug
* remove prints
* match sigmas
* add vae pt. 1
* finish CV* vae
* update
* update
* update
* update
* update
* update
* make fix-copies
* update
* make fix-copies
* fix
* update
* update
* make fix-copies
* update
* update tests
* handle device and dtype for safety checker; required in latest diffusers
* remove enable_gqa and use repeat_interleave instead (see the sketch after this block)
* enforce safety checker; use dummy checker in fast tests
* add review suggestion for ONNX export
Co-Authored-By: Asfiya Baig <asfiyab@nvidia.com>
* fix safety_checker issues when not passed explicitly
We could either do what's done in this commit, or update the Cosmos examples to explicitly pass the safety checker
* use cosmos guardrail package
* auto format docs
* update conversion script to support 14B models
* update name CosmosPipeline -> CosmosTextToWorldPipeline
* update docs
* fix docs
* fix group offload test failing for vae
---------
Co-authored-by: Asfiya Baig <asfiyab@nvidia.com>
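A minimal sketch of the enable_gqa -> repeat_interleave swap referenced above, assuming made-up head counts and shapes rather than the actual Cosmos attention processor configuration:

```python
# Hedged illustration only: emulate grouped-query attention without SDPA's
# `enable_gqa=True` flag (which requires a recent PyTorch) by expanding KV heads.
import torch
import torch.nn.functional as F

batch, num_heads, num_kv_heads, seq_len, head_dim = 2, 8, 2, 16, 64
query = torch.randn(batch, num_heads, seq_len, head_dim)
key = torch.randn(batch, num_kv_heads, seq_len, head_dim)
value = torch.randn(batch, num_kv_heads, seq_len, head_dim)

# Instead of F.scaled_dot_product_attention(query, key, value, enable_gqa=True),
# repeat each key/value head so the head counts match, then call SDPA normally.
key = key.repeat_interleave(num_heads // num_kv_heads, dim=1)
value = value.repeat_interleave(num_heads // num_kv_heads, dim=1)
hidden_states = F.scaled_dot_product_attention(query, key, value)
```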
* update
* update
* make style
* remove dynamo disable
* add coauthor
Co-Authored-By: Dhruv Nair <dhruv.nair@gmail.com>
* update
* update
* update
* update mixin
* add some basic tests
* update
* update
* non_blocking
* improvements
* update
* norm.* -> norm
* apply suggestions from review
* add example
* update hook implementation to the latest changes from pyramid attention broadcast
* deinitialize should raise an error
* update doc page
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* update docs
* update
* refactor
* fix _always_upcast_modules for asym ae and vq_model
* fix lumina embedding forward to not depend on weight dtype
* refactor tests
* add simple lora inference tests
* _always_upcast_modules -> _precision_sensitive_module_patterns
* remove todo comments about review; revert changes to self.dtype in unets because .dtype on ModelMixin should be able to handle fp8 weight case
* check layer dtypes in lora test
* fix UNet1DModelTests::test_layerwise_upcasting_inference
* _precision_sensitive_module_patterns -> _skip_layerwise_casting_patterns based on feedback
* skip test in NCSNppModelTests
* skip tests for AutoencoderTinyTests
* skip tests for AutoencoderOobleckTests
* skip tests for UNet1DModelTests - unsupported pytorch operations
* layerwise_upcasting -> layerwise_casting
* skip tests for UNetRLModelTests; needs next pytorch release for currently unimplemented operation support
* add layerwise fp8 pipeline test
* use xfail
* Apply suggestions from code review
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* add assertion with fp32 comparison; add tolerance to fp8-fp32 vs fp32-fp32 comparison (required for a few models' test to pass)
* add note about memory consumption on tesla CI runner for failing test
---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* copy transformer
* copy vae
* copy pipeline
* make fix-copies
* refactor; make original code work with diffusers; test latents for comparison generated with this commit
* move rope into pipeline; remove flash attention; refactor
* begin conversion script
* make style
* refactor attention
* refactor
* refactor final layer
* their mlp -> our feedforward
* make style
* add docs
* refactor layer names
* refactor modulation
* cleanup
* refactor norms
* refactor activations
* refactor single blocks attention
* refactor attention processor
* make style
* cleanup a bit
* refactor double transformer block attention
* update mochi attn proc
* use diffusers attention implementation in all modules; checkpoint for all values matching original
* remove helper functions in vae
* refactor upsample
* refactor causal conv
* refactor resnet
* refactor
* refactor
* refactor
* grad checkpointing
* autoencoder test
* fix scaling factor
* refactor clip
* refactor llama text encoding
* add coauthor
Co-Authored-By: "Gregory D. Hunkins" <greg@ollano.com>
* refactor rope; diff: 0.14990234375; reason and fix: create rope grid on cpu and move to device
Note: The following line diverges from the original behaviour. We create the grid on the device, whereas
the original implementation creates it on CPU and then moves it to the device. This results in numerical
differences in layerwise debugging outputs, but visually the result is the same.
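A minimal sketch of the device-placement difference described in the note above; the grid size and frequency formula are illustrative assumptions, not the model's actual RoPE code:

```python
# Hedged illustration only: building a RoPE-style grid on CPU and moving it to
# the device vs. building it directly on the device. Depending on the op and
# hardware, the two can differ by a few ULPs, which shows up when comparing
# intermediate activations layer by layer but not in the final output.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
dim, length = 64, 32
inv_freq = 1.0 / (10000.0 ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))

# Original-style: create the coordinate grid on CPU, then move the result.
grid_cpu = torch.arange(length, dtype=torch.float32)
freqs_cpu_then_moved = torch.outer(grid_cpu, inv_freq).to(device)

# Diffusers-style (per the note): create the grid directly on the device.
grid_device = torch.arange(length, dtype=torch.float32, device=device)
freqs_on_device = torch.outer(grid_device, inv_freq.to(device))

print((freqs_cpu_then_moved - freqs_on_device).abs().max())
```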
* use diffusers timesteps embedding; diff: 0.10205078125
* rename
* convert
* update
* add tests for transformer
* add pipeline tests; text encoder 2 is not optional
* fix attention implementation for torch
* add example
* update docs
* update docs
* apply suggestions from review
* refactor vae
* update
* Apply suggestions from code review
Co-authored-by: hlky <hlky@hlky.ac>
* Update src/diffusers/pipelines/hunyuan_video/pipeline_hunyuan_video.py
Co-authored-by: hlky <hlky@hlky.ac>
* Update src/diffusers/pipelines/hunyuan_video/pipeline_hunyuan_video.py
Co-authored-by: hlky <hlky@hlky.ac>
* make fix-copies
* update
---------
Co-authored-by: "Gregory D. Hunkins" <greg@ollano.com>
Co-authored-by: hlky <hlky@hlky.ac>
* reduce block sizes for unet1d.
* reduce blocks for unet_2d.
* reduce block size for unet_motion
* increase channels.
* correctly increase channels.
* reduce number of layers in unet2dconditionmodel tests.
* reduce block sizes for unet2dconditionmodel tests
* reduce block sizes for unet3dconditionmodel.
* fix: test_feed_forward_chunking
* fix: test_forward_with_norm_groups
* skip spatiotemporal tests on MPS.
* reduce block size in AutoencoderKL.
* reduce block sizes for vqmodel.
* further reduce block size.
* make style.
* Empty-Commit
* reduce sizes for ConsistencyDecoderVAETests
* further reduction.
* further block reductions in AutoencoderKL and AsymmetricAutoencoderKL.
* massively reduce the block size in unet2dconditionmodel.
* reduce sizes for unet3d
* fix tests in unet3d.
* reduce blocks further in motion unet.
* fix: output shape
* add attention_head_dim to the test configuration.
* remove unexpected keyword arg
* up a bit.
* groups.
* up again
* fix