1
0
mirror of https://github.com/huggingface/diffusers.git synced 2026-01-27 17:22:53 +03:00
Commit Graph

49 Commits

Author SHA1 Message Date
Aryan
aace1f412b [core] Hunyuan Video (#10136)
* copy transformer

* copy vae

* copy pipeline

* make fix-copies

* refactor; make original code work with diffusers; test latents for comparison generated with this commit

* move rope into pipeline; remove flash attention; refactor

* begin conversion script

* make style

* refactor attention

* refactor

* refactor final layer

* their mlp -> our feedforward

* make style

* add docs

* refactor layer names

* refactor modulation

* cleanup

* refactor norms

* refactor activations

* refactor single blocks attention

* refactor attention processor

* make style

* cleanup a bit

* refactor double transformer block attention

* update mochi attn proc

* use diffusers attention implementation in all modules; checkpoint for all values matching original

* remove helper functions in vae

* refactor upsample

* refactor causal conv

* refactor resnet

* refactor

* refactor

* refactor

* grad checkpointing

* autoencoder test

* fix scaling factor

* refactor clip

* refactor llama text encoding

* add coauthor

Co-Authored-By: "Gregory D. Hunkins" <greg@ollano.com>

* refactor rope; diff: 0.14990234375; reason and fix: create rope grid on cpu and move to device

Note: The following line diverges from original behaviour. We create the grid on the device, whereas
original implementation creates it on CPU and then moves it to device. This results in numerical
differences in layerwise debugging outputs, but visually it is the same.

* use diffusers timesteps embedding; diff: 0.10205078125

* rename

* convert

* update

* add tests for transformer

* add pipeline tests; text encoder 2 is not optional

* fix attention implementation for torch

* add example

* update docs

* update docs

* apply suggestions from review

* refactor vae

* update

* Apply suggestions from code review

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/pipelines/hunyuan_video/pipeline_hunyuan_video.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/pipelines/hunyuan_video/pipeline_hunyuan_video.py

Co-authored-by: hlky <hlky@hlky.ac>

* make fix-copies

* update

---------

Co-authored-by: "Gregory D. Hunkins" <greg@ollano.com>
Co-authored-by: hlky <hlky@hlky.ac>
2024-12-16 13:56:18 +05:30
Junsong Chen
5a196e3d46 [Sana] Add Sana, including SanaPipeline, SanaPAGPipeline, LinearAttentionProcessor, Flow-based DPM-sovler and so on. (#9982)
* first add a script for DC-AE;

* DC-AE init

* replace triton with custom implementation

* 1. rename file and remove un-used codes;

* no longer rely on omegaconf and dataclass

* replace custom activation with diffuers activation

* remove dc_ae attention in attention_processor.py

* iinherit from ModelMixin

* inherit from ConfigMixin

* dc-ae reduce to one file

* update downsample and upsample

* clean code

* support DecoderOutput

* remove get_same_padding and val2tuple

* remove autocast and some assert

* update ResBlock

* remove contents within super().__init__

* Update src/diffusers/models/autoencoders/dc_ae.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* remove opsequential

* update other blocks to support the removal of build_norm

* remove build encoder/decoder project in/out

* remove inheritance of RMSNorm2d from LayerNorm

* remove reset_parameters for RMSNorm2d

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* remove device and dtype in RMSNorm2d __init__

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/dc_ae.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/dc_ae.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/dc_ae.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* remove op_list & build_block

* remove build_stage_main

* change file name to autoencoder_dc

* move LiteMLA to attention.py

* align with other vae decode output;

* add DC-AE into init files;

* update

* make quality && make style;

* quick push before dgx disappears again

* update

* make style

* update

* update

* fix

* refactor

* refactor

* refactor

* update

* possibly change to nn.Linear

* refactor

* make fix-copies

* replace vae with ae

* replace get_block_from_block_type to get_block

* replace downsample_block_type from Conv to conv for consistency

* add scaling factors

* incorporate changes for all checkpoints

* make style

* move mla to attention processor file; split qkv conv to linears

* refactor

* add tests

* from original file loader

* add docs

* add standard autoencoder methods

* combine attention processor

* fix tests

* update

* minor fix

* minor fix

* minor fix & in/out shortcut rename

* minor fix

* make style

* fix paper link

* update docs

* update single file loading

* make style

* remove single file loading support; todo for DN6

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* add abstract

* 1. add DCAE into diffusers;
2. make style and make quality;

* add DCAE_HF into diffusers;

* bug fixed;

* add SanaPipeline, SanaTransformer2D into diffusers;

* add sanaLinearAttnProcessor2_0;

* first update for SanaTransformer;

* first update for SanaPipeline;

* first success run SanaPipeline;

* model output finally match with original model with the same intput;

* code update;

* code update;

* add a flow dpm-solver scripts

* 🎉[important update]
1. Integrate flow-dpm-sovler into diffusers;
2. finally run successfully on both `FlowMatchEulerDiscreteScheduler` and `FlowDPMSolverMultistepScheduler`;

* 🎉🔧[important update & fix huge bugs!!]
1. add SanaPAGPipeline & several related Sana linear attention operators;
2. `SanaTransformer2DModel` not supports multi-resolution input;
2. fix the multi-scale HW bugs in SanaPipeline and SanaPAGPipeline;
3. fix the flow-dpm-solver set_timestep() init `model_output` and `lower_order_nums` bugs;

* remove prints;

* add convert sana official checkpoint to diffusers format Safetensor.

* Update src/diffusers/models/transformers/sana_transformer_2d.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/diffusers/models/transformers/sana_transformer_2d.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/diffusers/models/transformers/sana_transformer_2d.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/diffusers/pipelines/pag/pipeline_pag_sana.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/diffusers/models/transformers/sana_transformer_2d.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/diffusers/models/transformers/sana_transformer_2d.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/diffusers/pipelines/sana/pipeline_sana.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/diffusers/pipelines/sana/pipeline_sana.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update Sana for DC-AE's recent commit;

* make style && make quality

* Add StableDiffusion3PAGImg2Img Pipeline + Fix SD3 Unconditional PAG (#9932)

* fix progress bar updates in SD 1.5 PAG Img2Img pipeline

---------

Co-authored-by: Vinh H. Pham <phamvinh257@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* make the vae can be None in `__init__` of `SanaPipeline`

* Update src/diffusers/models/transformers/sana_transformer_2d.py

Co-authored-by: hlky <hlky@hlky.ac>

* change the ae related code due to the latest update of DCAE branch;

* change the ae related code due to the latest update of DCAE branch;

* 1. change code based on AutoencoderDC;
2. fix the bug of new GLUMBConv;
3. run success;

* update for solving conversation.

* 1. fix bugs and run convert script success;
2. Downloading ckpt from hub automatically;

* make style && make quality;

* 1. remove un-unsed parameters in init;
2. code update;

* remove test file

* refactor; add docs; add tests; update conversion script

* make style

* make fix-copies

* refactor

* udpate pipelines

* pag tests and refactor

* remove sana pag conversion script

* handle weight casting in conversion script

* update conversion script

* add a processor

* 1. add bf16 pth file path;
2. add complex human instruct in pipeline;

* fix fast \tests

* change gemma-2-2b-it ckpt to a non-gated repo;

* fix the pth path bug in conversion script;

* change grad ckpt to original; make style

* fix the complex_human_instruct bug and typo;

* remove dpmsolver flow scheduler

* apply review suggestions

* change the `FlowMatchEulerDiscreteScheduler` to default `DPMSolverMultistepScheduler` with flow matching scheduler.

* fix the tokenizer.padding_side='right' bug;

* update docs

* make fix-copies

* fix imports

* fix docs

* add integration test

* update docs

* update examples

* fix convert_model_output in schedulers

* fix failing tests

---------

Co-authored-by: Junyu Chen <chenjydl2003@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: chenjy2003 <70215701+chenjy2003@users.noreply.github.com>
Co-authored-by: Aryan <aryan@huggingface.co>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: hlky <hlky@hlky.ac>
2024-12-16 02:16:56 +05:30
Aryan
96c376a5ff [core] LTX Video (#10021)
* transformer

* make style & make fix-copies

* transformer

* add transformer tests

* 80% vae

* make style

* make fix-copies

* fix

* undo cogvideox changes

* update

* update

* match vae

* add docs

* t2v pipeline working; scheduler needs to be checked

* docs

* add pipeline test

* update

* update

* make fix-copies

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update

* copy t2v to i2v pipeline

* update

* apply review suggestions

* update

* make style

* remove framewise encoding/decoding

* pack/unpack latents

* image2video

* update

* make fix-copies

* update

* update

* rope scale fix

* debug layerwise code

* remove debug

* Apply suggestions from code review

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* propagate precision changes to i2v pipeline

* remove downcast

* address review comments

* fix comment

* address review comments

* [Single File] LTX support for loading original weights (#10135)

* from original file mixin for ltx

* undo config mapping fn changes

* update

* add single file to pipelines

* update docs

* Update src/diffusers/models/autoencoders/autoencoder_kl_ltx.py

* Update src/diffusers/models/autoencoders/autoencoder_kl_ltx.py

* rename classes based on ltx review

* point to original repository for inference

* make style

* resolve conflicts correctly

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-12-12 16:21:28 +05:30
hlky
914a585be8 Add ControlNetUnion (#10131)
* ControlNetUnion model
2024-12-11 07:07:50 -10:00
Dhruv Nair
ad40e26515 [Single File] Add single file support for AutoencoderDC (#10183)
* update

* update

* update
2024-12-11 16:57:36 +05:30
Junsong Chen
cd892041e2 [DC-AE] Add the official Deep Compression Autoencoder code(32x,64x,128x compression ratio); (#9708)
* first add a script for DC-AE;

* DC-AE init

* replace triton with custom implementation

* 1. rename file and remove un-used codes;

* no longer rely on omegaconf and dataclass

* replace custom activation with diffuers activation

* remove dc_ae attention in attention_processor.py

* iinherit from ModelMixin

* inherit from ConfigMixin

* dc-ae reduce to one file

* update downsample and upsample

* clean code

* support DecoderOutput

* remove get_same_padding and val2tuple

* remove autocast and some assert

* update ResBlock

* remove contents within super().__init__

* Update src/diffusers/models/autoencoders/dc_ae.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* remove opsequential

* update other blocks to support the removal of build_norm

* remove build encoder/decoder project in/out

* remove inheritance of RMSNorm2d from LayerNorm

* remove reset_parameters for RMSNorm2d

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* remove device and dtype in RMSNorm2d __init__

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/dc_ae.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/dc_ae.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/dc_ae.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* remove op_list & build_block

* remove build_stage_main

* change file name to autoencoder_dc

* move LiteMLA to attention.py

* align with other vae decode output;

* add DC-AE into init files;

* update

* make quality && make style;

* quick push before dgx disappears again

* update

* make style

* update

* update

* fix

* refactor

* refactor

* refactor

* update

* possibly change to nn.Linear

* refactor

* make fix-copies

* replace vae with ae

* replace get_block_from_block_type to get_block

* replace downsample_block_type from Conv to conv for consistency

* add scaling factors

* incorporate changes for all checkpoints

* make style

* move mla to attention processor file; split qkv conv to linears

* refactor

* add tests

* from original file loader

* add docs

* add standard autoencoder methods

* combine attention processor

* fix tests

* update

* minor fix

* minor fix

* minor fix & in/out shortcut rename

* minor fix

* make style

* fix paper link

* update docs

* update single file loading

* make style

* remove single file loading support; todo for DN6

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* add abstract

---------

Co-authored-by: Junyu Chen <chenjydl2003@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: chenjy2003 <70215701+chenjy2003@users.noreply.github.com>
Co-authored-by: Aryan <aryan@huggingface.co>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-12-07 01:01:51 +05:30
Sayak Paul
ded3db164b [Core] introduce controlnet module (#8768)
* move vae flax module.

* controlnet module.

* prepare for PR.

* revert a commit

* gracefully deprecate controlnet deps.

* fix

* fix doc path

* fix-copies

* fix path

* style

* style

* conflicts

* fix

* fix-copies

* sparsectrl.

* updates

* fix

* updates

* updates

* updates

* fix

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-11-06 22:08:55 -04:00
Aryan
3f329a426a [core] Mochi T2V (#9769)
* update

* udpate

* update transformer

* make style

* fix

* add conversion script

* update

* fix

* update

* fix

* update

* fixes

* make style

* update

* update

* update

* init

* update

* update

* add

* up

* up

* up

* update

* mochi transformer

* remove original implementation

* make style

* update inits

* update conversion script

* docs

* Update src/diffusers/pipelines/mochi/pipeline_mochi.py

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* Update src/diffusers/pipelines/mochi/pipeline_mochi.py

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* fix docs

* pipeline fixes

* make style

* invert sigmas in scheduler; fix pipeline

* fix pipeline num_frames

* flip proj and gate in swiglu

* make style

* fix

* make style

* fix tests

* latent mean and std fix

* update

* cherry-pick 1069d210e1

* remove additional sigma already handled by flow match scheduler

* fix

* remove hardcoded value

* replace conv1x1 with linear

* Update src/diffusers/pipelines/mochi/pipeline_mochi.py

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* framewise decoding and conv_cache

* make style

* Apply suggestions from code review

* mochi vae encoder changes

* rebase correctly

* Update scripts/convert_mochi_to_diffusers.py

* fix tests

* fixes

* make style

* update

* make style

* update

* add framewise and tiled encoding

* make style

* make original vae implementation behaviour the default; note: framewise encoding does not work

* remove framewise encoding implementation due to presence of attn layers

* fight test 1

* fight test 2

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: yiyixuxu <yixu310@gmail.com>
2024-11-05 20:33:41 +05:30
Aryan
0d1d267b12 [core] Allegro T2V (#9736)
* update

* refactor transformer part 1

* refactor part 2

* refactor part 3

* make style

* refactor part 4; modeling tests

* make style

* refactor part 5

* refactor part 6

* gradient checkpointing

* pipeline tests (broken atm)

* update

* add coauthor

Co-Authored-By: Huan Yang <hyang@fastmail.com>

* refactor part 7

* add docs

* make style

* add coauthor

Co-Authored-By: YiYi Xu <yixu310@gmail.com>

* make fix-copies

* undo unrelated change

* revert changes to embeddings, normalization, transformer

* refactor part 8

* make style

* refactor part 9

* make style

* fix

* apply suggestions from review

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update example

* remove attention mask for self-attention

* update

* copied from

* update

* update

---------

Co-authored-by: Huan Yang <hyang@fastmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-10-29 13:14:36 +05:30
Yuxuan.Zhang
8d81564b27 CogView3Plus DiT (#9570)
* merge 9588

* max_shard_size="5GB" for colab running

* conversion script updates; modeling test; refactor transformer

* make fix-copies

* Update convert_cogview3_to_diffusers.py

* initial pipeline draft

* make style

* fight bugs 🐛🪳

* add example

* add tests; refactor

* make style

* make fix-copies

* add co-author

YiYi Xu <yixu310@gmail.com>

* remove files

* add docs

* add co-author

Co-Authored-By: YiYi Xu <yixu310@gmail.com>

* fight docs

* address reviews

* make style

* make model work

* remove qkv fusion

* remove qkv fusion tets

* address review comments

* fix make fix-copies error

* remove None and TODO

* for FP16(draft)

* make style

* remove dynamic cfg

* remove pooled_projection_dim as a parameter

* fix tests

---------

Co-authored-by: Aryan <aryan@huggingface.co>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-10-14 19:30:36 +05:30
suzukimain
b52119ae92 [docs] Replace runwayml/stable-diffusion-v1-5 with Lykon/dreamshaper-8 (#9428)
* [docs] Replace runwayml/stable-diffusion-v1-5 with Lykon/dreamshaper-8

Updated documentation as runwayml/stable-diffusion-v1-5 has been removed from Huggingface.

* Update docs/source/en/using-diffusers/inpaint.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Replace with stable-diffusion-v1-5/stable-diffusion-v1-5

* Update inpaint.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-09-16 10:18:45 -07:00
王奇勋
c1e6a32ae4 [Flux] Support Union ControlNet (#9175)
* refactor
---------

Co-authored-by: haofanwang <haofanwang.ai@gmail.com>
2024-08-25 00:24:21 -10:00
zR
2dad462d9b Add CogVideoX text-to-video generation model (#9082)
* add CogVideoX

---------

Co-authored-by: Aryan <aryan@huggingface.co>
Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
Co-authored-by: yiyixuxu <yixu310@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-08-06 21:23:57 -10:00
Sayak Paul
5934873b8f [Docs] add stable cascade unet doc. (#9066)
* add stable cascade unet doc.

* fix path
2024-08-05 21:28:48 +05:30
Sayak Paul
27637a5402 Flux pipeline (#9043)
add flux!

Signed-off-by: Adrien <adrien@huggingface.co>
Co-authored-by: Adrien <adrien.69740@gmail.com>
Co-authored-by: Anatoly Belikov <abelikov@singularitynet.io>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: yiyixuxu <yixu310@gmail.com>
2024-08-01 11:30:52 -10:00
Yoach Lacombe
69e72b1dd1 Stable Audio integration (#8716)
* WIP modeling code and pipeline

* add custom attention processor + custom activation + add to init

* correct ProjectionModel forward

* add stable audio to __initèè

* add autoencoder and update pipeline and modeling code

* add half Rope

* add partial rotary v2

* add temporary modfis to scheduler

* add EDM DPM Solver

* remove TODOs

* clean GLU

* remove att.group_norm to attn processor

* revert back src/diffusers/schedulers/scheduling_dpmsolver_multistep.py

* refactor GLU -> SwiGLU

* remove redundant args

* add channel multiples in autoencoder docstrings

* changes in docsrtings and copyright headers

* clean pipeline

* further cleaning

* remove peft and lora and fromoriginalmodel

* Delete src/diffusers/pipelines/stable_audio/diffusers.code-workspace

* make style

* dummy models

* fix copied from

* add fast oobleck tests

* add brownian tree

* oobleck autoencoder slow tests

* remove TODO

* fast stable audio pipeline tests

* add slow tests

* make style

* add first version of docs

* wrap is_torchsde_available to the scheduler

* fix slow test

* test with input waveform

* add input waveform

* remove some todos

* create stableaudio gaussian projection + make style

* add pipeline to toctree

* fix copied from

* make quality

* refactor timestep_features->time_proj

* refactor joint_attention_kwargs->cross_attention_kwargs

* remove forward_chunk

* move StableAudioDitModel to transformers folder

* correct convert + remove partial rotary embed

* apply suggestions from yiyixuxu -> removing attn.kv_heads

* remove temb

* remove cross_attention_kwargs

* further removal of cross_attention_kwargs

* remove text encoder autocast to fp16

* continue removing autocast

* make style

* refactor how text and audio are embedded

* add paper

* update example code

* make style

* unify projection model forward + fix device placement

* make style

* remove fuse qkv

* apply suggestions from review

* Update src/diffusers/pipelines/stable_audio/pipeline_stable_audio.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* make style

* smaller models in fast tests

* pass sequential offloading fast tests

* add docs for vae and autoencoder

* make style and update example

* remove useless import

* add cosine scheduler

* dummy classes

* cosine scheduler docs

* better description of scheduler

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-07-30 15:29:06 +05:30
Aryan
5c53ca5ed8 [core] AnimateDiff SparseCtrl (#8897)
* initial sparse control model draft

* remove unnecessary implementation

* copy animatediff pipeline

* remove deprecated callbacks

* update

* update pipeline implementation progress

* make style

* make fix-copies

* update progress

* add partially working pipeline

* remove debug prints

* add model docs

* dummy objects

* improve motion lora conversion script

* fix bugs

* update docstrings

* remove unnecessary model params; docs

* address review comment

* add copied from to zero_module

* copy animatediff test

* add fast tests

* update docs

* update

* update pipeline docs

* fix expected slice values

* fix license

* remove get_down_block usage

* remove temporal_double_self_attention from get_down_block

* update

* update docs with org and documentation images

* make from_unet work in sparsecontrolnetmodel

* add latest freeinit test from #8969

* make fix-copies

* LoraLoaderMixin -> StableDiffsuionLoraLoaderMixin
2024-07-26 17:46:05 +05:30
Sayak Paul
973a62d408 [Docs] add AuraFlow docs (#8851)
* add pipeline documentation.

* add api spec for pipeline

* model documentation

* model spec
2024-07-12 09:52:18 +02:00
Xin Ma
b8cf84a3f9 Latte: Latent Diffusion Transformer for Video Generation (#8404)
* add Latte to diffusers

* remove print

* remove print

* remove print

* remove unuse codes

* remove layer_norm_latte and add a flag

* remove layer_norm_latte and add a flag

* update latte_pipeline

* update latte_pipeline

* remove unuse squeeze

* add norm_hidden_states.ndim == 2: # for Latte

* fixed test latte pipeline bugs

* fixed test latte pipeline bugs

* delete sh

* add doc for latte

* add licensing

* Move Transformer3DModelOutput to modeling_outputs

* give a default value to sample_size

* remove the einops dependency

* change norm2 for latte

* modify pipeline of latte

* update test for Latte

* modify some codes for latte

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* video_length -> num_frames; update prepare_latents copied from

* make fix-copies

* make style

* typo: videe -> video

* update

* modify for Latte pipeline

* modify latte pipeline

* modify latte pipeline

* modify latte pipeline

* modify latte pipeline

* modify for Latte pipeline

* Delete .vscode directory

* make style

* make fix-copies

* add latte transformer 3d to docs _toctree.yml

* update example

* reduce frames for test

* fixed bug of _text_preprocessing

* set num frame to 1 for testing

* remove unuse print

* add text = self._clean_caption(text) again

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
Co-authored-by: Aryan <aryan@huggingface.co>
2024-07-11 15:06:22 +05:30
PommesPeter
98388670d2 [Alpha-VLLM Team] Add Lumina-T2X to diffusers (#8652)
---------

Co-authored-by: zhuole1025 <zhuole1025@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-07-07 17:12:09 -10:00
Dhruv Nair
0368483b61 Remove legacy single file model loading mixins (#8754)
update
2024-07-01 07:20:19 -10:00
Sayak Paul
10b4e354b6 [Chore] remove deprecation from transformer2d regarding the output class. (#8698)
* remove deprecation from transformer2d regarding the output class.

* up

* deprecate more
2024-06-26 07:35:36 -10:00
XCL
fa2abfdb03 [Tencent Hunyuan Team] Add Hunyuan-DiT ControlNet Inference (#8694)
* add controlnet support

---------

Co-authored-by: xingchaoliu <xingchaoliu@tencent.com>
Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-06-26 00:43:03 -10:00
Tolga Cangöz
468ae09ed8 Errata - Trim trailing white space in the whole repo (#8575)
* Trim all the trailing white space in the whole repo

* Remove unnecessary empty places

* make style && make quality

* Trim trailing white space

* trim

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-24 18:39:15 +05:30
王奇勋
e5564d45bf Support SD3 ControlNet and Multi-ControlNet. (#8566)
* sd3 controlnet



---------

Co-authored-by: haofanwang <haofanwang.ai@gmail.com>
2024-06-18 14:59:22 -10:00
Dhruv Nair
04717fd861 Add Stable Diffusion 3 (#8483)
* up

* add sd3

* update

* update

* add tests

* fix copies

* fix docs

* update

* add dreambooth lora

* add LoRA

* update

* update

* update

* update

* import fix

* update

* Update src/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* import fix 2

* update

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* update

* update

* update

* fix ckpt id

* fix more ids

* update

* missing doc

* Update src/diffusers/schedulers/scheduling_flow_match_euler_discrete.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/schedulers/scheduling_flow_match_euler_discrete.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_3.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_3.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* update'

* fix

* update

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

* note on gated access.

* requirements

* licensing

---------

Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-06-12 20:44:00 +01:00
Sayak Paul
3ff39e8e86 [HunyuanDiT] minor docs changes in hunyuandit (#8395)
minor docs changes in hunyuandit
2024-06-04 12:18:53 +04:00
Marçal Comajoan Cara
dc89434bdc Update transformer2d.md title (#8375)
* Update transformer2d.md title

For the other classes (e.g., UNet2DModel) the title of the documentation coincides with the name of the class, but that was not the case for Transformer2DModel.

* Update model docs titles for consistency with class names
2024-06-03 17:01:21 -07:00
XCL
174cf868ea Tencent Hunyuan Team - Updated Doc for HunyuanDiT (#8383)
* add hunyuandit doc

* update hunyuandit doc

* update hunyuandit 2d model

* update toctree.yml for hunyuandit
2024-06-03 14:02:46 +04:00
Sayak Paul
983dec3bf7 [Core] Introduce class variants for Transformer2DModel (#7647)
* init for patches

* finish patched model.

* continuous transformer

* vectorized transformer2d.

* style.

* inits.

* fix-copies.

* introduce DiTTransformer2DModel.

* fixes

* use REMAPPING as suggested by @DN6

* better logging.

* add pixart transformer model.

* inits.

* caption_channels.

* attention masking.

* fix use_additional_conditions.

* remove print.

* debug

* flatten

* fix: assertion for sigma

* handle remapping for modeling_utils

* add tests for dit transformer2d

* quality

* placeholder for pixart tests

* pixart tests

* add _no_split_modules

* add docs.

* check

* check

* check

* check

* fix tests

* fix tests

* move Transformer output to modeling_output

* move errors better and bring back use_additional_conditions attribute.

* add unnecessary things from DiT.

* clean up pixart

* fix remapping

* fix device_map things in pixart2d.

* replace Transformer2DModel with appropriate classes in dit, pixart tests

* empty

* legacy mixin classes./

* use a remapping dict for fetching class names.

* change to specifc model types in the pipeline implementations.

* move _fetch_remapped_cls_from_config to modeling_loading_utils.py

* fix dependency problems.

* add deprecation note.
2024-05-31 13:40:27 +05:30
Sayak Paul
5edd0b34fa move vqmodel to models.autoencoders. (#8292)
move vqmodel to models.autoencoders.
2024-05-29 06:30:35 +05:30
M. Tolga Cangöz
f4fc75035f [Docs] Fix typos (#7131)
* Add copyright notice to relevant files and fix typos

* Set `timestep_spacing` parameter of `StableDiffusionXLPipeline`'s scheduler to `'trailing'`.

* Update `StableDiffusionXLPipeline.from_single_file` by including EulerAncestralDiscreteScheduler with `timestep_spacing="trailing"` param.

* Update model loading method in SDXL Turbo documentation
2024-02-29 13:03:01 -08:00
Sayak Paul
30e5e81d58 change to 2024 in the license (#6902)
change to 2024
2024-02-08 08:19:31 -10:00
Sayak Paul
09b7bfce91 [Core] move transformer scripts to transformers modules (#6747)
* move transformer scripts to transformers modules

* move transformer model test

* move prior transformer test to  directory

* fix doc path

* correct doc path

* add: __init__.py
2024-01-29 22:28:28 +05:30
Steven Liu
87bfbc320d [docs] UViT2D (#6643)
* uvit2d

* fix

* fix?

* add correct paper

* fix paths

* update abstract
2024-01-25 09:37:28 -08:00
Sayak Paul
1f0705adcf [Big refactor] move unets to unets module 🦋 (#6630)
* move unets to  module 🦋

* parameterize unet-level import.

* fix flax unet2dcondition model import

* models __init__

* mildly depcrecating models.unet_2d_blocks in favor of models.unets.unet_2d_blocks.

* noqa

* correct depcrecation behaviour

* inherit from the actual classes.

* Empty-Commit

* backwards compatibility for unet_2d.py

* backward compatibility for unet_2d_condition

* bc for unet_1d

* bc for unet_1d_blocks
2024-01-23 08:57:58 +05:30
Steven Liu
5ca062e011 [docs] Fix missing API function (#6604)
fix?
2024-01-17 13:59:09 -08:00
Sayak Paul
56b3b21693 [Refactor autoencoders] feat: introduce autoencoders module (#6129)
* feat: introduce autoencoders module

* more changes for styling and copy fixing

* path changes in the docs.

* fix: import structure in init.

* fix controlnetxs import
2023-12-18 12:42:15 +05:30
M. Tolga Cangöz
a359ff7644 [Docs] Fix typos and update files at API's Main Classes, Models, and Schedulers pages (#5720)
* Fix typos, update, add Copyright info, and trim trailing whitespaces

* Update docs/source/en/api/loaders.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/api/models/autoencoder_tiny.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/api/models/autoencoder_tiny.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2023-11-13 14:32:59 -08:00
Will Berman
2fd46405cd consistency decoder (#5694)
* consistency decoder

* rename

* Apply suggestions from code review

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update src/diffusers/pipelines/consistency_models/pipeline_consistency_models.py

* uP

* Apply suggestions from code review

* uP

* uP

* uP

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-11-09 12:21:41 +01:00
Dhruv Nair
2a8cf8e39f Animatediff Proposal (#5413)
* draft design

* clean up

* clean up

* clean up

* clean up

* clean up

* clean  up

* clean up

* clean up

* clean up

* update pipeline

* clean up

* clean up

* clean up

* add tests

* change motion block

* clean up

* clean up

* clean up

* update

* update

* update

* update

* update

* update

* update

* update

* clean up

* update

* update

* update model test

* update

* update

* update

* update

* make style

* update

* fix embeddings

* update

* merge upstream

* max fix copies

* fix bug

* fix mistake

* add docs

* update

* clean up

* update

* clean up

* clean up

* fix docstrings

* fix docstrings

* update

* update

* clean  up

* update
2023-11-02 15:04:03 +01:00
Chengxi Guo
dcbfe662ef fix typo (#5505)
Signed-off-by: mymusise <mymusise1@gmail.com>
2023-10-24 17:14:05 -07:00
Steven Liu
4ff7264d9b [docs] PushToHubMixin (#4622)
* push to hub docs

* fix typo

* feedback

* make style
2023-08-16 13:20:59 -06:00
Sayak Paul
15782fd506 [Pipeline utils] feat: implement push_to_hub for standalone models, schedulers as well as pipelines (#4128)
* feat: implement push_to_hub for standalone models.

* address PR feedback.

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* remove max_shard_size.

* add: support for scheduler push_to_hub

* enable push_to_hub support for flax schedulers.

* enable push_to_hub for pipelines.

* Apply suggestions from code review

Co-authored-by: Lucain <lucainp@gmail.com>

* reflect pr feedback.

* address another round of deedback.

* better handling of kwargs.

* add: tests

* Apply suggestions from code review

Co-authored-by: Lucain <lucainp@gmail.com>

* setting hub staging to False for now.

* incorporate staging test as a separate job.

Co-authored-by: ydshieh <2521628+ydshieh@users.noreply.github.com>

* fix: tokenizer loading.

* fix: json dumping.

* move is_staging_test to a better location.

* better treatment to tokens.

* define repo_id to better handle concurrency

* style

* explicitly set token

* Empty-Commit

* move SUER, TOKEN to test

* collate org_repo_id

* delete repo

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Lucain <lucainp@gmail.com>
Co-authored-by: ydshieh <2521628+ydshieh@users.noreply.github.com>
2023-08-15 07:39:22 +05:30
Sayak Paul
18fc40c169 [Feat] add tiny Autoencoder for (almost) instant decoding (#4384)
* add: model implementation of tiny autoencoder.

* add: inits.

* push the latest devs.

* add: conversion script and finish.

* add: scaling factor args.

* debugging

* fix denormalization.

* fix: positional argument.

* handle use_torch_2_0_or_xformers.

* handle post_quant_conv

* handle dtype

* fix: sdxl image processor for tiny ae.

* fix: sdxl image processor for tiny ae.

* unify upcasting logic.

* copied from madness.

* remove trailing whitespace.

* set is_tiny_vae = False

* address PR comments.

* change to AutoencoderTiny

* make act_fn an str throughout

* fix: apply_forward_hook decorator call

* get rid of the special is_tiny_vae flag.

* directly scale the output.

* fix dummies?

* fix: act_fn.

* get rid of the Clamp() layer.

* bring back copied from.

* movement of the blocks to appropriate modules.

* add: docstrings to AutoencoderTiny

* add: documentation.

* changes to the conversion script.

* add doc entry.

* settle tests.

* style

* add one slow test.

* fix

* fix 2

* fix 2

* fix: 4

* fix: 5

* finish integration tests

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* style

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2023-08-02 23:58:05 +05:30
camenduru
c6ae9b7df6 Where did this 'x' come from, Elon? (#4277)
* why mdx?

* why mdx?

* why mdx?

* no x for kandinksy either

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-07-26 18:18:14 +02:00
Ruslan Vorovchenko
07f1fbb18e Asymmetric vqgan (#3956)
* added AsymmetricAutoencoderKL

* fixed copies+dummy

* added script to convert original asymmetric vqgan

* added docs

* updated docs

* fixed style

* fixes, added tests

* update doc

* fixed doc

* fixed tests

* naming

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* naming

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* udpated code example

* updated doc

* comments fixes

* added docstring

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* comments fixes

* added inpaint pipeline tests

* comment suggestion: delete method

* yet another fixes

---------

Co-authored-by: Ruslan Vorovchenko <r.vorovchenko@prequelapp.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-07-20 17:51:06 +02:00
Patrick von Platen
6b1abba18d Add controlnet and vae from single file (#4084)
* Add controlnet from single file

* Updates

* make style

* finish

* Apply suggestions from code review

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-07-19 14:50:27 +02:00
Steven Liu
174dcd697f [docs] Model API (#3562)
* add modelmixin and unets

* remove old model page

* minor fixes

* fix unet2dcondition

* add vqmodel and autoencoderkl

* add rest of models

* fix autoencoderkl path

* fix toctree

* fix toctree again

* apply feedback

* apply feedback

* fix copies

* fix controlnet copy

* fix copies
2023-06-29 17:24:39 -07:00