diffusers

mirror of https://github.com/huggingface/diffusers.git synced 2026-01-27 17:22:53 +03:00

Author	SHA1	Message	Date
Andreas Jörg	8ead643bb7	[examples/controlnet/train_controlnet_sd3.py] Fixes #11050 - Cast prompt_embeds and pooled_prompt_embeds to weight_dtype to prevent dtype mismatch (#11051 ) Fix: dtype mismatch of prompt embeddings in sd3 controlnet training Co-authored-by: Andreas Jörg <andreasjoerg@MacBook-Pro-von-Andreas-2.fritz.box> Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>	2025-03-14 17:33:15 +05:30
Yaniv Galron	5e48cd27d4	making ```formatted_images``` initialization compact (#10801 ) compact writing Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> Co-authored-by: YiYi Xu <yixu310@gmail.com>	2025-03-13 09:27:14 -10:00
wonderfan	36d0553af2	chore: fix help messages in advanced diffusion examples (#10923 )	2025-03-11 07:33:55 -10:00
Eliseu Silva	4e3ddd5afa	fix: mixture tiling sdxl pipeline - adjust gerating time_ids & embeddings (#11012 ) small fix on generating time_ids & embeddings	2025-03-11 04:20:18 -03:00
Tolga Cangöz	b88fef4785	[`Research Project`] Add AnyText: Multilingual Visual Text Generation And Editing (#8998 ) * Add initial template * Second template * feat: Add TextEmbeddingModule to AnyTextPipeline * feat: Add AuxiliaryLatentModule template to AnyTextPipeline * Add bert tokenizer from the anytext repo for now * feat: Update AnyTextPipeline's modify_prompt method This commit adds improvements to the modify_prompt method in the AnyTextPipeline class. The method now handles special characters and replaces selected string prompts with a placeholder. Additionally, it includes a check for Chinese text and translation using the trans_pipe. * Fill in the `forward` pass of `AuxiliaryLatentModule` * `make style && make quality` * `chore: Update bert_tokenizer.py with a TODO comment suggesting the use of the transformers library` * Update error handling to raise and logging * Add `create_glyph_lines` function into `TextEmbeddingModule` * make style * Up * Up * Up * Up * Remove several comments * refactor: Remove ControlNetConditioningEmbedding and update code accordingly * Up * Up * up * refactor: Update AnyTextPipeline to include new optional parameters * up * feat: Add OCR model and its components * chore: Update `TextEmbeddingModule` to include OCR model components and dependencies * chore: Update `AuxiliaryLatentModule` to include VAE model and its dependencies for masked image in the editing task * `make style` * refactor: Update `AnyTextPipeline`'s docstring * Update `AuxiliaryLatentModule` to include info dictionary so that text processing is done once * simplify * `make style` * Converting `TextEmbeddingModule` to ordinary `encode_prompt()` function * Simplify for now * `make style` * Up * feat: Add scripts to convert AnyText controlnet to diffusers * `make style` * Fix: Move glyph rendering to `TextEmbeddingModule` from `AuxiliaryLatentModule` * make style * Up * Simplify * Up * feat: Add safetensors module for loading model file * Fix device issues * Up * Up * refactor: Simplify * refactor: Simplify code for loading models and handling data types * `make style` * refactor: Update to() method in FrozenCLIPEmbedderT3 and TextEmbeddingModule * refactor: Update dtype in embedding_manager.py to match proj.weight * Up * Add attribution and adaptation information to pipeline_anytext.py * Update usage example * Will refactor `controlnet_cond_embedding` initialization * Add `AnyTextControlNetConditioningEmbedding` template * Refactor organization * style * style * Move custom blocks from `AuxiliaryLatentModule` to `AnyTextControlNetConditioningEmbedding` * Follow one-file policy * style * [Docs] Update README and pipeline_anytext.py to use AnyTextControlNetModel * [Docs] Update import statement for AnyTextControlNetModel in pipeline_anytext.py * [Fix] Update import path for ControlNetModel, ControlNetOutput in anytext_controlnet.py * Refactor AnyTextControlNet to use configurable conditioning embedding channels * Complete control net conditioning embedding in AnyTextControlNetModel * up * [FIX] Ensure embeddings use correct device in AnyTextControlNetModel * up * up * style * [UPDATE] Revise README and example code for AnyTextPipeline integration with DiffusionPipeline * [UPDATE] Update example code in anytext.py to use correct font file and improve clarity * down * [UPDATE] Refactor BasicTokenizer usage to a new Checker class for text processing * update pillow * [UPDATE] Remove commented-out code and unnecessary docstring in anytext.py and anytext_controlnet.py for improved clarity * [REMOVE] Delete frozen_clip_embedder_t3.py as it is in the anytext.py file * [UPDATE] Replace edict with dict for configuration in anytext.py and RecModel.py for consistency * 🆙 * style * [UPDATE] Revise README.md for clarity, remove unused imports in anytext.py, and add author credits in anytext_controlnet.py * style * Update examples/research_projects/anytext/README.md Co-authored-by: Aryan <contact.aryanvs@gmail.com> * Remove commented-out image preparation code in AnyTextPipeline * Remove unnecessary blank line in README.md	2025-03-11 01:49:37 +05:30
Kinam Kim	b38450d5d2	Add STG to community pipelines (#10960 ) * Support STG for video pipelines * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update pipeline_stg_cogvideox.py * Update pipeline_stg_hunyuan_video.py * Update pipeline_stg_ltx.py * Update pipeline_stg_ltx_image2video.py * Update pipeline_stg_mochi.py * Update pipeline_stg_hunyuan_video.py * Update pipeline_stg_ltx.py * Update pipeline_stg_ltx_image2video.py * Update pipeline_stg_mochi.py * update * remove rescaling * Apply style fixes --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2025-03-08 00:28:24 +05:30
LittleNyima	748cb0fab6	Add CogVideoX DDIM Inversion to Community Pipelines (#10956 ) * add cogvideox ddim inversion script * implement as a pipeline, and add documentation --------- Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>	2025-03-06 10:46:38 -10:00
dependabot[bot]	f103993094	Bump jinja2 from 3.1.5 to 3.1.6 in /examples/research_projects/realfill (#10984 ) Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.5 to 3.1.6. - [Release notes](https://github.com/pallets/jinja/releases) - [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst) - [Commits](https://github.com/pallets/jinja/compare/3.1.5...3.1.6) --- updated-dependencies: - dependency-name: jinja2 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-03-06 11:59:51 +00:00
Jun Yeop Na	37b8edfb86	[train_dreambooth_lora.py] Fix the LR Schedulers when `num_train_epochs` is passed in a distributed training env (#10973 ) * updated train_dreambooth_lora to fix the LR schedulers for `num_train_epochs` in distributed training env * fixed formatting * remove trailing newlines * fixed style error	2025-03-06 10:06:24 +05:30
Linoy Tsaban	e031caf4ea	[flux lora training] fix t5 training bug (#10845 ) * fix t5 training bug * Apply style fixes --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2025-03-05 13:47:01 +02:00
Eliseu Silva	66bf7ea5be	feat: add Mixture-of-Diffusers ControlNet Tile upscaler Pipeline for SDXL (#10951 ) * feat: add Mixture-of-Diffusers ControlNet Tile upscaler Pipeline for SDXL * make style make quality	2025-03-04 17:17:36 -03:00
Alexey Zolotenkov	b8215b1c06	Fix incorrect seed initialization when args.seed is 0 (#10964 ) * Fix seed initialization to handle args.seed = 0 correctly * Apply style fixes --------- Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2025-03-04 10:09:52 -10:00
SahilCarterr	170833c22a	[Fix] fp16 unscaling in train_dreambooth_lora_sdxl (#10889 ) Fix fp16 bug Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>	2025-02-24 06:49:23 -10:00
hlky	6f74ef550d	Fix `torch_dtype` in Kolors text encoder with `transformers` v4.49 (#10816 ) * Fix `torch_dtype` in Kolors text encoder with `transformers` v4.49 * Default torch_dtype and warning	2025-02-24 13:37:54 +05:30
Parag Ekbote	51941387dc	Notebooks for Community Scripts-7 (#10846 ) Add 5 Notebooks, improve their example scripts and update the missing links for the example README.	2025-02-20 09:02:09 -08:00
Sayak Paul	f10d3c6d04	[LoRA] add LoRA support to Lumina2 and fine-tuning script (#10818 ) * feat: lora support for Lumina2. * fix-copies. * updates * updates * docs. * fix * add: training script. * tests * updates * updates * major updates. * updates * fixes * docs. * updates * updates	2025-02-20 09:41:51 +05:30
puhuk	b75b204a58	Fix max_shift value in flux and related functions to 1.15 (issue #10675 ) (#10807 ) This PR updates the max_shift value in flux to 1.15 for consistency across the codebase. In addition to modifying max_shift in flux, all related functions that copy and use this logic, such as calculate_shift in `src/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3_img2img.py`, have also been updated to ensure uniform behavior.	2025-02-18 06:54:56 +00:00
Yaniv Galron	952b9131a2	typo fix (#10802 )	2025-02-16 20:56:54 +05:30
Eliseu Silva	051ebc3c8d	fix: [Community pipeline] Fix flattened elements on image (#10774 ) * feat: new community mixture_tiling_sdxl pipeline for SDXL mixture-of-diffusers support * fix use of variable latents to tile_latents * removed references to modules that are not being used in this pipeline * make style, make quality * fixfeat: added _get_crops_coords_list function to pipeline to automatically define ctop,cleft coord to focus on image generation, helps to better harmonize the image and corrects the problem of flattened elements.	2025-02-12 19:50:41 -03:00
Eliseu Silva	c470274865	feat: new community mixture_tiling_sdxl pipeline for SDXL (#10759 ) * feat: new community mixture_tiling_sdxl pipeline for SDXL mixture-of-diffusers support * fix use of variable latents to tile_latents * removed references to modules that are not being used in this pipeline * make style, make quality	2025-02-11 18:01:42 -03:00
Leo Jiang	cd0a4a82cf	[bugfix] NPU Adaption for Sana (#10724 ) * NPU Adaption for Sanna * NPU Adaption for Sanna * NPU Adaption for Sanna * NPU Adaption for Sanna * NPU Adaption for Sanna * NPU Adaption for Sanna * NPU Adaption for Sanna * NPU Adaption for Sanna * NPU Adaption for Sanna * NPU Adaption for Sanna * NPU Adaption for Sanna * NPU Adaption for Sanna * NPU Adaption for Sanna * NPU Adaption for Sanna * NPU Adaption for Sanna * NPU Adaption for Sanna * NPU Adaption for Sanna * NPU Adaption for Sanna * [bugfix]NPU Adaption for Sanna --------- Co-authored-by: J石页 <jiangshuo9@h-partners.com> Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>	2025-02-06 19:29:58 +05:30
suzukimain	145522cbb7	[Community] Enhanced `Model Search` (#10417 ) * Added `auto_load_textual_inversion` and `auto_load_lora_weights` * update README.md * fix * make quality * Fix and `make style`	2025-02-05 14:43:53 -10:00
Parag Ekbote	dbe0094e86	Notebooks for Community Scripts-6 (#10713 ) * Fix Doc Tutorial. * Add 4 Notebooks and improve their example scripts.	2025-02-04 10:12:17 -08:00
Nicolas	f63d32233f	Fix train_text_to_image.py --help (#10711 )	2025-02-04 11:26:23 +05:30
Thanh Le	5d2d23986e	Fix inconsistent random transform in instruct pix2pix (#10698 ) * Update train_instruct_pix2pix.py Fix inconsistent random transform in instruct_pix2pix * Update train_instruct_pix2pix_sdxl.py	2025-01-31 08:29:29 -10:00
Dimitri Barbot	196aef5a6f	Fix pipeline dtype unexpected change when using SDXL reference community pipelines in float16 mode (#10670 ) Fix pipeline dtype unexpected change when using SDXL reference community pipelines	2025-01-28 10:46:41 -03:00
Aryan	c4d4ac21e7	Refactor gradient checkpointing (#10611 ) * update * remove unused fn * apply suggestions based on review * update + cleanup 🧹 * more cleanup 🧹 * make fix-copies * update test	2025-01-28 06:51:46 +05:30
hlky	41571773d9	[training] Convert to ImageFolder script (#10664 ) * [training] Convert to ImageFolder script * make	2025-01-27 09:43:51 -10:00
Marlon May	f7f36c7d3d	Add community pipeline for semantic guidance for FLUX (#10610 ) * add community pipeline for semantic guidance for flux * fix imports in community pipeline for semantic guidance for flux * Update examples/community/pipeline_flux_semantic_guidance.py Co-authored-by: hlky <hlky@hlky.ac> * fix community pipeline for semantic guidance for flux --------- Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com> Co-authored-by: hlky <hlky@hlky.ac>	2025-01-27 16:19:46 +02:00
Yuqian Hong	4fa24591a3	create a script to train autoencoderkl (#10605 ) * create a script to train vae * update main.py * update train_autoencoderkl.py * update train_autoencoderkl.py * add a check of --pretrained_model_name_or_path and --model_config_name_or_path * remove the comment, remove diffusers in requiremnets.txt, add validation_image ote * update autoencoderkl.py * quality --------- Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>	2025-01-27 16:41:34 +05:30
Leo Jiang	07860f9916	NPU Adaption for Sanna (#10409 ) * NPU Adaption for Sanna --------- Co-authored-by: J石页 <jiangshuo9@h-partners.com> Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>	2025-01-24 09:08:52 -10:00
Wenhao Sun	87252d80c3	Add pipeline_stable_diffusion_xl_attentive_eraser (#10579 ) * add pipeline_stable_diffusion_xl_attentive_eraser * add pipeline_stable_diffusion_xl_attentive_eraser_make_style * make style and add example output * update Docs Co-authored-by: Other Contributor <a457435687@126.com> * add Oral Co-authored-by: Other Contributor <a457435687@126.com> * update_review Co-authored-by: Other Contributor <a457435687@126.com> * update_review_ms Co-authored-by: Other Contributor <a457435687@126.com> --------- Co-authored-by: Other Contributor <a457435687@126.com>	2025-01-24 13:52:45 +00:00
Yaniv Galron	a451c0ed14	removing redundant requires_grad = False (#10628 ) We already set the unet to requires grad false at line 506 Co-authored-by: Aryan <aryan@huggingface.co>	2025-01-24 03:25:33 +05:30
Muyang Li	158a5a87fb	Remove the FP32 Wrapper when evaluating (#10617 ) Remove the FP32 Wrapper Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>	2025-01-21 16:16:54 +05:30
jiqing-feng	012d08b1bc	Enable dreambooth lora finetune example on other devices (#10602 ) * enable dreambooth_lora on other devices Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * enable xpu Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * check cuda device before empty cache Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix comment Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * import free_memory Signed-off-by: jiqing-feng <jiqing.feng@intel.com> --------- Signed-off-by: jiqing-feng <jiqing.feng@intel.com>	2025-01-21 14:09:45 +05:30
Sayak Paul	4ace7d0483	[chore] change licensing to 2025 from 2024. (#10615 ) change licensing to 2025 from 2024.	2025-01-20 16:57:27 -10:00
baymax591	75a636da48	bugfix for npu not support float64 (#10123 ) * bugfix for npu not support float64 * is_mps is_npu --------- Co-authored-by: 白超 <baichao19@huawei.com> Co-authored-by: hlky <hlky@hlky.ac>	2025-01-20 09:35:24 -10:00
Sayak Paul	328e0d20a7	[training] set rest of the blocks with `requires_grad` False. (#10607 ) set rest of the blocks with requires_grad False.	2025-01-19 19:34:53 +05:30
Juan Acevedo	aeac0a00f8	implementing flux on TPUs with ptxla (#10515 ) * implementing flux on TPUs with ptxla * add xla flux attention class * run make style/quality * Update src/diffusers/models/attention_processor.py Co-authored-by: YiYi Xu <yixu310@gmail.com> * Update src/diffusers/models/attention_processor.py Co-authored-by: YiYi Xu <yixu310@gmail.com> * run style and quality --------- Co-authored-by: Juan Acevedo <jfacevedo@google.com> Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> Co-authored-by: YiYi Xu <yixu310@gmail.com>	2025-01-16 08:46:02 -10:00
Leo Jiang	b0c8973834	[Sana 4K] Add vae tiling option to avoid OOM (#10583 ) Co-authored-by: J石页 <jiangshuo9@h-partners.com>	2025-01-16 02:06:07 +05:30
hlky	980736b792	Fix train_dreambooth_lora_sd3_miniature (#10554 )	2025-01-13 13:47:27 +00:00
chaowenguo	d6c030fd37	add the xm.mark_step for the first denosing loop (#10530 ) * Update rerender_a_video.py * Update rerender_a_video.py * Update examples/community/rerender_a_video.py Co-authored-by: hlky <hlky@hlky.ac> * Update rerender_a_video.py * make style --------- Co-authored-by: hlky <hlky@hlky.ac> Co-authored-by: YiYi Xu <yixu310@gmail.com>	2025-01-10 21:03:41 +00:00
hlky	12fbe3f7dc	Use Pipelines without unet (#10440 ) * Use Pipelines without unet * unet.config.in_channels * default_sample_size * is_unet_version_less_0_9_0 --------- Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>	2025-01-10 04:45:42 +00:00
Linoy Tsaban	83ba01a38d	small readme changes for advanced training examples (#10473 ) add to readme about hf login and wandb installation to address https://github.com/huggingface/diffusers/issues/10142#issuecomment-2571655570 Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>	2025-01-10 07:35:19 +05:30
chaowenguo	7bc8b92384	add callable object to convert frame into control_frame to reduce cpu memory usage. (#10501 ) * Update rerender_a_video.py * Update rerender_a_video.py * Update examples/community/rerender_a_video.py Co-authored-by: hlky <hlky@hlky.ac> --------- Co-authored-by: hlky <hlky@hlky.ac> Co-authored-by: YiYi Xu <yixu310@gmail.com>	2025-01-09 11:25:53 -10:00
Vladimir Mandic	f0c6d9784b	flux: make scheduler config params optional (#10384 ) * dont assume scheduler has optional config params * make style, make fix-copies * calculate_shift * fix-copies, usage in pipelines --------- Co-authored-by: hlky <hlky@hlky.ac>	2025-01-09 10:44:26 -10:00
Bagheera	a0acbdc989	fix for #7365 , prevent pipelines from overriding provided prompt embeds (#7926 ) * fix for #7365, prevent pipelines from overriding provided prompt embeds * fix-copies * fix implementation * update --------- Co-authored-by: bghira <bghira@users.github.com> Co-authored-by: Aryan <aryan@huggingface.co> Co-authored-by: sayakpaul <spsayakpaul@gmail.com>	2025-01-08 10:12:12 -10:00
Parag Ekbote	5655b22ead	Notebooks for Community Scripts-5 (#10499 ) Add 5 Notebooks for Diffusers Community Pipelines.	2025-01-08 08:56:17 -08:00
hlky	ee7e141d80	Use pipelines without vae (#10441 ) * Use pipelines without vae * getattr * vqvae --------- Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>	2025-01-07 13:26:51 -10:00
Teriks	03bcf5aefe	RFInversionFluxPipeline, small fix for enable_model_cpu_offload & enable_sequential_cpu_offload compatibility (#10480 ) RFInversionFluxPipeline.encode_image, device fix Use self._execution_device instead of self.device when selecting a device for the input image tensor. This allows for compatibility with enable_model_cpu_offload & enable_sequential_cpu_offload Co-authored-by: Teriks <Teriks@users.noreply.github.com> Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>	2025-01-07 15:47:28 +01:00

1 2 3 4 5 ...

1243 Commits