diffusers

mirror of https://github.com/huggingface/diffusers.git synced 2026-01-27 17:22:53 +03:00

Author	SHA1	Message	Date
Steven Liu	b60faf456b	[docs] Pipeline callbacks (#12212) * init * review	2025-08-22 13:01:24 -07:00
Vương Đình Minh	d03240801f	[Docs] Add documentation for KontextInpaintingPipeline (#12197) * [Docs] Add documentation for KontextInpaintingPipeline * Update docs/source/en/api/pipelines/flux.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * update kontext inpaint docs with hfoption * Update docs/source/en/api/pipelines/flux.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/api/pipelines/flux.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-08-22 00:04:28 -07:00
galbria	7993be9e7f	Bria 3 2 pipeline (#12010) * Add Bria model and pipeline to diffusers - Introduced `BriaTransformer2DModel` and `BriaPipeline` for enhanced image generation capabilities. - Updated import structures across various modules to include the new Bria components. - Added utility functions and output classes specific to the Bria pipeline. - Implemented tests for the Bria pipeline to ensure functionality and output integrity. * with working tests * style and quality pass * adding docs * add to overview * fixes from "make fix-copies" * Refactor transformer_bria.py and pipeline_bria.py: Introduce new EmbedND class for rotary position embedding, and enhance Timestep and TimestepProjEmbeddings classes. Add utility functions for handling negative prompts and generating original sigmas in pipeline_bria.py. * remove redundant and duplicate tests and fix bf16 slow test * style fixes * small doc update * Enhance Bria 3.2 documentation and implementation - Updated the GitHub repository link for Bria 3.2. - Added usage instructions for the gated model access. - Introduced the BriaTransformerBlock and BriaAttention classes to the model architecture. - Refactored existing classes to integrate Bria-specific components, including BriaEmbedND and BriaPipeline. - Updated the pipeline output class to reflect Bria-specific functionality. - Adjusted test cases to align with the new Bria model structure. * Refactor Bria model components and update documentation - Removed outdated inference example from Bria 3.2 documentation. - Introduced the BriaTransformerBlock class to enhance model architecture. - Updated attention handling to use `attention_kwargs` instead of `joint_attention_kwargs`. - Improved import structure in the Bria pipeline to handle optional dependencies. - Adjusted test cases to reflect changes in model dtype assertions. 
* Update Bria model reference in documentation to reflect new file naming convention * Update docs/source/en/_toctree.yml * Refactor BriaPipeline to inherit from DiffusionPipeline instead of FluxPipeline, updating imports accordingly. * move the __call__ func to the end of file * Update BriaPipeline example to use bfloat16 for precision sensitivity for better result * make style && make quality && make fix-copiessource --------- Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com> Co-authored-by: Aryan <contact.aryanvs@gmail.com>	2025-08-20 14:57:39 +05:30
Linoy Tsaban	8d1de40891	[Wan 2.2 LoRA] add support for 2nd transformer lora loading + wan 2.2 lightx2v lora (#12074) * add alpha * load into 2nd transformer * Update src/diffusers/loaders/lora_conversion_utils.py Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * Update src/diffusers/loaders/lora_conversion_utils.py Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * pr comments * pr comments * pr comments * fix * fix * Apply style fixes * fix copies * fix * fix copies * Update src/diffusers/loaders/lora_pipeline.py Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * revert change * revert change * fix copies * up * fix --------- Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: linoy <linoy@hf.co>	2025-08-19 08:32:39 +05:30
Sayak Paul	8cc528c5e7	[chore] add lora button to qwenimage docs (#12183) up	2025-08-19 07:13:24 +05:30
Sayak Paul	5b53f67f06	[docs] Clarify guidance scale in Qwen pipelines (#12181) * add clarification regarding guidance_scale in QwenImage * propagate.	2025-08-18 20:10:23 +05:30
Sayak Paul	4d9b82297f	[qwen] Qwen image edit followups (#12166) * add docs. * more docs. * xfail full compilation for Qwen for now. * tests * up * up * up * reviewer feedback.	2025-08-18 08:33:07 +05:30
Nguyễn Trọng Tuấn	da096a4999	Add QwenImage Inpainting and Img2Img pipeline (#12117) * feat/qwenimage-img2img-inpaint * Update qwenimage.md to reflect new pipelines and add # Copied from convention * tiny fix for passing ruff check * reformat code * fix copied from statement * fix copied from statement * copy and style fix * fix dummies --------- Co-authored-by: TuanNT-ZenAI <tuannt.zenai@gmail.com> Co-authored-by: DN6 <dhruv.nair@gmail.com>	2025-08-13 09:41:50 +05:30
Steven Liu	38740ddbd8	[docs] Modular diffusers (#11931) * start * draft * state, pipelineblock, apis * sequential * fix links * new * loop, auto * fix * pipeline * guiders * components manager * reviews * update * update * update --------- Co-authored-by: DN6 <dhruv.nair@gmail.com>	2025-08-12 18:50:20 +05:30
Steven Liu	f8ba5cd77a	[docs] Cache link (#12105) cache	2025-08-11 11:03:59 -07:00
Sayak Paul	f442955c6e	[lora] support loading loras from `lightx2v/Qwen-Image-Lightning` (#12119) * feat: support qwen lightning lora. * add docs. * fix	2025-08-11 09:27:10 +05:30
Sayak Paul	5937e11d85	[docs] small corrections to the example in the Qwen docs (#12068) * up * up	2025-08-05 09:47:21 +05:30
Sayak Paul	9c1d4e3be1	[wip] feat: support lora in qwen image and training script (#12056) * feat: support lora in qwen image and training script * up * up * up * up * up * up * add lora tests * fix * add tests * fix * reviewer feedback * up * Apply suggestions from code review Co-authored-by: Aryan <aryan@huggingface.co> --------- Co-authored-by: Aryan <aryan@huggingface.co>	2025-08-05 07:06:02 +05:30
Aryan	9a38fab5ae	tests + minor refactor for QwenImage (#12057) * update * update * update * add docs	2025-08-04 16:28:42 +05:30
Sayak Paul	9a2eaed002	[LoRA] support lightx2v lora in wan (#12040) * support lightx2v lora in wan * add docs. * reviewer feedback * empty	2025-08-02 11:43:26 +05:30
Steven Liu	dfa48831e2	[docs] quant_kwargs (#11712) * draft * update	2025-07-29 10:23:16 -07:00
Álvaro Somoza	edcbe8038b	Fix huggingface-hub failing tests (#11994) * login * more logins * uploads * missed login * another missed login * downloads * examples and more logins * fix * setup * Apply style fixes * fix * Apply style fixes	2025-07-29 02:34:58 -04:00
Tolga Cangöz	7298bdd817	Add SkyReels V2: Infinite-Length Film Generative Model (#11518) * style * Fix class name casing for SkyReelsV2 components in multiple files to ensure consistency and correct functionality. * cleaning * cleansing * Refactor `get_timestep_embedding` to move modifications into `SkyReelsV2TimeTextImageEmbedding`. * Remove unnecessary line break in `get_timestep_embedding` function for cleaner code. * Remove `skyreels_v2` entry from `_import_structure` and update its initialization to directly assign the list of SkyReelsV2 components. * cleansing * Refactor attention processing in `SkyReelsV2AttnProcessor2_0` to always convert query, key, and value to `torch.bfloat16`, simplifying the code and improving clarity. * Enhance example usage in `pipeline_skyreels_v2_diffusion_forcing.py` by adding VAE initialization and detailed prompt for video generation, improving clarity and usability of the documentation. * Refactor import structure in `__init__.py` for SkyReelsV2 components and improve formatting in `pipeline_skyreels_v2_diffusion_forcing.py` to enhance code readability and maintainability. * Update `guidance_scale` parameter in `SkyReelsV2DiffusionForcingPipeline` from 5.0 to 6.0 to enhance video generation quality. * Update `guidance_scale` parameter in example documentation and class definition of `SkyReelsV2DiffusionForcingPipeline` to ensure consistency and improve video generation quality. * Update `causal_block_size` parameter in `SkyReelsV2DiffusionForcingPipeline` to default to `None`. * up * Fix dtype conversion for `timestep_proj` in `SkyReelsV2Transformer3DModel` to ensure correct tensor operations. * Optimize causal mask generation by replacing repeated tensor with `repeat_interleave` for improved efficiency in `SkyReelsV2Transformer3DModel`. * style * Enhance example documentation in `SkyReelsV2DiffusionForcingPipeline` with guidance scale and shift parameters for T2V and I2V. 
Remove unused `retrieve_latents` function to streamline the code. * Refactor sample scheduler creation in `SkyReelsV2DiffusionForcingPipeline` to use `deepcopy` for improved state management during inference steps. * Enhance error handling and documentation in `SkyReelsV2DiffusionForcingPipeline` for `overlap_history` and `addnoise_condition` parameters to improve long video generation guidance. * Update documentation and progress bar handling in `SkyReelsV2DiffusionForcingPipeline` to clarify asynchronous inference settings and improve progress tracking during denoising steps. * Refine progress bar calculation in `SkyReelsV2DiffusionForcingPipeline` by rounding the step size to one decimal place for improved readability during denoising steps. * Update import statements in `SkyReelsV2DiffusionForcingPipeline` documentation for improved clarity and organization. * Refactor progress bar handling in `SkyReelsV2DiffusionForcingPipeline` to use total steps instead of calculated step size. * update templates for i2v, v2v * Add `retrieve_latents` function to streamline latent retrieval in `SkyReelsV2DiffusionForcingPipeline`. Update video latent processing to utilize this new function for improved clarity and maintainability. * Add `retrieve_latents` function to both i2v and v2v pipelines for consistent latent retrieval. Update video latent processing to utilize this function, enhancing clarity and maintainability across the SkyReelsV2DiffusionForcingPipeline implementations. * Remove redundant ValueError for `overlap_history` in `SkyReelsV2DiffusionForcingPipeline` to streamline error handling and improve user guidance for long video generation. * Update default video dimensions and flow matching scheduler parameter in `SkyReelsV2DiffusionForcingPipeline` to enhance video generation capabilities. * Refactor `SkyReelsV2DiffusionForcingPipeline` to support Image-to-Video (i2v) generation. 
Update class name, add image encoding functionality, and adjust parameters for improved video generation. Enhance error handling for image inputs and update documentation accordingly. * Improve organization for image-last_image condition. * Refactor `SkyReelsV2DiffusionForcingImageToVideoPipeline` to improve latent preparation and video condition handling integration. * style * style * Add example usage of PIL for image input in `SkyReelsV2DiffusionForcingImageToVideoPipeline` documentation. * Refactor `SkyReelsV2DiffusionForcingPipeline` to `SkyReelsV2DiffusionForcingVideoToVideoPipeline`, enhancing support for Video-to-Video (v2v) generation. Introduce video input handling, update latent preparation logic, and improve error handling for input parameters. * Refactor `SkyReelsV2DiffusionForcingImageToVideoPipeline` by removing the `image_encoder` and `image_processor` dependencies. Update the CPU offload sequence accordingly. * Refactor `SkyReelsV2DiffusionForcingImageToVideoPipeline` to enhance latent preparation logic and condition handling. Update image input type to `Optional`, streamline video condition processing, and improve handling of `last_image` during latent generation. * Enhance `SkyReelsV2DiffusionForcingPipeline` by refining latent preparation for long video generation. Introduce new parameters for video handling, overlap history, and causal block size. Update logic to accommodate both short and long video scenarios, ensuring compatibility and improved processing. * refactor * fix num_frames * fix prefix_video_latents * up * refactor * Fix typo in scheduler method call within `SkyReelsV2DiffusionForcingVideoToVideoPipeline` to ensure proper noise scaling during latent generation. * up * Enhance `SkyReelsV2DiffusionForcingImageToVideoPipeline` by adding support for `last_image` parameter and refining latent frame calculations. Update preprocessing logic. 
* add statistics * Refine latent frame handling in `SkyReelsV2DiffusionForcingImageToVideoPipeline` by correcting variable names and reintroducing latent mean and standard deviation calculations. Update logic for frame preparation and sampling to ensure accurate video generation. * up * refactor * up * Refactor `SkyReelsV2DiffusionForcingVideoToVideoPipeline` to improve latent handling by enforcing tensor input for video, updating frame preparation logic, and adjusting default frame count. Enhance preprocessing and postprocessing steps for better integration. * style * fix vae output indexing * upup * up * Fix tensor concatenation and repetition logic in `SkyReelsV2DiffusionForcingImageToVideoPipeline` to ensure correct dimensionality for video conditions and latent conditions. * Refactor latent retrieval logic in `SkyReelsV2DiffusionForcingVideoToVideoPipeline` to handle tensor dimensions more robustly, ensuring compatibility with both 3D and 4D video inputs. * Enhance logging in `SkyReelsV2DiffusionForcing` pipelines by adding iteration print statements for better debugging. Clean up unused code related to prefix video latents length calculation in `SkyReelsV2DiffusionForcingImageToVideoPipeline`. * Update latent handling in `SkyReelsV2DiffusionForcingImageToVideoPipeline` to conditionally set latents based on video iteration state, improving flexibility for video input processing. * Refactor `SkyReelsV2TimeTextImageEmbedding` to utilize `get_1d_sincos_pos_embed_from_grid` for timestep projection. * Enhance `get_1d_sincos_pos_embed_from_grid` function to include an optional parameter `flip_sin_to_cos` for flipping sine and cosine embeddings, improving flexibility in positional embedding generation. * Update timestep projection in `SkyReelsV2TimeTextImageEmbedding` to include `flip_sin_to_cos` parameter, enhancing the flexibility of time embedding generation. 
* Refactor tensor type handling in `SkyReelsV2AttnProcessor2_0` and `SkyReelsV2TransformerBlock` to ensure consistent use of `torch.float32` and `torch.bfloat16`, improving integration. * Update tensor type in `SkyReelsV2RotaryPosEmbed` to use `torch.float32` for frequency calculations, ensuring consistency in data types across the model. * Refactor `SkyReelsV2TimeTextImageEmbedding` to utilize automatic mixed precision for timestep projection. * down * down * style * Add debug tensor tracking to `SkyReelsV2Transformer3DModel` for enhanced debugging and output analysis; update `Transformer2DModelOutput` to include debug tensors. * up * Refactor indentation in `SkyReelsV2AttnProcessor2_0` to improve code readability and maintain consistency in style. * Convert query, key, and value tensors to bfloat16 in `SkyReelsV2AttnProcessor2_0` for improved performance. * Add debug print statements in `SkyReelsV2TransformerBlock` to track tensor shapes and values for improved debugging and analysis. * debug * debug * Remove commented-out debug tensor tracking from `SkyReelsV2TransformerBlock` * Add functionality to save processed video latents as a Safetensors file in `SkyReelsV2DiffusionForcingPipeline`. * up * Add functionality to save output latents as a Safetensors file in `SkyReelsV2DiffusionForcingPipeline`. * up * Remove additional commented-out debug tensor tracking from `SkyReelsV2TransformerBlock` and `SkyReelsV2Transformer3DModel` for cleaner code. * style * cleansing * Update example documentation and parameters in `SkyReelsV2Pipeline`. Adjusted example code for loading models, modified default values for height, width, num_frames, and guidance_scale, and improved output video quality settings. * Update shift parameter in example documentation and default values across SkyReels V2 pipelines. Adjusted shift values for I2V from 3.0 to 5.0 and updated related example code for consistency. 
* Update example documentation in SkyReels V2 pipelines to include available model options and update model references for loading. Adjusted model names to reflect the latest versions across I2V, V2V, and T2V pipelines. * Add test templates * style * Add docs template * Add SkyReels V2 Diffusion Forcing Video-to-Video Pipeline to imports * style * fix-copies * convert i2v 1.3b * Update transformer configuration to include `image_dim` for SkyReels V2 models and refactor imports to use `SkyReelsV2Transformer3DModel`. * Refactor transformer import in SkyReels V2 pipeline to use `SkyReelsV2Transformer3DModel` for consistency. * Update transformer configuration in SkyReels V2 to increase `in_channels` from 16 to 36 for i2v conf. * Update transformer configuration in SkyReels V2 to set `added_kv_proj_dim` values for different model types. * up * up * up * Add SkyReelsV2Pipeline support for T2V model type in conversion script * upp * Refactor model type checks in conversion script to use substring matching for improved flexibility * upp * Fix shard path formatting in conversion script to accommodate varying model types by dynamically adjusting zero padding. * Update sharded safetensors loading logic in conversion script to use substring matching for model directory checks * Update scheduler parameters in SkyReels V2 test files for consistency across image and video pipelines * Refactor conversion script to initialize text encoder, tokenizer, and scheduler for SkyReels pipelines, enhancing model integration * style * Update documentation for SkyReels-V2, introducing the Infinite-length Film Generative model, enhancing text-to-video generation examples, and updating model references throughout the API documentation. * Add SkyReelsV2Transformer3DModel and FlowMatchUniPCMultistepScheduler documentation, updating TOC and introducing new model and scheduler files. 
* style * Update documentation for SkyReelsV2DiffusionForcingPipeline to correct flow matching scheduler parameter for I2V from 3.0 to 5.0, ensuring clarity in usage examples. * Add documentation for causal_block_size parameter in SkyReelsV2DF pipelines, clarifying its role in asynchronous inference. * Simplify min_ar_step calculation in SkyReelsV2DiffusionForcingPipeline to improve clarity. * style and fix-copies * style * Add documentation for SkyReelsV2Transformer3DModel Introduced a new markdown file detailing the SkyReelsV2Transformer3DModel, including usage instructions and model output specifications. * Update test configurations for SkyReelsV2 pipelines - Adjusted `in_channels` from 36 to 16 in `test_skyreels_v2_df_image_to_video.py`. - Added new parameters: `overlap_history`, `num_frames`, and `base_num_frames` in `test_skyreels_v2_df_video_to_video.py`. - Updated expected output shape in video tests from (17, 3, 16, 16) to (41, 3, 16, 16). * Refines SkyReelsV2DF test parameters * Update src/diffusers/models/modeling_outputs.py Co-authored-by: Aryan <contact.aryanvs@gmail.com> * Refactor `grid_sizes` processing by using already-calculated post-patch parameters to simplify * Update docs/source/en/api/pipelines/skyreels_v2.md Co-authored-by: Aryan <contact.aryanvs@gmail.com> * Refactor parameter naming for diffusion forcing in SkyReelsV2 pipelines - Changed `flag_df` to `enable_diffusion_forcing` for clarity in the SkyReelsV2Transformer3DModel and associated pipelines. - Updated all relevant method calls to reflect the new parameter name. * Revert _toctree.yml to adjust section expansion states * style * Update docs/source/en/api/models/skyreels_v2_transformer_3d.md Co-authored-by: YiYi Xu <yixu310@gmail.com> * Add copying label to SkyReelsV2ImageEmbedding from WanImageEmbedding. * Refactor transformer block processing in SkyReelsV2Transformer3DModel - Ensured proper handling of hidden states during both gradient checkpointing and standard processing. 
* Update SkyReels V2 documentation to remove VRAM requirement and streamline imports - Removed the mention of ~13GB VRAM requirement for the SkyReels-V2 model. - Simplified import statements by removing unused `load_image` import. * Add SkyReelsV2LoraLoaderMixin for loading and managing LoRA layers in SkyReelsV2Transformer3DModel - Introduced SkyReelsV2LoraLoaderMixin class to handle loading, saving, and fusing of LoRA weights specific to the SkyReelsV2 model. - Implemented methods for state dict management, including compatibility checks for various LoRA formats. - Enhanced functionality for loading weights with options for low CPU memory usage and hotswapping. - Added detailed docstrings for clarity on parameters and usage. * Update SkyReelsV2 documentation and loader mixin references - Corrected the documentation to reference the new `SkyReelsV2LoraLoaderMixin` for loading LoRA weights. - Updated comments in the `SkyReelsV2LoraLoaderMixin` class to reflect changes in model references from `WanTransformer3DModel` to `SkyReelsV2Transformer3DModel`. * Enhance SkyReelsV2 integration by adding SkyReelsV2LoraLoaderMixin references - Added `SkyReelsV2LoraLoaderMixin` to the documentation and loader imports for improved LoRA weight management. - Updated multiple pipeline classes to inherit from `SkyReelsV2LoraLoaderMixin` instead of `WanLoraLoaderMixin`. * Update SkyReelsV2 model references in documentation - Replaced placeholder model paths with actual paths for SkyReels-V2 models in multiple pipeline files. - Ensured consistency across the documentation for loading models in the SkyReelsV2 pipelines. * style * fix-copies * Refactor `fps_projection` in `SkyReelsV2Transformer3DModel` - Replaced the sequential linear layers for `fps_projection` with a `FeedForward` layer using `SiLU` activation for better integration. 
* Update docs * Refactor video processing in SkyReelsV2DiffusionForcingPipeline - Renamed parameters for clarity: `video` to `video_latents` and `overlap_history` to `overlap_history_latent_frames`. - Updated logic for handling long video generation, including adjustments to latent frame calculations and accumulation. - Consolidated handling of latents for both long and short video generation scenarios. - Final decoding step now consistently converts latents to pixels, ensuring proper output format. * Update activation function in `fps_projection` of `SkyReelsV2Transformer3DModel` - Changed activation function from `silu` to `linear-silu` in the `fps_projection` layer for improved performance and integration. * Add fps_projection layer renaming in convert_skyreelsv2_to_diffusers.py - Updated key mappings for the `fps_projection` layer to align with new naming conventions, ensuring consistency in model integration. * Fix fps_projection assignment in SkyReelsV2Transformer3DModel - Corrected the assignment of the `fps_projection` layer to ensure it is properly cast to the appropriate data type, enhancing model functionality. * Update _keep_in_fp32_modules in SkyReelsV2Transformer3DModel - Added `fps_projection` to the list of modules that should remain in FP32 precision, ensuring proper handling of data types during model operations. * Remove integration test classes from SkyReelsV2 test files - Deleted the `SkyReelsV2DiffusionForcingPipelineIntegrationTests` and `SkyReelsV2PipelineIntegrationTests` classes along with their associated setup, teardown, and test methods, as they were not implemented and not needed for current testing. * style * Refactor: Remove hardcoded `torch.bfloat16` cast in attention * Refactor: Simplify data type handling in transformer model Removes unnecessary data type conversions for the FPS embedding and timestep projection. This change simplifies the forward pass by relying on the inherent data types of the tensors. 
* Refactor: Remove `fps_projection` from `_keep_in_fp32_modules` in `SkyReelsV2Transformer3DModel` * Update src/diffusers/models/transformers/transformer_skyreels_v2.py Co-authored-by: Aryan <contact.aryanvs@gmail.com> * Refactor: Remove unused flags and simplify attention mask handling in SkyReelsV2AttnProcessor2_0 and SkyReelsV2Transformer3DModel Refactor: Simplify causal attention logic in SkyReelsV2 Removes the `flag_causal_attention` and `_flag_ar_attention` flags to simplify the implementation. The decision to apply a causal attention mask is now based directly on the `num_frame_per_block` configuration, eliminating redundant flags and conditional checks. This streamlines the attention mechanism and simplifies the `set_ar_attention` methods. * Refactor: Clarify variable names for latent frames Renames `base_num_frames` to `base_latent_num_frames` to make it explicit that the variable refers to the number of frames in the latent space. This change improves code readability and reduces potential confusion between latent frames and decoded video frames. The `num_frames` parameter in `generate_timestep_matrix` is also renamed to `num_latent_frames` for consistency. * Enhance documentation: Add detailed docstring for timestep matrix generation in SkyReelsV2DiffusionForcingPipeline * Docs: Clarify long video chunking in pipeline docstring Improves the explanation of long video processing within the pipeline's docstring. The update replaces the abstract description with a concrete example, illustrating how the sliding window mechanism works with overlapping chunks. This makes the roles of `base_num_frames` and `overlap_history` clearer for users. 
* Docs: Move visual demonstration and processing details for SkyReelsV2DiffusionForcingPipeline to docs page from the code * Docs: Update asynchronous processing timeline and examples for long video handling in SkyReels-V2 documentation * Enhance timestep matrix generation documentation and logic for synchronous/asynchronous video processing * Update timestep matrix documentation and enhance analysis for clarity in SkyReelsV2DiffusionForcingPipeline * Docs: Update visual demonstration section and add detailed step matrix construction example for asynchronous processing in SkyReelsV2DiffusionForcingPipeline * style * fix-copies * Refactor parameter names for clarity in SkyReelsV2DiffusionForcingImageToVideoPipeline and SkyReelsV2DiffusionForcingVideoToVideoPipeline * Refactor: Avoid VAE roundtrip in long video generation Improves performance and quality for long video generation by operating entirely in latent space during the iterative generation process. Instead of decoding latents to video and then re-encoding the overlapping section for the next chunk, this change passes the generated latents directly between iterations. This avoids a computationally expensive and potentially lossy VAE decode/encode cycle within the loop. The full video is now decoded only once from the accumulated latents at the end of the process. * Refactor: Rename prefix_video_latents_length to prefix_video_latents_frames for clarity * Refactor: Rename num_latent_frames to current_num_latent_frames for clarity in SkyReelsV2DiffusionForcingImageToVideoPipeline * Refactor: Enhance long video generation logic and improve latent handling in SkyReelsV2DiffusionForcingImageToVideoPipeline Refactor: Unify video generation and pass latents directly Unifies the separate code paths for short and long video generation into a single, streamlined loop. This change eliminates the inefficient decode-encode cycle during long video generation. 
Instead of converting latents to pixel-space video between chunks, the pipeline now passes the generated latents directly to the next iteration. This improves performance, avoids potential quality loss from intermediate VAE steps, and enhances code maintainability by removing significant duplication. * style * Refactor: Remove overlap_history parameter and streamline long video generation logic in SkyReelsV2DiffusionForcingImageToVideoPipeline Refactor: Streamline long video generation logic Removes the `overlap_history` parameter and simplifies the conditioning process for long video generation. This change avoids a redundant VAE encoding step by directly using latent frames from the previous chunk for conditioning. It also moves image preprocessing outside the main generation loop to prevent repeated computations and clarifies the handling of prefix latents. * style * Refactor latent handling in i2v diffusion forcing pipeline Improves the latent conditioning and accumulation logic within the image-to-video diffusion forcing loop. - Corrects the splitting of the initial conditioning tensor to robustly handle both even and odd lengths. - Simplifies how latents are accumulated across iterations for long video generation. - Ensures the final latents are trimmed correctly before decoding only when a `last_image` is provided. * Refactor: Remove overlap_history parameter from SkyReelsV2DiffusionForcingImageToVideoPipeline * Refactor: Adjust video_latents parameter handling in prepare_latents method * style * Refactor: Update long video iteration print statements for clarity * Fix: Update transformer config with dynamic causal block size Updates the SkyReelsV2 pipelines to correctly set the `causal_block_size` in the transformer's configuration when it's provided during a pipeline call. This ensures the model configuration reflects the user's specified setting for the inference run. 
The `set_ar_attention` method is also renamed to `_set_ar_attention` to mark it as an internal helper. * style * Refactor: Adjust video input size and expected output shape in inference test * Refactor: Rename video variables for clarity in SkyReelsV2DiffusionForcingVideoToVideoPipeline * Docs: Clarify time embedding logic in SkyReelsV2 Adds comments to explain the handling of different time embedding tensor dimensions. A 2D tensor is used for standard models with a single time embedding per batch, while a 3D tensor is used for Diffusion Forcing models where each frame has its own time embedding. This clarifies the expected input for different model variations. * Docs: Update SkyReels V2 pipeline examples Updates the docstring examples for the SkyReels V2 pipelines to reflect current best practices and API changes. - Removes the `shift` parameter from pipeline call examples, as it is now configured directly on the scheduler. - Replaces the `set_ar_attention` method call with the `causal_block_size` argument in the pipeline call for diffusion forcing examples. - Adjusts recommended parameters for I2V and V2V examples, including inference steps, guidance scale, and `ar_step`. * Refactor: Remove `shift` parameter from SkyReelsV2 pipelines Removes the `shift` parameter from the call signature of all SkyReelsV2 pipelines. This parameter is a scheduler-specific configuration and should be set directly on the scheduler during its initialization, rather than being passed at runtime through the pipeline. This change simplifies the pipeline API. Usage examples are updated to reflect that the `shift` value should now be passed when creating the `FlowMatchUniPCMultistepScheduler`. * Refactors SkyReelsV2 image-to-video tests and adds last image case Simplifies the test suite by removing a duplicated test class and streamlining the dummy component and input generation. Adds a new test to verify the pipeline's behavior when a `last_image` is provided as input for conditioning. 
* test: Add image components to SkyReelsV2 pipeline test Adds the `image_encoder` and `image_processor` to the test components for the image-to-video pipeline. Also replaces a hardcoded value for the positional embedding sequence length with a more descriptive calculation, improving clarity. * test: Add callback configuration test for SkyReelsV2DiffusionForcingVideoToVideoPipeline test: Add callback test for SkyReelsV2DFV2V pipeline Adds a test to validate the callback functionality for the `SkyReelsV2DiffusionForcingVideoToVideoPipeline`. This test confirms that `callback_on_step_end` is invoked correctly and can modify the pipeline's state during inference. It uses a callback to dynamically increase the `guidance_scale` and asserts that the final value is as expected. The implementation correctly accounts for the nested denoising loops present in diffusion forcing pipelines. * style * fix: Update image_encoder type to CLIPVisionModelWithProjection in SkyReelsV2ImageToVideoPipeline * UP * Add conversion support for SkyReels-V2-FLF2V models Adds configurations for three new FLF2V model variants (1.3B-540P, 14B-540P, and 14B-720P) to the conversion script. This change also introduces specific handling to zero out the image positional embeddings for these models and updates the main script to correctly initialize the image-to-video pipeline. * Docs: Update and simplify SkyReels V2 usage examples Simplifies the text-to-video example by removing the manual group offloading configuration, making it more straightforward. Adds comments to pipeline parameters to clarify their purpose and provides guidance for different resolutions and long video generation. Introduces a new section with a code example for the video-to-video pipeline. * style * docs: Add SkyReels-V2 FLF2V 1.3B model to supported models list * docs: Update SkyReels-V2 documentation * Move the initialization of the `gradient_checkpointing` attribute to its suggested location. 
* Refactor: Use logger for long video progress messages Replaces `print()` calls with `logger.debug()` for reporting progress during long video generation in SkyReelsV2DF pipelines. This change reduces console output verbosity for standard runs while allowing developers to view progress by enabling debug-level logging. * Refactor SkyReelsV2 timestep embedding into a module Extract the sinusoidal timestep embedding logic into a new `SkyReelsV2Timesteps` `nn.Module`. This change encapsulates the embedding generation, which simplifies the `SkyReelsV2TimeTextImageEmbedding` class and improves code modularity. * Fix: Preserve original shape in timestep embeddings Reshapes the timestep embedding tensor to match the original input shape. This ensures that batched timestep inputs retain their batch dimension after embedding, preventing potential shape mismatches. * style * Refactor: Move SkyReelsV2Timesteps to model file Colocates the `SkyReelsV2Timesteps` class with the SkyReelsV2 transformer model. This change moves model-specific timestep embedding logic from the general embeddings module to the transformer's own file, improving modularity and making the model more self-contained. * Refactor parameter dtype retrieval to use utility function Replaces manual parameter iteration with the `get_parameter_dtype` helper to determine the time embedder's data type. This change improves code readability and centralizes the logic. * Add comments to track the tensor shape transformations * Add copied froms * style * fix-copies * up * Remove FlowMatchUniPCMultistepScheduler Deletes the `FlowMatchUniPCMultistepScheduler` as it is no longer being used. * Refactor: Replace FlowMatchUniPC scheduler with UniPC Removes the `FlowMatchUniPCMultistepScheduler` and integrates its functionality into the existing `UniPCMultistepScheduler`. 
This consolidation is achieved by using the `use_flow_sigmas=True` parameter in `UniPCMultistepScheduler`, simplifying the scheduler API and reducing code duplication. All usages, documentation, and tests are updated accordingly. * style * Remove text_encoder parameter from SkyReelsV2DiffusionForcingPipeline initialization * Docs: Rename `pipe` to `pipeline` in SkyReels examples Updates the variable name from `pipe` to `pipeline` across all SkyReels V2 documentation examples. This change improves clarity and consistency. * Fix: Rename shift parameter to flow_shift in SkyReels-V2 examples * Fix: Rename shift parameter to flow_shift in example documentation across SkyReels-V2 files * Fix: Rename shift parameter to flow_shift in UniPCMultistepScheduler initialization across SkyReels test files * Removes unused generator argument from scheduler step The `generator` parameter is not used by the scheduler's `step` method within the SkyReelsV2 diffusion forcing pipelines. This change removes the unnecessary argument from the method call for code clarity and consistency. * Fix: Update time_embedder_dtype assignment to use the first parameter's dtype in SkyReelsV2TimeTextImageEmbedding * style * Refactor: Use get_parameter_dtype utility function Replaces manual parameter iteration with the `get_parameter_dtype` helper. * Fix: Prevent (potential) error in parameter dtype check Adds a check to ensure the `_keep_in_fp32_modules` attribute exists on a parameter before it is accessed. This prevents a potential `AttributeError`, making the utility function more robust when used with models that do not define this attribute. --------- Co-authored-by: YiYi Xu <yixu310@gmail.com> Co-authored-by: Aryan <contact.aryanvs@gmail.com>	2025-07-16 08:24:41 -10:00
shm4r7	de043c6044	Update chroma.md (#11891 ) Fix typo in Inference example code	2025-07-09 09:58:38 +05:30
Aryan	0454fbb30b	First Block Cache (#11180 ) * update * modify flux single blocks to make compatible with cache techniques (without too much model-specific intrusion code) * remove debug logs * update * cache context for different batches of data * fix hs residual bug for single return outputs; support ltx * fix controlnet flux * support flux, ltx i2v, ltx condition * update * update * Update docs/source/en/api/cache.md * Update src/diffusers/hooks/hooks.py Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com> * address review comments pt. 1 * address review comments pt. 2 * cache context refacotr; address review pt. 3 * address review comments * metadata registration with decorators instead of centralized * support cogvideox * support mochi * fix * remove unused function * remove central registry based on review * update --------- Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>	2025-07-09 03:27:15 +05:30
Steven Liu	64a9210315	[docs] Deprecated pipelines (#11838 ) add warning Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>	2025-07-01 14:02:54 -10:00
Sayak Paul	470458623e	[docs] fix single_file example. (#11847 ) fix single_file example.	2025-07-01 21:23:27 +05:30
Aryan	a79c3af6bb	[single file] Cosmos (#11801 ) * update * update * update docs	2025-07-01 18:02:58 +05:30
Aryan	d7dd924ece	Kontext fixes (#11815 ) fix	2025-06-26 13:03:44 -10:00
Sayak Paul	00f95b9755	Kontext training (#11813 ) * support flux kontext * make fix-copies * add example * add tests * update docs * update * add note on integrity checker * initial commit * initial commit * add readme section and fixes in the training script. * add test * rectify ckpt_id * fix ckpt * fixes * change id * update * Update examples/dreambooth/train_dreambooth_lora_flux_kontext.py Co-authored-by: Aryan <aryan@huggingface.co> * Update examples/dreambooth/README_flux.md --------- Co-authored-by: Aryan <aryan@huggingface.co> Co-authored-by: linoytsaban <linoy@huggingface.co> Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>	2025-06-26 19:31:42 +03:00
Aryan	eea76892e8	Flux Kontext (#11812 ) * support flux kontext * make fix-copies * add example * add tests * update docs * update * add note on integrity checker * make fix-copies issue * add copied froms * make style * update repository ids * more copied froms	2025-06-26 21:29:59 +05:30
Sayak Paul	92542719ed	[docs] minor cleanups in the lora docs. (#11770 ) * minor cleanups in the lora docs. * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * format docs * fix copies --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-06-24 08:10:07 +05:30
Dhruv Nair	195926bbdc	Update Chroma Docs (#11753 ) * update * update --------- Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>	2025-06-19 19:33:19 +02:00
Aryan	a4df8dbc40	Update more licenses to 2025 (#11746 ) update	2025-06-19 07:46:01 +05:30
Edna	8adc6003ba	Chroma Pipeline (#11698 ) * working state from hameerabbasi and iddl * working state form hameerabbasi and iddl (transformer) * working state (normalization) * working state (embeddings) * add chroma loader * add chroma to mappings * add chroma to transformer init * take out variant stuff * get decently far in changing variant stuff * add chroma init * make chroma output class * add chroma transformer to dummy tp * add chroma to init * add chroma to init * fix single file * update * update * add chroma to auto pipeline * add chroma to pipeline init * change to chroma transformer * take out variant from blocks * swap embedder location * remove prompt_2 * work on swapping text encoders * remove mask function * dont modify mask (for now) * wrap attn mask * no attn mask (can't get it to work) * remove pooled prompt embeds * change to my own unpooled embeddeer * fix load * take pooled projections out of transformer * ensure correct dtype for chroma embeddings * update * use dn6 attn mask + fix true_cfg_scale * use chroma pipeline output * use DN6 embeddings * remove guidance * remove guidance embed (pipeline) * remove guidance from embeddings * don't return length * dont change dtype * remove unused stuff, fix up docs * add chroma autodoc * add .md (oops) * initial chroma docs * undo don't change dtype * undo arxiv change unsure why that happened * fix hf papers regression in more places * Update docs/source/en/api/pipelines/chroma.md Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com> * do_cfg -> self.do_classifier_free_guidance * Update docs/source/en/api/models/chroma_transformer.md Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com> * Update chroma.md * Move chroma layers into transformer * Remove pruned AdaLayerNorms * Add chroma fast tests * (untested) batch cond and uncond * Add # Copied from for shift * Update # Copied from statements * update norm imports * Revert cond + uncond batching * Add transformer tests * move chroma test (oops) * chroma init 
* fix chroma pipeline fast tests * Update src/diffusers/models/transformers/transformer_chroma.py Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com> * Move Approximator and Embeddings * Fix auto pipeline + make style, quality * make style * Apply style fixes * switch to new input ids * fix # Copied from error * remove # Copied from on protected members * try to fix import * fix import * make fix-copes * revert style fix * update chroma transformer params * update chroma transformer approximator init params * update to pad tokens * fix batch inference * Make more pipeline tests work * Make most transformer tests work * fix docs * make style, make quality * skip batch tests * fix test skipping * fix test skipping again * fix for tests * Fix all pipeline test * update * push local changes, fix docs * add encoder test, remove pooled dim * default proj dim * fix tests * fix equal size list input * update * push local changes, fix docs * add encoder test, remove pooled dim * default proj dim * fix tests * fix equal size list input * Revert "fix equal size list input" This reverts commit `3fe4ad67d5`. * update * update * update * update * update --------- Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2025-06-14 06:52:56 +05:30
Aryan	9f91305f85	Cosmos Predict2 (#11695 ) * support text-to-image * update example * make fix-copies * support use_flow_sigmas in EDM scheduler instead of maintain cosmos-specific scheduler * support video-to-world * update * rename text2image pipeline * make fix-copies * add t2i test * add test for v2w pipeline * support edm dpmsolver multistep * update * update * update * update tests * fix tests * safety checker * make conversion script work without guardrail	2025-06-14 01:51:29 +05:30
Aryan	73a9d5856f	Wan VACE (#11582 ) * initial support * make fix-copies * fix no split modules * add conversion script * refactor * add pipeline test * refactor * fix bug with mask * fix for reference images * remove print * update docs * update slices * update * update * update example	2025-06-06 17:53:10 +05:30
Steven Liu	c934720629	[docs] Model cards (#11112 ) * initial * update * hunyuanvideo * ltx * fix * wan * gen guide * feedback * feedback * pipeline-level quant config * feedback * ltx	2025-06-02 16:55:14 -07:00
Steven Liu	9f48394bf7	[docs] Caching methods (#11625 ) * cache * feedback	2025-06-02 10:58:47 -07:00
VLT Media	d0ec6601df	Bug: Fixed Image 2 Image example (#11619 ) Bug: Fixed Image 2 Image example where a PIL.Image was improperly being asked for an item via index.	2025-05-30 11:30:52 +05:30
Steven Liu	be2fb77dc1	[docs] PyTorch 2.0 (#11618 ) * combine * Update docs/source/en/optimization/fp16.md Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> --------- Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>	2025-05-28 09:42:41 -07:00
Linoy Tsaban	28ef0165b9	[Sana Sprint] add image-to-image pipeline (#11602 ) * sana sprint img2img * fix import * fix name * fix image encoding * fix image encoding * fix image encoding * fix image encoding * fix image encoding * fix image encoding * try w/o strength * try scaling differently * try with strength * revert unnecessary changes to scheduler * revert unnecessary changes to scheduler * Apply style fixes * remove comment * add copy statements * add copy statements * add to doc * add to doc * add to doc * add to doc * Apply style fixes * empty commit * fix copies * fix copies * fix copies * fix copies * fix copies * docs * make fix-copies. * fix doc building error. * initial commit - add img2img test * initial commit - add img2img test * fix import * fix imports * Apply style fixes * empty commit * remove * empty commit * test vocab size * fix * fix prompt missing from last commits * small changes * fix image processing when input is tensor * fix order * Apply style fixes * empty commit * fix shape * remove comment * image processing * remove comment * skip vae tiling test for now * Apply style fixes * empty commit --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: sayakpaul <spsayakpaul@gmail.com>	2025-05-27 22:09:51 +03:00
osrm	8705af0914	docs: fix invalid links (#11505 ) * fix invalid link lora.md * fix invalid link controlnet_sdxl.md The Hugging Face models page now uses the tags parameter instead of the other parameter for tag-based filtering. Therefore, to simultaneously apply both the "Stable Diffusion XL" and "ControlNet" tags, the following URL should be used: https://huggingface.co/models?tags=stable-diffusion-xl,controlnet * fix invalid link cosine_dpm.md "https://github.com/Stability-AI/stable-audio-tool" -> "https://github.com/Stability-AI/stable-audio-tools" * Update controlnet_sdxl.md * Update cosine_dpm.md --------- Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-05-20 08:55:41 -07:00
Aryan	05c8b42b75	LTX 0.9.7-distilled; documentation improvements (#11571 ) * add guidance rescale * update docs * support adaptive instance norm filter * fix custom timesteps support * add custom timestep example to docs * add a note about best generation settings being available only in the original repository * use original org hub ids instead of personal * make fix-copies --------- Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>	2025-05-20 02:29:16 +05:30
Quentin Gallouédec	c8bb1ff53e	Use HF Papers (#11567 ) * Use HF Papers * Apply style fixes --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2025-05-19 06:22:33 -10:00
Dhruv Nair	4267d8f4eb	[Single File] GGUF/Single File Support for HiDream (#11550 ) * update * update * update * update * update * update * update	2025-05-15 12:25:18 +05:30
Aryan	06fee551e9	LTX Video 0.9.7 (#11516 ) * add upsampling pipeline * ltx upsample pipeline conversion; pipeline fixes * make fix-copies * remove print * add vae convenience methods * update * add tests * support denoising strength for upscaling & video-to-video * update docs * update doc checkpoints * update docs * fix --------- Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>	2025-05-13 14:57:03 +05:30
Zhong-Yu Li	4f438de35a	Add VisualCloze (#11377 ) * VisualCloze * style quality * add docs * add docs * typo * Update docs/source/en/api/pipelines/visualcloze.md * delete einops * style quality * Update src/diffusers/pipelines/visualcloze/pipeline_visualcloze.py * reorg * refine doc * style quality * typo * typo * Update src/diffusers/image_processor.py * add comment * test * style * Modified based on review * style * restore image_processor * update example url * style * fix-copies * VisualClozeGenerationPipeline * combine * tests docs * remove VisualClozeUpsamplingPipeline * style * quality * test examples * quality style * typo * make fix-copies * fix test_callback_cfg and test_save_load_dduf in VisualClozePipelineFastTests * add EXAMPLE_DOC_STRING to VisualClozeGenerationPipeline * delete maybe_free_model_hooks from pipeline_visualcloze_combined * Apply suggestions from code review * fix test_save_load_local test; add reason for skipping cfg test * more save_load test fixes * fix tests in generation pipeline tests	2025-05-13 02:46:51 +05:30
Aryan	e48f6aeeb4	Hunyuan Video Framepack F1 (#11534 ) * support framepack f1 * update docs * update toctree * remove typo	2025-05-12 16:11:10 +05:30
Sayak Paul	599c887164	feat: pipeline-level quantization config (#11130 ) * feat: pipeline-level quant config. Co-authored-by: SunMarc <marc.sun@hotmail.fr> condition better. support mapping. improvements. [Quantization] Add Quanto backend (#10756) * update * updaet * update * update * update * update * update * update * update * update * update * update * Update docs/source/en/quantization/quanto.md Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * Update src/diffusers/quantizers/quanto/utils.py Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * update * update --------- Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> [Single File] Add single file loading for SANA Transformer (#10947) * added support for from_single_file * added diffusers mapping script * added testcase * bug fix * updated tests * corrected code quality * corrected code quality --------- Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com> [LoRA] Improve warning messages when LoRA loading becomes a no-op (#10187) * updates * updates * updates * updates * notebooks revert * fix-copies. * seeing * fix * revert * fixes * fixes * fixes * remove print * fix * conflicts ii. * updates * fixes * better filtering of prefix. 
--------- Co-authored-by: hlky <hlky@hlky.ac> [LoRA] CogView4 (#10981) * update * make fix-copies * update [Tests] improve quantization tests by additionally measuring the inference memory savings (#11021) * memory usage tests * fixes * gguf [`Research Project`] Add AnyText: Multilingual Visual Text Generation And Editing (#8998) * Add initial template * Second template * feat: Add TextEmbeddingModule to AnyTextPipeline * feat: Add AuxiliaryLatentModule template to AnyTextPipeline * Add bert tokenizer from the anytext repo for now * feat: Update AnyTextPipeline's modify_prompt method This commit adds improvements to the modify_prompt method in the AnyTextPipeline class. The method now handles special characters and replaces selected string prompts with a placeholder. Additionally, it includes a check for Chinese text and translation using the trans_pipe. * Fill in the `forward` pass of `AuxiliaryLatentModule` * `make style && make quality` * `chore: Update bert_tokenizer.py with a TODO comment suggesting the use of the transformers library` * Update error handling to raise and logging * Add `create_glyph_lines` function into `TextEmbeddingModule` * make style * Up * Up * Up * Up * Remove several comments * refactor: Remove ControlNetConditioningEmbedding and update code accordingly * Up * Up * up * refactor: Update AnyTextPipeline to include new optional parameters * up * feat: Add OCR model and its components * chore: Update `TextEmbeddingModule` to include OCR model components and dependencies * chore: Update `AuxiliaryLatentModule` to include VAE model and its dependencies for masked image in the editing task * `make style` * refactor: Update `AnyTextPipeline`'s docstring * Update `AuxiliaryLatentModule` to include info dictionary so that text processing is done once * simplify * `make style` * Converting `TextEmbeddingModule` to ordinary `encode_prompt()` function * Simplify for now * `make style` * Up * feat: Add scripts to convert AnyText controlnet to 
diffusers * `make style` * Fix: Move glyph rendering to `TextEmbeddingModule` from `AuxiliaryLatentModule` * make style * Up * Simplify * Up * feat: Add safetensors module for loading model file * Fix device issues * Up * Up * refactor: Simplify * refactor: Simplify code for loading models and handling data types * `make style` * refactor: Update to() method in FrozenCLIPEmbedderT3 and TextEmbeddingModule * refactor: Update dtype in embedding_manager.py to match proj.weight * Up * Add attribution and adaptation information to pipeline_anytext.py * Update usage example * Will refactor `controlnet_cond_embedding` initialization * Add `AnyTextControlNetConditioningEmbedding` template * Refactor organization * style * style * Move custom blocks from `AuxiliaryLatentModule` to `AnyTextControlNetConditioningEmbedding` * Follow one-file policy * style * [Docs] Update README and pipeline_anytext.py to use AnyTextControlNetModel * [Docs] Update import statement for AnyTextControlNetModel in pipeline_anytext.py * [Fix] Update import path for ControlNetModel, ControlNetOutput in anytext_controlnet.py * Refactor AnyTextControlNet to use configurable conditioning embedding channels * Complete control net conditioning embedding in AnyTextControlNetModel * up * [FIX] Ensure embeddings use correct device in AnyTextControlNetModel * up * up * style * [UPDATE] Revise README and example code for AnyTextPipeline integration with DiffusionPipeline * [UPDATE] Update example code in anytext.py to use correct font file and improve clarity * down * [UPDATE] Refactor BasicTokenizer usage to a new Checker class for text processing * update pillow * [UPDATE] Remove commented-out code and unnecessary docstring in anytext.py and anytext_controlnet.py for improved clarity * [REMOVE] Delete frozen_clip_embedder_t3.py as it is in the anytext.py file * [UPDATE] Replace edict with dict for configuration in anytext.py and RecModel.py for consistency * 🆙 * style * [UPDATE] Revise README.md for 
clarity, remove unused imports in anytext.py, and add author credits in anytext_controlnet.py * style * Update examples/research_projects/anytext/README.md Co-authored-by: Aryan <contact.aryanvs@gmail.com> * Remove commented-out image preparation code in AnyTextPipeline * Remove unnecessary blank line in README.md [Quantization] Allow loading TorchAO serialized Tensor objects with torch>=2.6 (#11018) * update * update * update * update * update * update * update * update * update fix: mixture tiling sdxl pipeline - adjust gerating time_ids & embeddings (#11012) small fix on generating time_ids & embeddings [LoRA] support wan i2v loras from the world. (#11025) * support wan i2v loras from the world. * remove copied from. * upates * add lora. Fix SD3 IPAdapter feature extractor (#11027) chore: fix help messages in advanced diffusion examples (#10923) Fix missing *kwargs in lora_pipeline.py (#11011) Update lora_pipeline.py * Apply style fixes * fix-copies --------- Co-authored-by: hlky <hlky@hlky.ac> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Fix for multi-GPU WAN inference (#10997) Ensure that hidden_state and shift/scale are on the same device when running with multiple GPUs Co-authored-by: Jimmy <39@🇺🇸.com> [Refactor] Clean up import utils boilerplate (#11026) * update * update * update Use `output_size` in `repeat_interleave` (#11030) [hybrid inference 🍯🐝] Add VAE encode (#11017) * [hybrid inference 🍯🐝] Add VAE encode * _toctree: add vae encode * Add endpoints, tests * vae_encode docs * vae encode benchmarks * api reference * changelog * Update docs/source/en/hybrid_inference/overview.md Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * update --------- Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> Wan Pipeline scaling fix, type hint warning, multi generator fix (#11007) * Wan Pipeline scaling fix, type hint warning, multi generator fix * Apply suggestions from code review [LoRA] change to warning from info when 
notifying the users about a LoRA no-op (#11044) * move to warning. * test related changes. Rename Lumina(2)Text2ImgPipeline -> Lumina(2)Pipeline (#10827) * Rename Lumina(2)Text2ImgPipeline -> Lumina(2)Pipeline --------- Co-authored-by: YiYi Xu <yixu310@gmail.com> making ```formatted_images``` initialization compact (#10801) compact writing Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> Co-authored-by: YiYi Xu <yixu310@gmail.com> Fix aclnnRepeatInterleaveIntWithDim error on NPU for get_1d_rotary_pos_embed (#10820) * get_1d_rotary_pos_embed support npu * Update src/diffusers/models/embeddings.py --------- Co-authored-by: Kai zheng <kaizheng@KaideMacBook-Pro.local> Co-authored-by: hlky <hlky@hlky.ac> Co-authored-by: YiYi Xu <yixu310@gmail.com> [Tests] restrict memory tests for quanto for certain schemes. (#11052) * restrict memory tests for quanto for certain schemes. * Apply suggestions from code review Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com> * fixes * style --------- Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com> [LoRA] feat: support non-diffusers wan t2v loras. (#11059) feat: support non-diffusers wan t2v loras. [examples/controlnet/train_controlnet_sd3.py] Fixes #11050 - Cast prompt_embeds and pooled_prompt_embeds to weight_dtype to prevent dtype mismatch (#11051) Fix: dtype mismatch of prompt embeddings in sd3 controlnet training Co-authored-by: Andreas Jörg <andreasjoerg@MacBook-Pro-von-Andreas-2.fritz.box> Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> reverts accidental change that removes attn_mask in attn. Improves fl… (#11065) reverts accidental change that removes attn_mask in attn. Improves flux ptxla by using flash block sizes. Moves encoding outside the for loop. Co-authored-by: Juan Acevedo <jfacevedo@google.com> Fix deterministic issue when getting pipeline dtype and device (#10696) Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com> [Tests] add requires peft decorator. (#11037) * add requires peft decorator. 
* install peft conditionally. * conditional deps. Co-authored-by: DN6 <dhruv.nair@gmail.com> --------- Co-authored-by: DN6 <dhruv.nair@gmail.com> CogView4 Control Block (#10809) * cogview4 control training --------- Co-authored-by: OleehyO <leehy0357@gmail.com> Co-authored-by: yiyixuxu <yixu310@gmail.com> [CI] pin transformers version for benchmarking. (#11067) pin transformers version for benchmarking. updates Fix Wan I2V Quality (#11087) * fix_wan_i2v_quality * Update src/diffusers/pipelines/wan/pipeline_wan_i2v.py Co-authored-by: YiYi Xu <yixu310@gmail.com> * Update src/diffusers/pipelines/wan/pipeline_wan_i2v.py Co-authored-by: YiYi Xu <yixu310@gmail.com> * Update src/diffusers/pipelines/wan/pipeline_wan_i2v.py Co-authored-by: YiYi Xu <yixu310@gmail.com> * Update pipeline_wan_i2v.py --------- Co-authored-by: YiYi Xu <yixu310@gmail.com> Co-authored-by: hlky <hlky@hlky.ac> LTX 0.9.5 (#10968) * update --------- Co-authored-by: YiYi Xu <yixu310@gmail.com> Co-authored-by: hlky <hlky@hlky.ac> make PR GPU tests conditioned on styling. (#11099) Group offloading improvements (#11094) update Fix pipeline_flux_controlnet.py (#11095) * Fix pipeline_flux_controlnet.py * Fix style update readme instructions. (#11096) Co-authored-by: Juan Acevedo <jfacevedo@google.com> Resolve stride mismatch in UNet's ResNet to support Torch DDP (#11098) Modify UNet's ResNet implementation to resolve stride mismatch in Torch's DDP Fix Group offloading behaviour when using streams (#11097) * update * update Quality options in `export_to_video` (#11090) * Quality options in `export_to_video` * make style improve more. add placeholders for docstrings. formatting. smol fix. solidify validation and annotation * Revert "feat: pipeline-level quant config." This reverts commit `316ff46b76`. * feat: implement pipeline-level quantization config Co-authored-by: SunMarc <marc@huggingface.co> * update * fixes * fix validation. * add tests and other improvements. 
* add tests * import quality * remove prints. * add docs. * fixes to docs. * doc fixes. * doc fixes. * add validation to the input quantization_config. * clarify recommendations. * docs * add to ci. * todo. --------- Co-authored-by: SunMarc <marc@huggingface.co>	2025-05-09 10:04:44 +05:30
Aryan	7b904941bc	Cosmos (#10660 ) * begin transformer conversion * refactor * refactor * refactor * refactor * refactor * refactor * update * add conversion script * add pipeline * make fix-copies * remove einops * update docs * gradient checkpointing * add transformer test * update * debug * remove prints * match sigmas * add vae pt. 1 * finish CV* vae * update * update * update * update * update * update * make fix-copies * update * make fix-copies * fix * update * update * make fix-copies * update * update tests * handle device and dtype for safety checker; required in latest diffusers * remove enable_gqa and use repeat_interleave instead * enforce safety checker; use dummy checker in fast tests * add review suggestion for ONNX export Co-Authored-By: Asfiya Baig <asfiyab@nvidia.com> * fix safety_checker issues when not passed explicitly We could either do what's done in this commit, or update the Cosmos examples to explicitly pass the safety checker * use cosmos guardrail package * auto format docs * update conversion script to support 14B models * update name CosmosPipeline -> CosmosTextToWorldPipeline * update docs * fix docs * fix group offload test failing for vae --------- Co-authored-by: Asfiya Baig <asfiyab@nvidia.com>	2025-05-07 20:59:09 +05:30
Aryan	d7ffe60166	Hunyuan Video Framepack (#11428 ) * add transformer * add pipeline * fixes * make fix-copies * update * add flux mu shift * update example snippet * debug * cleanup * batch_size=1 optimization * add pipeline test * fix for model cpu offloading' * add last_image support; credits: https://github.com/lllyasviel/FramePack/pull/167 * update example with flf2v * update penguin url * fix test * address review comment: https://github.com/huggingface/diffusers/pull/11428#discussion_r2071032371 * address review comment: https://github.com/huggingface/diffusers/pull/11428#discussion_r2071087689 * Update src/diffusers/pipelines/hunyuan_video/pipeline_hunyuan_video_framepack.py --------- Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>	2025-05-06 14:59:38 +05:30
co63oc	86294d3c7f	Fix typos in docs and comments (#11416 ) * Fix typos in docs and comments * Apply style fixes --------- Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2025-04-30 20:30:53 -10:00
co63oc	f00a995753	Fix typos in strings and comments (#11407 )	2025-04-24 08:53:47 -10:00
Emiliano	7986834572	Fix Flux IP adapter argument in the pipeline example (#11402 ) Fix Flux IP adapter argument in the example IP-Adapter example had a wrong argument. Fix `true_cfg` -> `true_cfg_scale`	2025-04-24 08:41:12 -10:00

508 Commits