1
0
mirror of https://github.com/huggingface/diffusers.git synced 2026-01-27 17:22:53 +03:00

[docs] clarify the mapping between Transformer2DModel and finegrained variants. (#11947)

* clarify the mapping between Transformer2DModel and finegrained variants.

* Update src/diffusers/pipelines/dit/pipeline_dit.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
This commit is contained in:
Sayak Paul
2025-07-18 08:28:51 +01:00
committed by GitHub
parent 18c8f10f20
commit 478df933c3
3 changed files with 26 additions and 2 deletions

View File

@@ -46,7 +46,9 @@ class DiTPipeline(DiffusionPipeline):
Parameters:
transformer ([`DiTTransformer2DModel`]):
A class conditioned `DiTTransformer2DModel` to denoise the encoded image latents.
A class conditioned `DiTTransformer2DModel` to denoise the encoded image latents. Initially published as
[`Transformer2DModel`](https://huggingface.co/facebook/DiT-XL-2-256/blob/main/transformer/config.json#L2)
in the config, but the mismatch can be ignored.
vae ([`AutoencoderKL`]):
Variational Auto-Encoder (VAE) model to encode and decode images to and from latent representations.
scheduler ([`DDIMScheduler`]):

View File

@@ -256,7 +256,9 @@ class PixArtAlphaPipeline(DiffusionPipeline):
Tokenizer of class
[T5Tokenizer](https://huggingface.co/docs/transformers/model_doc/t5#transformers.T5Tokenizer).
transformer ([`PixArtTransformer2DModel`]):
A text conditioned `PixArtTransformer2DModel` to denoise the encoded image latents.
A text conditioned `PixArtTransformer2DModel` to denoise the encoded image latents. Initially published as
[`Transformer2DModel`](https://huggingface.co/PixArt-alpha/PixArt-XL-2-1024-MS/blob/main/transformer/config.json#L2)
in the config, but the mismatch can be ignored.
scheduler ([`SchedulerMixin`]):
A scheduler to be used in combination with `transformer` to denoise the encoded image latents.
"""

View File

@@ -185,6 +185,26 @@ def retrieve_timesteps(
class PixArtSigmaPipeline(DiffusionPipeline):
r"""
Pipeline for text-to-image generation using PixArt-Sigma.
This model inherits from [`DiffusionPipeline`]. Check the superclass documentation for the generic methods the
library implements for all the pipelines (such as downloading or saving, running on a particular device, etc.)
Args:
vae ([`AutoencoderKL`]):
Variational Auto-Encoder (VAE) Model to encode and decode images to and from latent representations.
text_encoder ([`T5EncoderModel`]):
Frozen text-encoder. PixArt-Alpha uses
[T5](https://huggingface.co/docs/transformers/model_doc/t5#transformers.T5EncoderModel), specifically the
[t5-v1_1-xxl](https://huggingface.co/PixArt-alpha/PixArt-alpha/tree/main/t5-v1_1-xxl) variant.
tokenizer (`T5Tokenizer`):
Tokenizer of class
[T5Tokenizer](https://huggingface.co/docs/transformers/model_doc/t5#transformers.T5Tokenizer).
transformer ([`PixArtTransformer2DModel`]):
A text conditioned `PixArtTransformer2DModel` to denoise the encoded image latents. Initially published as
[`Transformer2DModel`](https://huggingface.co/PixArt-alpha/PixArt-Sigma-XL-2-1024-MS/blob/main/transformer/config.json#L2)
in the config, but the mismatch can be ignored.
scheduler ([`SchedulerMixin`]):
A scheduler to be used in combination with `transformer` to denoise the encoded image latents.
"""
bad_punct_regex = re.compile(