From 34a74928ac6617da857157af554cbd9e863f67bb Mon Sep 17 00:00:00 2001
From: David Bertoin
Date: Thu, 16 Oct 2025 10:48:24 +0200
Subject: [PATCH] Update docs/source/en/api/pipelines/photon.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---
 docs/source/en/api/pipelines/photon.md | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/docs/source/en/api/pipelines/photon.md b/docs/source/en/api/pipelines/photon.md
index 62133f93c4..a46e9a4fc5 100644
--- a/docs/source/en/api/pipelines/photon.md
+++ b/docs/source/en/api/pipelines/photon.md
@@ -15,15 +15,7 @@
 # PhotonPipeline
 
-Photon is a text-to-image diffusion model using simplified MMDIT architecture with flow matching for efficient high-quality image generation. The model uses T5Gemma as the text encoder and supports either Flux VAE (AutoencoderKL) or DC-AE (AutoencoderDC) for latent compression.
-
-Key features:
-
-- **Simplified MMDIT architecture**: Uses a simplified MMDIT architecture for image generation where text tokens are not updated through the transformer blocks
-- **Flow Matching**: Employs flow matching with discrete scheduling for efficient sampling
-- **Flexible VAE Support**: Compatible with both Flux VAE (8x compression, 16 latent channels) and DC-AE (32x compression, 32 latent channels)
-- **T5Gemma Text Encoder**: Uses Google's T5Gemma-2B-2B-UL2 model for text encoding offering multiple language support
-- **Efficient Architecture**: ~1.3B parameters in the transformer, enabling fast inference while maintaining quality
+Photon generates high-quality images from text using a simplified MMDIT architecture in which text tokens are not updated through the transformer blocks. It employs flow matching with discrete scheduling for efficient sampling and uses Google's T5Gemma-2B-2B-UL2 model for multi-language text encoding. The ~1.3B parameter transformer delivers fast inference without sacrificing quality. You can choose between Flux VAE (8x compression, 16 latent channels) for balanced quality and speed or DC-AE (32x compression, 32 latent channels) for more aggressive compression and faster processing.
 
 ## Available models:
 
 We offer a range of **Photon models** featuring different **VAE configurations**, each optimized for generating images at various resolutions.
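
The patch itself does not include a usage snippet, so for context, below is a minimal sketch of how a text-to-image pipeline like this is typically invoked in diffusers. The checkpoint id `Photoroom/photon-512` and the sampler settings are placeholders, not taken from the patch; only the `PhotonPipeline` class name comes from the documented page.

```py
import torch
from diffusers import PhotonPipeline

# Hypothetical checkpoint id -- the patch does not name a model repository.
pipe = PhotonPipeline.from_pretrained(
    "Photoroom/photon-512", torch_dtype=torch.bfloat16
)
pipe.to("cuda")

# Standard diffusers text-to-image call; the step count is illustrative.
image = pipe(
    "A lighthouse on a cliff at sunset, photorealistic",
    num_inference_steps=28,
).images[0]
image.save("photon.png")
```

Which VAE the loaded checkpoint uses (Flux VAE or DC-AE) is determined by the checkpoint's own configuration, so the calling code above stays the same for either variant.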