# PRX

PRX generates high-quality images from text using a simplified MMDiT architecture in which text tokens are not updated through the transformer blocks. It employs flow matching with discrete scheduling for efficient sampling and uses Google's T5Gemma-2B-2B-UL2 model for multi-language text encoding. The ~1.3B parameter transformer delivers fast inference without sacrificing quality. You can choose between the Flux VAE (8x compression, 16 latent channels) for balanced quality and speed, or DC-AE (32x compression, 32 latent channels) for higher compression and faster processing.

## Available models

PRX offers multiple variants with different VAE configurations, each optimized for specific resolutions. Base models excel with detailed prompts, capturing complex compositions and subtle details. Fine-tuned models trained on the [Alchemist dataset](https://huggingface.co/datasets/yandex/alchemist) improve aesthetic quality, especially with simpler prompts.

| Model | Resolution | Fine-tuned | Distilled | Description | Suggested prompts | Suggested parameters | Recommended dtype |
|:-----:|:----------:|:----------:|:---------:|:-----------:|:-----------------:|:--------------------:|:-----------------:|
| [`Photoroom/prx-256-t2i`](https://huggingface.co/Photoroom/prx-256-t2i) | 256 | No | No | Base model pre-trained at 256 with Flux VAE | Works best with detailed prompts in natural language | 28 steps, cfg=5.0 | `torch.bfloat16` |
| [`Photoroom/prx-256-t2i-sft`](https://huggingface.co/Photoroom/prx-256-t2i-sft) | 256 | Yes | No | Fine-tuned on the [Alchemist dataset](https://huggingface.co/datasets/yandex/alchemist) with Flux VAE | Can handle less detailed prompts | 28 steps, cfg=5.0 | `torch.bfloat16` |
| [`Photoroom/prx-512-t2i`](https://huggingface.co/Photoroom/prx-512-t2i) | 512 | No | No | Base model pre-trained at 512 with Flux VAE | Works best with detailed prompts in natural language | 28 steps, cfg=5.0 | `torch.bfloat16` |
| [`Photoroom/prx-512-t2i-sft`](https://huggingface.co/Photoroom/prx-512-t2i-sft) | 512 | Yes | No | Fine-tuned on the [Alchemist dataset](https://huggingface.co/datasets/yandex/alchemist) with Flux VAE | Can handle less detailed prompts in natural language | 28 steps, cfg=5.0 | `torch.bfloat16` |
| [`Photoroom/prx-512-t2i-sft-distilled`](https://huggingface.co/Photoroom/prx-512-t2i-sft-distilled) | 512 | Yes | Yes | 8-step distilled model from [`Photoroom/prx-512-t2i-sft`](https://huggingface.co/Photoroom/prx-512-t2i-sft) | Can handle less detailed prompts in natural language | 8 steps, cfg=1.0 | `torch.bfloat16` |
| [`Photoroom/prx-512-t2i-dc-ae`](https://huggingface.co/Photoroom/prx-512-t2i-dc-ae) | 512 | No | No | Base model pre-trained at 512 with [Deep Compression Autoencoder (DC-AE)](https://hanlab.mit.edu/projects/dc-ae) | Works best with detailed prompts in natural language | 28 steps, cfg=5.0 | `torch.bfloat16` |
| [`Photoroom/prx-512-t2i-dc-ae-sft`](https://huggingface.co/Photoroom/prx-512-t2i-dc-ae-sft) | 512 | Yes | No | Fine-tuned on the [Alchemist dataset](https://huggingface.co/datasets/yandex/alchemist) with [Deep Compression Autoencoder (DC-AE)](https://hanlab.mit.edu/projects/dc-ae) | Can handle less detailed prompts in natural language | 28 steps, cfg=5.0 | `torch.bfloat16` |
| [`Photoroom/prx-512-t2i-dc-ae-sft-distilled`](https://huggingface.co/Photoroom/prx-512-t2i-dc-ae-sft-distilled) | 512 | Yes | Yes | 8-step distilled model from [`Photoroom/prx-512-t2i-dc-ae-sft`](https://huggingface.co/Photoroom/prx-512-t2i-dc-ae-sft) | Can handle less detailed prompts in natural language | 8 steps, cfg=1.0 | `torch.bfloat16` |
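For example, the distilled variants are meant to be sampled with the lighter settings from the table. A minimal sketch using the same pipeline API shown in the loading section below, with the model name and parameters taken from the table (the prompt and output filename are only illustrative):

```py
import torch
from diffusers.pipelines.prx import PRXPipeline

# Distilled checkpoints are sampled with only 8 steps and no classifier-free
# guidance (guidance_scale=1.0), per the suggested parameters above
pipe = PRXPipeline.from_pretrained(
    "Photoroom/prx-512-t2i-sft-distilled", torch_dtype=torch.bfloat16
)
pipe.to("cuda")

prompt = "A front-facing portrait of a lion in the golden savanna at sunset."
image = pipe(prompt, num_inference_steps=8, guidance_scale=1.0).images[0]
image.save("prx_distilled_output.png")
```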
Refer to [this collection](https://huggingface.co/collections/Photoroom/prx-models-68e66254c202ebfab99ad38e) for more information.

## Loading the pipeline

Load the pipeline with [`~DiffusionPipeline.from_pretrained`].

```py
import torch
from diffusers.pipelines.prx import PRXPipeline

# Load pipeline - the VAE and text encoder are downloaded from the Hugging Face Hub
pipe = PRXPipeline.from_pretrained("Photoroom/prx-512-t2i-sft", torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = "A front-facing portrait of a lion in the golden savanna at sunset."
image = pipe(prompt, num_inference_steps=28, guidance_scale=5.0).images[0]
image.save("prx_output.png")
```

### Manual component loading

Load components individually to customize the pipeline, for instance to use quantized models.

```py
import torch
from diffusers.pipelines.prx import PRXPipeline
from diffusers.models import AutoencoderKL, AutoencoderDC
from diffusers.models.transformers.transformer_prx import PRXTransformer2DModel
from diffusers.schedulers import FlowMatchEulerDiscreteScheduler
from transformers import T5GemmaModel, GemmaTokenizerFast
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig

# Use the diffusers config for diffusers models (transformer, VAE) and the
# transformers config for the text encoder
quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
text_encoder_quant_config = TransformersBitsAndBytesConfig(load_in_8bit=True)

# Load transformer
transformer = PRXTransformer2DModel.from_pretrained(
    "Photoroom/prx-512-t2i-sft",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)

# Load scheduler
scheduler = FlowMatchEulerDiscreteScheduler.from_pretrained(
    "Photoroom/prx-512-t2i-sft", subfolder="scheduler"
)

# Load T5Gemma text encoder (only the encoder half is used; note that `.to(dtype=...)`
# is not supported on 8-bit quantized models, so the dtype is set at load time)
t5gemma_model = T5GemmaModel.from_pretrained(
    "google/t5gemma-2b-2b-ul2",
    quantization_config=text_encoder_quant_config,
    torch_dtype=torch.bfloat16,
)
text_encoder = t5gemma_model.encoder
tokenizer = GemmaTokenizerFast.from_pretrained("google/t5gemma-2b-2b-ul2")
tokenizer.model_max_length = 256

# Load VAE - choose either Flux VAE or DC-AE
# Flux VAE
vae = AutoencoderKL.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="vae",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
# DC-AE, for the `-dc-ae` checkpoints (f32c32 variant, matching the 32x compression,
# 32 latent channels described above):
# vae = AutoencoderDC.from_pretrained("mit-han-lab/dc-ae-f32c32-sana-1.0-diffusers", torch_dtype=torch.bfloat16)

pipe = PRXPipeline(
    transformer=transformer,
    scheduler=scheduler,
    text_encoder=text_encoder,
    tokenizer=tokenizer,
    vae=vae,
)
pipe.to("cuda")
```

## Memory optimization

For memory-constrained environments:

```py
import torch
from diffusers.pipelines.prx import PRXPipeline

pipe = PRXPipeline.from_pretrained("Photoroom/prx-512-t2i-sft", torch_dtype=torch.bfloat16)

# Offload components to the CPU when they are not in use
pipe.enable_model_cpu_offload()

# Or, for even lower memory at the cost of speed, use sequential CPU offload
# instead (enable only one of the two on a given pipeline):
# pipe.enable_sequential_cpu_offload()
```

## PRXPipeline

[[autodoc]] PRXPipeline
  - all
  - __call__

## PRXPipelineOutput

[[autodoc]] pipelines.prx.pipeline_output.PRXPipelineOutput