mirror of https://github.com/huggingface/diffusers.git synced 2026-01-27 17:22:53 +03:00

Files

Steven Liu cc5b31ffc9 [docs] Migrate syntax (#12390 )

* change syntax

* make style

2025-09-30 10:11:19 -07:00

4.4 KiB

Raw Permalink Blame History

Lumina2

Lumina Image 2.0: A Unified and Efficient Image Generative Model is a 2 billion parameter flow-based diffusion transformer capable of generating diverse images from text descriptions.

The abstract from the paper is:

We introduce Lumina-Image 2.0, an advanced text-to-image model that surpasses previous state-of-the-art methods across multiple benchmarks, while also shedding light on its potential to evolve into a generalist vision intelligence model. Lumina-Image 2.0 exhibits three key properties: (1) Unification – it adopts a unified architecture that treats text and image tokens as a joint sequence, enabling natural cross-modal interactions and facilitating task expansion. Besides, since high-quality captioners can provide semantically better-aligned text-image training pairs, we introduce a unified captioning system, UniCaptioner, which generates comprehensive and precise captions for the model. This not only accelerates model convergence but also enhances prompt adherence, variable-length prompt handling, and task generalization via prompt templates. (2) Efficiency – to improve the efficiency of the unified architecture, we develop a set of optimization techniques that improve semantic learning and fine-grained texture generation during training while incorporating inference-time acceleration strategies without compromising image quality. (3) Transparency – we open-source all training details, code, and models to ensure full reproducibility, aiming to bridge the gap between well-resourced closed-source research teams and independent developers.

Tip

Make sure to check out the Schedulers guide to learn how to explore the tradeoff between scheduler speed and quality, and see the reuse components across pipelines section to learn how to efficiently load the same components into multiple pipelines.

Using Single File loading with Lumina Image 2.0

Single file loading for Lumina Image 2.0 is available for the Lumina2Transformer2DModel

import torch
from diffusers import Lumina2Transformer2DModel, Lumina2Pipeline

ckpt_path = "https://huggingface.co/Alpha-VLLM/Lumina-Image-2.0/blob/main/consolidated.00-of-01.pth"
transformer = Lumina2Transformer2DModel.from_single_file(
    ckpt_path, torch_dtype=torch.bfloat16
)

pipe = Lumina2Pipeline.from_pretrained(
    "Alpha-VLLM/Lumina-Image-2.0", transformer=transformer, torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()
image = pipe(
    "a cat holding a sign that says hello",
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("lumina-single-file.png")

Using GGUF Quantized Checkpoints with Lumina Image 2.0

GGUF Quantized checkpoints for the Lumina2Transformer2DModel can be loaded via from_single_file with the GGUFQuantizationConfig

from diffusers import Lumina2Transformer2DModel, Lumina2Pipeline, GGUFQuantizationConfig 

ckpt_path = "https://huggingface.co/calcuis/lumina-gguf/blob/main/lumina2-q4_0.gguf"
transformer = Lumina2Transformer2DModel.from_single_file(
    ckpt_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)

pipe = Lumina2Pipeline.from_pretrained(
    "Alpha-VLLM/Lumina-Image-2.0", transformer=transformer, torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()
image = pipe(
    "a cat holding a sign that says hello",
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("lumina-gguf.png")

Lumina2Pipeline

autodoc Lumina2Pipeline

all
call

4.4 KiB Raw Permalink Blame History Unescape Escape

Lumina2

Using Single File loading with Lumina Image 2.0

Using GGUF Quantized Checkpoints with Lumina Image 2.0

Lumina2Pipeline

4.4 KiB

Raw Permalink Blame History