mirror of
https://github.com/huggingface/diffusers.git
synced 2026-01-29 07:22:12 +03:00
* Add ZImageImg2ImgPipeline
Updated the pipeline structure to include ZImageImg2ImgPipeline
alongside ZImagePipeline.
Implemented the ZImageImg2ImgPipeline class for image-to-image
transformations, including necessary methods for
encoding prompts, preparing latents, and denoising.
Enhanced the auto_pipeline to map the new ZImageImg2ImgPipeline
for image generation tasks.
Added unit tests for ZImageImg2ImgPipeline to ensure
functionality and performance.
Updated dummy objects to include ZImageImg2ImgPipeline for
testing purposes.
* Address review comments for ZImageImg2ImgPipeline
- Add `# Copied from` annotations to encode_prompt and _encode_prompt
- Add ZImagePipeline to auto_pipeline.py for AutoPipeline support
* Add ZImage pipeline documentation
---------
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>
67 lines
2.3 KiB
Markdown
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Z-Image

<div class="flex flex-wrap space-x-1">
  <img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
</div>

[Z-Image](https://huggingface.co/papers/2511.22699) is a powerful and highly efficient 6B-parameter image generation model. Currently only one model is available, with two more to be released:

| Model | Hugging Face |
|---|---|
| Z-Image-Turbo | https://huggingface.co/Tongyi-MAI/Z-Image-Turbo |

## Z-Image-Turbo

Z-Image-Turbo is a distilled version of Z-Image that matches or exceeds leading competitors with only 8 NFEs (Number of Function Evaluations). It offers sub-second inference latency on enterprise-grade H800 GPUs and fits comfortably within 16 GB of VRAM on consumer devices. It excels at photorealistic image generation, bilingual text rendering (English and Chinese), and robust instruction following.

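Text-to-image generation with [`ZImagePipeline`] follows the standard diffusers pattern. The snippet below is a minimal sketch, not an official example: the prompt and seed are illustrative, and the step and guidance settings assume the few-step, guidance-free configuration typical of distilled models.

```python
import torch
from diffusers import ZImagePipeline

# Load the Turbo checkpoint in bfloat16 and move it to the GPU.
pipe = ZImagePipeline.from_pretrained("Tongyi-MAI/Z-Image-Turbo", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Turbo is distilled, so it runs with few steps and no classifier-free guidance.
prompt = "A photorealistic portrait of a red fox in a snowy forest"
image = pipe(
    prompt,
    num_inference_steps=9,
    guidance_scale=0.0,
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]
image.save("zimage_t2i.png")
```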
## Image-to-image

Use [`ZImageImg2ImgPipeline`] to transform an existing image based on a text prompt.

```python
import torch
from diffusers import ZImageImg2ImgPipeline
from diffusers.utils import load_image

pipe = ZImageImg2ImgPipeline.from_pretrained("Tongyi-MAI/Z-Image-Turbo", torch_dtype=torch.bfloat16)
pipe.to("cuda")

url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
init_image = load_image(url).resize((1024, 1024))

prompt = "A fantasy landscape with mountains and a river, detailed, vibrant colors"
image = pipe(
    prompt,
    image=init_image,
    strength=0.6,
    num_inference_steps=9,
    guidance_scale=0.0,
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]
image.save("zimage_img2img.png")
```

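The `strength` argument controls how much of the input image is preserved: in typical diffusers image-to-image pipelines, only the last `int(num_inference_steps * strength)` scheduled steps actually run, starting from a partially noised version of the input. The sketch below illustrates that common convention; it is an assumption about how `ZImageImg2ImgPipeline` behaves, not an excerpt of its source.

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    # In many diffusers img2img pipelines, the schedule is truncated so that
    # only the final int(num_inference_steps * strength) steps are executed.
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return num_inference_steps - t_start

# With the values from the example above, strength=0.6 of 9 steps:
print(effective_steps(9, 0.6))  # 5 denoising steps actually run
```

A higher `strength` noises the input more heavily and runs more steps, producing a result further from the original image; `strength=1.0` ignores the input content entirely.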
## ZImagePipeline

[[autodoc]] ZImagePipeline
- all
- __call__

## ZImageImg2ImgPipeline

[[autodoc]] ZImageImg2ImgPipeline
- all
- __call__