<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# T2I-Adapter

[T2I-Adapter](https://huggingface.co/papers/2302.08453) is an adapter that enables controllable generation, similar to [ControlNet](./controlnet). A T2I-Adapter works by learning a *mapping* between a control signal (for example, a depth map) and a pretrained model's internal knowledge. The adapter is plugged into the base model to provide extra guidance based on the control signal during generation.

Load a T2I-Adapter conditioned on a specific control, such as canny edge, and pass it to the pipeline in [`~DiffusionPipeline.from_pretrained`].

```py
import torch
from diffusers import T2IAdapter, StableDiffusionXLAdapterPipeline, AutoencoderKL

# Load the canny edge-conditioned adapter weights in half precision
t2i_adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2i-adapter-canny-sdxl-1.0",
    torch_dtype=torch.float16,
)
```
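
The loading pattern is the same for other control types; only the checkpoint changes. As a minimal sketch, the depth-conditioned adapter used in the MultiAdapter section below is loaded the same way:

```py
import torch
from diffusers import T2IAdapter

# Sketch: swap the checkpoint to condition on depth maps instead of canny edges
depth_adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2i-adapter-depth-midas-sdxl-1.0",
    torch_dtype=torch.float16,
)
```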

Generate a canny image with [opencv-python](https://github.com/opencv/opencv-python).

```py
import cv2
import numpy as np
from PIL import Image
from diffusers.utils import load_image

original_image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/non-enhanced-prompt.png"
)

image = np.array(original_image)

# Canny hysteresis thresholds; pixels with gradients between the two values are
# kept as edges only if they connect to a strong edge
low_threshold = 100
high_threshold = 200

image = cv2.Canny(image, low_threshold, high_threshold)
# Replicate the single-channel edge map into three channels for the pipeline
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image)
```
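
Lowering the thresholds keeps more edges, which constrains generation more tightly; raising them keeps only the strongest edges. If you preprocess several control images, the steps above can be wrapped in a small helper. This is a minimal sketch, assuming `original_image` from the previous snippet; `make_canny` is an illustrative name, not part of diffusers or opencv-python.

```py
import cv2
import numpy as np
from PIL import Image

def make_canny(image, low_threshold=100, high_threshold=200):
    """Return a 3-channel canny edge map of a PIL image, ready for the pipeline."""
    edges = cv2.Canny(np.array(image), low_threshold, high_threshold)
    # Stack the single-channel edge map into an (H, W, 3) array
    return Image.fromarray(np.stack([edges] * 3, axis=2))

canny_image = make_canny(original_image)
```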

Pass the canny image to the pipeline to generate an image.

```py
# Swap in a VAE that is numerically stable in float16
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipeline = StableDiffusionXLAdapterPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    adapter=t2i_adapter,
    vae=vae,
    torch_dtype=torch.float16,
).to("cuda")

prompt = """
A photorealistic overhead image of a cat reclining sideways in a flamingo pool floatie holding a margarita.
The cat is floating leisurely in the pool and completely relaxed and happy.
"""

pipeline(
    prompt,
    image=canny_image,
    num_inference_steps=100,
    guidance_scale=10,
).images[0]
```
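
The call above produces a different image on each run. For reproducible outputs, you can pass a seeded `torch.Generator` through the pipeline's `generator` argument, as in this sketch (the seed and file name are arbitrary).

```py
generator = torch.Generator(device="cuda").manual_seed(0)  # arbitrary seed
image = pipeline(
    prompt,
    image=canny_image,
    num_inference_steps=100,
    guidance_scale=10,
    generator=generator,
).images[0]
image.save("t2i_canny_cat.png")
```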

<div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
<figure>
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/non-enhanced-prompt.png" width="300" alt="Original input image"/>
<figcaption style="text-align: center;">original image</figcaption>
</figure>
<figure>
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/canny-cat.png" width="300" alt="Control image (Canny edges)"/>
<figcaption style="text-align: center;">canny image</figcaption>
</figure>
<figure>
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/t2i-canny-cat-generated.png" width="300" alt="Generated image (T2I-Adapter + prompt)"/>
<figcaption style="text-align: center;">generated image</figcaption>
</figure>
</div>

## MultiAdapter

You can compose multiple controls, such as a canny image and a depth map, with the [`MultiAdapter`] class. The example below combines these two controls.

Load the control images and T2I-Adapters as a list.

```py
import torch
from diffusers.utils import load_image
from diffusers import StableDiffusionXLAdapterPipeline, AutoencoderKL, MultiAdapter, T2IAdapter

canny_image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/canny-cat.png"
)
depth_image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sdxl_depth_image.png"
)
# The control images must be in the same order as the adapters below
controls = [canny_image, depth_image]
prompt = ["""
a relaxed rabbit sitting on a striped towel next to a pool with a tropical drink nearby,
bright sunny day, vacation scene, 35mm photograph, film, professional, 4k, highly detailed
"""]

adapters = MultiAdapter(
    [
        T2IAdapter.from_pretrained("TencentARC/t2i-adapter-canny-sdxl-1.0", torch_dtype=torch.float16),
        T2IAdapter.from_pretrained("TencentARC/t2i-adapter-depth-midas-sdxl-1.0", torch_dtype=torch.float16),
    ]
)
```

Pass the adapters, prompt, and control images to [`StableDiffusionXLAdapterPipeline`]. Use the `adapter_conditioning_scale` parameter to determine how much weight to assign to each control.

```py
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipeline = StableDiffusionXLAdapterPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    vae=vae,
    adapter=adapters,
).to("cuda")

pipeline(
    prompt,
    image=controls,
    height=1024,
    width=1024,
    # One weight per control, in the same order as the adapters
    adapter_conditioning_scale=[0.7, 0.7],
).images[0]
```
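
Because `adapter_conditioning_scale` takes one weight per control, the controls can also be weighted unequally. A sketch with arbitrary example weights that favor the canny edges over the depth map:

```py
image = pipeline(
    prompt,
    image=controls,
    height=1024,
    width=1024,
    adapter_conditioning_scale=[0.8, 0.4],  # arbitrary example: emphasize canny over depth
).images[0]
```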

<div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
<figure>
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/canny-cat.png" width="300" alt="Control image (Canny edges)"/>
<figcaption style="text-align: center;">canny image</figcaption>
</figure>
<figure>
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sdxl_depth_image.png" width="300" alt="Control image (depth map)"/>
<figcaption style="text-align: center;">depth map</figcaption>
</figure>
<figure>
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/t2i-multi-rabbit.png" width="300" alt="Generated image (MultiAdapter + prompt)"/>
<figcaption style="text-align: center;">generated image</figcaption>
</figure>
</div>