mirror of
https://github.com/huggingface/diffusers.git
synced 2026-01-27 17:22:53 +03:00
* Fix typos * chore: Fix typos * chore: Update README.md for promptdiffusion example * Trim trailing white spaces * Fix a typo * update number * chore: update number * Trim trailing white space * Update README.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update README.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
122 lines
6.2 KiB
Markdown
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Stable Video Diffusion

[[open-in-colab]]

[Stable Video Diffusion (SVD)](https://huggingface.co/papers/2311.15127) is a powerful image-to-video generation model that can generate 2-4 second high-resolution (576x1024) videos conditioned on an input image.

This guide shows you how to use SVD to generate short videos from images.

Before you begin, make sure you have the following libraries installed:
```py
!pip install -q -U diffusers transformers accelerate
```
There are two variants of this model, [SVD](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid) and [SVD-XT](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt). The SVD checkpoint is trained to generate 14 frames, and the SVD-XT checkpoint is further fine-tuned to generate 25 frames.
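The clip length follows directly from the frame count and the playback frame rate. As a quick sanity check of the 2-4 second figure, the minimal sketch below divides each checkpoint's frame count by a playback rate of 7 fps, the rate used with `export_to_video` later in this guide. The helper `clip_seconds` is a hypothetical name used only for illustration.

```python
def clip_seconds(num_frames: int, fps: float) -> float:
    """Playback length in seconds of num_frames frames shown at fps frames per second."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


# Frame counts from the checkpoint descriptions above.
CHECKPOINT_FRAMES = {"SVD": 14, "SVD-XT": 25}

for name, num_frames in CHECKPOINT_FRAMES.items():
    # Assumes playback at 7 fps; a different export rate changes the duration.
    print(f"{name}: {num_frames} frames -> {clip_seconds(num_frames, 7):.2f} s")
# SVD: 14 frames -> 2.00 s
# SVD-XT: 25 frames -> 3.57 s
```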
This guide uses the SVD-XT checkpoint.
```python
import torch

from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt", torch_dtype=torch.float16, variant="fp16"
)
pipe.enable_model_cpu_offload()

# Load the conditioning image and resize it to 1024x576 (PIL's resize takes (width, height))
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket.png")
image = image.resize((1024, 576))

generator = torch.manual_seed(42)
frames = pipe(image, decode_chunk_size=8, generator=generator).frames[0]

export_to_video(frames, "generated.mp4", fps=7)
```
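`image.resize((1024, 576))` simply stretches the image, so a source that isn't 16:9 gets distorted. If that matters for your input, one option (an assumption on our part, not something this guide requires) is to center-crop to the 1024:576 aspect ratio first. The hypothetical helper `center_crop_box` below computes a crop box in the `(left, upper, right, lower)` order that PIL's `Image.crop` expects.

```python
def center_crop_box(width: int, height: int, target_w: int = 1024, target_h: int = 576):
    """Largest centered (left, upper, right, lower) box with the target aspect ratio."""
    target_ratio = target_w / target_h
    if width / height > target_ratio:
        # Source is wider than the target: trim the left and right edges.
        new_w = round(height * target_ratio)
        left = (width - new_w) // 2
        return (left, 0, left + new_w, height)
    # Source is taller than (or equal to) the target: trim the top and bottom edges.
    new_h = round(width / target_ratio)
    top = (height - new_h) // 2
    return (0, top, width, top + new_h)


print(center_crop_box(1024, 1024))  # (0, 224, 1024, 800): square source, top and bottom trimmed
print(center_crop_box(1920, 1080))  # (0, 0, 1920, 1080): already 16:9, nothing trimmed
```

With the image from this guide, that would look like `image = image.crop(center_crop_box(*image.size)).resize((1024, 576))`, since `image.size` is `(width, height)` for PIL images.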
<div class="flex gap-4">
  <div>
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket.png"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">"source image of a rocket"</figcaption>
  </div>
  <div>
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/output_rocket.gif"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">"generated video from source image"</figcaption>
  </div>
</div>

## torch.compile
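[Compiling](../optimization/torch2.0#torchcompile) the UNet gives a 20-25% speedup at the cost of slightly higher memory usage. The change below replaces model CPU offloading with moving the whole pipeline to the GPU before compiling the UNet: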
```diff
- pipe.enable_model_cpu_offload()
+ pipe.to("cuda")
+ pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
```
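Because `torch.compile` compiles the UNet during its first calls (and `mode="reduce-overhead"` additionally captures CUDA graphs), timing the very first generation mostly measures compilation. A minimal timing sketch, assuming you compare an uncompiled and a compiled pipeline on identical inputs: it discards a few warm-up calls, reports the median of the remaining runs, and expresses the gain as a latency reduction. `benchmark` and `speedup_percent` are hypothetical helpers, not diffusers APIs.

```python
import statistics
import time
from typing import Callable


def benchmark(fn: Callable[[], object], warmup: int = 2, runs: int = 3) -> float:
    """Median wall-clock time in seconds of fn(), measured after `warmup` untimed calls."""
    if runs < 1:
        raise ValueError("runs must be >= 1")
    for _ in range(warmup):
        fn()  # untimed: with torch.compile these calls include compilation
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()  # if fn only launches GPU work, synchronize inside fn before returning
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def speedup_percent(baseline_s: float, compiled_s: float) -> float:
    """Latency reduction of the compiled run relative to the baseline, in percent."""
    return (baseline_s - compiled_s) / baseline_s * 100.0
```

For example, wrap the generation call as `lambda: pipe(image, decode_chunk_size=8, generator=torch.manual_seed(42))` and measure it once before and once after compiling. To isolate the effect of compilation, measure the baseline with `pipe.to("cuda")` as well rather than with CPU offloading enabled.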
## Reduce memory usage

Video generation is very memory-intensive because you are essentially generating all `num_frames` frames at once, similar to text-to-image generation with a large batch size. To reduce memory usage, there are several options that trade inference speed for a lower memory requirement:

- Enable model offloading: each component of the pipeline is offloaded to the CPU once it is no longer needed.
- Enable feed-forward chunking: the feed-forward layer runs in a loop over smaller chunks instead of running a single feed-forward pass with a huge batch size.
- Reduce `decode_chunk_size`: the VAE decodes frames in chunks instead of all at once. Setting `decode_chunk_size=1` decodes one frame at a time and uses the least memory (we recommend adjusting this value to fit your GPU memory), but the video may show some flickering.
```diff
- pipe.enable_model_cpu_offload()
- frames = pipe(image, decode_chunk_size=8, generator=generator).frames[0]
+ pipe.enable_model_cpu_offload()
+ pipe.unet.enable_forward_chunking()
+ frames = pipe(image, decode_chunk_size=2, generator=generator, num_frames=25).frames[0]
```
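Feed-forward chunking and `decode_chunk_size` rely on the same idea: process a large batch as a sequence of smaller slices so that only one slice's intermediate activations are alive at a time. The toy, pure-Python sketch below is not the diffusers implementation, and `chunked_apply`, `num_chunks`, and `toy_feed_forward` are hypothetical names. It shows that, for a layer that transforms every element independently (as a feed-forward layer does per position), chunking leaves the output unchanged, and it counts how many passes each chunk size needs for 25 frames. The flickering mentioned for very small `decode_chunk_size` values suggests the video VAE decoder is not frame-independent in the same way, so unlike this toy layer its output can change with the chunk size.

```python
import math
from typing import Callable, List, Sequence


def chunked_apply(fn: Callable[[Sequence[int]], List[int]], items: Sequence[int], chunk_size: int) -> List[int]:
    """Apply fn to consecutive slices of at most chunk_size items and concatenate the results."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    out: List[int] = []
    for start in range(0, len(items), chunk_size):
        out.extend(fn(items[start:start + chunk_size]))
    return out


def num_chunks(num_frames: int, chunk_size: int) -> int:
    """Number of slices (passes) needed to cover num_frames items."""
    return math.ceil(num_frames / chunk_size)


def toy_feed_forward(batch: Sequence[int]) -> List[int]:
    """Stand-in for a position-wise layer: each element is transformed independently."""
    return [2 * x + 1 for x in batch]


toy_frames = list(range(25))  # 25 frames, as generated by SVD-XT

# Chunked and unchunked results are identical for a position-wise layer.
assert chunked_apply(toy_feed_forward, toy_frames, 2) == toy_feed_forward(toy_frames)

# Record the slice sizes: each pass holds at most chunk_size frames.
slice_sizes: List[int] = []


def recording_feed_forward(batch: Sequence[int]) -> List[int]:
    slice_sizes.append(len(batch))
    return toy_feed_forward(batch)


chunked_apply(recording_feed_forward, toy_frames, 8)
print(slice_sizes)  # [8, 8, 8, 1]

# Passes needed for decode_chunk_size=8 (first example), 2 (diff above), and 1.
print(num_chunks(25, 8), num_chunks(25, 2), num_chunks(25, 1))  # 4 13 25
```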
Using all of these techniques together should reduce memory usage to less than 8GB of VRAM.

## Micro-conditioning
In addition to the conditioning image, Stable Video Diffusion also accepts micro-conditioning, which gives you more control over the generated video:

- `fps`: the frames per second of the generated video.
- `motion_bucket_id`: the motion bucket ID to use for the generated video. It controls the motion of the generated video; increasing the motion bucket ID increases the motion.
- `noise_aug_strength`: the amount of noise added to the conditioning image. The higher the value, the less the video resembles the conditioning image. Increasing this value also increases the motion of the generated video.

For example, to generate a video with more motion, use the `motion_bucket_id` and `noise_aug_strength` micro-conditioning parameters:
```python
import torch

from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt", torch_dtype=torch.float16, variant="fp16"
)
pipe.enable_model_cpu_offload()

# Load the conditioning image
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket.png")
image = image.resize((1024, 576))

generator = torch.manual_seed(42)
# Raise motion_bucket_id and noise_aug_strength for more motion
frames = pipe(image, decode_chunk_size=8, generator=generator, motion_bucket_id=180, noise_aug_strength=0.1).frames[0]
export_to_video(frames, "generated.mp4", fps=7)
```
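Since `motion_bucket_id` and `noise_aug_strength` both increase the amount of motion, it can help to compare a few combinations side by side with the same seed. A minimal sketch that builds one set of keyword arguments per combination, plus a file name recording the settings. `sweep_kwargs` and `output_name` are hypothetical helpers, and the baseline values (`fps=7`, `motion_bucket_id=127`, `noise_aug_strength=0.02`) are assumed to match the pipeline's defaults; confirm them in the API reference for your installed diffusers version.

```python
from itertools import product
from typing import Dict, Iterable, List, Union

Number = Union[int, float]

# Assumed pipeline defaults; confirm against your diffusers version.
BASE_KWARGS: Dict[str, Number] = {"fps": 7, "motion_bucket_id": 127, "noise_aug_strength": 0.02}


def sweep_kwargs(
    motion_bucket_ids: Iterable[int],
    noise_aug_strengths: Iterable[float],
    base: Dict[str, Number] = BASE_KWARGS,
) -> List[Dict[str, Number]]:
    """One kwargs dict per (motion_bucket_id, noise_aug_strength) pair; base is copied, never mutated."""
    runs = []
    for bucket, noise in product(motion_bucket_ids, noise_aug_strengths):
        kwargs = dict(base)
        kwargs.update(motion_bucket_id=bucket, noise_aug_strength=noise)
        runs.append(kwargs)
    return runs


def output_name(kwargs: Dict[str, Number]) -> str:
    """File name that records the micro-conditioning settings of a run."""
    return f"svd_mb{kwargs['motion_bucket_id']}_noise{kwargs['noise_aug_strength']}.mp4"


for kw in sweep_kwargs([127, 180], [0.02, 0.1]):
    print(output_name(kw))
    # With the pipeline and image from the example above (not run here):
    # frames = pipe(image, decode_chunk_size=8, generator=torch.manual_seed(42), **kw).frames[0]
    # export_to_video(frames, output_name(kw), fps=kw["fps"])
```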
