mirror of
https://github.com/huggingface/diffusers.git
[Docs] Korean translation update (#4022)
* feat) optimization kr translation
* fix) typo, italic setting
* feat) dreambooth, text2image kr
* feat) lora kr
* fix) LoRA
* fix) fp16 fix
* fix) doc-builder style
* fix) fp16: fix some wording
* fix) fp16 style fix
* fix) opt, training docs update
* merge conflict
* Fix community pipelines (#3266)
* Allow disabling torch 2_0 attention (#3273)
* Allow disabling torch 2_0 attention
* make style
* Update src/diffusers/models/attention.py
* Release: v0.16.1
* feat) toctree update
* feat) toctree update
* Fix custom releases (#3708)
* Fix custom releases
* make style
* Fix loading if unexpected keys are present (#3720)
* Fix loading
* make style
* Release: v0.17.0
* opt_overview
* commit
* Create pipeline_overview.mdx
* unconditional_image_generation 1st draft
* ✨ Add translation for write_own_pipeline.mdx
* conditional / unconditional translation
* unconditional_image_generation first draft
* revise
* Update pipeline_overview.mdx
* revise-2
* ♻️ translation fixed for write_own_pipeline.mdx
* complete translation of basic_training.mdx
* other-formats.mdx translation complete
* fix tutorials/basic_training.mdx
* other-formats fixes
* inpaint Korean translation
* depth2img translation
* translate training/adapt-a-model.mdx
* revised_all
* feedback taken
* using_safetensors.mdx first draft
* custom_pipeline_examples.mdx first draft
* img2img Korean translation complete
* tutorial_overview edit
* reusing_seeds
* torch2.0
* translate complete
* fix) apply the terminology unification conventions
* [fix] improve the translation based on feedback
* fix typos and parts that broke the conventions
* typo, style fix
* toctree update
* copyright fix
* toctree fix
* Update _toctree.yml

---------

Co-authored-by: Chanran Kim <seriousran@gmail.com>
Co-authored-by: apolinário <joaopaulo.passos@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Lee, Hongkyu <75282888+howsmyanimeprofilepicture@users.noreply.github.com>
Co-authored-by: hyeminan <adios9709@gmail.com>
Co-authored-by: movie5 <oyh5800@naver.com>
Co-authored-by: idra79haza <idra79haza@github.com>
Co-authored-by: Jihwan Kim <cuchoco@naver.com>
Co-authored-by: jungwoo <boonkoonheart@gmail.com>
Co-authored-by: jjuun0 <jh061993@gmail.com>
Co-authored-by: szjung-test <93111772+szjung-test@users.noreply.github.com>
Co-authored-by: idra79haza <37795618+idra79haza@users.noreply.github.com>
Co-authored-by: howsmyanimeprofilepicture <howsmyanimeprofilepicture@gmail.com>
Co-authored-by: hoswmyanimeprofilepicture <hoswmyanimeprofilepicture@gmail.com>
@@ -8,14 +8,69 @@
- local: installation
title: "Installation"
title: "Get started"

- sections:
- local: tutorials/tutorial_overview
title: Overview
- local: using-diffusers/write_own_pipeline
title: Understanding models and schedulers
- local: tutorials/basic_training
title: Training a diffusion model
title: Tutorials
- sections:
- sections:
- local: in_translation
title: Overview
- local: in_translation
- local: using-diffusers/loading
title: Loading pipelines, models, and schedulers
- local: using-diffusers/schedulers
title: Loading and comparing different schedulers
- local: using-diffusers/custom_pipeline_overview
title: Loading community pipelines
- local: using-diffusers/using_safetensors
title: Loading safetensors
- local: using-diffusers/other-formats
title: Loading Stable Diffusion in other formats
title: Loading & Hub
- sections:
- local: using-diffusers/pipeline_overview
title: Overview
- local: using-diffusers/unconditional_image_generation
title: Unconditional image generation
- local: in_translation
title: Text-to-image generation
- local: using-diffusers/img2img
title: Text-guided image-to-image
- local: using-diffusers/inpaint
title: Text-guided image inpainting
- local: using-diffusers/depth2img
title: Text-guided depth-to-image
- local: in_translation
title: Textual inversion
- local: in_translation
title: Distributed inference with multiple GPUs
- local: using-diffusers/reusing_seeds
title: Improving image quality with deterministic generation
- local: in_translation
title: Creating reproducible pipelines
- local: using-diffusers/custom_pipeline_examples
title: Community pipelines
- local: in_translation
title: How to contribute a community pipeline
- local: in_translation
title: Stable Diffusion in JAX/Flax
- local: in_translation
title: Weighting Prompts
title: Pipelines for inference
- sections:
- local: training/overview
title: Overview
- local: in_translation
title: Creating a dataset for training
- local: training/adapt_a_model
title: Adapting a model to a new task
- local: training/unconditional_training
title: Unconditional image generation
- local: training/text_inversion
title: Textual Inversion
- local: training/dreambooth
title: DreamBooth
@@ -27,13 +82,16 @@
title: ControlNet
- local: in_translation
title: InstructPix2Pix training
title: Training
- local: in_translation
title: Custom Diffusion
title: Training
title: Using Diffusers
- sections:
- local: in_translation
- local: optimization/opt_overview
title: Overview
- local: optimization/fp16
title: Memory and speed
- local: in_translation
- local: optimization/torch2.0
title: Torch2.0 support
- local: optimization/xformers
title: xFormers
@@ -41,8 +99,12 @@
title: ONNX
- local: optimization/open_vino
title: OpenVINO
- local: in_translation
title: Core ML
- local: optimization/mps
title: MPS
- local: optimization/habana
title: Habana Gaudi
- local: in_translation
title: Token Merging
title: Optimization/Special hardware

@@ -59,7 +59,7 @@ torch.backends.cuda.matmul.allow_tf32 = True

## Half precision weights

To save more GPU memory and get more speed, you can load and run the model weights directly in half precision. (The only change in this diff is the Korean verb for "load": "로드하고" becomes "불러오고", following the terminology conventions.)

This involves loading the float16 version of the weights, which were saved to a branch named `fp16`, and telling PyTorch to use the `float16` type when loading them:

```Python
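# A sketch of the loading call this section describes (assumed, not copied
# from the file): fetch the float16 weights stored on the `fp16` branch and
# tell PyTorch to run the model in float16.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    revision="fp16",            # weights saved on the `fp16` branch
    torch_dtype=torch.float16,  # use the float16 dtype
)
pipe = pipe.to("cuda")
```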
docs/source/ko/optimization/opt_overview.mdx (new file, 17 lines)
@@ -0,0 +1,17 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Overview

Generating a high-quality output with a generative model is computationally expensive, because each iterative step that turns a noisy output into a less noisy one requires a lot of computation. One of the goals of 🧨 Diffusers is to make this technology widely accessible to everyone, which includes enabling fast inference on consumer and specialized hardware.

This section covers tips and tricks, such as half-precision weights and sliced attention, for optimizing inference speed and reducing memory consumption. You can also learn how to speed up your PyTorch code with [`torch.compile`](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) or [ONNX Runtime](https://onnxruntime.ai/docs/), and how to enable memory-efficient attention with [xFormers](https://facebookresearch.github.io/xformers/). There are also guides for running inference on specific hardware such as Apple Silicon, and Intel or Habana processors.
docs/source/ko/optimization/torch2.0.mdx (new file, 445 lines)
@@ -0,0 +1,445 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Accelerated PyTorch 2.0 support in Diffusers

Starting with version `0.13.0`, Diffusers supports the latest optimizations from [PyTorch 2.0](https://pytorch.org/get-started/pytorch-2.0/). These include:
1. Support for accelerated transformers with memory-efficient attention, with no extra dependencies such as `xformers` required.
2. Support for [torch.compile](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) to compile individual models for an additional performance boost.


## Installation

To use the accelerated attention implementation and `torch.compile()`, make sure you have the latest version of PyTorch 2.0 installed from pip and that you are on diffusers 0.13.0 or higher. As explained below, diffusers automatically uses the optimized attention processor ([`AttnProcessor2_0`](https://github.com/huggingface/diffusers/blob/1a5797c6d4491a879ea5285c4efc377664e0332d/src/diffusers/models/attention_processor.py#L798)) when PyTorch 2.0 is available.

```bash
pip install --upgrade torch diffusers
```
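A quick way to confirm both requirements (a small sketch, not part of the original document):

```python
# Confirm PyTorch 2.0+ and diffusers 0.13.0+ are installed.
import torch
import diffusers

print(torch.__version__)      # expect >= 2.0
print(diffusers.__version__)  # expect >= 0.13.0
```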
## Using accelerated transformers and `torch.compile`


1. **Accelerated transformers implementation**

PyTorch 2.0 includes an optimized, memory-efficient attention implementation through the [`torch.nn.functional.scaled_dot_product_attention`](https://pytorch.org/docs/master/generated/torch.nn.functional.scaled_dot_product_attention) function, which automatically enables several optimizations depending on the inputs and the GPU type. It is similar to `memory_efficient_attention` from [xFormers](https://github.com/facebookresearch/xformers), but built into PyTorch by default.

These optimizations are enabled by default in Diffusers when PyTorch 2.0 is installed and `torch.nn.functional.scaled_dot_product_attention` is available. To use them, just install `torch 2.0` and use the pipeline. For example:

```Python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
```

If you want to enable it explicitly (which is not required), you can do it as shown below.

```diff
import torch
from diffusers import DiffusionPipeline
+ from diffusers.models.attention_processor import AttnProcessor2_0

pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
+ pipe.unet.set_attn_processor(AttnProcessor2_0())

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
```

This should be as fast and memory-efficient as `xFormers`. Check the [benchmark](#benchmark) for more details.

If you need to revert to the vanilla attention processor ([`AttnProcessor`](https://github.com/huggingface/diffusers/blob/1a5797c6d4491a879ea5285c4efc377664e0332d/src/diffusers/models/attention_processor.py#L402)), for example to make the pipeline more deterministic or to convert a fine-tuned model to another format such as [Core ML](https://huggingface.co/docs/diffusers/v0.16.0/en/optimization/coreml#how-to-run-stable-diffusion-with-core-ml), you can use the [`~diffusers.UNet2DConditionModel.set_default_attn_processor`] function:

```Python
import torch
from diffusers import DiffusionPipeline
from diffusers.models.attention_processor import AttnProcessor

pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
pipe.unet.set_default_attn_processor()

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
```

2. **torch.compile**

For an additional speed-up, you can use the new `torch.compile` feature. Because the pipeline's UNet is usually the most computationally expensive part, we wrap the `unet` with `torch.compile` and leave the remaining submodels (text encoder and VAE) as they are. See the [torch compile docs](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) for more details and other options.

```python
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
images = pipe(prompt, num_inference_steps=steps, num_images_per_prompt=batch_size).images
```

Depending on the GPU type, `compile()` can give an _additional speed-up_ of **5% - 300%** on top of the accelerated transformer optimizations. Note, however, that compilation brings larger speed-ups on more recent GPU architectures such as Ampere (A100, 3090), Ada (4090), and Hopper (H100).

Compilation takes some time to complete, so it is best suited for situations where you prepare the pipeline once and then run the same type of inference job many times. Calling the compiled pipeline on a different image size triggers compilation again, which can be expensive in terms of time.

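For example, a minimal sketch of the pattern this works best for: compile once, then reuse the same pipeline at a fixed resolution (the prompts and the 512x512 size are illustrative assumptions):

```python
# After the compile() call above, keep the image size constant so the expensive
# first-call compilation is reused instead of being triggered again.
prompts = [
    "a photo of an astronaut riding a horse on mars",
    "ghibli style, a fantasy landscape with castles",
]
for prompt in prompts:
    image = pipe(prompt, height=512, width=512, num_inference_steps=30).images[0]
```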
## Benchmark

We conducted a comprehensive benchmark with PyTorch 2.0's efficient attention implementation and `torch.compile`, across different GPUs and batch sizes, for the five most commonly used pipelines. We used `diffusers 0.17.0.dev0`, which [makes sure `torch.compile()` is leveraged optimally](https://github.com/huggingface/diffusers/pull/3313).

### Benchmarking code

#### Stable Diffusion text-to-image

```python
|
||||
from diffusers import DiffusionPipeline
|
||||
import torch
|
||||
|
||||
path = "runwayml/stable-diffusion-v1-5"
|
||||
|
||||
run_compile = True # Set True / False
|
||||
|
||||
pipe = DiffusionPipeline.from_pretrained(path, torch_dtype=torch.float16)
|
||||
pipe = pipe.to("cuda")
|
||||
pipe.unet.to(memory_format=torch.channels_last)
|
||||
|
||||
if run_compile:
|
||||
print("Run torch compile")
|
||||
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
|
||||
|
||||
prompt = "ghibli style, a fantasy landscape with castles"
|
||||
|
||||
for _ in range(3):
|
||||
images = pipe(prompt=prompt).images
|
||||
```
|
||||
|
||||
#### Stable Diffusion image-to-image
|
||||
|
||||
```python
|
||||
from diffusers import StableDiffusionImg2ImgPipeline
|
||||
import requests
|
||||
import torch
|
||||
from PIL import Image
|
||||
from io import BytesIO
|
||||
|
||||
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
|
||||
|
||||
response = requests.get(url)
|
||||
init_image = Image.open(BytesIO(response.content)).convert("RGB")
|
||||
init_image = init_image.resize((512, 512))
|
||||
|
||||
path = "runwayml/stable-diffusion-v1-5"
|
||||
|
||||
run_compile = True # Set True / False
|
||||
|
||||
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(path, torch_dtype=torch.float16)
|
||||
pipe = pipe.to("cuda")
|
||||
pipe.unet.to(memory_format=torch.channels_last)
|
||||
|
||||
if run_compile:
|
||||
print("Run torch compile")
|
||||
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
|
||||
|
||||
prompt = "ghibli style, a fantasy landscape with castles"
|
||||
|
||||
for _ in range(3):
|
||||
image = pipe(prompt=prompt, image=init_image).images[0]
|
||||
```
|
||||
|
||||
#### Stable Diffusion - inpainting
|
||||
|
||||
```python
|
||||
from diffusers import StableDiffusionInpaintPipeline
|
||||
import requests
|
||||
import torch
|
||||
from PIL import Image
|
||||
from io import BytesIO
|
||||
|
||||
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
|
||||
|
||||
def download_image(url):
|
||||
response = requests.get(url)
|
||||
return Image.open(BytesIO(response.content)).convert("RGB")
|
||||
|
||||
|
||||
img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
|
||||
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
|
||||
|
||||
init_image = download_image(img_url).resize((512, 512))
|
||||
mask_image = download_image(mask_url).resize((512, 512))
|
||||
|
||||
path = "runwayml/stable-diffusion-inpainting"
|
||||
|
||||
run_compile = True # Set True / False
|
||||
|
||||
pipe = StableDiffusionInpaintPipeline.from_pretrained(path, torch_dtype=torch.float16)
|
||||
pipe = pipe.to("cuda")
|
||||
pipe.unet.to(memory_format=torch.channels_last)
|
||||
|
||||
if run_compile:
|
||||
print("Run torch compile")
|
||||
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
|
||||
|
||||
prompt = "ghibli style, a fantasy landscape with castles"
|
||||
|
||||
for _ in range(3):
|
||||
image = pipe(prompt=prompt, image=init_image, mask_image=mask_image).images[0]
|
||||
```
|
||||
|
||||
#### ControlNet
|
||||
|
||||
```python
|
||||
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
|
||||
import requests
|
||||
import torch
|
||||
from PIL import Image
|
||||
from io import BytesIO
|
||||
|
||||
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
|
||||
|
||||
response = requests.get(url)
|
||||
init_image = Image.open(BytesIO(response.content)).convert("RGB")
|
||||
init_image = init_image.resize((512, 512))
|
||||
|
||||
path = "runwayml/stable-diffusion-v1-5"
|
||||
|
||||
run_compile = True # Set True / False
|
||||
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
|
||||
pipe = StableDiffusionControlNetPipeline.from_pretrained(
|
||||
path, controlnet=controlnet, torch_dtype=torch.float16
|
||||
)
|
||||
|
||||
pipe = pipe.to("cuda")
|
||||
pipe.unet.to(memory_format=torch.channels_last)
|
||||
pipe.controlnet.to(memory_format=torch.channels_last)
|
||||
|
||||
if run_compile:
|
||||
print("Run torch compile")
|
||||
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
|
||||
pipe.controlnet = torch.compile(pipe.controlnet, mode="reduce-overhead", fullgraph=True)
|
||||
|
||||
prompt = "ghibli style, a fantasy landscape with castles"
|
||||
|
||||
for _ in range(3):
|
||||
image = pipe(prompt=prompt, image=init_image).images[0]
|
||||
```
|
||||
|
||||
#### IF text-to-image + upscaling
|
||||
|
||||
```python
|
||||
from diffusers import DiffusionPipeline
|
||||
import torch
|
||||
|
||||
run_compile = True # Set True / False
|
||||
|
||||
pipe = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-M-v1.0", variant="fp16", text_encoder=None, torch_dtype=torch.float16)
|
||||
pipe.to("cuda")
|
||||
pipe_2 = DiffusionPipeline.from_pretrained("DeepFloyd/IF-II-M-v1.0", variant="fp16", text_encoder=None, torch_dtype=torch.float16)
|
||||
pipe_2.to("cuda")
|
||||
pipe_3 = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16)
|
||||
pipe_3.to("cuda")
|
||||
|
||||
|
||||
pipe.unet.to(memory_format=torch.channels_last)
|
||||
pipe_2.unet.to(memory_format=torch.channels_last)
|
||||
pipe_3.unet.to(memory_format=torch.channels_last)
|
||||
|
||||
if run_compile:
|
||||
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
|
||||
pipe_2.unet = torch.compile(pipe_2.unet, mode="reduce-overhead", fullgraph=True)
|
||||
pipe_3.unet = torch.compile(pipe_3.unet, mode="reduce-overhead", fullgraph=True)
|
||||
|
||||
prompt = "the blue hulk"
|
||||
|
||||
prompt_embeds = torch.randn((1, 2, 4096), dtype=torch.float16)
|
||||
neg_prompt_embeds = torch.randn((1, 2, 4096), dtype=torch.float16)
|
||||
|
||||
for _ in range(3):
|
||||
image = pipe(prompt_embeds=prompt_embeds, negative_prompt_embeds=neg_prompt_embeds, output_type="pt").images
|
||||
image_2 = pipe_2(image=image, prompt_embeds=prompt_embeds, negative_prompt_embeds=neg_prompt_embeds, output_type="pt").images
|
||||
image_3 = pipe_3(prompt=prompt, image=image, noise_level=100).images
|
||||
```
|
||||
|
||||
To give you an overview of the possible speed-ups that can be obtained with PyTorch 2.0 and `torch.compile()`, here is a chart showing the relative speed-ups for the [Stable Diffusion text-to-image pipeline](StableDiffusionPipeline) across five different GPU families (with a batch size of 4):



To give you an even better idea of how this speed-up holds for the other pipelines presented above, consider the following plot that shows the benchmarking numbers from an A100 across three different batch sizes (with PyTorch 2.0 nightly and `torch.compile()`):



_(The benchmarking metric in the charts above is **iterations/second**.)_

But for full transparency, we disclose all the benchmarking numbers!

In the following tables, we report the results in terms of the **number of iterations processed per second**.

### A100 (batch size: 1)
|
||||
|
||||
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|
||||
|:---:|:---:|:---:|:---:|:---:|
|
||||
| SD - txt2img | 21.66 | 23.13 | 44.03 | 49.74 |
|
||||
| SD - img2img | 21.81 | 22.40 | 43.92 | 46.32 |
|
||||
| SD - inpaint | 22.24 | 23.23 | 43.76 | 49.25 |
|
||||
| SD - controlnet | 15.02 | 15.82 | 32.13 | 36.08 |
|
||||
| IF | 20.21 / <br>13.84 / <br>24.00 | 20.12 / <br>13.70 / <br>24.03 | โ | 97.34 / <br>27.23 / <br>111.66 |
|
||||
|
||||
### A100 (batch size: 4)
|
||||
|
||||
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|
||||
|:---:|:---:|:---:|:---:|:---:|
|
||||
| SD - txt2img | 11.6 | 13.12 | 14.62 | 17.27 |
|
||||
| SD - img2img | 11.47 | 13.06 | 14.66 | 17.25 |
|
||||
| SD - inpaint | 11.67 | 13.31 | 14.88 | 17.48 |
|
||||
| SD - controlnet | 8.28 | 9.38 | 10.51 | 12.41 |
|
||||
| IF | 25.02 | 18.04 | โ | 48.47 |
|
||||
|
||||
### A100 (batch size: 16)
|
||||
|
||||
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|
||||
|:---:|:---:|:---:|:---:|:---:|
|
||||
| SD - txt2img | 3.04 | 3.6 | 3.83 | 4.68 |
|
||||
| SD - img2img | 2.98 | 3.58 | 3.83 | 4.67 |
|
||||
| SD - inpaint | 3.04 | 3.66 | 3.9 | 4.76 |
|
||||
| SD - controlnet | 2.15 | 2.58 | 2.74 | 3.35 |
|
||||
| IF | 8.78 | 9.82 | โ | 16.77 |
|
||||
|
||||
### V100 (batch size: 1)
|
||||
|
||||
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|
||||
|:---:|:---:|:---:|:---:|:---:|
|
||||
| SD - txt2img | 18.99 | 19.14 | 20.95 | 22.17 |
|
||||
| SD - img2img | 18.56 | 19.18 | 20.95 | 22.11 |
|
||||
| SD - inpaint | 19.14 | 19.06 | 21.08 | 22.20 |
|
||||
| SD - controlnet | 13.48 | 13.93 | 15.18 | 15.88 |
|
||||
| IF | 20.01 / <br>9.08 / <br>23.34 | 19.79 / <br>8.98 / <br>24.10 | โ | 55.75 / <br>11.57 / <br>57.67 |
|
||||
|
||||
### V100 (batch size: 4)
|
||||
|
||||
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|
||||
|:---:|:---:|:---:|:---:|:---:|
|
||||
| SD - txt2img | 5.96 | 5.89 | 6.83 | 6.86 |
|
||||
| SD - img2img | 5.90 | 5.91 | 6.81 | 6.82 |
|
||||
| SD - inpaint | 5.99 | 6.03 | 6.93 | 6.95 |
|
||||
| SD - controlnet | 4.26 | 4.29 | 4.92 | 4.93 |
|
||||
| IF | 15.41 | 14.76 | โ | 22.95 |
|
||||
|
||||
### V100 (batch size: 16)
|
||||
|
||||
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|
||||
|:---:|:---:|:---:|:---:|:---:|
|
||||
| SD - txt2img | 1.66 | 1.66 | 1.92 | 1.90 |
|
||||
| SD - img2img | 1.65 | 1.65 | 1.91 | 1.89 |
|
||||
| SD - inpaint | 1.69 | 1.69 | 1.95 | 1.93 |
|
||||
| SD - controlnet | 1.19 | 1.19 | OOM after warmup | 1.36 |
|
||||
| IF | 5.43 | 5.29 | โ | 7.06 |
|
||||
|
||||
### T4 (batch size: 1)
|
||||
|
||||
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|
||||
|:---:|:---:|:---:|:---:|:---:|
|
||||
| SD - txt2img | 6.9 | 6.95 | 7.3 | 7.56 |
|
||||
| SD - img2img | 6.84 | 6.99 | 7.04 | 7.55 |
|
||||
| SD - inpaint | 6.91 | 6.7 | 7.01 | 7.37 |
|
||||
| SD - controlnet | 4.89 | 4.86 | 5.35 | 5.48 |
|
||||
| IF | 17.42 / <br>2.47 / <br>18.52 | 16.96 / <br>2.45 / <br>18.69 | โ | 24.63 / <br>2.47 / <br>23.39 |
|
||||
|
||||
### T4 (batch size: 4)
|
||||
|
||||
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|
||||
|:---:|:---:|:---:|:---:|:---:|
|
||||
| SD - txt2img | 1.79 | 1.79 | 2.03 | 1.99 |
|
||||
| SD - img2img | 1.77 | 1.77 | 2.05 | 2.04 |
|
||||
| SD - inpaint | 1.81 | 1.82 | 2.09 | 2.09 |
|
||||
| SD - controlnet | 1.34 | 1.27 | 1.47 | 1.46 |
|
||||
| IF | 5.79 | 5.61 | โ | 7.39 |
|
||||
|
||||
### T4 (batch size: 16)
|
||||
|
||||
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|
||||
|:---:|:---:|:---:|:---:|:---:|
|
||||
| SD - txt2img | 2.34s | 2.30s | OOM after 2nd iteration | 1.99s |
|
||||
| SD - img2img | 2.35s | 2.31s | OOM after warmup | 2.00s |
|
||||
| SD - inpaint | 2.30s | 2.26s | OOM after 2nd iteration | 1.95s |
|
||||
| SD - controlnet | OOM after 2nd iteration | OOM after 2nd iteration | OOM after warmup | OOM after warmup |
|
||||
| IF * | 1.44 | 1.44 | โ | 1.94 |
|
||||
|
||||
### RTX 3090 (batch size: 1)
|
||||
|
||||
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|
||||
|:---:|:---:|:---:|:---:|:---:|
|
||||
| SD - txt2img | 22.56 | 22.84 | 23.84 | 25.69 |
|
||||
| SD - img2img | 22.25 | 22.61 | 24.1 | 25.83 |
|
||||
| SD - inpaint | 22.22 | 22.54 | 24.26 | 26.02 |
|
||||
| SD - controlnet | 16.03 | 16.33 | 17.38 | 18.56 |
|
||||
| IF | 27.08 / <br>9.07 / <br>31.23 | 26.75 / <br>8.92 / <br>31.47 | โ | 68.08 / <br>11.16 / <br>65.29 |
|
||||
|
||||
### RTX 3090 (batch size: 4)
|
||||
|
||||
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|
||||
|:---:|:---:|:---:|:---:|:---:|
|
||||
| SD - txt2img | 6.46 | 6.35 | 7.29 | 7.3 |
|
||||
| SD - img2img | 6.33 | 6.27 | 7.31 | 7.26 |
|
||||
| SD - inpaint | 6.47 | 6.4 | 7.44 | 7.39 |
|
||||
| SD - controlnet | 4.59 | 4.54 | 5.27 | 5.26 |
|
||||
| IF | 16.81 | 16.62 | โ | 21.57 |
|
||||
|
||||
### RTX 3090 (batch size: 16)
|
||||
|
||||
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|
||||
|:---:|:---:|:---:|:---:|:---:|
|
||||
| SD - txt2img | 1.7 | 1.69 | 1.93 | 1.91 |
|
||||
| SD - img2img | 1.68 | 1.67 | 1.93 | 1.9 |
|
||||
| SD - inpaint | 1.72 | 1.71 | 1.97 | 1.94 |
|
||||
| SD - controlnet | 1.23 | 1.22 | 1.4 | 1.38 |
|
||||
| IF | 5.01 | 5.00 | โ | 6.33 |
|
||||
|
||||
### RTX 4090 (batch size: 1)
|
||||
|
||||
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|
||||
|:---:|:---:|:---:|:---:|:---:|
|
||||
| SD - txt2img | 40.5 | 41.89 | 44.65 | 49.81 |
|
||||
| SD - img2img | 40.39 | 41.95 | 44.46 | 49.8 |
|
||||
| SD - inpaint | 40.51 | 41.88 | 44.58 | 49.72 |
|
||||
| SD - controlnet | 29.27 | 30.29 | 32.26 | 36.03 |
|
||||
| IF | 69.71 / <br>18.78 / <br>85.49 | 69.13 / <br>18.80 / <br>85.56 | โ | 124.60 / <br>26.37 / <br>138.79 |
|
||||
|
||||
### RTX 4090 (batch size: 4)
|
||||
|
||||
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|
||||
|:---:|:---:|:---:|:---:|:---:|
|
||||
| SD - txt2img | 12.62 | 12.84 | 15.32 | 15.59 |
|
||||
| SD - img2img | 12.61 | 12.79 | 15.35 | 15.66 |
|
||||
| SD - inpaint | 12.65 | 12.81 | 15.3 | 15.58 |
|
||||
| SD - controlnet | 9.1 | 9.25 | 11.03 | 11.22 |
|
||||
| IF | 31.88 | 31.14 | โ | 43.92 |
|
||||
|
||||
### RTX 4090 (batch size: 16)
|
||||
|
||||
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|
||||
|:---:|:---:|:---:|:---:|:---:|
|
||||
| SD - txt2img | 3.17 | 3.2 | 3.84 | 3.85 |
|
||||
| SD - img2img | 3.16 | 3.2 | 3.84 | 3.85 |
|
||||
| SD - inpaint | 3.17 | 3.2 | 3.85 | 3.85 |
|
||||
| SD - controlnet | 2.23 | 2.3 | 2.7 | 2.75 |
|
||||
| IF | 9.26 | 9.2 | โ | 13.31 |
|
||||
## Notes

* Follow [this PR](https://github.com/huggingface/diffusers/pull/3313) for more details on the environment used for conducting the benchmarks.
* For the IF pipeline and batch sizes > 1, we only used a batch size of > 1 in the first IF pipeline for text-to-image generation and NOT for upscaling. That means the two upscaling pipelines received a batch size of 1.

*Thanks to [Horace He](https://github.com/Chillee) from the PyTorch team for their support in improving our support of `torch.compile()` in Diffusers.*

docs/source/ko/training/adapt_a_model.mdx (new file, 54 lines)
@@ -0,0 +1,54 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Adapt a model to a new task

Many diffusion systems share the same components, which lets you adapt a model pretrained for one task to an entirely different task.

This guide will show you how to adapt a pretrained text-to-image model for inpainting by initializing and modifying the architecture of a pretrained [`UNet2DConditionModel`].

## Configure UNet2DConditionModel parameters

A [`UNet2DConditionModel`] by default accepts 4 channels in the [input sample](https://huggingface.co/docs/diffusers/v0.16.0/en/api/models#diffusers.UNet2DConditionModel.in_channels). For example, load a pretrained text-to-image model such as [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) and check the number of `in_channels`:

```py
from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipeline.unet.config["in_channels"]
4
```

Inpainting requires 9 channels in the input sample. You can check this value in a pretrained inpainting model such as [`runwayml/stable-diffusion-inpainting`](https://huggingface.co/runwayml/stable-diffusion-inpainting):

```py
from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-inpainting")
pipeline.unet.config["in_channels"]
9
```

To adapt your text-to-image model for inpainting, you will need to change the number of `in_channels` from 4 to 9.

Initialize a [`UNet2DConditionModel`] with the pretrained text-to-image model weights, and change `in_channels` to 9. Because changing the number of `in_channels` changes the shape, you need to set `ignore_mismatched_sizes=True` and `low_cpu_mem_usage=False` to avoid a size mismatch error.

```py
from diffusers import UNet2DConditionModel

model_id = "runwayml/stable-diffusion-v1-5"
unet = UNet2DConditionModel.from_pretrained(
    model_id, subfolder="unet", in_channels=9, low_cpu_mem_usage=False, ignore_mismatched_sizes=True
)
```

The pretrained weights of the other components of the text-to-image model are initialized from its checkpoint, but the input channel weights (`conv_in.weight`) of the `unet` are randomly initialized. It is important to finetune the model for inpainting, because otherwise the model returns noise.
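As a quick sanity check (a sketch, not part of the original guide), you can confirm that the adapted `unet` now expects 9 input channels and that `conv_in` is the layer whose weights were re-initialized:

```py
print(unet.config.in_channels)
# 9
print(unet.conv_in)
# Conv2d(9, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
print(unet.conv_in.weight.shape)
# torch.Size([320, 9, 3, 3])
```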
@@ -273,7 +273,7 @@ from diffusers import DiffusionPipeline, UNet2DConditionModel
from transformers import CLIPTextModel
import torch

# Load the pipeline with the same arguments (model, revision) that were used for training. (old wording, "로드합니다")
# Load the pipeline with the same arguments (model, revision) that were used for training. (new wording, "불러옵니다")
model_id = "CompVis/stable-diffusion-v1-4"

unet = UNet2DConditionModel.from_pretrained("/sddata/dreambooth/daruma-v2-1/checkpoint-100/unet")

@@ -294,7 +294,7 @@ If you have **`"accelerate<0.16.0"`** installed, you need to convert it to an in
from accelerate import Accelerator
from diffusers import DiffusionPipeline

# Load the pipeline with the same arguments (model, revision) that were used for training. (old wording, "로드합니다")
# Load the pipeline with the same arguments (model, revision) that were used for training. (new wording, "불러옵니다")
model_id = "CompVis/stable-diffusion-v1-4"
pipeline = DiffusionPipeline.from_pretrained(model_id)

@@ -102,7 +102,7 @@ accelerate launch train_dreambooth_lora.py \
>>> pipe = StableDiffusionPipeline.from_pretrained(model_base, torch_dtype=torch.float16)
```

Load the LoRA weights from your finetuned DreamBooth model *on top of the base model weights*, and then move the pipeline to a GPU for faster inference. When you merge the LoRA weights with the frozen pretrained model weights, you can optionally adjust how much of the weights to merge with the `scale` parameter. (The only change in this diff is the Korean verb for "load": "로드한" becomes "불러온".)
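A minimal sketch of that loading step (the checkpoint path and prompt below are illustrative, not taken from the document):

```python
import torch
from diffusers import StableDiffusionPipeline

model_base = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(model_base, torch_dtype=torch.float16)

# Load the LoRA attention weights produced by train_dreambooth_lora.py on top
# of the frozen base weights, then move the pipeline to the GPU.
pipe.unet.load_attn_procs("path/to/dreambooth-lora-output")  # illustrative path
pipe.to("cuda")

# `scale` controls how much of the LoRA weights are merged at inference time:
# 0.0 keeps only the base model, 1.0 fully applies the finetuned LoRA weights.
image = pipe(
    "A picture of a sks dog in a bucket.",
    num_inference_steps=25,
    cross_attention_kwargs={"scale": 0.5},
).images[0]
```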
<Tip>
docs/source/ko/training/overview.mdx (new file, 73 lines)
@@ -0,0 +1,73 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# 🧨 Diffusers training examples

In this chapter, we look at how to use the `diffusers` library effectively through example training code for a variety of use cases.

**Note**: If you are looking for official example code, take a look [here](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines)!

The examples covered here aim to be:

- **Self-contained**: all of the dependency packages used by these example scripts can be installed with a plain `pip install` command. Each example also conveniently lists its dependencies in a `requirements.txt` file, so you can install them with `pip install -r requirements.txt`. Example: [train_unconditional.py](https://github.com/huggingface/diffusers/blob/main/examples/unconditional_image_generation/train_unconditional.py), [requirements.txt](https://github.com/huggingface/diffusers/blob/main/examples/unconditional_image_generation/requirements.txt)
- **Easy-to-tweak**: we try to provide as many use cases as possible, but please remember that examples are, in the end, just examples. Simply copy-pasting the code provided here will not readily solve the problems you are facing; you will have to adapt parts of the code to your own situation and needs. To help with that, most of the training examples ship the data preprocessing code together with the training loop, so that you can easily modify them for your needs.
- **Beginner-friendly**: this chapter is written to help build an overall understanding of diffusion models and the `diffusers` library. Therefore, among the latest state-of-the-art (SOTA) methods for diffusion models, we deliberately leave out the ones we judge to be too difficult for beginners.
- **One-purpose-only**: each example covered here should address only one task. Of course there are tasks, such as image super-resolution and image modification, that share a similar modeling process, but we believe that keeping one task per example makes it easier to understand.



We provide official examples that cover the most representative tasks of diffusion models. The *official* examples are actively maintained by the `diffusers` maintainers, and we strictly try to follow the philosophy defined above. If you think an additional example is absolutely needed, feel free to open a [Feature Request](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feature_request.md&title=) or, even better, a [Pull Request](https://github.com/huggingface/diffusers/compare). You are always welcome!

The training examples show how to pretrain or fine-tune diffusion models for a variety of tasks. Currently the following examples are supported:

- [Unconditional Training](./unconditional_training)
- [Text-to-Image Training](./text2image)
- [Text Inversion](./text_inversion)
- [Dreambooth](./dreambooth)

To run memory-efficient attention, please install [xFormers](../optimization/xformers) if possible. It can speed up training and reduce the memory footprint.

| Task | 🤗 Accelerate | 🤗 Datasets | Colab |
|---|---|:---:|:---:|
| [**Unconditional Image Generation**](./unconditional_training) | ✅ | ✅ | [](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) |
| [**Text-to-Image fine-tuning**](./text2image) | ✅ | ✅ | |
| [**Textual Inversion**](./text_inversion) | ✅ | - | [](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb) |
| [**Dreambooth**](./dreambooth) | ✅ | - | [](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_dreambooth_training.ipynb) |
| [**Training with LoRA**](./lora) | ✅ | - | - |
| [**ControlNet**](./controlnet) | ✅ | ✅ | - |
| [**InstructPix2Pix**](./instructpix2pix) | ✅ | ✅ | - |
| [**Custom Diffusion**](./custom_diffusion) | ✅ | ✅ | - |

## Community

In addition to the official examples, we also provide **community examples**, which are maintained by our community. A community example can consist of a training example or an inference pipeline. For community examples we apply the philosophy defined above a bit more loosely, and we also cannot guarantee maintenance for every issue raised against them.

Examples that are useful but not yet popular enough, or that do not fit our philosophy, live in the [community examples](https://github.com/huggingface/diffusers/tree/main/examples/community) folder.

**Note**: community examples can be a [great first contribution](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22) for anyone who wants to contribute to `diffusers`.

## Important notes

To make sure you can successfully run the latest versions of the example scripts, you have to **install `diffusers` from source** and install the dependencies that the example scripts require. To do this, create a new virtual environment and run:

```bash
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install .
```

Then `cd` into the folder of the example you want to run and execute:

```bash
pip install -r requirements.txt
```
docs/source/ko/training/text_inversion.mdx (new file, 275 lines)
@@ -0,0 +1,275 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Textual-Inversion

[[open-in-colab]]

[Textual-inversion](https://arxiv.org/abs/2208.01618) is a technique for capturing novel concepts from a small number of example images. While the technique was originally demonstrated with [Latent Diffusion](https://github.com/CompVis/latent-diffusion), it has since been applied to other similar models such as [Stable Diffusion](https://huggingface.co/docs/diffusers/main/en/conceptual/stable_diffusion). The learned concepts can be used to better control the images generated by a text-to-image pipeline. The model learns new "words" in the text encoder's embedding space, which are then used within text prompts for personalized image generation.


<small>By using just 3-5 images you can teach new concepts to a model such as Stable Diffusion for personalized image generation <a href="https://github.com/rinongal/textual_inversion">(image source)</a>.</small>

This guide explains how to train a [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) model with textual-inversion. All of the textual-inversion training scripts used in this guide can be found [here](https://github.com/huggingface/diffusers/tree/main/examples/textual_inversion); please refer to that link if you want to take a closer look at how things work under the hood.

<Tip>

There is a community-created collection of trained textual-inversion models in the [Stable Diffusion Textual Inversion Concepts Library](https://huggingface.co/sd-concepts-library). As more concepts are added over time, it will grow into a useful resource!

</Tip>

Before you begin, make sure you install the training dependencies:

```bash
pip install diffusers accelerate transformers
```

After all of the dependencies have been set up, initialize a [🤗 Accelerate](https://github.com/huggingface/accelerate/) environment:

```bash
accelerate config
```

To set up a default 🤗 Accelerate environment without choosing any configurations:

```bash
accelerate config default
```

Or, if your environment doesn't support an interactive shell like a notebook, you can use:

```py
from accelerate.utils import write_basic_config

write_basic_config()
```

Finally, install [xFormers](https://huggingface.co/docs/diffusers/main/en/training/optimization/xformers) to reduce memory usage with memory-efficient attention. After installing xFormers, add the `--enable_xformers_memory_efficient_attention` argument to the training script. xFormers is not supported for Flax.

## Upload the model to the Hub

To store your model on the Hub, add the following argument to the training script:

```bash
--push_to_hub
```

## Save and load checkpoints

It is a good idea to regularly save checkpoints of your model during training. That way you can resume training from a saved checkpoint if training is interrupted for any reason. Pass the following argument to the training script, and the full training state is saved as a checkpoint in a subfolder of `output_dir` every 500 steps:

```bash
--checkpointing_steps=500
```

To resume training from a saved checkpoint, pass the following argument to the training script with the specific checkpoint to resume from:

```bash
--resume_from_checkpoint="checkpoint-1500"
```

## Fine-tuning

For your training dataset, download the [cat toy dataset](https://huggingface.co/datasets/diffusers/cat_toy_example) and store it in a directory. If you want to use your own dataset, take a look at the [Create a dataset for training](https://huggingface.co/docs/diffusers/training/create_dataset) guide.

```py
from huggingface_hub import snapshot_download

local_dir = "./cat"
snapshot_download(
    "diffusers/cat_toy_example", local_dir=local_dir, repo_type="dataset", ignore_patterns=".gitattributes"
)
```

Assign the repository ID of the model (or the path to the directory containing the model weights) to the `MODEL_NAME` environment variable and pass it to the [`pretrained_model_name_or_path`](https://huggingface.co/docs/diffusers/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.from_pretrained.pretrained_model_name_or_path) argument. Also assign the path of the directory containing the images to the `DATA_DIR` environment variable.

Now you can launch the [training script](https://github.com/huggingface/diffusers/blob/main/examples/textual_inversion/textual_inversion.py). The script creates the following files and saves them to your repository:

- `learned_embeds.bin`
- `token_identifier.txt`
- `type_of_concept.txt`

<Tip>

💡 A full training run takes up to 1 hour on a single V100 GPU. While you wait for training to complete, feel free to check out [how textual-inversion works](https://huggingface.co/docs/diffusers/training/text_inversion#how-it-works) in the section below!

</Tip>

<frameworkcontent>
<pt>
```bash
|
||||
export MODEL_NAME="runwayml/stable-diffusion-v1-5"
|
||||
export DATA_DIR="./cat"
|
||||
|
||||
accelerate launch textual_inversion.py \
|
||||
--pretrained_model_name_or_path=$MODEL_NAME \
|
||||
--train_data_dir=$DATA_DIR \
|
||||
--learnable_property="object" \
|
||||
--placeholder_token="<cat-toy>" --initializer_token="toy" \
|
||||
--resolution=512 \
|
||||
--train_batch_size=1 \
|
||||
--gradient_accumulation_steps=4 \
|
||||
--max_train_steps=3000 \
|
||||
--learning_rate=5.0e-04 --scale_lr \
|
||||
--lr_scheduler="constant" \
|
||||
--lr_warmup_steps=0 \
|
||||
--output_dir="textual_inversion_cat" \
|
||||
--push_to_hub
|
||||
```
|
||||
|
||||
<Tip>

💡 To boost training performance, you can also consider representing the placeholder token (`<cat-toy>`) with multiple embedding vectors instead of a single one. This trick can help the model better capture more complex image styles (i.e., the concept). To enable training with multiple embedding vectors, pass the following option:

```bash
--num_vectors=5
```

</Tip>
</pt>
<jax>

If you have access to TPUs, try the [Flax training script](https://github.com/huggingface/diffusers/blob/main/examples/textual_inversion/textual_inversion_flax.py) to train the model even faster (it also works on GPUs). With the same configuration, the Flax training script should be at least 70% faster than the PyTorch training script! ⚡️

Before you begin, make sure you install the Flax dependencies:

```bash
pip install -U -r requirements_flax.txt
```

Assign the repository ID of the model (or the path to the directory containing the model weights) to the `MODEL_NAME` environment variable and pass it to the [`pretrained_model_name_or_path`](https://huggingface.co/docs/diffusers/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.from_pretrained.pretrained_model_name_or_path) argument.

Then you can launch the [training script](https://github.com/huggingface/diffusers/blob/main/examples/textual_inversion/textual_inversion_flax.py):

```bash
|
||||
export MODEL_NAME="duongna/stable-diffusion-v1-4-flax"
|
||||
export DATA_DIR="./cat"
|
||||
|
||||
python textual_inversion_flax.py \
|
||||
--pretrained_model_name_or_path=$MODEL_NAME \
|
||||
--train_data_dir=$DATA_DIR \
|
||||
--learnable_property="object" \
|
||||
--placeholder_token="<cat-toy>" --initializer_token="toy" \
|
||||
--resolution=512 \
|
||||
--train_batch_size=1 \
|
||||
--max_train_steps=3000 \
|
||||
--learning_rate=5.0e-04 --scale_lr \
|
||||
--output_dir="textual_inversion_cat" \
|
||||
--push_to_hub
|
||||
```
|
||||
</jax>
</frameworkcontent>

### Intermediate logging

If you are interested in following your model's training progress, you can save the images generated during training. Add the following arguments to the training script to enable intermediate logging:

- `validation_prompt`: the prompt used to generate samples (defaults to `None`, which disables intermediate logging)
- `num_validation_images`: the number of sample images to generate
- `validation_steps`: the number of steps before generating sample images from `validation_prompt`

```bash
--validation_prompt="A <cat-toy> backpack"
--num_validation_images=4
--validation_steps=100
```

## Inference

Once you have trained a model, you can use it for inference with the [`StableDiffusionPipeline`].

By default, the textual-inversion script only saves the embedding vectors learned through textual-inversion. These embedding vectors are added to the embedding matrix of the text encoder.

<frameworkcontent>
<pt>
<Tip>

💡 The community has created a large library of textual-inversion embedding vectors called [sd-concepts-library](https://huggingface.co/sd-concepts-library). Instead of training textual-inversion embeddings from scratch, it is also a good idea to check whether the embedding you are looking for has already been added to that library.

</Tip>

To load a textual-inversion embedding vector, you first need to load the model that was used when training it. Here we assume the [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/docs/diffusers/training/runwayml/stable-diffusion-v1-5) model was used, and load it:

```python
from diffusers import StableDiffusionPipeline
import torch

model_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
```

Next, load the textual-inversion embedding vector with the `TextualInversionLoaderMixin.load_textual_inversion` function. Here we will load the embedding from the previous `<cat-toy>` example:

```python
pipe.load_textual_inversion("sd-concepts-library/cat-toy")
```

Now you can run the pipeline to check that the placeholder token (`<cat-toy>`) works as expected:

```python
prompt = "A <cat-toy> backpack"

image = pipe(prompt, num_inference_steps=50).images[0]
image.save("cat-backpack.png")
```

`TextualInversionLoaderMixin.load_textual_inversion` can load not only textual embedding vectors saved in the Diffusers format, but also embedding vectors saved in the [Automatic1111](https://github.com/AUTOMATIC1111/stable-diffusion-webui) format. To do so, first download the embedding vector from [civitAI](https://civitai.com/models/3036?modelVersionId=8387) and then load it locally:

```python
pipe.load_textual_inversion("./charturnerv2.pt")
```
</pt>
<jax>

There is currently no `load_textual_inversion` function for Flax, so after training you have to make sure the textual-inversion embedding vector is saved as part of the model. The model can then be run like any other Flax model:

```python
|
||||
import jax
|
||||
import numpy as np
|
||||
from flax.jax_utils import replicate
|
||||
from flax.training.common_utils import shard
|
||||
from diffusers import FlaxStableDiffusionPipeline
|
||||
|
||||
model_path = "path-to-your-trained-model"
|
||||
pipeline, params = FlaxStableDiffusionPipeline.from_pretrained(model_path, dtype=jax.numpy.bfloat16)
|
||||
|
||||
prompt = "A <cat-toy> backpack"
|
||||
prng_seed = jax.random.PRNGKey(0)
|
||||
num_inference_steps = 50
|
||||
|
||||
num_samples = jax.device_count()
|
||||
prompt = num_samples * [prompt]
|
||||
prompt_ids = pipeline.prepare_inputs(prompt)
|
||||
|
||||
# shard inputs and rng
|
||||
params = replicate(params)
|
||||
prng_seed = jax.random.split(prng_seed, jax.device_count())
|
||||
prompt_ids = shard(prompt_ids)
|
||||
|
||||
images = pipeline(prompt_ids, params, prng_seed, num_inference_steps, jit=True).images
|
||||
images = pipeline.numpy_to_pil(np.asarray(images.reshape((num_samples,) + images.shape[-3:])))
|
||||
image.save("cat-backpack.png")
|
||||
```
|
||||
</jax>
</frameworkcontent>

## How it works


<small>Architecture overview from the Textual Inversion <a href="https://textual-inversion.github.io/">blog post.</a></small>

Usually, text prompts are tokenized into embeddings before being passed to the model. Textual-inversion does something similar, but it learns a new token embedding `v*` from the special token `S*` in the diagram above. The model output is used to condition the diffusion model, which helps the diffusion model understand the new concept from just a few example images.

To do this, textual-inversion uses a generator model and noised versions of the training images. The generator tries to predict less noisy versions of the images, and the token embedding `v*` is optimized based on how well the generator performs. If the token embedding successfully captures the new concept, it provides more useful information to the diffusion model and helps it produce clearer images with less noise. This optimization process typically happens over thousands of exposures to a variety of prompts and images.

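A compact sketch of one such optimization step in PyTorch (simplified; `pipe`, `dataloader`, and `placeholder_token_id` are assumed to be set up beforehand, and this is not the actual training script):

```python
import torch
import torch.nn.functional as F

# Freeze everything except the text encoder's token embedding matrix; only the
# row for the new placeholder token will actually be updated.
pipe.unet.requires_grad_(False)
pipe.vae.requires_grad_(False)
pipe.text_encoder.requires_grad_(False)
token_embeds = pipe.text_encoder.get_input_embeddings()  # nn.Embedding
token_embeds.weight.requires_grad_(True)
optimizer = torch.optim.AdamW([token_embeds.weight], lr=5e-4)

for prompt_ids, pixel_values in dataloader:  # assumed: prompts containing S*, example images
    # Noise a training image in latent space at a random timestep.
    latents = pipe.vae.encode(pixel_values).latent_dist.sample() * pipe.vae.config.scaling_factor
    noise = torch.randn_like(latents)
    timesteps = torch.randint(0, pipe.scheduler.config.num_train_timesteps, (latents.shape[0],))
    noisy_latents = pipe.scheduler.add_noise(latents, noise, timesteps)

    # The generator (UNet) predicts the noise, conditioned on the prompt embedding
    # that contains the learnable token embedding v*.
    encoder_hidden_states = pipe.text_encoder(prompt_ids)[0]
    noise_pred = pipe.unet(noisy_latents, timesteps, encoder_hidden_states).sample
    loss = F.mse_loss(noise_pred, noise)
    loss.backward()

    # Zero the gradients of every embedding row except the placeholder token,
    # so only the new embedding v* is optimized.
    grad_mask = torch.ones(token_embeds.weight.shape[0], dtype=torch.bool)
    grad_mask[placeholder_token_id] = False
    token_embeds.weight.grad[grad_mask] = 0.0

    optimizer.step()
    optimizer.zero_grad()
```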
docs/source/ko/training/unconditional_training.mdx (new file, 144 lines)
@@ -0,0 +1,144 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Unconditional ์ด๋ฏธ์ง ์์ฑ

unconditional ์ด๋ฏธ์ง ์์ฑ์ text-to-image ๋๋ image-to-image ๋ชจ๋ธ๊ณผ ๋ฌ๋ฆฌ ํ
์คํธ๋ ์ด๋ฏธ์ง์ ๋ํ ์กฐ๊ฑด ์์ด, ํ์ต ๋ฐ์ดํฐ ๋ถํฌ์ ์ ์ฌํ ์ด๋ฏธ์ง๋ง์ ์์ฑํฉ๋๋ค.
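์๋ฅผ ๋ค์ด ์ฌ์  ํ์ต๋ unconditional ์ฒดํฌํฌ์ธํธ๋ ํ๋กฌํํธ ์์ด ํธ์ถํ๊ธฐ๋ง ํ๋ฉด ๋ฉ๋๋ค. ์๋๋ ๊ฐ๋จํ ์ค์ผ์น๋ก, ๋ฌธ์์ ๋ค๋ฅธ ๋ถ๋ถ์์๋ ์ฐ์ด๋ `google/ddpm-cifar10-32` ์ฒดํฌํฌ์ธํธ๋ฅผ ์๋ก ๊ฐ์ ํ์ต๋๋ค.

```py
from diffusers import DDPMPipeline

pipeline = DDPMPipeline.from_pretrained("google/ddpm-cifar10-32")

# ํ๋กฌํํธ ์์ด ํธ์ถํ๋ฉด ํ์ต ๋ฐ์ดํฐ ๋ถํฌ์ ์ ์ฌํ ์ด๋ฏธ์ง๊ฐ ์์ฑ๋ฉ๋๋ค.
image = pipeline().images[0]
image.save("unconditional_sample.png")
```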
<iframe
  src="https://stevhliu-ddpm-butterflies-128.hf.space"
  frameborder="0"
  width="850"
  height="550"
></iframe>

์ด ๊ฐ์ด๋์์๋ ๊ธฐ์กด์ ์กด์ฌํ๋ ๋ฐ์ดํฐ์
๊ณผ ์์ ๋ง์ ์ปค์คํ
 ๋ฐ์ดํฐ์
์ ๋ํด unconditional image generation ๋ชจ๋ธ์ ํ๋ จํ๋ ๋ฐฉ๋ฒ์ ์ค๋ช
ํฉ๋๋ค. ํ๋ จ ์ธ๋ถ ์ฌํญ์ ๋ํด ๋ ์์ธํ ์๊ณ  ์ถ๋ค๋ฉด unconditional image generation์ ์ํ ๋ชจ๋  ํ์ต ์คํฌ๋ฆฝํธ๋ฅผ [์ฌ๊ธฐ](https://github.com/huggingface/diffusers/tree/main/examples/unconditional_image_generation)์์ ํ์ธํ  ์ ์์ต๋๋ค.

์คํฌ๋ฆฝํธ๋ฅผ ์คํํ๊ธฐ ์ , ๋จผ์  ์์กด์ฑ ๋ผ์ด๋ธ๋ฌ๋ฆฌ๋ค์ ์ค์นํด์ผ ํฉ๋๋ค.

```bash
pip install diffusers[training] accelerate datasets
```

๊ทธ ๋ค์ ๐ค [Accelerate](https://github.com/huggingface/accelerate/) ํ๊ฒฝ์ ์ด๊ธฐํํฉ๋๋ค.

```bash
accelerate config
```

๋ณ๋์ ์ค์  ์์ด ๊ธฐ๋ณธ ์ค์ ์ผ๋ก ๐ค [Accelerate](https://github.com/huggingface/accelerate/) ํ๊ฒฝ์ ์ด๊ธฐํํด ๋ด
์๋ค.

```bash
accelerate config default
```

๋
ธํธ๋ถ๊ณผ ๊ฐ์ด ๋ํํ ์
์ ์ง์ํ์ง ์๋ ํ๊ฒฝ์ ๊ฒฝ์ฐ, ๋ค์๊ณผ ๊ฐ์ด ์ฌ์ฉํด๋ณผ ์๋ ์์ต๋๋ค.

```py
from accelerate.utils import write_basic_config

write_basic_config()
```
## ๋ชจ๋ธ์ ํ๋ธ์ ์
๋ก๋ํ๊ธฐ

ํ์ต ์คํฌ๋ฆฝํธ์ ๋ค์ ์ธ์๋ฅผ ์ถ๊ฐํ์ฌ ํ๋ธ์ ๋ชจ๋ธ์ ์
๋ก๋ํ  ์ ์์ต๋๋ค.

```bash
--push_to_hub
```

## ์ฒดํฌํฌ์ธํธ ์ ์ฅํ๊ณ  ๋ถ๋ฌ์ค๊ธฐ

ํ๋ จ ์ค ๋ฌธ์ ๊ฐ ๋ฐ์ํ  ๊ฒฝ์ฐ๋ฅผ ๋๋นํ์ฌ ์ฒดํฌํฌ์ธํธ๋ฅผ ์ ๊ธฐ์ ์ผ๋ก ์ ์ฅํ๋ ๊ฒ์ด ์ข์ต๋๋ค. ์ฒดํฌํฌ์ธํธ๋ฅผ ์ ์ฅํ๋ ค๋ฉด ํ์ต ์คํฌ๋ฆฝํธ์ ๋ค์ ์ธ์๋ฅผ ์ ๋ฌํฉ๋๋ค:

```bash
--checkpointing_steps=500
```

์ ์ฒด ํ๋ จ ์ํ๊ฐ 500์คํ
๋ง๋ค `output_dir`์ ํ์ ํด๋์ ์ ์ฅ๋๋ฉฐ, ํ์ต ์คํฌ๋ฆฝํธ์ `--resume_from_checkpoint` ์ธ์๋ฅผ ์ ๋ฌํจ์ผ๋ก์จ ์ฒดํฌํฌ์ธํธ๋ฅผ ๋ถ๋ฌ์ค๊ณ  ํ๋ จ์ ์ฌ๊ฐํ  ์ ์์ต๋๋ค.

```bash
--resume_from_checkpoint="checkpoint-1500"
```

## ํ์ธํ๋

์ด์  ํ์ต ์คํฌ๋ฆฝํธ๋ฅผ ์์ํ  ์ค๋น๊ฐ ๋์์ต๋๋ค! `--dataset_name` ์ธ์์ ํ์ธํ๋ํ  ๋ฐ์ดํฐ์
 ์ด๋ฆ์ ์ง์ ํ ๋ค์, `--output_dir` ์ธ์์ ์ง์ ๋ ๊ฒฝ๋ก๋ก ์ ์ฅํฉ๋๋ค. ๋ณธ์ธ๋ง์ ๋ฐ์ดํฐ์
์ ์ฌ์ฉํ๋ ค๋ฉด [ํ์ต์ฉ ๋ฐ์ดํฐ์
 ๋ง๋ค๊ธฐ](create_dataset) ๊ฐ์ด๋๋ฅผ ์ฐธ์กฐํ์ธ์.

ํ์ต ์คํฌ๋ฆฝํธ๋ `diffusion_pytorch_model.bin` ํ์ผ์ ์์ฑํ์ฌ ์ฌ๋ฌ๋ถ์ ๋ฆฌํฌ์งํ ๋ฆฌ์ ์ ์ฅํฉ๋๋ค.
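ํ์ต์ด ๋๋๋ฉด ์ ์ฅ๋ ํ์ดํ๋ผ์ธ์ ๋ค์ ๋ถ๋ฌ์ ์ด๋ฏธ์ง๋ฅผ ์์ฑํด๋ณผ ์ ์์ต๋๋ค. ์๋๋ ๊ฐ๋จํ ์ค์ผ์น๋ก, ๋ฆฌํฌ์งํ ๋ฆฌ ์ด๋ฆ(`your-username/ddpm-ema-flowers-64`)์ ์์์ผ ๋ฟ์ด๋ฉฐ ์ค์  ๊ฐ์ ๋ณธ์ธ์ `--output_dir`/`--push_to_hub` ์ค์ ์ ๋ฐ๋ฆ
๋๋ค.

```py
from diffusers import DDPMPipeline

# ๊ฐ์ : `--output_dir="ddpm-ema-flowers-64"`๋ก ํ์ตํ๊ณ  `--push_to_hub`๋ก ์
๋ก๋ํ ๊ฒฝ์ฐ์
๋๋ค.
# ๋ก์ปฌ ๊ฒฝ๋ก("./ddpm-ema-flowers-64")๋ฅผ ๊ทธ๋๋ก ์ ๋ฌํด๋ ๋ฉ๋๋ค.
pipeline = DDPMPipeline.from_pretrained("your-username/ddpm-ema-flowers-64")
image = pipeline().images[0]
image.save("flowers_sample.png")
```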
<Tip>

๐ก ์ ์ฒด ํ์ต์ V100 GPU 4๊ฐ๋ฅผ ์ฌ์ฉํ  ๊ฒฝ์ฐ, ์ฝ 2์๊ฐ์ด ์์๋ฉ๋๋ค.

</Tip>

์๋ฅผ ๋ค์ด, [Oxford Flowers](https://huggingface.co/datasets/huggan/flowers-102-categories) ๋ฐ์ดํฐ์
์ ์ฌ์ฉํด ํ์ธํ๋ํ  ๊ฒฝ์ฐ:

```bash
accelerate launch train_unconditional.py \
  --dataset_name="huggan/flowers-102-categories" \
  --resolution=64 \
  --output_dir="ddpm-ema-flowers-64" \
  --train_batch_size=16 \
  --num_epochs=100 \
  --gradient_accumulation_steps=1 \
  --learning_rate=1e-4 \
  --lr_warmup_steps=500 \
  --mixed_precision=no \
  --push_to_hub
```

<div class="flex justify-center">
  <img src="https://user-images.githubusercontent.com/26864830/180248660-a0b143d0-b89a-42c5-8656-2ebf6ece7e52.png"/>
</div>

[Pokemon](https://huggingface.co/datasets/huggan/pokemon) ๋ฐ์ดํฐ์
์ ์ฌ์ฉํ  ๊ฒฝ์ฐ:

```bash
accelerate launch train_unconditional.py \
  --dataset_name="huggan/pokemon" \
  --resolution=64 \
  --output_dir="ddpm-ema-pokemon-64" \
  --train_batch_size=16 \
  --num_epochs=100 \
  --gradient_accumulation_steps=1 \
  --learning_rate=1e-4 \
  --lr_warmup_steps=500 \
  --mixed_precision=no \
  --push_to_hub
```

<div class="flex justify-center">
  <img src="https://user-images.githubusercontent.com/26864830/180248200-928953b4-db38-48db-b0c6-8b740fe6786f.png"/>
</div>
### ์ฌ๋ฌ ๊ฐ์ GPU๋ก ํ๋ จํ๊ธฐ

`accelerate`๋ฅผ ์ฌ์ฉํ๋ฉด ์ํํ ๋ค์ค GPU ํ๋ จ์ด ๊ฐ๋ฅํฉ๋๋ค. `accelerate`๋ฅผ ์ฌ์ฉํ์ฌ ๋ถ์ฐ ํ๋ จ์ ์คํํ๋ ค๋ฉด [์ฌ๊ธฐ](https://huggingface.co/docs/accelerate/basic_tutorials/launch) ์ง์นจ์ ๋ฐ๋ฅด์ธ์. ๋ค์์ ๋ช
๋ น์ด ์์ ์
๋๋ค.

```bash
accelerate launch --mixed_precision="fp16" --multi_gpu train_unconditional.py \
  --dataset_name="huggan/pokemon" \
  --resolution=64 --center_crop --random_flip \
  --output_dir="ddpm-ema-pokemon-64" \
  --train_batch_size=16 \
  --num_epochs=100 \
  --gradient_accumulation_steps=1 \
  --use_ema \
  --learning_rate=1e-4 \
  --lr_warmup_steps=500 \
  --mixed_precision="fp16" \
  --logger="wandb" \
  --push_to_hub
```
405 docs/source/ko/tutorials/basic_training.mdx Normal file
@@ -0,0 +1,405 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

[[open-in-colab]]

# Diffusion ๋ชจ๋ธ์ ํ์ตํ๊ธฐ

Unconditional ์ด๋ฏธ์ง ์์ฑ์ ํ์ต์ ์ฌ์ฉ๋ ๋ฐ์ดํฐ์
๊ณผ ์ ์ฌํ ์ด๋ฏธ์ง๋ฅผ ์์ฑํ๋, diffusion ๋ชจ๋ธ์ ์ธ๊ธฐ ์๋ ์ ํ๋ฆฌ์ผ์ด์
์
๋๋ค. ์ผ๋ฐ์ ์ผ๋ก ๊ฐ์ฅ ์ข์ ๊ฒฐ๊ณผ๋ ํน์  ๋ฐ์ดํฐ์
์ ์ฌ์  ํ๋ จ๋ ๋ชจ๋ธ์ ํ์ธํ๋ํจ์ผ๋ก์จ ์ป์ ์ ์์ต๋๋ค. [ํ๋ธ](https://huggingface.co/search/full-text?q=unconditional-image-generation&type=model)์์ ์ด๋ฌํ ์ฒดํฌํฌ์ธํธ๋ฅผ ๋ง์ด ์ฐพ์ ์ ์์ง๋ง, ๋ง์์ ๋๋ ์ฒดํฌํฌ์ธํธ๋ฅผ ์ฐพ์ง ๋ชปํ๋ค๋ฉด ์ธ์ ๋ ์ง ์ค์ค๋ก ํ์ตํ  ์ ์์ต๋๋ค!

์ด ํํ ๋ฆฌ์ผ์ ๋๋ง์ ๐ฆ ๋๋น ๐ฆ๋ฅผ ์์ฑํ๊ธฐ ์ํด [Smithsonian Butterflies](https://huggingface.co/datasets/huggan/smithsonian_butterflies_subset) ๋ฐ์ดํฐ์
์ ํ์ ์งํฉ์์ [`UNet2DModel`] ๋ชจ๋ธ์ ํ์ตํ๋ ๋ฐฉ๋ฒ์ ๊ฐ๋ฅด์ณ์ค ๊ฒ์
๋๋ค.

<Tip>

๐ก ์ด ํ์ต ํํ ๋ฆฌ์ผ์ [Training with ๐งจ Diffusers](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) ๋
ธํธ๋ถ์ ๊ธฐ๋ฐ์ผ๋ก ํฉ๋๋ค. Diffusion ๋ชจ๋ธ์ ์๋ ๋ฐฉ์ ๋ฑ ์์ธํ ๋ด์ฉ์ ๋
ธํธ๋ถ์์ ํ์ธํ์ธ์!

</Tip>

์์ ์ ์, ๋ฐ์ดํฐ์
์ ๋ถ๋ฌ์ค๊ณ  ์ ์ฒ๋ฆฌํ๊ธฐ ์ํ ๐ค Datasets๊ณผ GPU์์ ํ์ต์ ๊ฐ์ํํ๊ธฐ ์ํ ๐ค Accelerate๊ฐ ์ค์น๋์ด ์๋์ง ํ์ธํ์ธ์. ๊ทธ ํ ํ์ต ๋ฉํธ๋ฆญ์ ์๊ฐํํ๊ธฐ ์ํด [TensorBoard](https://www.tensorflow.org/tensorboard)๋ ์ค์นํ์ธ์. (ํ์ต ์ถ์ ์ ์ํด [Weights & Biases](https://docs.wandb.ai/)๋ฅผ ์ฌ์ฉํ  ์๋ ์์ต๋๋ค.)

```bash
!pip install diffusers[training]
```

์ปค๋ฎค๋ํฐ์ ๋ชจ๋ธ์ ๊ณต์ ํ  ๊ฒ์ ๊ถ์ฅํ๋ฉฐ, ์ด๋ฅผ ์ํด์๋ Hugging Face ๊ณ์ ์ ๋ก๊ทธ์ธํด์ผ ํฉ๋๋ค. (๊ณ์ ์ด ์๋ค๋ฉด [์ฌ๊ธฐ](https://hf.co/join)์์ ๋ง๋ค ์ ์์ต๋๋ค.) ๋
ธํธ๋ถ์์ ๋ก๊ทธ์ธํ  ์ ์์ผ๋ฉฐ, ๋ฉ์์ง๊ฐ ํ์๋๋ฉด ํ ํฐ์ ์
๋ ฅํ๋ฉด ๋ฉ๋๋ค.
```py
>>> from huggingface_hub import notebook_login

>>> notebook_login()
```

๋๋ ํฐ๋ฏธ๋์์ ๋ก๊ทธ์ธํ  ์๋ ์์ต๋๋ค:

```bash
huggingface-cli login
```

๋ชจ๋ธ ์ฒดํฌํฌ์ธํธ์ ํฌ๊ธฐ๊ฐ ์๋นํ๊ธฐ ๋๋ฌธ์, ๋์ฉ๋ ํ์ผ์ ๋ฒ์  ๊ด๋ฆฌํ  ์ ์๋ [Git-LFS](https://git-lfs.com/)๋ฅผ ์ค์นํฉ๋๋ค.

```bash
!sudo apt -qq install git-lfs
!git config --global credential.helper store
```
## ํ์ต ๊ตฌ์ฑ

ํธ์๋ฅผ ์ํด ํ์ต ํ๋ผ๋ฏธํฐ๋ค์ ํฌํจํ `TrainingConfig` ํด๋์ค๋ฅผ ์์ฑํฉ๋๋ค (์์ ๋กญ๊ฒ ์กฐ์  ๊ฐ๋ฅ):

```py
>>> from dataclasses import dataclass


>>> @dataclass
... class TrainingConfig:
...     image_size = 128  # ์์ฑ๋๋ ์ด๋ฏธ์ง ํด์๋
...     train_batch_size = 16
...     eval_batch_size = 16  # ํ๊ฐ ๋์์ ์ํ๋งํ  ์ด๋ฏธ์ง ์
...     num_epochs = 50
...     gradient_accumulation_steps = 1
...     learning_rate = 1e-4
...     lr_warmup_steps = 500
...     save_image_epochs = 10
...     save_model_epochs = 30
...     mixed_precision = "fp16"  # `no`๋ float32, ์๋ ํผํฉ ์ ๋ฐ๋๋ฅผ ์ํ `fp16`
...     output_dir = "ddpm-butterflies-128"  # ๋ก์ปฌ ๋ฐ HF Hub์ ์ ์ฅ๋๋ ๋ชจ๋ธ๋ช


...     push_to_hub = True  # ์ ์ฅ๋ ๋ชจ๋ธ์ HF Hub์ ์
๋ก๋ํ ์ง ์ฌ๋ถ
...     hub_private_repo = False
...     overwrite_output_dir = True  # ๋
ธํธ๋ถ์ ๋ค์ ์คํํ  ๋ ์ด์  ๋ชจ๋ธ์ ๋ฎ์ด์ธ์ง ์ฌ๋ถ
...     seed = 0


>>> config = TrainingConfig()
```

## ๋ฐ์ดํฐ์
 ๋ถ๋ฌ์ค๊ธฐ
๐ค Datasets ๋ผ์ด๋ธ๋ฌ๋ฆฌ๋ก [Smithsonian Butterflies](https://huggingface.co/datasets/huggan/smithsonian_butterflies_subset) ๋ฐ์ดํฐ์
์ ์ฝ๊ฒ ๋ถ๋ฌ์ฌ ์ ์์ต๋๋ค.

```py
>>> from datasets import load_dataset

>>> config.dataset_name = "huggan/smithsonian_butterflies_subset"
>>> dataset = load_dataset(config.dataset_name, split="train")
```

๐ก [HugGan Community Event](https://huggingface.co/huggan)์์ ์ถ๊ฐ ๋ฐ์ดํฐ์
์ ์ฐพ๊ฑฐ๋, ๋ก์ปฌ์ [`ImageFolder`](https://huggingface.co/docs/datasets/image_dataset#imagefolder)๋ฅผ ๋ง๋ค์ด ๋๋ง์ ๋ฐ์ดํฐ์
์ ์ฌ์ฉํ  ์ ์์ต๋๋ค. HugGan Community Event์์ ๊ฐ์ ธ์จ ๋ฐ์ดํฐ์
์ ๊ฒฝ์ฐ ๋ ํฌ์งํ ๋ฆฌ id๋ฅผ `config.dataset_name`์ผ๋ก ์ค์ ํ๊ณ , ๋๋ง์ ์ด๋ฏธ์ง๋ฅผ ์ฌ์ฉํ๋ ๊ฒฝ์ฐ `imagefolder`๋ก ์ค์ ํฉ๋๋ค.

๐ค Datasets์ [`~datasets.Image`] ๊ธฐ๋ฅ์ ์ฌ์ฉํด ์๋์ผ๋ก ์ด๋ฏธ์ง ๋ฐ์ดํฐ๋ฅผ ๋์ฝ๋ฉํ๊ณ  [`PIL.Image`](https://pillow.readthedocs.io/en/stable/reference/Image.html)๋ก ๋ถ๋ฌ์ต๋๋ค. ์ด๋ฅผ ์๊ฐํํด ๋ณด๋ฉด:

```py
>>> import matplotlib.pyplot as plt

>>> fig, axs = plt.subplots(1, 4, figsize=(16, 4))
>>> for i, image in enumerate(dataset[:4]["image"]):
...     axs[i].imshow(image)
...     axs[i].set_axis_off()
>>> fig.show()
```



์ด๋ฏธ์ง๋ ๋ชจ๋ ํฌ๊ธฐ๊ฐ ๋ค๋ฅด๊ธฐ ๋๋ฌธ์, ์ฐ์  ์ ์ฒ๋ฆฌ๊ฐ ํ์ํฉ๋๋ค:
- `Resize`๋ ์ด๋ฏธ์ง ํฌ๊ธฐ๋ฅผ `config.image_size`์ ์ ์๋ ํฌ๊ธฐ๋ก ๋ณ๊ฒฝํฉ๋๋ค.
- `RandomHorizontalFlip`์ ์ด๋ฏธ์ง๋ฅผ ๋๋คํ๊ฒ ์ข์ฐ๋ก ๋ค์ง์ด ๋ฐ์ดํฐ์
์ ๋ณด๊ฐํฉ๋๋ค.
- `Normalize`๋ ํฝ์
 ๊ฐ์ ๋ชจ๋ธ์ด ์์ํ๋ [-1, 1] ๋ฒ์๋ก ์ฌ์กฐ์ ํ๋ ๋ฐ ์ค์ํฉ๋๋ค.

```py
>>> from torchvision import transforms

>>> preprocess = transforms.Compose(
...     [
...         transforms.Resize((config.image_size, config.image_size)),
...         transforms.RandomHorizontalFlip(),
...         transforms.ToTensor(),
...         transforms.Normalize([0.5], [0.5]),
...     ]
... )
```

ํ์ต ์ค์ `preprocess` ํจ์๋ฅผ ์ ์ฉํ๋ ค๋ฉด ๐ค Datasets์ [`~datasets.Dataset.set_transform`] ๋ฉ์๋๋ฅผ ์ฌ์ฉํฉ๋๋ค.

```py
>>> def transform(examples):
...     images = [preprocess(image.convert("RGB")) for image in examples["image"]]
...     return {"images": images}


>>> dataset.set_transform(transform)
```

์ด๋ฏธ์ง์ ํฌ๊ธฐ๊ฐ ์กฐ์ ๋์๋์ง ํ์ธํ๊ธฐ ์ํด ์ด๋ฏธ์ง๋ฅผ ๋ค์ ์๊ฐํํด ๋ณด์ธ์. ์ด์  [DataLoader](https://pytorch.org/docs/stable/data#torch.utils.data.DataLoader)๋ก ๋ฐ์ดํฐ์
์ ๊ฐ์ธ ํ์ตํ  ์ค๋น๊ฐ ๋์์ต๋๋ค!
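๋ณํ ๊ฒฐ๊ณผ๋ฅผ ๋น ๋ฅด๊ฒ ํ์ธํ๊ณ  ์ถ๋ค๋ฉด ๋ค์๊ณผ ๊ฐ์ด ์๊ฐํํ  ์ ์์ต๋๋ค. ์ ๊ทํ๋ฅผ ๋๋๋ฆฌ๊ธฐ ์ํด ํ
์๋ฅผ ๋ค์ [0, 1] ๋ฒ์๋ก ๋ณํํ๋ค๊ณ  ๊ฐ์ ํ ๊ฐ๋จํ ์ค์ผ์น์
๋๋ค.

```py
>>> import matplotlib.pyplot as plt

>>> fig, axs = plt.subplots(1, 4, figsize=(16, 4))
>>> for i, example in enumerate(dataset[:4]["images"]):
...     # Normalize([0.5], [0.5])๋ฅผ ๋๋๋ ค [-1, 1] -> [0, 1] ๋ฒ์๋ก ๋ณต์ํฉ๋๋ค.
...     axs[i].imshow((example.permute(1, 2, 0) + 1.0) / 2.0)
...     axs[i].set_axis_off()
>>> fig.show()
```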
```py
>>> import torch

>>> train_dataloader = torch.utils.data.DataLoader(dataset, batch_size=config.train_batch_size, shuffle=True)
```

## UNet2DModel ์์ฑํ๊ธฐ

๐งจ Diffusers์ ์ฌ์  ํ์ต๋ ๋ชจ๋ธ๋ค์ ์ํ๋ ํ๋ผ๋ฏธํฐ๋ก ํด๋น ๋ชจ๋ธ ํด๋์ค์์ ์ฝ๊ฒ ์์ฑํ  ์ ์์ต๋๋ค. ์๋ฅผ ๋ค์ด [`UNet2DModel`]์ ์์ฑํ๋ ค๋ฉด:

```py
>>> from diffusers import UNet2DModel

>>> model = UNet2DModel(
...     sample_size=config.image_size,  # ํ๊ฒ ์ด๋ฏธ์ง ํด์๋
...     in_channels=3,  # ์
๋ ฅ ์ฑ๋ ์, RGB ์ด๋ฏธ์ง์์ 3
...     out_channels=3,  # ์ถ๋ ฅ ์ฑ๋ ์
...     layers_per_block=2,  # UNet ๋ธ๋ญ๋น ๋ช ๊ฐ์ ResNet ๋ ์ด์ด๊ฐ ์ฌ์ฉ๋๋์ง
...     block_out_channels=(128, 128, 256, 256, 512, 512),  # ๊ฐ UNet ๋ธ๋ญ์ ์ํ ์ถ๋ ฅ ์ฑ๋ ์
...     down_block_types=(
...         "DownBlock2D",  # ์ผ๋ฐ์ ์ธ ResNet ๋ค์ด์ํ๋ง ๋ธ๋ญ
...         "DownBlock2D",
...         "DownBlock2D",
...         "DownBlock2D",
...         "AttnDownBlock2D",  # spatial self-attention์ด ํฌํจ๋ ResNet ๋ค์ด์ํ๋ง ๋ธ๋ญ
...         "DownBlock2D",
...     ),
...     up_block_types=(
...         "UpBlock2D",  # ์ผ๋ฐ์ ์ธ ResNet ์
์ํ๋ง ๋ธ๋ญ
...         "AttnUpBlock2D",  # spatial self-attention์ด ํฌํจ๋ ResNet ์
์ํ๋ง ๋ธ๋ญ
...         "UpBlock2D",
...         "UpBlock2D",
...         "UpBlock2D",
...         "UpBlock2D",
...     ),
... )
```

์ํ ์ด๋ฏธ์ง์ ํฌ๊ธฐ์ ๋ชจ๋ธ ์ถ๋ ฅ ํฌ๊ธฐ๊ฐ ์ผ์นํ๋์ง ๋น ๋ฅด๊ฒ ํ์ธํด ๋ณด๋ ๊ฒ์ด ์ข์ต๋๋ค:

```py
>>> sample_image = dataset[0]["images"].unsqueeze(0)
>>> print("Input shape:", sample_image.shape)
Input shape: torch.Size([1, 3, 128, 128])

>>> print("Output shape:", model(sample_image, timestep=0).sample.shape)
Output shape: torch.Size([1, 3, 128, 128])
```

ํ๋ฅญํด์! ๋ค์์ผ๋ก, ์ด๋ฏธ์ง์ ์ฝ๊ฐ์ ๋
ธ์ด์ฆ๋ฅผ ๋ํ๊ธฐ ์ํ ์ค์ผ์ค๋ฌ๊ฐ ํ์ํฉ๋๋ค.
## ์ค์ผ์ค๋ฌ ์์ฑํ๊ธฐ

์ค์ผ์ค๋ฌ๋ ๋ชจ๋ธ์ ํ์ต์ ์ฌ์ฉํ๋์ง ์ถ๋ก ์ ์ฌ์ฉํ๋์ง์ ๋ฐ๋ผ ๋ค๋ฅด๊ฒ ๋์ํฉ๋๋ค. ์ถ๋ก  ์์๋ ์ค์ผ์ค๋ฌ๊ฐ ๋
ธ์ด์ฆ๋ก๋ถํฐ ์ด๋ฏธ์ง๋ฅผ ์์ฑํฉ๋๋ค. ํ์ต ์์๋ ์ค์ผ์ค๋ฌ๊ฐ diffusion ๊ณผ์ ์ ํน์  ์์ ์์ ๋ชจ๋ธ ์ถ๋ ฅ ๋๋ ์ํ์ ๊ฐ์ ธ์, *๋
ธ์ด์ฆ ์ค์ผ์ค*๊ณผ *์
๋ฐ์ดํธ ๊ท์น*์ ๋ฐ๋ผ ์ด๋ฏธ์ง์ ๋
ธ์ด์ฆ๋ฅผ ์ ์ฉํฉ๋๋ค.

`DDPMScheduler`๋ฅผ ์ดํด๋ณด๊ณ , `add_noise` ๋ฉ์๋๋ฅผ ์ฌ์ฉํด ์์ ๋ง๋  `sample_image`์ ๋๋คํ ๋
ธ์ด์ฆ๋ฅผ ๋ํด๋ณด๊ฒ ์ต๋๋ค:

```py
>>> import torch
>>> from PIL import Image
>>> from diffusers import DDPMScheduler

>>> noise_scheduler = DDPMScheduler(num_train_timesteps=1000)
>>> noise = torch.randn(sample_image.shape)
>>> timesteps = torch.LongTensor([50])
>>> noisy_image = noise_scheduler.add_noise(sample_image, noise, timesteps)

>>> Image.fromarray(((noisy_image.permute(0, 2, 3, 1) + 1.0) * 127.5).type(torch.uint8).numpy()[0])
```



๋ชจ๋ธ์ ํ์ต ๋ชฉ์ ์ ์ด๋ฏธ์ง์ ๋ํด์ง ๋
ธ์ด์ฆ๋ฅผ ์์ธกํ๋ ๊ฒ์
๋๋ค. ์ด ๋จ๊ณ์์ ์์ค์ ๋ค์๊ณผ ๊ฐ์ด ๊ณ์ฐํ  ์ ์์ต๋๋ค:

```py
>>> import torch.nn.functional as F

>>> noise_pred = model(noisy_image, timesteps).sample
>>> loss = F.mse_loss(noise_pred, noise)
```
## ๋ชจ๋ธ ํ์ตํ๊ธฐ

์ง๊ธ๊น์ง ๋ชจ๋ธ ํ์ต์ ์์ํ๋ ๋ฐ ํ์ํ ๋๋ถ๋ถ์ ์์๋ฅผ ๊ฐ์ถ์์ผ๋ฉฐ, ์ด์  ๋จ์ ์ผ์ ์ด ๋ชจ๋  ๊ฒ์ ํ๋๋ก ์กฐํฉํ๋ ๊ฒ์
๋๋ค.

์ฐ์  ์ตํฐ๋ง์ด์ (optimizer)์ ํ์ต๋ฅ  ์ค์ผ์ค๋ฌ(learning rate scheduler)๊ฐ ํ์ํฉ๋๋ค:

```py
>>> from diffusers.optimization import get_cosine_schedule_with_warmup

>>> optimizer = torch.optim.AdamW(model.parameters(), lr=config.learning_rate)
>>> lr_scheduler = get_cosine_schedule_with_warmup(
...     optimizer=optimizer,
...     num_warmup_steps=config.lr_warmup_steps,
...     num_training_steps=(len(train_dataloader) * config.num_epochs),
... )
```

๊ทธ ๋ค์, ๋ชจ๋ธ์ ํ๊ฐํ๋ ๋ฐฉ๋ฒ์ด ํ์ํฉ๋๋ค. ํ๊ฐ๋ฅผ ์ํด `DDPMPipeline`์ ์ฌ์ฉํด ๋ฐฐ์น ๋จ์๋ก ์ด๋ฏธ์ง ์ํ์ ์์ฑํ๊ณ , ์ด๋ฅผ ๊ทธ๋ฆฌ๋ ํํ๋ก ์ ์ฅํ  ์ ์์ต๋๋ค:

```py
>>> from diffusers import DDPMPipeline
>>> import math
>>> import os


>>> def make_grid(images, rows, cols):
...     w, h = images[0].size
...     grid = Image.new("RGB", size=(cols * w, rows * h))
...     for i, image in enumerate(images):
...         grid.paste(image, box=(i % cols * w, i // cols * h))
...     return grid


>>> def evaluate(config, epoch, pipeline):
...     # ๋๋คํ ๋
ธ์ด์ฆ๋ก๋ถํฐ ์ด๋ฏธ์ง๋ฅผ ์ถ์ถํฉ๋๋ค. (์ด๊ฒ์ด ์ญ๋ฐฉํฅ diffusion ๊ณผ์ ์
๋๋ค.)
...     # ๊ธฐ๋ณธ ํ์ดํ๋ผ์ธ ์ถ๋ ฅ ํํ๋ `List[PIL.Image]`์
๋๋ค.
...     images = pipeline(
...         batch_size=config.eval_batch_size,
...         generator=torch.manual_seed(config.seed),
...     ).images

...     # ์ด๋ฏธ์ง๋ค์ ๊ทธ๋ฆฌ๋๋ก ๋ง๋ค์ด์ค๋๋ค.
...     image_grid = make_grid(images, rows=4, cols=4)

...     # ์ด๋ฏธ์ง๋ค์ ์ ์ฅํฉ๋๋ค.
...     test_dir = os.path.join(config.output_dir, "samples")
...     os.makedirs(test_dir, exist_ok=True)
...     image_grid.save(f"{test_dir}/{epoch:04d}.png")
```

TensorBoard ๋ก๊น
, ๊ทธ๋๋์ธํธ ๋์ , ํผํฉ ์ ๋ฐ๋ ํ์ต์ ์ฝ๊ฒ ์ํํ๊ธฐ ์ํด, ์ง๊ธ๊น์ง์ ๋ชจ๋  ๊ตฌ์ฑ ์์๋ค์ ๐ค Accelerate๋ฅผ ์ฌ์ฉํ ํ์ต ๋ฃจํ๋ก ๋ฌถ์ ์ ์์ต๋๋ค. ํ๋ธ์ ๋ชจ๋ธ์ ์
๋ก๋ํ๊ธฐ ์ํด, ๋ ํฌ์งํ ๋ฆฌ ์ด๋ฆ๊ณผ ์ ๋ณด๋ฅผ ๊ฐ์ ธ์ ํ๋ธ์ ์
๋ก๋ํ๋ ํจ์๋ ์์ฑํฉ๋๋ค.

๐ก ์๋์ ํ์ต ๋ฃจํ๋ ์ด๋ ต๊ณ  ๊ธธ์ด ๋ณด์ผ ์ ์์ง๋ง, ๋์ค์ ๋จ ํ ์ค์ ์ฝ๋๋ก ํ์ต์ ์คํํ  ์ ์๋ค๋ ์ ์์ ๊ทธ๋งํ ๊ฐ์น๊ฐ ์์ ๊ฒ์
๋๋ค! ๊ธฐ๋ค๋ฆฌ์ง ๋ชปํ๊ณ  ์ด๋ฏธ์ง๋ฅผ ์์ฑํ๊ณ  ์ถ๋ค๋ฉด, ์๋ ์ฝ๋๋ฅผ ์์ ๋กญ๊ฒ ๋ถ์ฌ๋ฃ๊ณ  ์คํํ๋ฉด ๋ฉ๋๋ค. ๐ค
```py
>>> from accelerate import Accelerator
>>> from huggingface_hub import HfFolder, Repository, whoami
>>> from tqdm.auto import tqdm
>>> from pathlib import Path
>>> import os


>>> def get_full_repo_name(model_id: str, organization: str = None, token: str = None):
...     if token is None:
...         token = HfFolder.get_token()
...     if organization is None:
...         username = whoami(token)["name"]
...         return f"{username}/{model_id}"
...     else:
...         return f"{organization}/{model_id}"


>>> def train_loop(config, model, noise_scheduler, optimizer, train_dataloader, lr_scheduler):
...     # Accelerator์ TensorBoard ๋ก๊น
 ์ด๊ธฐํ
...     accelerator = Accelerator(
...         mixed_precision=config.mixed_precision,
...         gradient_accumulation_steps=config.gradient_accumulation_steps,
...         log_with="tensorboard",
...         logging_dir=os.path.join(config.output_dir, "logs"),
...     )
...     if accelerator.is_main_process:
...         if config.push_to_hub:
...             repo_name = get_full_repo_name(Path(config.output_dir).name)
...             repo = Repository(config.output_dir, clone_from=repo_name)
...         elif config.output_dir is not None:
...             os.makedirs(config.output_dir, exist_ok=True)
...         accelerator.init_trackers("train_example")

...     # ๋ชจ๋  ๊ฒ์ด ์ค๋น๋์์ต๋๋ค.
...     # ํน๋ณํ ๊ธฐ์ตํด์ผ ํ  ์์๋ ์์ผ๋ฉฐ, `prepare`์ ์ ๋ฌํ ๊ฒ๊ณผ ๋์ผํ ์์๋ก ๊ฐ์ฒด๋ฅผ ๋ค์ ๋ฐ์ผ๋ฉด ๋ฉ๋๋ค.
...     model, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
...         model, optimizer, train_dataloader, lr_scheduler
...     )

...     global_step = 0

...     # ์ด์  ๋ชจ๋ธ์ ํ์ตํฉ๋๋ค.
...     for epoch in range(config.num_epochs):
...         progress_bar = tqdm(total=len(train_dataloader), disable=not accelerator.is_local_main_process)
...         progress_bar.set_description(f"Epoch {epoch}")

...         for step, batch in enumerate(train_dataloader):
...             clean_images = batch["images"]
...             # ์ด๋ฏธ์ง์ ๋ํ  ๋
ธ์ด์ฆ๋ฅผ ์ํ๋งํฉ๋๋ค.
...             noise = torch.randn(clean_images.shape).to(clean_images.device)
...             bs = clean_images.shape[0]

...             # ๊ฐ ์ด๋ฏธ์ง๋ฅผ ์ํ ๋๋คํ ํ์์คํ
(timestep)์ ์ํ๋งํฉ๋๋ค.
...             timesteps = torch.randint(
...                 0, noise_scheduler.config.num_train_timesteps, (bs,), device=clean_images.device
...             ).long()

...             # ๊ฐ ํ์์คํ
์ ๋
ธ์ด์ฆ ํฌ๊ธฐ์ ๋ฐ๋ผ ๊นจ๋ํ ์ด๋ฏธ์ง์ ๋
ธ์ด์ฆ๋ฅผ ์ถ๊ฐํฉ๋๋ค.
...             # (์ด๊ฒ์ด forward diffusion ๊ณผ์ ์
๋๋ค.)
...             noisy_images = noise_scheduler.add_noise(clean_images, noise, timesteps)

...             with accelerator.accumulate(model):
...                 # ๋
ธ์ด์ฆ ์์ฐจ(residual)๋ฅผ ์์ธกํฉ๋๋ค.
...                 noise_pred = model(noisy_images, timesteps, return_dict=False)[0]
...                 loss = F.mse_loss(noise_pred, noise)
...                 accelerator.backward(loss)

...                 accelerator.clip_grad_norm_(model.parameters(), 1.0)
...                 optimizer.step()
...                 lr_scheduler.step()
...                 optimizer.zero_grad()

...             progress_bar.update(1)
...             logs = {"loss": loss.detach().item(), "lr": lr_scheduler.get_last_lr()[0], "step": global_step}
...             progress_bar.set_postfix(**logs)
...             accelerator.log(logs, step=global_step)
...             global_step += 1

...         # ๊ฐ ์ํฌํฌ๊ฐ ๋๋ ํ, evaluate()๋ก ๋ช ๊ฐ์ ๋ฐ๋ชจ ์ด๋ฏธ์ง๋ฅผ ์ ํ์ ์ผ๋ก ์ํ๋งํ๊ณ  ๋ชจ๋ธ์ ์ ์ฅํฉ๋๋ค.
...         if accelerator.is_main_process:
...             pipeline = DDPMPipeline(unet=accelerator.unwrap_model(model), scheduler=noise_scheduler)

...             if (epoch + 1) % config.save_image_epochs == 0 or epoch == config.num_epochs - 1:
...                 evaluate(config, epoch, pipeline)

...             if (epoch + 1) % config.save_model_epochs == 0 or epoch == config.num_epochs - 1:
...                 if config.push_to_hub:
...                     repo.push_to_hub(commit_message=f"Epoch {epoch}", blocking=True)
...                 else:
...                     pipeline.save_pretrained(config.output_dir)
```
ํด, ์ฝ๋๊ฐ ๊ฝค ๋ง์๋ค์! ํ์ง๋ง ์ด์  ๐ค Accelerate์ [`~accelerate.notebook_launcher`] ํจ์๋ก ํ์ต์ ์์ํ  ์ค๋น๊ฐ ๋์์ต๋๋ค. ์ด ํจ์์ ํ์ต ๋ฃจํ, ๋ชจ๋  ํ์ต ์ธ์, ๊ทธ๋ฆฌ๊ณ  ํ์ต์ ์ฌ์ฉํ  ํ๋ก์ธ์ค ์(์ฌ์ฉ ๊ฐ๋ฅํ GPU ์๋ก ๋ณ๊ฒฝํ  ์ ์์)๋ฅผ ์ ๋ฌํฉ๋๋ค:

```py
>>> from accelerate import notebook_launcher

>>> args = (config, model, noise_scheduler, optimizer, train_dataloader, lr_scheduler)

>>> notebook_launcher(train_loop, args, num_processes=1)
```

ํ์ต์ด ์๋ฃ๋๋ฉด, diffusion ๋ชจ๋ธ๋ก ์์ฑ๋ ์ต์ข
 ๐ฆ ์ด๋ฏธ์ง ๐ฆ๋ฅผ ํ์ธํด ๋ณด์ธ์!

```py
>>> import glob

>>> sample_images = sorted(glob.glob(f"{config.output_dir}/samples/*.png"))
>>> Image.open(sample_images[-1])
```


## ๋ค์ ๋จ๊ณ

Unconditional ์ด๋ฏธ์ง ์์ฑ์ ํ์ตํ  ์ ์๋ ์์
์ ํ ๊ฐ์ง ์์ผ ๋ฟ์
๋๋ค. ๋ค๋ฅธ ์์
๊ณผ ํ์ต ๋ฐฉ๋ฒ์ [๐งจ Diffusers ํ์ต ์์](../training/overview) ํ์ด์ง์์ ํ์ธํ  ์ ์์ต๋๋ค. ๋ค์์ ํ์ตํ  ์ ์๋ ๋ช ๊ฐ์ง ์์์
๋๋ค:

- [Textual Inversion](../training/text_inversion): ํน์  ์๊ฐ์  ๊ฐ๋
์ ํ์ต์์ผ ์์ฑ๋ ์ด๋ฏธ์ง์ ํตํฉ์ํค๋ ์๊ณ ๋ฆฌ์ฆ์
๋๋ค.
- [DreamBooth](../training/dreambooth): ์ฃผ์ ์ ๋ํ ๋ช ์ฅ์ ์
๋ ฅ ์ด๋ฏธ์ง๋ง์ผ๋ก ํด๋น ์ฃผ์ ์ ๋ํ ๊ฐ์ธํ๋ ์ด๋ฏธ์ง๋ฅผ ์์ฑํ๋ ๊ธฐ์ ์
๋๋ค.
- [Guide](../training/text2image): ๋ฐ์ดํฐ์
์ Stable Diffusion ๋ชจ๋ธ์ ํ์ธํ๋ํ๋ ๋ฐฉ๋ฒ์ ๋ํ ๊ฐ์ด๋์
๋๋ค.
- [Guide](../training/lora): LoRA๋ฅผ ์ฌ์ฉํด ๋งค์ฐ ํฐ ๋ชจ๋ธ์ ๋น ๋ฅด๊ฒ ํ์ธํ๋ํ๊ธฐ ์ํ ๋ฉ๋ชจ๋ฆฌ ํจ์จ์ ์ธ ๊ธฐ์ ์ ๋ํ ๊ฐ์ด๋์
๋๋ค.
23 docs/source/ko/tutorials/tutorial_overview.mdx Normal file
@@ -0,0 +1,23 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Overview

๐งจ Diffusers์ ์ค์  ๊ฑธ ํ์ํฉ๋๋ค! diffusion ๋ชจ๋ธ๊ณผ ์์ฑ AI๋ฅผ ์ฒ์ ์ ํ๊ณ  ๋ ๋ง์ ๊ฑธ ๋ฐฐ์ฐ๊ณ  ์ถ๋ค๋ฉด ์ ๋๋ก ์ฐพ์์ค์
จ์ต๋๋ค. ์ด ํํ ๋ฆฌ์ผ์ diffusion ๋ชจ๋ธ์ ์น์ ํ๊ฒ ์๊ฐํ๊ณ , ๋ผ์ด๋ธ๋ฌ๋ฆฌ์ ๊ธฐ๋ณธ ์ฌํญ(ํต์ฌ ๊ตฌ์ฑ์์์ ๐งจ Diffusers ์ฌ์ฉ๋ฒ)์ ์ดํดํ๋ ๋ฐ ๋์์ด ๋๋๋ก ์ค๊ณ๋์์ต๋๋ค.

์ด ํํ ๋ฆฌ์ผ์์๋ ๋น ๋ฅธ ์์ฑ์ ์ํด ์ถ๋ก  ํ์ดํ๋ผ์ธ์ ์ฌ์ฉํ๋ ๋ฐฉ๋ฒ๊ณผ, ๋ผ์ด๋ธ๋ฌ๋ฆฌ๋ฅผ ๋ชจ๋ํ ํด๋ฐ์ค(modular toolbox)์ฒ๋ผ ํ์ฉํด ๋๋ง์ diffusion ์์คํ
์ ๊ตฌ์ถํ  ์ ์๋๋ก ํ์ดํ๋ผ์ธ์ ๋ถํดํ๋ ๋ฒ์ ๋ฐฐ์๋๋ค. ๋ค์ ๋จ์์์๋ ์ํ๋ ๊ฒ์ ์์ฑํ๊ธฐ ์ํด ์์ ๋ง์ diffusion ๋ชจ๋ธ์ ํ์ตํ๋ ๋ฐฉ๋ฒ์ ๋ฐฐ์ฐ๊ฒ ๋ฉ๋๋ค.

ํํ ๋ฆฌ์ผ์ ์๋ฃํ๋ฉด ๋ผ์ด๋ธ๋ฌ๋ฆฌ๋ฅผ ์ง์  ํ์ํ๊ณ , ์์ ์ ํ๋ก์ ํธ์ ์ ํ๋ฆฌ์ผ์ด์
์ ์ ์ฉํ  ์คํฌ์ ์ต๋ํ  ์ ์์ ๊ฒ์
๋๋ค.

[Discord](https://discord.com/invite/JfAtkvEtRb)๋ [ํฌ๋ผ](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63) ์ปค๋ฎค๋ํฐ์ ์์ ๋กญ๊ฒ ์ฐธ์ฌํด์ ๋ค๋ฅธ ์ฌ์ฉ์, ๊ฐ๋ฐ์๋ค๊ณผ ๊ต๋ฅํ๊ณ  ํ์
ํด ๋ณด์ธ์!

์, ์ง๊ธ๋ถํฐ diffusing์ ์์ํด ๋ณด๊ฒ ์ต๋๋ค! ๐งจ
275 docs/source/ko/using-diffusers/custom_pipeline_examples.mdx Normal file
@@ -0,0 +1,275 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# ์ปค๋ฎค๋ํฐ ํ์ดํ๋ผ์ธ

> **์ปค๋ฎค๋ํฐ ํ์ดํ๋ผ์ธ์ ๋ํ ์์ธํ ๋ด์ฉ์ [์ด ์ด์](https://github.com/huggingface/diffusers/issues/841)๋ฅผ ์ฐธ์กฐํ์ธ์.**

**์ปค๋ฎค๋ํฐ** ์์ ๋ ์ปค๋ฎค๋ํฐ์์ ์ถ๊ฐํ ์ถ๋ก  ๋ฐ ํ๋ จ ์์ ๋ก ๊ตฌ์ฑ๋์ด ์์ต๋๋ค.
๋ค์ ํ๋ฅผ ์ฐธ์กฐํ์ฌ ๋ชจ๋  ์ปค๋ฎค๋ํฐ ์์ ์ ๋ํ ๊ฐ์๋ฅผ ํ์ธํ์๊ธฐ ๋ฐ๋๋๋ค. **์ฝ๋ ์์ **๋ฅผ ํด๋ฆญํ๋ฉด ๋ณต์ฌํ์ฌ ๋ถ์ฌ๋ฃ์ ์ ์๋ ์ฝ๋ ์์ ๋ฅผ ํ์ธํ  ์ ์์ต๋๋ค.
์ปค๋ฎค๋ํฐ ํ์ดํ๋ผ์ธ์ด ์์๋๋ก ์๋ํ์ง ์๋ ๊ฒฝ์ฐ ์ด์๋ฅผ ์์ฑํ๊ณ  ์์ฑ์์๊ฒ ํ์ ๋ณด๋ด์ฃผ์ธ์.

| ์ | ์ค๋ช
 | ์ฝ๋ ์์  | ์ฝ๋ฉ | ์ ์ |
|:---|:---|:---|:---|---:|
| CLIP Guided Stable Diffusion | CLIP ๊ฐ์ด๋ ๊ธฐ๋ฐ์ Stable Diffusion์ผ๋ก ํ
์คํธ์์ ์ด๋ฏธ์ง๋ก ์์ฑํ๊ธฐ | [CLIP Guided Stable Diffusion](#clip-guided-stable-diffusion) | [](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/CLIP_Guided_Stable_diffusion_with_diffusers.ipynb) | [Suraj Patil](https://github.com/patil-suraj/) |
| One Step U-Net (Dummy) | ์ปค๋ฎค๋ํฐ ํ์ดํ๋ผ์ธ์ ์ด๋ป๊ฒ ์ฌ์ฉํด์ผ ํ๋์ง์ ๋ํ ์์ (์ฐธ๊ณ : https://github.com/huggingface/diffusers/issues/841) | [One Step U-Net](#one-step-unet) | - | [Patrick von Platen](https://github.com/patrickvonplaten/) |
| Stable Diffusion Interpolation | ์๋ก ๋ค๋ฅธ ํ๋กฌํํธ/์๋ ๊ฐ Stable Diffusion์ latent space ๋ณด๊ฐ | [Stable Diffusion Interpolation](#stable-diffusion-interpolation) | - | [Nate Raw](https://github.com/nateraw/) |
| Stable Diffusion Mega | [Text2Image](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py), [Image2Image](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py), [Inpainting](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py)์ ๋ชจ๋  ๊ธฐ๋ฅ์ ๊ฐ์ถ **ํ๋์** Stable Diffusion ํ์ดํ๋ผ์ธ | [Stable Diffusion Mega](#stable-diffusion-mega) | - | [Patrick von Platen](https://github.com/patrickvonplaten/) |
| Long Prompt Weighting Stable Diffusion | ํ ํฐ ๊ธธ์ด ์ ํ์ด ์๊ณ  ํ๋กฌํํธ์์ ๊ฐ์ค์น ํ์ฑ์ ์ง์ํ๋ **ํ๋์** Stable Diffusion ํ์ดํ๋ผ์ธ | [Long Prompt Weighting Stable Diffusion](#long-prompt-weighting-stable-diffusion) | - | [SkyTNT](https://github.com/SkyTNT) |
| Speech to Image | ์๋ ์์ฑ ์ธ์์ ์ฌ์ฉํ์ฌ ํ
์คํธ๋ฅผ ์์ฑํ๊ณ  Stable Diffusion์ ์ฌ์ฉํ์ฌ ์ด๋ฏธ์ง๋ฅผ ์์ฑํฉ๋๋ค. | [Speech to Image](#speech-to-image) | - | [Mikail Duzenli](https://github.com/MikailINTech) |

์ปค์คํ
 ํ์ดํ๋ผ์ธ์ ๋ถ๋ฌ์ค๋ ค๋ฉด `diffusers/examples/community`์ ์๋ ํ์ผ ์ค ํ๋๋ฅผ `custom_pipeline` ์ธ์๋ก `DiffusionPipeline`์ ์ ๋ฌํ๊ธฐ๋ง ํ๋ฉด ๋ฉ๋๋ค. ์์ ๋ง์ ํ์ดํ๋ผ์ธ์ด ์๋ PR์ ๋ณด๋ด์ฃผ์๋ฉด ๋น ๋ฅด๊ฒ ๋ณํฉํด ๋๋ฆฌ๊ฒ ์ต๋๋ค.

```py
pipe = DiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", custom_pipeline="filename_in_the_community_folder"
)
```
## ์ฌ์ฉ ์์ 

### CLIP ๊ฐ์ด๋ ๊ธฐ๋ฐ์ Stable Diffusion

๋ชจ๋  ๋
ธ์ด์ฆ ์ ๊ฑฐ ๋จ๊ณ์์ ์ถ๊ฐ CLIP ๋ชจ๋ธ๋ก Stable Diffusion์ ๊ฐ์ด๋ํจ์ผ๋ก์จ, ๋ณด๋ค ์ฌ์ค์ ์ธ ์ด๋ฏธ์ง๋ฅผ ์์ฑํ  ์ ์์ต๋๋ค.

๋ค์ ์ฝ๋๋ ์ฝ 12GB์ GPU RAM์ด ํ์ํฉ๋๋ค.

```python
from diffusers import DiffusionPipeline
from transformers import CLIPImageProcessor, CLIPModel
import torch


feature_extractor = CLIPImageProcessor.from_pretrained("laion/CLIP-ViT-B-32-laion2B-s34B-b79K")
clip_model = CLIPModel.from_pretrained("laion/CLIP-ViT-B-32-laion2B-s34B-b79K", torch_dtype=torch.float16)


guided_pipeline = DiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    custom_pipeline="clip_guided_stable_diffusion",
    clip_model=clip_model,
    feature_extractor=feature_extractor,
    torch_dtype=torch.float16,
)
guided_pipeline.enable_attention_slicing()
guided_pipeline = guided_pipeline.to("cuda")

prompt = "fantasy book cover, full moon, fantasy forest landscape, golden vector elements, fantasy magic, dark light night, intricate, elegant, sharp focus, illustration, highly detailed, digital painting, concept art, matte, art by WLOP and Artgerm and Albert Bierstadt, masterpiece"

generator = torch.Generator(device="cuda").manual_seed(0)
images = []
for i in range(4):
    image = guided_pipeline(
        prompt,
        num_inference_steps=50,
        guidance_scale=7.5,
        clip_guidance_scale=100,
        num_cutouts=4,
        use_cutouts=False,
        generator=generator,
    ).images[0]
    images.append(image)

# ์ด๋ฏธ์ง ๋ก์ปฌ์ ์ ์ฅํ๊ธฐ
for i, img in enumerate(images):
    img.save(f"./clip_guided_sd/image_{i}.png")
```

`images` ๋ชฉ๋ก์๋ ๋ก์ปฌ์ ์ ์ฅํ๊ฑฐ๋ ๊ตฌ๊ธ ์ฝ๋ฉ์ ์ง์  ํ์ํ  ์ ์๋ PIL ์ด๋ฏธ์ง๋ค์ด ํฌํจ๋์ด ์์ต๋๋ค. ์์ฑ๋ ์ด๋ฏธ์ง๋ ๊ธฐ๋ณธ Stable Diffusion๋ง ์ฌ์ฉํ  ๋๋ณด๋ค ํ์ง์ด ๋์ ๊ฒฝํฅ์ด ์์ต๋๋ค. ์๋ฅผ ๋ค์ด ์์ ์คํฌ๋ฆฝํธ๋ ๋ค์๊ณผ ๊ฐ์ ์ด๋ฏธ์ง๋ฅผ ์์ฑํฉ๋๋ค:


### One Step Unet

์์  "one-step-unet"์ ๋ค์๊ณผ ๊ฐ์ด ์คํํ  ์ ์์ต๋๋ค.

```python
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("google/ddpm-cifar10-32", custom_pipeline="one_step_unet")
pipe()
```

**์ฐธ๊ณ **: ์ด ์ปค๋ฎค๋ํฐ ํ์ดํ๋ผ์ธ์ ๊ธฐ๋ฅ์ผ๋ก์๋ ์ ์ฉํ์ง ์์ผ๋ฉฐ, ์ปค๋ฎค๋ํฐ ํ์ดํ๋ผ์ธ์ ์ถ๊ฐํ๋ ๋ฐฉ๋ฒ์ ๋ณด์ฌ์ฃผ๋ ์์์ผ ๋ฟ์
๋๋ค(https://github.com/huggingface/diffusers/issues/841 ์ฐธ์กฐ).

### Stable Diffusion Interpolation

๋ค์ ์ฝ๋๋ ์ต์ 8GB VRAM์ GPU์์ ์คํํ  ์ ์์ผ๋ฉฐ ์ฝ 5๋ถ ์ ๋ ์์๋ฉ๋๋ค.

```python
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
    safety_checker=None,  # Very important for videos...lots of false positives while interpolating
    custom_pipeline="interpolate_stable_diffusion",
).to("cuda")
pipe.enable_attention_slicing()

frame_filepaths = pipe.walk(
    prompts=["a dog", "a cat", "a horse"],
    seeds=[42, 1337, 1234],
    num_interpolation_steps=16,
    output_dir="./dreams",
    batch_size=4,
    height=512,
    width=512,
    guidance_scale=8.5,
    num_inference_steps=50,
)
```

`walk(...)` ํจ์๋ `output_dir`์ ์ ์๋ ํด๋์ ์ ์ฅ๋ ์ด๋ฏธ์ง ๋ชฉ๋ก์ ๋ฐํํฉ๋๋ค. ์ด ์ด๋ฏธ์ง๋ค์ ์ฌ์ฉํด Stable Diffusion ๋ณด๊ฐ ๋์์์ ๋ง๋ค ์ ์์ต๋๋ค.

> Stable Diffusion์ ์ด์ฉํ ๋์์ ์ ์ ๋ฐฉ๋ฒ๊ณผ ๋ ๋ง์ ๊ธฐ๋ฅ์ ๋ํ ์์ธํ ๋ด์ฉ์ https://github.com/nateraw/stable-diffusion-videos ์์ ํ์ธํ์๊ธฐ ๋ฐ๋๋๋ค.
### Stable Diffusion Mega

Stable Diffusion Mega ํ์ดํ๋ผ์ธ์ ์ฌ์ฉํ๋ฉด Stable Diffusion ํ์ดํ๋ผ์ธ์ ์ฃผ์ ์ฌ์ฉ ์ฌ๋ก๋ฅผ ๋จ์ผ ํด๋์ค์์ ์ฌ์ฉํ  ์ ์์ต๋๋ค.

```python
#!/usr/bin/env python3
from diffusers import DiffusionPipeline
import PIL
import requests
from io import BytesIO
import torch


def download_image(url):
    response = requests.get(url)
    return PIL.Image.open(BytesIO(response.content)).convert("RGB")


pipe = DiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    custom_pipeline="stable_diffusion_mega",
    torch_dtype=torch.float16,
)
pipe.to("cuda")
pipe.enable_attention_slicing()


### Text-to-Image

images = pipe.text2img("An astronaut riding a horse").images

### Image-to-Image

init_image = download_image(
    "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
)

prompt = "A fantasy landscape, trending on artstation"

images = pipe.img2img(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5).images

### Inpainting

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
init_image = download_image(img_url).resize((512, 512))
mask_image = download_image(mask_url).resize((512, 512))

prompt = "a cat sitting on a bench"
images = pipe.inpaint(prompt=prompt, image=init_image, mask_image=mask_image, strength=0.75).images
```

์์ ๋ณด์ธ ๊ฒ์ฒ๋ผ ํ๋์ ํ์ดํ๋ผ์ธ์์ 'ํ
์คํธ-์ด๋ฏธ์ง ๋ณํ', '์ด๋ฏธ์ง-์ด๋ฏธ์ง ๋ณํ', '์ธํ์ธํ
'์ ๋ชจ๋ ์คํํ  ์ ์์ต๋๋ค.
### Long Prompt Weighting Stable Diffusion

์ด ํ์ดํ๋ผ์ธ์ ์ฌ์ฉํ๋ฉด 77๊ฐ ํ ํฐ ๊ธธ์ด ์ ํ ์์ด ํ๋กฌํํธ๋ฅผ ์
๋ ฅํ  ์ ์์ต๋๋ค. ๋ํ "()"๋ฅผ ์ฌ์ฉํ์ฌ ๋จ์ด ๊ฐ์ค์น๋ฅผ ๋์ด๊ฑฐ๋ "[]"๋ฅผ ์ฌ์ฉํ์ฌ ๋จ์ด ๊ฐ์ค์น๋ฅผ ๋ฎ์ถ ์ ์์ต๋๋ค.
๋ํ ์ด ํ์ดํ๋ผ์ธ ํ๋๋ก Stable Diffusion ํ์ดํ๋ผ์ธ์ ์ฃผ์ ์ฌ์ฉ ์ฌ๋ก๋ฅผ ๋ชจ๋ ์ฌ์ฉํ  ์ ์์ต๋๋ค.

#### pytorch

```python
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "hakurei/waifu-diffusion", custom_pipeline="lpw_stable_diffusion", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

prompt = "best_quality (1girl:1.3) bow bride brown_hair closed_mouth frilled_bow frilled_hair_tubes frills (full_body:1.3) fox_ear hair_bow hair_tubes happy hood japanese_clothes kimono long_sleeves red_bow smile solo tabi uchikake white_kimono wide_sleeves cherry_blossoms"
neg_prompt = "lowres, bad_anatomy, error_body, error_hair, error_arm, error_hands, bad_hands, error_fingers, bad_fingers, missing_fingers, error_legs, bad_legs, multiple_legs, missing_legs, error_lighting, error_shadow, error_reflection, text, error, extra_digit, fewer_digits, cropped, worst_quality, low_quality, normal_quality, jpeg_artifacts, signature, watermark, username, blurry"

pipe.text2img(prompt, negative_prompt=neg_prompt, width=512, height=512, max_embeddings_multiples=3).images[0]
```

#### onnxruntime

```python
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    custom_pipeline="lpw_stable_diffusion_onnx",
    revision="onnx",
    provider="CUDAExecutionProvider",
)

prompt = "a photo of an astronaut riding a horse on mars, best quality"
neg_prompt = "lowres, bad anatomy, error body, error hair, error arm, error hands, bad hands, error fingers, bad fingers, missing fingers, error legs, bad legs, multiple legs, missing legs, error lighting, error shadow, error reflection, text, error, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry"

pipe.text2img(prompt, negative_prompt=neg_prompt, width=512, height=512, max_embeddings_multiples=3).images[0]
```

`Token indices sequence length is longer than the specified maximum sequence length for this model (*** > 77). Running this sequence through the model will result in indexing errors`์ ๊ฐ์ ๊ฒฝ๊ณ ๊ฐ ํ์๋๋๋ผ๋, ์ด๋ ์ ์์ ์ธ ํ์์ด๋ฏ๋ก ๊ฑฑ์ ํ์ง ๋ง์ธ์.
### Speech to Image

๋ค์ ์ฝ๋๋ ์ฌ์  ํ์ต๋ OpenAI whisper-small๊ณผ Stable Diffusion์ ์ฌ์ฉํ์ฌ ์ค๋์ค ์ํ๋ก๋ถํฐ ์ด๋ฏธ์ง๋ฅผ ์์ฑํฉ๋๋ค.

```python
import torch

import matplotlib.pyplot as plt
from datasets import load_dataset
from diffusers import DiffusionPipeline
from transformers import (
    WhisperForConditionalGeneration,
    WhisperProcessor,
)


device = "cuda" if torch.cuda.is_available() else "cpu"

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")

audio_sample = ds[3]

text = audio_sample["text"].lower()
speech_data = audio_sample["audio"]["array"]

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small").to(device)
processor = WhisperProcessor.from_pretrained("openai/whisper-small")

diffuser_pipeline = DiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    custom_pipeline="speech_to_image_diffusion",
    speech_model=model,
    speech_processor=processor,
    torch_dtype=torch.float16,
)

diffuser_pipeline.enable_attention_slicing()
diffuser_pipeline = diffuser_pipeline.to(device)

output = diffuser_pipeline(speech_data)
plt.imshow(output.images[0])
```

์ ์์ ๋ ๋ค์๊ณผ ๊ฐ์ ๊ฒฐ๊ณผ ์ด๋ฏธ์ง๋ฅผ ์ถ๋ ฅํฉ๋๋ค.


56 docs/source/ko/using-diffusers/custom_pipeline_overview.mdx Normal file
@@ -0,0 +1,56 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# ์ปค์คํ
 ํ์ดํ๋ผ์ธ ๋ถ๋ฌ์ค๊ธฐ

[[open-in-colab]]

์ปค๋ฎค๋ํฐ ํ์ดํ๋ผ์ธ์ ๋
ผ๋ฌธ์ ๋ช
์๋ ์๋ ๊ตฌํ์ฒด์ ๋ค๋ฅธ ํํ๋ก ๊ตฌํ๋ ๋ชจ๋  [`DiffusionPipeline`] ํด๋์ค๋ฅผ ์๋ฏธํฉ๋๋ค. (์๋ฅผ ๋ค์ด [`StableDiffusionControlNetPipeline`]์ ["Text-to-Image Generation with ControlNet Conditioning"](https://arxiv.org/abs/2302.05543)์ ํด๋นํฉ๋๋ค.) ์ด๋ค์ ์ถ๊ฐ ๊ธฐ๋ฅ์ ์ ๊ณตํ๊ฑฐ๋ ํ์ดํ๋ผ์ธ์ ์๋ ๊ตฌํ์ ํ์ฅํฉ๋๋ค.

[Speech to Image](https://github.com/huggingface/diffusers/tree/main/examples/community#speech-to-image) ๋๋ [Composable Stable Diffusion](https://github.com/huggingface/diffusers/tree/main/examples/community#composable-stable-diffusion)๊ณผ ๊ฐ์ ๋ฉ์ง ์ปค๋ฎค๋ํฐ ํ์ดํ๋ผ์ธ์ด ๋ง์ด ์์ผ๋ฉฐ, [์ฌ๊ธฐ์์](https://github.com/huggingface/diffusers/tree/main/examples/community) ๋ชจ๋  ๊ณต์ ์ปค๋ฎค๋ํฐ ํ์ดํ๋ผ์ธ์ ์ฐพ์ ์ ์์ต๋๋ค.

ํ๋ธ์์ ์ปค๋ฎค๋ํฐ ํ์ดํ๋ผ์ธ์ ๋ก๋ํ๋ ค๋ฉด, ์ปค๋ฎค๋ํฐ ํ์ดํ๋ผ์ธ์ ๋ฆฌํฌ์งํ ๋ฆฌ ID์ (ํ์ดํ๋ผ์ธ ๊ฐ์ค์น ๋ฐ ๊ตฌ์ฑ ์์๋ฅผ ๋ก๋ํ๋ ค๋) ๋ชจ๋ธ์ ๋ฆฌํฌ์งํ ๋ฆฌ ID๋ฅผ ์ธ์๋ก ์ ๋ฌํด์ผ ํฉ๋๋ค. ์๋ฅผ ๋ค์ด ์๋ ์์์์๋ `hf-internal-testing/diffusers-dummy-pipeline`์์ ๋๋ฏธ ํ์ดํ๋ผ์ธ์ ๋ถ๋ฌ์ค๊ณ , `google/ddpm-cifar10-32`์์ ํ์ดํ๋ผ์ธ์ ๊ฐ์ค์น์ ์ปดํฌ๋ํธ๋ค์ ๋ก๋ํฉ๋๋ค.

<Tip warning={true}>

๐ ํ๊น
ํ์ด์ค ํ๋ธ์์ ์ปค๋ฎค๋ํฐ ํ์ดํ๋ผ์ธ์ ๋ถ๋ฌ์ค๋ ๊ฒ์ ๊ณง ํด๋น ์ฝ๋๊ฐ ์์ ํ๋ค๊ณ  ์ ๋ขฐํ๋ค๋ ๋ป์
๋๋ค. ์ฝ๋๋ฅผ ์๋์ผ๋ก ๋ถ๋ฌ์ค๊ณ  ์คํํ๊ธฐ ์ ์ ๋ฐ๋์ ์จ๋ผ์ธ์ผ๋ก ํด๋น ์ฝ๋์ ์ ๋ขฐ์ฑ์ ๊ฒ์ฌํ์ธ์!

</Tip>

```py
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "google/ddpm-cifar10-32", custom_pipeline="hf-internal-testing/diffusers-dummy-pipeline"
)
```

๊ณต์ ์ปค๋ฎค๋ํฐ ํ์ดํ๋ผ์ธ์ ๋ถ๋ฌ์ค๋ ๋ฐฉ๋ฒ๋ ์ด์ ๋น์ทํฉ๋๋ค. ๋ค๋ง ๊ณต์ ๋ฆฌํฌ์งํ ๋ฆฌ ID์์ ๊ฐ์ค์น๋ฅผ ๋ถ๋ฌ์ค๋ ๊ฒ๊ณผ ๋๋ถ์ด, ํด๋น ํ์ดํ๋ผ์ธ ๋ด์ ์ปดํฌ๋ํธ๋ฅผ ์ง์  ์ง์ ํ๋ ๊ฒ๋ ๊ฐ๋ฅํฉ๋๋ค. ์๋ ์์ ๋ฅผ ๋ณด๋ฉด, ์ปค๋ฎค๋ํฐ [CLIP Guided Stable Diffusion](https://github.com/huggingface/diffusers/tree/main/examples/community#clip-guided-stable-diffusion) ํ์ดํ๋ผ์ธ์ ๋ก๋ํ  ๋ ํด๋น ํ์ดํ๋ผ์ธ์์ ์ฌ์ฉํ  `clip_model` ์ปดํฌ๋ํธ์ `feature_extractor` ์ปดํฌ๋ํธ๋ฅผ ์ง์  ์ค์ ํ๋ ๊ฒ์ ํ์ธํ  ์ ์์ต๋๋ค.

```py
from diffusers import DiffusionPipeline
from transformers import CLIPImageProcessor, CLIPModel

clip_model_id = "laion/CLIP-ViT-B-32-laion2B-s34B-b79K"

feature_extractor = CLIPImageProcessor.from_pretrained(clip_model_id)
clip_model = CLIPModel.from_pretrained(clip_model_id)

pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    custom_pipeline="clip_guided_stable_diffusion",
    clip_model=clip_model,
    feature_extractor=feature_extractor,
)
```

์ปค๋ฎค๋ํฐ ํ์ดํ๋ผ์ธ์ ๋ํ ์์ธํ ๋ด์ฉ์ [์ปค๋ฎค๋ํฐ ํ์ดํ๋ผ์ธ](https://github.com/huggingface/diffusers/blob/main/docs/source/en/using-diffusers/custom_pipeline_examples) ๊ฐ์ด๋๋ฅผ ์ดํด๋ณด์ธ์. ์ปค๋ฎค๋ํฐ ํ์ดํ๋ผ์ธ ๋ฑ๋ก์ ๊ด์ฌ์ด ์๋ค๋ฉด [์ปค๋ฎค๋ํฐ ํ์ดํ๋ผ์ธ์ ๊ธฐ์ฌํ๋ ๋ฐฉ๋ฒ](https://github.com/huggingface/diffusers/blob/main/docs/source/en/using-diffusers/contribute_pipeline)์ ๋ํ ๊ฐ์ด๋๋ฅผ ํ์ธํ์ธ์!
57 docs/source/ko/using-diffusers/depth2img.mdx Normal file
@@ -0,0 +1,57 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Text-guided depth-to-image ์์ฑ

[[open-in-colab]]

[`StableDiffusionDepth2ImgPipeline`]์ ์ฌ์ฉํ๋ฉด ํ
์คํธ ํ๋กฌํํธ์ ์ด๊ธฐ ์ด๋ฏธ์ง๋ฅผ ์ ๋ฌํ์ฌ ์ ์ด๋ฏธ์ง์ ์์ฑ์ ์กฐ์ ํ  ์ ์์ต๋๋ค. ๋ํ ์ด๋ฏธ์ง ๊ตฌ์กฐ๋ฅผ ๋ณด์กดํ๊ธฐ ์ํด `depth_map`์ ์ ๋ฌํ  ์๋ ์์ต๋๋ค. `depth_map`์ด ์ ๊ณต๋์ง ์์ผ๋ฉด ํ์ดํ๋ผ์ธ์ ํตํฉ๋ [depth-estimation model](https://github.com/isl-org/MiDaS)์ ํตํด ์๋์ผ๋ก ๊น์ด๋ฅผ ์์ธกํฉ๋๋ค.

๋จผ์  [`StableDiffusionDepth2ImgPipeline`]์ ์ธ์คํด์ค๋ฅผ ์์ฑํฉ๋๋ค:

```python
import torch
import requests
from PIL import Image

from diffusers import StableDiffusionDepth2ImgPipeline

pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth",
    torch_dtype=torch.float16,
).to("cuda")
```

์ด์  ํ๋กฌํํธ๋ฅผ ํ์ดํ๋ผ์ธ์ ์ ๋ฌํฉ๋๋ค. ํน์  ๋จ์ด๊ฐ ์ด๋ฏธ์ง ์์ฑ์ ๊ฐ์ด๋ํ๋ ๊ฒ์ ๋ฐฉ์งํ๊ธฐ ์ํด `negative_prompt`๋ฅผ ์ ๋ฌํ  ์๋ ์์ต๋๋ค:

```python
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
init_image = Image.open(requests.get(url, stream=True).raw)
prompt = "two tigers"
n_prompt = "bad, deformed, ugly, bad anatomy"
image = pipe(prompt=prompt, image=init_image, negative_prompt=n_prompt, strength=0.7).images[0]
image
```

| Input | Output |
|:---:|:---:|
| <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/coco-cats.png" width="500"/> | <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/depth2img-tigers.png" width="500"/> |
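์ด๋ฏธ์ง ๊ตฌ์กฐ๋ฅผ ๋ ์ ๊ตํ๊ฒ ์ ์ดํ๊ณ  ์ถ๋ค๋ฉด `depth_map`์ ์ง์  ์ ๋ฌํ  ์๋ ์์ต๋๋ค. ์๋๋ transformers์ DPT ๊น์ด ์ถ์  ๋ชจ๋ธ๋ก ๊น์ด ๋งต์ ๋ฏธ๋ฆฌ ๊ณ์ฐํด ์ฌ์ฌ์ฉํ๋ค๊ณ  ๊ฐ์ ํ ์ค์ผ์น์
๋๋ค. ์ฌ์ฉํ ์ฒดํฌํฌ์ธํธ(`Intel/dpt-large`)์, `depth_map`์ด `(1, H, W)` ํํ์ ํ
์๋ฅผ ๊ทธ๋๋ก ๋ฐ๋๋ค๋ ์ ์ ๊ฐ์ ์ด๋ฏ๋ก ๊ฒฐ๊ณผ๊ฐ ๊ธฐ๋์ ๋ค๋ฅด๋ค๋ฉด ํ์ดํ๋ผ์ธ API ๋ฌธ์๋ฅผ ํ์ธํ์ธ์.

```python
import torch
from transformers import DPTFeatureExtractor, DPTForDepthEstimation

# ๊ฐ์ : ๊น์ด ์ถ์ ์ Intel/dpt-large ์ฒดํฌํฌ์ธํธ๋ฅผ ์ฌ์ฉํฉ๋๋ค.
depth_estimator = DPTForDepthEstimation.from_pretrained("Intel/dpt-large")
feature_extractor = DPTFeatureExtractor.from_pretrained("Intel/dpt-large")

inputs = feature_extractor(images=init_image, return_tensors="pt")
with torch.no_grad():
    depth_map = depth_estimator(**inputs).predicted_depth  # (1, H, W) ํํ์ ํ
์

# ๋ฏธ๋ฆฌ ๊ณ์ฐํ ๊น์ด ๋งต์ ์ฌ๋ฌ ํ๋กฌํํธ์ ์ฌ์ฌ์ฉํ  ์ ์์ต๋๋ค.
image = pipe(
    prompt="two white tigers",
    image=init_image,
    negative_prompt=n_prompt,
    depth_map=depth_map,
    strength=0.7,
).images[0]
```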
์๋ Space๋ฅผ ๊ฐ์ง๊ณ  ๋๋ฉฐ, depth map์ ์ฌ์ฉํ ์ด๋ฏธ์ง์ ์ฌ์ฉํ์ง ์์ ์ด๋ฏธ์ง์ ์ฐจ์ด๊ฐ ์๋์ง ํ์ธํด ๋ณด์ธ์!

<iframe
  src="https://radames-stable-diffusion-depth2img.hf.space"
  frameborder="0"
  width="850"
  height="500"
></iframe>
100 docs/source/ko/using-diffusers/img2img.mdx Normal file
@@ -0,0 +1,100 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# ํ
์คํธ ๊ธฐ๋ฐ image-to-image ์์ฑ

[[open-in-colab]]

[`StableDiffusionImg2ImgPipeline`]์ ์ฌ์ฉํ๋ฉด ํ
์คํธ ํ๋กฌํํธ์ ์์ ์ด๋ฏธ์ง๋ฅผ ์ ๋ฌํ์ฌ ์ ์ด๋ฏธ์ง ์์ฑ์ ์กฐ๊ฑด์ ์ง์ ํ  ์ ์์ต๋๋ค.

์์ํ๊ธฐ ์ ์ ํ์ํ ๋ผ์ด๋ธ๋ฌ๋ฆฌ๊ฐ ๋ชจ๋ ์ค์น๋์ด ์๋์ง ํ์ธํ์ธ์:

```bash
!pip install diffusers transformers ftfy accelerate
```

[`nitrosocke/Ghibli-Diffusion`](https://huggingface.co/nitrosocke/Ghibli-Diffusion)๊ณผ ๊ฐ์ ์ฌ์  ํ์ต๋ Stable Diffusion ๋ชจ๋ธ๋ก [`StableDiffusionImg2ImgPipeline`]์ ์์ฑํ์ฌ ์์ํ์ธ์.

```python
import torch
import requests
from PIL import Image
from io import BytesIO
from diffusers import StableDiffusionImg2ImgPipeline

device = "cuda"
pipe = StableDiffusionImg2ImgPipeline.from_pretrained("nitrosocke/Ghibli-Diffusion", torch_dtype=torch.float16).to(
    device
)
```

์ด๊ธฐ ์ด๋ฏธ์ง๋ฅผ ๋ค์ด๋ก๋ํ๊ณ  ์ ์ฒ๋ฆฌํ์ฌ ํ์ดํ๋ผ์ธ์ ์ ๋ฌํ  ์ ์์ต๋๋ค:

```python
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"

response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image.thumbnail((768, 768))
init_image
```

<div class="flex justify-center">
  <img src="https://huggingface.co/datasets/YiYiXu/test-doc-assets/resolve/main/image_2_image_using_diffusers_cell_8_output_0.jpeg"/>
</div>

<Tip>

๐ก `strength`๋ ์
๋ ฅ ์ด๋ฏธ์ง์ ์ถ๊ฐ๋๋ ๋
ธ์ด์ฆ์ ์์ ์ ์ดํ๋ 0.0์์ 1.0 ์ฌ์ด์ ๊ฐ์
๋๋ค. 1.0์ ๊ฐ๊น์ด ๊ฐ์ผ์๋ก ๋ค์ํ ๋ณํ์ด ํ์ฉ๋์ง๋ง, ์
๋ ฅ ์ด๋ฏธ์ง์ ์๋ฏธ์ ์ผ๋ก ์ผ์นํ์ง ์๋ ์ด๋ฏธ์ง๊ฐ ์์ฑ๋  ์ ์์ต๋๋ค.

</Tip>
ํ๋กฌํํธ๋ฅผ ์ ์ํ๊ณ (์ง๋ธ๋ฆฌ ์คํ์ผ(Ghibli-style)์ ๋ง๊ฒ ์กฐ์ ๋ ์ด ์ฒดํฌํฌ์ธํธ์ ๊ฒฝ์ฐ ํ๋กฌํํธ ์์ `ghibli style` ํ ํฐ์ ๋ถ์ฌ์ผ ํฉ๋๋ค) ํ์ดํ๋ผ์ธ์ ์คํํฉ๋๋ค:

```python
prompt = "ghibli style, a fantasy landscape with castles"
generator = torch.Generator(device=device).manual_seed(1024)
image = pipe(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5, generator=generator).images[0]
image
```

<div class="flex justify-center">
  <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ghibli-castles.png"/>
</div>

๋ค๋ฅธ ์ค์ผ์ค๋ฌ๋ก ์คํํ์ฌ ์ถ๋ ฅ์ ์ด๋ค ์ํฅ์ ๋ฏธ์น๋์ง ํ์ธํ  ์๋ ์์ต๋๋ค:

```python
from diffusers import LMSDiscreteScheduler

lms = LMSDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.scheduler = lms
generator = torch.Generator(device=device).manual_seed(1024)
image = pipe(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5, generator=generator).images[0]
image
```

<div class="flex justify-center">
  <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/lms-ghibli.png"/>
</div>

์๋ Space๋ฅผ ํ์ธํ๊ณ , `strength` ๊ฐ์ ๋ค๋ฅด๊ฒ ์ค์ ํ์ฌ ์ด๋ฏธ์ง๋ฅผ ์์ฑํด ๋ณด์ธ์. `strength`๋ฅผ ๋ฎ๊ฒ ์ค์ ํ๋ฉด ์๋ณธ ์ด๋ฏธ์ง์ ๋ ์ ์ฌํ ์ด๋ฏธ์ง๊ฐ ์์ฑ๋๋ ๊ฒ์ ํ์ธํ  ์ ์์ต๋๋ค.
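๋ก์ปฌ์์ ๋ฐ๋ก ๋น๊ตํด ๋ณด๊ณ  ์ถ๋ค๋ฉด, ์๋์ ๊ฐ์ด ์ฌ๋ฌ `strength` ๊ฐ์ ์ํํด ๊ฒฐ๊ณผ๋ฅผ ์ ์ฅํ  ์๋ ์์ต๋๋ค. `strength` ๊ฐ ๋ชฉ๋ก๊ณผ ์ ์ฅ ํ์ผ๋ช
์ ์์์ผ ๋ฟ์ด๋ฏ๋ก ์์ ๋กญ๊ฒ ๋ฐ๊ฟ๋ ๋ฉ๋๋ค.

```python
# ์์ ์ ๊ฐ์ ํ์ดํ๋ผ์ธ(pipe)๊ณผ ์ด๊ธฐ ์ด๋ฏธ์ง(init_image)๋ฅผ ๊ทธ๋๋ก ์ฌ์ฉํฉ๋๋ค.
for strength in [0.3, 0.5, 0.75]:
    generator = torch.Generator(device=device).manual_seed(1024)
    image = pipe(
        prompt=prompt,
        image=init_image,
        strength=strength,
        guidance_scale=7.5,
        generator=generator,
    ).images[0]
    image.save(f"ghibli_strength_{strength}.png")
```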
์์ ๋กญ๊ฒ ์ค์ผ์ค๋ฌ๋ฅผ [`LMSDiscreteScheduler`]๋ก ๊ต์ฒดํด ๋ณด๊ณ , ์ถ๋ ฅ์ ์ด๋ค ์ํฅ์ ๋ฏธ์น๋์ง๋ ํ์ธํด ๋ณด์ธ์.

<iframe
  src="https://stevhliu-ghibli-img2img.hf.space"
  frameborder="0"
  width="850"
  height="500"
></iframe>
75 docs/source/ko/using-diffusers/inpaint.mdx Normal file
@@ -0,0 +1,75 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Text-guided ์ด๋ฏธ์ง ์ธํ์ธํ
(inpainting)

[[open-in-colab]]

[`StableDiffusionInpaintPipeline`]์ ์ฌ์ฉํ๋ฉด ๋ง์คํฌ์ ํ
์คํธ ํ๋กฌํํธ๋ฅผ ์ ๊ณตํ์ฌ ์ด๋ฏธ์ง์ ํน์  ๋ถ๋ถ๋ง ํธ์งํ  ์ ์์ต๋๋ค. ์ด ํ์ดํ๋ผ์ธ์ ์ธํ์ธํ
 ์์
์ ์ํด ํน๋ณํ ํ๋ จ๋ [`runwayml/stable-diffusion-inpainting`](https://huggingface.co/runwayml/stable-diffusion-inpainting)๊ณผ ๊ฐ์ Stable Diffusion ๋ฒ์ ์ ์ฌ์ฉํฉ๋๋ค.

๋จผ์  [`StableDiffusionInpaintPipeline`] ์ธ์คํด์ค๋ฅผ ๋ถ๋ฌ์ต๋๋ค:

```python
import PIL
import requests
import torch
from io import BytesIO

from diffusers import StableDiffusionInpaintPipeline

pipeline = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
)
pipeline = pipeline.to("cuda")
```

๋์ค์ ๊ต์ฒดํ  ๊ฐ์์ง ์ด๋ฏธ์ง์ ๋ง์คํฌ๋ฅผ ๋ค์ด๋ก๋ํ์ธ์:

```python
def download_image(url):
    response = requests.get(url)
    return PIL.Image.open(BytesIO(response.content)).convert("RGB")


img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

init_image = download_image(img_url).resize((512, 512))
mask_image = download_image(mask_url).resize((512, 512))
```

์ด์  ๋ง์คํฌ ์์ญ์ ๋ค๋ฅธ ๋ด์ฉ์ผ๋ก ๊ต์ฒดํ๋๋ก ํ๋กฌํํธ๋ฅผ ๋ง๋ค ์ ์์ต๋๋ค:

```python
prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image).images[0]
```

| `image` | `mask_image` | `prompt` | output |
|:---:|:---:|:---:|---:|
| <img src="https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png" alt="drawing" width="250"/> | <img src="https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png" alt="drawing" width="250"/> | ***Face of a yellow cat, high resolution, sitting on a park bench*** | <img src="https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/in_paint/yellow_cat_sitting_on_a_park_bench.png" alt="drawing" width="250"/> |

<Tip warning={true}>

์ด์ ์ ์คํ์ ์ธ ์ธํ์ธํ
 ๊ตฌํ์์๋ ํ์ง์ด ๋ฎ์ ๋ค๋ฅธ ํ๋ก์ธ์ค๋ฅผ ์ฌ์ฉํ์ต๋๋ค. ์ด์  ๋ฒ์ ๊ณผ์ ํธํ์ฑ์ ๋ณด์ฅํ๊ธฐ ์ํด, ์ ๋ชจ๋ธ์ด ํฌํจ๋์ง ์์ ์ฌ์  ํ์ต๋ ํ์ดํ๋ผ์ธ์ ๋ถ๋ฌ์ค๋ฉด ์ด์  ์ธํ์ธํ
 ๋ฐฉ๋ฒ์ด ๊ณ์ ์ ์ฉ๋ฉ๋๋ค.

</Tip>

์๋ Space์์ ์ด๋ฏธ์ง ์ธํ์ธํ
์ ์ง์  ํด๋ณด์ธ์!

<iframe
  src="https://runwayml-stable-diffusion-inpainting.hf.space"
  frameborder="0"
  width="850"
  height="500"
></iframe>
442 docs/source/ko/using-diffusers/loading.mdx Normal file
@@ -0,0 +1,442 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# ํ์ดํ๋ผ์ธ, ๋ชจ๋ธ, ์ค์ผ์ค๋ฌ ๋ถ๋ฌ์ค๊ธฐ

๊ธฐ๋ณธ์ ์ผ๋ก diffusion ๋ชจ๋ธ์ ๋ค์ํ ์ปดํฌ๋ํธ๋ค(๋ชจ๋ธ, ํ ํฌ๋์ด์ , ์ค์ผ์ค๋ฌ) ๊ฐ์ ๋ณต์กํ ์ํธ์์ฉ์ ๊ธฐ๋ฐ์ผ๋ก ๋์ํฉ๋๋ค. ๋ํจ์ ์ค(Diffusers)๋ ์ด๋ฌํ diffusion ๋ชจ๋ธ์ ๋ณด๋ค ์ฝ๊ณ  ๊ฐํธํ API๋ก ์ ๊ณตํ๋ ๊ฒ์ ๋ชฉํ๋ก ์ค๊ณ๋์์ต๋๋ค. [`DiffusionPipeline`]์ diffusion ๋ชจ๋ธ์ด ๊ฐ๋ ๋ณต์ก์ฑ์ ํ๋์ ํ์ดํ๋ผ์ธ API๋ก ํตํฉํ๊ณ , ๋์์ ์ด๋ฅผ ๊ตฌ์ฑํ๋ ๊ฐ๊ฐ์ ์ปดํฌ๋ํธ๋ค์ ํ์คํฌ์ ๋ง์ถฐ ์ ์ฐํ๊ฒ ์ปค์คํฐ๋ง์ด์งํ  ์ ์๋๋ก ์ง์ํ๊ณ  ์์ต๋๋ค.

diffusion ๋ชจ๋ธ์ ํ๋ จ๊ณผ ์ถ๋ก ์ ํ์ํ ๋ชจ๋  ๊ฒ์ [`DiffusionPipeline.from_pretrained`] ๋ฉ์๋๋ฅผ ํตํด ์ ๊ทผํ  ์ ์์ต๋๋ค. (์ด ๋ง์ ์๋ฏธ๋ ๋ค์ ๋จ๋ฝ์์ ๋ณด๋ค ์์ธํ๊ฒ ๋ค๋ฃจ๊ฒ ์ต๋๋ค.)

์ด ๋ฌธ์์์ ์ค๋ช
ํ  ๋ด์ฉ์ ๋ค์๊ณผ ๊ฐ์ต๋๋ค.

* ํ๋ธ๋ฅผ ํตํด ํน์ ๋ก์ปฌ๋ก ํ์ดํ๋ผ์ธ์ ๋ถ๋ฌ์ค๋ ๋ฒ
* ํ์ดํ๋ผ์ธ์ ๋ค๋ฅธ ์ปดํฌ๋ํธ๋ค์ ์ ์ฉํ๋ ๋ฒ
* ์ค๋ฆฌ์ง๋ ์ฒดํฌํฌ์ธํธ๊ฐ ์๋ variant๋ฅผ ๋ถ๋ฌ์ค๋ ๋ฒ (variant๋ ๊ธฐ๋ณธ์ผ๋ก ์ค์ ๋ `fp32`๊ฐ ์๋ ๋ค๋ฅธ ๋ถ๋ ์์์  ํ์
(์: `fp16`)์ ์ฌ์ฉํ๊ฑฐ๋ Non-EMA ๊ฐ์ค์น๋ฅผ ์ฌ์ฉํ๋ ์ฒดํฌํฌ์ธํธ๋ค์ ์๋ฏธํฉ๋๋ค.)
* ๋ชจ๋ธ๊ณผ ์ค์ผ์ค๋ฌ๋ฅผ ๋ถ๋ฌ์ค๋ ๋ฒ

## Diffusion ํ์ดํ๋ผ์ธ

<Tip>

๐ก [`DiffusionPipeline`] ํด๋์ค๊ฐ ๋์ํ๋ ๋ฐฉ์์ ๋ณด๋ค ์์ธํ ๋ด์ฉ์ด ๊ถ๊ธํ๋ค๋ฉด, [DiffusionPipeline explained](#diffusionpipeline์-๋ํด-์์๋ณด๊ธฐ) ์น์
์ ํ์ธํด ๋ณด์ธ์.

</Tip>

[`DiffusionPipeline`] ํด๋์ค๋ diffusion ๋ชจ๋ธ์ [ํ๋ธ](https://huggingface.co/models?library=diffusers)๋ก๋ถํฐ ๋ถ๋ฌ์ค๋ ๊ฐ์ฅ ์ฌํํ๋ฉด์ ๋ณดํธ์ ์ธ ๋ฐฉ์์
๋๋ค. [`DiffusionPipeline.from_pretrained`] ๋ฉ์๋๋ ์ ํฉํ ํ์ดํ๋ผ์ธ ํด๋์ค๋ฅผ ์๋์ผ๋ก ํ์งํ๊ณ , ํ์ํ ๊ตฌ์ฑ์์(configuration)์ ๊ฐ์ค์น(weight) ํ์ผ๋ค์ ๋ค์ด๋ก๋ํ์ฌ ์บ์ฑํ ๋ค์, ํด๋น ํ์ดํ๋ผ์ธ ์ธ์คํด์ค๋ฅผ ๋ฐํํฉ๋๋ค.

```python
from diffusers import DiffusionPipeline

repo_id = "runwayml/stable-diffusion-v1-5"
pipe = DiffusionPipeline.from_pretrained(repo_id)
```

๋ฌผ๋ก  [`DiffusionPipeline`] ํด๋์ค๋ฅผ ์ฌ์ฉํ์ง ์๊ณ , ๋ช
์์ ์ผ๋ก ํด๋น ํ์ดํ๋ผ์ธ ํด๋์ค๋ฅผ ์ง์  ๋ถ๋ฌ์ค๋ ๊ฒ๋ ๊ฐ๋ฅํฉ๋๋ค. ์๋ ์์  ์ฝ๋๋ ์ ์์์ ๋์ผํ ์ธ์คํด์ค๋ฅผ ๋ฐํํฉ๋๋ค.

```python
from diffusers import StableDiffusionPipeline

repo_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(repo_id)
```

[CompVis/stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4)๋ [runwayml/stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5) ๊ฐ์ ์ฒดํฌํฌ์ธํธ๋ ํ๋ ์ด์์ ๋ค์ํ ํ์คํฌ์ ํ์ฉ๋  ์ ์์ต๋๋ค. (์๋ฅผ ๋ค์ด ์์ ๋ ์ฒดํฌํฌ์ธํธ๋ text-to-image์ image-to-image์ ๋ชจ๋ ํ์ฉ๋  ์ ์์ต๋๋ค.) ๋ง์ฝ ์ด๋ฌํ ์ฒดํฌํฌ์ธํธ๋ฅผ ๊ธฐ๋ณธ ์ค์  ํ์คํฌ๊ฐ ์๋ ๋ค๋ฅธ ํ์คํฌ์ ํ์ฉํ๊ณ ์ ํ๋ค๋ฉด, ํด๋น ํ์คํฌ์ ๋์๋๋ ํ์ดํ๋ผ์ธ(task-specific pipeline)์ ์ฌ์ฉํด์ผ ํฉ๋๋ค.

```python
from diffusers import StableDiffusionImg2ImgPipeline

repo_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(repo_id)
```
### ๋ก์ปฌ ํ์ดํ๋ผ์ธ
|
||||
|
||||
ํ์ดํ๋ผ์ธ์ ๋ก์ปฌ๋ก ๋ถ๋ฌ์ค๊ณ ์ ํ๋ค๋ฉด, `git-lfs`๋ฅผ ์ฌ์ฉํ์ฌ ์ง์  ์ฒดํฌํฌ์ธํธ๋ฅผ ๋ก์ปฌ ๋์คํฌ์ ๋ค์ด๋ก๋ ๋ฐ์์ผ ํฉ๋๋ค. ์๋์ ๋ช๋ น์ด๋ฅผ ์คํํ๋ฉด `./stable-diffusion-v1-5`๋ ์ด๋ฆ์ผ๋ก ํด๋๊ฐ ๋ก์ปฌ๋์คํฌ์ ์์ฑ๋ฉ๋๋ค.
|
||||
|
||||
```bash
|
||||
git lfs install
|
||||
git clone https://huggingface.co/runwayml/stable-diffusion-v1-5
|
||||
```
|
||||
|
||||
๊ทธ๋ฐ ๋ค์ ํด๋น ๋ก์ปฌ ๊ฒฝ๋ก๋ฅผ [`~DiffusionPipeline.from_pretrained`] ๋ฉ์๋์ ์ ๋ฌํฉ๋๋ค.
|
||||
|
||||
```python
|
||||
from diffusers import DiffusionPipeline
|
||||
|
||||
repo_id = "./stable-diffusion-v1-5"
|
||||
stable_diffusion = DiffusionPipeline.from_pretrained(repo_id)
|
||||
```
|
||||
|
||||
์์ ์์์ฝ๋์ฒ๋ผ ๋ง์ฝ `repo_id`๊ฐ ๋ก์ปฌ ํจ์ค(local path)๋ผ๋ฉด, [`~DiffusionPipeline.from_pretrained`] ๋ฉ์๋๋ ์ด๋ฅผ ์๋์ผ๋ก ๊ฐ์งํ์ฌ ํ๋ธ์์ ํ์ผ์ ๋ค์ด๋ก๋ํ์ง ์์ต๋๋ค. ๋ง์ฝ ๋ก์ปฌ ๋์คํฌ์ ์ ์ฅ๋ ํ์ดํ๋ผ์ธ ์ฒดํฌํฌ์ธํธ๊ฐ ์ต์ ๋ฒ์ ์ด ์๋ ๊ฒฝ์ฐ์๋, ์ต์ ๋ฒ์ ์ ๋ค์ด๋ก๋ํ์ง ์๊ณ ๊ธฐ์กด ๋ก์ปฌ ๋์คํฌ์ ์ ์ฅ๋ ์ฒดํฌํฌ์ธํธ๋ฅผ ์ฌ์ฉํ๋ค๋ ๊ฒ์ ์๋ฏธํฉ๋๋ค.
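
If you additionally want to guarantee that nothing is fetched from the Hub, you can pass `local_files_only=True`. This is only a sketch; it assumes the checkpoint has already been cloned to `./stable-diffusion-v1-5` as shown above.

```python
from diffusers import DiffusionPipeline

# Load strictly from the local clone; this errors out instead of downloading
# if any required file is missing from ./stable-diffusion-v1-5.
stable_diffusion = DiffusionPipeline.from_pretrained(
    "./stable-diffusion-v1-5", local_files_only=True
)
```
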
### ํ์ดํ๋ผ์ธ ๋ด๋ถ์ ์ปดํฌ๋ํธ ๊ต์ฒดํ๊ธฐ
|
||||
|
||||
ํ์ดํ๋ผ์ธ ๋ด๋ถ์ ์ปดํฌ๋ํธ๋ค์ ํธํ ๊ฐ๋ฅํ ๋ค๋ฅธ ์ปดํฌ๋ํธ๋ก ๊ต์ฒด๋ ์ ์์ต๋๋ค. ์ด์ ๊ฐ์ ์ปดํฌ๋ํธ ๊ต์ฒด๊ฐ ์ค์ํ ์ด์ ๋ ๋ค์๊ณผ ๊ฐ์ต๋๋ค.
|
||||
|
||||
- ์ด๋ค ์ค์ผ์ค๋ฌ๋ฅผ ์ฌ์ฉํ ๊ฒ์ธ๊ฐ๋ ์์ฑ์๋์ ์์ฑํ์ง ๊ฐ์ ํธ๋ ์ด๋์คํ๋ฅผ ์ ์ํ๋ ์ค์ํ ์์์๋๋ค.
|
||||
- diffusion ๋ชจ๋ธ ๋ด๋ถ์ ์ปดํฌ๋ํธ๋ค์ ์ผ๋ฐ์ ์ผ๋ก ๊ฐ๊ฐ ๋๋ฆฝ์ ์ผ๋ก ํ๋ จ๋๊ธฐ ๋๋ฌธ์, ๋ ์ข์ ์ฑ๋ฅ์ ๋ณด์ฌ์ฃผ๋ ์ปดํฌ๋ํธ๊ฐ ์๋ค๋ฉด ๊ทธ๊ฑธ๋ก ๊ต์ฒดํ๋ ์์ผ๋ก ์ฑ๋ฅ์ ํฅ์์ํฌ ์ ์์ต๋๋ค.
|
||||
- ํ์ธ ํ๋ ๋จ๊ณ์์๋ ์ผ๋ฐ์ ์ผ๋ก UNet ํน์ ํ์คํธ ์ธ์ฝ๋์ ๊ฐ์ ์ผ๋ถ ์ปดํฌ๋ํธ๋ค๋ง ํ๋ จํ๊ฒ ๋ฉ๋๋ค.
|
||||
|
||||
์ด๋ค ์ค์ผ์ค๋ฌ๋ค์ด ํธํ๊ฐ๋ฅํ์ง๋ `compatibles` ์์ฑ์ ํตํด ํ์ธํ ์ ์์ต๋๋ค.
|
||||
|
||||
```python
|
||||
from diffusers import DiffusionPipeline
|
||||
|
||||
repo_id = "runwayml/stable-diffusion-v1-5"
|
||||
stable_diffusion = DiffusionPipeline.from_pretrained(repo_id)
|
||||
stable_diffusion.scheduler.compatibles
|
||||
```
|
||||
|
||||
์ด๋ฒ์๋ [`SchedulerMixin.from_pretrained`] ๋ฉ์๋๋ฅผ ์ฌ์ฉํด์, ๊ธฐ์กด ๊ธฐ๋ณธ ์ค์ผ์ค๋ฌ์๋ [`PNDMScheduler`]๋ฅผ ๋ณด๋ค ์ฐ์ํ ์ฑ๋ฅ์ [`EulerDiscreteScheduler`]๋ก ๋ฐ๊ฟ๋ด
์๋ค. ์ค์ผ์ค๋ฌ๋ฅผ ๋ก๋ํ ๋๋ `subfolder` ์ธ์๋ฅผ ํตํด, ํด๋น ํ์ดํ๋ผ์ธ์ ๋ ํฌ์งํ ๋ฆฌ์์ [์ค์ผ์ค๋ฌ์ ๊ดํ ํ์ํด๋](https://huggingface.co/runwayml/stable-diffusion-v1-5/tree/main/scheduler)๋ฅผ ๋ช
์ํด์ฃผ์ด์ผ ํฉ๋๋ค.
|
||||
|
||||
๊ทธ ๋ค์ ์๋กญ๊ฒ ์์ฑํ [`EulerDiscreteScheduler`] ์ธ์คํด์ค๋ฅผ [`DiffusionPipeline`]์ `scheduler` ์ธ์์ ์ ๋ฌํฉ๋๋ค.
|
||||
|
||||
```python
|
||||
from diffusers import DiffusionPipeline, EulerDiscreteScheduler, DPMSolverMultistepScheduler
|
||||
|
||||
repo_id = "runwayml/stable-diffusion-v1-5"
|
||||
|
||||
scheduler = EulerDiscreteScheduler.from_pretrained(repo_id, subfolder="scheduler")
|
||||
|
||||
stable_diffusion = DiffusionPipeline.from_pretrained(repo_id, scheduler=scheduler)
|
||||
```
|
||||
|
||||
### ์ธ์ดํํฐ ์ฒด์ปค
|
||||
|
||||
์คํ
์ด๋ธ diffusion๊ณผ ๊ฐ์ diffusion ๋ชจ๋ธ๋ค์ ์ ํดํ ์ด๋ฏธ์ง๋ฅผ ์์ฑํ ์๋ ์์ต๋๋ค. ์ด๋ฅผ ์๋ฐฉํ๊ธฐ ์ํด ๋ํจ์ ์ค๋ ์์ฑ๋ ์ด๋ฏธ์ง์ ์ ํด์ฑ์ ํ๋จํ๋ [์ธ์ดํํฐ ์ฒด์ปค(safety checker)](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/safety_checker.py) ๊ธฐ๋ฅ์ ์ง์ํ๊ณ ์์ต๋๋ค. ๋ง์ฝ ์ธ์ดํํฐ ์ฒด์ปค์ ์ฌ์ฉ์ ์ํ์ง ์๋๋ค๋ฉด, `safety_checker` ์ธ์์ `None`์ ์ ๋ฌํด์ฃผ์๋ฉด ๋ฉ๋๋ค.
|
||||
|
||||
```python
|
||||
from diffusers import DiffusionPipeline
|
||||
|
||||
repo_id = "runwayml/stable-diffusion-v1-5"
|
||||
stable_diffusion = DiffusionPipeline.from_pretrained(repo_id, safety_checker=None)
|
||||
```
|
||||
|
||||
### ์ปดํฌ๋ํธ ์ฌ์ฌ์ฉ
|
||||
|
||||
๋ณต์์ ํ์ดํ๋ผ์ธ์ ๋์ผํ ๋ชจ๋ธ์ด ๋ฐ๋ณต์ ์ผ๋ก ์ฌ์ฉํ๋ค๋ฉด, ๊ตณ์ด ํด๋น ๋ชจ๋ธ์ ๋์ผํ ๊ฐ์ค์น๋ฅผ ์ค๋ณต์ผ๋ก RAM์ ๋ถ๋ฌ์ฌ ํ์๋ ์์ ๊ฒ์
๋๋ค. [`~DiffusionPipeline.components`] ์์ฑ์ ํตํด ํ์ดํ๋ผ์ธ ๋ด๋ถ์ ์ปดํฌ๋ํธ๋ค์ ์ฐธ์กฐํ ์ ์๋๋ฐ, ์ด๋ฒ ๋จ๋ฝ์์๋ ์ด๋ฅผ ํตํด ๋์ผํ ๋ชจ๋ธ ๊ฐ์ค์น๋ฅผ RAM์ ์ค๋ณต์ผ๋ก ๋ถ๋ฌ์ค๋ ๊ฒ์ ๋ฐฉ์งํ๋ ๋ฒ์ ๋ํด ์์๋ณด๊ฒ ์ต๋๋ค.
|
||||
|
||||
```python
|
||||
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline
|
||||
|
||||
model_id = "runwayml/stable-diffusion-v1-5"
|
||||
stable_diffusion_txt2img = StableDiffusionPipeline.from_pretrained(model_id)
|
||||
|
||||
components = stable_diffusion_txt2img.components
|
||||
```
|
||||
|
||||
๊ทธ ๋ค์ ์ ์์ ์ฝ๋์์ ์ ์ธํ `components` ๋ณ์๋ฅผ ๋ค๋ฅธ ํ์ดํ๋ผ์ธ์ ์ ๋ฌํจ์ผ๋ก์จ, ๋ชจ๋ธ์ ๊ฐ์ค์น๋ฅผ ์ค๋ณต์ผ๋ก RAM์ ๋ก๋ฉํ์ง ์๊ณ , ๋์ผํ ์ปดํฌ๋ํธ๋ฅผ ์ฌ์ฌ์ฉํ ์ ์์ต๋๋ค.
|
||||
|
||||
```python
|
||||
stable_diffusion_img2img = StableDiffusionImg2ImgPipeline(**components)
|
||||
```
|
||||
|
||||
๋ฌผ๋ก ๊ฐ๊ฐ์ ์ปดํฌ๋ํธ๋ค์ ๋ฐ๋ก ๋ฐ๋ก ํ์ดํ๋ผ์ธ์ ์ ๋ฌํ ์๋ ์์ต๋๋ค. ์๋ฅผ ๋ค์ด `stable_diffusion_txt2img` ํ์ดํ๋ผ์ธ ์์ ์ปดํฌ๋ํธ๋ค ๊ฐ์ด๋ฐ์ ์ธ์ดํํฐ ์ฒด์ปค(`safety_checker`)์ ํผ์ณ ์ต์คํธ๋ํฐ(`feature_extractor`)๋ฅผ ์ ์ธํ ์ปดํฌ๋ํธ๋ค๋ง `stable_diffusion_img2img` ํ์ดํ๋ผ์ธ์์ ์ฌ์ฌ์ฉํ๋ ๋ฐฉ์ ์ญ์ ๊ฐ๋ฅํฉ๋๋ค.
|
||||
|
||||
```python
|
||||
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline
|
||||
|
||||
model_id = "runwayml/stable-diffusion-v1-5"
|
||||
stable_diffusion_txt2img = StableDiffusionPipeline.from_pretrained(model_id)
|
||||
stable_diffusion_img2img = StableDiffusionImg2ImgPipeline(
|
||||
vae=stable_diffusion_txt2img.vae,
|
||||
text_encoder=stable_diffusion_txt2img.text_encoder,
|
||||
tokenizer=stable_diffusion_txt2img.tokenizer,
|
||||
unet=stable_diffusion_txt2img.unet,
|
||||
scheduler=stable_diffusion_txt2img.scheduler,
|
||||
safety_checker=None,
|
||||
feature_extractor=None,
|
||||
requires_safety_checker=False,
|
||||
)
|
||||
```
|
||||
|
||||
## Checkpoint variants
|
||||
|
||||
Variant๋ ์ผ๋ฐ์ ์ผ๋ก ๋ค์๊ณผ ๊ฐ์ ์ฒดํฌํฌ์ธํธ๋ค์ ์๋ฏธํฉ๋๋ค.
|
||||
|
||||
- `torch.float16`๊ณผ ๊ฐ์ด ์ ๋ฐ๋๋ ๋ ๋ฎ์ง๋ง, ์ฉ๋ ์ญ์ ๋ ์์ ๋ถ๋์์์ ํ์์ ๊ฐ์ค์น๋ฅผ ์ฌ์ฉํ๋ ์ฒดํฌํฌ์ธํธ. *(๋ค๋ง ์ด์ ๊ฐ์ variant์ ๊ฒฝ์ฐ, ์ถ๊ฐ์ ์ธ ํ๋ จ๊ณผ CPUํ๊ฒฝ์์์ ๊ตฌ๋์ด ๋ถ๊ฐ๋ฅํฉ๋๋ค.)*
|
||||
- Non-EMA ๊ฐ์ค์น๋ฅผ ์ฌ์ฉํ๋ ์ฒดํฌํฌ์ธํธ. *(Non-EMA ๊ฐ์ค์น์ ๊ฒฝ์ฐ, ํ์ธ ํ๋ ๋จ๊ณ์์ ์ฌ์ฉํ๋ ๊ฒ์ด ๊ถ์ฅ๋๋๋ฐ, ์ถ๋ก ๋จ๊ณ์์ ์ฌ์ฉํ์ง ์๋ ๊ฒ์ด ๊ถ์ฅ๋ฉ๋๋ค.)*
|
||||
|
||||
<Tip>
|
||||
|
||||
๐ก ๋ชจ๋ธ ๊ตฌ์กฐ๋ ๋์ผํ์ง๋ง ์๋ก ๋ค๋ฅธ ํ์ต ํ๊ฒฝ์์ ์๋ก ๋ค๋ฅธ ๋ฐ์ดํฐ์
์ผ๋ก ํ์ต๋ ์ฒดํฌํฌ์ธํธ๋ค์ด ์์ ๊ฒฝ์ฐ, ํด๋น ์ฒดํฌํฌ์ธํธ๋ค์ variant ๋จ๊ณ๊ฐ ์๋ ๋ ํฌ์งํ ๋ฆฌ ๋จ๊ณ์์ ๋ถ๋ฆฌ๋์ด ๊ด๋ฆฌ๋์ด์ผ ํฉ๋๋ค. (์ฆ, ํด๋น ์ฒดํฌํฌ์ธํธ๋ค์ ์๋ก ๋ค๋ฅธ ๋ ํฌ์งํ ๋ฆฌ์์ ๋ฐ๋ก ๊ด๋ฆฌ๋์ด์ผ ํฉ๋๋ค. ์์: [`stable-diffusion-v1-4`], [`stable-diffusion-v1-5`]).
|
||||
|
||||
</Tip>
|
||||
|
||||
| **checkpoint type** | **weight name** | **argument for loading weights** |
|
||||
| ------------------- | ----------------------------------- | -------------------------------- |
|
||||
| original | diffusion_pytorch_model.bin | |
|
||||
| floating point | diffusion_pytorch_model.fp16.bin | `variant`, `torch_dtype` |
|
||||
| non-EMA | diffusion_pytorch_model.non_ema.bin | `variant` |
|
||||
|
||||
variant๋ฅผ ๋ก๋ํ ๋ 2๊ฐ์ ์ค์ํ argument๊ฐ ์์ต๋๋ค.
|
||||
|
||||
* `torch_dtype`์ ๋ถ๋ฌ์ฌ ์ฒดํฌํฌ์ธํธ์ ๋ถ๋์์์ ์ ์ ์ํฉ๋๋ค. ์๋ฅผ ๋ค์ด `torch_dtype=torch.float16`์ ๋ช์ํจ์ผ๋ก์จ ๊ฐ์ค์น์ ๋ถ๋์์์ ํ์์ `fp16`์ผ๋ก ๋ณํํ ์ ์์ต๋๋ค. (๋ง์ฝ ๋ฐ๋ก ์ค์ ํ์ง ์์ ๊ฒฝ์ฐ, ๊ธฐ๋ณธ๊ฐ์ผ๋ก `fp32` ํ์์ ๊ฐ์ค์น๊ฐ ๋ก๋ฉ๋ฉ๋๋ค.) ๋ํ `variant` ์ธ์๋ฅผ ๋ช์ํ์ง ์์ ์ฑ๋ก ์ฒดํฌํฌ์ธํธ๋ฅผ ๋ถ๋ฌ์จ ๋ค์, ํด๋น ์ฒดํฌํฌ์ธํธ๋ฅผ `torch_dtype=torch.float16` ์ธ์๋ฅผ ํตํด `fp16` ํ์์ผ๋ก ๋ณํํ๋ ๊ฒ ์ญ์ ๊ฐ๋ฅํฉ๋๋ค. ์ด ๊ฒฝ์ฐ ๊ธฐ๋ณธ์ผ๋ก ์ค์ ๋ `fp32` ๊ฐ์ค์น๊ฐ ๋จผ์ ๋ค์ด๋ก๋๋๊ณ , ํด๋น ๊ฐ์ค์น๋ค์ ๋ถ๋ฌ์จ ๋ค์ `fp16` ํ์์ผ๋ก ๋ณํํ๊ฒ ๋ฉ๋๋ค.
|
||||
* `variant` ์ธ์๋ ๋ ํฌ์งํ ๋ฆฌ์์ ์ด๋ค variant๋ฅผ ๋ถ๋ฌ์ฌ ๊ฒ์ธ๊ฐ๋ฅผ ์ ์ํฉ๋๋ค. ๊ฐ๋ น [`diffusers/stable-diffusion-variants`](https://huggingface.co/diffusers/stable-diffusion-variants/tree/main/unet) ๋ ํฌ์งํ ๋ฆฌ๋ก๋ถํฐ `non_ema` ์ฒดํฌํฌ์ธํธ๋ฅผ ๋ถ๋ฌ์ค๊ณ ์ ํ๋ค๋ฉด, `variant="non_ema"` ์ธ์๋ฅผ ์ ๋ฌํด์ผ ํฉ๋๋ค.
|
||||
|
||||
```python
|
||||
from diffusers import DiffusionPipeline
|
||||
|
||||
# load fp16 variant
|
||||
stable_diffusion = DiffusionPipeline.from_pretrained(
|
||||
"runwayml/stable-diffusion-v1-5", variant="fp16", torch_dtype=torch.float16
|
||||
)
|
||||
# load non_ema variant
|
||||
stable_diffusion = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", variant="non_ema")
|
||||
```
|
||||
|
||||
๋ค๋ฅธ ๋ถ๋์์์ ํ์
์ ๊ฐ์ค์น ํน์ non-EMA ๊ฐ์ค์น๋ฅผ ์ฌ์ฉํ๋ ์ฒดํฌํฌ์ธํธ๋ฅผ ์ ์ฅํ๊ธฐ ์ํด์๋, [`DiffusionPipeline.save_pretrained`] ๋ฉ์๋๋ฅผ ์ฌ์ฉํด์ผ ํ๋ฉฐ, ์ด ๋ `variant` ์ธ์๋ฅผ ๋ช
์ํด์ค์ผ ํฉ๋๋ค. ์๋์ ์ฒดํฌํฌ์ธํธ์ ๋์ผํ ํด๋์ variant๋ฅผ ์ ์ฅํด์ผ ํ๋ฉฐ, ์ด๋ ๊ฒ ํ๋ฉด ๋์ผํ ํด๋์์ ์ค๋ฆฌ์ง๋ ์ฒดํฌํฌ์ธํธ๊ณผ variant๋ฅผ ๋ชจ๋ ๋ถ๋ฌ์ฌ ์ ์์ต๋๋ค.
|
||||
|
||||
```python
|
||||
from diffusers import DiffusionPipeline
|
||||
|
||||
# save as fp16 variant
|
||||
stable_diffusion.save_pretrained("runwayml/stable-diffusion-v1-5", variant="fp16")
|
||||
# save as non-ema variant
|
||||
stable_diffusion.save_pretrained("runwayml/stable-diffusion-v1-5", variant="non_ema")
|
||||
```
|
||||
|
||||
๋ง์ฝ variant๋ฅผ ๊ธฐ์กด ํด๋์ ์ ์ฅํ์ง ์์ ๊ฒฝ์ฐ, `variant` ์ธ์๋ฅผ ๋ฐ๋์ ๋ช์ํด์ผ ํฉ๋๋ค. ๊ทธ๋ ๊ฒ ํ์ง ์์ ๊ฒฝ์ฐ ์๋์ ์ค๋ฆฌ์ง๋ ์ฒดํฌํฌ์ธํธ๋ฅผ ์ฐพ์ ์ ์๊ฒ ๋๊ธฐ ๋๋ฌธ์ ์๋ฌ๊ฐ ๋ฐ์ํฉ๋๋ค.
|
||||
|
||||
```python
|
||||
# ๐ this won't work
|
||||
stable_diffusion = DiffusionPipeline.from_pretrained("./stable-diffusion-v1-5", torch_dtype=torch.float16)
|
||||
# ๐ this works
|
||||
stable_diffusion = DiffusionPipeline.from_pretrained(
|
||||
"./stable-diffusion-v1-5", variant="fp16", torch_dtype=torch.float16
|
||||
)
|
||||
```
|
||||
|
||||
### ๋ชจ๋ธ ๋ถ๋ฌ์ค๊ธฐ
|
||||
|
||||
๋ชจ๋ธ๋ค์ [`ModelMixin.from_pretrained`] ๋ฉ์๋๋ฅผ ํตํด ๋ถ๋ฌ์ฌ ์ ์์ต๋๋ค. ํด๋น ๋ฉ์๋๋ ์ต์ ๋ฒ์ ์ ๋ชจ๋ธ ๊ฐ์ค์น ํ์ผ๊ณผ ์ค์ ํ์ผ(configurations)์ ๋ค์ด๋ก๋ํ๊ณ ์บ์ฑํฉ๋๋ค. ๋ง์ฝ ์ด๋ฌํ ํ์ผ๋ค์ด ์ต์ ๋ฒ์ ์ผ๋ก ๋ก์ปฌ ์บ์์ ์ ์ฅ๋์ด ์๋ค๋ฉด, [`ModelMixin.from_pretrained`]๋ ๊ตณ์ด ํด๋น ํ์ผ๋ค์ ๋ค์ ๋ค์ด๋ก๋ํ์ง ์์ผ๋ฉฐ, ๊ทธ์ ์บ์์ ์๋ ์ต์ ํ์ผ๋ค์ ์ฌ์ฌ์ฉํฉ๋๋ค.
|
||||
|
||||
๋ชจ๋ธ์ `subfolder` ์ธ์์ ๋ช์๋ ํ์ ํด๋๋ก๋ถํฐ ๋ก๋๋ฉ๋๋ค. ์๋ฅผ ๋ค์ด `runwayml/stable-diffusion-v1-5`์ UNet ๋ชจ๋ธ์ ๊ฐ์ค์น๋ [`unet`](https://huggingface.co/runwayml/stable-diffusion-v1-5/tree/main/unet) ํด๋์ ์ ์ฅ๋์ด ์์ต๋๋ค.
|
||||
|
||||
```python
|
||||
from diffusers import UNet2DConditionModel
|
||||
|
||||
repo_id = "runwayml/stable-diffusion-v1-5"
|
||||
model = UNet2DConditionModel.from_pretrained(repo_id, subfolder="unet")
|
||||
```
|
||||
|
||||
ํน์ [ํด๋น ๋ชจ๋ธ์ ๋ ํฌ์งํ ๋ฆฌ](https://huggingface.co/google/ddpm-cifar10-32/tree/main)๋ก๋ถํฐ ๋ค์ด๋ ํธ๋ก ๊ฐ์ ธ์ค๋ ๊ฒ ์ญ์ ๊ฐ๋ฅํฉ๋๋ค.
|
||||
|
||||
```python
|
||||
from diffusers import UNet2DModel
|
||||
|
||||
repo_id = "google/ddpm-cifar10-32"
|
||||
model = UNet2DModel.from_pretrained(repo_id)
|
||||
```
|
||||
|
||||
๋ํ ์์ ๋ดค๋ `variant` ์ธ์๋ฅผ ๋ช์ํจ์ผ๋ก์จ, Non-EMA๋ `fp16`์ ๊ฐ์ค์น๋ฅผ ๊ฐ์ ธ์ค๋ ๊ฒ ์ญ์ ๊ฐ๋ฅํฉ๋๋ค.
|
||||
|
||||
```python
|
||||
from diffusers import UNet2DConditionModel
|
||||
|
||||
model = UNet2DConditionModel.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="unet", variant="non-ema")
|
||||
model.save_pretrained("./local-unet", variant="non-ema")
|
||||
```
|
||||
|
||||
### ์ค์ผ์ค๋ฌ
|
||||
|
||||
์ค์ผ์ค๋ฌ๋ค์ [`SchedulerMixin.from_pretrained`] ๋ฉ์๋๋ฅผ ํตํด ๋ถ๋ฌ์ฌ ์ ์์ต๋๋ค. ๋ชจ๋ธ๊ณผ ๋ฌ๋ฆฌ ์ค์ผ์ค๋ฌ๋ ๋ณ๋์ ๊ฐ์ค์น๋ฅผ ๊ฐ์ง ์์ผ๋ฉฐ, ๋ฐ๋ผ์ ๋น์ฐํ ๋ณ๋์ ํ์ต๊ณผ์ ์ ์๊ตฌํ์ง ์์ต๋๋ค. ์ด๋ฌํ ์ค์ผ์ค๋ฌ๋ค์ (ํด๋น ์ค์ผ์ค๋ฌ ํ์ํด๋์) configration ํ์ผ์ ํตํด ์ ์๋ฉ๋๋ค.
|
||||
|
||||
์ฌ๋ฌ๊ฐ์ ์ค์ผ์ค๋ฌ๋ฅผ ๋ถ๋ฌ์จ๋ค๊ณ ํด์ ๋ง์ ๋ฉ๋ชจ๋ฆฌ๋ฅผ ์๋ชจํ๋ ๊ฒ์ ์๋๋ฉฐ, ๋ค์ํ ์ค์ผ์ค๋ฌ๋ค์ ๋์ผํ ์ค์ผ์ค๋ฌ configration์ ์ ์ฉํ๋ ๊ฒ ์ญ์ ๊ฐ๋ฅํฉ๋๋ค. ๋ค์ ์์ ์ฝ๋์์ ๋ถ๋ฌ์ค๋ ์ค์ผ์ค๋ฌ๋ค์ ๋ชจ๋ [`StableDiffusionPipeline`]๊ณผ ํธํ๋๋๋ฐ, ์ด๋ ๊ณง ํด๋น ์ค์ผ์ค๋ฌ๋ค์ ๋์ผํ ์ค์ผ์ค๋ฌ configration ํ์ผ์ ์ ์ฉํ ์ ์์์ ์๋ฏธํฉ๋๋ค.
|
||||
|
||||
```python
|
||||
from diffusers import StableDiffusionPipeline
|
||||
from diffusers import (
|
||||
DDPMScheduler,
|
||||
DDIMScheduler,
|
||||
PNDMScheduler,
|
||||
LMSDiscreteScheduler,
|
||||
EulerDiscreteScheduler,
|
||||
EulerAncestralDiscreteScheduler,
|
||||
DPMSolverMultistepScheduler,
|
||||
)
|
||||
|
||||
repo_id = "runwayml/stable-diffusion-v1-5"
|
||||
|
||||
ddpm = DDPMScheduler.from_pretrained(repo_id, subfolder="scheduler")
|
||||
ddim = DDIMScheduler.from_pretrained(repo_id, subfolder="scheduler")
|
||||
pndm = PNDMScheduler.from_pretrained(repo_id, subfolder="scheduler")
|
||||
lms = LMSDiscreteScheduler.from_pretrained(repo_id, subfolder="scheduler")
|
||||
euler_anc = EulerAncestralDiscreteScheduler.from_pretrained(repo_id, subfolder="scheduler")
|
||||
euler = EulerDiscreteScheduler.from_pretrained(repo_id, subfolder="scheduler")
|
||||
dpm = DPMSolverMultistepScheduler.from_pretrained(repo_id, subfolder="scheduler")
|
||||
|
||||
# replace `dpm` with any of `ddpm`, `ddim`, `pndm`, `lms`, `euler_anc`, `euler`
|
||||
pipeline = StableDiffusionPipeline.from_pretrained(repo_id, scheduler=dpm)
|
||||
```
|
||||
|
||||
### DiffusionPipeline์ ๋ํด ์์๋ณด๊ธฐ
|
||||
|
||||
ํด๋์ค ๋ฉ์๋๋ก์ [`DiffusionPipeline.from_pretrained`]์ 2๊ฐ์ง๋ฅผ ๋ด๋นํฉ๋๋ค.
|
||||
|
||||
- ์ฒซ์งธ๋ก, `from_pretrained` ๋ฉ์๋๋ ์ต์ ๋ฒ์ ์ ํ์ดํ๋ผ์ธ์ ๋ค์ด๋ก๋ํ๊ณ , ์บ์์ ์ ์ฅํฉ๋๋ค. ์ด๋ฏธ ๋ก์ปฌ ์บ์์ ์ต์ ๋ฒ์ ์ ํ์ดํ๋ผ์ธ์ด ์ ์ฅ๋์ด ์๋ค๋ฉด, [`DiffusionPipeline.from_pretrained`]์ ํด๋น ํ์ผ๋ค์ ๋ค์ ๋ค์ด๋ก๋ํ์ง ์๊ณ , ๋ก์ปฌ ์บ์์ ์ ์ฅ๋์ด ์๋ ํ์ดํ๋ผ์ธ์ ๋ถ๋ฌ์ต๋๋ค.
|
||||
- `model_index.json` ํ์ผ์ ํตํด ์ฒดํฌํฌ์ธํธ์ ๋์๋๋ ์ ํฉํ ํ์ดํ๋ผ์ธ ํด๋์ค๋ก ๋ถ๋ฌ์ต๋๋ค.
|
||||
|
||||
ํ์ดํ๋ผ์ธ์ ํด๋ ๊ตฌ์กฐ๋ ํด๋น ํ์ดํ๋ผ์ธ ํด๋์ค์ ๊ตฌ์กฐ์ ์ง์ ์ ์ผ๋ก ์ผ์นํฉ๋๋ค. ์๋ฅผ ๋ค์ด [`StableDiffusionPipeline`] ํด๋์ค๋ [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) ๋ ํฌ์งํ ๋ฆฌ์ ๋์๋๋ ๊ตฌ์กฐ๋ฅผ ๊ฐ์ต๋๋ค.
|
||||
|
||||
```python
|
||||
from diffusers import DiffusionPipeline
|
||||
|
||||
repo_id = "runwayml/stable-diffusion-v1-5"
|
||||
pipeline = DiffusionPipeline.from_pretrained(repo_id)
|
||||
print(pipeline)
|
||||
```
|
||||
|
||||
์์ ์ฝ๋ ์ถ๋ ฅ ๊ฒฐ๊ณผ๋ฅผ ํ์ธํด๋ณด๋ฉด, `pipeline`์ [`StableDiffusionPipeline`]์ ์ธ์คํด์ค์ด๋ฉฐ, ๋ค์๊ณผ ๊ฐ์ด ์ด 7๊ฐ์ ์ปดํฌ๋ํธ๋ก ๊ตฌ์ฑ๋๋ค๋ ๊ฒ์ ์ ์ ์์ต๋๋ค.
|
||||
|
||||
- `"feature_extractor"`: [`~transformers.CLIPFeatureExtractor`]์ ์ธ์คํด์ค
|
||||
- `"safety_checker"`: ์ ํดํ ์ปจํ
์ธ ๋ฅผ ์คํฌ๋ฆฌ๋ํ๊ธฐ ์ํ [์ปดํฌ๋ํธ](https://github.com/huggingface/diffusers/blob/e55687e1e15407f60f32242027b7bb8170e58266/src/diffusers/pipelines/stable_diffusion/safety_checker.py#L32)
|
||||
- `"scheduler"`: [`PNDMScheduler`]์ ์ธ์คํด์ค
|
||||
- `"text_encoder"`: [`~transformers.CLIPTextModel`]์ ์ธ์คํด์ค
|
||||
- `"tokenizer"`: a [`~transformers.CLIPTokenizer`]์ ์ธ์คํด์ค
|
||||
- `"unet"`: [`UNet2DConditionModel`]์ ์ธ์คํด์ค
|
||||
- `"vae"` [`AutoencoderKL`]์ ์ธ์คํด์ค
|
||||
|
||||
```json
|
||||
StableDiffusionPipeline {
|
||||
"feature_extractor": [
|
||||
"transformers",
|
||||
"CLIPImageProcessor"
|
||||
],
|
||||
"safety_checker": [
|
||||
"stable_diffusion",
|
||||
"StableDiffusionSafetyChecker"
|
||||
],
|
||||
"scheduler": [
|
||||
"diffusers",
|
||||
"PNDMScheduler"
|
||||
],
|
||||
"text_encoder": [
|
||||
"transformers",
|
||||
"CLIPTextModel"
|
||||
],
|
||||
"tokenizer": [
|
||||
"transformers",
|
||||
"CLIPTokenizer"
|
||||
],
|
||||
"unet": [
|
||||
"diffusers",
|
||||
"UNet2DConditionModel"
|
||||
],
|
||||
"vae": [
|
||||
"diffusers",
|
||||
"AutoencoderKL"
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
ํ์ดํ๋ผ์ธ ์ธ์คํด์ค์ ์ปดํฌ๋ํธ๋ค์ [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5)์ ํด๋ ๊ตฌ์กฐ์ ๋น๊ตํด๋ณผ ๊ฒฝ์ฐ, ๊ฐ๊ฐ์ ์ปดํฌ๋ํธ๋ง๋ค ๋ณ๋์ ํด๋๊ฐ ์์์ ํ์ธํ ์ ์์ต๋๋ค.
|
||||
|
||||
```
|
||||
.
|
||||
โโโ feature_extractor
|
||||
โ โโโ preprocessor_config.json
|
||||
โโโ model_index.json
|
||||
โโโ safety_checker
|
||||
โ โโโ config.json
|
||||
โ โโโ pytorch_model.bin
|
||||
โโโ scheduler
|
||||
โ โโโ scheduler_config.json
|
||||
โโโ text_encoder
|
||||
โ โโโ config.json
|
||||
โ โโโ pytorch_model.bin
|
||||
โโโ tokenizer
|
||||
โ โโโ merges.txt
|
||||
โ โโโ special_tokens_map.json
|
||||
โ โโโ tokenizer_config.json
|
||||
โ โโโ vocab.json
|
||||
โโโ unet
|
||||
โ โโโ config.json
|
||||
โ โโโ diffusion_pytorch_model.bin
|
||||
โโโ vae
|
||||
โโโ config.json
|
||||
โโโ diffusion_pytorch_model.bin
|
||||
```
|
||||
|
||||
๋ํ ๊ฐ๊ฐ์ ์ปดํฌ๋ํธ๋ค์ ํ์ดํ๋ผ์ธ ์ธ์คํด์ค์ ์์ฑ์ผ๋ก์จ ์ฐธ์กฐํ ์ ์์ต๋๋ค.
|
||||
|
||||
```py
|
||||
pipeline.tokenizer
|
||||
```
|
||||
|
||||
```python
|
||||
CLIPTokenizer(
|
||||
name_or_path="/root/.cache/huggingface/hub/models--runwayml--stable-diffusion-v1-5/snapshots/39593d5650112b4cc580433f6b0435385882d819/tokenizer",
|
||||
vocab_size=49408,
|
||||
model_max_length=77,
|
||||
is_fast=False,
|
||||
padding_side="right",
|
||||
truncation_side="right",
|
||||
special_tokens={
|
||||
"bos_token": AddedToken("<|startoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True),
|
||||
"eos_token": AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True),
|
||||
"unk_token": AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True),
|
||||
"pad_token": "<|endoftext|>",
|
||||
},
|
||||
)
|
||||
```
|
||||
|
||||
๋ชจ๋ ํ์ดํ๋ผ์ธ์ `model_index.json` ํ์ผ์ ํตํด [`DiffusionPipeline`]์ ๋ค์๊ณผ ๊ฐ์ ์ ๋ณด๋ฅผ ์ ๋ฌํฉ๋๋ค.
|
||||
|
||||
- `_class_name` ๋ ์ด๋ค ํ์ดํ๋ผ์ธ ํด๋์ค๋ฅผ ์ฌ์ฉํด์ผ ํ๋์ง์ ๋ํด ์๋ ค์ค๋๋ค.
|
||||
- `_diffusers_version`๋ ์ด๋ค ๋ฒ์ ์ ๋ํจ์ ์ค๋ก ํ์ดํ๋ผ์ธ ์์ ๋ชจ๋ธ๋ค์ด ๋ง๋ค์ด์ก๋์ง๋ฅผ ์๋ ค์ค๋๋ค.
|
||||
- ๊ทธ ๋ค์์ ๊ฐ๊ฐ์ ์ปดํฌ๋ํธ๋ค์ด ์ด๋ค ๋ผ์ด๋ธ๋ฌ๋ฆฌ์ ์ด๋ค ํด๋์ค๋ก ๋ง๋ค์ด์ก๋์ง์ ๋ํด ์๋ ค์ค๋๋ค. (์๋ ์์์์ `"feature_extractor" : ["transformers", "CLIPImageProcessor"]`์ ๊ฒฝ์ฐ, `feature_extractor` ์ปดํฌ๋ํธ๋ `transformers` ๋ผ์ด๋ธ๋ฌ๋ฆฌ์ `CLIPImageProcessor` ํด๋์ค๋ฅผ ํตํด ๋ง๋ค์ด์ก๋ค๋ ๊ฒ์ ์๋ฏธํฉ๋๋ค.)
|
||||
|
||||
```json
|
||||
{
|
||||
"_class_name": "StableDiffusionPipeline",
|
||||
"_diffusers_version": "0.6.0",
|
||||
"feature_extractor": [
|
||||
"transformers",
|
||||
"CLIPImageProcessor"
|
||||
],
|
||||
"safety_checker": [
|
||||
"stable_diffusion",
|
||||
"StableDiffusionSafetyChecker"
|
||||
],
|
||||
"scheduler": [
|
||||
"diffusers",
|
||||
"PNDMScheduler"
|
||||
],
|
||||
"text_encoder": [
|
||||
"transformers",
|
||||
"CLIPTextModel"
|
||||
],
|
||||
"tokenizer": [
|
||||
"transformers",
|
||||
"CLIPTokenizer"
|
||||
],
|
||||
"unet": [
|
||||
"diffusers",
|
||||
"UNet2DConditionModel"
|
||||
],
|
||||
"vae": [
|
||||
"diffusers",
|
||||
"AutoencoderKL"
|
||||
]
|
||||
}
|
||||
```
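
If you want to inspect this file programmatically rather than in the browser, one possible sketch (relying only on `huggingface_hub`, which Diffusers already depends on) is:

```python
import json

from huggingface_hub import hf_hub_download

# Download only model_index.json and print which library/class backs each component.
config_path = hf_hub_download("runwayml/stable-diffusion-v1-5", "model_index.json")
with open(config_path) as f:
    model_index = json.load(f)

print(model_index["_class_name"], model_index["_diffusers_version"])
for name, value in model_index.items():
    if isinstance(value, list):
        library, class_name = value
        print(f"{name}: {library}.{class_name}")
```
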
docs/source/ko/using-diffusers/other-formats.mdx (new file, 191 lines)
@@ -0,0 +1,191 @@
|
||||
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# ๋ค์ํ Stable Diffusion ํฌ๋งท ๋ถ๋ฌ์ค๊ธฐ
|
||||
|
||||
Stable Diffusion ๋ชจ๋ธ๋ค์ ํ์ต ๋ฐ ์ ์ฅ๋ ํ๋ ์์ํฌ์ ๋ค์ด๋ก๋ ์์น์ ๋ฐ๋ผ ๋ค์ํ ํ์์ผ๋ก ์ ๊ณต๋ฉ๋๋ค. ์ด๋ฌํ ํ์์ ๐ค Diffusers์์ ์ฌ์ฉํ ์ ์๋๋ก ๋ณํํ๋ฉด ์ถ๋ก ์ ์ํ [๋ค์ํ ์ค์ผ์ค๋ฌ ์ฌ์ฉ](schedulers), ์ฌ์ฉ์ ์ง์ ํ์ดํ๋ผ์ธ ๊ตฌ์ถ, ์ถ๋ก ์๋ ์ต์ ํ๋ฅผ ์ํ ๋ค์ํ ๊ธฐ๋ฒ๊ณผ ๋ฐฉ๋ฒ ๋ฑ ๋ผ์ด๋ธ๋ฌ๋ฆฌ์์ ์ง์ํ๋ ๋ชจ๋ ๊ธฐ๋ฅ์ ์ฌ์ฉํ ์ ์์ต๋๋ค.
|
||||
|
||||
<Tip>
|
||||
|
||||
์ฐ๋ฆฌ๋ `.safetensors` ํ์์ ์ถ์ฒํฉ๋๋ค. ์๋ํ๋ฉด ๊ธฐ์กด์ pickled ํ์ผ์ ์ทจ์ฝํ๊ณ ๋จธ์ ์์ ์ฝ๋๋ฅผ ์คํํ ๋ ์
์ฉ๋ ์ ์๋ ๊ฒ์ ๋นํด ํจ์ฌ ๋ ์์ ํฉ๋๋ค. (safetensors ๋ถ๋ฌ์ค๊ธฐ ๊ฐ์ด๋์์ ์์ธํ ์์๋ณด์ธ์.)
|
||||
|
||||
</Tip>
|
||||
|
||||
์ด ๊ฐ์ด๋์์๋ ๋ค๋ฅธ Stable Diffusion ํ์์ ๐ค Diffusers์ ํธํ๋๋๋ก ๋ณํํ๋ ๋ฐฉ๋ฒ์ ์ค๋ชํฉ๋๋ค.
|
||||
|
||||
## PyTorch .ckpt
|
||||
|
||||
์ฒดํฌํฌ์ธํธ ๋๋ `.ckpt` ํ์์ ์ผ๋ฐ์ ์ผ๋ก ๋ชจ๋ธ์ ์ ์ฅํ๋ ๋ฐ ์ฌ์ฉ๋ฉ๋๋ค. `.ckpt` ํ์ผ์ ์ ์ฒด ๋ชจ๋ธ์ ํฌํจํ๋ฉฐ ์ผ๋ฐ์ ์ผ๋ก ํฌ๊ธฐ๊ฐ ๋ช GB์๋๋ค. `.ckpt` ํ์ผ์ [`~StableDiffusionPipeline.from_ckpt`] ๋ฉ์๋๋ฅผ ์ฌ์ฉํ์ฌ ์ง์  ๋ถ๋ฌ์์ ์ฌ์ฉํ ์๋ ์์ง๋ง, ์ผ๋ฐ์ ์ผ๋ก ๋ ๊ฐ์ง ํ์์ ๋ชจ๋ ์ฌ์ฉํ ์ ์๋๋ก `.ckpt` ํ์ผ์ ๐ค Diffusers๋ก ๋ณํํ๋ ๊ฒ์ด ๋ ์ข์ต๋๋ค.
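
For reference, loading a single `.ckpt` file directly with [`~StableDiffusionPipeline.from_ckpt`] looks roughly like this (the file name below is only a placeholder):

```python
from diffusers import StableDiffusionPipeline

# The checkpoint file name is a placeholder for whatever .ckpt you downloaded.
pipeline = StableDiffusionPipeline.from_ckpt("./v1-5-pruned-emaonly.ckpt")
```
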
`.ckpt` ํ์ผ์ ๋ณํํ๋ ๋ ๊ฐ์ง ์ต์์ด ์์ต๋๋ค. Space๋ฅผ ์ฌ์ฉํ์ฌ ์ฒดํฌํฌ์ธํธ๋ฅผ ๋ณํํ๊ฑฐ๋ ์คํฌ๋ฆฝํธ๋ฅผ ์ฌ์ฉํ์ฌ `.ckpt` ํ์ผ์ ๋ณํํฉ๋๋ค.
|
||||
|
||||
### Space๋ก ๋ณํํ๊ธฐ
|
||||
|
||||
`.ckpt` ํ์ผ์ ๋ณํํ๋ ๊ฐ์ฅ ์ฝ๊ณ ํธ๋ฆฌํ ๋ฐฉ๋ฒ์ SD์์ Diffusers๋ก ์คํ์ด์ค๋ฅผ ์ฌ์ฉํ๋ ๊ฒ์๋๋ค. Space์ ์ง์นจ์ ๋ฐ๋ผ `.ckpt` ํ์ผ์ ๋ณํํ ์ ์์ต๋๋ค.
|
||||
|
||||
์ด ์ ๊ทผ ๋ฐฉ์์ ๊ธฐ๋ณธ ๋ชจ๋ธ์์๋ ์ ์๋ํ์ง๋ง ๋ ๋ง์ ์ฌ์ฉ์ ์ ์ ๋ชจ๋ธ์์๋ ์ด๋ ค์์ ๊ฒช์ ์ ์์ต๋๋ค. ๋น pull request๋ ์ค๋ฅ๋ฅผ ๋ฐํํ๋ฉด Space๊ฐ ์คํจํ ๊ฒ์
๋๋ค.
|
||||
์ด ๊ฒฝ์ฐ ์คํฌ๋ฆฝํธ๋ฅผ ์ฌ์ฉํ์ฌ `.ckpt` ํ์ผ์ ๋ณํํด ๋ณผ ์ ์์ต๋๋ค.
|
||||
|
||||
### ์คํฌ๋ฆฝํธ๋ก ๋ณํํ๊ธฐ
|
||||
|
||||
๐ค Diffusers๋ `.ckpt` ํ์ผ ๋ณํ์ ์ํ ๋ณํ ์คํฌ๋ฆฝํธ๋ฅผ ์ ๊ณตํฉ๋๋ค. ์ด ์ ๊ทผ ๋ฐฉ์์ ์์ Space๋ณด๋ค ๋ ์์ ์ ์๋๋ค.
|
||||
|
||||
์์ํ๊ธฐ ์ ์ ์คํฌ๋ฆฝํธ๋ฅผ ์คํํ ๐ค Diffusers์ ๋ก์ปฌ ํด๋ก (clone)์ด ์๋์ง ํ์ธํ๊ณ Hugging Face ๊ณ์ ์ ๋ก๊ทธ์ธํ์ฌ pull request๋ฅผ ์ด๊ณ ๋ณํ๋ ๋ชจ๋ธ์ ํ๋ธ์ ํธ์ํ ์ ์๋๋ก ํ์ธ์.
|
||||
|
||||
```bash
|
||||
huggingface-cli login
|
||||
```
|
||||
|
||||
์คํฌ๋ฆฝํธ๋ฅผ ์ฌ์ฉํ๋ ค๋ฉด:
|
||||
|
||||
1. ๋ณํํ๋ ค๋ `.ckpt`ย ํ์ผ์ด ํฌํจ๋ ๋ฆฌํฌ์งํ ๋ฆฌ๋ฅผ Git์ผ๋ก ํด๋ก (clone)ํฉ๋๋ค.
|
||||
|
||||
์ด ์์ ์์๋ TemporalNet .ckpt ํ์ผ์ ๋ณํํด ๋ณด๊ฒ ์ต๋๋ค:
|
||||
|
||||
```bash
|
||||
git lfs install
|
||||
git clone https://huggingface.co/CiaraRowles/TemporalNet
|
||||
```
|
||||
|
||||
2. ์ฒดํฌํฌ์ธํธ๋ฅผ ๋ณํํ ๋ฆฌํฌ์งํ ๋ฆฌ์์ pull request๋ฅผ ์ฝ๋๋ค:
|
||||
|
||||
```bash
|
||||
cd TemporalNet && git fetch origin refs/pr/13:pr/13
|
||||
git checkout pr/13
|
||||
```
|
||||
|
||||
3. ๋ณํ ์คํฌ๋ฆฝํธ์์ ๊ตฌ์ฑํ ์๋ ฅ ์ธ์๋ ์ฌ๋ฌ ๊ฐ์ง๊ฐ ์์ง๋ง ๊ฐ์ฅ ์ค์ํ ์ธ์๋ ๋ค์๊ณผ ๊ฐ์ต๋๋ค:
|
||||
|
||||
- `checkpoint_path`: ๋ณํํ `.ckpt` ํ์ผ์ ๊ฒฝ๋ก๋ฅผ ์๋ ฅํฉ๋๋ค.
|
||||
- `original_config_file`: ์๋ ์ํคํ์ฒ์ ๊ตฌ์ฑ์ ์ ์ํ๋ YAML ํ์ผ์๋๋ค. ์ด ํ์ผ์ ์ฐพ์ ์ ์๋ ๊ฒฝ์ฐ `.ckpt` ํ์ผ์ ์ฐพ์ GitHub ๋ฆฌํฌ์งํ ๋ฆฌ์์ YAML ํ์ผ์ ๊ฒ์ํด ๋ณด์ธ์.
|
||||
- `dump_path`: ๋ณํ๋ ๋ชจ๋ธ์ ๊ฒฝ๋ก
|
||||
|
||||
์๋ฅผ ๋ค์ด, TemporalNet ๋ชจ๋ธ์ Stable Diffusion v1.5 ๋ฐ ControlNet ๋ชจ๋ธ์ด๊ธฐ ๋๋ฌธ์ ControlNet ๋ฆฌํฌ์งํ ๋ฆฌ์์ cldm_v15.yaml ํ์ผ์ ๊ฐ์ ธ์ฌ ์ ์์ต๋๋ค.
|
||||
|
||||
4. ์ด์ ์คํฌ๋ฆฝํธ๋ฅผ ์คํํ์ฌ .ckpt ํ์ผ์ ๋ณํํ ์ ์์ต๋๋ค:
|
||||
|
||||
```bash
|
||||
python ../diffusers/scripts/convert_original_stable_diffusion_to_diffusers.py --checkpoint_path temporalnetv3.ckpt --original_config_file cldm_v15.yaml --dump_path ./ --controlnet
|
||||
```
|
||||
|
||||
5. ๋ณํ์ด ์๋ฃ๋๋ฉด ๋ณํ๋ ๋ชจ๋ธ์ ์๋ก๋ํ๊ณ ๊ฒฐ๊ณผ๋ฌผ์ [pull request](https://huggingface.co/CiaraRowles/TemporalNet/discussions/13)๋ฅผ ํ์คํธํ์ธ์!
|
||||
|
||||
```bash
|
||||
git push origin pr/13:refs/pr/13
|
||||
```
|
||||
|
||||
## **Keras .pb or .h5**
|
||||
|
||||
๐งช ์ด ๊ธฐ๋ฅ์ ์คํ์ ์ธ ๊ธฐ๋ฅ์๋๋ค. ํ์ฌ๋ก์๋ Stable Diffusion v1 ์ฒดํฌํฌ์ธํธ๋ง ๋ณํ KerasCV Space์์ ์ง์๋ฉ๋๋ค.
|
||||
|
||||
[KerasCV](https://keras.io/keras_cv/)๋ [Stable Diffusion](https://github.com/keras-team/keras-cv/blob/master/keras_cv/models/stable_diffusion)ย v1 ๋ฐ v2์ ๋ํ ํ์ต์ ์ง์ํฉ๋๋ค. ๊ทธ๋ฌ๋ ์ถ๋ก ๋ฐ ๋ฐฐํฌ๋ฅผ ์ํ Stable Diffusion ๋ชจ๋ธ ์คํ์ ์ ํ์ ์ผ๋ก ์ง์ํ๋ ๋ฐ๋ฉด, ๐ค Diffusers๋ ๋ค์ํ [noise schedulers](https://huggingface.co/docs/diffusers/using-diffusers/schedulers),ย [flash attention](https://huggingface.co/docs/diffusers/optimization/xformers), andย [other optimization techniques](https://huggingface.co/docs/diffusers/optimization/fp16) ๋ฑ ์ด๋ฌํ ๋ชฉ์ ์ ์ํ ๋ณด๋ค ์๋ฒฝํ ๊ธฐ๋ฅ์ ๊ฐ์ถ๊ณ ์์ต๋๋ค.
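
For example, once a KerasCV checkpoint has been converted, it can be combined with any of those features like a regular Diffusers pipeline. A sketch using the converted repository mentioned below:

```python
import torch

from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

pipeline = DiffusionPipeline.from_pretrained(
    "sayakpaul/textual-inversion-cat-kerascv_sd_diffusers_pipeline", torch_dtype=torch.float16
)
# Swap in a different noise scheduler, just like with any other Diffusers checkpoint.
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)
pipeline.to("cuda")
```
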
[Convert KerasCV](https://huggingface.co/spaces/sayakpaul/convert-kerascv-sd-diffusers)ย Space ๋ณํ์ `.pb`ย ๋๋ย `.h5`์ PyTorch๋ก ๋ณํํ ๋ค์, ์ถ๋ก ํ ์ ์๋๋ก [`StableDiffusionPipeline`] ์ผ๋ก ๊ฐ์ธ์ ์ค๋นํฉ๋๋ค. ๋ณํ๋ ์ฒดํฌํฌ์ธํธ๋ Hugging Face Hub์ ๋ฆฌํฌ์งํ ๋ฆฌ์ ์ ์ฅ๋ฉ๋๋ค.
|
||||
|
||||
์์ ๋ก, textual-inversion์ผ๋ก ํ์ต๋ `[sayakpaul/textual-inversion-kerasio](https://huggingface.co/sayakpaul/textual-inversion-kerasio/tree/main)`ย ์ฒดํฌํฌ์ธํธ๋ฅผ ๋ณํํด ๋ณด๊ฒ ์ต๋๋ค. ์ด๊ฒ์ ํน์ ํ ํฐ ย `<my-funny-cat>`์ ์ฌ์ฉํ์ฌ ๊ณ ์์ด๋ก ์ด๋ฏธ์ง๋ฅผ ๊ฐ์ธํํฉ๋๋ค.
|
||||
|
||||
KerasCV Space ๋ณํ์์๋ ๋ค์์ ์๋ ฅํ ์ ์์ต๋๋ค:
|
||||
|
||||
- Hugging Face ํ ํฐ.
|
||||
- UNet ๊ณผ ํ
์คํธ ์ธ์ฝ๋(text encoder) ๊ฐ์ค์น๋ฅผ ๋ค์ด๋ก๋ํ๋ ๊ฒฝ๋ก์
๋๋ค. ๋ชจ๋ธ์ ์ด๋ป๊ฒ ํ์ตํ ์ง ๋ฐฉ์์ ๋ฐ๋ผ, UNet๊ณผ ํ
์คํธ ์ธ์ฝ๋์ ๊ฒฝ๋ก๋ฅผ ๋ชจ๋ ์ ๊ณตํ ํ์๋ ์์ต๋๋ค. ์๋ฅผ ๋ค์ด, textual-inversion์๋ ํ
์คํธ ์ธ์ฝ๋์ ์๋ฒ ๋ฉ๋ง ํ์ํ๊ณ ํ
์คํธ-์ด๋ฏธ์ง(text-to-image) ๋ชจ๋ธ ๋ณํ์๋ UNet ๊ฐ์ค์น๋ง ํ์ํฉ๋๋ค.
|
||||
- Placeholder ํ ํฐ์ textual-inversion ๋ชจ๋ธ์๋ง ์ ์ฉ๋ฉ๋๋ค.
|
||||
- `output_repo_prefix`๋ ๋ณํ๋ ๋ชจ๋ธ์ด ์ ์ฅ๋๋ ๋ฆฌํฌ์งํ ๋ฆฌ์ ์ด๋ฆ์๋๋ค.
|
||||
|
||||
**Submit**ย (์ ์ถ) ๋ฒํผ์ ํด๋ฆญํ๋ฉด KerasCV ์ฒดํฌํฌ์ธํธ๊ฐ ์๋์ผ๋ก ๋ณํ๋ฉ๋๋ค! ์ฒดํฌํฌ์ธํธ๊ฐ ์ฑ๊ณต์ ์ผ๋ก ๋ณํ๋๋ฉด, ๋ณํ๋ ์ฒดํฌํฌ์ธํธ๊ฐ ํฌํจ๋ ์ ๋ฆฌํฌ์งํ ๋ฆฌ๋ก ์ฐ๊ฒฐ๋๋ ๋งํฌ๊ฐ ํ์๋ฉ๋๋ค. ์ ๋ฆฌํฌ์งํ ๋ฆฌ๋ก ์ฐ๊ฒฐ๋๋ ๋งํฌ๋ฅผ ๋ฐ๋ผ๊ฐ๋ฉด ๋ณํ๋ ๋ชจ๋ธ์ ์ฌ์ฉํด ๋ณผ ์ ์๋ ์ถ๋ก ์์ ฏ์ด ํฌํจ๋ ๋ชจ๋ธ ์นด๋๊ฐ ์์ฑ๋ KerasCV Space ๋ณํ์ ํ์ธํ ์ ์์ต๋๋ค.
|
||||
|
||||
์ฝ๋๋ฅผ ์ฌ์ฉํ์ฌ ์ถ๋ก ์ ์คํํ๋ ค๋ฉด ๋ชจ๋ธ ์นด๋์ ์ค๋ฅธ์ชฝ ์๋จ ๋ชจ์๋ฆฌ์ ์๋ **Use in Diffusers**ย ๋ฒํผ์ ํด๋ฆญํ์ฌ ์์ ์ฝ๋๋ฅผ ๋ณต์ฌํ์ฌ ๋ถ์ฌ๋ฃ์ต๋๋ค:
|
||||
|
||||
```py
|
||||
from diffusers import DiffusionPipeline
|
||||
|
||||
pipeline = DiffusionPipeline.from_pretrained("sayakpaul/textual-inversion-cat-kerascv_sd_diffusers_pipeline")
|
||||
```
|
||||
|
||||
๊ทธ๋ฌ๋ฉด ๋ค์๊ณผ ๊ฐ์ ์ด๋ฏธ์ง๋ฅผ ์์ฑํ ์ ์์ต๋๋ค:
|
||||
|
||||
```py
|
||||
from diffusers import DiffusionPipeline
|
||||
|
||||
pipeline = DiffusionPipeline.from_pretrained("sayakpaul/textual-inversion-cat-kerascv_sd_diffusers_pipeline")
|
||||
pipeline.to("cuda")
|
||||
|
||||
placeholder_token = "<my-funny-cat-token>"
|
||||
prompt = f"two {placeholder_token} getting married, photorealistic, high quality"
|
||||
image = pipeline(prompt, num_inference_steps=50).images[0]
|
||||
```
|
||||
|
||||
## **A1111 LoRA files**
|
||||
|
||||
[Automatic1111](https://github.com/AUTOMATIC1111/stable-diffusion-webui)ย (A1111)์ Stable Diffusion์ ์ํด ๋๋ฆฌ ์ฌ์ฉ๋๋ ์น UI๋ก,ย [Civitai](https://civitai.com/) ์ ๊ฐ์ ๋ชจ๋ธ ๊ณต์ ํ๋ซํผ์ ์ง์ํฉ๋๋ค. ํนํ LoRA ๊ธฐ๋ฒ์ผ๋ก ํ์ต๋ ๋ชจ๋ธ์ ํ์ต ์๋๊ฐ ๋น ๋ฅด๊ณ ์์ ํ ํ์ธํ๋๋ ๋ชจ๋ธ๋ณด๋ค ํ์ผ ํฌ๊ธฐ๊ฐ ํจ์ฌ ์๊ธฐ ๋๋ฌธ์ ์ธ๊ธฐ๊ฐ ๋์ต๋๋ค.
|
||||
|
||||
๐ค Diffusers๋ [`~loaders.LoraLoaderMixin.load_lora_weights`]๋ฅผ ์ฌ์ฉํ์ฌ A1111 LoRA ์ฒดํฌํฌ์ธํธ ๋ถ๋ฌ์ค๊ธฐ๋ฅผ ์ง์ํฉ๋๋ค:
|
||||
|
||||
```py
|
||||
from diffusers import DiffusionPipeline, UniPCMultistepScheduler
|
||||
import torch
|
||||
|
||||
pipeline = DiffusionPipeline.from_pretrained(
|
||||
"andite/anything-v4.0", torch_dtype=torch.float16, safety_checker=None
|
||||
).to("cuda")
|
||||
pipeline.scheduler = UniPCMultistepScheduler.from_config(pipeline.scheduler.config)
|
||||
```
|
||||
|
||||
Civitai์์ LoRA ์ฒดํฌํฌ์ธํธ๋ฅผ ๋ค์ด๋ก๋ํ์ธ์; ์ด ์์ ์์๋ ย [Howls Moving Castle,Interior/Scenery LoRA (Ghibli Stlye)](https://civitai.com/models/14605?modelVersionId=19998) ์ฒดํฌํฌ์ธํธ๋ฅผ ์ฌ์ฉํ์ง๋ง, ์ด๋ค LoRA ์ฒดํฌํฌ์ธํธ๋ ์์ ๋กญ๊ฒ ์ฌ์ฉํด ๋ณด์ธ์!
|
||||
|
||||
```bash
|
||||
!wget https://civitai.com/api/download/models/19998 -O howls_moving_castle.safetensors
|
||||
```
|
||||
|
||||
[`~loaders.LoraLoaderMixin.load_lora_weights`] ๋ฉ์๋๋ฅผ ์ฌ์ฉํ์ฌ ํ์ดํ๋ผ์ธ์ LoRA ์ฒดํฌํฌ์ธํธ๋ฅผ ๋ถ๋ฌ์ต๋๋ค:
|
||||
|
||||
```py
|
||||
pipeline.load_lora_weights(".", weight_name="howls_moving_castle.safetensors")
|
||||
```
|
||||
|
||||
์ด์ ํ์ดํ๋ผ์ธ์ ์ฌ์ฉํ์ฌ ์ด๋ฏธ์ง๋ฅผ ์์ฑํ ์ ์์ต๋๋ค:
|
||||
|
||||
```py
|
||||
prompt = "masterpiece, illustration, ultra-detailed, cityscape, san francisco, golden gate bridge, california, bay area, in the snow, beautiful detailed starry sky"
|
||||
negative_prompt = "lowres, cropped, worst quality, low quality, normal quality, artifacts, signature, watermark, username, blurry, more than one bridge, bad architecture"
|
||||
|
||||
images = pipeline(
|
||||
prompt=prompt,
|
||||
negative_prompt=negative_prompt,
|
||||
width=512,
|
||||
height=512,
|
||||
num_inference_steps=25,
|
||||
num_images_per_prompt=4,
|
||||
generator=torch.manual_seed(0),
|
||||
).images
|
||||
```
|
||||
|
||||
๋ง์ง๋ง์ผ๋ก, ๋์คํ๋ ์ด์ ์ด๋ฏธ์ง๋ฅผ ํ์ํ๋ ํฌํผ ํจ์๋ฅผ ๋ง๋ญ๋๋ค:
|
||||
|
||||
```py
|
||||
from PIL import Image
|
||||
|
||||
|
||||
def image_grid(imgs, rows=2, cols=2):
|
||||
w, h = imgs[0].size
|
||||
grid = Image.new("RGB", size=(cols * w, rows * h))
|
||||
|
||||
for i, img in enumerate(imgs):
|
||||
grid.paste(img, box=(i % cols * w, i // cols * h))
|
||||
return grid
|
||||
|
||||
|
||||
image_grid(images)
|
||||
```
|
||||
|
||||
<div class="flex justify-center">
|
||||
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/a1111-lora-sf.png" />
|
||||
</div>
docs/source/ko/using-diffusers/pipeline_overview.mdx (new file, 17 lines)
@@ -0,0 +1,17 @@
|
||||
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# Overview
|
||||
|
||||
ํ์ดํ๋ผ์ธ์ ๋
๋ฆฝ์ ์ผ๋ก ํ๋ จ๋ ๋ชจ๋ธ๊ณผ ์ค์ผ์ค๋ฌ๋ฅผ ํจ๊ป ๋ชจ์์ ์ถ๋ก ์ ์ํด diffusion ์์คํ
์ ๋น ๋ฅด๊ณ ์ฝ๊ฒ ์ฌ์ฉํ ์ ์๋ ๋ฐฉ๋ฒ์ ์ ๊ณตํ๋ end-to-end ํด๋์ค์
๋๋ค. ๋ชจ๋ธ๊ณผ ์ค์ผ์ค๋ฌ์ ํน์ ์กฐํฉ์ ํน์ํ ๊ธฐ๋ฅ๊ณผ ํจ๊ป [`StableDiffusionPipeline`] ๋๋ [`StableDiffusionControlNetPipeline`]๊ณผ ๊ฐ์ ํน์ ํ์ดํ๋ผ์ธ ์ ํ์ ์ ์ํฉ๋๋ค. ๋ชจ๋ ํ์ดํ๋ผ์ธ ์ ํ์ ๊ธฐ๋ณธ [`DiffusionPipeline`] ํด๋์ค์์ ์์๋ฉ๋๋ค. ์ด๋ ์ฒดํฌํฌ์ธํธ๋ฅผ ์ ๋ฌํ๋ฉด, ํ์ดํ๋ผ์ธ ์ ํ์ ์๋์ผ๋ก ๊ฐ์งํ๊ณ ํ์ํ ๊ตฌ์ฑ ์์๋ค์ ๋ถ๋ฌ์ต๋๋ค.
|
||||
|
||||
์ด ์น์์์๋ unconditional ์ด๋ฏธ์ง ์์ฑ, text-to-image ์์ฑ์ ๋ค์ํ ํํฌ๋๊ณผ ๋ณํ๋ฅผ ํ์ดํ๋ผ์ธ์์ ์ง์ํ๋ ์์๋ค์ ์๊ฐํฉ๋๋ค. ํ๋กฌํํธ์ ์๋ ํน์ ๋จ์ด๊ฐ ์ถ๋ ฅ์ ์ํฅ์ ๋ฏธ์น๋ ๊ฒ์ ์กฐ์ ํ๊ธฐ ์ํด ์ฌํ์ฑ์ ์ํ ์๋ ์ค์ ๊ณผ ํ๋กฌํํธ์ ๊ฐ์ค์น๋ฅผ ๋ถ์ฌํ๋ ๊ฒ์ผ๋ก ์์ฑ ํ๋ก์ธ์ค๋ฅผ ๋ ์ ์ ์ดํ๋ ๋ฐฉ๋ฒ์ ๋ํด ๋ฐฐ์ธ ์ ์์ต๋๋ค. ๋ง์ง๋ง์ผ๋ก ์์ฑ์์๋ถํฐ ์ด๋ฏธ์ง ์์ฑ๊ณผ ๊ฐ์ ์ปค์คํ ์์์ ์ํ ์ปค๋ฎค๋ํฐ ํ์ดํ๋ผ์ธ์ ๋ง๋๋ ๋ฐฉ๋ฒ์ ์ ์ ์์ต๋๋ค.
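
As a minimal illustration of that auto-detection (not part of the original overview), passing a Stable Diffusion checkpoint to [`DiffusionPipeline`] is enough to get the task-specific class back:

```python
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
print(type(pipeline).__name__)  # StableDiffusionPipeline
```
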
docs/source/ko/using-diffusers/reusing_seeds.mdx (new file, 63 lines)
@@ -0,0 +1,63 @@
|
||||
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# Deterministic(๊ฒฐ์ ์ ) ์์ฑ์ ํตํ ์ด๋ฏธ์ง ํ์ง ๊ฐ์
|
||||
|
||||
์์ฑ๋ ์ด๋ฏธ์ง์ ํ์ง์ ๊ฐ์ ํ๋ ์ผ๋ฐ์ ์ธ ๋ฐฉ๋ฒ์ *๊ฒฐ์ ์ batch(๋ฐฐ์น) ์์ฑ*์ ์ฌ์ฉํ๋ ๊ฒ์๋๋ค. ์ด ๋ฐฉ๋ฒ์ ์ด๋ฏธ์ง batch(๋ฐฐ์น)๋ฅผ ์์ฑํ๊ณ ๋ ๋ฒ์งธ ์ถ๋ก ๋ผ์ด๋์์ ๋ ์์ธํ ํ๋กฌํํธ์ ํจ๊ป ๊ฐ์ ํ ์ด๋ฏธ์ง ํ๋๋ฅผ ์ ํํ๋ ๊ฒ์๋๋ค. ํต์ฌ์ ์ผ๊ด ์ด๋ฏธ์ง ์์ฑ์ ์ํด ํ์ดํ๋ผ์ธ์ [`torch.Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html#generator) ๋ชฉ๋ก์ ์ ๋ฌํ๊ณ , ๊ฐ `Generator`๋ฅผ ์๋์ ์ฐ๊ฒฐํ์ฌ ์ด๋ฏธ์ง์ ์ฌ์ฌ์ฉํ ์ ์๋๋ก ํ๋ ๊ฒ์๋๋ค.
|
||||
|
||||
์๋ฅผ ๋ค์ด [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5)๋ฅผ ์ฌ์ฉํ์ฌ ๋ค์ ํ๋กฌํํธ์ ์ฌ๋ฌ ๋ฒ์ ์ ์์ฑํด ๋ด์๋ค.
|
||||
|
||||
```py
|
||||
prompt = "Labrador in the style of Vermeer"
|
||||
```
|
||||
|
||||
(๊ฐ๋ฅํ๋ค๋ฉด) ํ์ดํ๋ผ์ธ์ [`DiffusionPipeline.from_pretrained`]๋ก ์ธ์คํด์คํํ์ฌ GPU์ ๋ฐฐ์นํฉ๋๋ค.
|
||||
|
||||
```python
|
||||
>>> import torch
>>> from diffusers import DiffusionPipeline

>>> pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
|
||||
>>> pipe = pipe.to("cuda")
|
||||
```
|
||||
|
||||
์ด์ ๋ค ๊ฐ์ ์๋ก ๋ค๋ฅธ `Generator`๋ฅผ ์ ์ํ๊ณ ๊ฐ `Generator`์ ์๋(`0` ~ `3`)๋ฅผ ํ ๋นํ์ฌ ๋์ค์ ํน์ ์ด๋ฏธ์ง์ ๋ํด `Generator`๋ฅผ ์ฌ์ฌ์ฉํ ์ ์๋๋ก ํฉ๋๋ค.
|
||||
|
||||
```python
|
||||
>>> import torch
|
||||
|
||||
>>> generator = [torch.Generator(device="cuda").manual_seed(i) for i in range(4)]
|
||||
```
|
||||
|
||||
์ด๋ฏธ์ง๋ฅผ ์์ฑํ๊ณ ์ดํด๋ด๋๋ค.
|
||||
|
||||
```python
|
||||
>>> images = pipe(prompt, generator=generator, num_images_per_prompt=4).images
|
||||
>>> images
|
||||
```
|
||||
|
||||

|
||||
|
||||
์ด ์์ ์์๋ ์ฒซ ๋ฒ์งธ ์ด๋ฏธ์ง๋ฅผ ๊ฐ์ ํ์ง๋ง ์ค์ ๋ก๋ ์ํ๋ ๋ชจ๋ ์ด๋ฏธ์ง๋ฅผ ์ฌ์ฉํ ์ ์์ต๋๋ค(์ฌ์ง์ด ๋ ๊ฐ์ ๋์ด ์๋ ์ด๋ฏธ์ง๋!). ์ฒซ ๋ฒ์งธ ์ด๋ฏธ์ง์์๋ ์๋๊ฐ '0'์ธ '์์ฑ๊ธฐ'๋ฅผ ์ฌ์ฉํ๊ธฐ ๋๋ฌธ์ ๋ ๋ฒ์งธ ์ถ๋ก ๋ผ์ด๋์์๋ ์ด '์์ฑ๊ธฐ'๋ฅผ ์ฌ์ฌ์ฉํ ๊ฒ์
๋๋ค. ์ด๋ฏธ์ง์ ํ์ง์ ๊ฐ์ ํ๋ ค๋ฉด ํ๋กฌํํธ์ ๋ช ๊ฐ์ง ํ
์คํธ๋ฅผ ์ถ๊ฐํฉ๋๋ค:
|
||||
|
||||
```python
|
||||
prompt = [prompt + t for t in [", highly realistic", ", artsy", ", trending", ", colorful"]]
|
||||
generator = [torch.Generator(device="cuda").manual_seed(0) for i in range(4)]
|
||||
```
|
||||
|
||||
์๋๊ฐ `0`์ธ ์ ๋๋ ์ดํฐ 4๊ฐ๋ฅผ ์์ฑํ๊ณ , ์ด์ ๋ผ์ด๋์ ์ฒซ ๋ฒ์งธ ์ด๋ฏธ์ง์ฒ๋ผ ๋ณด์ด๋ ๋ค๋ฅธ ์ด๋ฏธ์ง batch(๋ฐฐ์น)๋ฅผ ์์ฑํฉ๋๋ค!
|
||||
|
||||
```python
|
||||
>>> images = pipe(prompt, generator=generator).images
|
||||
>>> images
|
||||
```
|
||||
|
||||

docs/source/ko/using-diffusers/schedulers.mdx (new file, 329 lines)
@@ -0,0 +1,329 @@
|
||||
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# ์ค์ผ์ค๋ฌ
|
||||
|
||||
diffusion ํ์ดํ๋ผ์ธ์ diffusion ๋ชจ๋ธ, ์ค์ผ์ค๋ฌ ๋ฑ์ ์ปดํฌ๋ํธ๋ค๋ก ๊ตฌ์ฑ๋ฉ๋๋ค. ๊ทธ๋ฆฌ๊ณ ํ์ดํ๋ผ์ธ ์์ ์ผ๋ถ ์ปดํฌ๋ํธ๋ฅผ ๋ค๋ฅธ ์ปดํฌ๋ํธ๋ก ๊ต์ฒดํ๋ ์์ ์ปค์คํฐ๋ง์ด์ง ์ญ์ ๊ฐ๋ฅํฉ๋๋ค. ์ด์ ๊ฐ์ ์ปดํฌ๋ํธ ์ปค์คํฐ๋ง์ด์ง์ ๊ฐ์ฅ ๋ํ์ ์ธ ์์๊ฐ ๋ฐ๋ก [์ค์ผ์ค๋ฌ](../api/schedulers/overview.mdx)๋ฅผ ๊ต์ฒดํ๋ ๊ฒ์
๋๋ค.
|
||||
|
||||
|
||||
|
||||
์ค์ผ์ฅด๋ฌ๋ ๋ค์๊ณผ ๊ฐ์ด diffusion ์์คํ์ ์ ๋ฐ์ ์ธ ๋๋ธ์ด์ง ํ๋ก์ธ์ค๋ฅผ ์ ์ํฉ๋๋ค.
|
||||
|
||||
- ๋๋ธ์ด์ง ์คํ์ ์ผ๋ง๋ ๊ฐ์ ธ๊ฐ์ผ ํ ๊น?
|
||||
- ํ๋ฅ ์ ์ผ๋ก(stochastic) ํน์ ํ์ ์ ์ผ๋ก(deterministic)?
|
||||
- ๋๋ธ์ด์ง ๋ ์ํ์ ์ฐพ์๋ด๊ธฐ ์ํด ์ด๋ค ์๊ณ ๋ฆฌ์ฆ์ ์ฌ์ฉํด์ผ ํ ๊น?
|
||||
|
||||
์ด๋ฌํ ํ๋ก์ธ์ค๋ ๋ค์ ๋ํดํ๊ณ , ๋๋
ธ์ด์ง ์๋์ ๋๋
ธ์ด์ง ํ๋ฆฌํฐ ์ฌ์ด์ ํธ๋ ์ด๋ ์คํ๋ฅผ ์ ์ํด์ผ ํ๋ ๋ฌธ์ ๊ฐ ๋ ์ ์์ต๋๋ค. ์ฃผ์ด์ง ํ์ดํ๋ผ์ธ์ ์ด๋ค ์ค์ผ์ค๋ฌ๊ฐ ๊ฐ์ฅ ์ ํฉํ์ง๋ฅผ ์ ๋์ ์ผ๋ก ํ๋จํ๋ ๊ฒ์ ๋งค์ฐ ์ด๋ ค์ด ์ผ์
๋๋ค. ์ด๋ก ์ธํด ์ผ๋จ ํด๋น ์ค์ผ์ค๋ฌ๋ฅผ ์ง์ ์ฌ์ฉํ์ฌ, ์์ฑ๋๋ ์ด๋ฏธ์ง๋ฅผ ์ง์ ๋์ผ๋ก ๋ณด๋ฉฐ, ์ ์ฑ์ ์ผ๋ก ์ฑ๋ฅ์ ํ๋จํด๋ณด๋ ๊ฒ์ด ์ถ์ฒ๋๊ณค ํฉ๋๋ค.
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
## ํ์ดํ๋ผ์ธ ๋ถ๋ฌ์ค๊ธฐ
|
||||
|
||||
๋จผ์ ์คํ
์ด๋ธ diffusion ํ์ดํ๋ผ์ธ์ ๋ถ๋ฌ์ค๋๋ก ํด๋ณด๊ฒ ์ต๋๋ค. ๋ฌผ๋ก ์คํ
์ด๋ธ diffusion์ ์ฌ์ฉํ๊ธฐ ์ํด์๋, ํ๊น
ํ์ด์ค ํ๋ธ์ ๋ฑ๋ก๋ ์ฌ์ฉ์์ฌ์ผ ํ๋ฉฐ, ๊ด๋ จ [๋ผ์ด์ผ์ค](https://huggingface.co/runwayml/stable-diffusion-v1-5)์ ๋์ํด์ผ ํ๋ค๋ ์ ์ ์์ง ๋ง์์ฃผ์ธ์.
|
||||
|
||||
*์ญ์ ์ฃผ: ๋ค๋ง, ํ์ฌ ์ ๊ท๋ก ์์ฑํ ํ๊น
ํ์ด์ค ๊ณ์ ์ ๋ํด์๋ ๋ผ์ด์ผ์ค ๋์๋ฅผ ์๊ตฌํ์ง ์๋ ๊ฒ์ผ๋ก ๋ณด์
๋๋ค!*
|
||||
|
||||
```python
|
||||
from huggingface_hub import login
|
||||
from diffusers import DiffusionPipeline
|
||||
import torch
|
||||
|
||||
# first we need to login with our access token
|
||||
login()
|
||||
|
||||
# Now we can download the pipeline
|
||||
pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
|
||||
```
|
||||
|
||||
๋ค์์ผ๋ก, GPU๋ก ์ด๋ํฉ๋๋ค.
|
||||
|
||||
```python
|
||||
pipeline.to("cuda")
|
||||
```
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
## ์ค์ผ์ค๋ฌ ์ก์ธ์ค
|
||||
|
||||
์ค์ผ์ค๋ฌ๋ ์ธ์ ๋ ํ์ดํ๋ผ์ธ์ ์ปดํฌ๋ํธ๋ก์ ์กด์ฌํ๋ฉฐ, ์ผ๋ฐ์ ์ผ๋ก ํ์ดํ๋ผ์ธ ์ธ์คํด์ค ๋ด์ `scheduler`๋ผ๋ ์ด๋ฆ์ ์์ฑ(property)์ผ๋ก ์ ์๋์ด ์์ต๋๋ค.
|
||||
|
||||
```python
|
||||
pipeline.scheduler
|
||||
```
|
||||
|
||||
**Output**:
|
||||
|
||||
```
|
||||
PNDMScheduler {
|
||||
"_class_name": "PNDMScheduler",
|
||||
"_diffusers_version": "0.8.0.dev0",
|
||||
"beta_end": 0.012,
|
||||
"beta_schedule": "scaled_linear",
|
||||
"beta_start": 0.00085,
|
||||
"clip_sample": false,
|
||||
"num_train_timesteps": 1000,
|
||||
"set_alpha_to_one": false,
|
||||
"skip_prk_steps": true,
|
||||
"steps_offset": 1,
|
||||
"trained_betas": null
|
||||
}
|
||||
```
|
||||
|
||||
์ถ๋ ฅ ๊ฒฐ๊ณผ๋ฅผ ํตํด, ์ฐ๋ฆฌ๋ ํด๋น ์ค์ผ์ค๋ฌ๊ฐ [`PNDMScheduler`]์ ์ธ์คํด์ค๋ผ๋ ๊ฒ์ ์ ์ ์์ต๋๋ค. ์ด์ [`PNDMScheduler`]์ ๋ค๋ฅธ ์ค์ผ์ค๋ฌ๋ค์ ์ฑ๋ฅ์ ๋น๊ตํด๋ณด๋๋ก ํ๊ฒ ์ต๋๋ค. ๋จผ์ ํ
์คํธ์ ์ฌ์ฉํ ํ๋กฌํํธ๋ฅผ ๋ค์๊ณผ ๊ฐ์ด ์ ์ํด๋ณด๋๋ก ํ๊ฒ ์ต๋๋ค.
|
||||
|
||||
```python
|
||||
prompt = "A photograph of an astronaut riding a horse on Mars, high resolution, high definition."
|
||||
```
|
||||
|
||||
๋ค์์ผ๋ก ์ ์ฌํ ์ด๋ฏธ์ง ์์ฑ์ ๋ณด์ฅํ๊ธฐ ์ํด์, ๋ค์๊ณผ ๊ฐ์ด ๋๋ค์๋๋ฅผ ๊ณ ์ ํด์ฃผ๋๋ก ํ๊ฒ ์ต๋๋ค.
|
||||
|
||||
```python
|
||||
generator = torch.Generator(device="cuda").manual_seed(8)
|
||||
image = pipeline(prompt, generator=generator).images[0]
|
||||
image
|
||||
```
|
||||
|
||||
<p align="center">
|
||||
<br>
|
||||
<img src="https://huggingface.co/datasets/patrickvonplaten/images/resolve/main/diffusers_docs/astronaut_pndm.png" width="400"/>
|
||||
<br>
|
||||
</p>
|
||||
|
||||
|
||||
|
||||
|
||||
## ์ค์ผ์ค๋ฌ ๊ต์ฒดํ๊ธฐ
|
||||
|
||||
๋ค์์ผ๋ก ํ์ดํ๋ผ์ธ์ ์ค์ผ์ค๋ฌ๋ฅผ ๋ค๋ฅธ ์ค์ผ์ค๋ฌ๋ก ๊ต์ฒดํ๋ ๋ฐฉ๋ฒ์ ๋ํด ์์๋ณด๊ฒ ์ต๋๋ค. ๋ชจ๋ ์ค์ผ์ค๋ฌ๋ [`SchedulerMixin.compatibles`]๋ผ๋ ์์ฑ(property)์ ๊ฐ๊ณ ์์ต๋๋ค. ํด๋น ์์ฑ์ **ํธํ ๊ฐ๋ฅํ** ์ค์ผ์ค๋ฌ๋ค์ ๋ํ ์ ๋ณด๋ฅผ ๋ด๊ณ ์์ต๋๋ค.
|
||||
|
||||
```python
|
||||
pipeline.scheduler.compatibles
|
||||
```
|
||||
|
||||
**Output**:
|
||||
|
||||
```
|
||||
[diffusers.schedulers.scheduling_lms_discrete.LMSDiscreteScheduler,
|
||||
diffusers.schedulers.scheduling_ddim.DDIMScheduler,
|
||||
diffusers.schedulers.scheduling_dpmsolver_multistep.DPMSolverMultistepScheduler,
|
||||
diffusers.schedulers.scheduling_euler_discrete.EulerDiscreteScheduler,
|
||||
diffusers.schedulers.scheduling_pndm.PNDMScheduler,
|
||||
diffusers.schedulers.scheduling_ddpm.DDPMScheduler,
|
||||
diffusers.schedulers.scheduling_euler_ancestral_discrete.EulerAncestralDiscreteScheduler]
|
||||
```
|
||||
|
||||
ํธํ๋๋ ์ค์ผ์ค๋ฌ๋ค์ ์ดํด๋ณด๋ฉด ์๋์ ๊ฐ์ต๋๋ค.
|
||||
|
||||
- [`LMSDiscreteScheduler`],
|
||||
- [`DDIMScheduler`],
|
||||
- [`DPMSolverMultistepScheduler`],
|
||||
- [`EulerDiscreteScheduler`],
|
||||
- [`PNDMScheduler`],
|
||||
- [`DDPMScheduler`],
|
||||
- [`EulerAncestralDiscreteScheduler`].
|
||||
|
||||
์์ ์ ์ํ๋ ํ๋กฌํํธ๋ฅผ ์ฌ์ฉํด์ ๊ฐ๊ฐ์ ์ค์ผ์ค๋ฌ๋ค์ ๋น๊ตํด๋ณด๋๋ก ํ๊ฒ ์ต๋๋ค.
|
||||
|
||||
๋จผ์ ํ์ดํ๋ผ์ธ ์์ ์ค์ผ์ค๋ฌ๋ฅผ ๋ฐ๊พธ๊ธฐ ์ํด [`ConfigMixin.config`] ์์ฑ๊ณผ [`ConfigMixin.from_config`] ๋ฉ์๋๋ฅผ ํ์ฉํด๋ณด๋ ค๊ณ ํฉ๋๋ค.
|
||||
|
||||
|
||||
|
||||
```python
|
||||
pipeline.scheduler.config
|
||||
```
|
||||
|
||||
**Output**:
|
||||
|
||||
```
|
||||
FrozenDict([('num_train_timesteps', 1000),
|
||||
('beta_start', 0.00085),
|
||||
('beta_end', 0.012),
|
||||
('beta_schedule', 'scaled_linear'),
|
||||
('trained_betas', None),
|
||||
('skip_prk_steps', True),
|
||||
('set_alpha_to_one', False),
|
||||
('steps_offset', 1),
|
||||
('_class_name', 'PNDMScheduler'),
|
||||
('_diffusers_version', '0.8.0.dev0'),
|
||||
('clip_sample', False)])
|
||||
```
|
||||
|
||||
๊ธฐ์กด ์ค์ผ์ค๋ฌ์ config๋ฅผ ํธํ ๊ฐ๋ฅํ ๋ค๋ฅธ ์ค์ผ์ค๋ฌ์ ์ด์ํ๋ ๊ฒ ์ญ์ ๊ฐ๋ฅํฉ๋๋ค.
|
||||
|
||||
๋ค์ ์์๋ ๊ธฐ์กด ์ค์ผ์ค๋ฌ(`pipeline.scheduler`)๋ฅผ ๋ค๋ฅธ ์ข
๋ฅ์ ์ค์ผ์ค๋ฌ(`DDIMScheduler`)๋ก ๋ฐ๊พธ๋ ์ฝ๋์
๋๋ค. ๊ธฐ์กด ์ค์ผ์ค๋ฌ๊ฐ ๊ฐ๊ณ ์๋ config๋ฅผ `.from_config` ๋ฉ์๋์ ์ธ์๋ก ์ ๋ฌํ๋ ๊ฒ์ ํ์ธํ ์ ์์ต๋๋ค.
|
||||
|
||||
```python
|
||||
from diffusers import DDIMScheduler
|
||||
|
||||
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)
|
||||
```
|
||||
|
||||
|
||||
|
||||
์ด์  ํ์ดํ๋ผ์ธ์ ์คํํด์ ๋ ์ค์ผ์ค๋ฌ ์ฌ์ด์ ์์ฑ๋ ์ด๋ฏธ์ง์ ํ๋ฆฌํฐ๋ฅผ ๋น๊ตํด๋ด์๋ค.
|
||||
|
||||
```python
|
||||
generator = torch.Generator(device="cuda").manual_seed(8)
|
||||
image = pipeline(prompt, generator=generator).images[0]
|
||||
image
|
||||
```
|
||||
|
||||
<p align="center">
|
||||
<br>
|
||||
<img src="https://huggingface.co/datasets/patrickvonplaten/images/resolve/main/diffusers_docs/astronaut_ddim.png" width="400"/>
|
||||
<br>
|
||||
</p>
|
||||
|
||||
|
||||
|
||||
|
||||
## ์ค์ผ์ค๋ฌ๋ค ๋น๊ตํด๋ณด๊ธฐ
|
||||
|
||||
์ง๊ธ๊น์ง๋ [`PNDMScheduler`]์ [`DDIMScheduler`] ์ค์ผ์ค๋ฌ๋ฅผ ์คํํด๋ณด์์ต๋๋ค. ์์ง ๋น๊ตํด๋ณผ ์ค์ผ์ค๋ฌ๋ค์ด ๋ ๋ง์ด ๋จ์์์ผ๋ ๊ณ์ ๋น๊ตํด๋ณด๋๋ก ํ๊ฒ ์ต๋๋ค.
|
||||
|
||||
|
||||
|
||||
[`LMSDiscreteScheduler`]์ ์ผ๋ฐ์ ์ผ๋ก ๋ ์ข์ ๊ฒฐ๊ณผ๋ฅผ ๋ณด์ฌ์ค๋๋ค.
|
||||
|
||||
```python
|
||||
from diffusers import LMSDiscreteScheduler
|
||||
|
||||
pipeline.scheduler = LMSDiscreteScheduler.from_config(pipeline.scheduler.config)
|
||||
|
||||
generator = torch.Generator(device="cuda").manual_seed(8)
|
||||
image = pipeline(prompt, generator=generator).images[0]
|
||||
image
|
||||
```
|
||||
|
||||
<p align="center">
|
||||
<br>
|
||||
<img src="https://huggingface.co/datasets/patrickvonplaten/images/resolve/main/diffusers_docs/astronaut_lms.png" width="400"/>
|
||||
<br>
|
||||
</p>
|
||||
|
||||
|
||||
[`EulerDiscreteScheduler`]์ [`EulerAncestralDiscreteScheduler`] ๊ณ ์ 30๋ฒ์ inference step๋ง์ผ๋ก๋ ๋์ ํ๋ฆฌํฐ์ ์ด๋ฏธ์ง๋ฅผ ์์ฑํ๋ ๊ฒ์ ์ ์ ์์ต๋๋ค.
|
||||
|
||||
```python
|
||||
from diffusers import EulerDiscreteScheduler
|
||||
|
||||
pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config)
|
||||
|
||||
generator = torch.Generator(device="cuda").manual_seed(8)
|
||||
image = pipeline(prompt, generator=generator, num_inference_steps=30).images[0]
|
||||
image
|
||||
```
|
||||
|
||||
<p align="center">
|
||||
<br>
|
||||
<img src="https://huggingface.co/datasets/patrickvonplaten/images/resolve/main/diffusers_docs/astronaut_euler_discrete.png" width="400"/>
|
||||
<br>
|
||||
</p>
|
||||
|
||||
|
||||
```python
|
||||
from diffusers import EulerAncestralDiscreteScheduler
|
||||
|
||||
pipeline.scheduler = EulerAncestralDiscreteScheduler.from_config(pipeline.scheduler.config)
|
||||
|
||||
generator = torch.Generator(device="cuda").manual_seed(8)
|
||||
image = pipeline(prompt, generator=generator, num_inference_steps=30).images[0]
|
||||
image
|
||||
```
|
||||
|
||||
<p align="center">
|
||||
<br>
|
||||
<img src="https://huggingface.co/datasets/patrickvonplaten/images/resolve/main/diffusers_docs/astronaut_euler_ancestral.png" width="400"/>
|
||||
<br>
|
||||
</p>
|
||||
|
||||
|
||||
์ง๊ธ ์ด ๋ฌธ์๋ฅผ ์์ฑํ๋ ํ์์ ๊ธฐ์ค์์ , [`DPMSolverMultistepScheduler`]๊ฐ ์๊ฐ ๋๋น ๊ฐ์ฅ ์ข์ ํ์ง์ ์ด๋ฏธ์ง๋ฅผ ์์ฑํ๋ ๊ฒ ๊ฐ์ต๋๋ค. 20๋ฒ ์ ๋์ ์คํ๋ง์ผ๋ก๋ ์คํ๋ ์ ์์ต๋๋ค.
|
||||
|
||||
|
||||
|
||||
```python
|
||||
from diffusers import DPMSolverMultistepScheduler
|
||||
|
||||
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)
|
||||
|
||||
generator = torch.Generator(device="cuda").manual_seed(8)
|
||||
image = pipeline(prompt, generator=generator, num_inference_steps=20).images[0]
|
||||
image
|
||||
```
|
||||
|
||||
<p align="center">
|
||||
<br>
|
||||
<img src="https://huggingface.co/datasets/patrickvonplaten/images/resolve/main/diffusers_docs/astronaut_dpm.png" width="400"/>
|
||||
<br>
|
||||
</p>
|
||||
|
||||
|
||||
๋ณด์๋ค์ํผ ์์ฑ๋ ์ด๋ฏธ์ง๋ค์ ๋งค์ฐ ๋น์ทํ๊ณ , ๋น์ทํ ํ๋ฆฌํฐ๋ฅผ ๋ณด์ด๋ ๊ฒ ๊ฐ์ต๋๋ค. ์ค์ ๋ก ์ด๋ค ์ค์ผ์ค๋ฌ๋ฅผ ์ ํํ ๊ฒ์ธ๊ฐ๋ ์ข์ข ํน์ ์ด์ฉ ์ฌ๋ก์ ๊ธฐ๋ฐํด์ ๊ฒฐ์ ๋๊ณค ํฉ๋๋ค. ๊ฒฐ๊ตญ ์ฌ๋ฌ ์ข๋ฅ์ ์ค์ผ์ค๋ฌ๋ฅผ ์ง์  ์คํ์์ผ๋ณด๊ณ ๋์ผ๋ก ์ง์  ๋น๊ตํด์ ํ๋จํ๋ ๊ฒ ์ข์ ์ ํ์ผ ๊ฒ ๊ฐ์ต๋๋ค.
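
To make that side-by-side comparison a little easier, here is one possible helper (a sketch, not part of the original guide) that re-runs the same prompt and seed with several of the compatible schedulers used above:

```python
from diffusers import (
    DDIMScheduler,
    DPMSolverMultistepScheduler,
    EulerDiscreteScheduler,
    PNDMScheduler,
)

# Re-generate the same prompt/seed with different schedulers and save each result.
for scheduler_cls in [PNDMScheduler, DDIMScheduler, EulerDiscreteScheduler, DPMSolverMultistepScheduler]:
    pipeline.scheduler = scheduler_cls.from_config(pipeline.scheduler.config)
    generator = torch.Generator(device="cuda").manual_seed(8)
    image = pipeline(prompt, generator=generator, num_inference_steps=25).images[0]
    image.save(f"astronaut_{scheduler_cls.__name__}.png")
```
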
## Flax์์ ์ค์ผ์ค๋ฌ ๊ต์ฒดํ๊ธฐ
|
||||
|
||||
JAX/Flax ์ฌ์ฉ์์ธ ๊ฒฝ์ฐ ๊ธฐ๋ณธ ํ์ดํ๋ผ์ธ ์ค์ผ์ค๋ฌ๋ฅผ ๋ณ๊ฒฝํ ์๋ ์์ต๋๋ค. ๋ค์์ Flax Stable Diffusion ํ์ดํ๋ผ์ธ๊ณผ ์ด๊ณ ์ [DDPM-Solver++ ์ค์ผ์ค๋ฌ๋ฅผ](../api/schedulers/multistep_dpm_solver) ์ฌ์ฉํ์ฌ ์ถ๋ก ์ ์คํํ๋ ๋ฐฉ๋ฒ์ ๋ํ ์์์
๋๋ค .
|
||||
|
||||
```Python
|
||||
import jax
|
||||
import numpy as np
|
||||
from flax.jax_utils import replicate
|
||||
from flax.training.common_utils import shard
|
||||
|
||||
from diffusers import FlaxStableDiffusionPipeline, FlaxDPMSolverMultistepScheduler
|
||||
|
||||
model_id = "runwayml/stable-diffusion-v1-5"
|
||||
scheduler, scheduler_state = FlaxDPMSolverMultistepScheduler.from_pretrained(
|
||||
model_id,
|
||||
subfolder="scheduler"
|
||||
)
|
||||
pipeline, params = FlaxStableDiffusionPipeline.from_pretrained(
|
||||
model_id,
|
||||
scheduler=scheduler,
|
||||
revision="bf16",
|
||||
dtype=jax.numpy.bfloat16,
|
||||
)
|
||||
params["scheduler"] = scheduler_state
|
||||
|
||||
# Generate 1 image per parallel device (8 on TPUv2-8 or TPUv3-8)
|
||||
prompt = "a photo of an astronaut riding a horse on mars"
|
||||
num_samples = jax.device_count()
|
||||
prompt_ids = pipeline.prepare_inputs([prompt] * num_samples)
|
||||
|
||||
prng_seed = jax.random.PRNGKey(0)
|
||||
num_inference_steps = 25
|
||||
|
||||
# shard inputs and rng
|
||||
params = replicate(params)
|
||||
prng_seed = jax.random.split(prng_seed, jax.device_count())
|
||||
prompt_ids = shard(prompt_ids)
|
||||
|
||||
images = pipeline(prompt_ids, params, prng_seed, num_inference_steps, jit=True).images
|
||||
images = pipeline.numpy_to_pil(np.asarray(images.reshape((num_samples,) + images.shape[-3:])))
|
||||
```
|
||||
|
||||
<Tip warning={true}>
|
||||
|
||||
๋ค์ Flax ์ค์ผ์ค๋ฌ๋ *์์ง* Flax Stable Diffusion ํ์ดํ๋ผ์ธ๊ณผ ํธํ๋์ง ์์ต๋๋ค.
|
||||
|
||||
- `FlaxLMSDiscreteScheduler`
|
||||
- `FlaxDDPMScheduler`
|
||||
|
||||
</Tip>
|
||||
|
||||
@@ -0,0 +1,54 @@
|
||||
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# Unconditional ์ด๋ฏธ์ง ์์ฑ
|
||||
|
||||
[[Colab์์ ์ด๊ธฐ]]
|
||||
|
||||
Unconditional ์ด๋ฏธ์ง ์์ฑ์ ๋น๊ต์ ๊ฐ๋จํ ์์์๋๋ค. ๋ชจ๋ธ์ด ํ์คํธ๋ ์ด๋ฏธ์ง์ ๊ฐ์ ์ถ๊ฐ ์กฐ๊ฑด ์์ด ์ด๋ฏธ ํ์ต๋ ํ์ต ๋ฐ์ดํฐ์ ์ ์ฌํ ์ด๋ฏธ์ง๋ง ์์ฑํฉ๋๋ค.
|
||||
|
||||
[`DiffusionPipeline`]์ ์ถ๋ก ์ ์ํด ๋ฏธ๋ฆฌ ํ์ต๋ diffusion ์์คํ์ ์ฌ์ฉํ๋ ๊ฐ์ฅ ์ฌ์ด ๋ฐฉ๋ฒ์๋๋ค.
|
||||
|
||||
๋จผ์ ['DiffusionPipeline']์ ์ธ์คํด์ค๋ฅผ ์์ฑํ๊ณ ๋ค์ด๋ก๋ํ ํ์ดํ๋ผ์ธ์ [์ฒดํฌํฌ์ธํธ](https://huggingface.co/models?library=diffusers&sort=downloads)๋ฅผ ์ง์ ํฉ๋๋ค. ํ๋ธ์ ๐งจ diffusion ์ฒดํฌํฌ์ธํธ ์ค ํ๋๋ฅผ ์ฌ์ฉํ ์ ์์ต๋๋ค(์ฌ์ฉํ ์ฒดํฌํฌ์ธํธ๋ ๋๋น ์ด๋ฏธ์ง๋ฅผ ์์ฑํฉ๋๋ค).
|
||||
|
||||
<Tip>
|
||||
|
||||
๐ก ๋๋ง์ unconditional ์ด๋ฏธ์ง ์์ฑ ๋ชจ๋ธ์ ํ์ต์ํค๊ณ ์ถ์ผ์ ๊ฐ์? ํ์ต ๊ฐ์ด๋๋ฅผ ์ดํด๋ณด๊ณ ๋๋ง์ ์ด๋ฏธ์ง๋ฅผ ์์ฑํ๋ ๋ฐฉ๋ฒ์ ์์๋ณด์ธ์.
|
||||
|
||||
</Tip>
|
||||
|
||||
|
||||
์ด ๊ฐ์ด๋์์๋ unconditional ์ด๋ฏธ์ง ์์ฑ์ [`DiffusionPipeline`]๊ณผ [DDPM](https://arxiv.org/abs/2006.11239)์ ์ฌ์ฉํฉ๋๋ค:
|
||||
|
||||
```python
|
||||
>>> from diffusers import DiffusionPipeline
|
||||
|
||||
>>> generator = DiffusionPipeline.from_pretrained("anton-l/ddpm-butterflies-128")
|
||||
```
|
||||
[diffusion ํ์ดํ๋ผ์ธ]์ ๋ชจ๋ ๋ชจ๋ธ๋ง, ํ ํฐํ, ์ค์ผ์ค๋ง ๊ตฌ์ฑ ์์๋ฅผ ๋ค์ด๋ก๋ํ๊ณ ์บ์ํฉ๋๋ค. ์ด ๋ชจ๋ธ์ ์ฝ 14์ต ๊ฐ์ ํ๋ผ๋ฏธํฐ๋ก ๊ตฌ์ฑ๋์ด ์๊ธฐ ๋๋ฌธ์ GPU์์ ์คํํ ๊ฒ์ ๊ฐ๋ ฅํ ๊ถ์ฅํฉ๋๋ค. PyTorch์์์ ๋ง์ฐฌ๊ฐ์ง๋ก ์ ๋๋ ์ดํฐ ๊ฐ์ฒด๋ฅผ GPU๋ก ์ฎ๊ธธ ์ ์์ต๋๋ค:
|
||||
```python
|
||||
>>> generator.to("cuda")
|
||||
```
|
||||
์ด์ ์ ๋๋ ์ดํฐ๋ฅผ ์ฌ์ฉํ์ฌ ์ด๋ฏธ์ง๋ฅผ ์์ฑํ ์ ์์ต๋๋ค:
|
||||
```python
|
||||
>>> image = generator().images[0]
|
||||
```
|
||||
์ถ๋ ฅ์ ๊ธฐ๋ณธ์ ์ผ๋ก [PIL.Image](https://pillow.readthedocs.io/en/stable/reference/Image.html?highlight=image#the-image-class) ๊ฐ์ฒด๋ก ๊ฐ์ธ์ง๋๋ค.
|
||||
|
||||
๋ค์์ ํธ์ถํ์ฌ ์ด๋ฏธ์ง๋ฅผ ์ ์ฅํ ์ ์์ต๋๋ค:
|
||||
```python
|
||||
>>> image.save("generated_image.png")
|
||||
```
|
||||
|
||||
์๋ ์คํ์ด์ค(๋ฐ๋ชจ ๋งํฌ)๋ฅผ ์ด์ฉํด ๋ณด๊ณ , ์ถ๋ก ๋จ๊ณ์ ๋งค๊ฐ๋ณ์๋ฅผ ์์ ๋กญ๊ฒ ์กฐ์ ํ์ฌ ์ด๋ฏธ์ง ํ์ง์ ์ด๋ค ์ํฅ์ ๋ฏธ์น๋์ง ํ์ธํด ๋ณด์ธ์!
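
The same experiment can be run locally by changing `num_inference_steps`; fewer steps are faster, while more steps usually produce a cleaner sample (a sketch re-using the `generator` pipeline from above):

```python
>>> fast_image = generator(num_inference_steps=25).images[0]
>>> better_image = generator(num_inference_steps=1000).images[0]
```
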
<iframe src="https://stevhliu-ddpm-butterflies-128.hf.space" frameborder="0" width="850" height="500"></iframe>
docs/source/ko/using-diffusers/using_safetensors.mdx (new file, 14 lines)
@@ -0,0 +1,14 @@
|
||||
# ์ธ์ดํ์ผ์๋ ๋ฌด์์ธ๊ฐ์?
|
||||
|
||||
[์ธ์ดํํ์](https://github.com/huggingface/safetensors)๋ ํผํด์ ์ฌ์ฉํ๋ ํ์ดํ ์น๋ฅผ ์ฌ์ฉํ๋ ๊ธฐ์กด์ '.bin'๊ณผ๋ ๋ค๋ฅธ ํ์์๋๋ค.
|
||||
|
||||
ํผํด์ ์์์ ์ธ ํ์ผ์ด ์์์ ์ฝ๋๋ฅผ ์คํํ ์ ์๋ ์์ ํ์ง ์์ ๊ฒ์ผ๋ก ์๋ช์ด ๋์ต๋๋ค.
|
||||
ํ๋ธ ์์ฒด์์ ๋ฌธ์ ๋ฅผ ๋ฐฉ์งํ๊ธฐ ์ํด ๋ธ๋ ฅํ๊ณ ์์ง๋ง ๋ง๋ณํต์น์ฝ์ ์๋๋๋ค.
|
||||
|
||||
์ธ์ดํํ์์ ๊ฐ์ฅ ์ค์ํ ๋ชฉํ๋ ์ปดํจํฐ๋ฅผ ํ์ทจํ ์ ์๋ค๋ ์๋ฏธ์์ ๋จธ์ ๋ฌ๋ ๋ชจ๋ธ ๋ก๋ฉ์ *์์ ํ๊ฒ* ๋ง๋๋ ๊ฒ์๋๋ค.
|
||||
|
||||
# ์ ์ธ์ดํ์ผ์๋ฅผ ์ฌ์ฉํ๋์?
|
||||
|
||||
์ ์๋ ค์ง์ง ์์ ๋ชจ๋ธ์ ์ฌ์ฉํ๋ ค๋ ๊ฒฝ์ฐ, ๊ทธ๋ฆฌ๊ณ ํ์ผ์ ์ถ์ฒ๊ฐ ํ์คํ์ง ์์ ๊ฒฝ์ฐ **์์ ์ฑ**์ด ํ๋์ ์ด์ ๊ฐ ๋ ์ ์์ต๋๋ค.
|
||||
|
||||
๊ทธ๋ฆฌ๊ณ ๋ ๋ฒ์งธ ์ด์ ๋ **๋ก๋ฉ ์๋**์๋๋ค. ์ธ์ดํ์ผ์๋ ์ผ๋ฐ ํผํด ํ์ผ๋ณด๋ค ๋ชจ๋ธ์ ํจ์ฌ ๋น ๋ฅด๊ฒ ๋ก๋ํ ์ ์์ต๋๋ค. ๋ชจ๋ธ์ ์ ํํ๋ ๋ฐ ๋ง์ ์๊ฐ์ ์๋นํ๋ ๊ฒฝ์ฐ, ์ด๋ ์์ฒญ๋ ์๊ฐ ์ ์ฝ์ด ๊ฐ๋ฅํฉ๋๋ค.
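
As a rough sketch of what that looks like in code (the file name below is a placeholder), the `safetensors` library reads tensors directly without unpickling anything:

```python
from safetensors.torch import load_file

# No arbitrary code can run while loading, unlike torch.load on a pickled .bin file.
state_dict = load_file("diffusion_pytorch_model.safetensors")
print(len(state_dict), "tensors loaded")
```
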
docs/source/ko/using-diffusers/write_own_pipeline.mdx (new file, 290 lines)
@@ -0,0 +1,290 @@
|
||||
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# ํ์ดํ๋ผ์ธ, ๋ชจ๋ธ ๋ฐ ์ค์ผ์ค๋ฌ ์ดํดํ๊ธฐ
|
||||
|
||||
[[colab์์ ์ด๊ธฐ]]
|
||||
|
||||
๐งจ Diffusers๋ ์ฌ์ฉ์ ์นํ์ ์ด๋ฉฐ ์ ์ฐํ ๋๊ตฌ ์์๋ก, ์ฌ์ฉ์ฌ๋ก์ ๋ง๊ฒ diffusion ์์คํ
์ ๊ตฌ์ถ ํ ์ ์๋๋ก ์ค๊ณ๋์์ต๋๋ค. ์ด ๋๊ตฌ ์์์ ํต์ฌ์ ๋ชจ๋ธ๊ณผ ์ค์ผ์ค๋ฌ์
๋๋ค. [`DiffusionPipeline`]์ ํธ์๋ฅผ ์ํด ์ด๋ฌํ ๊ตฌ์ฑ ์์๋ฅผ ๋ฒ๋ค๋ก ์ ๊ณตํ์ง๋ง, ํ์ดํ๋ผ์ธ์ ๋ถ๋ฆฌํ๊ณ ๋ชจ๋ธ๊ณผ ์ค์ผ์ค๋ฌ๋ฅผ ๊ฐ๋ณ์ ์ผ๋ก ์ฌ์ฉํด ์๋ก์ด diffusion ์์คํ
์ ๋ง๋ค ์๋ ์์ต๋๋ค.
|
||||
|
||||
์ด ํํ ๋ฆฌ์ผ์์๋ ๊ธฐ๋ณธ ํ์ดํ๋ผ์ธ๋ถํฐ ์์ํด Stable Diffusion ํ์ดํ๋ผ์ธ๊น์ง ์งํํ๋ฉฐ ๋ชจ๋ธ๊ณผ ์ค์ผ์ค๋ฌ๋ฅผ ์ฌ์ฉํด ์ถ๋ก ์ ์ํ diffusion ์์คํ์ ์กฐ๋ฆฝํ๋ ๋ฐฉ๋ฒ์ ๋ฐฐ์๋๋ค.
|
||||
|
||||
## ๊ธฐ๋ณธ ํ์ดํ๋ผ์ธ ํด์ฒดํ๊ธฐ
|
||||
|
||||
ํ์ดํ๋ผ์ธ์ ์ถ๋ก ์ ์ํด ๋ชจ๋ธ์ ์คํํ๋ ๋น ๋ฅด๊ณ ์ฌ์ด ๋ฐฉ๋ฒ์ผ๋ก, ์ด๋ฏธ์ง๋ฅผ ์์ฑํ๋ ๋ฐ ์ฝ๋๊ฐ 4์ค ์ด์ ํ์ํ์ง ์์ต๋๋ค:
|
||||
|
||||
```py
|
||||
>>> from diffusers import DDPMPipeline
|
||||
|
||||
>>> ddpm = DDPMPipeline.from_pretrained("google/ddpm-cat-256").to("cuda")
|
||||
>>> image = ddpm(num_inference_steps=25).images[0]
|
||||
>>> image
|
||||
```
|
||||
|
||||
<div class="flex justify-center">
|
||||
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ddpm-cat.png" alt="Image of cat created from DDPMPipeline"/>
|
||||
</div>
|
||||
|
||||
์ ๋ง ์ฝ์ต๋๋ค. ๊ทธ๋ฐ๋ฐ ํ์ดํ๋ผ์ธ์ ์ด๋ป๊ฒ ์ด๋ ๊ฒ ํ ์ ์์์๊น์? ํ์ดํ๋ผ์ธ์ ์ธ๋ถํํ์ฌ ๋ด๋ถ์์ ์ด๋ค ์ผ์ด ์ผ์ด๋๊ณ ์๋์ง ์ดํด๋ณด๊ฒ ์ต๋๋ค.
|
||||
|
||||
์ ์์์์ ํ์ดํ๋ผ์ธ์๋ [`UNet2DModel`] ๋ชจ๋ธ๊ณผ [`DDPMScheduler`]๊ฐ ํฌํจ๋์ด ์์ต๋๋ค. ํ์ดํ๋ผ์ธ์ ์ํ๋ ์ถ๋ ฅ ํฌ๊ธฐ์ ๋๋ค ๋
ธ์ด์ฆ๋ฅผ ๋ฐ์ ๋ชจ๋ธ์ ์ฌ๋ฌ๋ฒ ํต๊ณผ์์ผ ์ด๋ฏธ์ง์ ๋
ธ์ด์ฆ๋ฅผ ์ ๊ฑฐํฉ๋๋ค. ๊ฐ timestep์์ ๋ชจ๋ธ์ *noise residual*์ ์์ธกํ๊ณ ์ค์ผ์ค๋ฌ๋ ์ด๋ฅผ ์ฌ์ฉํ์ฌ ๋
ธ์ด์ฆ๊ฐ ์ ์ ์ด๋ฏธ์ง๋ฅผ ์์ธกํฉ๋๋ค. ํ์ดํ๋ผ์ธ์ ์ง์ ๋ ์ถ๋ก ์คํ
์์ ๋๋ฌํ ๋๊น์ง ์ด ๊ณผ์ ์ ๋ฐ๋ณตํฉ๋๋ค.
|
||||
|
||||
๋ชจ๋ธ๊ณผ ์ค์ผ์ค๋ฌ๋ฅผ ๋ณ๋๋ก ์ฌ์ฉํ์ฌ ํ์ดํ๋ผ์ธ์ ๋ค์ ์์ฑํ๊ธฐ ์ํด ์์ฒด์ ์ธ ๋
ธ์ด์ฆ ์ ๊ฑฐ ํ๋ก์ธ์ค๋ฅผ ์์ฑํด ๋ณด๊ฒ ์ต๋๋ค.
|
||||
|
||||
1. ๋ชจ๋ธ๊ณผ ์ค์ผ์ค๋ฌ๋ฅผ ๋ถ๋ฌ์ต๋๋ค:
|
||||
|
||||
```py
|
||||
>>> from diffusers import DDPMScheduler, UNet2DModel
|
||||
|
||||
>>> scheduler = DDPMScheduler.from_pretrained("google/ddpm-cat-256")
|
||||
>>> model = UNet2DModel.from_pretrained("google/ddpm-cat-256").to("cuda")
|
||||
```
|
||||
|
||||
2. ๋ธ์ด์ฆ ์ ๊ฑฐ ํ๋ก์ธ์ค๋ฅผ ์คํํ timestep ์๋ฅผ ์ค์ ํฉ๋๋ค:
|
||||
|
||||
```py
|
||||
>>> scheduler.set_timesteps(50)
|
||||
```
|
||||
|
||||
3. ์ค์ผ์ค๋ฌ์ timestep์ ์ค์ ํ๋ฉด ๊ท ๋ฑํ ๊ฐ๊ฒฉ์ ๊ตฌ์ฑ ์์๋ฅผ ๊ฐ์ง ํ
์๊ฐ ์์ฑ๋ฉ๋๋ค.(์ด ์์์์๋ 50๊ฐ) ๊ฐ ์์๋ ๋ชจ๋ธ์ด ์ด๋ฏธ์ง์ ๋
ธ์ด์ฆ๋ฅผ ์ ๊ฑฐํ๋ ์๊ฐ ๊ฐ๊ฒฉ์ ํด๋นํฉ๋๋ค. ๋์ค์ ๋
ธ์ด์ฆ ์ ๊ฑฐ ๋ฃจํ๋ฅผ ๋ง๋ค ๋ ์ด ํ
์๋ฅผ ๋ฐ๋ณตํ์ฌ ์ด๋ฏธ์ง์ ๋
ธ์ด์ฆ๋ฅผ ์ ๊ฑฐํฉ๋๋ค:
|
||||
|
||||
```py
|
||||
>>> scheduler.timesteps
|
||||
tensor([980, 960, 940, 920, 900, 880, 860, 840, 820, 800, 780, 760, 740, 720,
|
||||
700, 680, 660, 640, 620, 600, 580, 560, 540, 520, 500, 480, 460, 440,
|
||||
420, 400, 380, 360, 340, 320, 300, 280, 260, 240, 220, 200, 180, 160,
|
||||
140, 120, 100, 80, 60, 40, 20, 0])
|
||||
```
|
||||
|
||||
4. ์ํ๋ ์ถ๋ ฅ๊ณผ ๊ฐ์ ๋ชจ์์ ๊ฐ์ง ๋๋ค ๋ธ์ด์ฆ๋ฅผ ์์ฑํฉ๋๋ค:
|
||||
|
||||
```py
|
||||
>>> import torch
|
||||
|
||||
>>> sample_size = model.config.sample_size
|
||||
>>> noise = torch.randn((1, 3, sample_size, sample_size)).to("cuda")
|
||||
```
|
||||
|
||||
5. ์ด์ timestep์ ๋ฐ๋ณตํ๋ ๋ฃจํ๋ฅผ ์์ฑํฉ๋๋ค. ๊ฐ timestep์์ ๋ชจ๋ธ์ [`UNet2DModel.forward`]๋ฅผ ํตํด noisy residual์ ๋ฐํํฉ๋๋ค. ์ค์ผ์ค๋ฌ์ [`~DDPMScheduler.step`] ๋ฉ์๋๋ noisy residual, timestep, ๊ทธ๋ฆฌ๊ณ ์
๋ ฅ์ ๋ฐ์ ์ด์ timestep์์ ์ด๋ฏธ์ง๋ฅผ ์์ธกํฉ๋๋ค. ์ด ์ถ๋ ฅ์ ๋
ธ์ด์ฆ ์ ๊ฑฐ ๋ฃจํ์ ๋ชจ๋ธ์ ๋ํ ๋ค์ ์
๋ ฅ์ด ๋๋ฉฐ, `timesteps` ๋ฐฐ์ด์ ๋์ ๋๋ฌํ ๋๊น์ง ๋ฐ๋ณต๋ฉ๋๋ค.
|
||||
|
||||
```py
|
||||
>>> input = noise
|
||||
|
||||
>>> for t in scheduler.timesteps:
|
||||
... with torch.no_grad():
|
||||
... noisy_residual = model(input, t).sample
|
||||
... previous_noisy_sample = scheduler.step(noisy_residual, t, input).prev_sample
|
||||
... input = previous_noisy_sample
|
||||
```
|
||||
|
||||
์ด๊ฒ์ด ์ ์ฒด ๋ธ์ด์ฆ ์ ๊ฑฐ ํ๋ก์ธ์ค์ด๋ฉฐ, ๋์ผํ ํจํด์ ์ฌ์ฉํด ๋ชจ๋  diffusion ์์คํ์ ์์ฑํ ์ ์์ต๋๋ค.
|
||||
|
||||
6. ๋ง์ง๋ง ๋จ๊ณ๋ ๋ธ์ด์ฆ๊ฐ ์ ๊ฑฐ๋ ์ถ๋ ฅ์ ์ด๋ฏธ์ง๋ก ๋ณํํ๋ ๊ฒ์๋๋ค:
|
||||
|
||||
```py
|
||||
>>> from PIL import Image
|
||||
>>> import numpy as np
|
||||
|
||||
>>> image = (input / 2 + 0.5).clamp(0, 1)
|
||||
>>> image = image.cpu().permute(0, 2, 3, 1).numpy()[0]
|
||||
>>> image = Image.fromarray((image * 255).round().astype("uint8"))
|
||||
>>> image
|
||||
```
|
||||
|
||||
๋ค์ ์น์
์์๋ ์ฌ๋ฌ๋ถ์ ๊ธฐ์ ์ ์ํํด๋ณด๊ณ ์ข ๋ ๋ณต์กํ Stable Diffusion ํ์ดํ๋ผ์ธ์ ๋ถ์ํด ๋ณด๊ฒ ์ต๋๋ค. ๋ฐฉ๋ฒ์ ๊ฑฐ์ ๋์ผํฉ๋๋ค. ํ์ํ ๊ตฌ์ฑ์์๋ค์ ์ด๊ธฐํํ๊ณ timestep์๋ฅผ ์ค์ ํ์ฌ `timestep` ๋ฐฐ์ด์ ์์ฑํฉ๋๋ค. ๋
ธ์ด์ฆ ์ ๊ฑฐ ๋ฃจํ์์ `timestep` ๋ฐฐ์ด์ด ์ฌ์ฉ๋๋ฉฐ, ์ด ๋ฐฐ์ด์ ๊ฐ ์์์ ๋ํด ๋ชจ๋ธ์ ๋
ธ์ด์ฆ๊ฐ ์ ์ ์ด๋ฏธ์ง๋ฅผ ์์ธกํฉ๋๋ค. ๋
ธ์ด์ฆ ์ ๊ฑฐ ๋ฃจํ๋ `timestep`์ ๋ฐ๋ณตํ๊ณ ๊ฐ timestep์์ noise residual์ ์ถ๋ ฅํ๊ณ ์ค์ผ์ค๋ฌ๋ ์ด๋ฅผ ์ฌ์ฉํ์ฌ ์ด์ timestep์์ ๋
ธ์ด์ฆ๊ฐ ๋ํ ์ด๋ฏธ์ง๋ฅผ ์์ธกํฉ๋๋ค. ์ด ํ๋ก์ธ์ค๋ `timestep` ๋ฐฐ์ด์ ๋์ ๋๋ฌํ ๋๊น์ง ๋ฐ๋ณต๋ฉ๋๋ค.
|
||||
|
||||
ํ๋ฒ ์ฌ์ฉํด ๋ด์๋ค!
|
||||
|
||||
## Stable Diffusion ํ์ดํ๋ผ์ธ ํด์ฒดํ๊ธฐ
|
||||
|
||||
Stable Diffusion ์ text-to-image *latent diffusion* ๋ชจ๋ธ์๋๋ค. latent diffusion ๋ชจ๋ธ์ด๋ผ๊ณ  ๋ถ๋ฆฌ๋ ์ด์ ๋ ์ค์  ํฝ์ ๊ณต๊ฐ ๋์  ์ด๋ฏธ์ง์ ์ ์ฐจ์์ ํํ์ผ๋ก ์์ํ๊ธฐ ๋๋ฌธ์ด๊ณ , ๋ฉ๋ชจ๋ฆฌ ํจ์จ์ด ๋ ๋์ต๋๋ค. ์ธ์ฝ๋๋ ์ด๋ฏธ์ง๋ฅผ ๋ ์์ ํํ์ผ๋ก ์์ถํ๊ณ , ๋์ฝ๋๋ ์์ถ๋ ํํ์ ๋ค์ ์ด๋ฏธ์ง๋ก ๋ณํํฉ๋๋ค. text-to-image ๋ชจ๋ธ์ ๊ฒฝ์ฐ ํ์คํธ ์๋ฒ ๋ฉ์ ์์ฑํ๊ธฐ ์ํด tokenizer์ ์ธ์ฝ๋๊ฐ ํ์ํฉ๋๋ค. ์ด์  ์์ ์์ ์ด๋ฏธ UNet ๋ชจ๋ธ๊ณผ ์ค์ผ์ค๋ฌ๊ฐ ํ์ํ๋ค๋ ๊ฒ์ ์๊ณ  ๊ณ์จ์ ๊ฒ์๋๋ค.
|
||||
|
||||
๋ณด์๋ค์ํผ, ์ด๊ฒ์ UNet ๋ชจ๋ธ๋ง ํฌํจ๋ DDPM ํ์ดํ๋ผ์ธ๋ณด๋ค ๋ ๋ณต์กํฉ๋๋ค. Stable Diffusion ๋ชจ๋ธ์๋ ์ธ ๊ฐ์ ๊ฐ๋ณ ์ฌ์ ํ์ต๋ ๋ชจ๋ธ์ด ์์ต๋๋ค.
|
||||
|
||||
<Tip>
|
||||
|
||||
๐ก VAE, UNet ๋ฐ ํ์คํธ ์ธ์ฝ๋ ๋ชจ๋ธ์ ์๋๋ฐฉ์์ ๋ํ ์์ธํ ๋ด์ฉ์ [How does Stable Diffusion work?](https://huggingface.co/blog/stable_diffusion#how-does-stable-diffusion-work) ๋ธ๋ก๊ทธ๋ฅผ ์ฐธ์กฐํ์ธ์.
|
||||
|
||||
</Tip>
|
||||
|
||||
์ด์ Stable Diffusion ํ์ดํ๋ผ์ธ์ ํ์ํ ๊ตฌ์ฑ์์๋ค์ด ๋ฌด์์ธ์ง ์์์ผ๋, [`~ModelMixin.from_pretrained`] ๋ฉ์๋๋ฅผ ์ฌ์ฉํด ๋ชจ๋ ๊ตฌ์ฑ์์๋ฅผ ๋ถ๋ฌ์ต๋๋ค. ์ฌ์ ํ์ต๋ ์ฒดํฌํฌ์ธํธ [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5)์์ ์ฐพ์ ์ ์์ผ๋ฉฐ, ๊ฐ ๊ตฌ์ฑ์์๋ค์ ๋ณ๋์ ํ์ ํด๋์ ์ ์ฅ๋์ด ์์ต๋๋ค:
|
||||
|
||||
```py
>>> from PIL import Image
>>> import torch
>>> from transformers import CLIPTextModel, CLIPTokenizer
>>> from diffusers import AutoencoderKL, UNet2DConditionModel, PNDMScheduler

>>> vae = AutoencoderKL.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="vae")
>>> tokenizer = CLIPTokenizer.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="tokenizer")
>>> text_encoder = CLIPTextModel.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="text_encoder")
>>> unet = UNet2DConditionModel.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="unet")
```

Instead of the default [`PNDMScheduler`], exchange it for the [`UniPCMultistepScheduler`] to see how easy it is to plug in a different scheduler:

```py
>>> from diffusers import UniPCMultistepScheduler

>>> scheduler = UniPCMultistepScheduler.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="scheduler")
```

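If a scheduler is already instantiated, you can also build the replacement from its configuration instead of downloading it again. This is a minimal sketch using `from_config` (the `pndm` variable name is just illustrative):

```py
>>> from diffusers import PNDMScheduler, UniPCMultistepScheduler

>>> pndm = PNDMScheduler.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="scheduler")
>>> scheduler = UniPCMultistepScheduler.from_config(pndm.config)
```
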
To speed up inference, move the models to a GPU since, unlike the scheduler, they have trainable weights:

```py
>>> torch_device = "cuda"
>>> vae.to(torch_device)
>>> text_encoder.to(torch_device)
>>> unet.to(torch_device)
```

### Create text embeddings

The next step is to tokenize the text to generate embeddings. The text is used to condition the UNet model and steer the diffusion process towards something that resembles the input prompt.

<Tip>

💡 The `guidance_scale` parameter determines how much weight should be given to the prompt when generating an image.

</Tip>

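Concretely, the scale controls how strongly the text-conditioned noise prediction is amplified relative to the unconditional one. The numbers below are toy values used only to illustrate the combination that appears later in the denoising loop:

```py
>>> import torch

>>> guidance_scale = 7.5
>>> noise_pred_uncond = torch.tensor([0.1])  # stand-in for the unconditional prediction
>>> noise_pred_text = torch.tensor([0.5])  # stand-in for the text-conditioned prediction
>>> noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)
tensor([3.1000])
```
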
Feel free to choose any prompt you like if you want to generate something else!

```py
>>> prompt = ["a photograph of an astronaut riding a horse"]
>>> height = 512  # default height of Stable Diffusion
>>> width = 512  # default width of Stable Diffusion
>>> num_inference_steps = 25  # number of denoising steps
>>> guidance_scale = 7.5  # scale for classifier-free guidance
>>> generator = torch.manual_seed(0)  # seed generator to create the initial latent noise
>>> batch_size = len(prompt)
```

Tokenize the text and generate the embeddings from the prompt:

```py
>>> text_input = tokenizer(
...     prompt, padding="max_length", max_length=tokenizer.model_max_length, truncation=True, return_tensors="pt"
... )

>>> with torch.no_grad():
...     text_embeddings = text_encoder(text_input.input_ids.to(torch_device))[0]
```

You also need to generate the *unconditional text embeddings*, which are the embeddings for the padding token. These need to have the same shape (`batch_size` and `seq_length`) as the conditional `text_embeddings`:

```py
>>> max_length = text_input.input_ids.shape[-1]
>>> uncond_input = tokenizer([""] * batch_size, padding="max_length", max_length=max_length, return_tensors="pt")
>>> uncond_embeddings = text_encoder(uncond_input.input_ids.to(torch_device))[0]
```

Concatenate the conditional and unconditional embeddings into a single batch to avoid doing two forward passes:

```py
>>> text_embeddings = torch.cat([uncond_embeddings, text_embeddings])
```

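A quick shape check confirms the batching worked. The 77 and 768 dimensions below are properties of the v1-4 CLIP text encoder and are stated here as an assumption about this particular checkpoint:

```py
>>> text_embeddings.shape
torch.Size([2, 77, 768])
```
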
### Create random noise

Next, generate some initial random noise as a starting point for the diffusion process. This is the latent representation of the image, and it will be gradually denoised. At this point, the `latent` image is smaller than the final image size, but that's okay because the model will transform it into the final 512x512 image dimensions later.

<Tip>

💡 The height and width are divided by 8 because the `vae` model has 3 down-sampling layers. You can verify this by running the following:

```py
2 ** (len(vae.config.block_out_channels) - 1) == 8
```

</Tip>

```py
>>> latents = torch.randn(
...     (batch_size, unet.in_channels, height // 8, width // 8),
...     generator=generator,
... )
>>> latents = latents.to(torch_device)
```

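For the default 512x512 resolution and a v1 UNet with 4 latent channels, the noise is 8x smaller along each spatial dimension. A quick check (the exact values depend on the checkpoint):

```py
>>> latents.shape
torch.Size([1, 4, 64, 64])
```
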
### Denoise the image

Start by scaling the input with the initial noise distribution, *sigma*, the noise scale value required for improved schedulers like [`UniPCMultistepScheduler`]:

```py
>>> latents = latents * scheduler.init_noise_sigma
```

The last step is to create the denoising loop that progressively transforms the pure noise in `latents` into the image described by your prompt. Remember, the denoising loop needs to do three things:

1. Set the scheduler's timesteps to use during denoising.
2. Iterate over the timesteps.
3. At each timestep, call the UNet model to predict the noise residual and pass it to the scheduler to compute the previous noisy sample.

```py
>>> from tqdm.auto import tqdm

>>> scheduler.set_timesteps(num_inference_steps)

>>> for t in tqdm(scheduler.timesteps):
...     # expand the latents to avoid doing two forward passes when using classifier-free guidance
...     latent_model_input = torch.cat([latents] * 2)

...     latent_model_input = scheduler.scale_model_input(latent_model_input, timestep=t)

...     # predict the noise residual
...     with torch.no_grad():
...         noise_pred = unet(latent_model_input, t, encoder_hidden_states=text_embeddings).sample

...     # perform guidance
...     noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
...     noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)

...     # compute the previous noisy sample x_t -> x_t-1
...     latents = scheduler.step(noise_pred, t, latents).prev_sample
```

### Decode the image

The final step is to use the `vae` to decode the latent representation into an image and get the decoded output with `sample`:

```py
# scale and decode the image latents with the vae
latents = 1 / 0.18215 * latents
with torch.no_grad():
    image = vae.decode(latents).sample
```

Lastly, convert the image to a `PIL.Image` to see your generated image!

```py
>>> image = (image / 2 + 0.5).clamp(0, 1)
>>> image = image.detach().cpu().permute(0, 2, 3, 1).numpy()
>>> images = (image * 255).round().astype("uint8")
>>> pil_images = [Image.fromarray(image) for image in images]
>>> pil_images[0]
```

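If you want to keep the result, the `PIL.Image` can be saved to disk as usual (the filename here is just an example):

```py
>>> pil_images[0].save("astronaut_rides_horse.png")
```
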
<div class="flex justify-center">
<img src="https://huggingface.co/blog/assets/98_stable_diffusion/stable_diffusion_k_lms.png"/>
</div>

## Next steps

From basic to complex pipelines, you've seen that all you really need to write your own diffusion system is a denoising loop. The loop sets the scheduler's timesteps, iterates over them, and alternates between calling the UNet model to predict the noise residual and passing it to the scheduler to compute the previous noisy sample.

This is exactly what 🧨 Diffusers is designed for: to make it intuitive and easy to write your own diffusion system using models and schedulers.

For your next steps, feel free to:

* Learn how to [build and contribute a pipeline](using-diffusers/#contribute_pipeline) to 🧨 Diffusers. We can't wait to see what you come up with!
* Explore [existing pipelines](./api/pipelines/overview) in the library and see if you can deconstruct and build a pipeline from scratch using the models and schedulers separately.