mirror of https://github.com/huggingface/diffusers.git synced 2026-01-27 17:22:53 +03:00

[Docs] Korean translation update (#4022)

* feat) optimization kr translation

* fix) typo, italic setting

* feat) dreambooth, text2image kr

* feat) lora kr

* fix) LoRA

* fix) fp16 fix

* fix) doc-builder style

* fix) fp16 revise some words

* fix) fp16 style fix

* fix) opt, training docs update

* merge conflict

* Fix community pipelines (#3266)

* Allow disabling torch 2_0 attention (#3273)

* Allow disabling torch 2_0 attention

* make style

* Update src/diffusers/models/attention.py

* Release: v0.16.1

* feat) toctree update

* feat) toctree update

* Fix custom releases (#3708)

* Fix custom releases

* make style

* Fix loading if unexpected keys are present (#3720)

* Fix loading

* make style

* Release: v0.17.0

* opt_overview

* commit

* Create pipeline_overview.mdx

* unconditional_image_generation_1stDraft

* โœจ Add translation for write_own_pipeline.mdx

* conditional - literal translation, unconditional

* unconditional_image_generation first draft

* revise

* Update pipeline_overview.mdx

* revise-2

* โ™ป๏ธ translation fixed for write_own_pipeline.mdx

* complete translate basic_training.mdx

* other-formats.mdx translation complete

* fix tutorials/basic_training.mdx

* other-formats revision

* inpaint Korean translation

* depth2img translation

* translate training/adapt-a-model.mdx

* revised_all

* feedback taken

* using_safetensors.mdx_first_draft

* custom_pipeline_examples.mdx_first_draft

* img2img Korean translation complete

* tutorial_overview edit

* reusing_seeds

* torch2.0

* translate complete

* fix) apply terminology consistency conventions

* [fix] revise translation to reflect feedback

* ์˜คํƒˆ์ž ์ •์ • + ์ปจ๋ฒค์…˜ ์œ„๋ฐฐ๋œ ๋ถ€๋ถ„ ์ •์ •

* typo, style fix

* toctree update

* copyright fix

* toctree fix

* Update _toctree.yml

---------

Co-authored-by: Chanran Kim <seriousran@gmail.com>
Co-authored-by: apolinรกrio <joaopaulo.passos@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Lee, Hongkyu <75282888+howsmyanimeprofilepicture@users.noreply.github.com>
Co-authored-by: hyeminan <adios9709@gmail.com>
Co-authored-by: movie5 <oyh5800@naver.com>
Co-authored-by: idra79haza <idra79haza@github.com>
Co-authored-by: Jihwan Kim <cuchoco@naver.com>
Co-authored-by: jungwoo <boonkoonheart@gmail.com>
Co-authored-by: jjuun0 <jh061993@gmail.com>
Co-authored-by: szjung-test <93111772+szjung-test@users.noreply.github.com>
Co-authored-by: idra79haza <37795618+idra79haza@users.noreply.github.com>
Co-authored-by: howsmyanimeprofilepicture <howsmyanimeprofilepicture@gmail.com>
Co-authored-by: hoswmyanimeprofilepicture <hoswmyanimeprofilepicture@gmail.com>
This commit is contained in:
Seongsu Park
2023-07-18 10:28:08 +09:00
committed by GitHub
parent a0597f33ac
commit 8b18cd8e7f
25 changed files with 3470 additions and 9 deletions


@@ -8,14 +8,69 @@
- local: installation
title: "Installation"
title: "Get started"
- sections:
- local: tutorials/tutorial_overview
title: Overview
- local: using-diffusers/write_own_pipeline
title: Understanding models and schedulers
- local: tutorials/basic_training
title: Training a diffusion model
title: Tutorials
- sections:
- sections:
- local: in_translation
title: Overview
- local: in_translation
- local: using-diffusers/loading
title: Loading pipelines, models, and schedulers
- local: using-diffusers/schedulers
title: Loading and comparing different schedulers
- local: using-diffusers/custom_pipeline_overview
title: Loading community pipelines
- local: using-diffusers/using_safetensors
title: Loading safetensors
- local: using-diffusers/other-formats
title: Loading other Stable Diffusion formats
title: Loading & Hub
- sections:
- local: using-diffusers/pipeline_overview
title: Overview
- local: using-diffusers/unconditional_image_generation
title: Unconditional image generation
- local: in_translation
title: Text-to-image generation
- local: using-diffusers/img2img
title: Text-guided image-to-image
- local: using-diffusers/inpaint
title: Text-guided image inpainting
- local: using-diffusers/depth2img
title: Text-guided depth-to-image
- local: in_translation
title: Textual inversion
- local: in_translation
title: Distributed inference with multiple GPUs
- local: using-diffusers/reusing_seeds
title: Improving image quality with deterministic generation
- local: in_translation
title: Creating reproducible pipelines
- local: using-diffusers/custom_pipeline_examples
title: Community pipelines
- local: in_translation
title: How to contribute a community pipeline
- local: in_translation
title: Stable Diffusion in JAX/Flax
- local: in_translation
title: Weighting Prompts
title: Pipelines for inference
- sections:
- local: training/overview
title: Overview
- local: in_translation
title: Creating a dataset for training
- local: training/adapt_a_model
title: Adapting a model to a new task
- local: training/unconditional_training
title: Unconditional image generation
- local: training/text_inversion
title: Textual Inversion
- local: training/dreambooth
title: DreamBooth
@@ -27,13 +82,16 @@
title: ControlNet
- local: in_translation
title: InstructPix2Pix training
title: Training
- local: in_translation
title: Custom Diffusion
title: Training
title: Using Diffusers
- sections:
- local: in_translation
- local: optimization/opt_overview
title: Overview
- local: optimization/fp16
title: Memory and speed
- local: in_translation
- local: optimization/torch2.0
title: Torch 2.0 support
- local: optimization/xformers
title: xFormers
@@ -41,8 +99,12 @@
title: ONNX
- local: optimization/open_vino
title: OpenVINO
- local: in_translation
title: Core ML
- local: optimization/mps
title: MPS
- local: optimization/habana
title: Habana Gaudi
- local: in_translation
title: Token Merging
title: ์ตœ์ ํ™”/ํŠน์ˆ˜ ํ•˜๋“œ์›จ์–ด


@@ -59,7 +59,7 @@ torch.backends.cuda.matmul.allow_tf32 = True
## ๋ฐ˜์ •๋ฐ€๋„ ๊ฐ€์ค‘์น˜
๋” ๋งŽ์€ GPU ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ์ ˆ์•ฝํ•˜๊ณ  ๋” ๋น ๋ฅธ ์†๋„๋ฅผ ์–ป๊ธฐ ์œ„ํ•ด ๋ชจ๋ธ ๊ฐ€์ค‘์น˜๋ฅผ ๋ฐ˜์ •๋ฐ€๋„(half precision)๋กœ ์ง์ ‘ ๋กœ๋“œํ•˜๊ณ  ์‹คํ–‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
๋” ๋งŽ์€ GPU ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ์ ˆ์•ฝํ•˜๊ณ  ๋” ๋น ๋ฅธ ์†๋„๋ฅผ ์–ป๊ธฐ ์œ„ํ•ด ๋ชจ๋ธ ๊ฐ€์ค‘์น˜๋ฅผ ๋ฐ˜์ •๋ฐ€๋„(half precision)๋กœ ์ง์ ‘ ๋ถˆ๋Ÿฌ์˜ค๊ณ  ์‹คํ–‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
์—ฌ๊ธฐ์—๋Š” `fp16`์ด๋ผ๋Š” ๋ธŒ๋žœ์น˜์— ์ €์žฅ๋œ float16 ๋ฒ„์ „์˜ ๊ฐ€์ค‘์น˜๋ฅผ ๋ถˆ๋Ÿฌ์˜ค๊ณ , ๊ทธ ๋•Œ `float16` ์œ ํ˜•์„ ์‚ฌ์šฉํ•˜๋„๋ก PyTorch์— ์ง€์‹œํ•˜๋Š” ์ž‘์—…์ด ํฌํ•จ๋ฉ๋‹ˆ๋‹ค.
```Python


@@ -0,0 +1,17 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# ๊ฐœ์š”
๋…ธ์ด์ฆˆ๊ฐ€ ๋งŽ์€ ์ถœ๋ ฅ์—์„œ ์ ์€ ์ถœ๋ ฅ์œผ๋กœ ๋งŒ๋“œ๋Š” ๊ณผ์ •์œผ๋กœ ๊ณ ํ’ˆ์งˆ ์ƒ์„ฑ ๋ชจ๋ธ์˜ ์ถœ๋ ฅ์„ ๋งŒ๋“œ๋Š” ๊ฐ๊ฐ์˜ ๋ฐ˜๋ณต๋˜๋Š” ์Šคํ…์€ ๋งŽ์€ ๊ณ„์‚ฐ์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. ๐Ÿงจ Diffuser์˜ ๋ชฉํ‘œ ์ค‘ ํ•˜๋‚˜๋Š” ๋ชจ๋“  ์‚ฌ๋žŒ์ด ์ด ๊ธฐ์ˆ ์„ ๋„๋ฆฌ ์ด์šฉํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•˜๋Š” ๊ฒƒ์ด๋ฉฐ, ์—ฌ๊ธฐ์—๋Š” ์†Œ๋น„์ž ๋ฐ ํŠน์ˆ˜ ํ•˜๋“œ์›จ์–ด์—์„œ ๋น ๋ฅธ ์ถ”๋ก ์„ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•˜๋Š” ๊ฒƒ์„ ํฌํ•จํ•ฉ๋‹ˆ๋‹ค.
์ด ์„น์…˜์—์„œ๋Š” ์ถ”๋ก  ์†๋„๋ฅผ ์ตœ์ ํ™”ํ•˜๊ณ  ๋ฉ”๋ชจ๋ฆฌ ์†Œ๋น„๋ฅผ ์ค„์ด๊ธฐ ์œ„ํ•œ ๋ฐ˜์ •๋ฐ€(half-precision) ๊ฐ€์ค‘์น˜ ๋ฐ sliced attention๊ณผ ๊ฐ™์€ ํŒ๊ณผ ์š”๋ น์„ ๋‹ค๋ฃน๋‹ˆ๋‹ค. ๋˜ํ•œ [`torch.compile`](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) ๋˜๋Š” [ONNX Runtime](https://onnxruntime.ai/docs/)์„ ์‚ฌ์šฉํ•˜์—ฌ PyTorch ์ฝ”๋“œ์˜ ์†๋„๋ฅผ ๋†’์ด๊ณ , [xFormers](https://facebookresearch.github.io/xformers/)๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ memory-efficient attention์„ ํ™œ์„ฑํ™”ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ๋ฐฐ์šธ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. Apple Silicon, Intel ๋˜๋Š” Habana ํ”„๋กœ์„ธ์„œ์™€ ๊ฐ™์€ ํŠน์ • ํ•˜๋“œ์›จ์–ด์—์„œ ์ถ”๋ก ์„ ์‹คํ–‰ํ•˜๊ธฐ ์œ„ํ•œ ๊ฐ€์ด๋“œ๋„ ์žˆ์Šต๋‹ˆ๋‹ค.


@@ -0,0 +1,445 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Accelerated PyTorch 2.0 support in Diffusers
Starting with version `0.13.0`, Diffusers supports the latest optimizations from [PyTorch 2.0](https://pytorch.org/get-started/pytorch-2.0/). These include:
1. Support for accelerated transformers using memory-efficient attention - no extra dependencies such as `xformers` required
2. Support for compiling individual models with [torch.compile](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) for an extra performance boost
## Installation
To use the accelerated attention implementation and `torch.compile()`, make sure you have the latest version of PyTorch 2.0 installed from pip, and that you are on diffusers version 0.13.0 or later. As explained below, diffusers automatically uses the optimized attention processor ([`AttnProcessor2_0`](https://github.com/huggingface/diffusers/blob/1a5797c6d4491a879ea5285c4efc377664e0332d/src/diffusers/models/attention_processor.py#L798)) when PyTorch 2.0 is available.
```bash
pip install --upgrade torch diffusers
```
## ๊ฐ€์†ํ™”๋œ ํŠธ๋žœ์Šคํฌ๋จธ์™€ `torch.compile` ์‚ฌ์šฉํ•˜๊ธฐ.
1. **๊ฐ€์†ํ™”๋œ ํŠธ๋žœ์Šคํฌ๋จธ ๊ตฌํ˜„**
PyTorch 2.0์—๋Š” [`torch.nn.functional.scaled_dot_product_attention`](https://pytorch.org/docs/master/generated/torch.nn.functional.scaled_dot_product_attention) ํ•จ์ˆ˜๋ฅผ ํ†ตํ•ด ์ตœ์ ํ™”๋œ memory-efficient attention์˜ ๊ตฌํ˜„์ด ํฌํ•จ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋Š” ์ž…๋ ฅ ๋ฐ GPU ์œ ํ˜•์— ๋”ฐ๋ผ ์—ฌ๋Ÿฌ ์ตœ์ ํ™”๋ฅผ ์ž๋™์œผ๋กœ ํ™œ์„ฑํ™”ํ•ฉ๋‹ˆ๋‹ค. ์ด๋Š” [xFormers](https://github.com/facebookresearch/xformers)์˜ `memory_efficient_attention`๊ณผ ์œ ์‚ฌํ•˜์ง€๋งŒ ๊ธฐ๋ณธ์ ์œผ๋กœ PyTorch์— ๋‚ด์žฅ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.
์ด๋Ÿฌํ•œ ์ตœ์ ํ™”๋Š” PyTorch 2.0์ด ์„ค์น˜๋˜์–ด ์žˆ๊ณ  `torch.nn.functional.scaled_dot_product_attention`์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋Š” ๊ฒฝ์šฐ Diffusers์—์„œ ๊ธฐ๋ณธ์ ์œผ๋กœ ํ™œ์„ฑํ™”๋ฉ๋‹ˆ๋‹ค. ์ด๋ฅผ ์‚ฌ์šฉํ•˜๋ ค๋ฉด `torch 2.0`์„ ์„ค์น˜ํ•˜๊ณ  ํŒŒ์ดํ”„๋ผ์ธ์„ ์‚ฌ์šฉํ•˜๊ธฐ๋งŒ ํ•˜๋ฉด ๋ฉ๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด:
```Python
import torch
from diffusers import DiffusionPipeline
pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe = pipe.to("cuda")
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
```
์ด๋ฅผ ๋ช…์‹œ์ ์œผ๋กœ ํ™œ์„ฑํ™”ํ•˜๋ ค๋ฉด(ํ•„์ˆ˜๋Š” ์•„๋‹˜) ์•„๋ž˜์™€ ๊ฐ™์ด ์ˆ˜ํ–‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
```diff
import torch
from diffusers import DiffusionPipeline
+ from diffusers.models.attention_processor import AttnProcessor2_0
pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
+ pipe.unet.set_attn_processor(AttnProcessor2_0())
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
```
์ด ์‹คํ–‰ ๊ณผ์ •์€ `xFormers`๋งŒํผ ๋น ๋ฅด๊ณ  ๋ฉ”๋ชจ๋ฆฌ์ ์œผ๋กœ ํšจ์œจ์ ์ด์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ์ž์„ธํ•œ ๋‚ด์šฉ์€ [๋ฒค์น˜๋งˆํฌ](#benchmark)์—์„œ ํ™•์ธํ•˜์„ธ์š”.
ํŒŒ์ดํ”„๋ผ์ธ์„ ๋ณด๋‹ค deterministic์œผ๋กœ ๋งŒ๋“ค๊ฑฐ๋‚˜ ํŒŒ์ธ ํŠœ๋‹๋œ ๋ชจ๋ธ์„ [Core ML](https://huggingface.co/docs/diffusers/v0.16.0/en/optimization/coreml#how-to-run-stable-diffusion-with-core-ml)๊ณผ ๊ฐ™์€ ๋‹ค๋ฅธ ํ˜•์‹์œผ๋กœ ๋ณ€ํ™˜ํ•ด์•ผ ํ•˜๋Š” ๊ฒฝ์šฐ ๋ฐ”๋‹๋ผ ์–ดํ…์…˜ ํ”„๋กœ์„ธ์„œ ([`AttnProcessor`](https://github.com/huggingface/diffusers/blob/1a5797c6d4491a879ea5285c4efc377664e0332d/src/diffusers/models/attention_processor.py#L402))๋กœ ๋˜๋Œ๋ฆด ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ผ๋ฐ˜ ์–ดํ…์…˜ ํ”„๋กœ์„ธ์„œ๋ฅผ ์‚ฌ์šฉํ•˜๋ ค๋ฉด [`~diffusers.UNet2DConditionModel.set_default_attn_processor`] ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:
```Python
import torch
from diffusers import DiffusionPipeline
from diffusers.models.attention_processor import AttnProcessor
pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
pipe.unet.set_default_attn_processor()
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
```
2. **torch.compile**
For an additional speed-up, you can use the new `torch.compile` feature. Since the UNet of a pipeline is usually the most computationally expensive part, we wrap the `unet` with `torch.compile` and leave the rest of the sub-models (the text encoder and VAE) as they are. For more details and other options, refer to the [torch compile docs](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html).
```python
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
images = pipe(prompt, num_inference_steps=steps, num_images_per_prompt=batch_size).images
```
GPU ์œ ํ˜•์— ๋”ฐ๋ผ `compile()`์€ ๊ฐ€์†ํ™”๋œ ํŠธ๋žœ์Šคํฌ๋จธ ์ตœ์ ํ™”๋ฅผ ํ†ตํ•ด **5% - 300%**์˜ _์ถ”๊ฐ€ ์„ฑ๋Šฅ ํ–ฅ์ƒ_์„ ์–ป์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์ปดํŒŒ์ผ์€ Ampere(A100, 3090), Ada(4090) ๋ฐ Hopper(H100)์™€ ๊ฐ™์€ ์ตœ์‹  GPU ์•„ํ‚คํ…์ฒ˜์—์„œ ๋” ๋งŽ์€ ์„ฑ๋Šฅ ํ–ฅ์ƒ์„ ๊ฐ€์ ธ์˜ฌ ์ˆ˜ ์žˆ์Œ์„ ์ฐธ๊ณ ํ•˜์„ธ์š”.
์ปดํŒŒ์ผ์€ ์™„๋ฃŒํ•˜๋Š” ๋ฐ ์•ฝ๊ฐ„์˜ ์‹œ๊ฐ„์ด ๊ฑธ๋ฆฌ๋ฏ€๋กœ, ํŒŒ์ดํ”„๋ผ์ธ์„ ํ•œ ๋ฒˆ ์ค€๋น„ํ•œ ๋‹ค์Œ ๋™์ผํ•œ ์œ ํ˜•์˜ ์ถ”๋ก  ์ž‘์—…์„ ์—ฌ๋Ÿฌ ๋ฒˆ ์ˆ˜ํ–‰ํ•ด์•ผ ํ•˜๋Š” ์ƒํ™ฉ์— ๊ฐ€์žฅ ์ ํ•ฉํ•ฉ๋‹ˆ๋‹ค. ๋‹ค๋ฅธ ์ด๋ฏธ์ง€ ํฌ๊ธฐ์—์„œ ์ปดํŒŒ์ผ๋œ ํŒŒ์ดํ”„๋ผ์ธ์„ ํ˜ธ์ถœํ•˜๋ฉด ์‹œ๊ฐ„์  ๋น„์šฉ์ด ๋งŽ์ด ๋“ค ์ˆ˜ ์žˆ๋Š” ์ปดํŒŒ์ผ ์ž‘์—…์ด ๋‹ค์‹œ ํŠธ๋ฆฌ๊ฑฐ๋ฉ๋‹ˆ๋‹ค.
## ๋ฒค์น˜๋งˆํฌ
PyTorch 2.0์˜ ํšจ์œจ์ ์ธ ์–ดํ…์…˜ ๊ตฌํ˜„๊ณผ `torch.compile`์„ ์‚ฌ์šฉํ•˜์—ฌ ๊ฐ€์žฅ ๋งŽ์ด ์‚ฌ์šฉ๋˜๋Š” 5๊ฐœ์˜ ํŒŒ์ดํ”„๋ผ์ธ์— ๋Œ€ํ•ด ๋‹ค์–‘ํ•œ GPU์™€ ๋ฐฐ์น˜ ํฌ๊ธฐ์— ๊ฑธ์ณ ํฌ๊ด„์ ์ธ ๋ฒค์น˜๋งˆํฌ๋ฅผ ์ˆ˜ํ–‰ํ–ˆ์Šต๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์„œ๋Š” [`torch.compile()`์ด ์ตœ์ ์œผ๋กœ ํ™œ์šฉ๋˜๋„๋ก ํ•˜๋Š”](https://github.com/huggingface/diffusers/pull/3313) `diffusers 0.17.0.dev0`์„ ์‚ฌ์šฉํ–ˆ์Šต๋‹ˆ๋‹ค.
### ๋ฒค์น˜๋งˆํ‚น ์ฝ”๋“œ
#### Stable Diffusion text-to-image
```python
from diffusers import DiffusionPipeline
import torch
path = "runwayml/stable-diffusion-v1-5"
run_compile = True # Set True / False
pipe = DiffusionPipeline.from_pretrained(path, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
pipe.unet.to(memory_format=torch.channels_last)
if run_compile:
print("Run torch compile")
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
prompt = "ghibli style, a fantasy landscape with castles"
for _ in range(3):
images = pipe(prompt=prompt).images
```
#### Stable Diffusion image-to-image
```python
from diffusers import StableDiffusionImg2ImgPipeline
import requests
import torch
from PIL import Image
from io import BytesIO
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((512, 512))
path = "runwayml/stable-diffusion-v1-5"
run_compile = True # Set True / False
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(path, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
pipe.unet.to(memory_format=torch.channels_last)
if run_compile:
print("Run torch compile")
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
prompt = "ghibli style, a fantasy landscape with castles"
for _ in range(3):
image = pipe(prompt=prompt, image=init_image).images[0]
```
#### Stable Diffusion - inpainting
```python
from diffusers import StableDiffusionInpaintPipeline
import requests
import torch
from PIL import Image
from io import BytesIO
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
def download_image(url):
response = requests.get(url)
return Image.open(BytesIO(response.content)).convert("RGB")
img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
init_image = download_image(img_url).resize((512, 512))
mask_image = download_image(mask_url).resize((512, 512))
path = "runwayml/stable-diffusion-inpainting"
run_compile = True # Set True / False
pipe = StableDiffusionInpaintPipeline.from_pretrained(path, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
pipe.unet.to(memory_format=torch.channels_last)
if run_compile:
print("Run torch compile")
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
prompt = "ghibli style, a fantasy landscape with castles"
for _ in range(3):
image = pipe(prompt=prompt, image=init_image, mask_image=mask_image).images[0]
```
#### ControlNet
```python
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
import requests
import torch
from PIL import Image
from io import BytesIO
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((512, 512))
path = "runwayml/stable-diffusion-v1-5"
run_compile = True # Set True / False
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
path, controlnet=controlnet, torch_dtype=torch.float16
)
pipe = pipe.to("cuda")
pipe.unet.to(memory_format=torch.channels_last)
pipe.controlnet.to(memory_format=torch.channels_last)
if run_compile:
print("Run torch compile")
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
pipe.controlnet = torch.compile(pipe.controlnet, mode="reduce-overhead", fullgraph=True)
prompt = "ghibli style, a fantasy landscape with castles"
for _ in range(3):
image = pipe(prompt=prompt, image=init_image).images[0]
```
#### IF text-to-image + upscaling
```python
from diffusers import DiffusionPipeline
import torch
run_compile = True # Set True / False
pipe = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-M-v1.0", variant="fp16", text_encoder=None, torch_dtype=torch.float16)
pipe.to("cuda")
pipe_2 = DiffusionPipeline.from_pretrained("DeepFloyd/IF-II-M-v1.0", variant="fp16", text_encoder=None, torch_dtype=torch.float16)
pipe_2.to("cuda")
pipe_3 = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16)
pipe_3.to("cuda")
pipe.unet.to(memory_format=torch.channels_last)
pipe_2.unet.to(memory_format=torch.channels_last)
pipe_3.unet.to(memory_format=torch.channels_last)
if run_compile:
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
pipe_2.unet = torch.compile(pipe_2.unet, mode="reduce-overhead", fullgraph=True)
pipe_3.unet = torch.compile(pipe_3.unet, mode="reduce-overhead", fullgraph=True)
prompt = "the blue hulk"
prompt_embeds = torch.randn((1, 2, 4096), dtype=torch.float16)
neg_prompt_embeds = torch.randn((1, 2, 4096), dtype=torch.float16)
for _ in range(3):
image = pipe(prompt_embeds=prompt_embeds, negative_prompt_embeds=neg_prompt_embeds, output_type="pt").images
image_2 = pipe_2(image=image, prompt_embeds=prompt_embeds, negative_prompt_embeds=neg_prompt_embeds, output_type="pt").images
image_3 = pipe_3(prompt=prompt, image=image, noise_level=100).images
```
PyTorch 2.0 ๋ฐ `torch.compile()`๋กœ ์–ป์„ ์ˆ˜ ์žˆ๋Š” ๊ฐ€๋Šฅํ•œ ์†๋„ ํ–ฅ์ƒ์— ๋Œ€ํ•ด, [Stable Diffusion text-to-image pipeline](StableDiffusionPipeline)์— ๋Œ€ํ•œ ์ƒ๋Œ€์ ์ธ ์†๋„ ํ–ฅ์ƒ์„ ๋ณด์—ฌ์ฃผ๋Š” ์ฐจํŠธ๋ฅผ 5๊ฐœ์˜ ์„œ๋กœ ๋‹ค๋ฅธ GPU ์ œํ’ˆ๊ตฐ(๋ฐฐ์น˜ ํฌ๊ธฐ 4)์— ๋Œ€ํ•ด ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค:
![t2i_speedup](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/pt2_benchmarks/t2i_speedup.png)
To give you an even better idea of how this speed-up holds for the other pipelines presented above, consider the following plot that shows the benchmarking numbers from an A100 across three different batch sizes (with PyTorch 2.0 nightly and `torch.compile()`):
![a100_numbers](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/pt2_benchmarks/a100_numbers.png)
_(์œ„ ์ฐจํŠธ์˜ ๋ฒค์น˜๋งˆํฌ ๋ฉ”ํŠธ๋ฆญ์€ **์ดˆ๋‹น iteration ์ˆ˜(iterations/second)**์ž…๋‹ˆ๋‹ค)_
๊ทธ๋Ÿฌ๋‚˜ ํˆฌ๋ช…์„ฑ์„ ์œ„ํ•ด ๋ชจ๋“  ๋ฒค์น˜๋งˆํ‚น ์ˆ˜์น˜๋ฅผ ๊ณต๊ฐœํ•ฉ๋‹ˆ๋‹ค!
๋‹ค์Œ ํ‘œ๋“ค์—์„œ๋Š”, **_์ดˆ๋‹น ์ฒ˜๋ฆฌ๋˜๋Š” iteration_** ์ˆ˜ ์ธก๋ฉด์—์„œ์˜ ๊ฒฐ๊ณผ๋ฅผ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค.
### A100 (batch size: 1)
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|:---:|:---:|:---:|:---:|:---:|
| SD - txt2img | 21.66 | 23.13 | 44.03 | 49.74 |
| SD - img2img | 21.81 | 22.40 | 43.92 | 46.32 |
| SD - inpaint | 22.24 | 23.23 | 43.76 | 49.25 |
| SD - controlnet | 15.02 | 15.82 | 32.13 | 36.08 |
| IF | 20.21 / <br>13.84 / <br>24.00 | 20.12 / <br>13.70 / <br>24.03 | โŒ | 97.34 / <br>27.23 / <br>111.66 |
### A100 (batch size: 4)
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|:---:|:---:|:---:|:---:|:---:|
| SD - txt2img | 11.6 | 13.12 | 14.62 | 17.27 |
| SD - img2img | 11.47 | 13.06 | 14.66 | 17.25 |
| SD - inpaint | 11.67 | 13.31 | 14.88 | 17.48 |
| SD - controlnet | 8.28 | 9.38 | 10.51 | 12.41 |
| IF | 25.02 | 18.04 | โŒ | 48.47 |
### A100 (batch size: 16)
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|:---:|:---:|:---:|:---:|:---:|
| SD - txt2img | 3.04 | 3.6 | 3.83 | 4.68 |
| SD - img2img | 2.98 | 3.58 | 3.83 | 4.67 |
| SD - inpaint | 3.04 | 3.66 | 3.9 | 4.76 |
| SD - controlnet | 2.15 | 2.58 | 2.74 | 3.35 |
| IF | 8.78 | 9.82 | โŒ | 16.77 |
### V100 (batch size: 1)
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|:---:|:---:|:---:|:---:|:---:|
| SD - txt2img | 18.99 | 19.14 | 20.95 | 22.17 |
| SD - img2img | 18.56 | 19.18 | 20.95 | 22.11 |
| SD - inpaint | 19.14 | 19.06 | 21.08 | 22.20 |
| SD - controlnet | 13.48 | 13.93 | 15.18 | 15.88 |
| IF | 20.01 / <br>9.08 / <br>23.34 | 19.79 / <br>8.98 / <br>24.10 | โŒ | 55.75 / <br>11.57 / <br>57.67 |
### V100 (batch size: 4)
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|:---:|:---:|:---:|:---:|:---:|
| SD - txt2img | 5.96 | 5.89 | 6.83 | 6.86 |
| SD - img2img | 5.90 | 5.91 | 6.81 | 6.82 |
| SD - inpaint | 5.99 | 6.03 | 6.93 | 6.95 |
| SD - controlnet | 4.26 | 4.29 | 4.92 | 4.93 |
| IF | 15.41 | 14.76 | โŒ | 22.95 |
### V100 (batch size: 16)
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|:---:|:---:|:---:|:---:|:---:|
| SD - txt2img | 1.66 | 1.66 | 1.92 | 1.90 |
| SD - img2img | 1.65 | 1.65 | 1.91 | 1.89 |
| SD - inpaint | 1.69 | 1.69 | 1.95 | 1.93 |
| SD - controlnet | 1.19 | 1.19 | OOM after warmup | 1.36 |
| IF | 5.43 | 5.29 | โŒ | 7.06 |
### T4 (batch size: 1)
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|:---:|:---:|:---:|:---:|:---:|
| SD - txt2img | 6.9 | 6.95 | 7.3 | 7.56 |
| SD - img2img | 6.84 | 6.99 | 7.04 | 7.55 |
| SD - inpaint | 6.91 | 6.7 | 7.01 | 7.37 |
| SD - controlnet | 4.89 | 4.86 | 5.35 | 5.48 |
| IF | 17.42 / <br>2.47 / <br>18.52 | 16.96 / <br>2.45 / <br>18.69 | โŒ | 24.63 / <br>2.47 / <br>23.39 |
### T4 (batch size: 4)
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|:---:|:---:|:---:|:---:|:---:|
| SD - txt2img | 1.79 | 1.79 | 2.03 | 1.99 |
| SD - img2img | 1.77 | 1.77 | 2.05 | 2.04 |
| SD - inpaint | 1.81 | 1.82 | 2.09 | 2.09 |
| SD - controlnet | 1.34 | 1.27 | 1.47 | 1.46 |
| IF | 5.79 | 5.61 | โŒ | 7.39 |
### T4 (batch size: 16)
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|:---:|:---:|:---:|:---:|:---:|
| SD - txt2img | 2.34s | 2.30s | OOM after 2nd iteration | 1.99s |
| SD - img2img | 2.35s | 2.31s | OOM after warmup | 2.00s |
| SD - inpaint | 2.30s | 2.26s | OOM after 2nd iteration | 1.95s |
| SD - controlnet | OOM after 2nd iteration | OOM after 2nd iteration | OOM after warmup | OOM after warmup |
| IF * | 1.44 | 1.44 | โŒ | 1.94 |
### RTX 3090 (batch size: 1)
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|:---:|:---:|:---:|:---:|:---:|
| SD - txt2img | 22.56 | 22.84 | 23.84 | 25.69 |
| SD - img2img | 22.25 | 22.61 | 24.1 | 25.83 |
| SD - inpaint | 22.22 | 22.54 | 24.26 | 26.02 |
| SD - controlnet | 16.03 | 16.33 | 17.38 | 18.56 |
| IF | 27.08 / <br>9.07 / <br>31.23 | 26.75 / <br>8.92 / <br>31.47 | โŒ | 68.08 / <br>11.16 / <br>65.29 |
### RTX 3090 (batch size: 4)
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|:---:|:---:|:---:|:---:|:---:|
| SD - txt2img | 6.46 | 6.35 | 7.29 | 7.3 |
| SD - img2img | 6.33 | 6.27 | 7.31 | 7.26 |
| SD - inpaint | 6.47 | 6.4 | 7.44 | 7.39 |
| SD - controlnet | 4.59 | 4.54 | 5.27 | 5.26 |
| IF | 16.81 | 16.62 | โŒ | 21.57 |
### RTX 3090 (batch size: 16)
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|:---:|:---:|:---:|:---:|:---:|
| SD - txt2img | 1.7 | 1.69 | 1.93 | 1.91 |
| SD - img2img | 1.68 | 1.67 | 1.93 | 1.9 |
| SD - inpaint | 1.72 | 1.71 | 1.97 | 1.94 |
| SD - controlnet | 1.23 | 1.22 | 1.4 | 1.38 |
| IF | 5.01 | 5.00 | โŒ | 6.33 |
### RTX 4090 (batch size: 1)
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|:---:|:---:|:---:|:---:|:---:|
| SD - txt2img | 40.5 | 41.89 | 44.65 | 49.81 |
| SD - img2img | 40.39 | 41.95 | 44.46 | 49.8 |
| SD - inpaint | 40.51 | 41.88 | 44.58 | 49.72 |
| SD - controlnet | 29.27 | 30.29 | 32.26 | 36.03 |
| IF | 69.71 / <br>18.78 / <br>85.49 | 69.13 / <br>18.80 / <br>85.56 | โŒ | 124.60 / <br>26.37 / <br>138.79 |
### RTX 4090 (batch size: 4)
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|:---:|:---:|:---:|:---:|:---:|
| SD - txt2img | 12.62 | 12.84 | 15.32 | 15.59 |
| SD - img2img | 12.61 | 12.79 | 15.35 | 15.66 |
| SD - inpaint | 12.65 | 12.81 | 15.3 | 15.58 |
| SD - controlnet | 9.1 | 9.25 | 11.03 | 11.22 |
| IF | 31.88 | 31.14 | โŒ | 43.92 |
### RTX 4090 (batch size: 16)
| **Pipeline** | **torch 2.0 - <br>no compile** | **torch nightly - <br>no compile** | **torch 2.0 - <br>compile** | **torch nightly - <br>compile** |
|:---:|:---:|:---:|:---:|:---:|
| SD - txt2img | 3.17 | 3.2 | 3.84 | 3.85 |
| SD - img2img | 3.16 | 3.2 | 3.84 | 3.85 |
| SD - inpaint | 3.17 | 3.2 | 3.85 | 3.85 |
| SD - controlnet | 2.23 | 2.3 | 2.7 | 2.75 |
| IF | 9.26 | 9.2 | โŒ | 13.31 |
## ์ฐธ๊ณ 
* Follow [this PR](https://github.com/huggingface/diffusers/pull/3313) for more details on the environment used for conducting the benchmarks.
* For the IF pipeline and batch sizes > 1, we only used a batch size of >1 in the first IF pipeline for text-to-image generation and NOT for upscaling. So, that means the two upscaling pipelines received a batch size of 1.
*Thanks to [Horace He](https://github.com/Chillee) from the PyTorch team for their support in improving our support of `torch.compile()` in Diffusers.*
* ๋ฒค์น˜๋งˆํฌ ์ˆ˜ํ–‰์— ์‚ฌ์šฉ๋œ ํ™˜๊ฒฝ์— ๋Œ€ํ•œ ์ž์„ธํ•œ ๋‚ด์šฉ์€ [์ด PR](https://github.com/huggingface/diffusers/pull/3313)์„ ์ฐธ์กฐํ•˜์„ธ์š”.
* IF ํŒŒ์ดํ”„๋ผ์ธ์™€ ๋ฐฐ์น˜ ํฌ๊ธฐ > 1์˜ ๊ฒฝ์šฐ ์ฒซ ๋ฒˆ์งธ IF ํŒŒ์ดํ”„๋ผ์ธ์—์„œ text-to-image ์ƒ์„ฑ์„ ์œ„ํ•œ ๋ฐฐ์น˜ ํฌ๊ธฐ > 1๋งŒ ์‚ฌ์šฉํ–ˆ์œผ๋ฉฐ ์—…์Šค์ผ€์ผ๋ง์—๋Š” ์‚ฌ์šฉํ•˜์ง€ ์•Š์•˜์Šต๋‹ˆ๋‹ค. ์ฆ‰, ๋‘ ๊ฐœ์˜ ์—…์Šค์ผ€์ผ๋ง ํŒŒ์ดํ”„๋ผ์ธ์ด ๋ฐฐ์น˜ ํฌ๊ธฐ 1์ž„์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค.
*Diffusers์—์„œ `torch.compile()` ์ง€์›์„ ๊ฐœ์„ ํ•˜๋Š” ๋ฐ ๋„์›€์„ ์ค€ PyTorch ํŒ€์˜ [Horace He](https://github.com/Chillee)์—๊ฒŒ ๊ฐ์‚ฌ๋“œ๋ฆฝ๋‹ˆ๋‹ค.*


@@ -0,0 +1,54 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# ์ƒˆ๋กœ์šด ์ž‘์—…์— ๋Œ€ํ•œ ๋ชจ๋ธ์„ ์ ์šฉํ•˜๊ธฐ
๋งŽ์€ diffusion ์‹œ์Šคํ…œ์€ ๊ฐ™์€ ๊ตฌ์„ฑ ์š”์†Œ๋“ค์„ ๊ณต์œ ํ•˜๋ฏ€๋กœ ํ•œ ์ž‘์—…์— ๋Œ€ํ•ด ์‚ฌ์ „ํ•™์Šต๋œ ๋ชจ๋ธ์„ ์™„์ „ํžˆ ๋‹ค๋ฅธ ์ž‘์—…์— ์ ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
์ด ์ธํŽ˜์ธํŒ…์„ ์œ„ํ•œ ๊ฐ€์ด๋“œ๋Š” ์‚ฌ์ „ํ•™์Šต๋œ [`UNet2DConditionModel`]์˜ ์•„ํ‚คํ…์ฒ˜๋ฅผ ์ดˆ๊ธฐํ™”ํ•˜๊ณ  ์ˆ˜์ •ํ•˜์—ฌ ์‚ฌ์ „ํ•™์Šต๋œ text-to-image ๋ชจ๋ธ์„ ์–ด๋–ป๊ฒŒ ์ธํŽ˜์ธํŒ…์— ์ ์šฉํ•˜๋Š”์ง€๋ฅผ ์•Œ๋ ค์ค„ ๊ฒƒ์ž…๋‹ˆ๋‹ค.
## UNet2DConditionModel ํŒŒ๋ผ๋ฏธํ„ฐ ๊ตฌ์„ฑ
[`UNet2DConditionModel`]์€ [input sample](https://huggingface.co/docs/diffusers/v0.16.0/en/api/models#diffusers.UNet2DConditionModel.in_channels)์—์„œ 4๊ฐœ์˜ ์ฑ„๋„์„ ๊ธฐ๋ณธ์ ์œผ๋กœ ํ—ˆ์šฉํ•ฉ๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5)์™€ ๊ฐ™์€ ์‚ฌ์ „ํ•™์Šต๋œ text-to-image ๋ชจ๋ธ์„ ๋ถˆ๋Ÿฌ์˜ค๊ณ  `in_channels`์˜ ์ˆ˜๋ฅผ ํ™•์ธํ•ฉ๋‹ˆ๋‹ค:
```py
from diffusers import StableDiffusionPipeline
pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipeline.unet.config["in_channels"]
4
```
์ธํŽ˜์ธํŒ…์€ ์ž…๋ ฅ ์ƒ˜ํ”Œ์— 9๊ฐœ์˜ ์ฑ„๋„์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. [`runwayml/stable-diffusion-inpainting`](https://huggingface.co/runwayml/stable-diffusion-inpainting)์™€ ๊ฐ™์€ ์‚ฌ์ „ํ•™์Šต๋œ ์ธํŽ˜์ธํŒ… ๋ชจ๋ธ์—์„œ ์ด ๊ฐ’์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:
```py
from diffusers import StableDiffusionPipeline
pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-inpainting")
pipeline.unet.config["in_channels"]
9
```
์ธํŽ˜์ธํŒ…์— ๋Œ€ํ•œ text-to-image ๋ชจ๋ธ์„ ์ ์šฉํ•˜๊ธฐ ์œ„ํ•ด, `in_channels` ์ˆ˜๋ฅผ 4์—์„œ 9๋กœ ์ˆ˜์ •ํ•ด์•ผ ํ•  ๊ฒƒ์ž…๋‹ˆ๋‹ค.
์‚ฌ์ „ํ•™์Šต๋œ text-to-image ๋ชจ๋ธ์˜ ๊ฐ€์ค‘์น˜์™€ [`UNet2DConditionModel`]์„ ์ดˆ๊ธฐํ™”ํ•˜๊ณ  `in_channels`๋ฅผ 9๋กœ ์ˆ˜์ •ํ•ด ์ฃผ์„ธ์š”. `in_channels`์˜ ์ˆ˜๋ฅผ ์ˆ˜์ •ํ•˜๋ฉด ํฌ๊ธฐ๊ฐ€ ๋‹ฌ๋ผ์ง€๊ธฐ ๋•Œ๋ฌธ์— ํฌ๊ธฐ๊ฐ€ ์•ˆ ๋งž๋Š” ์˜ค๋ฅ˜๋ฅผ ํ”ผํ•˜๊ธฐ ์œ„ํ•ด `ignore_mismatched_sizes=True` ๋ฐ `low_cpu_mem_usage=False`๋ฅผ ์„ค์ •ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.
```py
from diffusers import UNet2DConditionModel
model_id = "runwayml/stable-diffusion-v1-5"
unet = UNet2DConditionModel.from_pretrained(
model_id, subfolder="unet", in_channels=9, low_cpu_mem_usage=False, ignore_mismatched_sizes=True
)
```
Text-to-image ๋ชจ๋ธ๋กœ๋ถ€ํ„ฐ ๋‹ค๋ฅธ ๊ตฌ์„ฑ ์š”์†Œ์˜ ์‚ฌ์ „ํ•™์Šต๋œ ๊ฐ€์ค‘์น˜๋Š” ์ฒดํฌํฌ์ธํŠธ๋กœ๋ถ€ํ„ฐ ์ดˆ๊ธฐํ™”๋˜์ง€๋งŒ `unet`์˜ ์ž…๋ ฅ ์ฑ„๋„ ๊ฐ€์ค‘์น˜ (`conv_in.weight`)๋Š” ๋žœ๋คํ•˜๊ฒŒ ์ดˆ๊ธฐํ™”๋ฉ๋‹ˆ๋‹ค. ๊ทธ๋ ‡์ง€ ์•Š์œผ๋ฉด ๋ชจ๋ธ์ด ๋…ธ์ด์ฆˆ๋ฅผ ๋ฆฌํ„ดํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์ธํŽ˜์ธํŒ…์˜ ๋ชจ๋ธ์„ ํŒŒ์ธํŠœ๋‹ ํ•  ๋•Œ ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค.

```py
from diffusers import DiffusionPipeline, UNet2DConditionModel
from transformers import CLIPTextModel
import torch

# ํ•™์Šต์— ์‚ฌ์šฉ๋œ ๊ฒƒ๊ณผ ๋™์ผํ•œ ์ธ์ˆ˜(model, revision)๋กœ ํŒŒ์ดํ”„๋ผ์ธ์„ ๋ถˆ๋Ÿฌ์˜ต๋‹ˆ๋‹ค.
model_id = "CompVis/stable-diffusion-v1-4"
unet = UNet2DConditionModel.from_pretrained("/sddata/dreambooth/daruma-v2-1/checkpoint-100/unet")
```
**`"accelerate<0.16.0"`**๊ฐ€ ์„ค์น˜๋˜์–ด ์žˆ๋Š” ๊ฒฝ์šฐ, ๋จผ์ € ์ถ”๋ก  ํŒŒ์ดํ”„๋ผ์ธ์œผ๋กœ ๋ณ€ํ™˜ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค:
```py
from accelerate import Accelerator
from diffusers import DiffusionPipeline

# ํ•™์Šต์— ์‚ฌ์šฉ๋œ ๊ฒƒ๊ณผ ๋™์ผํ•œ ์ธ์ˆ˜(model, revision)๋กœ ํŒŒ์ดํ”„๋ผ์ธ์„ ๋ถˆ๋Ÿฌ์˜ต๋‹ˆ๋‹ค.
model_id = "CompVis/stable-diffusion-v1-4"
pipeline = DiffusionPipeline.from_pretrained(model_id)
```

```py
>>> pipe = StableDiffusionPipeline.from_pretrained(model_base, torch_dtype=torch.float16)
```
*๊ธฐ๋ณธ ๋ชจ๋ธ์˜ ๊ฐ€์ค‘์น˜ ์œ„์—* ํŒŒ์ธํŠœ๋‹๋œ DreamBooth ๋ชจ๋ธ์—์„œ LoRA ๊ฐ€์ค‘์น˜๋ฅผ ๋กœ๋“œํ•œ ๋‹ค์Œ, ๋” ๋น ๋ฅธ ์ถ”๋ก ์„ ์œ„ํ•ด ํŒŒ์ดํ”„๋ผ์ธ์„ GPU๋กœ ์ด๋™ํ•ฉ๋‹ˆ๋‹ค. LoRA ๊ฐ€์ค‘์น˜๋ฅผ ํ”„๋ฆฌ์ง•๋œ ์‚ฌ์ „ ํ›ˆ๋ จ๋œ ๋ชจ๋ธ ๊ฐ€์ค‘์น˜์™€ ๋ณ‘ํ•ฉํ•  ๋•Œ, ์„ ํƒ์ ์œผ๋กœ 'scale' ๋งค๊ฐœ๋ณ€์ˆ˜๋กœ ์–ด๋А ์ •๋„์˜ ๊ฐ€์ค‘์น˜๋ฅผ ๋ณ‘ํ•ฉํ•  ์ง€ ์กฐ์ ˆํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:
*๊ธฐ๋ณธ ๋ชจ๋ธ์˜ ๊ฐ€์ค‘์น˜ ์œ„์—* ํŒŒ์ธํŠœ๋‹๋œ DreamBooth ๋ชจ๋ธ์—์„œ LoRA ๊ฐ€์ค‘์น˜๋ฅผ ๋ถˆ๋Ÿฌ์˜จ ๋‹ค์Œ, ๋” ๋น ๋ฅธ ์ถ”๋ก ์„ ์œ„ํ•ด ํŒŒ์ดํ”„๋ผ์ธ์„ GPU๋กœ ์ด๋™ํ•ฉ๋‹ˆ๋‹ค. LoRA ๊ฐ€์ค‘์น˜๋ฅผ ํ”„๋ฆฌ์ง•๋œ ์‚ฌ์ „ ํ›ˆ๋ จ๋œ ๋ชจ๋ธ ๊ฐ€์ค‘์น˜์™€ ๋ณ‘ํ•ฉํ•  ๋•Œ, ์„ ํƒ์ ์œผ๋กœ 'scale' ๋งค๊ฐœ๋ณ€์ˆ˜๋กœ ์–ด๋А ์ •๋„์˜ ๊ฐ€์ค‘์น˜๋ฅผ ๋ณ‘ํ•ฉํ•  ์ง€ ์กฐ์ ˆํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:
# ๐Ÿงจ Diffusers ํ•™์Šต ์˜ˆ์‹œ
์ด๋ฒˆ ์ฑ•ํ„ฐ์—์„œ๋Š” ๋‹ค์–‘ํ•œ ์œ ์ฆˆ์ผ€์ด์Šค์— ๋Œ€ํ•œ ์˜ˆ์ œ ์ฝ”๋“œ๋ฅผ ํ†ตํ•ด `diffusers` ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ํšจ๊ณผ์ ์œผ๋กœ ์‚ฌ์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์•Œ์•„๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.
**Note**: ํ˜น์‹œ ๊ณต์‹ ์˜ˆ์‹œ ์ฝ”๋“œ๋ฅผ ์ฐพ๊ณ  ์žˆ๋‹ค๋ฉด, [์—ฌ๊ธฐ](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines)๋ฅผ ์ฐธ๊ณ ํ•ด๋ณด์„ธ์š”!
์—ฌ๊ธฐ์„œ ๋‹ค๋ฃฐ ์˜ˆ์‹œ๋“ค์€ ๋‹ค์Œ์„ ์ง€ํ–ฅํ•ฉ๋‹ˆ๋‹ค.
- **์†์‰ฌ์šด ๋””ํŽœ๋˜์‹œ ์„ค์น˜** (Self-contained) : ์—ฌ๊ธฐ์„œ ์‚ฌ์šฉ๋  ์˜ˆ์‹œ ์ฝ”๋“œ๋“ค์˜ ๋””ํŽœ๋˜์‹œ ํŒจํ‚ค์ง€๋“ค์€ ์ „๋ถ€ `pip install` ๋ช…๋ น์–ด๋ฅผ ํ†ตํ•ด ์„ค์น˜ ๊ฐ€๋Šฅํ•œ ํŒจํ‚ค์ง€๋“ค์ž…๋‹ˆ๋‹ค. ๋˜ํ•œ ์นœ์ ˆํ•˜๊ฒŒ `requirements.txt` ํŒŒ์ผ์— ํ•ด๋‹น ํŒจํ‚ค์ง€๋“ค์ด ๋ช…์‹œ๋˜์–ด ์žˆ์–ด, `pip install -r requirements.txt`๋กœ ๊ฐ„ํŽธํ•˜๊ฒŒ ํ•ด๋‹น ๋””ํŽœ๋˜์‹œ๋“ค์„ ์„ค์น˜ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์˜ˆ์‹œ: [train_unconditional.py](https://github.com/huggingface/diffusers/blob/main/examples/unconditional_image_generation/train_unconditional.py), [requirements.txt](https://github.com/huggingface/diffusers/blob/main/examples/unconditional_image_generation/requirements.txt)
- **์†์‰ฌ์šด ์ˆ˜์ •** (Easy-to-tweak): ์ €ํฌ๋Š” ๊ฐ€๋Šฅํ•œ ํ•œ ๋งŽ์€ ์œ ์ฆˆ์ผ€์ด์Šค๋ฅผ ์ œ๊ณตํ•˜๊ณ ์ž ํ•ฉ๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ ์˜ˆ์‹œ๋Š” ๊ฒฐ๊ตญ ๊ทธ์ € ์˜ˆ์‹œ๋ผ๋Š” ์ ์„ ๊ธฐ์–ตํ•ด์ฃผ์„ธ์š”. ์—ฌ๊ธฐ์„œ ์ œ๊ณต๋˜๋Š” ์˜ˆ์‹œ ์ฝ”๋“œ๋ฅผ ๋‹จ์ˆœํžˆ ๋ณต์‚ฌ-๋ถ™์—ฌ๋„ฃ๊ธฐํ•˜๋Š” ๊ฒƒ๋งŒ์œผ๋กœ๋Š” ์—ฌ๋Ÿฌ๋ถ„์ด ๋งˆ์ฃผํ•œ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•  ์ˆ˜ ์—†์„ ๊ฒƒ์ž…๋‹ˆ๋‹ค. ๋‹ค์‹œ ๋งํ•ด, ์–ด๋А ์ •๋„๋Š” ์—ฌ๋Ÿฌ๋ถ„์˜ ์ƒํ™ฉ๊ณผ ๋‹ˆ์ฆˆ์— ๋งž์ถฐ ์ฝ”๋“œ๋ฅผ ๊ณ ์ณ ๋‚˜๊ฐ€์•ผ ํ•  ๊ฒƒ์ž…๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ๋Œ€๋ถ€๋ถ„์˜ ํ•™์Šต ์˜ˆ์‹œ๋Š” ๋ฐ์ดํ„ฐ ์ „์ฒ˜๋ฆฌ ๊ณผ์ •๊ณผ ํ•™์Šต ๊ณผ์ •์— ๋Œ€ํ•œ ์ฝ”๋“œ๋ฅผ ํ•จ๊ป˜ ์ œ๊ณตํ•จ์œผ๋กœ์จ, ์‚ฌ์šฉ์ž๊ฐ€ ๋‹ˆ์ฆˆ์— ๋งž๊ฒŒ ์†์‰ฝ๊ฒŒ ์ˆ˜์ •ํ•  ์ˆ˜ ์žˆ๋„๋ก ๋•๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.
- **์ž…๋ฌธ์ž ์นœํ™”์ ์ธ** (Beginner-friendly) : ์ด๋ฒˆ ์ฑ•ํ„ฐ๋Š” diffusion ๋ชจ๋ธ๊ณผ `diffusers` ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์— ๋Œ€ํ•œ ์ „๋ฐ˜์ ์ธ ์ดํ•ด๋ฅผ ๋•๊ธฐ ์œ„ํ•ด ์ž‘์„ฑ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ diffusion ๋ชจ๋ธ์— ๋Œ€ํ•œ ์ตœ์‹  SOTA (state-of-the-art) ๋ฐฉ๋ฒ•๋ก ๋“ค ๊ฐ€์šด๋ฐ์„œ๋„, ์ž…๋ฌธ์ž์—๊ฒŒ๋Š” ๋งŽ์ด ์–ด๋ ค์šธ ์ˆ˜ ์žˆ๋‹ค๊ณ  ํŒ๋‹จ๋˜๋ฉด, ํ•ด๋‹น ๋ฐฉ๋ฒ•๋ก ๋“ค์€ ์—ฌ๊ธฐ์„œ ๋‹ค๋ฃจ์ง€ ์•Š์œผ๋ ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค.
- **ํ•˜๋‚˜์˜ ํƒœ์Šคํฌ๋งŒ ํฌํ•จํ•  ๊ฒƒ**(One-purpose-only): ์—ฌ๊ธฐ์„œ ๋‹ค๋ฃฐ ์˜ˆ์‹œ๋“ค์€ ํ•˜๋‚˜์˜ ํƒœ์Šคํฌ๋งŒ ํฌํ•จํ•˜๊ณ  ์žˆ์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ๋ฌผ๋ก  ์ด๋ฏธ์ง€ ์ดˆํ•ด์ƒํ™”(super-resolution)์™€ ์ด๋ฏธ์ง€ ๋ณด์ •(modification)๊ณผ ๊ฐ™์€ ์œ ์‚ฌํ•œ ๋ชจ๋ธ๋ง ํ”„๋กœ์„ธ์Šค๋ฅผ ๊ฐ–๋Š” ํƒœ์Šคํฌ๋“ค์ด ์กด์žฌํ•˜๊ฒ ์ง€๋งŒ, ํ•˜๋‚˜์˜ ์˜ˆ์ œ์— ํ•˜๋‚˜์˜ ํƒœ์Šคํฌ๋งŒ์„ ๋‹ด๋Š” ๊ฒƒ์ด ๋” ์ดํ•ดํ•˜๊ธฐ ์šฉ์ดํ•˜๋‹ค๊ณ  ํŒ๋‹จํ–ˆ๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค.
์ €ํฌ๋Š” diffusion ๋ชจ๋ธ์˜ ๋Œ€ํ‘œ์ ์ธ ํƒœ์Šคํฌ๋“ค์„ ๋‹ค๋ฃจ๋Š” ๊ณต์‹ ์˜ˆ์ œ๋ฅผ ์ œ๊ณตํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. *๊ณต์‹* ์˜ˆ์ œ๋Š” ํ˜„์žฌ ์ง„ํ–‰ํ˜•์œผ๋กœ `diffusers` ๊ด€๋ฆฌ์ž๋“ค(maintainers)์— ์˜ํ•ด ๊ด€๋ฆฌ๋˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ๋˜ํ•œ ์ €ํฌ๋Š” ์•ž์„œ ์ •์˜ํ•œ ์ €ํฌ์˜ ์ฒ ํ•™์„ ์—„๊ฒฉํ•˜๊ฒŒ ๋”ฐ๋ฅด๊ณ ์ž ๋…ธ๋ ฅํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ํ˜น์‹œ ์—ฌ๋Ÿฌ๋ถ„๊ป˜์„œ ์ด๋Ÿฌํ•œ ์˜ˆ์‹œ๊ฐ€ ๋ฐ˜๋“œ์‹œ ํ•„์š”ํ•˜๋‹ค๊ณ  ์ƒ๊ฐ๋˜์‹ ๋‹ค๋ฉด, ์–ธ์ œ๋“ ์ง€ [Feature Request](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feature_request.md&title=) ํ˜น์€ ์ง์ ‘ [Pull Request](https://github.com/huggingface/diffusers/compare)๋ฅผ ์ฃผ์‹œ๊ธฐ ๋ฐ”๋ž๋‹ˆ๋‹ค. ์ €ํฌ๋Š” ์–ธ์ œ๋‚˜ ํ™˜์˜์ž…๋‹ˆ๋‹ค!
ํ•™์Šต ์˜ˆ์‹œ๋“ค์€ ๋‹ค์–‘ํ•œ ํƒœ์Šคํฌ๋“ค์— ๋Œ€ํ•ด diffusion ๋ชจ๋ธ์„ ์‚ฌ์ „ํ•™์Šต(pretrain)ํ•˜๊ฑฐ๋‚˜ ํŒŒ์ธํŠœ๋‹(fine-tuning)ํ•˜๋Š” ๋ฒ•์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค. ํ˜„์žฌ ๋‹ค์Œ๊ณผ ๊ฐ™์€ ์˜ˆ์ œ๋“ค์„ ์ง€์›ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.
- [Unconditional Training](./unconditional_training)
- [Text-to-Image Training](./text2image)
- [Text Inversion](./text_inversion)
- [Dreambooth](./dreambooth)
memory-efficient attention ์—ฐ์‚ฐ์„ ์ˆ˜ํ–‰ํ•˜๊ธฐ ์œ„ํ•ด, ๊ฐ€๋Šฅํ•˜๋ฉด [xFormers](../optimization/xformers)๋ฅผ ์„ค์น˜ํ•ด์ฃผ์‹œ๊ธฐ ๋ฐ”๋ž๋‹ˆ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด ํ•™์Šต ์†๋„๋ฅผ ๋Š˜๋ฆฌ๊ณ  ๋ฉ”๋ชจ๋ฆฌ์— ๋Œ€ํ•œ ๋ถ€๋‹ด์„ ์ค„์ผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
| Task | ๐Ÿค— Accelerate | ๐Ÿค— Datasets | Colab
|---|---|:---:|:---:|
| [**Unconditional Image Generation**](./unconditional_training) | โœ… | โœ… | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb)
| [**Text-to-Image fine-tuning**](./text2image) | โœ… | โœ… |
| [**Textual Inversion**](./text_inversion) | โœ… | - | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb)
| [**Dreambooth**](./dreambooth) | โœ… | - | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_dreambooth_training.ipynb)
| [**Training with LoRA**](./lora) | โœ… | - | - |
| [**ControlNet**](./controlnet) | โœ… | โœ… | - |
| [**InstructPix2Pix**](./instructpix2pix) | โœ… | โœ… | - |
| [**Custom Diffusion**](./custom_diffusion) | โœ… | โœ… | - |
## ์ปค๋ฎค๋‹ˆํ‹ฐ
๊ณต์‹ ์˜ˆ์ œ ์™ธ์—๋„ **์ปค๋ฎค๋‹ˆํ‹ฐ ์˜ˆ์ œ** ์—ญ์‹œ ์ œ๊ณตํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ํ•ด๋‹น ์˜ˆ์ œ๋“ค์€ ์šฐ๋ฆฌ์˜ ์ปค๋ฎค๋‹ˆํ‹ฐ์— ์˜ํ•ด ๊ด€๋ฆฌ๋ฉ๋‹ˆ๋‹ค. ์ปค๋ฎค๋‹ˆํ‹ฐ ์˜ˆ์ฉจ๋Š” ํ•™์Šต ์˜ˆ์‹œ๋‚˜ ์ถ”๋ก  ํŒŒ์ดํ”„๋ผ์ธ์œผ๋กœ ๊ตฌ์„ฑ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ์ปค๋ฎค๋‹ˆํ‹ฐ ์˜ˆ์‹œ๋“ค์˜ ๊ฒฝ์šฐ, ์•ž์„œ ์ •์˜ํ–ˆ๋˜ ์ฒ ํ•™๋“ค์„ ์ข€ ๋” ๊ด€๋Œ€ํ•˜๊ฒŒ ์ ์šฉํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ๋˜ํ•œ ์ด๋Ÿฌํ•œ ์ปค๋ฎค๋‹ˆํ‹ฐ ์˜ˆ์‹œ๋“ค์˜ ๊ฒฝ์šฐ, ๋ชจ๋“  ์ด์Šˆ๋“ค์— ๋Œ€ํ•œ ์œ ์ง€๋ณด์ˆ˜๋ฅผ ๋ณด์žฅํ•  ์ˆ˜๋Š” ์—†์Šต๋‹ˆ๋‹ค.
์œ ์šฉํ•˜๊ธด ํ•˜์ง€๋งŒ, ์•„์ง์€ ๋Œ€์ค‘์ ์ด์ง€ ๋ชปํ•˜๊ฑฐ๋‚˜ ์ €ํฌ์˜ ์ฒ ํ•™์— ๋ถ€ํ•ฉํ•˜์ง€ ์•Š๋Š” ์˜ˆ์ œ๋“ค์€ [community examples](https://github.com/huggingface/diffusers/tree/main/examples/community) ํด๋”์— ๋‹ด๊ธฐ๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.
**Note**: ์ปค๋ฎค๋‹ˆํ‹ฐ ์˜ˆ์ œ๋Š” `diffusers`์— ๊ธฐ์—ฌ(contribution)๋ฅผ ํฌ๋งํ•˜๋Š” ๋ถ„๋“ค์—๊ฒŒ [์•„์ฃผ ์ข‹์€ ๊ธฐ์—ฌ ์ˆ˜๋‹จ](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22)์ด ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
## ์ฃผ๋ชฉํ•  ์‚ฌํ•ญ๋“ค
์ตœ์‹  ๋ฒ„์ „์˜ ์˜ˆ์‹œ ์ฝ”๋“œ๊ฐ€ ์ •์ƒ์ ์œผ๋กœ ๊ตฌ๋™๋˜๋„๋ก ํ•˜๋ ค๋ฉด, ๋ฐ˜๋“œ์‹œ **์†Œ์Šค์ฝ”๋“œ๋ฅผ ํ†ตํ•ด `diffusers`๋ฅผ ์„ค์น˜ํ•ด์•ผ ํ•˜๋ฉฐ,** ํ•ด๋‹น ์˜ˆ์‹œ ์ฝ”๋“œ๊ฐ€ ์š”๊ตฌํ•˜๋Š” ์˜์กด์„ฑ๋“ค๋„ ํ•จ๊ป˜ ์„ค์น˜ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ์ด๋ฅผ ์œ„ํ•ด ์ƒˆ๋กœ์šด ๊ฐ€์ƒ ํ™˜๊ฒฝ์„ ๊ตฌ์ถ•ํ•˜๊ณ  ๋‹ค์Œ ๋ช…๋ น์–ด๋ฅผ ์‹คํ–‰ํ•˜์„ธ์š”.
```bash
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install .
```
๊ทธ ๋‹ค์Œ `cd` ๋ช…๋ น์–ด๋ฅผ ํ†ตํ•ด ํ•ด๋‹น ์˜ˆ์ œ ๋””๋ ‰ํ† ๋ฆฌ์— ์ ‘๊ทผํ•ด์„œ ๋‹ค์Œ ๋ช…๋ น์–ด๋ฅผ ์‹คํ–‰ํ•˜๋ฉด ๋ฉ๋‹ˆ๋‹ค.
```bash
pip install -r requirements.txt
```

# Textual-Inversion
[[open-in-colab]]
[textual-inversion](https://arxiv.org/abs/2208.01618)์€ ์†Œ์ˆ˜์˜ ์˜ˆ์‹œ ์ด๋ฏธ์ง€์—์„œ ์ƒˆ๋กœ์šด ์ฝ˜์…‰ํŠธ๋ฅผ ํฌ์ฐฉํ•˜๋Š” ๊ธฐ๋ฒ•์ž…๋‹ˆ๋‹ค. ์ด ๊ธฐ์ˆ ์€ ์›๋ž˜ [Latent Diffusion](https://github.com/CompVis/latent-diffusion)์—์„œ ์‹œ์—ฐ๋˜์—ˆ์ง€๋งŒ, ์ดํ›„ [Stable Diffusion](https://huggingface.co/docs/diffusers/main/en/conceptual/stable_diffusion)๊ณผ ๊ฐ™์€ ์œ ์‚ฌํ•œ ๋‹ค๋ฅธ ๋ชจ๋ธ์—๋„ ์ ์šฉ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ํ•™์Šต๋œ ์ฝ˜์…‰ํŠธ๋Š” text-to-image ํŒŒ์ดํ”„๋ผ์ธ์—์„œ ์ƒ์„ฑ๋œ ์ด๋ฏธ์ง€๋ฅผ ๋” ์ž˜ ์ œ์–ดํ•˜๋Š” ๋ฐ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด ๋ชจ๋ธ์€ ํ…์ŠคํŠธ ์ธ์ฝ”๋”์˜ ์ž„๋ฒ ๋”ฉ ๊ณต๊ฐ„์—์„œ ์ƒˆ๋กœ์šด '๋‹จ์–ด'๋ฅผ ํ•™์Šตํ•˜์—ฌ ๊ฐœ์ธํ™”๋œ ์ด๋ฏธ์ง€ ์ƒ์„ฑ์„ ์œ„ํ•œ ํ…์ŠคํŠธ ํ”„๋กฌํ”„ํŠธ ๋‚ด์—์„œ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค.
![Textual Inversion example](https://textual-inversion.github.io/static/images/editing/colorful_teapot.JPG)
<small>By using just 3-5 images you can teach new concepts to a model such as Stable Diffusion for personalized image generation <a href="https://github.com/rinongal/textual_inversion">(image source)</a>.</small>
์ด ๊ฐ€์ด๋“œ์—์„œ๋Š” textual-inversion์œผ๋กœ [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) ๋ชจ๋ธ์„ ํ•™์Šตํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค. ์ด ๊ฐ€์ด๋“œ์—์„œ ์‚ฌ์šฉ๋œ ๋ชจ๋“  textual-inversion ํ•™์Šต ์Šคํฌ๋ฆฝํŠธ๋Š” [์—ฌ๊ธฐ](https://github.com/huggingface/diffusers/tree/main/examples/textual_inversion)์—์„œ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋‚ด๋ถ€์ ์œผ๋กœ ์–ด๋–ป๊ฒŒ ์ž‘๋™ํ•˜๋Š”์ง€ ์ž์„ธํžˆ ์‚ดํŽด๋ณด๊ณ  ์‹ถ์œผ์‹œ๋‹ค๋ฉด ํ•ด๋‹น ๋งํฌ๋ฅผ ์ฐธ์กฐํ•ด์ฃผ์‹œ๊ธฐ ๋ฐ”๋ž๋‹ˆ๋‹ค.
<Tip>
[Stable Diffusion Textual Inversion Concepts Library](https://huggingface.co/sd-concepts-library)์—๋Š” ์ปค๋ฎค๋‹ˆํ‹ฐ์—์„œ ์ œ์ž‘ํ•œ ํ•™์Šต๋œ textual-inversion ๋ชจ๋ธ๋“ค์ด ์žˆ์Šต๋‹ˆ๋‹ค. ์‹œ๊ฐ„์ด ์ง€๋‚จ์— ๋”ฐ๋ผ ๋” ๋งŽ์€ ์ฝ˜์…‰ํŠธ๋“ค์ด ์ถ”๊ฐ€๋˜์–ด ์œ ์šฉํ•œ ๋ฆฌ์†Œ์Šค๋กœ ์„ฑ์žฅํ•  ๊ฒƒ์ž…๋‹ˆ๋‹ค!
</Tip>
์‹œ์ž‘ํ•˜๊ธฐ ์ „์— ํ•™์Šต์„ ์œ„ํ•œ ์˜์กด์„ฑ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋“ค์„ ์„ค์น˜ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค:
```bash
pip install diffusers accelerate transformers
```
์˜์กด์„ฑ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋“ค์˜ ์„ค์น˜๊ฐ€ ์™„๋ฃŒ๋˜๋ฉด, [๐Ÿค—Accelerate](https://github.com/huggingface/accelerate/) ํ™˜๊ฒฝ์„ ์ดˆ๊ธฐํ™”์‹œํ‚ต๋‹ˆ๋‹ค.
```bash
accelerate config
```
๋ณ„๋„์˜ ์„ค์ •์—†์ด, ๊ธฐ๋ณธ ๐Ÿค—Accelerate ํ™˜๊ฒฝ์„ ์„ค์ •ํ•˜๋ ค๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์ด ํ•˜์„ธ์š”:
```bash
accelerate config default
```
๋˜๋Š” ์‚ฌ์šฉ ์ค‘์ธ ํ™˜๊ฒฝ์ด ๋…ธํŠธ๋ถ๊ณผ ๊ฐ™์€ ๋Œ€ํ™”ํ˜• ์…ธ์„ ์ง€์›ํ•˜์ง€ ์•Š๋Š”๋‹ค๋ฉด, ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:
```py
from accelerate.utils import write_basic_config
write_basic_config()
```
๋งˆ์ง€๋ง‰์œผ๋กœ, Memory-Efficient Attention์„ ํ†ตํ•ด ๋ฉ”๋ชจ๋ฆฌ ์‚ฌ์šฉ๋Ÿ‰์„ ์ค„์ด๊ธฐ ์œ„ํ•ด [xFormers](https://huggingface.co/docs/diffusers/main/en/training/optimization/xformers)๋ฅผ ์„ค์น˜ํ•ฉ๋‹ˆ๋‹ค. xFormers๋ฅผ ์„ค์น˜ํ•œ ํ›„, ํ•™์Šต ์Šคํฌ๋ฆฝํŠธ์— `--enable_xformers_memory_efficient_attention` ์ธ์ž๋ฅผ ์ถ”๊ฐ€ํ•ฉ๋‹ˆ๋‹ค. xFormers๋Š” Flax์—์„œ ์ง€์›๋˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค.
## ํ—ˆ๋ธŒ์— ๋ชจ๋ธ ์—…๋กœ๋“œํ•˜๊ธฐ
๋ชจ๋ธ์„ ํ—ˆ๋ธŒ์— ์ €์žฅํ•˜๋ ค๋ฉด, ํ•™์Šต ์Šคํฌ๋ฆฝํŠธ์— ๋‹ค์Œ ์ธ์ž๋ฅผ ์ถ”๊ฐ€ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.
```bash
--push_to_hub
```
## ์ฒดํฌํฌ์ธํŠธ ์ €์žฅ ๋ฐ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ
ํ•™์Šต์ค‘์— ๋ชจ๋ธ์˜ ์ฒดํฌํฌ์ธํŠธ๋ฅผ ์ •๊ธฐ์ ์œผ๋กœ ์ €์žฅํ•˜๋Š” ๊ฒƒ์ด ์ข‹์Šต๋‹ˆ๋‹ค. ์ด๋ ‡๊ฒŒ ํ•˜๋ฉด ์–ด๋–ค ์ด์œ ๋กœ๋“  ํ•™์Šต์ด ์ค‘๋‹จ๋œ ๊ฒฝ์šฐ ์ €์žฅ๋œ ์ฒดํฌํฌ์ธํŠธ์—์„œ ํ•™์Šต์„ ๋‹ค์‹œ ์‹œ์ž‘ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ํ•™์Šต ์Šคํฌ๋ฆฝํŠธ์— ๋‹ค์Œ ์ธ์ž๋ฅผ ์ „๋‹ฌํ•˜๋ฉด 500๋‹จ๊ณ„๋งˆ๋‹ค ์ „์ฒด ํ•™์Šต ์ƒํƒœ๊ฐ€ `output_dir`์˜ ํ•˜์œ„ ํด๋”์— ์ฒดํฌํฌ์ธํŠธ๋กœ์„œ ์ €์žฅ๋ฉ๋‹ˆ๋‹ค.
```bash
--checkpointing_steps=500
```
์ €์žฅ๋œ ์ฒดํฌํฌ์ธํŠธ์—์„œ ํ•™์Šต์„ ์žฌ๊ฐœํ•˜๋ ค๋ฉด, ํ•™์Šต ์Šคํฌ๋ฆฝํŠธ์™€ ์žฌ๊ฐœํ•  ํŠน์ • ์ฒดํฌํฌ์ธํŠธ์— ๋‹ค์Œ ์ธ์ž๋ฅผ ์ „๋‹ฌํ•˜์„ธ์š”.
```bash
--resume_from_checkpoint="checkpoint-1500"
```
## ํŒŒ์ธ ํŠœ๋‹
ํ•™์Šต์šฉ ๋ฐ์ดํ„ฐ์…‹์œผ๋กœ [๊ณ ์–‘์ด ์žฅ๋‚œ๊ฐ ๋ฐ์ดํ„ฐ์…‹](https://huggingface.co/datasets/diffusers/cat_toy_example)์„ ๋‹ค์šด๋กœ๋“œํ•˜์—ฌ ๋””๋ ‰ํ† ๋ฆฌ์— ์ €์žฅํ•˜์„ธ์š”. ์—ฌ๋Ÿฌ๋ถ„๋งŒ์˜ ๊ณ ์œ ํ•œ ๋ฐ์ดํ„ฐ์…‹์„ ์‚ฌ์šฉํ•˜๊ณ ์ž ํ•œ๋‹ค๋ฉด, [ํ•™์Šต์šฉ ๋ฐ์ดํ„ฐ์…‹ ๋งŒ๋“ค๊ธฐ](https://huggingface.co/docs/diffusers/training/create_dataset) ๊ฐ€์ด๋“œ๋ฅผ ์‚ดํŽด๋ณด์‹œ๊ธฐ ๋ฐ”๋ž๋‹ˆ๋‹ค.
```py
from huggingface_hub import snapshot_download
local_dir = "./cat"
snapshot_download(
"diffusers/cat_toy_example", local_dir=local_dir, repo_type="dataset", ignore_patterns=".gitattributes"
)
```
๋ชจ๋ธ์˜ ๋ฆฌํฌ์ง€ํ† ๋ฆฌ ID(๋˜๋Š” ๋ชจ๋ธ ๊ฐ€์ค‘์น˜๊ฐ€ ํฌํ•จ๋œ ๋””๋ ‰ํ„ฐ๋ฆฌ ๊ฒฝ๋กœ)๋ฅผ `MODEL_NAME` ํ™˜๊ฒฝ ๋ณ€์ˆ˜์— ํ• ๋‹นํ•˜๊ณ , ํ•ด๋‹น ๊ฐ’์„ [`pretrained_model_name_or_path`](https://huggingface.co/docs/diffusers/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.from_pretrained.pretrained_model_name_or_path) ์ธ์ž์— ์ „๋‹ฌํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ์ด๋ฏธ์ง€๊ฐ€ ํฌํ•จ๋œ ๋””๋ ‰ํ„ฐ๋ฆฌ ๊ฒฝ๋กœ๋ฅผ `DATA_DIR` ํ™˜๊ฒฝ ๋ณ€์ˆ˜์— ํ• ๋‹นํ•ฉ๋‹ˆ๋‹ค.
์ด์ œ [ํ•™์Šต ์Šคํฌ๋ฆฝํŠธ](https://github.com/huggingface/diffusers/blob/main/examples/textual_inversion/textual_inversion.py)๋ฅผ ์‹คํ–‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์Šคํฌ๋ฆฝํŠธ๋Š” ๋‹ค์Œ ํŒŒ์ผ์„ ์ƒ์„ฑํ•˜๊ณ  ๋ฆฌํฌ์ง€ํ† ๋ฆฌ์— ์ €์žฅํ•ฉ๋‹ˆ๋‹ค.
- `learned_embeds.bin`
- `token_identifier.txt`
- `type_of_concept.txt`
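`learned_embeds.bin`์˜ ๋‚ด์šฉ์€ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ํ™•์ธํ•ด๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ํŒŒ์ผ์ด `{ํ”Œ๋ ˆ์ด์Šคํ™€๋” ํ† ํฐ: ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ}` ํ˜•ํƒœ์˜ ๋”•์…”๋„ˆ๋ฆฌ๋กœ ์ €์žฅ๋œ๋‹ค๋Š” ๊ฐ€์ • ํ•˜์˜ ์Šค์ผ€์น˜์ด๋ฉฐ, ์—ฌ๊ธฐ์„œ๋Š” ์‹ค์ œ ํ•™์Šต ๊ฒฐ๊ณผ ๋Œ€์‹  ์ž„์˜์˜ ๊ฐ’์œผ๋กœ ํŒŒ์ผ์„ ๋งŒ๋“ค์–ด ํ˜•์‹๋งŒ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค.

```python
import torch

# (๊ฐ€์ •) learned_embeds.bin์˜ ์ €์žฅ ํ˜•์‹: {ํ”Œ๋ ˆ์ด์Šคํ™€๋” ํ† ํฐ: ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ} ๋”•์…”๋„ˆ๋ฆฌ
# (SD v1 ํ…์ŠคํŠธ ์ธ์ฝ”๋”์˜ ์ž„๋ฒ ๋”ฉ ์ฐจ์›์€ 768์ด๋ผ๊ณ  ๊ฐ€์ •)
learned_embeds = {"<cat-toy>": torch.randn(768)}
torch.save(learned_embeds, "learned_embeds.bin")

# ์ €์žฅ๋œ ์ž„๋ฒ ๋”ฉ์„ ๋‹ค์‹œ ๋ถˆ๋Ÿฌ์™€ ํ† ํฐ๊ณผ ๋ฒกํ„ฐ ํฌ๊ธฐ๋ฅผ ํ™•์ธ
loaded = torch.load("learned_embeds.bin")
for token, embed in loaded.items():
    print(token, tuple(embed.shape))  # <cat-toy> (768,)
```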
<Tip>
๐Ÿ’ก V100 GPU 1๊ฐœ๋ฅผ ๊ธฐ์ค€์œผ๋กœ ์ „์ฒด ํ•™์Šต์—๋Š” ์ตœ๋Œ€ 1์‹œ๊ฐ„์ด ๊ฑธ๋ฆฝ๋‹ˆ๋‹ค. ํ•™์Šต์ด ์™„๋ฃŒ๋˜๊ธฐ๋ฅผ ๊ธฐ๋‹ค๋ฆฌ๋Š” ๋™์•ˆ ๊ถ๊ธˆํ•œ ์ ์ด ์žˆ๋‹ค๋ฉด, ์•„๋ž˜ ์„น์…˜์—์„œ [textual-inversion์ด ์–ด๋–ป๊ฒŒ ์ž‘๋™ํ•˜๋Š”์ง€](https://huggingface.co/docs/diffusers/training/text_inversion#how-it-works) ์ž์œ ๋กญ๊ฒŒ ํ™•์ธํ•ด๋ณด์„ธ์š”!
</Tip>
<frameworkcontent>
<pt>
```bash
export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export DATA_DIR="./cat"
accelerate launch textual_inversion.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--train_data_dir=$DATA_DIR \
--learnable_property="object" \
--placeholder_token="<cat-toy>" --initializer_token="toy" \
--resolution=512 \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--max_train_steps=3000 \
--learning_rate=5.0e-04 --scale_lr \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--output_dir="textual_inversion_cat" \
--push_to_hub
```
<Tip>
๐Ÿ’ก ํ•™์Šต ์„ฑ๋Šฅ์„ ์˜ฌ๋ฆฌ๊ธฐ ์œ„ํ•ด, ํ”Œ๋ ˆ์ด์Šคํ™€๋” ํ† ํฐ(`<cat-toy>`)์„ (๋‹จ์ผ ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ๊ฐ€ ์•„๋‹Œ) ๋ณต์ˆ˜์˜ ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ๋กœ ํ‘œํ˜„ํ•˜๋Š” ๊ฒƒ๋„ ๊ณ ๋ คํ•  ์ˆ�˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด ํŠธ๋ฆญ์€ ๋ชจ๋ธ์ด ๋ณด๋‹ค ๋ณต์žกํ•œ ์ด๋ฏธ์ง€์˜ ์Šคํƒ€์ผ(์•ž์„œ ๋งํ•œ ์ฝ˜์…‰ํŠธ)์„ ๋” ์ž˜ ํฌ์ฐฉํ•˜๋Š” ๋ฐ ๋„์›€์ด ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋ณต์ˆ˜์˜ ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ ํ•™์Šต์„ ํ™œ์„ฑํ™”ํ•˜๋ ค๋ฉด ๋‹ค์Œ ์˜ต์…˜์„ ์ „๋‹ฌํ•˜์„ธ์š”.
```bash
--num_vectors=5
```
</Tip>
</pt>
<jax>
TPU์— ์•ก์„ธ์Šคํ•  ์ˆ˜ ์žˆ๋Š” ๊ฒฝ์šฐ, [Flax ํ•™์Šต ์Šคํฌ๋ฆฝํŠธ](https://github.com/huggingface/diffusers/blob/main/examples/textual_inversion/textual_inversion_flax.py)๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋” ๋น ๋ฅด๊ฒŒ ๋ชจ๋ธ์„ ํ•™์Šต์‹œ์ผœ๋ณด์„ธ์š”. (๋ฌผ๋ก  GPU์—์„œ๋„ ์ž‘๋™ํ•ฉ๋‹ˆ๋‹ค.) ๋™์ผํ•œ ์„ค์ •์—์„œ Flax ํ•™์Šต ์Šคํฌ๋ฆฝํŠธ๋Š” PyTorch ํ•™์Šต ์Šคํฌ๋ฆฝํŠธ๋ณด๋‹ค ์ตœ์†Œ 70% ๋” ๋นจ๋ผ์•ผ ํ•ฉ๋‹ˆ๋‹ค! โšก๏ธ
์‹œ์ž‘ํ•˜๊ธฐ ์•ž์„œ Flax์— ๋Œ€ํ•œ ์˜์กด์„ฑ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋“ค์„ ์„ค์น˜ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.
```bash
pip install -U -r requirements_flax.txt
```
๋ชจ๋ธ์˜ ๋ฆฌํฌ์ง€ํ† ๋ฆฌ ID(๋˜๋Š” ๋ชจ๋ธ ๊ฐ€์ค‘์น˜๊ฐ€ ํฌํ•จ๋œ ๋””๋ ‰ํ„ฐ๋ฆฌ ๊ฒฝ๋กœ)๋ฅผ `MODEL_NAME` ํ™˜๊ฒฝ ๋ณ€์ˆ˜์— ํ• ๋‹นํ•˜๊ณ , ํ•ด๋‹น ๊ฐ’์„ [`pretrained_model_name_or_path`](https://huggingface.co/docs/diffusers/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.from_pretrained.pretrained_model_name_or_path) ์ธ์ž์— ์ „๋‹ฌํ•ฉ๋‹ˆ๋‹ค.
๊ทธ๋Ÿฐ ๋‹ค์Œ [ํ•™์Šต ์Šคํฌ๋ฆฝํŠธ](https://github.com/huggingface/diffusers/blob/main/examples/textual_inversion/textual_inversion_flax.py)๋ฅผ ์‹œ์ž‘ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
```bash
export MODEL_NAME="duongna/stable-diffusion-v1-4-flax"
export DATA_DIR="./cat"
python textual_inversion_flax.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--train_data_dir=$DATA_DIR \
--learnable_property="object" \
--placeholder_token="<cat-toy>" --initializer_token="toy" \
--resolution=512 \
--train_batch_size=1 \
--max_train_steps=3000 \
--learning_rate=5.0e-04 --scale_lr \
--output_dir="textual_inversion_cat" \
--push_to_hub
```
</jax>
</frameworkcontent>
### ์ค‘๊ฐ„ ๋กœ๊น…
๋ชจ๋ธ์˜ ํ•™์Šต ์ง„ํ–‰ ์ƒํ™ฉ์„ ์ถ”์ ํ•˜๋Š” ๋ฐ ๊ด€์‹ฌ์ด ์žˆ๋Š” ๊ฒฝ์šฐ, ํ•™์Šต ๊ณผ์ •์—์„œ ์ƒ์„ฑ๋œ ์ด๋ฏธ์ง€๋ฅผ ์ €์žฅํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ํ•™์Šต ์Šคํฌ๋ฆฝํŠธ์— ๋‹ค์Œ ์ธ์ˆ˜๋ฅผ ์ถ”๊ฐ€ํ•˜์—ฌ ์ค‘๊ฐ„ ๋กœ๊น…์„ ํ™œ์„ฑํ™”ํ•ฉ๋‹ˆ๋‹ค.
- `validation_prompt` : ์ƒ˜ํ”Œ์„ ์ƒ์„ฑํ•˜๋Š” ๋ฐ ์‚ฌ์šฉ๋˜๋Š” ํ”„๋กฌํ”„ํŠธ(๊ธฐ๋ณธ๊ฐ’์€ `None`์œผ๋กœ ์„ค์ •๋˜๋ฉฐ, ์ด ๋•Œ ์ค‘๊ฐ„ ๋กœ๊น…์€ ๋น„ํ™œ์„ฑํ™”๋จ)
- `num_validation_images` : ์ƒ์„ฑํ•  ์ƒ˜ํ”Œ ์ด๋ฏธ์ง€ ์ˆ˜
- `validation_steps` : `validation_prompt`๋กœ๋ถ€ํ„ฐ ์ƒ˜ํ”Œ ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•˜๊ธฐ ์ „ ์Šคํ…์˜ ์ˆ˜
```bash
--validation_prompt="A <cat-toy> backpack"
--num_validation_images=4
--validation_steps=100
```
## ์ถ”๋ก 
๋ชจ๋ธ์„ ํ•™์Šตํ•œ ํ›„์—๋Š”, ํ•ด๋‹น ๋ชจ๋ธ์„ [`StableDiffusionPipeline`]์„ ์‚ฌ์šฉํ•˜์—ฌ ์ถ”๋ก ์— ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
textual-inversion ์Šคํฌ๋ฆฝํŠธ๋Š” ๊ธฐ๋ณธ์ ์œผ๋กœ textual-inversion์„ ํ†ตํ•ด ์–ป์–ด์ง„ ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ๋งŒ์„ ์ €์žฅํ•ฉ๋‹ˆ๋‹ค. ํ•ด๋‹น ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ๋“ค์€ ํ…์ŠคํŠธ ์ธ์ฝ”๋”์˜ ์ž„๋ฒ ๋”ฉ ํ–‰๋ ฌ์— ์ถ”๊ฐ€๋˜์–ด ์žˆ์Šต์Šต๋‹ˆ๋‹ค.
<frameworkcontent>
<pt>
<Tip>
๐Ÿ’ก ์ปค๋ฎค๋‹ˆํ‹ฐ๋Š” [sd-concepts-library](https://huggingface.co/sd-concepts-library) ๋ผ๋Š” ๋Œ€๊ทœ๋ชจ์˜ textual-inversion ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ๋งŒ๋“ค์—ˆ์Šต๋‹ˆ๋‹ค. textual-inversion ์ž„๋ฒ ๋”ฉ์„ ๋ฐ‘๋ฐ”๋‹ฅ๋ถ€ํ„ฐ ํ•™์Šตํ•˜๋Š” ๋Œ€์‹ , ํ•ด๋‹น ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์— ๋ณธ์ธ์ด ์ฐพ๋Š” textual-inversion ์ž„๋ฒ ๋”ฉ์ด ์ด๋ฏธ ์ถ”๊ฐ€๋˜์–ด ์žˆ์ง€ ์•Š์€์ง€๋ฅผ ํ™•์ธํ•˜๋Š” ๊ฒƒ๋„ ์ข‹์€ ๋ฐฉ๋ฒ•์ด ๋  ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค.
</Tip>
textual-inversion ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ๋ฅผ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ ์œ„ํ•ด์„œ๋Š”, ๋จผ์ € ํ•ด๋‹น ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ๋ฅผ ํ•™์Šตํ•  ๋•Œ ์‚ฌ์šฉํ•œ ๋ชจ๋ธ์„ ๋ถˆ๋Ÿฌ์™€์•ผ ํ•ฉ๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์„œ๋Š” [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) ๋ชจ๋ธ์ด ์‚ฌ์šฉ๋˜์—ˆ๋‹ค๊ณ  ๊ฐ€์ •ํ•˜๊ณ  ๋ถˆ๋Ÿฌ์˜ค๊ฒ ์Šต๋‹ˆ๋‹ค.
```python
from diffusers import StableDiffusionPipeline
import torch
model_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
```
๋‹ค์Œ์œผ๋กœ `TextualInversionLoaderMixin.load_textual_inversion` ํ•จ์ˆ˜๋ฅผ ํ†ตํ•ด, textual-inversion ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ๋ฅผ ๋ถˆ๋Ÿฌ์™€์•ผ ํ•ฉ๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์„œ ์šฐ๋ฆฌ๋Š” ์ด์ „์˜ `<cat-toy>` ์˜ˆ์ œ์˜ ์ž„๋ฒ ๋”ฉ์„ ๋ถˆ๋Ÿฌ์˜ฌ ๊ฒƒ์ž…๋‹ˆ๋‹ค.
```python
pipe.load_textual_inversion("sd-concepts-library/cat-toy")
```
์ด์ œ ํ”Œ๋ ˆ์ด์Šคํ™€๋” ํ† ํฐ(`<cat-toy>`)์ด ์ž˜ ๋™์ž‘ํ•˜๋Š”์ง€๋ฅผ ํ™•์ธํ•˜๋Š” ํŒŒ์ดํ”„๋ผ์ธ์„ ์‹คํ–‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
```python
prompt = "A <cat-toy> backpack"
image = pipe(prompt, num_inference_steps=50).images[0]
image.save("cat-backpack.png")
```
`TextualInversionLoaderMixin.load_textual_inversion`์€ Diffusers ํ˜•์‹์œผ๋กœ ์ €์žฅ๋œ ํ…์ŠคํŠธ ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ๋ฅผ ๋กœ๋“œํ•  ์ˆ˜ ์žˆ์„ ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ, [Automatic1111](https://github.com/AUTOMATIC1111/stable-diffusion-webui) ํ˜•์‹์œผ๋กœ ์ €์žฅ๋œ ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ๋„ ๋กœ๋“œํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋ ‡๊ฒŒ ํ•˜๋ ค๋ฉด, ๋จผ์ € [civitAI](https://civitai.com/models/3036?modelVersionId=8387)์—์„œ ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ๋ฅผ ๋‹ค์šด๋กœ๋“œํ•œ ๋‹ค์Œ ๋กœ์ปฌ์—์„œ ๋ถˆ๋Ÿฌ์™€์•ผ ํ•ฉ๋‹ˆ๋‹ค.
```python
pipe.load_textual_inversion("./charturnerv2.pt")
```
</pt>
<jax>
ํ˜„์žฌ Flax์—๋Š” `load_textual_inversion` ํ•จ์ˆ˜๊ฐ€ ์—†์Šต๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ํ•™์Šต ํ›„ textual-inversion ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ๊ฐ€ ๋ชจ๋ธ์˜ ์ผ๋ถ€๋กœ ์ €์žฅ๋˜์—ˆ๋Š”์ง€ ํ™•์ธํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋Ÿฐ ๋‹ค์Œ ๋‹ค๋ฅธ Flax ๋ชจ๋ธ๊ณผ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ ์‹คํ–‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
```python
import jax
import numpy as np
from flax.jax_utils import replicate
from flax.training.common_utils import shard
from diffusers import FlaxStableDiffusionPipeline
model_path = "path-to-your-trained-model"
pipeline, params = FlaxStableDiffusionPipeline.from_pretrained(model_path, dtype=jax.numpy.bfloat16)
prompt = "A <cat-toy> backpack"
prng_seed = jax.random.PRNGKey(0)
num_inference_steps = 50
num_samples = jax.device_count()
prompt = num_samples * [prompt]
prompt_ids = pipeline.prepare_inputs(prompt)
# shard inputs and rng
params = replicate(params)
prng_seed = jax.random.split(prng_seed, jax.device_count())
prompt_ids = shard(prompt_ids)
images = pipeline(prompt_ids, params, prng_seed, num_inference_steps, jit=True).images
images = pipeline.numpy_to_pil(np.asarray(images.reshape((num_samples,) + images.shape[-3:])))
images[0].save("cat-backpack.png")
```
</jax>
</frameworkcontent>
## ์ž‘๋™ ๋ฐฉ์‹
![Diagram from the paper showing overview](https://textual-inversion.github.io/static/images/training/training.JPG)
<small>Architecture overview from the Textual Inversion <a href="https://textual-inversion.github.io/">blog post.</a></small>
์ผ๋ฐ˜์ ์œผ๋กœ ํ…์ŠคํŠธ ํ”„๋กฌํ”„ํŠธ๋Š” ๋ชจ๋ธ์— ์ „๋‹ฌ๋˜๊ธฐ ์ „์— ์ž„๋ฒ ๋”ฉ์œผ๋กœ ํ† ํฐํ™”๋ฉ๋‹ˆ๋‹ค. textual-inversion์€ ๋น„์Šทํ•œ ์ž‘์—…์„ ์ˆ˜ํ–‰ํ•˜์ง€๋งŒ, ์œ„ ๋‹ค์ด์–ด๊ทธ๋žจ์˜ ํŠน์ˆ˜ ํ† ํฐ `S*`๋กœ๋ถ€ํ„ฐ ์ƒˆ๋กœ์šด ํ† ํฐ ์ž„๋ฒ ๋”ฉ `v*`๋ฅผ ํ•™์Šตํ•ฉ๋‹ˆ๋‹ค. ๋ชจ๋ธ์˜ ์•„์›ƒํ’‹์€ ๋””ํ“จ์ „ ๋ชจ๋ธ์„ ์กฐ์ •ํ•˜๋Š” ๋ฐ ์‚ฌ์šฉ๋˜๋ฉฐ, ๋””ํ“จ์ „ ๋ชจ๋ธ์ด ๋‹จ ๋ช‡ ๊ฐœ์˜ ์˜ˆ์ œ ์ด๋ฏธ์ง€์—์„œ ์‹ ์†ํ•˜๊ณ  ์ƒˆ๋กœ์šด ์ฝ˜์…‰ํŠธ๋ฅผ ์ดํ•ดํ•˜๋Š” ๋ฐ ๋„์›€์„ ์ค๋‹ˆ๋‹ค.
์ด๋ฅผ ์œ„ํ•ด textual-inversion์€ ์ œ๋„ˆ๋ ˆ์ดํ„ฐ ๋ชจ๋ธ๊ณผ ํ•™์Šต์šฉ ์ด๋ฏธ์ง€์˜ ๋…ธ์ด์ฆˆ ๋ฒ„์ „์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. ์ œ๋„ˆ๋ ˆ์ดํ„ฐ๋Š” ๋…ธ์ด์ฆˆ๊ฐ€ ์ ์€ ๋ฒ„์ „์˜ ์ด๋ฏธ์ง€๋ฅผ ์˜ˆ์ธกํ•˜๋ ค๊ณ  ์‹œ๋„ํ•˜๋ฉฐ ํ† ํฐ ์ž„๋ฒ ๋”ฉ `v*`์€ ์ œ๋„ˆ๋ ˆ์ดํ„ฐ์˜ ์„ฑ๋Šฅ์— ๋”ฐ๋ผ ์ตœ์ ํ™”๋ฉ๋‹ˆ๋‹ค. ํ† ํฐ ์ž„๋ฒ ๋”ฉ์ด ์ƒˆ๋กœ์šด ์ฝ˜์…‰ํŠธ๋ฅผ ์„ฑ๊ณต์ ์œผ๋กœ ํฌ์ฐฉํ•˜๋ฉด ๋””ํ“จ์ „ ๋ชจ๋ธ์— ๋” ์œ ์šฉํ•œ ์ •๋ณด๋ฅผ ์ œ๊ณตํ•˜๊ณ  ๋…ธ์ด์ฆˆ๊ฐ€ ์ ์€ ๋” ์„ ๋ช…ํ•œ ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•˜๋Š” ๋ฐ ๋„์›€์ด ๋ฉ๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ์ตœ์ ํ™” ํ”„๋กœ์„ธ์Šค๋Š” ์ผ๋ฐ˜์ ์œผ๋กœ ๋‹ค์–‘ํ•œ ํ”„๋กฌํ”„ํŠธ์™€ ์ด๋ฏธ์ง€์— ์ˆ˜์ฒœ ๋ฒˆ์— ๋…ธ์ถœ๋จ์œผ๋กœ์จ ์ด๋ฃจ์–ด์ง‘๋‹ˆ๋‹ค.
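์œ„์˜ ์ตœ์ ํ™” ๊ณผ์ •์€ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์Šค์ผ€์น˜ํ•ด๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์‹ค์ œ ํ•™์Šต ์Šคํฌ๋ฆฝํŠธ๋ฅผ ํฌ๊ฒŒ ๋‹จ์ˆœํ™”ํ•œ ๊ฐ€์ •์  ์˜ˆ์‹œ๋กœ, ๋””ํ“จ์ „ ๋ชจ๋ธ์˜ ๋…ธ์ด์ฆˆ ์˜ˆ์ธก ์†์‹ค์„ ์ž„์˜์˜ ๋ชฉํ‘œ ๋ฒกํ„ฐ์— ๋Œ€ํ•œ MSE๋กœ ๋Œ€์ฒดํ–ˆ์Šต๋‹ˆ๋‹ค. ํ•ต์‹ฌ์€ ๊ธฐ์กด ๊ฐ€์ค‘์น˜๋Š” ๊ณ ์ •ํ•œ ์ฑ„ ์ƒˆ ํ† ํฐ ์ž„๋ฒ ๋”ฉ `v*`๋งŒ ์˜ตํ‹ฐ๋งˆ์ด์ €์— ๋“ฑ๋กํ•ด ํ•™์Šตํ•œ๋‹ค๋Š” ์ ์ž…๋‹ˆ๋‹ค.

```python
import torch

torch.manual_seed(0)
dim = 8
v_star = torch.nn.Parameter(torch.randn(dim))  # ์ƒˆ ํ† ํฐ S*์˜ ์ž„๋ฒ ๋”ฉ v* (ํ•™์Šต ๋Œ€์ƒ์€ ์ด๊ฒƒ๋ฟ)
optimizer = torch.optim.Adam([v_star], lr=1e-1)

# ๋…ธ์ด์ฆˆ ์˜ˆ์ธก ์†์‹ค์„ ๋Œ€์‹ ํ•˜๋Š” ๊ฐ€์ƒ์˜ ๋ชฉํ‘œ ๋ฒกํ„ฐ
# (์‹ค์ œ๋กœ๋Š” ๋””ํ“จ์ „ ๋ชจ๋ธ์˜ ๋…ธ์ด์ฆˆ ์˜ˆ์ธก ์˜ค์ฐจ๊ฐ€ ์†์‹ค์ด ๋ฉ๋‹ˆ๋‹ค)
target = torch.randn(dim)

initial_loss = torch.nn.functional.mse_loss(v_star, target).item()
for _ in range(200):
    loss = torch.nn.functional.mse_loss(v_star, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

final_loss = torch.nn.functional.mse_loss(v_star, target).item()
print(final_loss < initial_loss)  # v*๊ฐ€ ์†์‹ค์„ ์ค„์ด๋Š” ๋ฐฉํ–ฅ์œผ๋กœ ์ตœ์ ํ™”๋จ
```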

# Unconditional ์ด๋ฏธ์ง€ ์ƒ์„ฑ
unconditional ์ด๋ฏธ์ง€ ์ƒ์„ฑ์€ text-to-image ๋˜๋Š” image-to-image ๋ชจ๋ธ๊ณผ ๋‹ฌ๋ฆฌ ํ…์ŠคํŠธ๋‚˜ ์ด๋ฏธ์ง€์— ๋Œ€ํ•œ ์กฐ๊ฑด์ด ์—†์ด ํ•™์Šต ๋ฐ์ดํ„ฐ ๋ถ„ํฌ์™€ ์œ ์‚ฌํ•œ ์ด๋ฏธ์ง€๋งŒ์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.
<iframe
src="https://stevhliu-ddpm-butterflies-128.hf.space"
frameborder="0"
width="850"
height="550"
></iframe>
์ด ๊ฐ€์ด๋“œ์—์„œ๋Š” ๊ธฐ์กด์— ์กด์žฌํ•˜๋˜ ๋ฐ์ดํ„ฐ์…‹๊ณผ ์ž์‹ ๋งŒ์˜ ์ปค์Šคํ…€ ๋ฐ์ดํ„ฐ์…‹์— ๋Œ€ํ•ด unconditional image generation ๋ชจ๋ธ์„ ํ›ˆ๋ จํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค. ํ›ˆ๋ จ ์„ธ๋ถ€ ์‚ฌํ•ญ์— ๋Œ€ํ•ด ๋” ์ž์„ธํžˆ ์•Œ๊ณ  ์‹ถ๋‹ค๋ฉด unconditional image generation์„ ์œ„ํ•œ ๋ชจ๋“  ํ•™์Šต ์Šคํฌ๋ฆฝํŠธ๋ฅผ [์—ฌ๊ธฐ](https://github.com/huggingface/diffusers/tree/main/examples/unconditional_image_generation)์—์„œ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
์Šคํฌ๋ฆฝํŠธ๋ฅผ ์‹คํ–‰ํ•˜๊ธฐ ์ „, ๋จผ์ € ์˜์กด์„ฑ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋“ค์„ ์„ค์น˜ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.
```bash
pip install diffusers[training] accelerate datasets
```
๊ทธ ๋‹ค์Œ ๐Ÿค— [Accelerate](https://github.com/huggingface/accelerate/) ํ™˜๊ฒฝ์„ ์ดˆ๊ธฐํ™”ํ•ฉ๋‹ˆ๋‹ค.
```bash
accelerate config
```
๋ณ„๋„์˜ ์„ค์ • ์—†์ด ๊ธฐ๋ณธ ์„ค์ •์œผ๋กœ ๐Ÿค— [Accelerate](https://github.com/huggingface/accelerate/) ํ™˜๊ฒฝ์„ ์ดˆ๊ธฐํ™”ํ•ด๋ด…์‹œ๋‹ค.
```bash
accelerate config default
```
๋…ธํŠธ๋ถ๊ณผ ๊ฐ™์€ ๋Œ€ํ™”ํ˜• ์‰˜์„ ์ง€์›ํ•˜์ง€ ์•Š๋Š” ํ™˜๊ฒฝ์˜ ๊ฒฝ์šฐ, ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์‚ฌ์šฉํ•ด๋ณผ ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค.
```py
from accelerate.utils import write_basic_config
write_basic_config()
```
## ๋ชจ๋ธ์„ ํ—ˆ๋ธŒ์— ์—…๋กœ๋“œํ•˜๊ธฐ
ํ•™์Šต ์Šคํฌ๋ฆฝํŠธ์— ๋‹ค์Œ ์ธ์ž๋ฅผ ์ถ”๊ฐ€ํ•˜์—ฌ ํ—ˆ๋ธŒ์— ๋ชจ๋ธ์„ ์—…๋กœ๋“œํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
```bash
--push_to_hub
```
## ์ฒดํฌํฌ์ธํŠธ ์ €์žฅํ•˜๊ณ  ๋ถˆ๋Ÿฌ์˜ค๊ธฐ
ํ›ˆ๋ จ ์ค‘ ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•  ๊ฒฝ์šฐ๋ฅผ ๋Œ€๋น„ํ•˜์—ฌ ์ฒดํฌํฌ์ธํŠธ๋ฅผ ์ •๊ธฐ์ ์œผ๋กœ ์ €์žฅํ•˜๋Š” ๊ฒƒ์ด ์ข‹์Šต๋‹ˆ๋‹ค. ์ฒดํฌํฌ์ธํŠธ๋ฅผ ์ €์žฅํ•˜๋ ค๋ฉด ํ•™์Šต ์Šคํฌ๋ฆฝํŠธ์— ๋‹ค์Œ ์ธ์ž๋ฅผ ์ „๋‹ฌํ•ฉ๋‹ˆ๋‹ค:
```bash
--checkpointing_steps=500
```
์ „์ฒด ํ›ˆ๋ จ ์ƒํƒœ๋Š” 500์Šคํ…๋งˆ๋‹ค `output_dir`์˜ ํ•˜์œ„ ํด๋”์— ์ €์žฅ๋˜๋ฉฐ, ํ•™์Šต ์Šคํฌ๋ฆฝํŠธ์— `--resume_from_checkpoint` ์ธ์ž๋ฅผ ์ „๋‹ฌํ•จ์œผ๋กœ์จ ์ฒดํฌํฌ์ธํŠธ๋ฅผ ๋ถˆ๋Ÿฌ์˜ค๊ณ  ํ›ˆ๋ จ์„ ์žฌ๊ฐœํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
```bash
--resume_from_checkpoint="checkpoint-1500"
```
## ํŒŒ์ธํŠœ๋‹
์ด์ œ ํ•™์Šต ์Šคํฌ๋ฆฝํŠธ๋ฅผ ์‹œ์ž‘ํ•  ์ค€๋น„๊ฐ€ ๋˜์—ˆ์Šต๋‹ˆ๋‹ค! `--dataset_name` ์ธ์ž์— ํŒŒ์ธํŠœ๋‹ํ•  ๋ฐ์ดํ„ฐ์…‹ ์ด๋ฆ„์„ ์ง€์ •ํ•˜๋ฉด, ๊ฒฐ๊ณผ๋Š” `--output_dir` ์ธ์ž์— ์ง€์ •๋œ ๊ฒฝ๋กœ์— ์ €์žฅ๋ฉ๋‹ˆ๋‹ค. ๋ณธ์ธ๋งŒ์˜ ๋ฐ์ดํ„ฐ์…‹์„ ์‚ฌ์šฉํ•˜๋ ค๋ฉด, [ํ•™์Šต์šฉ ๋ฐ์ดํ„ฐ์…‹ ๋งŒ๋“ค๊ธฐ](create_dataset) ๊ฐ€์ด๋“œ๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.
ํ•™์Šต ์Šคํฌ๋ฆฝํŠธ๋Š” `diffusion_pytorch_model.bin` ํŒŒ์ผ์„ ์ƒ์„ฑํ•˜๊ณ , ๊ทธ๊ฒƒ์„ ๋‹น์‹ ์˜ ๋ฆฌํฌ์ง€ํ† ๋ฆฌ์— ์ €์žฅํ•ฉ๋‹ˆ๋‹ค.
<Tip>
๐Ÿ’ก ์ „์ฒด ํ•™์Šต์€ V100 GPU 4๊ฐœ๋ฅผ ์‚ฌ์šฉํ•  ๊ฒฝ์šฐ, 2์‹œ๊ฐ„์ด ์†Œ์š”๋ฉ๋‹ˆ๋‹ค.
</Tip>
์˜ˆ๋ฅผ ๋“ค์–ด, [Oxford Flowers](https://huggingface.co/datasets/huggan/flowers-102-categories) ๋ฐ์ดํ„ฐ์…‹์„ ์‚ฌ์šฉํ•ด ํŒŒ์ธํŠœ๋‹ํ•  ๊ฒฝ์šฐ:
```bash
accelerate launch train_unconditional.py \
--dataset_name="huggan/flowers-102-categories" \
--resolution=64 \
--output_dir="ddpm-ema-flowers-64" \
--train_batch_size=16 \
--num_epochs=100 \
--gradient_accumulation_steps=1 \
--learning_rate=1e-4 \
--lr_warmup_steps=500 \
--mixed_precision=no \
--push_to_hub
```
<div class="flex justify-center">
<img src="https://user-images.githubusercontent.com/26864830/180248660-a0b143d0-b89a-42c5-8656-2ebf6ece7e52.png"/>
</div>
[Pokemon](https://huggingface.co/datasets/huggan/pokemon) ๋ฐ์ดํ„ฐ์…‹์„ ์‚ฌ์šฉํ•  ๊ฒฝ์šฐ:
```bash
accelerate launch train_unconditional.py \
--dataset_name="huggan/pokemon" \
--resolution=64 \
--output_dir="ddpm-ema-pokemon-64" \
--train_batch_size=16 \
--num_epochs=100 \
--gradient_accumulation_steps=1 \
--learning_rate=1e-4 \
--lr_warmup_steps=500 \
--mixed_precision=no \
--push_to_hub
```
<div class="flex justify-center">
<img src="https://user-images.githubusercontent.com/26864830/180248200-928953b4-db38-48db-b0c6-8b740fe6786f.png"/>
</div>
### ์—ฌ๋Ÿฌ๊ฐœ์˜ GPU๋กœ ํ›ˆ๋ จํ•˜๊ธฐ
`accelerate`์„ ์‚ฌ์šฉํ•˜๋ฉด ์›ํ™œํ•œ ๋‹ค์ค‘ GPU ํ›ˆ๋ จ์ด ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค. `accelerate`์„ ์‚ฌ์šฉํ•˜์—ฌ ๋ถ„์‚ฐ ํ›ˆ๋ จ์„ ์‹คํ–‰ํ•˜๋ ค๋ฉด [์—ฌ๊ธฐ](https://huggingface.co/docs/accelerate/basic_tutorials/launch) ์ง€์นจ์„ ๋”ฐ๋ฅด์„ธ์š”. ๋‹ค์Œ์€ ๋ช…๋ น์–ด ์˜ˆ์ œ์ž…๋‹ˆ๋‹ค.
```bash
accelerate launch --mixed_precision="fp16" --multi_gpu train_unconditional.py \
--dataset_name="huggan/pokemon" \
--resolution=64 --center_crop --random_flip \
--output_dir="ddpm-ema-pokemon-64" \
--train_batch_size=16 \
--num_epochs=100 \
--gradient_accumulation_steps=1 \
--use_ema \
--learning_rate=1e-4 \
--lr_warmup_steps=500 \
--mixed_precision="fp16" \
--logger="wandb" \
--push_to_hub
```

[[open-in-colab]]
# Diffusion ๋ชจ๋ธ์„ ํ•™์Šตํ•˜๊ธฐ
Unconditional ์ด๋ฏธ์ง€ ์ƒ์„ฑ์€ ํ•™์Šต์— ์‚ฌ์šฉ๋œ ๋ฐ์ดํ„ฐ์…‹๊ณผ ์œ ์‚ฌํ•œ ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•˜๋Š” diffusion ๋ชจ๋ธ์—์„œ ์ธ๊ธฐ ์žˆ๋Š” ์–ดํ”Œ๋ฆฌ์ผ€์ด์…˜์ž…๋‹ˆ๋‹ค. ์ผ๋ฐ˜์ ์œผ๋กœ, ๊ฐ€์žฅ ์ข‹์€ ๊ฒฐ๊ณผ๋Š” ํŠน์ • ๋ฐ์ดํ„ฐ์…‹์— ์‚ฌ์ „ ํ›ˆ๋ จ๋œ ๋ชจ๋ธ์„ ํŒŒ์ธํŠœ๋‹ํ•˜๋Š” ๊ฒƒ์œผ๋กœ ์–ป์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด [ํ—ˆ๋ธŒ](https://huggingface.co/search/full-text?q=unconditional-image-generation&type=model)์—์„œ ์ด๋Ÿฌํ•œ ๋งŽ์€ ์ฒดํฌํฌ์ธํŠธ๋ฅผ ์ฐพ์„ ์ˆ˜ ์žˆ์ง€๋งŒ, ๋งŒ์•ฝ ๋งˆ์Œ์— ๋“œ๋Š” ์ฒดํฌํฌ์ธํŠธ๋ฅผ ์ฐพ์ง€ ๋ชปํ–ˆ๋‹ค๋ฉด, ์–ธ์ œ๋“ ์ง€ ์Šค์Šค๋กœ ํ•™์Šตํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค!
์ด ํŠœํ† ๋ฆฌ์–ผ์€ ๋‚˜๋งŒ์˜ ๐Ÿฆ‹ ๋‚˜๋น„ ๐Ÿฆ‹๋ฅผ ์ƒ์„ฑํ•˜๊ธฐ ์œ„ํ•ด [Smithsonian Butterflies](https://huggingface.co/datasets/huggan/smithsonian_butterflies_subset) ๋ฐ์ดํ„ฐ์…‹์˜ ํ•˜์œ„ ์ง‘ํ•ฉ์—์„œ [`UNet2DModel`] ๋ชจ๋ธ์„ ํ•™์Šตํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ๊ฐ€๋ฅด์ณ์ค„ ๊ฒƒ์ž…๋‹ˆ๋‹ค.
<Tip>
๐Ÿ’ก ์ด ํ•™์Šต ํŠœํ† ๋ฆฌ์–ผ์€ [Training with ๐Ÿงจ Diffusers](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) ๋…ธํŠธ๋ถ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•ฉ๋‹ˆ๋‹ค. Diffusion ๋ชจ๋ธ์˜ ์ž‘๋™ ๋ฐฉ์‹ ๋ฐ ์ž์„ธํ•œ ๋‚ด์šฉ์€ ๋…ธํŠธ๋ถ์„ ํ™•์ธํ•˜์„ธ์š”!
</Tip>
์‹œ์ž‘ ์ „์—, ๋ฐ์ดํ„ฐ์…‹์„ ๋ถˆ๋Ÿฌ์˜ค๊ณ  ์ „์ฒ˜๋ฆฌํ•˜๊ธฐ ์œ„ํ•œ ๐Ÿค— Datasets๊ณผ, ๋‹ค์ˆ˜์˜ GPU์—์„œ์˜ ํ•™์Šต์„ ๊ฐ„์†Œํ™”ํ•˜๊ธฐ ์œ„ํ•œ ๐Ÿค— Accelerate๊ฐ€ ์„ค์น˜๋˜์–ด ์žˆ๋Š”์ง€ ํ™•์ธํ•˜์„ธ์š”. ๊ทธ ํ›„ ํ•™์Šต ๋ฉ”ํŠธ๋ฆญ์„ ์‹œ๊ฐํ™”ํ•˜๊ธฐ ์œ„ํ•ด [TensorBoard](https://www.tensorflow.org/tensorboard)๋„ ์„ค์น˜ํ•˜์„ธ์š”. (ํ•™์Šต ์ถ”์ ์—๋Š” [Weights & Biases](https://docs.wandb.ai/)๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค.)
```bash
!pip install diffusers[training]
```
์ปค๋ฎค๋‹ˆํ‹ฐ์— ๋ชจ๋ธ์„ ๊ณต์œ ํ•  ๊ฒƒ์„ ๊ถŒ์žฅํ•˜๋ฉฐ, ์ด๋ฅผ ์œ„ํ•ด์„œ Hugging Face ๊ณ„์ •์— ๋กœ๊ทธ์ธ์„ ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. (๊ณ„์ •์ด ์—†๋‹ค๋ฉด [์—ฌ๊ธฐ](https://hf.co/join)์—์„œ ๋งŒ๋“ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.) ๋…ธํŠธ๋ถ์—์„œ ๋กœ๊ทธ์ธํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ ๋ฉ”์‹œ์ง€๊ฐ€ ํ‘œ์‹œ๋˜๋ฉด ํ† ํฐ์„ ์ž…๋ ฅํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
```py
>>> from huggingface_hub import notebook_login
>>> notebook_login()
```
๋˜๋Š” ํ„ฐ๋ฏธ๋„๋กœ ๋กœ๊ทธ์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:
```bash
huggingface-cli login
```
๋ชจ๋ธ ์ฒดํฌํฌ์ธํŠธ๊ฐ€ ์ƒ๋‹นํžˆ ํฌ๊ธฐ ๋•Œ๋ฌธ์— [Git-LFS](https://git-lfs.com/)์—์„œ ๋Œ€์šฉ๋Ÿ‰ ํŒŒ์ผ์˜ ๋ฒ„์ „ ๊ด€๋ฆฌ๋ฅผ ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
```bash
!sudo apt -qq install git-lfs
!git config --global credential.helper store
```
## ํ•™์Šต ๊ตฌ์„ฑ
ํŽธ์˜๋ฅผ ์œ„ํ•ด ํ•™์Šต ํŒŒ๋ผ๋ฏธํ„ฐ๋“ค์„ ํฌํ•จํ•œ `TrainingConfig` ํด๋ž˜์Šค๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค (์ž์œ ๋กญ๊ฒŒ ์กฐ์ • ๊ฐ€๋Šฅ):
```py
>>> from dataclasses import dataclass
>>> @dataclass
... class TrainingConfig:
... image_size = 128 # ์ƒ์„ฑ๋˜๋Š” ์ด๋ฏธ์ง€ ํ•ด์ƒ๋„
... train_batch_size = 16
... eval_batch_size = 16 # ํ‰๊ฐ€ ๋™์•ˆ์— ์ƒ˜ํ”Œ๋งํ•  ์ด๋ฏธ์ง€ ์ˆ˜
... num_epochs = 50
... gradient_accumulation_steps = 1
... learning_rate = 1e-4
... lr_warmup_steps = 500
... save_image_epochs = 10
... save_model_epochs = 30
... mixed_precision = "fp16" # `no`๋Š” float32, ์ž๋™ ํ˜ผํ•ฉ ์ •๋ฐ€๋„๋ฅผ ์œ„ํ•œ `fp16`
... output_dir = "ddpm-butterflies-128" # ๋กœ์ปฌ ๋ฐ HF Hub์— ์ €์žฅ๋˜๋Š” ๋ชจ๋ธ๋ช…
... push_to_hub = True # ์ €์žฅ๋œ ๋ชจ๋ธ์„ HF Hub์— ์—…๋กœ๋“œํ• ์ง€ ์—ฌ๋ถ€
... hub_private_repo = False
... overwrite_output_dir = True # ๋…ธํŠธ๋ถ์„ ๋‹ค์‹œ ์‹คํ–‰ํ•  ๋•Œ ์ด์ „ ๋ชจ๋ธ์— ๋ฎ์–ด์”Œ์šธ์ง€
... seed = 0
>>> config = TrainingConfig()
```
## ๋ฐ์ดํ„ฐ์…‹ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ
๐Ÿค— Datasets ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์™€ [Smithsonian Butterflies](https://huggingface.co/datasets/huggan/smithsonian_butterflies_subset) ๋ฐ์ดํ„ฐ์…‹์„ ์‰ฝ๊ฒŒ ๋ถˆ๋Ÿฌ์˜ฌ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
```py
>>> from datasets import load_dataset
>>> config.dataset_name = "huggan/smithsonian_butterflies_subset"
>>> dataset = load_dataset(config.dataset_name, split="train")
```
๐Ÿ’ก [HugGan Community Event](https://huggingface.co/huggan)์—์„œ ์ถ”๊ฐ€ ๋ฐ์ดํ„ฐ์…‹์„ ์ฐพ๊ฑฐ๋‚˜, ๋กœ์ปฌ [`ImageFolder`](https://huggingface.co/docs/datasets/image_dataset#imagefolder)๋ฅผ ๋งŒ๋“ค์–ด ๋‚˜๋งŒ์˜ ๋ฐ์ดํ„ฐ์…‹์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. HugGan Community Event์—์„œ ๊ฐ€์ ธ์˜จ ๋ฐ์ดํ„ฐ์…‹์ด๋ผ๋ฉด ํ•ด๋‹น ๋ ˆํฌ์ง€ํ† ๋ฆฌ id๋กœ `config.dataset_name`์„ ์„ค์ •ํ•˜๊ณ , ๋‚˜๋งŒ์˜ ์ด๋ฏธ์ง€๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค๋ฉด `imagefolder`๋กœ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค.
๐Ÿค— Datasets์€ [`~datasets.Image`] ๊ธฐ๋Šฅ์„ ์‚ฌ์šฉํ•ด ์ž๋™์œผ๋กœ ์ด๋ฏธ์ง€ ๋ฐ์ดํ„ฐ๋ฅผ ๋””์ฝ”๋”ฉํ•˜๊ณ  [`PIL.Image`](https://pillow.readthedocs.io/en/stable/reference/Image.html)๋กœ ๋ถˆ๋Ÿฌ์˜ต๋‹ˆ๋‹ค. ์ด๋ฅผ ์‹œ๊ฐํ™” ํ•ด๋ณด๋ฉด:
```py
>>> import matplotlib.pyplot as plt
>>> fig, axs = plt.subplots(1, 4, figsize=(16, 4))
>>> for i, image in enumerate(dataset[:4]["image"]):
... axs[i].imshow(image)
... axs[i].set_axis_off()
>>> fig.show()
```
![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/butterflies_ds.png)
์ด๋ฏธ์ง€๋Š” ๋ชจ๋‘ ๋‹ค๋ฅธ ์‚ฌ์ด์ฆˆ์ด๊ธฐ ๋•Œ๋ฌธ์—, ์šฐ์„  ์ „์ฒ˜๋ฆฌ๊ฐ€ ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค:
- `Resize` ๋Š” `config.image_size` ์— ์ •์˜๋œ ์ด๋ฏธ์ง€ ์‚ฌ์ด์ฆˆ๋กœ ๋ณ€๊ฒฝํ•ฉ๋‹ˆ๋‹ค.
- `RandomHorizontalFlip` ์€ ๋žœ๋ค์ ์œผ๋กœ ์ด๋ฏธ์ง€๋ฅผ ๋ฏธ๋Ÿฌ๋งํ•˜์—ฌ ๋ฐ์ดํ„ฐ์…‹์„ ๋ณด๊ฐ•ํ•ฉ๋‹ˆ๋‹ค.
- `Normalize` ๋Š” ๋ชจ๋ธ์ด ์˜ˆ์ƒํ•˜๋Š” [-1, 1] ๋ฒ”์œ„๋กœ ํ”ฝ์…€ ๊ฐ’์„ ์žฌ์กฐ์ • ํ•˜๋Š”๋ฐ ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค.
```py
>>> from torchvision import transforms
>>> preprocess = transforms.Compose(
... [
... transforms.Resize((config.image_size, config.image_size)),
... transforms.RandomHorizontalFlip(),
... transforms.ToTensor(),
... transforms.Normalize([0.5], [0.5]),
... ]
... )
```
ํ•™์Šต ๋„์ค‘์— `preprocess` ํ•จ์ˆ˜๋ฅผ ์ ์šฉํ•˜๋ ค๋ฉด ๐Ÿค— Datasets์˜ [`~datasets.Dataset.set_transform`] ๋ฐฉ๋ฒ•์ด ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค.
```py
>>> def transform(examples):
... images = [preprocess(image.convert("RGB")) for image in examples["image"]]
... return {"images": images}
>>> dataset.set_transform(transform)
```
์ด๋ฏธ์ง€์˜ ํฌ๊ธฐ๊ฐ€ ์กฐ์ •๋˜์—ˆ๋Š”์ง€ ํ™•์ธํ•˜๊ธฐ ์œ„ํ•ด ์ด๋ฏธ์ง€๋ฅผ ๋‹ค์‹œ ์‹œ๊ฐํ™”ํ•ด๋ณด์„ธ์š”. ์ด์ œ [DataLoader](https://pytorch.org/docs/stable/data#torch.utils.data.DataLoader)์— ๋ฐ์ดํ„ฐ์…‹์„ ํฌํ•จํ•ด ํ•™์Šตํ•  ์ค€๋น„๊ฐ€ ๋˜์—ˆ์Šต๋‹ˆ๋‹ค!
```py
>>> import torch
>>> train_dataloader = torch.utils.data.DataLoader(dataset, batch_size=config.train_batch_size, shuffle=True)
```
## UNet2DModel ์ƒ์„ฑํ•˜๊ธฐ
๐Ÿงจ Diffusers์— ์‚ฌ์ „ํ•™์Šต๋œ ๋ชจ๋ธ๋“ค์€ ๋ชจ๋ธ ํด๋ž˜์Šค์—์„œ ์›ํ•˜๋Š” ํŒŒ๋ผ๋ฏธํ„ฐ๋กœ ์‰ฝ๊ฒŒ ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, [`UNet2DModel`]๋ฅผ ์ƒ์„ฑํ•˜๋ ค๋ฉด:
```py
>>> from diffusers import UNet2DModel
>>> model = UNet2DModel(
... sample_size=config.image_size, # ํƒ€๊ฒŸ ์ด๋ฏธ์ง€ ํ•ด์ƒ๋„
... in_channels=3, # ์ž…๋ ฅ ์ฑ„๋„ ์ˆ˜, RGB ์ด๋ฏธ์ง€์—์„œ 3
... out_channels=3, # ์ถœ๋ ฅ ์ฑ„๋„ ์ˆ˜
... layers_per_block=2, # UNet ๋ธ”๋Ÿญ๋‹น ๋ช‡ ๊ฐœ์˜ ResNet ๋ ˆ์ด์–ด๊ฐ€ ์‚ฌ์šฉ๋˜๋Š”์ง€
... block_out_channels=(128, 128, 256, 256, 512, 512), # ๊ฐ UNet ๋ธ”๋Ÿญ์„ ์œ„ํ•œ ์ถœ๋ ฅ ์ฑ„๋„ ์ˆ˜
... down_block_types=(
... "DownBlock2D", # ์ผ๋ฐ˜์ ์ธ ResNet ๋‹ค์šด์ƒ˜ํ”Œ๋ง ๋ธ”๋Ÿญ
... "DownBlock2D",
... "DownBlock2D",
... "DownBlock2D",
... "AttnDownBlock2D", # spatial self-attention์ด ํฌํ•จ๋œ ์ผ๋ฐ˜์ ์ธ ResNet ๋‹ค์šด์ƒ˜ํ”Œ๋ง ๋ธ”๋Ÿญ
... "DownBlock2D",
... ),
... up_block_types=(
... "UpBlock2D", # ์ผ๋ฐ˜์ ์ธ ResNet ์—…์ƒ˜ํ”Œ๋ง ๋ธ”๋Ÿญ
... "AttnUpBlock2D", # spatial self-attention์ด ํฌํ•จ๋œ ์ผ๋ฐ˜์ ์ธ ResNet ์—…์ƒ˜ํ”Œ๋ง ๋ธ”๋Ÿญ
... "UpBlock2D",
... "UpBlock2D",
... "UpBlock2D",
... "UpBlock2D",
... ),
... )
```
์ƒ˜ํ”Œ์˜ ์ด๋ฏธ์ง€ ํฌ๊ธฐ์™€ ๋ชจ๋ธ ์ถœ๋ ฅ ํฌ๊ธฐ๊ฐ€ ๋งž๋Š”์ง€ ๋น ๋ฅด๊ฒŒ ํ™•์ธํ•˜๊ธฐ ์œ„ํ•œ ์ข‹์€ ์•„์ด๋””์–ด๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค:
```py
>>> sample_image = dataset[0]["images"].unsqueeze(0)
>>> print("Input shape:", sample_image.shape)
Input shape: torch.Size([1, 3, 128, 128])
>>> print("Output shape:", model(sample_image, timestep=0).sample.shape)
Output shape: torch.Size([1, 3, 128, 128])
```
ํ›Œ๋ฅญํ•ด์š”! ๋‹ค์Œ, ์ด๋ฏธ์ง€์— ์•ฝ๊ฐ„์˜ ๋…ธ์ด์ฆˆ๋ฅผ ๋”ํ•˜๊ธฐ ์œ„ํ•ด ์Šค์ผ€์ค„๋Ÿฌ๊ฐ€ ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.
## ์Šค์ผ€์ค„๋Ÿฌ ์ƒ์„ฑํ•˜๊ธฐ
์Šค์ผ€์ค„๋Ÿฌ๋Š” ๋ชจ๋ธ์„ ํ•™์Šต ๋˜๋Š” ์ถ”๋ก ์— ์‚ฌ์šฉํ•˜๋Š”์ง€์— ๋”ฐ๋ผ ๋‹ค๋ฅด๊ฒŒ ์ž‘๋™ํ•ฉ๋‹ˆ๋‹ค. ์ถ”๋ก ์‹œ์—, ์Šค์ผ€์ค„๋Ÿฌ๋Š” ๋…ธ์ด์ฆˆ๋กœ๋ถ€ํ„ฐ ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. ํ•™์Šต์‹œ ์Šค์ผ€์ค„๋Ÿฌ๋Š” diffusion ๊ณผ์ •์—์„œ์˜ ํŠน์ • ํฌ์ธํŠธ๋กœ๋ถ€ํ„ฐ ๋ชจ๋ธ์˜ ์ถœ๋ ฅ ๋˜๋Š” ์ƒ˜ํ”Œ์„ ๊ฐ€์ ธ์™€ *๋…ธ์ด์ฆˆ ์Šค์ผ€์ค„* ๊ณผ *์—…๋ฐ์ดํŠธ ๊ทœ์น™*์— ๋”ฐ๋ผ ์ด๋ฏธ์ง€์— ๋…ธ์ด์ฆˆ๋ฅผ ์ ์šฉํ•ฉ๋‹ˆ๋‹ค.
`DDPMScheduler`์˜ `add_noise` ๋ฉ”์„œ๋“œ๋ฅผ ์‚ฌ์šฉํ•ด ์•ž์—์„œ ๋งŒ๋“  `sample_image`์— ๋žœ๋คํ•œ ๋…ธ์ด์ฆˆ๋ฅผ ๋”ํ•ด ๋ด…์‹œ๋‹ค:
```py
>>> import torch
>>> from PIL import Image
>>> from diffusers import DDPMScheduler
>>> noise_scheduler = DDPMScheduler(num_train_timesteps=1000)
>>> noise = torch.randn(sample_image.shape)
>>> timesteps = torch.LongTensor([50])
>>> noisy_image = noise_scheduler.add_noise(sample_image, noise, timesteps)
>>> Image.fromarray(((noisy_image.permute(0, 2, 3, 1) + 1.0) * 127.5).type(torch.uint8).numpy()[0])
```
![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/noisy_butterfly.png)
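`add_noise`๊ฐ€ ๋‚ด๋ถ€์—์„œ ์ˆ˜ํ–‰ํ•˜๋Š” ๊ณ„์‚ฐ์€ forward diffusion์˜ ํ๋“œํ˜•(closed form) ์ˆ˜์‹, ์ฆ‰ `x_t = โˆšแพฑ_t ยท x_0 + โˆš(1 - แพฑ_t) ยท noise` ์ž…๋‹ˆ๋‹ค. ์•„๋ž˜๋Š” ์ด๋ฅผ ๋‹จ์ˆœํ™”ํ•ด ์ง์ ‘ ๊ตฌํ˜„ํ•ด ๋ณธ ์Šค์ผ€์น˜๋กœ, ์„ ํ˜• beta ์Šค์ผ€์ค„์„ ๊ฐ€์ •ํ–ˆ์œผ๋ฉฐ ์‹ค์ œ `DDPMScheduler`์˜ ๊ธฐ๋ณธ๊ฐ’ยท์„ธ๋ถ€ ๊ตฌํ˜„๊ณผ๋Š” ์ˆ˜์น˜๊ฐ€ ๋‹ค๋ฅผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:

```python
import torch

# ์„ ํ˜• beta ์Šค์ผ€์ค„(๊ฐ€์ •)๊ณผ ๋ˆ„์  alpha ๊ณฑ(แพฑ_t)์„ ๋ฏธ๋ฆฌ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค.
num_train_timesteps = 1000
betas = torch.linspace(1e-4, 0.02, num_train_timesteps)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def add_noise_sketch(x0, noise, t):
    # x_t = โˆšแพฑ_t * x_0 + โˆš(1 - แพฑ_t) * noise
    return alphas_cumprod[t].sqrt() * x0 + (1.0 - alphas_cumprod[t]).sqrt() * noise

x0 = torch.zeros(1, 3, 8, 8)
noise = torch.randn_like(x0)
xt = add_noise_sketch(x0, noise, 50)
print(xt.shape)
```

ํƒ€์ž„์Šคํ… `t`๊ฐ€ ์ปค์งˆ์ˆ˜๋ก `แพฑ_t`๊ฐ€ ์ž‘์•„์ ธ ๋…ธ์ด์ฆˆ์˜ ๋น„์ค‘์ด ์ปค์ง€๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.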
๋ชจ๋ธ์˜ ํ•™์Šต ๋ชฉ์ ์€ ์ด๋ฏธ์ง€์— ๋”ํ•ด์ง„ ๋…ธ์ด์ฆˆ๋ฅผ ์˜ˆ์ธกํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์ด ๋‹จ๊ณ„์—์„œ ์†์‹ค์€ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๊ณ„์‚ฐ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:
```py
>>> import torch.nn.functional as F
>>> noise_pred = model(noisy_image, timesteps).sample
>>> loss = F.mse_loss(noise_pred, noise)
```
## ๋ชจ๋ธ ํ•™์Šตํ•˜๊ธฐ
์ง€๊ธˆ๊นŒ์ง€, ๋ชจ๋ธ ํ•™์Šต์„ ์‹œ์ž‘ํ•˜๊ธฐ ์œ„ํ•ด ๋งŽ์€ ๋ถ€๋ถ„์„ ๊ฐ–์ถ”์—ˆ์œผ๋ฉฐ ์ด์ œ ๋‚จ์€ ๊ฒƒ์€ ๋ชจ๋“  ๊ฒƒ์„ ์กฐํ•ฉํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.
์šฐ์„  ์˜ตํ‹ฐ๋งˆ์ด์ €(optimizer)์™€ ํ•™์Šต๋ฅ  ์Šค์ผ€์ค„๋Ÿฌ(learning rate scheduler)๊ฐ€ ํ•„์š”ํ•  ๊ฒƒ์ž…๋‹ˆ๋‹ค:
```py
>>> from diffusers.optimization import get_cosine_schedule_with_warmup
>>> optimizer = torch.optim.AdamW(model.parameters(), lr=config.learning_rate)
>>> lr_scheduler = get_cosine_schedule_with_warmup(
... optimizer=optimizer,
... num_warmup_steps=config.lr_warmup_steps,
... num_training_steps=(len(train_dataloader) * config.num_epochs),
... )
```
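์ฝ”์‚ฌ์ธ ์›Œ๋ฐ์—… ์Šค์ผ€์ค„์€ ์›Œ๋ฐ์—… ๊ตฌ๊ฐ„ ๋™์•ˆ ํ•™์Šต๋ฅ ์„ 0์—์„œ ์„ ํ˜•์œผ๋กœ ๋Œ์–ด์˜ฌ๋ฆฐ ๋’ค, ์ฝ”์‚ฌ์ธ ๊ณก์„ ์„ ๋”ฐ๋ผ ์ ์ฐจ ๊ฐ์†Œ์‹œํ‚ต๋‹ˆ๋‹ค. ๋Œ€๋žต์ ์ธ ํ˜•ํƒœ๋ฅผ ๋ณด์—ฌ์ฃผ๋Š” ์Šค์ผ€์น˜๋ฅผ ๋“ค๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค (์‹ค์ œ `get_cosine_schedule_with_warmup` ๊ตฌํ˜„์˜ ์„ธ๋ถ€ ์ˆ˜์‹๊ณผ๋Š” ๋‹ค๋ฅผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค):

```python
import math

# ํ•™์Šต๋ฅ ์— ๊ณฑํ•ด์ง€๋Š” ๋ฐฐ์œจ์„ ๊ณ„์‚ฐํ•˜๋Š” ์Šค์ผ€์น˜
def lr_multiplier(step, warmup_steps, total_steps):
    if step < warmup_steps:
        return step / max(1, warmup_steps)  # ์„ ํ˜• ์›Œ๋ฐ์—…: 0 -> 1
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))  # ์ฝ”์‚ฌ์ธ ๊ฐ์†Œ: 1 -> 0

print(lr_multiplier(0, 500, 10000))    # 0.0
print(lr_multiplier(500, 500, 10000))  # 1.0
```

์›Œ๋ฐ์—…์€ ํ•™์Šต ์ดˆ๋ฐ˜์— ํฐ ํ•™์Šต๋ฅ ๋กœ ์ธํ•œ ๋ถˆ์•ˆ์ •์„ ์™„ํ™”ํ•˜๋Š” ๋ฐ ๋„์›€์ด ๋ฉ๋‹ˆ๋‹ค.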
๊ทธ ํ›„, ๋ชจ๋ธ์„ ํ‰๊ฐ€ํ•˜๋Š” ๋ฐฉ๋ฒ•์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. ํ‰๊ฐ€๋ฅผ ์œ„ํ•ด, `DDPMPipeline`์„ ์‚ฌ์šฉํ•ด ๋ฐฐ์น˜์˜ ์ด๋ฏธ์ง€ ์ƒ˜ํ”Œ๋“ค์„ ์ƒ์„ฑํ•˜๊ณ  ๊ทธ๋ฆฌ๋“œ ํ˜•ํƒœ๋กœ ์ €์žฅํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:
```py
>>> from diffusers import DDPMPipeline
>>> import math
>>> import os
>>> def make_grid(images, rows, cols):
... w, h = images[0].size
... grid = Image.new("RGB", size=(cols * w, rows * h))
... for i, image in enumerate(images):
... grid.paste(image, box=(i % cols * w, i // cols * h))
... return grid
>>> def evaluate(config, epoch, pipeline):
... # ๋žœ๋คํ•œ ๋…ธ์ด์ฆˆ๋กœ ๋ถ€ํ„ฐ ์ด๋ฏธ์ง€๋ฅผ ์ถ”์ถœํ•ฉ๋‹ˆ๋‹ค.(์ด๋Š” ์—ญ์ „ํŒŒ diffusion ๊ณผ์ •์ž…๋‹ˆ๋‹ค.)
... # ๊ธฐ๋ณธ ํŒŒ์ดํ”„๋ผ์ธ ์ถœ๋ ฅ ํ˜•ํƒœ๋Š” `List[PIL.Image]` ์ž…๋‹ˆ๋‹ค.
... images = pipeline(
... batch_size=config.eval_batch_size,
... generator=torch.manual_seed(config.seed),
... ).images
... # ์ด๋ฏธ์ง€๋“ค์„ ๊ทธ๋ฆฌ๋“œ๋กœ ๋งŒ๋“ค์–ด์ค๋‹ˆ๋‹ค.
... image_grid = make_grid(images, rows=4, cols=4)
... # ์ด๋ฏธ์ง€๋“ค์„ ์ €์žฅํ•ฉ๋‹ˆ๋‹ค.
... test_dir = os.path.join(config.output_dir, "samples")
... os.makedirs(test_dir, exist_ok=True)
... image_grid.save(f"{test_dir}/{epoch:04d}.png")
```
TensorBoard์— ๋กœ๊น…, ๊ทธ๋ž˜๋””์–ธํŠธ ๋ˆ„์  ๋ฐ ํ˜ผํ•ฉ ์ •๋ฐ€๋„ ํ•™์Šต์„ ์‰ฝ๊ฒŒ ์ˆ˜ํ–‰ํ•˜๊ธฐ ์œ„ํ•ด ๐Ÿค— Accelerate๋ฅผ ํ•™์Šต ๋ฃจํ”„์— ํ•จ๊ป˜ ์•ž์„œ ๋งํ•œ ๋ชจ๋“  ๊ตฌ์„ฑ ์ •๋ณด๋“ค์„ ๋ฌถ์–ด ์ง„ํ–‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ํ—ˆ๋ธŒ์— ๋ชจ๋ธ์„ ์—…๋กœ๋“œ ํ•˜๊ธฐ ์œ„ํ•ด ๋ ˆํฌ์ง€ํ† ๋ฆฌ ์ด๋ฆ„ ๋ฐ ์ •๋ณด๋ฅผ ๊ฐ€์ ธ์˜ค๊ธฐ ์œ„ํ•œ ํ•จ์ˆ˜๋ฅผ ์ž‘์„ฑํ•˜๊ณ  ํ—ˆ๋ธŒ์— ์—…๋กœ๋“œํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
๐Ÿ’ก์•„๋ž˜์˜ ํ•™์Šต ๋ฃจํ”„๋Š” ์–ด๋ ต๊ณ  ๊ธธ์–ด ๋ณด์ผ ์ˆ˜ ์žˆ์ง€๋งŒ, ๋‚˜์ค‘์— ํ•œ ์ค„์˜ ์ฝ”๋“œ๋กœ ํ•™์Šต์„ ํ•œ๋‹ค๋ฉด ๊ทธ๋งŒํ•œ ๊ฐ€์น˜๊ฐ€ ์žˆ์„ ๊ฒƒ์ž…๋‹ˆ๋‹ค! ๋งŒ์•ฝ ๊ธฐ๋‹ค๋ฆฌ์ง€ ๋ชปํ•˜๊ณ  ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•˜๊ณ  ์‹ถ๋‹ค๋ฉด, ์•„๋ž˜ ์ฝ”๋“œ๋ฅผ ์ž์œ ๋กญ๊ฒŒ ๋ถ™์—ฌ๋„ฃ๊ณ  ์ž‘๋™์‹œํ‚ค๋ฉด ๋ฉ๋‹ˆ๋‹ค. ๐Ÿค—
```py
>>> from accelerate import Accelerator
>>> from huggingface_hub import HfFolder, Repository, whoami
>>> from tqdm.auto import tqdm
>>> from pathlib import Path
>>> import os
>>> def get_full_repo_name(model_id: str, organization: str = None, token: str = None):
... if token is None:
... token = HfFolder.get_token()
... if organization is None:
... username = whoami(token)["name"]
... return f"{username}/{model_id}"
... else:
... return f"{organization}/{model_id}"
>>> def train_loop(config, model, noise_scheduler, optimizer, train_dataloader, lr_scheduler):
... # accelerator์™€ tensorboard ๋กœ๊น… ์ดˆ๊ธฐํ™”
... accelerator = Accelerator(
... mixed_precision=config.mixed_precision,
... gradient_accumulation_steps=config.gradient_accumulation_steps,
... log_with="tensorboard",
... logging_dir=os.path.join(config.output_dir, "logs"),
... )
... if accelerator.is_main_process:
... if config.push_to_hub:
... repo_name = get_full_repo_name(Path(config.output_dir).name)
... repo = Repository(config.output_dir, clone_from=repo_name)
... elif config.output_dir is not None:
... os.makedirs(config.output_dir, exist_ok=True)
... accelerator.init_trackers("train_example")
... # ๋ชจ๋“  ๊ฒƒ์ด ์ค€๋น„๋˜์—ˆ์Šต๋‹ˆ๋‹ค.
... # ๊ธฐ์–ตํ•ด์•ผ ํ•  ํŠน๋ณ„ํ•œ ์ˆœ์„œ๋Š” ์—†์œผ๋ฉฐ, prepare์— ์ „๋‹ฌํ•œ ๊ฒƒ๊ณผ ๋™์ผํ•œ ์ˆœ์„œ๋กœ ๊ฐ์ฒด๋ฅผ ์–ธํŒจํ‚นํ•˜๋ฉด ๋ฉ๋‹ˆ๋‹ค.
... model, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
... model, optimizer, train_dataloader, lr_scheduler
... )
... global_step = 0
... # ์ด์ œ ๋ชจ๋ธ์„ ํ•™์Šตํ•ฉ๋‹ˆ๋‹ค.
... for epoch in range(config.num_epochs):
... progress_bar = tqdm(total=len(train_dataloader), disable=not accelerator.is_local_main_process)
... progress_bar.set_description(f"Epoch {epoch}")
... for step, batch in enumerate(train_dataloader):
... clean_images = batch["images"]
... # ์ด๋ฏธ์ง€์— ๋”ํ•  ๋…ธ์ด์ฆˆ๋ฅผ ์ƒ˜ํ”Œ๋งํ•ฉ๋‹ˆ๋‹ค.
... noise = torch.randn(clean_images.shape).to(clean_images.device)
... bs = clean_images.shape[0]
... # ๊ฐ ์ด๋ฏธ์ง€๋ฅผ ์œ„ํ•œ ๋žœ๋คํ•œ ํƒ€์ž„์Šคํ…(timestep)์„ ์ƒ˜ํ”Œ๋งํ•ฉ๋‹ˆ๋‹ค.
... timesteps = torch.randint(
... 0, noise_scheduler.config.num_train_timesteps, (bs,), device=clean_images.device
... ).long()
... # ๊ฐ ํƒ€์ž„์Šคํ…์˜ ๋…ธ์ด์ฆˆ ํฌ๊ธฐ์— ๋”ฐ๋ผ ๊นจ๋—ํ•œ ์ด๋ฏธ์ง€์— ๋…ธ์ด์ฆˆ๋ฅผ ์ถ”๊ฐ€ํ•ฉ๋‹ˆ๋‹ค.
... # (์ด๋Š” forward diffusion ๊ณผ์ •์ž…๋‹ˆ๋‹ค.)
... noisy_images = noise_scheduler.add_noise(clean_images, noise, timesteps)
... with accelerator.accumulate(model):
... # ๋…ธ์ด์ฆˆ๋ฅผ ๋ฐ˜๋ณต์ ์œผ๋กœ ์˜ˆ์ธกํ•ฉ๋‹ˆ๋‹ค.
... noise_pred = model(noisy_images, timesteps, return_dict=False)[0]
... loss = F.mse_loss(noise_pred, noise)
... accelerator.backward(loss)
... accelerator.clip_grad_norm_(model.parameters(), 1.0)
... optimizer.step()
... lr_scheduler.step()
... optimizer.zero_grad()
... progress_bar.update(1)
... logs = {"loss": loss.detach().item(), "lr": lr_scheduler.get_last_lr()[0], "step": global_step}
... progress_bar.set_postfix(**logs)
... accelerator.log(logs, step=global_step)
... global_step += 1
... # ๊ฐ ์—ํฌํฌ๊ฐ€ ๋๋‚œ ํ›„ evaluate()์™€ ๋ช‡ ๊ฐ€์ง€ ๋ฐ๋ชจ ์ด๋ฏธ์ง€๋ฅผ ์„ ํƒ์ ์œผ๋กœ ์ƒ˜ํ”Œ๋งํ•˜๊ณ  ๋ชจ๋ธ์„ ์ €์žฅํ•ฉ๋‹ˆ๋‹ค.
... if accelerator.is_main_process:
... pipeline = DDPMPipeline(unet=accelerator.unwrap_model(model), scheduler=noise_scheduler)
... if (epoch + 1) % config.save_image_epochs == 0 or epoch == config.num_epochs - 1:
... evaluate(config, epoch, pipeline)
... if (epoch + 1) % config.save_model_epochs == 0 or epoch == config.num_epochs - 1:
... if config.push_to_hub:
... repo.push_to_hub(commit_message=f"Epoch {epoch}", blocking=True)
... else:
... pipeline.save_pretrained(config.output_dir)
```
ํœด, ์ฝ”๋“œ๊ฐ€ ๊ฝค ๋งŽ์•˜๋„ค์š”! ํ•˜์ง€๋งŒ ์ด์ œ ๐Ÿค— Accelerate์˜ [`~accelerate.notebook_launcher`] ํ•จ์ˆ˜๋กœ ํ•™์Šต์„ ์‹œ์ž‘ํ•  ์ค€๋น„๊ฐ€ ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ์ด ํ•จ์ˆ˜์— ํ•™์Šต ๋ฃจํ”„, ๋ชจ๋“  ํ•™์Šต ์ธ์ˆ˜, ํ•™์Šต์— ์‚ฌ์šฉํ•  ํ”„๋กœ์„ธ์Šค ์ˆ˜(์‚ฌ์šฉ ๊ฐ€๋Šฅํ•œ GPU ์ˆ˜๋กœ ๋ณ€๊ฒฝํ•  ์ˆ˜ ์žˆ์Œ)๋ฅผ ์ „๋‹ฌํ•ฉ๋‹ˆ๋‹ค:
```py
>>> from accelerate import notebook_launcher
>>> args = (config, model, noise_scheduler, optimizer, train_dataloader, lr_scheduler)
>>> notebook_launcher(train_loop, args, num_processes=1)
```
ํ•œ๋ฒˆ ํ•™์Šต์ด ์™„๋ฃŒ๋˜๋ฉด, diffusion ๋ชจ๋ธ๋กœ ์ƒ์„ฑ๋œ ์ตœ์ข… ๐Ÿฆ‹์ด๋ฏธ์ง€๐Ÿฆ‹๋ฅผ ํ™•์ธํ•ด๋ณด๊ธธ ๋ฐ”๋ž๋‹ˆ๋‹ค!
```py
>>> import glob
>>> sample_images = sorted(glob.glob(f"{config.output_dir}/samples/*.png"))
>>> Image.open(sample_images[-1])
```
![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/butterflies_final.png)
## ๋‹ค์Œ ๋‹จ๊ณ„
Unconditional ์ด๋ฏธ์ง€ ์ƒ์„ฑ์€ ํ•™์Šต๋  ์ˆ˜ ์žˆ๋Š” ์ž‘์—… ์ค‘ ํ•˜๋‚˜์˜ ์˜ˆ์‹œ์ž…๋‹ˆ๋‹ค. ๋‹ค๋ฅธ ์ž‘์—…๊ณผ ํ•™์Šต ๋ฐฉ๋ฒ•์€ [๐Ÿงจ Diffusers ํ•™์Šต ์˜ˆ์‹œ](../training/overview) ํŽ˜์ด์ง€์—์„œ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋‹ค์Œ์€ ํ•™์Šตํ•  ์ˆ˜ ์žˆ๋Š” ๋ช‡ ๊ฐ€์ง€ ์˜ˆ์‹œ์ž…๋‹ˆ๋‹ค:
- [Textual Inversion](../training/text_inversion), ํŠน์ • ์‹œ๊ฐ์  ๊ฐœ๋…์„ ํ•™์Šต์‹œ์ผœ ์ƒ์„ฑ๋œ ์ด๋ฏธ์ง€์— ํ†ตํ•ฉ์‹œํ‚ค๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜์ž…๋‹ˆ๋‹ค.
- [DreamBooth](../training/dreambooth), ์ฃผ์ œ์— ๋Œ€ํ•œ ๋ช‡ ๊ฐ€์ง€ ์ž…๋ ฅ ์ด๋ฏธ์ง€๋“ค์ด ์ฃผ์–ด์ง€๋ฉด ์ฃผ์ œ์— ๋Œ€ํ•œ ๊ฐœ์ธํ™”๋œ ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•˜๊ธฐ ์œ„ํ•œ ๊ธฐ์ˆ ์ž…๋‹ˆ๋‹ค.
- [Guide](../training/text2image), ๋ฐ์ดํ„ฐ์…‹์— Stable Diffusion ๋ชจ๋ธ์„ ํŒŒ์ธํŠœ๋‹ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ๋‹ค๋ฃจ๋Š” ๊ฐ€์ด๋“œ์ž…๋‹ˆ๋‹ค.
- [Guide](../training/lora), LoRA๋ฅผ ์‚ฌ์šฉํ•ด ๋งค์šฐ ํฐ ๋ชจ๋ธ์„ ๋น ๋ฅด๊ฒŒ ํŒŒ์ธํŠœ๋‹ํ•˜๋Š” ๋ฉ”๋ชจ๋ฆฌ ํšจ์œจ์ ์ธ ๊ธฐ์ˆ ์„ ๋‹ค๋ฃจ๋Š” ๊ฐ€์ด๋“œ์ž…๋‹ˆ๋‹ค.

<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Overview
๐Ÿงจย Diffusers์— ์˜ค์‹  ๊ฑธ ํ™˜์˜ํ•ฉ๋‹ˆ๋‹ค! ์—ฌ๋Ÿฌ๋ถ„์ด diffusion ๋ชจ๋ธ๊ณผ ์ƒ์„ฑ AI๋ฅผ ์ฒ˜์Œ ์ ‘ํ•˜๊ณ , ๋” ๋งŽ์€ ๊ฑธ ๋ฐฐ์šฐ๊ณ  ์‹ถ์œผ์…จ๋‹ค๋ฉด ์ œ๋Œ€๋กœ ์ฐพ์•„์˜ค์…จ์Šต๋‹ˆ๋‹ค. ์ด ํŠœํ† ๋ฆฌ์–ผ์€ diffusion model์„ ์—ฌ๋Ÿฌ๋ถ„์—๊ฒŒ ์  ํ‹€ํ•˜๊ฒŒ ์†Œ๊ฐœํ•˜๊ณ , ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์˜ ๊ธฐ๋ณธ ์‚ฌํ•ญ(ํ•ต์‹ฌ ๊ตฌ์„ฑ์š”์†Œ์™€ ๐Ÿงจย Diffusers ์‚ฌ์šฉ๋ฒ•)์„ ์ดํ•ดํ•˜๋Š” ๋ฐ ๋„์›€์ด ๋˜๋„๋ก ์„ค๊ณ„๋˜์—ˆ์Šต๋‹ˆ๋‹ค.
์—ฌ๋Ÿฌ๋ถ„์€ ์ด ํŠœํ† ๋ฆฌ์–ผ์„ ํ†ตํ•ด ๋น ๋ฅด๊ฒŒ ์ƒ์„ฑํ•˜๊ธฐ ์œ„ํ•ด์„  ์ถ”๋ก  ํŒŒ์ดํ”„๋ผ์ธ์„ ์–ด๋–ป๊ฒŒ ์‚ฌ์šฉํ•ด์•ผ ํ•˜๋Š”์ง€, ๊ทธ๋ฆฌ๊ณ  ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ modular toolbox์ฒ˜๋Ÿผ ์ด์šฉํ•ด์„œ ์—ฌ๋Ÿฌ๋ถ„๋งŒ์˜ diffusion system์„ ๊ตฌ์ถ•ํ•  ์ˆ˜ ์žˆ๋„๋ก ํŒŒ์ดํ”„๋ผ์ธ์„ ๋ถ„ํ•ดํ•˜๋Š” ๋ฒ•์„ ๋ฐฐ์šธ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋‹ค์Œ ๋‹จ์›์—์„œ๋Š” ์—ฌ๋Ÿฌ๋ถ„์ด ์›ํ•˜๋Š” ๊ฒƒ์„ ์ƒ์„ฑํ•˜๊ธฐ ์œ„ํ•ด ์ž์‹ ๋งŒ์˜ diffusion model์„ ํ•™์Šตํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ๋ฐฐ์šฐ๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.
ํŠœํ† ๋ฆฌ์–ผ์„ ์™„๋ฃŒํ•œ๋‹ค๋ฉด ์—ฌ๋Ÿฌ๋ถ„์€ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์ง์ ‘ ํƒ์ƒ‰ํ•˜๊ณ , ์ž์‹ ์˜ ํ”„๋กœ์ ํŠธ์™€ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์— ์ ์šฉํ•  ์Šคํ‚ฌ๋“ค์„ ์Šต๋“ํ•  ์ˆ˜ ์žˆ์„ ๊ฒ๋‹ˆ๋‹ค.
[Discord](https://discord.com/invite/JfAtkvEtRb)๋‚˜ [ํฌ๋Ÿผ](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63) ์ปค๋ฎค๋‹ˆํ‹ฐ์— ์ž์œ ๋กญ๊ฒŒ ์ฐธ์—ฌํ•ด์„œ ๋‹ค๋ฅธ ์‚ฌ์šฉ์ž์™€ ๊ฐœ๋ฐœ์ž๋“ค๊ณผ ๊ต๋ฅ˜ํ•˜๊ณ  ํ˜‘์—…ํ•ด ๋ณด์„ธ์š”!
์ž ์ง€๊ธˆ๋ถ€ํ„ฐ diffusing์„ ์‹œ์ž‘ํ•ด ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค! ๐Ÿงจ

<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# ์ปค๋ฎค๋‹ˆํ‹ฐ ํŒŒ์ดํ”„๋ผ์ธ
> **์ปค๋ฎค๋‹ˆํ‹ฐ ํŒŒ์ดํ”„๋ผ์ธ์— ๋Œ€ํ•œ ์ž์„ธํ•œ ๋‚ด์šฉ์€ [์ด ์ด์Šˆ](https://github.com/huggingface/diffusers/issues/841)๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.
**์ปค๋ฎค๋‹ˆํ‹ฐ** ์˜ˆ์ œ๋Š” ์ปค๋ฎค๋‹ˆํ‹ฐ์—์„œ ์ถ”๊ฐ€ํ•œ ์ถ”๋ก  ๋ฐ ํ›ˆ๋ จ ์˜ˆ์ œ๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.
๋‹ค์Œ ํ‘œ๋ฅผ ์ฐธ์กฐํ•˜์—ฌ ๋ชจ๋“  ์ปค๋ฎค๋‹ˆํ‹ฐ ์˜ˆ์ œ์— ๋Œ€ํ•œ ๊ฐœ์š”๋ฅผ ํ™•์ธํ•˜์‹œ๊ธฐ ๋ฐ”๋ž๋‹ˆ๋‹ค. **์ฝ”๋“œ ์˜ˆ์ œ**๋ฅผ ํด๋ฆญํ•˜๋ฉด ๋ณต์‚ฌํ•˜์—ฌ ๋ถ™์—ฌ๋„ฃ๊ธฐํ•  ์ˆ˜ ์žˆ๋Š” ์ฝ”๋“œ ์˜ˆ์ œ๋ฅผ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
์ปค๋ฎค๋‹ˆํ‹ฐ๊ฐ€ ์˜ˆ์ƒ๋Œ€๋กœ ์ž‘๋™ํ•˜์ง€ ์•Š๋Š” ๊ฒฝ์šฐ ์ด์Šˆ๋ฅผ ๊ฐœ์„คํ•˜๊ณ  ์ž‘์„ฑ์ž์—๊ฒŒ ํ•‘์„ ๋ณด๋‚ด์ฃผ์„ธ์š”.
| ์˜ˆ | ์„ค๋ช… | ์ฝ”๋“œ ์˜ˆ์ œ | ์ฝœ๋žฉ |์ €์ž |
|:---------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------:|
| CLIP Guided Stable Diffusion | CLIP ๊ฐ€์ด๋“œ ๊ธฐ๋ฐ˜์˜ Stable Diffusion์œผ๋กœ ํ…์ŠคํŠธ์—์„œ ์ด๋ฏธ์ง€๋กœ ์ƒ์„ฑํ•˜๊ธฐ | [CLIP Guided Stable Diffusion](#clip-guided-stable-diffusion) | [![์ฝœ๋žฉ์—์„œ ์—ด๊ธฐ](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/CLIP_Guided_Stable_diffusion_with_diffusers.ipynb) | [Suraj Patil](https://github.com/patil-suraj/) |
| One Step U-Net (Dummy) | ์ปค๋ฎค๋‹ˆํ‹ฐ ํŒŒ์ดํ”„๋ผ์ธ์„ ์–ด๋–ป๊ฒŒ ์‚ฌ์šฉํ•ด์•ผ ํ•˜๋Š”์ง€์— ๋Œ€ํ•œ ์˜ˆ์‹œ(์ฐธ๊ณ  https://github.com/huggingface/diffusers/issues/841) | [One Step U-Net](#one-step-unet) | - | [Patrick von Platen](https://github.com/patrickvonplaten/) |
| Stable Diffusion Interpolation | ์„œ๋กœ ๋‹ค๋ฅธ ํ”„๋กฌํ”„ํŠธ/์‹œ๋“œ ๊ฐ„ Stable Diffusion์˜ latent space ๋ณด๊ฐ„ | [Stable Diffusion Interpolation](#stable-diffusion-interpolation) | - | [Nate Raw](https://github.com/nateraw/) |
| Stable Diffusion Mega | ๋ชจ๋“  ๊ธฐ๋Šฅ์„ ๊ฐ–์ถ˜ **ํ•˜๋‚˜์˜** Stable Diffusion ํŒŒ์ดํ”„๋ผ์ธ [Text2Image](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py), [Image2Image](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py) and [Inpainting](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py) | [Stable Diffusion Mega](#stable-diffusion-mega) | - | [Patrick von Platen](https://github.com/patrickvonplaten/) |
| Long Prompt Weighting Stable Diffusion | ํ† ํฐ ๊ธธ์ด ์ œํ•œ์ด ์—†๊ณ  ํ”„๋กฌํ”„ํŠธ์—์„œ ํŒŒ์‹ฑ ๊ฐ€์ค‘์น˜ ์ง€์›์„ ํ•˜๋Š” **ํ•˜๋‚˜์˜** Stable Diffusion ํŒŒ์ดํ”„๋ผ์ธ, | [Long Prompt Weighting Stable Diffusion](#long-prompt-weighting-stable-diffusion) |- | [SkyTNT](https://github.com/SkyTNT) |
| Speech to Image | ์ž๋™ ์Œ์„ฑ ์ธ์‹์„ ์‚ฌ์šฉํ•˜์—ฌ ํ…์ŠคํŠธ๋ฅผ ์ž‘์„ฑํ•˜๊ณ  Stable Diffusion์„ ์‚ฌ์šฉํ•˜์—ฌ ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. | [Speech to Image](#speech-to-image) | - | [Mikail Duzenli](https://github.com/MikailINTech) |
์ปค์Šคํ…€ ํŒŒ์ดํ”„๋ผ์ธ์„ ๋ถˆ๋Ÿฌ์˜ค๋ ค๋ฉด `diffusers/examples/community`์— ์žˆ๋Š” ํŒŒ์ผ ์ค‘ ํ•˜๋‚˜๋กœ์„œ `custom_pipeline` ์ธ์ˆ˜๋ฅผ `DiffusionPipeline`์— ์ „๋‹ฌํ•˜๊ธฐ๋งŒ ํ•˜๋ฉด ๋ฉ๋‹ˆ๋‹ค. ์ž์‹ ๋งŒ์˜ ํŒŒ์ดํ”„๋ผ์ธ์ด ์žˆ๋Š” PR์„ ๋ณด๋‚ด์ฃผ์‹œ๋ฉด ๋น ๋ฅด๊ฒŒ ๋ณ‘ํ•ฉํ•ด๋“œ๋ฆฌ๊ฒ ์Šต๋‹ˆ๋‹ค.
```py
pipe = DiffusionPipeline.from_pretrained(
"CompVis/stable-diffusion-v1-4", custom_pipeline="filename_in_the_community_folder"
)
```
## ์‚ฌ์šฉ ์˜ˆ์‹œ
### CLIP ๊ฐ€์ด๋“œ ๊ธฐ๋ฐ˜์˜ Stable Diffusion
๋ชจ๋“  ๋…ธ์ด์ฆˆ ์ œ๊ฑฐ ๋‹จ๊ณ„์—์„œ ์ถ”๊ฐ€ CLIP ๋ชจ๋ธ์„ ํ†ตํ•ด Stable Diffusion์„ ๊ฐ€์ด๋“œํ•จ์œผ๋กœ์จ CLIP ๋ชจ๋ธ ๊ธฐ๋ฐ˜์˜ Stable Diffusion์€ ๋ณด๋‹ค ๋” ์‚ฌ์‹ค์ ์ธ ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑ์„ ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
๋‹ค์Œ ์ฝ”๋“œ๋Š” ์•ฝ 12GB์˜ GPU RAM์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.
```python
from diffusers import DiffusionPipeline
from transformers import CLIPImageProcessor, CLIPModel
import torch
feature_extractor = CLIPImageProcessor.from_pretrained("laion/CLIP-ViT-B-32-laion2B-s34B-b79K")
clip_model = CLIPModel.from_pretrained("laion/CLIP-ViT-B-32-laion2B-s34B-b79K", torch_dtype=torch.float16)
guided_pipeline = DiffusionPipeline.from_pretrained(
"CompVis/stable-diffusion-v1-4",
custom_pipeline="clip_guided_stable_diffusion",
clip_model=clip_model,
feature_extractor=feature_extractor,
torch_dtype=torch.float16,
)
guided_pipeline.enable_attention_slicing()
guided_pipeline = guided_pipeline.to("cuda")
prompt = "fantasy book cover, full moon, fantasy forest landscape, golden vector elements, fantasy magic, dark light night, intricate, elegant, sharp focus, illustration, highly detailed, digital painting, concept art, matte, art by WLOP and Artgerm and Albert Bierstadt, masterpiece"
generator = torch.Generator(device="cuda").manual_seed(0)
images = []
for i in range(4):
image = guided_pipeline(
prompt,
num_inference_steps=50,
guidance_scale=7.5,
clip_guidance_scale=100,
num_cutouts=4,
use_cutouts=False,
generator=generator,
).images[0]
images.append(image)
# ์ด๋ฏธ์ง€ ๋กœ์ปฌ์— ์ €์žฅํ•˜๊ธฐ
for i, img in enumerate(images):
img.save(f"./clip_guided_sd/image_{i}.png")
```
์ด๋ฏธ์ง€` ๋ชฉ๋ก์—๋Š” ๋กœ์ปฌ์— ์ €์žฅํ•˜๊ฑฐ๋‚˜ ๊ตฌ๊ธ€ ์ฝœ๋žฉ์— ์ง์ ‘ ํ‘œ์‹œํ•  ์ˆ˜ ์žˆ๋Š” PIL ์ด๋ฏธ์ง€ ๋ชฉ๋ก์ด ํฌํ•จ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค. ์ƒ์„ฑ๋œ ์ด๋ฏธ์ง€๋Š” ๊ธฐ๋ณธ์ ์œผ๋กœ ์•ˆ์ •์ ์ธ ํ™•์‚ฐ์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ๋ณด๋‹ค ํ’ˆ์งˆ์ด ๋†’์€ ๊ฒฝํ–ฅ์ด ์žˆ์Šต๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด ์œ„์˜ ์Šคํฌ๋ฆฝํŠธ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์€ ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค:
![clip_guidance](https://huggingface.co/datasets/patrickvonplaten/images/resolve/main/clip_guidance/merged_clip_guidance.jpg).
### One Step Unet
์˜ˆ์‹œ "one-step-unet"๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์‹คํ–‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
```python
from diffusers import DiffusionPipeline
pipe = DiffusionPipeline.from_pretrained("google/ddpm-cifar10-32", custom_pipeline="one_step_unet")
pipe()
```
**์ฐธ๊ณ **: ์ด ์ปค๋ฎค๋‹ˆํ‹ฐ ํŒŒ์ดํ”„๋ผ์ธ์€ ๊ธฐ๋Šฅ์œผ๋กœ ์œ ์šฉํ•˜์ง€ ์•Š์œผ๋ฉฐ ์ปค๋ฎค๋‹ˆํ‹ฐ ํŒŒ์ดํ”„๋ผ์ธ์„ ์ถ”๊ฐ€ํ•  ์ˆ˜ ์žˆ๋Š” ๋ฐฉ๋ฒ•์˜ ์˜ˆ์‹œ์ผ ๋ฟ์ž…๋‹ˆ๋‹ค(https://github.com/huggingface/diffusers/issues/841 ์ฐธ์กฐ).
### Stable Diffusion Interpolation
๋‹ค์Œ ์ฝ”๋“œ๋Š” ์ตœ์†Œ 8GB VRAM์˜ GPU์—์„œ ์‹คํ–‰ํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ ์•ฝ 5๋ถ„ ์ •๋„ ์†Œ์š”๋ฉ๋‹ˆ๋‹ค.
```python
from diffusers import DiffusionPipeline
import torch
pipe = DiffusionPipeline.from_pretrained(
"CompVis/stable-diffusion-v1-4",
torch_dtype=torch.float16,
safety_checker=None, # Very important for videos...lots of false positives while interpolating
custom_pipeline="interpolate_stable_diffusion",
).to("cuda")
pipe.enable_attention_slicing()
frame_filepaths = pipe.walk(
prompts=["a dog", "a cat", "a horse"],
seeds=[42, 1337, 1234],
num_interpolation_steps=16,
output_dir="./dreams",
batch_size=4,
height=512,
width=512,
guidance_scale=8.5,
num_inference_steps=50,
)
```
walk(...)` ํ•จ์ˆ˜์˜ ์ถœ๋ ฅ์€ `output_dir`์— ์ •์˜๋œ ๋Œ€๋กœ ํด๋”์— ์ €์žฅ๋œ ์ด๋ฏธ์ง€ ๋ชฉ๋ก์„ ๋ฐ˜ํ™˜ํ•ฉ๋‹ˆ๋‹ค. ์ด ์ด๋ฏธ์ง€๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์•ˆ์ •์ ์œผ๋กœ ํ™•์‚ฐ๋˜๋Š” ๋™์˜์ƒ์„ ๋งŒ๋“ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
> ์•ˆ์ •๋œ ํ™•์‚ฐ์„ ์ด์šฉํ•œ ๋™์˜์ƒ ์ œ์ž‘ ๋ฐฉ๋ฒ•๊ณผ ๋” ๋งŽ์€ ๊ธฐ๋Šฅ์— ๋Œ€ํ•œ ์ž์„ธํ•œ ๋‚ด์šฉ์€ https://github.com/nateraw/stable-diffusion-videos ์—์„œ ํ™•์ธํ•˜์‹œ๊ธฐ ๋ฐ”๋ž๋‹ˆ๋‹ค.
### Stable Diffusion Mega
Stable Diffusion Mega ํŒŒ์ดํ”„๋ผ์ธ์„ ์‚ฌ์šฉํ•˜๋ฉด Stable Diffusion ํŒŒ์ดํ”„๋ผ์ธ์˜ ์ฃผ์š” ์‚ฌ์šฉ ์‚ฌ๋ก€๋ฅผ ๋‹จ์ผ ํด๋ž˜์Šค์—์„œ ๋ชจ๋‘ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
```python
#!/usr/bin/env python3
from diffusers import DiffusionPipeline
import PIL
import requests
from io import BytesIO
import torch
def download_image(url):
response = requests.get(url)
return PIL.Image.open(BytesIO(response.content)).convert("RGB")
pipe = DiffusionPipeline.from_pretrained(
"CompVis/stable-diffusion-v1-4",
custom_pipeline="stable_diffusion_mega",
torch_dtype=torch.float16,
)
pipe.to("cuda")
pipe.enable_attention_slicing()
### Text-to-Image
images = pipe.text2img("An astronaut riding a horse").images
### Image-to-Image
init_image = download_image(
"https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
)
prompt = "A fantasy landscape, trending on artstation"
images = pipe.img2img(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5).images
### Inpainting
img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
init_image = download_image(img_url).resize((512, 512))
mask_image = download_image(mask_url).resize((512, 512))
prompt = "a cat sitting on a bench"
images = pipe.inpaint(prompt=prompt, image=init_image, mask_image=mask_image, strength=0.75).images
```
์œ„์— ํ‘œ์‹œ๋œ ๊ฒƒ์ฒ˜๋Ÿผ ํ•˜๋‚˜์˜ ํŒŒ์ดํ”„๋ผ์ธ์—์„œ 'ํ…์ŠคํŠธ-์ด๋ฏธ์ง€ ๋ณ€ํ™˜', '์ด๋ฏธ์ง€-์ด๋ฏธ์ง€ ๋ณ€ํ™˜', '์ธํŽ˜์ธํŒ…'์„ ๋ชจ๋‘ ์‹คํ–‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
### Long Prompt Weighting Stable Diffusion
ํŒŒ์ดํ”„๋ผ์ธ์„ ์‚ฌ์šฉํ•˜๋ฉด 77๊ฐœ์˜ ํ† ํฐ ๊ธธ์ด ์ œํ•œ ์—†์ด ํ”„๋กฌํ”„ํŠธ๋ฅผ ์ž…๋ ฅํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋˜ํ•œ "()"๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋‹จ์–ด ๊ฐ€์ค‘์น˜๋ฅผ ๋†’์ด๊ฑฐ๋‚˜ "[]"๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋‹จ์–ด ๊ฐ€์ค‘์น˜๋ฅผ ๋‚ฎ์ถœ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
๋˜ํ•œ ํŒŒ์ดํ”„๋ผ์ธ์„ ์‚ฌ์šฉํ•˜๋ฉด ๋‹จ์ผ ํด๋ž˜์Šค์—์„œ Stable Diffusion ํŒŒ์ดํ”„๋ผ์ธ์˜ ์ฃผ์š” ์‚ฌ์šฉ ์‚ฌ๋ก€๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
#### pytorch
```python
from diffusers import DiffusionPipeline
import torch
pipe = DiffusionPipeline.from_pretrained(
"hakurei/waifu-diffusion", custom_pipeline="lpw_stable_diffusion", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")
prompt = "best_quality (1girl:1.3) bow bride brown_hair closed_mouth frilled_bow frilled_hair_tubes frills (full_body:1.3) fox_ear hair_bow hair_tubes happy hood japanese_clothes kimono long_sleeves red_bow smile solo tabi uchikake white_kimono wide_sleeves cherry_blossoms"
neg_prompt = "lowres, bad_anatomy, error_body, error_hair, error_arm, error_hands, bad_hands, error_fingers, bad_fingers, missing_fingers, error_legs, bad_legs, multiple_legs, missing_legs, error_lighting, error_shadow, error_reflection, text, error, extra_digit, fewer_digits, cropped, worst_quality, low_quality, normal_quality, jpeg_artifacts, signature, watermark, username, blurry"
pipe.text2img(prompt, negative_prompt=neg_prompt, width=512, height=512, max_embeddings_multiples=3).images[0]
```
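์œ„ ํ”„๋กฌํ”„ํŠธ์˜ ๊ฐ€์ค‘์น˜ ๋ฌธ๋ฒ•์ด ์–ด๋–ป๊ฒŒ ํ•ด์„๋˜๋Š”์ง€ ๊ฐ์„ ์žก๊ธฐ ์œ„ํ•œ, ๋งค์šฐ ๋‹จ์ˆœํ™”๋œ ํŒŒ์„œ ์Šค์ผ€์น˜์ž…๋‹ˆ๋‹ค. ์‹ค์ œ `lpw_stable_diffusion`์˜ ํŒŒ์„œ๋Š” ์ค‘์ฒฉ ๊ด„ํ˜ธ ๋“ฑ ํ›จ์”ฌ ๋” ๋งŽ์€ ๊ฒฝ์šฐ๋ฅผ ์ฒ˜๋ฆฌํ•˜๋ฉฐ, ์•„๋ž˜์˜ ๊ฐ€์ค‘์น˜ ๊ฐ’ 1.1์€ ๊ด€๋ก€์ ์ธ ์˜ˆ์‹œ๊ฐ’์ผ ๋ฟ ์‹ค์ œ ๊ตฌํ˜„๊ณผ ๋‹ค๋ฅผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:

```python
import re

# (word:1.3) -> 1.3๋ฐฐ, (word) -> 1.1๋ฐฐ, [word] -> 1/1.1๋ฐฐ, ๊ทธ ์™ธ -> 1.0๋ฐฐ
def parse_prompt_weights(prompt):
    pattern = re.compile(r"\((\S+?):([\d.]+)\)|\((\S+?)\)|\[(\S+?)\]|(\S+)")
    result = []
    for m in pattern.finditer(prompt):
        if m.group(1):  # (word:weight)
            result.append((m.group(1), float(m.group(2))))
        elif m.group(3):  # (word)
            result.append((m.group(3), 1.1))
        elif m.group(4):  # [word]
            result.append((m.group(4), round(1 / 1.1, 4)))
        else:  # ์ผ๋ฐ˜ ๋‹จ์–ด
            result.append((m.group(5), 1.0))
    return result

print(parse_prompt_weights("masterpiece (1girl:1.3) [blurry]"))
# [('masterpiece', 1.0), ('1girl', 1.3), ('blurry', 0.9091)]
```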
#### onnxruntime
```python
from diffusers import DiffusionPipeline
import torch
pipe = DiffusionPipeline.from_pretrained(
"CompVis/stable-diffusion-v1-4",
custom_pipeline="lpw_stable_diffusion_onnx",
revision="onnx",
provider="CUDAExecutionProvider",
)
prompt = "a photo of an astronaut riding a horse on mars, best quality"
neg_prompt = "lowres, bad anatomy, error body, error hair, error arm, error hands, bad hands, error fingers, bad fingers, missing fingers, error legs, bad legs, multiple legs, missing legs, error lighting, error shadow, error reflection, text, error, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry"
pipe.text2img(prompt, negative_prompt=neg_prompt, width=512, height=512, max_embeddings_multiples=3).images[0]
```
ํ† ํฐ ์ธ๋ฑ์Šค ์‹œํ€€์Šค ๊ธธ์ด๊ฐ€ ์ด ๋ชจ๋ธ์— ์ง€์ •๋œ ์ตœ๋Œ€ ์‹œํ€€์Šค ๊ธธ์ด๋ณด๋‹ค ๊ธธ๋ฉด(*** > 77). ์ด ์‹œํ€€์Šค๋ฅผ ๋ชจ๋ธ์—์„œ ์‹คํ–‰ํ•˜๋ฉด ์ธ๋ฑ์‹ฑ ์˜ค๋ฅ˜๊ฐ€ ๋ฐœ์ƒํ•ฉ๋‹ˆ๋‹ค`. ์ •์ƒ์ ์ธ ํ˜„์ƒ์ด๋‹ˆ ๊ฑฑ์ •ํ•˜์ง€ ๋งˆ์„ธ์š”.
### Speech to Image
๋‹ค์Œ ์ฝ”๋“œ๋Š” ์‚ฌ์ „ํ•™์Šต๋œ OpenAI whisper-small๊ณผ Stable Diffusion์„ ์‚ฌ์šฉํ•˜์—ฌ ์˜ค๋””์˜ค ์ƒ˜ํ”Œ์—์„œ ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
```Python
import torch
import matplotlib.pyplot as plt
from datasets import load_dataset
from diffusers import DiffusionPipeline
from transformers import (
WhisperForConditionalGeneration,
WhisperProcessor,
)
device = "cuda" if torch.cuda.is_available() else "cpu"
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
audio_sample = ds[3]
text = audio_sample["text"].lower()
speech_data = audio_sample["audio"]["array"]
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small").to(device)
processor = WhisperProcessor.from_pretrained("openai/whisper-small")
diffuser_pipeline = DiffusionPipeline.from_pretrained(
"CompVis/stable-diffusion-v1-4",
custom_pipeline="speech_to_image_diffusion",
speech_model=model,
speech_processor=processor,
torch_dtype=torch.float16,
)
diffuser_pipeline.enable_attention_slicing()
diffuser_pipeline = diffuser_pipeline.to(device)
output = diffuser_pipeline(speech_data)
plt.imshow(output.images[0])
```
์œ„ ์˜ˆ์‹œ๋Š” ๋‹ค์Œ์˜ ๊ฒฐ๊ณผ ์ด๋ฏธ์ง€๋ฅผ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค.
![image](https://user-images.githubusercontent.com/45072645/196901736-77d9c6fc-63ee-4072-90b0-dc8b903d63e3.png)

<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# ์ปค์Šคํ…€ ํŒŒ์ดํ”„๋ผ์ธ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ
[[open-in-colab]]
์ปค๋ฎค๋‹ˆํ‹ฐ ํŒŒ์ดํ”„๋ผ์ธ์€ ๋…ผ๋ฌธ์— ๋ช…์‹œ๋œ ์›๋ž˜์˜ ๊ตฌํ˜„์ฒด์™€ ๋‹ค๋ฅธ ํ˜•ํƒœ๋กœ ๊ตฌํ˜„๋œ ๋ชจ๋“  [`DiffusionPipeline`] ํด๋ž˜์Šค๋ฅผ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค. (์˜ˆ๋ฅผ ๋“ค์–ด, [`StableDiffusionControlNetPipeline`]๋Š” ["Text-to-Image Generation with ControlNet Conditioning"](https://arxiv.org/abs/2302.05543) ํ•ด๋‹น) ์ด๋“ค์€ ์ถ”๊ฐ€ ๊ธฐ๋Šฅ์„ ์ œ๊ณตํ•˜๊ฑฐ๋‚˜ ํŒŒ์ดํ”„๋ผ์ธ์˜ ์›๋ž˜ ๊ตฌํ˜„์„ ํ™•์žฅํ•ฉ๋‹ˆ๋‹ค.
[Speech to Image](https://github.com/huggingface/diffusers/tree/main/examples/community#speech-to-image) ๋˜๋Š” [Composable Stable Diffusion](https://github.com/huggingface/diffusers/tree/main/examples/community#composable-stable-diffusion) ๊ณผ ๊ฐ™์€ ๋ฉ‹์ง„ ์ปค๋ฎค๋‹ˆํ‹ฐ ํŒŒ์ดํ”„๋ผ์ธ์ด ๋งŽ์ด ์žˆ์œผ๋ฉฐ [์—ฌ๊ธฐ์—์„œ](https://github.com/huggingface/diffusers/tree/main/examples/community) ๋ชจ๋“  ๊ณต์‹ ์ปค๋ฎค๋‹ˆํ‹ฐ ํŒŒ์ดํ”„๋ผ์ธ์„ ์ฐพ์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
ํ—ˆ๋ธŒ์—์„œ ์ปค๋ฎค๋‹ˆํ‹ฐ ํŒŒ์ดํ”„๋ผ์ธ์„ ๋กœ๋“œํ•˜๋ ค๋ฉด, ์ปค๋ฎค๋‹ˆํ‹ฐ ํŒŒ์ดํ”„๋ผ์ธ์˜ ๋ฆฌํฌ์ง€ํ† ๋ฆฌ ID์™€ (ํŒŒ์ดํ”„๋ผ์ธ ๊ฐ€์ค‘์น˜ ๋ฐ ๊ตฌ์„ฑ ์š”์†Œ๋ฅผ ๋กœ๋“œํ•˜๋ ค๋Š”) ๋ชจ๋ธ์˜ ๋ฆฌํฌ์ง€ํ† ๋ฆฌ ID๋ฅผ ์ธ์ž๋กœ ์ „๋‹ฌํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, ์•„๋ž˜ ์˜ˆ์‹œ์—์„œ๋Š” `hf-internal-testing/diffusers-dummy-pipeline`์—์„œ ๋”๋ฏธ ํŒŒ์ดํ”„๋ผ์ธ์„ ๋ถˆ๋Ÿฌ์˜ค๊ณ , `google/ddpm-cifar10-32`์—์„œ ํŒŒ์ดํ”„๋ผ์ธ์˜ ๊ฐ€์ค‘์น˜์™€ ์ปดํฌ๋„ŒํŠธ๋“ค์„ ๋กœ๋“œํ•ฉ๋‹ˆ๋‹ค.
<Tip warning={true}>
๐Ÿ”’ ํ—ˆ๊น… ํŽ˜์ด์Šค ํ—ˆ๋ธŒ์—์„œ ์ปค๋ฎค๋‹ˆํ‹ฐ ํŒŒ์ดํ”„๋ผ์ธ์„ ๋ถˆ๋Ÿฌ์˜ค๋Š” ๊ฒƒ์€ ๊ณง ํ•ด๋‹น ์ฝ”๋“œ๊ฐ€ ์•ˆ์ „ํ•˜๋‹ค๊ณ  ์‹ ๋ขฐํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์ฝ”๋“œ๋ฅผ ์ž๋™์œผ๋กœ ๋ถˆ๋Ÿฌ์˜ค๊ณ  ์‹คํ–‰ํ•˜๊ธฐ ์•ž์„œ ๋ฐ˜๋“œ์‹œ ์˜จ๋ผ์ธ์œผ๋กœ ํ•ด๋‹น ์ฝ”๋“œ์˜ ์‹ ๋ขฐ์„ฑ์„ ๊ฒ€์‚ฌํ•˜์„ธ์š”!
</Tip>
```py
from diffusers import DiffusionPipeline
pipeline = DiffusionPipeline.from_pretrained(
"google/ddpm-cifar10-32", custom_pipeline="hf-internal-testing/diffusers-dummy-pipeline"
)
```
๊ณต์‹ ์ปค๋ฎค๋‹ˆํ‹ฐ ํŒŒ์ดํ”„๋ผ์ธ์„ ๋ถˆ๋Ÿฌ์˜ค๋Š” ๊ฒƒ์€ ๋น„์Šทํ•˜์ง€๋งŒ, ๊ณต์‹ ๋ฆฌํฌ์ง€ํ† ๋ฆฌ ID์—์„œ ๊ฐ€์ค‘์น˜๋ฅผ ๋ถˆ๋Ÿฌ์˜ค๋Š” ๊ฒƒ๊ณผ ๋”๋ถˆ์–ด ํ•ด๋‹น ํŒŒ์ดํ”„๋ผ์ธ ๋‚ด์˜ ์ปดํฌ๋„ŒํŠธ๋ฅผ ์ง์ ‘ ์ง€์ •ํ•˜๋Š” ๊ฒƒ ์—ญ์‹œ ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค. ์•„๋ž˜ ์˜ˆ์ œ๋ฅผ ๋ณด๋ฉด ์ปค๋ฎค๋‹ˆํ‹ฐ [CLIP Guided Stable Diffusion](https://github.com/huggingface/diffusers/tree/main/examples/community#clip-guided-stable-diffusion) ํŒŒ์ดํ”„๋ผ์ธ์„ ๋กœ๋“œํ•  ๋•Œ, ํ•ด๋‹น ํŒŒ์ดํ”„๋ผ์ธ์—์„œ ์‚ฌ์šฉํ•  `clip_model` ์ปดํฌ๋„ŒํŠธ์™€ `feature_extractor` ์ปดํฌ๋„ŒํŠธ๋ฅผ ์ง์ ‘ ์„ค์ •ํ•˜๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
```py
from diffusers import DiffusionPipeline
from transformers import CLIPImageProcessor, CLIPModel
clip_model_id = "laion/CLIP-ViT-B-32-laion2B-s34B-b79K"
feature_extractor = CLIPImageProcessor.from_pretrained(clip_model_id)
clip_model = CLIPModel.from_pretrained(clip_model_id)
pipeline = DiffusionPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5",
custom_pipeline="clip_guided_stable_diffusion",
clip_model=clip_model,
feature_extractor=feature_extractor,
)
```
์ปค๋ฎค๋‹ˆํ‹ฐ ํŒŒ์ดํ”„๋ผ์ธ์— ๋Œ€ํ•œ ์ž์„ธํ•œ ๋‚ด์šฉ์€ [์ปค๋ฎค๋‹ˆํ‹ฐ ํŒŒ์ดํ”„๋ผ์ธ](https://github.com/huggingface/diffusers/blob/main/docs/source/en/using-diffusers/custom_pipeline_examples) ๊ฐ€์ด๋“œ๋ฅผ ์‚ดํŽด๋ณด์„ธ์š”. ์ปค๋ฎค๋‹ˆํ‹ฐ ํŒŒ์ดํ”„๋ผ์ธ ๋“ฑ๋ก์— ๊ด€์‹ฌ์ด ์žˆ๋Š” ๊ฒฝ์šฐ [์ปค๋ฎค๋‹ˆํ‹ฐ ํŒŒ์ดํ”„๋ผ์ธ์— ๊ธฐ์—ฌํ•˜๋Š” ๋ฐฉ๋ฒ•](https://github.com/huggingface/diffusers/blob/main/docs/source/en/using-diffusers/contribute_pipeline)์— ๋Œ€ํ•œ ๊ฐ€์ด๋“œ๋ฅผ ํ™•์ธํ•˜์„ธ์š” !

# Text-guided depth-to-image ์ƒ์„ฑ
[[open-in-colab]]
[`StableDiffusionDepth2ImgPipeline`]์„ ์‚ฌ์šฉํ•˜๋ฉด ํ…์ŠคํŠธ ํ”„๋กฌํ”„ํŠธ์™€ ์ดˆ๊ธฐ ์ด๋ฏธ์ง€๋ฅผ ์ „๋‹ฌํ•˜์—ฌ ์ƒˆ ์ด๋ฏธ์ง€์˜ ์ƒ์„ฑ์„ ์กฐ์ ˆํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋˜ํ•œ ์ด๋ฏธ์ง€ ๊ตฌ์กฐ๋ฅผ ๋ณด์กดํ•˜๊ธฐ ์œ„ํ•ด `depth_map`์„ ์ „๋‹ฌํ•  ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค. `depth_map`์ด ์ œ๊ณต๋˜์ง€ ์•Š์œผ๋ฉด ํŒŒ์ดํ”„๋ผ์ธ์€ ํ†ตํ•ฉ๋œ [depth-estimation model](https://github.com/isl-org/MiDaS)์„ ํ†ตํ•ด ์ž๋™์œผ๋กœ ๊นŠ์ด๋ฅผ ์˜ˆ์ธกํ•ฉ๋‹ˆ๋‹ค.
๋จผ์ € [`StableDiffusionDepth2ImgPipeline`]์˜ ์ธ์Šคํ„ด์Šค๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค:
```python
import torch
import requests
from PIL import Image
from diffusers import StableDiffusionDepth2ImgPipeline
pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
"stabilityai/stable-diffusion-2-depth",
torch_dtype=torch.float16,
).to("cuda")
```
์ด์ œ ํ”„๋กฌํ”„ํŠธ๋ฅผ ํŒŒ์ดํ”„๋ผ์ธ์— ์ „๋‹ฌํ•ฉ๋‹ˆ๋‹ค. ํŠน์ • ๋‹จ์–ด๊ฐ€ ์ด๋ฏธ์ง€ ์ƒ์„ฑ์„ ๊ฐ€์ด๋“œํ•˜๋Š” ๊ฒƒ์„ ๋ฐฉ์ง€ํ•˜๊ธฐ ์œ„ํ•ด `negative_prompt`๋ฅผ ์ „๋‹ฌํ•  ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค:
```python
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
init_image = Image.open(requests.get(url, stream=True).raw)
prompt = "two tigers"
n_prompt = "bad, deformed, ugly, bad anatomy"
image = pipe(prompt=prompt, image=init_image, negative_prompt=n_prompt, strength=0.7).images[0]
image
```
| Input | Output |
|---------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------|
| <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/coco-cats.png" width="500"/> | <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/depth2img-tigers.png" width="500"/> |
์•„๋ž˜์˜ Spaces๋ฅผ ๊ฐ€์ง€๊ณ  ๋†€๋ฉฐ depth map์ด ์žˆ๋Š” ์ด๋ฏธ์ง€์™€ ์—†๋Š” ์ด๋ฏธ์ง€์˜ ์ฐจ์ด๊ฐ€ ์žˆ๋Š”์ง€ ํ™•์ธํ•ด ๋ณด์„ธ์š”!
<iframe
src="https://radames-stable-diffusion-depth2img.hf.space"
frameborder="0"
width="850"
height="500"
></iframe>

# ํ…์ŠคํŠธ ๊ธฐ๋ฐ˜ image-to-image ์ƒ์„ฑ
[[Colab์—์„œ ์—ด๊ธฐ]]
[`StableDiffusionImg2ImgPipeline`]์„ ์‚ฌ์šฉํ•˜๋ฉด ํ…์ŠคํŠธ ํ”„๋กฌํ”„ํŠธ์™€ ์‹œ์ž‘ ์ด๋ฏธ์ง€๋ฅผ ์ „๋‹ฌํ•˜์—ฌ ์ƒˆ ์ด๋ฏธ์ง€ ์ƒ์„ฑ์˜ ์กฐ๊ฑด์„ ์ง€์ •ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
์‹œ์ž‘ํ•˜๊ธฐ ์ „์— ํ•„์š”ํ•œ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๊ฐ€ ๋ชจ๋‘ ์„ค์น˜๋˜์–ด ์žˆ๋Š”์ง€ ํ™•์ธํ•˜์„ธ์š”:
```bash
!pip install diffusers transformers ftfy accelerate
```
[`nitrosocke/Ghibli-Diffusion`](https://huggingface.co/nitrosocke/Ghibli-Diffusion)๊ณผ ๊ฐ™์€ ์‚ฌ์ „ํ•™์Šต๋œ stable diffusion ๋ชจ๋ธ๋กœ [`StableDiffusionImg2ImgPipeline`]์„ ์ƒ์„ฑํ•˜์—ฌ ์‹œ์ž‘ํ•˜์„ธ์š”.
```python
import torch
import requests
from PIL import Image
from io import BytesIO
from diffusers import StableDiffusionImg2ImgPipeline
device = "cuda"
pipe = StableDiffusionImg2ImgPipeline.from_pretrained("nitrosocke/Ghibli-Diffusion", torch_dtype=torch.float16).to(
device
)
```
์ดˆ๊ธฐ ์ด๋ฏธ์ง€๋ฅผ ๋‹ค์šด๋กœ๋“œํ•˜๊ณ  ์‚ฌ์ „ ์ฒ˜๋ฆฌํ•˜์—ฌ ํŒŒ์ดํ”„๋ผ์ธ์— ์ „๋‹ฌํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:
```python
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image.thumbnail((768, 768))
init_image
```
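์—ฌ๊ธฐ์„œ `thumbnail`์€ `resize`์™€ ๋‹ฌ๋ฆฌ ์ข…ํšก๋น„๋ฅผ ์œ ์ง€ํ•œ ์ฑ„ ์ง€์ •ํ•œ ๋ฐ•์Šค ์•ˆ์— ๋“ค์–ด๊ฐ€๋„๋ก ์ด๋ฏธ์ง€๋ฅผ ์ œ์ž๋ฆฌ์—์„œ(in-place) ์ถ•์†Œํ•ฉ๋‹ˆ๋‹ค. ๊ฒฐ๊ณผ ํฌ๊ธฐ๊ฐ€ ์–ด๋–ป๊ฒŒ ์ •ํ•ด์ง€๋Š”์ง€ ๋Œ€๋žต์ ์œผ๋กœ ๋ณด์—ฌ์ฃผ๋Š” ์Šค์ผ€์น˜์ž…๋‹ˆ๋‹ค. `fit_within`์€ ์„ค๋ช…์šฉ ๊ฐ€์ƒ ํ•จ์ˆ˜์ด๋ฉฐ, ์‹ค์ œ PIL์˜ ๋ฐ˜์˜ฌ๋ฆผ ๋ฐฉ์‹๊ณผ๋Š” ๋ฏธ์„ธํ•˜๊ฒŒ ๋‹ค๋ฅผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

```python
def fit_within(size, box):
    """thumbnail์ฒ˜๋Ÿผ ์ข…ํšก๋น„๋ฅผ ์œ ์ง€ํ•˜๋ฉฐ box ์•ˆ์— ๋“ค์–ด๊ฐ€๋Š” ํฌ๊ธฐ๋ฅผ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค."""
    w, h = size
    bw, bh = box
    scale = min(bw / w, bh / h, 1.0)  # ํ™•๋Œ€๋Š” ํ•˜์ง€ ์•Š์Œ
    return (round(w * scale), round(h * scale))

print(fit_within((1200, 800), (768, 768)))  # (768, 512): ๊ธด ์ชฝ์ด 768์— ๋งž์ถฐ์ง
print(fit_within((400, 300), (768, 768)))   # (400, 300): ์ด๋ฏธ ๋ฐ•์Šค๋ณด๋‹ค ์ž‘์œผ๋ฉด ๊ทธ๋Œ€๋กœ
```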
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/YiYiXu/test-doc-assets/resolve/main/image_2_image_using_diffusers_cell_8_output_0.jpeg"/>
</div>
<Tip>
๐Ÿ’ก `strength`๋Š” ์ž…๋ ฅ ์ด๋ฏธ์ง€์— ์ถ”๊ฐ€๋˜๋Š” ๋…ธ์ด์ฆˆ์˜ ์–‘์„ ์ œ์–ดํ•˜๋Š” 0.0์—์„œ 1.0 ์‚ฌ์ด์˜ ๊ฐ’์ž…๋‹ˆ๋‹ค. 1.0์— ๊ฐ€๊นŒ์šด ๊ฐ’์€ ๋‹ค์–‘ํ•œ ๋ณ€ํ˜•์„ ํ—ˆ์šฉํ•˜์ง€๋งŒ ์ž…๋ ฅ ์ด๋ฏธ์ง€์™€ ์˜๋ฏธ์ ์œผ๋กœ ์ผ์น˜ํ•˜์ง€ ์•Š๋Š” ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.
</Tip>
ํ”„๋กฌํ”„ํŠธ๋ฅผ ์ •์˜ํ•˜๊ณ (์ง€๋ธŒ๋ฆฌ ์Šคํƒ€์ผ(Ghibli-style)์— ๋งž๊ฒŒ ์กฐ์ •๋œ ์ด ์ฒดํฌํฌ์ธํŠธ์˜ ๊ฒฝ์šฐ ํ”„๋กฌํ”„ํŠธ ์•ž์— `ghibli style` ํ† ํฐ์„ ๋ถ™์—ฌ์•ผ ํ•ฉ๋‹ˆ๋‹ค) ํŒŒ์ดํ”„๋ผ์ธ์„ ์‹คํ–‰ํ•ฉ๋‹ˆ๋‹ค:
```python
prompt = "ghibli style, a fantasy landscape with castles"
generator = torch.Generator(device=device).manual_seed(1024)
image = pipe(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5, generator=generator).images[0]
image
```
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ghibli-castles.png"/>
</div>
๋‹ค๋ฅธ ์Šค์ผ€์ค„๋Ÿฌ๋กœ ์‹คํ—˜ํ•˜์—ฌ ์ถœ๋ ฅ์— ์–ด๋–ค ์˜ํ–ฅ์„ ๋ฏธ์น˜๋Š”์ง€ ํ™•์ธํ•  ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค:
```python
from diffusers import LMSDiscreteScheduler
lms = LMSDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.scheduler = lms
generator = torch.Generator(device=device).manual_seed(1024)
image = pipe(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5, generator=generator).images[0]
image
```
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/lms-ghibli.png"/>
</div>
์•„๋ž˜ ๊ณต๋ฐฑ์„ ํ™•์ธํ•˜๊ณ  `strength` ๊ฐ’์„ ๋‹ค๋ฅด๊ฒŒ ์„ค์ •ํ•˜์—ฌ ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•ด ๋ณด์„ธ์š”. `strength`๋ฅผ ๋‚ฎ๊ฒŒ ์„ค์ •ํ•˜๋ฉด ์›๋ณธ ์ด๋ฏธ์ง€์™€ ๋” ์œ ์‚ฌํ•œ ์ด๋ฏธ์ง€๊ฐ€ ์ƒ์„ฑ๋˜๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
์ž์œ ๋กญ๊ฒŒ ์Šค์ผ€์ค„๋Ÿฌ๋ฅผ [`LMSDiscreteScheduler`]๋กœ ์ „ํ™˜ํ•˜์—ฌ ์ถœ๋ ฅ์— ์–ด๋–ค ์˜ํ–ฅ์„ ๋ฏธ์น˜๋Š”์ง€ ํ™•์ธํ•ด ๋ณด์„ธ์š”.
<iframe
src="https://stevhliu-ghibli-img2img.hf.space"
frameborder="0"
width="850"
height="500"
></iframe>

# Text-guided ์ด๋ฏธ์ง€ ์ธํŽ˜์ธํŒ…(inpainting)
[[์ฝ”๋žฉ์—์„œ ์—ด๊ธฐ]]
[`StableDiffusionInpaintPipeline`]์€ ๋งˆ์Šคํฌ์™€ ํ…์ŠคํŠธ ํ”„๋กฌํ”„ํŠธ๋ฅผ ์ œ๊ณตํ•˜์—ฌ ์ด๋ฏธ์ง€์˜ ํŠน์ • ๋ถ€๋ถ„์„ ํŽธ์ง‘ํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•ฉ๋‹ˆ๋‹ค. ์ด ๊ธฐ๋Šฅ์€ ์ธํŽ˜์ธํŒ… ์ž‘์—…์„ ์œ„ํ•ด ํŠน๋ณ„ํžˆ ํ›ˆ๋ จ๋œ [`runwayml/stable-diffusion-inpainting`](https://huggingface.co/runwayml/stable-diffusion-inpainting)๊ณผ ๊ฐ™์€ Stable Diffusion ๋ฒ„์ „์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.
๋จผ์ € [`StableDiffusionInpaintPipeline`] ์ธ์Šคํ„ด์Šค๋ฅผ ๋ถˆ๋Ÿฌ์˜ต๋‹ˆ๋‹ค:
```python
import PIL
import requests
import torch
from io import BytesIO
from diffusers import StableDiffusionInpaintPipeline
pipeline = StableDiffusionInpaintPipeline.from_pretrained(
"runwayml/stable-diffusion-inpainting",
torch_dtype=torch.float16,
)
pipeline = pipeline.to("cuda")
```
๋‚˜์ค‘์— ๊ต์ฒดํ•  ๊ฐ•์•„์ง€ ์ด๋ฏธ์ง€์™€ ๋งˆ์Šคํฌ๋ฅผ ๋‹ค์šด๋กœ๋“œํ•˜์„ธ์š”:
```python
def download_image(url):
response = requests.get(url)
return PIL.Image.open(BytesIO(response.content)).convert("RGB")
img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
init_image = download_image(img_url).resize((512, 512))
mask_image = download_image(mask_url).resize((512, 512))
```
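์ธํŽ˜์ธํŒ… ๋งˆ์Šคํฌ๋Š” ํฐ์ƒ‰(255) ์˜์—ญ์ด ์ƒˆ๋กœ ๊ทธ๋ ค์ง€๊ณ  ๊ฒ€์€์ƒ‰(0) ์˜์—ญ์€ ๊ทธ๋Œ€๋กœ ์œ ์ง€๋˜๋Š” ๊ทœ์น™์„ ๋”ฐ๋ฆ…๋‹ˆ๋‹ค. ๋‹ค์šด๋กœ๋“œํ•œ ๋งˆ์Šคํฌ ๋Œ€์‹  ์ง์ ‘ ๋งŒ๋“ค์–ด๋ณผ ์ˆ˜๋„ ์žˆ๋Š”๋ฐ, ์•„๋ž˜๋Š” ์™ธ๋ถ€ ์˜์กด์„ฑ ์—†์ด ๊ฐ€์šด๋ฐ ์‚ฌ๊ฐํ˜•๋งŒ ๊ต์ฒด ๋Œ€์ƒ์œผ๋กœ ํ‘œ์‹œํ•˜๋Š” ๊ฐ„๋‹จํ•œ ์˜ˆ์‹œ ์Šค์ผ€์น˜์ž…๋‹ˆ๋‹ค.

```python
# ์ธํŽ˜์ธํŒ… ๋งˆ์Šคํฌ ๊ทœ์น™: ํฐ์ƒ‰(255) ์˜์—ญ์€ ์ƒˆ๋กœ ๊ทธ๋ ค์ง€๊ณ , ๊ฒ€์€์ƒ‰(0) ์˜์—ญ์€ ์œ ์ง€๋ฉ๋‹ˆ๋‹ค.
W = H = 512
mask = bytearray(W * H)      # ์ „๋ถ€ 0(์œ ์ง€)์œผ๋กœ ์ดˆ๊ธฐํ™”
for y in range(128, 384):    # ๊ฐ€์šด๋ฐ 256x256 ์‚ฌ๊ฐํ˜•๋งŒ 255(๊ต์ฒด)๋กœ ์ฑ„์›€
    for x in range(128, 384):
        mask[y * W + x] = 255

print(sum(1 for v in mask if v == 255))  # 65536 = 256 * 256 ํ”ฝ์…€
```

์ด๋ ‡๊ฒŒ ๋งŒ๋“  ๋ฒ„ํผ๋Š” `PIL.Image.frombytes("L", (W, H), bytes(mask))`๋กœ ๋ณ€ํ™˜ํ•ด `mask_image` ์ธ์ž์— ์ „๋‹ฌํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.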
์ด์ œ ๋งˆ์Šคํฌ๋ฅผ ๋‹ค๋ฅธ ๊ฒƒ์œผ๋กœ ๊ต์ฒดํ•˜๋ผ๋Š” ํ”„๋กฌํ”„ํŠธ๋ฅผ ๋งŒ๋“ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:
```python
prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image).images[0]
```
`image` | `mask_image` | `prompt` | output |
:-------------------------:|:-------------------------:|:-------------------------:|-------------------------:|
<img src="https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png" alt="drawing" width="250"/> | <img src="https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png" alt="drawing" width="250"/> | ***Face of a yellow cat, high resolution, sitting on a park bench*** | <img src="https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/in_paint/yellow_cat_sitting_on_a_park_bench.png" alt="drawing" width="250"/> |
<Tip warning={true}>
์ด์ „์˜ ์‹คํ—˜์ ์ธ ์ธํŽ˜์ธํŒ… ๊ตฌํ˜„์—์„œ๋Š” ํ’ˆ์งˆ์ด ๋‚ฎ์€ ๋‹ค๋ฅธ ํ”„๋กœ์„ธ์Šค๋ฅผ ์‚ฌ์šฉํ–ˆ์Šต๋‹ˆ๋‹ค. ์ด์ „ ๋ฒ„์ „๊ณผ์˜ ํ˜ธํ™˜์„ฑ์„ ๋ณด์žฅํ•˜๊ธฐ ์œ„ํ•ด ์ƒˆ ๋ชจ๋ธ์ด ํฌํ•จ๋˜์ง€ ์•Š์€ ์‚ฌ์ „ํ•™์Šต๋œ ํŒŒ์ดํ”„๋ผ์ธ์„ ๋ถˆ๋Ÿฌ์˜ค๋ฉด ์ด์ „ ์ธํŽ˜์ธํŒ… ๋ฐฉ๋ฒ•์ด ๊ณ„์† ์ ์šฉ๋ฉ๋‹ˆ๋‹ค.
</Tip>
์•„๋ž˜ Space์—์„œ ์ด๋ฏธ์ง€ ์ธํŽ˜์ธํŒ…์„ ์ง์ ‘ ํ•ด๋ณด์„ธ์š”!
<iframe
src="https://runwayml-stable-diffusion-inpainting.hf.space"
frameborder="0"
width="850"
height="500"
></iframe>

# ํŒŒ์ดํ”„๋ผ์ธ, ๋ชจ๋ธ, ์Šค์ผ€์ค„๋Ÿฌ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ
๊ธฐ๋ณธ์ ์œผ๋กœ diffusion ๋ชจ๋ธ์€ ๋‹ค์–‘ํ•œ ์ปดํฌ๋„ŒํŠธ๋“ค(๋ชจ๋ธ, ํ† ํฌ๋‚˜์ด์ €, ์Šค์ผ€์ค„๋Ÿฌ) ๊ฐ„์˜ ๋ณต์žกํ•œ ์ƒํ˜ธ์ž‘์šฉ์„ ๊ธฐ๋ฐ˜์œผ๋กœ ๋™์ž‘ํ•ฉ๋‹ˆ๋‹ค. ๋””ํ“จ์ €์Šค(Diffusers)๋Š” ์ด๋Ÿฌํ•œ diffusion ๋ชจ๋ธ์„ ๋ณด๋‹ค ์‰ฝ๊ณ  ๊ฐ„ํŽธํ•œ API๋กœ ์ œ๊ณตํ•˜๋Š” ๊ฒƒ์„ ๋ชฉํ‘œ๋กœ ์„ค๊ณ„๋˜์—ˆ์Šต๋‹ˆ๋‹ค. [`DiffusionPipeline`]์€ diffusion ๋ชจ๋ธ์ด ๊ฐ–๋Š” ๋ณต์žก์„ฑ์„ ํ•˜๋‚˜์˜ ํŒŒ์ดํ”„๋ผ์ธ API๋กœ ํ†ตํ•ฉํ•˜๊ณ , ๋™์‹œ์— ์ด๋ฅผ ๊ตฌ์„ฑํ•˜๋Š” ๊ฐ๊ฐ์˜ ์ปดํฌ๋„ŒํŠธ๋“ค์„ ํƒœ์Šคํฌ์— ๋งž์ถฐ ์œ ์—ฐํ•˜๊ฒŒ ์ปค์Šคํ„ฐ๋งˆ์ด์ง•ํ•  ์ˆ˜ ์žˆ๋„๋ก ์ง€์›ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.
diffusion ๋ชจ๋ธ์˜ ํ›ˆ๋ จ๊ณผ ์ถ”๋ก ์— ํ•„์š”ํ•œ ๋ชจ๋“  ๊ฒƒ์€ [`DiffusionPipeline.from_pretrained`] ๋ฉ”์„œ๋“œ๋ฅผ ํ†ตํ•ด ์ ‘๊ทผํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. (์ด ๋ง์˜ ์˜๋ฏธ๋Š” ๋‹ค์Œ ๋‹จ๋ฝ์—์„œ ๋ณด๋‹ค ์ž์„ธํ•˜๊ฒŒ ๋‹ค๋ค„๋ณด๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.)
์ด ๋ฌธ์„œ์—์„œ๋Š” ์„ค๋ช…ํ•  ๋‚ด์šฉ์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.
* ํ—ˆ๋ธŒ๋ฅผ ํ†ตํ•ด ํ˜น์€ ๋กœ์ปฌ๋กœ ํŒŒ์ดํ”„๋ผ์ธ์„ ๋ถˆ๋Ÿฌ์˜ค๋Š” ๋ฒ•
* ํŒŒ์ดํ”„๋ผ์ธ์— ๋‹ค๋ฅธ ์ปดํฌ๋„ŒํŠธ๋“ค์„ ์ ์šฉํ•˜๋Š” ๋ฒ•
* ์˜ค๋ฆฌ์ง€๋„ ์ฒดํฌํฌ์ธํŠธ๊ฐ€ ์•„๋‹Œ variant๋ฅผ ๋ถˆ๋Ÿฌ์˜ค๋Š” ๋ฒ• (variant๋ž€ ๊ธฐ๋ณธ์œผ๋กœ ์„ค์ •๋œ `fp32`๊ฐ€ ์•„๋‹Œ ๋‹ค๋ฅธ ๋ถ€๋™ ์†Œ์ˆ˜์  ํƒ€์ž…(์˜ˆ: `fp16`)์„ ์‚ฌ์šฉํ•˜๊ฑฐ๋‚˜ Non-EMA ๊ฐ€์ค‘์น˜๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ์ฒดํฌํฌ์ธํŠธ๋“ค์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค.)
* ๋ชจ๋ธ๊ณผ ์Šค์ผ€์ค„๋Ÿฌ๋ฅผ ๋ถˆ๋Ÿฌ์˜ค๋Š” ๋ฒ•
## Diffusion ํŒŒ์ดํ”„๋ผ์ธ
<Tip>
๐Ÿ’ก [`DiffusionPipeline`] ํด๋ž˜์Šค๊ฐ€ ๋™์ž‘ํ•˜๋Š” ๋ฐฉ์‹์— ๋ณด๋‹ค ์ž์„ธํ•œ ๋‚ด์šฉ์ด ๊ถ๊ธˆํ•˜๋‹ค๋ฉด, [DiffusionPipeline explained](#diffusionpipeline์—-๋Œ€ํ•ด-์•Œ์•„๋ณด๊ธฐ) ์„น์…˜์„ ํ™•์ธํ•ด๋ณด์„ธ์š”.
</Tip>
[`DiffusionPipeline`] ํด๋ž˜์Šค๋Š” diffusion ๋ชจ๋ธ์„ [ํ—ˆ๋ธŒ](https://huggingface.co/models?library=diffusers)๋กœ๋ถ€ํ„ฐ ๋ถˆ๋Ÿฌ์˜ค๋Š” ๊ฐ€์žฅ ์‹ฌํ”Œํ•˜๋ฉด์„œ ๋ณดํŽธ์ ์ธ ๋ฐฉ์‹์ž…๋‹ˆ๋‹ค. [`DiffusionPipeline.from_pretrained`] ๋ฉ”์„œ๋“œ๋Š” ์ ํ•ฉํ•œ ํŒŒ์ดํ”„๋ผ์ธ ํด๋ž˜์Šค๋ฅผ ์ž๋™์œผ๋กœ ํƒ์ง€ํ•˜๊ณ , ํ•„์š”ํ•œ ๊ตฌ์„ฑ์š”์†Œ(configuration)์™€ ๊ฐ€์ค‘์น˜(weight) ํŒŒ์ผ๋“ค์„ ๋‹ค์šด๋กœ๋“œํ•˜๊ณ  ์บ์‹ฑํ•œ ๋‹ค์Œ, ํ•ด๋‹น ํŒŒ์ดํ”„๋ผ์ธ ์ธ์Šคํ„ด์Šค๋ฅผ ๋ฐ˜ํ™˜ํ•ฉ๋‹ˆ๋‹ค.
```python
from diffusers import DiffusionPipeline
repo_id = "runwayml/stable-diffusion-v1-5"
pipe = DiffusionPipeline.from_pretrained(repo_id)
```
๋ฌผ๋ก  [`DiffusionPipeline`] ํด๋ž˜์Šค๋ฅผ ์‚ฌ์šฉํ•˜์ง€ ์•Š๊ณ , ๋ช…์‹œ์ ์œผ๋กœ ์ง์ ‘ ํ•ด๋‹น ํŒŒ์ดํ”„๋ผ์ธ ํด๋ž˜์Šค๋ฅผ ๋ถˆ๋Ÿฌ์˜ค๋Š” ๊ฒƒ๋„ ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค. ์•„๋ž˜ ์˜ˆ์‹œ ์ฝ”๋“œ๋Š” ์œ„ ์˜ˆ์‹œ์™€ ๋™์ผํ•œ ์ธ์Šคํ„ด์Šค๋ฅผ ๋ฐ˜ํ™˜ํ•ฉ๋‹ˆ๋‹ค.
```python
from diffusers import StableDiffusionPipeline
repo_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(repo_id)
```
[CompVis/stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4)์ด๋‚˜ [runwayml/stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5) ๊ฐ™์€ ์ฒดํฌํฌ์ธํŠธ๋“ค์˜ ๊ฒฝ์šฐ, ํ•˜๋‚˜ ์ด์ƒ์˜ ๋‹ค์–‘ํ•œ ํƒœ์Šคํฌ์— ํ™œ์šฉ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. (์˜ˆ๋ฅผ ๋“ค์–ด ์œ„์˜ ๋‘ ์ฒดํฌํฌ์ธํŠธ์˜ ๊ฒฝ์šฐ, text-to-image์™€ image-to-image์— ๋ชจ๋‘ ํ™œ์šฉ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.) ๋งŒ์•ฝ ์ด๋Ÿฌํ•œ ์ฒดํฌํฌ์ธํŠธ๋“ค์„ ๊ธฐ๋ณธ ์„ค์ • ํƒœ์Šคํฌ๊ฐ€ ์•„๋‹Œ ๋‹ค๋ฅธ ํƒœ์Šคํฌ์— ํ™œ์šฉํ•˜๊ณ ์ž ํ•œ๋‹ค๋ฉด, ํ•ด๋‹น ํƒœ์Šคํฌ์— ๋Œ€์‘๋˜๋Š” ํŒŒ์ดํ”„๋ผ์ธ(task-specific pipeline)์„ ์‚ฌ์šฉํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.
```python
from diffusers import StableDiffusionImg2ImgPipeline
repo_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(repo_id)
```
### ๋กœ์ปฌ ํŒŒ์ดํ”„๋ผ์ธ
ํŒŒ์ดํ”„๋ผ์ธ์„ ๋กœ์ปฌ๋กœ ๋ถˆ๋Ÿฌ์˜ค๊ณ ์ž ํ•œ๋‹ค๋ฉด, `git-lfs`๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ง์ ‘ ์ฒดํฌํฌ์ธํŠธ๋ฅผ ๋กœ์ปฌ ๋””์Šคํฌ์— ๋‹ค์šด๋กœ๋“œ ๋ฐ›์•„์•ผ ํ•ฉ๋‹ˆ๋‹ค. ์•„๋ž˜์˜ ๋ช…๋ น์–ด๋ฅผ ์‹คํ–‰ํ•˜๋ฉด `./stable-diffusion-v1-5`๋ž€ ์ด๋ฆ„์œผ๋กœ ํด๋”๊ฐ€ ๋กœ์ปฌ๋””์Šคํฌ์— ์ƒ์„ฑ๋ฉ๋‹ˆ๋‹ค.
```bash
git lfs install
git clone https://huggingface.co/runwayml/stable-diffusion-v1-5
```
๊ทธ๋Ÿฐ ๋‹ค์Œ ํ•ด๋‹น ๋กœ์ปฌ ๊ฒฝ๋กœ๋ฅผ [`~DiffusionPipeline.from_pretrained`] ๋ฉ”์„œ๋“œ์— ์ „๋‹ฌํ•ฉ๋‹ˆ๋‹ค.
```python
from diffusers import DiffusionPipeline
repo_id = "./stable-diffusion-v1-5"
stable_diffusion = DiffusionPipeline.from_pretrained(repo_id)
```
์œ„์˜ ์˜ˆ์‹œ์ฝ”๋“œ์ฒ˜๋Ÿผ ๋งŒ์•ฝ `repo_id`๊ฐ€ ๋กœ์ปฌ ํŒจ์Šค(local path)๋ผ๋ฉด, [`~DiffusionPipeline.from_pretrained`] ๋ฉ”์„œ๋“œ๋Š” ์ด๋ฅผ ์ž๋™์œผ๋กœ ๊ฐ์ง€ํ•˜์—ฌ ํ—ˆ๋ธŒ์—์„œ ํŒŒ์ผ์„ ๋‹ค์šด๋กœ๋“œํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค. ๋งŒ์•ฝ ๋กœ์ปฌ ๋””์Šคํฌ์— ์ €์žฅ๋œ ํŒŒ์ดํ”„๋ผ์ธ ์ฒดํฌํฌ์ธํŠธ๊ฐ€ ์ตœ์‹  ๋ฒ„์ „์ด ์•„๋‹ ๊ฒฝ์šฐ์—๋„, ์ตœ์‹  ๋ฒ„์ „์„ ๋‹ค์šด๋กœ๋“œํ•˜์ง€ ์•Š๊ณ  ๊ธฐ์กด ๋กœ์ปฌ ๋””์Šคํฌ์— ์ €์žฅ๋œ ์ฒดํฌํฌ์ธํŠธ๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค๋Š” ๊ฒƒ์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค.
### ํŒŒ์ดํ”„๋ผ์ธ ๋‚ด๋ถ€์˜ ์ปดํฌ๋„ŒํŠธ ๊ต์ฒดํ•˜๊ธฐ
ํŒŒ์ดํ”„๋ผ์ธ ๋‚ด๋ถ€์˜ ์ปดํฌ๋„ŒํŠธ๋“ค์€ ํ˜ธํ™˜ ๊ฐ€๋Šฅํ•œ ๋‹ค๋ฅธ ์ปดํฌ๋„ŒํŠธ๋กœ ๊ต์ฒด๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด์™€ ๊ฐ™์€ ์ปดํฌ๋„ŒํŠธ ๊ต์ฒด๊ฐ€ ์ค‘์š”ํ•œ ์ด์œ ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.
- ์–ด๋–ค ์Šค์ผ€์ค„๋Ÿฌ๋ฅผ ์‚ฌ์šฉํ•  ๊ฒƒ์ธ๊ฐ€๋Š” ์ƒ์„ฑ์†๋„์™€ ์ƒ์„ฑํ’ˆ์งˆ ๊ฐ„์˜ ํŠธ๋ ˆ์ด๋“œ์˜คํ”„๋ฅผ ์ •์˜ํ•˜๋Š” ์ค‘์š”ํ•œ ์š”์†Œ์ž…๋‹ˆ๋‹ค.
- diffusion ๋ชจ๋ธ ๋‚ด๋ถ€์˜ ์ปดํฌ๋„ŒํŠธ๋“ค์€ ์ผ๋ฐ˜์ ์œผ๋กœ ๊ฐ๊ฐ ๋…๋ฆฝ์ ์œผ๋กœ ํ›ˆ๋ จ๋˜๊ธฐ ๋•Œ๋ฌธ์—, ๋” ์ข‹์€ ์„ฑ๋Šฅ์„ ๋ณด์—ฌ์ฃผ๋Š” ์ปดํฌ๋„ŒํŠธ๊ฐ€ ์žˆ๋‹ค๋ฉด ๊ทธ๊ฑธ๋กœ ๊ต์ฒดํ•˜๋Š” ์‹์œผ๋กœ ์„ฑ๋Šฅ์„ ํ–ฅ์ƒ์‹œํ‚ฌ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
- ํŒŒ์ธ ํŠœ๋‹ ๋‹จ๊ณ„์—์„œ๋Š” ์ผ๋ฐ˜์ ์œผ๋กœ UNet ํ˜น์€ ํ…์ŠคํŠธ ์ธ์ฝ”๋”์™€ ๊ฐ™์€ ์ผ๋ถ€ ์ปดํฌ๋„ŒํŠธ๋“ค๋งŒ ํ›ˆ๋ จํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.
์–ด๋–ค ์Šค์ผ€์ค„๋Ÿฌ๋“ค์ด ํ˜ธํ™˜๊ฐ€๋Šฅํ•œ์ง€๋Š” `compatibles` ์†์„ฑ์„ ํ†ตํ•ด ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
```python
from diffusers import DiffusionPipeline
repo_id = "runwayml/stable-diffusion-v1-5"
stable_diffusion = DiffusionPipeline.from_pretrained(repo_id)
stable_diffusion.scheduler.compatibles
```
์ด๋ฒˆ์—๋Š” [`SchedulerMixin.from_pretrained`] ๋ฉ”์„œ๋“œ๋ฅผ ์‚ฌ์šฉํ•ด์„œ, ๊ธฐ์กด ๊ธฐ๋ณธ ์Šค์ผ€์ค„๋Ÿฌ์˜€๋˜ [`PNDMScheduler`]๋ฅผ ๋ณด๋‹ค ์šฐ์ˆ˜ํ•œ ์„ฑ๋Šฅ์˜ [`EulerDiscreteScheduler`]๋กœ ๋ฐ”๊ฟ”๋ด…์‹œ๋‹ค. ์Šค์ผ€์ค„๋Ÿฌ๋ฅผ ๋กœ๋“œํ•  ๋•Œ๋Š” `subfolder` ์ธ์ž๋ฅผ ํ†ตํ•ด, ํ•ด๋‹น ํŒŒ์ดํ”„๋ผ์ธ์˜ ๋ ˆํฌ์ง€ํ† ๋ฆฌ์—์„œ [์Šค์ผ€์ค„๋Ÿฌ์— ๊ด€ํ•œ ํ•˜์œ„ํด๋”](https://huggingface.co/runwayml/stable-diffusion-v1-5/tree/main/scheduler)๋ฅผ ๋ช…์‹œํ•ด์ฃผ์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.
๊ทธ ๋‹ค์Œ ์ƒˆ๋กญ๊ฒŒ ์ƒ์„ฑํ•œ [`EulerDiscreteScheduler`] ์ธ์Šคํ„ด์Šค๋ฅผ [`DiffusionPipeline`]์˜ `scheduler` ์ธ์ž์— ์ „๋‹ฌํ•ฉ๋‹ˆ๋‹ค.
```python
from diffusers import DiffusionPipeline, EulerDiscreteScheduler, DPMSolverMultistepScheduler
repo_id = "runwayml/stable-diffusion-v1-5"
scheduler = EulerDiscreteScheduler.from_pretrained(repo_id, subfolder="scheduler")
stable_diffusion = DiffusionPipeline.from_pretrained(repo_id, scheduler=scheduler)
```
### ์„ธ์ดํ”„ํ‹ฐ ์ฒด์ปค
์Šคํ…Œ์ด๋ธ” diffusion๊ณผ ๊ฐ™์€ diffusion ๋ชจ๋ธ๋“ค์€ ์œ ํ•ดํ•œ ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•  ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋ฅผ ์˜ˆ๋ฐฉํ•˜๊ธฐ ์œ„ํ•ด ๋””ํ“จ์ €์Šค๋Š” ์ƒ์„ฑ๋œ ์ด๋ฏธ์ง€์˜ ์œ ํ•ด์„ฑ์„ ํŒ๋‹จํ•˜๋Š” [์„ธ์ดํ”„ํ‹ฐ ์ฒด์ปค(safety checker)](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/safety_checker.py) ๊ธฐ๋Šฅ์„ ์ง€์›ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ๋งŒ์•ฝ ์„ธ์ดํ”„ํ‹ฐ ์ฒด์ปค์˜ ์‚ฌ์šฉ์„ ์›ํ•˜์ง€ ์•Š๋Š”๋‹ค๋ฉด, `safety_checker` ์ธ์ž์— `None`์„ ์ „๋‹ฌํ•ด์ฃผ์‹œ๋ฉด ๋ฉ๋‹ˆ๋‹ค.
```python
from diffusers import DiffusionPipeline
repo_id = "runwayml/stable-diffusion-v1-5"
stable_diffusion = DiffusionPipeline.from_pretrained(repo_id, safety_checker=None)
```
### ์ปดํฌ๋„ŒํŠธ ์žฌ์‚ฌ์šฉ
๋ณต์ˆ˜์˜ ํŒŒ์ดํ”„๋ผ์ธ์— ๋™์ผํ•œ ๋ชจ๋ธ์ด ๋ฐ˜๋ณต์ ์œผ๋กœ ์‚ฌ์šฉํ•œ๋‹ค๋ฉด, ๊ตณ์ด ํ•ด๋‹น ๋ชจ๋ธ์˜ ๋™์ผํ•œ ๊ฐ€์ค‘์น˜๋ฅผ ์ค‘๋ณต์œผ๋กœ RAM์— ๋ถˆ๋Ÿฌ์˜ฌ ํ•„์š”๋Š” ์—†์„ ๊ฒƒ์ž…๋‹ˆ๋‹ค. [`~DiffusionPipeline.components`] ์†์„ฑ์„ ํ†ตํ•ด ํŒŒ์ดํ”„๋ผ์ธ ๋‚ด๋ถ€์˜ ์ปดํฌ๋„ŒํŠธ๋“ค์„ ์ฐธ์กฐํ•  ์ˆ˜ ์žˆ๋Š”๋ฐ, ์ด๋ฒˆ ๋‹จ๋ฝ์—์„œ๋Š” ์ด๋ฅผ ํ†ตํ•ด ๋™์ผํ•œ ๋ชจ๋ธ ๊ฐ€์ค‘์น˜๋ฅผ RAM์— ์ค‘๋ณต์œผ๋กœ ๋ถˆ๋Ÿฌ์˜ค๋Š” ๊ฒƒ์„ ๋ฐฉ์ง€ํ•˜๋Š” ๋ฒ•์— ๋Œ€ํ•ด ์•Œ์•„๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.
```python
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline
model_id = "runwayml/stable-diffusion-v1-5"
stable_diffusion_txt2img = StableDiffusionPipeline.from_pretrained(model_id)
components = stable_diffusion_txt2img.components
```
๊ทธ ๋‹ค์Œ ์œ„ ์˜ˆ์‹œ ์ฝ”๋“œ์—์„œ ์„ ์–ธํ•œ `components` ๋ณ€์ˆ˜๋ฅผ ๋‹ค๋ฅธ ํŒŒ์ดํ”„๋ผ์ธ์— ์ „๋‹ฌํ•จ์œผ๋กœ์จ, ๋ชจ๋ธ์˜ ๊ฐ€์ค‘์น˜๋ฅผ ์ค‘๋ณต์œผ๋กœ RAM์— ๋กœ๋”ฉํ•˜์ง€ ์•Š๊ณ , ๋™์ผํ•œ ์ปดํฌ๋„ŒํŠธ๋ฅผ ์žฌ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
```python
stable_diffusion_img2img = StableDiffusionImg2ImgPipeline(**components)
```
๋ฌผ๋ก  ๊ฐ๊ฐ์˜ ์ปดํฌ๋„ŒํŠธ๋“ค์„ ๋”ฐ๋กœ ๋”ฐ๋กœ ํŒŒ์ดํ”„๋ผ์ธ์— ์ „๋‹ฌํ•  ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด `stable_diffusion_txt2img` ํŒŒ์ดํ”„๋ผ์ธ ์•ˆ์˜ ์ปดํฌ๋„ŒํŠธ๋“ค ๊ฐ€์šด๋ฐ์„œ ์„ธ์ดํ”„ํ‹ฐ ์ฒด์ปค(`safety_checker`)์™€ ํ”ผ์ณ ์ต์ŠคํŠธ๋ž™ํ„ฐ(`feature_extractor`)๋ฅผ ์ œ์™ธํ•œ ์ปดํฌ๋„ŒํŠธ๋“ค๋งŒ `stable_diffusion_img2img` ํŒŒ์ดํ”„๋ผ์ธ์—์„œ ์žฌ์‚ฌ์šฉํ•˜๋Š” ๋ฐฉ์‹ ์—ญ์‹œ ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค.
```python
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline
model_id = "runwayml/stable-diffusion-v1-5"
stable_diffusion_txt2img = StableDiffusionPipeline.from_pretrained(model_id)
stable_diffusion_img2img = StableDiffusionImg2ImgPipeline(
vae=stable_diffusion_txt2img.vae,
text_encoder=stable_diffusion_txt2img.text_encoder,
tokenizer=stable_diffusion_txt2img.tokenizer,
unet=stable_diffusion_txt2img.unet,
scheduler=stable_diffusion_txt2img.scheduler,
safety_checker=None,
feature_extractor=None,
requires_safety_checker=False,
)
```
## Checkpoint variants
Variant๋ž€ ์ผ๋ฐ˜์ ์œผ๋กœ ๋‹ค์Œ๊ณผ ๊ฐ™์€ ์ฒดํฌํฌ์ธํŠธ๋“ค์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค.
- `torch.float16`๊ณผ ๊ฐ™์ด ์ •๋ฐ€๋„๋Š” ๋” ๋‚ฎ์ง€๋งŒ, ์šฉ๋Ÿ‰ ์—ญ์‹œ ๋” ์ž‘์€ ๋ถ€๋™์†Œ์ˆ˜์  ํƒ€์ž…์˜ ๊ฐ€์ค‘์น˜๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ์ฒดํฌํฌ์ธํŠธ. *(๋‹ค๋งŒ ์ด์™€ ๊ฐ™์€ variant์˜ ๊ฒฝ์šฐ, ์ถ”๊ฐ€์ ์ธ ํ›ˆ๋ จ๊ณผ CPUํ™˜๊ฒฝ์—์„œ์˜ ๊ตฌ๋™์ด ๋ถˆ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค.)*
- Non-EMA ๊ฐ€์ค‘์น˜๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ์ฒดํฌํฌ์ธํŠธ. *(Non-EMA ๊ฐ€์ค‘์น˜์˜ ๊ฒฝ์šฐ, ํŒŒ์ธ ํŠœ๋‹ ๋‹จ๊ณ„์—์„œ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์ด ๊ถŒ์žฅ๋˜๋Š”๋ฐ, ์ถ”๋ก  ๋‹จ๊ณ„์—์„  ์‚ฌ์šฉํ•˜์ง€ ์•Š๋Š” ๊ฒƒ์ด ๊ถŒ์žฅ๋ฉ๋‹ˆ๋‹ค.)*
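Non-EMA๊ฐ€ ๋ฌด์—‡์ธ์ง€ ๊ฐ์„ ์žก๊ธฐ ์œ„ํ•ด, EMA(์ง€์ˆ˜ ์ด๋™ ํ‰๊ท ) ๊ฐ€์ค‘์น˜๊ฐ€ ๊ฐฑ์‹ ๋˜๋Š” ์ผ๋ฐ˜์ ์ธ ๊ทœ์น™์„ ์Šค์นผ๋ผ ํ•˜๋‚˜๋กœ ์Šค์ผ€์น˜ํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค. ์‹ค์ œ ํ•™์Šต์—์„œ๋Š” ๋ชจ๋“  ํŒŒ๋ผ๋ฏธํ„ฐ ํ…์„œ์— ๋Œ€ํ•ด ์ ์šฉ๋˜๋ฉฐ, ์•„๋ž˜์˜ `decay` ๊ฐ’์€ ์ˆœ์ „ํžˆ ์˜ˆ์‹œ์šฉ์ž…๋‹ˆ๋‹ค.

```python
def ema_update(ema_w: float, new_w: float, decay: float = 0.9999) -> float:
    """ํ•™์Šต ์Šคํ…๋งˆ๋‹ค EMA ๊ฐ€์ค‘์น˜๋ฅผ ๊ฐฑ์‹ ํ•˜๋Š” ์ผ๋ฐ˜์ ์ธ ๊ทœ์น™:
    ๊ธฐ์กด ํ‰๊ท ์„ decay๋งŒํผ ์œ ์ง€ํ•˜๊ณ , ์ƒˆ ๊ฐ€์ค‘์น˜๋ฅผ (1 - decay)๋งŒํผ ๋ฐ˜์˜ํ•ฉ๋‹ˆ๋‹ค."""
    return decay * ema_w + (1.0 - decay) * new_w

ema = 0.0
for step_weight in [1.0, 1.0, 1.0]:  # ํ•™์Šต ์ค‘ ๊ฐ€์ค‘์น˜๊ฐ€ 1.0์œผ๋กœ ์ˆ˜๋ ดํ–ˆ๋‹ค๊ณ  ๊ฐ€์ •
    ema = ema_update(ema, step_weight, decay=0.5)
print(ema)  # 0.875: ์ตœ์‹  ๊ฐ€์ค‘์น˜ ์ชฝ์œผ๋กœ ์ฒœ์ฒœํžˆ ๋”ฐ๋ผ๊ฐ
```

์ด์ฒ˜๋Ÿผ EMA ๊ฐ€์ค‘์น˜๋Š” ํ•™์Šต ๊ถค์ ์„ ๋ถ€๋“œ๋Ÿฝ๊ฒŒ ํ‰๊ท ๋‚ธ ๊ฐ’์ด์–ด์„œ ์ถ”๋ก ์— ์œ ๋ฆฌํ•˜๊ณ , Non-EMA ๊ฐ€์ค‘์น˜๋Š” ๋งˆ์ง€๋ง‰ ์Šคํ…์˜ ๊ฐ’ ๊ทธ๋Œ€๋กœ์ด๊ธฐ ๋•Œ๋ฌธ์— ํ•™์Šต์„ ์ด์–ด๊ฐ€๋Š” ํŒŒ์ธ ํŠœ๋‹์— ์ ํ•ฉํ•ฉ๋‹ˆ๋‹ค.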
<Tip>
๐Ÿ’ก ๋ชจ๋ธ ๊ตฌ์กฐ๋Š” ๋™์ผํ•˜์ง€๋งŒ ์„œ๋กœ ๋‹ค๋ฅธ ํ•™์Šต ํ™˜๊ฒฝ์—์„œ ์„œ๋กœ ๋‹ค๋ฅธ ๋ฐ์ดํ„ฐ์…‹์œผ๋กœ ํ•™์Šต๋œ ์ฒดํฌํฌ์ธํŠธ๋“ค์ด ์žˆ์„ ๊ฒฝ์šฐ, ํ•ด๋‹น ์ฒดํฌํฌ์ธํŠธ๋“ค์€ variant ๋‹จ๊ณ„๊ฐ€ ์•„๋‹Œ ๋ ˆํฌ์ง€ํ† ๋ฆฌ ๋‹จ๊ณ„์—์„œ ๋ถ„๋ฆฌ๋˜์–ด ๊ด€๋ฆฌ๋˜์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. (์ฆ‰, ํ•ด๋‹น ์ฒดํฌํฌ์ธํŠธ๋“ค์€ ์„œ๋กœ ๋‹ค๋ฅธ ๋ ˆํฌ์ง€ํ† ๋ฆฌ์—์„œ ๋”ฐ๋กœ ๊ด€๋ฆฌ๋˜์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ์˜ˆ์‹œ: [`stable-diffusion-v1-4`], [`stable-diffusion-v1-5`]).
</Tip>
| **checkpoint type** | **weight name** | **argument for loading weights** |
| ------------------- | ----------------------------------- | -------------------------------- |
| original | diffusion_pytorch_model.bin | |
| floating point | diffusion_pytorch_model.fp16.bin | `variant`, `torch_dtype` |
| non-EMA | diffusion_pytorch_model.non_ema.bin | `variant` |
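์œ„ ํ‘œ์˜ ๊ฐ€์ค‘์น˜ ํŒŒ์ผ๋ช… ๊ทœ์น™(variant ์ด๋ฆ„์ด ํ™•์žฅ์ž ์•ž์— ๋ผ์–ด๋“ค์–ด๊ฐ€๋Š” ๋ฐฉ์‹)์€ ๋‹ค์Œ์ฒ˜๋Ÿผ ๊ฐ„๋‹จํžˆ ํ‘œํ˜„ํ•ด ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. `weight_filename`์€ ์„ค๋ช…์šฉ ๊ฐ€์ƒ ํ•จ์ˆ˜์ž…๋‹ˆ๋‹ค.

```python
def weight_filename(variant=None):
    """variant ์ด๋ฆ„์ด ๊ฐ€์ค‘์น˜ ํŒŒ์ผ๋ช…์— ๋ฐ˜์˜๋˜๋Š” ๊ทœ์น™์˜ ์Šค์ผ€์น˜"""
    base = "diffusion_pytorch_model"
    return f"{base}.{variant}.bin" if variant else f"{base}.bin"

print(weight_filename())           # diffusion_pytorch_model.bin
print(weight_filename("fp16"))     # diffusion_pytorch_model.fp16.bin
print(weight_filename("non_ema"))  # diffusion_pytorch_model.non_ema.bin
```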
variant๋ฅผ ๋กœ๋“œํ•  ๋•Œ 2๊ฐœ์˜ ์ค‘์š”ํ•œ argument๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค.
* `torch_dtype`์€ ๋ถˆ๋Ÿฌ์˜ฌ ์ฒดํฌํฌ์ธํŠธ์˜ ๋ถ€๋™์†Œ์ˆ˜์  ํƒ€์ž…์„ ์ •์˜ํ•ฉ๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด `torch_dtype=torch.float16`์„ ๋ช…์‹œํ•จ์œผ๋กœ์จ ๊ฐ€์ค‘์น˜์˜ ๋ถ€๋™์†Œ์ˆ˜์  ํƒ€์ž…์„ `fp16`์œผ๋กœ ๋ณ€ํ™˜ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. (๋งŒ์•ฝ ๋”ฐ๋กœ ์„ค์ •ํ•˜์ง€ ์•Š์„ ๊ฒฝ์šฐ, ๊ธฐ๋ณธ๊ฐ’์œผ๋กœ `fp32` ํƒ€์ž…์˜ ๊ฐ€์ค‘์น˜๊ฐ€ ๋กœ๋”ฉ๋ฉ๋‹ˆ๋‹ค.) ๋˜ํ•œ `variant` ์ธ์ž๋ฅผ ๋ช…์‹œํ•˜์ง€ ์•Š์€ ์ฑ„๋กœ ์ฒดํฌํฌ์ธํŠธ๋ฅผ ๋ถˆ๋Ÿฌ์˜จ ๋‹ค์Œ, ํ•ด๋‹น ์ฒดํฌํฌ์ธํŠธ๋ฅผ `torch_dtype=torch.float16` ์ธ์ž๋ฅผ ํ†ตํ•ด `fp16` ํƒ€์ž…์œผ๋กœ ๋ณ€ํ™˜ํ•˜๋Š” ๊ฒƒ ์—ญ์‹œ ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค. ์ด ๊ฒฝ์šฐ ๊ธฐ๋ณธ์œผ๋กœ ์„ค์ •๋œ `fp32` ๊ฐ€์ค‘์น˜๊ฐ€ ๋จผ์ € ๋‹ค์šด๋กœ๋“œ๋˜๊ณ , ํ•ด๋‹น ๊ฐ€์ค‘์น˜๋ฅผ ๋ถˆ๋Ÿฌ์˜จ ๋’ค `fp16` ํƒ€์ž…์œผ๋กœ ๋ณ€ํ™˜ํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.
* `variant` ์ธ์ž๋Š” ๋ ˆํฌ์ง€ํ† ๋ฆฌ์—์„œ ์–ด๋–ค variant๋ฅผ ๋ถˆ๋Ÿฌ์˜ฌ ๊ฒƒ์ธ๊ฐ€๋ฅผ ์ •์˜ํ•ฉ๋‹ˆ๋‹ค. ๊ฐ€๋ น [`diffusers/stable-diffusion-variants`](https://huggingface.co/diffusers/stable-diffusion-variants/tree/main/unet) ๋ ˆํฌ์ง€ํ† ๋ฆฌ๋กœ๋ถ€ํ„ฐ `non_ema` ์ฒดํฌํฌ์ธํŠธ๋ฅผ ๋ถˆ๋Ÿฌ์˜ค๊ณ ์ž ํ•œ๋‹ค๋ฉด, `variant="non_ema"` ์ธ์ž๋ฅผ ์ „๋‹ฌํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.
```python
import torch

from diffusers import DiffusionPipeline
# load fp16 variant
stable_diffusion = DiffusionPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5", variant="fp16", torch_dtype=torch.float16
)
# load non_ema variant
stable_diffusion = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", variant="non_ema")
```
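`fp16` variant์˜ ์šฉ๋Ÿ‰์ด ์ ˆ๋ฐ˜์ธ ์ด์œ ์™€ ๊ทธ์— ๋”ฐ๋ฅธ ์ •๋ฐ€๋„ ์†์‹ค์€, ํŒŒ์ด์ฌ ํ‘œ์ค€ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ `struct`์˜ half-precision ํฌ๋งท(`'e'`)์œผ๋กœ ๊ฐ„๋‹จํžˆ ํ™•์ธํ•ด ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

```python
import struct

# fp32(ํฌ๋งท 'f')์™€ fp16(ํฌ๋งท 'e')์˜ ์ €์žฅ ์šฉ๋Ÿ‰๊ณผ ์ •๋ฐ€๋„ ์ฐจ์ด
value = 0.1
fp32_bytes = struct.pack("f", value)
fp16_bytes = struct.pack("e", value)
print(len(fp32_bytes), len(fp16_bytes))  # 4 2: ๊ฐ€์ค‘์น˜๋‹น ์šฉ๋Ÿ‰์ด ์ ˆ๋ฐ˜

roundtrip = struct.unpack("e", fp16_bytes)[0]
print(roundtrip)  # 0.1๊ณผ ์•ฝ๊ฐ„ ๋‹ค๋ฆ„: ๋‚ฎ์€ ์ •๋ฐ€๋„๋กœ ์ธํ•œ ๋ฐ˜์˜ฌ๋ฆผ ์˜ค์ฐจ
```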
๋‹ค๋ฅธ ๋ถ€๋™์†Œ์ˆ˜์  ํƒ€์ž…์˜ ๊ฐ€์ค‘์น˜ ํ˜น์€ non-EMA ๊ฐ€์ค‘์น˜๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ์ฒดํฌํฌ์ธํŠธ๋ฅผ ์ €์žฅํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š”, [`DiffusionPipeline.save_pretrained`] ๋ฉ”์„œ๋“œ๋ฅผ ์‚ฌ์šฉํ•ด์•ผ ํ•˜๋ฉฐ, ์ด ๋•Œ `variant` ์ธ์ž๋ฅผ ๋ช…์‹œํ•ด์ค˜์•ผ ํ•ฉ๋‹ˆ๋‹ค. ์›๋ž˜์˜ ์ฒดํฌํฌ์ธํŠธ์™€ ๋™์ผํ•œ ํด๋”์— variant๋ฅผ ์ €์žฅํ•ด์•ผ ํ•˜๋ฉฐ, ์ด๋ ‡๊ฒŒ ํ•˜๋ฉด ๋™์ผํ•œ ํด๋”์—์„œ ์˜ค๋ฆฌ์ง€๋„ ์ฒดํฌํฌ์ธํŠธ๊ณผ variant๋ฅผ ๋ชจ๋‘ ๋ถˆ๋Ÿฌ์˜ฌ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
```python
from diffusers import DiffusionPipeline
# save as fp16 variant
stable_diffusion.save_pretrained("runwayml/stable-diffusion-v1-5", variant="fp16")
# save as non-ema variant
stable_diffusion.save_pretrained("runwayml/stable-diffusion-v1-5", variant="non_ema")
```
๋งŒ์•ฝ variant๋ฅผ ๊ธฐ์กด ํด๋”์— ์ €์žฅํ•˜์ง€ ์•Š์„ ๊ฒฝ์šฐ, `variant` ์ธ์ž๋ฅผ ๋ฐ˜๋“œ์‹œ ๋ช…์‹œํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋ ‡๊ฒŒ ํ•˜์ง€ ์•Š์„ ๊ฒฝ์šฐ ์›๋ž˜์˜ ์˜ค๋ฆฌ์ง€๋„ ์ฒดํฌํฌ์ธํŠธ๋ฅผ ์ฐพ์„ ์ˆ˜ ์—†๊ฒŒ ๋˜๊ธฐ ๋•Œ๋ฌธ์— ์—๋Ÿฌ๊ฐ€ ๋ฐœ์ƒํ•ฉ๋‹ˆ๋‹ค.
```python
# ๐Ÿ‘Ž this won't work
stable_diffusion = DiffusionPipeline.from_pretrained("./stable-diffusion-v1-5", torch_dtype=torch.float16)
# ๐Ÿ‘ this works
stable_diffusion = DiffusionPipeline.from_pretrained(
"./stable-diffusion-v1-5", variant="fp16", torch_dtype=torch.float16
)
```
### ๋ชจ๋ธ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ
๋ชจ๋ธ๋“ค์€ [`ModelMixin.from_pretrained`] ๋ฉ”์„œ๋“œ๋ฅผ ํ†ตํ•ด ๋ถˆ๋Ÿฌ์˜ฌ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ํ•ด๋‹น ๋ฉ”์„œ๋“œ๋Š” ์ตœ์‹  ๋ฒ„์ „์˜ ๋ชจ๋ธ ๊ฐ€์ค‘์น˜ ํŒŒ์ผ๊ณผ ์„ค์ • ํŒŒ์ผ(configurations)์„ ๋‹ค์šด๋กœ๋“œํ•˜๊ณ  ์บ์‹ฑํ•ฉ๋‹ˆ๋‹ค. ๋งŒ์•ฝ ์ด๋Ÿฌํ•œ ํŒŒ์ผ๋“ค์ด ์ตœ์‹  ๋ฒ„์ „์œผ๋กœ ๋กœ์ปฌ ์บ์‹œ์— ์ €์žฅ๋˜์–ด ์žˆ๋‹ค๋ฉด, [`ModelMixin.from_pretrained`]๋Š” ๊ตณ์ด ํ•ด๋‹น ํŒŒ์ผ๋“ค์„ ๋‹ค์‹œ ๋‹ค์šด๋กœ๋“œํ•˜์ง€ ์•Š์œผ๋ฉฐ, ๊ทธ์ € ์บ์‹œ์— ์žˆ๋Š” ์ตœ์‹  ํŒŒ์ผ๋“ค์„ ์žฌ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.
๋ชจ๋ธ์€ `subfolder` ์ธ์ž์— ๋ช…์‹œ๋œ ํ•˜์œ„ ํด๋”๋กœ๋ถ€ํ„ฐ ๋กœ๋“œ๋ฉ๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด `runwayml/stable-diffusion-v1-5`์˜ UNet ๋ชจ๋ธ์˜ ๊ฐ€์ค‘์น˜๋Š” [`unet`](https://huggingface.co/runwayml/stable-diffusion-v1-5/tree/main/unet) ํด๋”์— ์ €์žฅ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.
```python
from diffusers import UNet2DConditionModel
repo_id = "runwayml/stable-diffusion-v1-5"
model = UNet2DConditionModel.from_pretrained(repo_id, subfolder="unet")
```
ํ˜น์€ [ํ•ด๋‹น ๋ชจ๋ธ์˜ ๋ ˆํฌ์ง€ํ† ๋ฆฌ](https://huggingface.co/google/ddpm-cifar10-32/tree/main)๋กœ๋ถ€ํ„ฐ ๋‹ค์ด๋ ‰ํŠธ๋กœ ๊ฐ€์ ธ์˜ค๋Š” ๊ฒƒ ์—ญ์‹œ ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค.
```python
from diffusers import UNet2DModel
repo_id = "google/ddpm-cifar10-32"
model = UNet2DModel.from_pretrained(repo_id)
```
๋˜ํ•œ ์•ž์„œ ๋ดค๋˜ `variant` ์ธ์ž๋ฅผ ๋ช…์‹œํ•จ์œผ๋กœ์จ, Non-EMA๋‚˜ `fp16`์˜ ๊ฐ€์ค‘์น˜๋ฅผ ๊ฐ€์ ธ์˜ค๋Š” ๊ฒƒ ์—ญ์‹œ ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค.
```python
from diffusers import UNet2DConditionModel
model = UNet2DConditionModel.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="unet", variant="non_ema")
model.save_pretrained("./local-unet", variant="non_ema")
```
## ์Šค์ผ€์ค„๋Ÿฌ
์Šค์ผ€์ค„๋Ÿฌ๋“ค์€ [`SchedulerMixin.from_pretrained`] ๋ฉ”์„œ๋“œ๋ฅผ ํ†ตํ•ด ๋ถˆ๋Ÿฌ์˜ฌ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋ชจ๋ธ๊ณผ ๋‹ฌ๋ฆฌ ์Šค์ผ€์ค„๋Ÿฌ๋Š” ๋ณ„๋„์˜ ๊ฐ€์ค‘์น˜๋ฅผ ๊ฐ–์ง€ ์•Š์œผ๋ฉฐ, ๋”ฐ๋ผ์„œ ๋‹น์—ฐํžˆ ๋ณ„๋„์˜ ํ•™์Šต๊ณผ์ •์„ ์š”๊ตฌํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ์Šค์ผ€์ค„๋Ÿฌ๋“ค์€ (ํ•ด๋‹น ์Šค์ผ€์ค„๋Ÿฌ ํ•˜์œ„ํด๋”์˜) configration ํŒŒ์ผ์„ ํ†ตํ•ด ์ •์˜๋ฉ๋‹ˆ๋‹ค.
์—ฌ๋Ÿฌ๊ฐœ์˜ ์Šค์ผ€์ค„๋Ÿฌ๋ฅผ ๋ถˆ๋Ÿฌ์˜จ๋‹ค๊ณ  ํ•ด์„œ ๋งŽ์€ ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ์†Œ๋ชจํ•˜๋Š” ๊ฒƒ์€ ์•„๋‹ˆ๋ฉฐ, ๋‹ค์–‘ํ•œ ์Šค์ผ€์ค„๋Ÿฌ๋“ค์— ๋™์ผํ•œ ์Šค์ผ€์ค„๋Ÿฌ configration์„ ์ ์šฉํ•˜๋Š” ๊ฒƒ ์—ญ์‹œ ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค. ๋‹ค์Œ ์˜ˆ์‹œ ์ฝ”๋“œ์—์„œ ๋ถˆ๋Ÿฌ์˜ค๋Š” ์Šค์ผ€์ค„๋Ÿฌ๋“ค์€ ๋ชจ๋‘ [`StableDiffusionPipeline`]๊ณผ ํ˜ธํ™˜๋˜๋Š”๋ฐ, ์ด๋Š” ๊ณง ํ•ด๋‹น ์Šค์ผ€์ค„๋Ÿฌ๋“ค์— ๋™์ผํ•œ ์Šค์ผ€์ค„๋Ÿฌ configration ํŒŒ์ผ์„ ์ ์šฉํ•  ์ˆ˜ ์žˆ์Œ์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค.
```python
from diffusers import StableDiffusionPipeline
from diffusers import (
DDPMScheduler,
DDIMScheduler,
PNDMScheduler,
LMSDiscreteScheduler,
EulerDiscreteScheduler,
EulerAncestralDiscreteScheduler,
DPMSolverMultistepScheduler,
)
repo_id = "runwayml/stable-diffusion-v1-5"
ddpm = DDPMScheduler.from_pretrained(repo_id, subfolder="scheduler")
ddim = DDIMScheduler.from_pretrained(repo_id, subfolder="scheduler")
pndm = PNDMScheduler.from_pretrained(repo_id, subfolder="scheduler")
lms = LMSDiscreteScheduler.from_pretrained(repo_id, subfolder="scheduler")
euler_anc = EulerAncestralDiscreteScheduler.from_pretrained(repo_id, subfolder="scheduler")
euler = EulerDiscreteScheduler.from_pretrained(repo_id, subfolder="scheduler")
dpm = DPMSolverMultistepScheduler.from_pretrained(repo_id, subfolder="scheduler")
# replace `dpm` with any of `ddpm`, `ddim`, `pndm`, `lms`, `euler_anc`, `euler`
pipeline = StableDiffusionPipeline.from_pretrained(repo_id, scheduler=dpm)
```
## DiffusionPipeline์— ๋Œ€ํ•ด ์•Œ์•„๋ณด๊ธฐ
ํด๋ž˜์Šค ๋ฉ”์„œ๋“œ๋กœ์„œ [`DiffusionPipeline.from_pretrained`]์€ 2๊ฐ€์ง€๋ฅผ ๋‹ด๋‹นํ•ฉ๋‹ˆ๋‹ค.
- ์ฒซ์งธ๋กœ, `from_pretrained` ๋ฉ”์„œ๋“œ๋Š” ์ตœ์‹  ๋ฒ„์ „์˜ ํŒŒ์ดํ”„๋ผ์ธ์„ ๋‹ค์šด๋กœ๋“œํ•˜๊ณ , ์บ์‹œ์— ์ €์žฅํ•ฉ๋‹ˆ๋‹ค. ์ด๋ฏธ ๋กœ์ปฌ ์บ์‹œ์— ์ตœ์‹  ๋ฒ„์ „์˜ ํŒŒ์ดํ”„๋ผ์ธ์ด ์ €์žฅ๋˜์–ด ์žˆ๋‹ค๋ฉด, [`DiffusionPipeline.from_pretrained`]์€ ํ•ด๋‹น ํŒŒ์ผ๋“ค์„ ๋‹ค์‹œ ๋‹ค์šด๋กœ๋“œํ•˜์ง€ ์•Š๊ณ , ๋กœ์ปฌ ์บ์‹œ์— ์ €์žฅ๋˜์–ด ์žˆ๋Š” ํŒŒ์ดํ”„๋ผ์ธ์„ ๋ถˆ๋Ÿฌ์˜ต๋‹ˆ๋‹ค.
- `model_index.json` ํŒŒ์ผ์„ ํ†ตํ•ด ์ฒดํฌํฌ์ธํŠธ์— ๋Œ€์‘๋˜๋Š” ์ ํ•ฉํ•œ ํŒŒ์ดํ”„๋ผ์ธ ํด๋ž˜์Šค๋กœ ๋ถˆ๋Ÿฌ์˜ต๋‹ˆ๋‹ค.
ํŒŒ์ดํ”„๋ผ์ธ์˜ ํด๋” ๊ตฌ์กฐ๋Š” ํ•ด๋‹น ํŒŒ์ดํ”„๋ผ์ธ ํด๋ž˜์Šค์˜ ๊ตฌ์กฐ์™€ ์ง์ ‘์ ์œผ๋กœ ์ผ์น˜ํ•ฉ๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด [`StableDiffusionPipeline`] ํด๋ž˜์Šค๋Š” [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) ๋ ˆํฌ์ง€ํ† ๋ฆฌ์™€ ๋Œ€์‘๋˜๋Š” ๊ตฌ์กฐ๋ฅผ ๊ฐ–์Šต๋‹ˆ๋‹ค.
```python
from diffusers import DiffusionPipeline
repo_id = "runwayml/stable-diffusion-v1-5"
pipeline = DiffusionPipeline.from_pretrained(repo_id)
print(pipeline)
```
์œ„์˜ ์ฝ”๋“œ ์ถœ๋ ฅ ๊ฒฐ๊ณผ๋ฅผ ํ™•์ธํ•ด๋ณด๋ฉด, `pipeline`์€ [`StableDiffusionPipeline`]์˜ ์ธ์Šคํ„ด์Šค์ด๋ฉฐ, ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์ด 7๊ฐœ์˜ ์ปดํฌ๋„ŒํŠธ๋กœ ๊ตฌ์„ฑ๋œ๋‹ค๋Š” ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
- `"feature_extractor"`: [`~transformers.CLIPImageProcessor`]의 인스턴스
- `"safety_checker"`: 유해한 컨텐츠를 스크리닝하기 위한 [컴포넌트](https://github.com/huggingface/diffusers/blob/e55687e1e15407f60f32242027b7bb8170e58266/src/diffusers/pipelines/stable_diffusion/safety_checker.py#L32)
- `"scheduler"`: [`PNDMScheduler`]의 인스턴스
- `"text_encoder"`: [`~transformers.CLIPTextModel`]의 인스턴스
- `"tokenizer"`: [`~transformers.CLIPTokenizer`]의 인스턴스
- `"unet"`: [`UNet2DConditionModel`]의 인스턴스
- `"vae"`: [`AutoencoderKL`]의 인스턴스
```json
StableDiffusionPipeline {
  "feature_extractor": [
    "transformers",
    "CLIPImageProcessor"
  ],
  "safety_checker": [
    "stable_diffusion",
    "StableDiffusionSafetyChecker"
  ],
  "scheduler": [
    "diffusers",
    "PNDMScheduler"
  ],
  "text_encoder": [
    "transformers",
    "CLIPTextModel"
  ],
  "tokenizer": [
    "transformers",
    "CLIPTokenizer"
  ],
  "unet": [
    "diffusers",
    "UNet2DConditionModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}
```
ํŒŒ์ดํ”„๋ผ์ธ ์ธ์Šคํ„ด์Šค์˜ ์ปดํฌ๋„ŒํŠธ๋“ค์„ [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5)์˜ ํด๋” ๊ตฌ์กฐ์™€ ๋น„๊ตํ•ด๋ณผ ๊ฒฝ์šฐ, ๊ฐ๊ฐ์˜ ์ปดํฌ๋„ŒํŠธ๋งˆ๋‹ค ๋ณ„๋„์˜ ํด๋”๊ฐ€ ์žˆ์Œ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
```
.
├── feature_extractor
│   └── preprocessor_config.json
├── model_index.json
├── safety_checker
│   ├── config.json
│   └── pytorch_model.bin
├── scheduler
│   └── scheduler_config.json
├── text_encoder
│   ├── config.json
│   └── pytorch_model.bin
├── tokenizer
│   ├── merges.txt
│   ├── special_tokens_map.json
│   ├── tokenizer_config.json
│   └── vocab.json
├── unet
│   ├── config.json
│   └── diffusion_pytorch_model.bin
└── vae
    ├── config.json
    └── diffusion_pytorch_model.bin
```
๋˜ํ•œ ๊ฐ๊ฐ์˜ ์ปดํฌ๋„ŒํŠธ๋“ค์„ ํŒŒ์ดํ”„๋ผ์ธ ์ธ์Šคํ„ด์Šค์˜ ์†์„ฑ์œผ๋กœ์จ ์ฐธ์กฐํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
```py
pipeline.tokenizer
```
```python
CLIPTokenizer(
    name_or_path="/root/.cache/huggingface/hub/models--runwayml--stable-diffusion-v1-5/snapshots/39593d5650112b4cc580433f6b0435385882d819/tokenizer",
    vocab_size=49408,
    model_max_length=77,
    is_fast=False,
    padding_side="right",
    truncation_side="right",
    special_tokens={
        "bos_token": AddedToken("<|startoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True),
        "eos_token": AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True),
        "unk_token": AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True),
        "pad_token": "<|endoftext|>",
    },
)
```
๋ชจ๋“  ํŒŒ์ดํ”„๋ผ์ธ์€ `model_index.json` ํŒŒ์ผ์„ ํ†ตํ•ด [`DiffusionPipeline`]์— ๋‹ค์Œ๊ณผ ๊ฐ™์€ ์ •๋ณด๋ฅผ ์ „๋‹ฌํ•ฉ๋‹ˆ๋‹ค.
- `_class_name` ๋Š” ์–ด๋–ค ํŒŒ์ดํ”„๋ผ์ธ ํด๋ž˜์Šค๋ฅผ ์‚ฌ์šฉํ•ด์•ผ ํ•˜๋Š”์ง€์— ๋Œ€ํ•ด ์•Œ๋ ค์ค๋‹ˆ๋‹ค.
- `_diffusers_version`๋Š” ์–ด๋–ค ๋ฒ„์ „์˜ ๋””ํ“จ์ €์Šค๋กœ ํŒŒ์ดํ”„๋ผ์ธ ์•ˆ์˜ ๋ชจ๋ธ๋“ค์ด ๋งŒ๋“ค์–ด์กŒ๋Š”์ง€๋ฅผ ์•Œ๋ ค์ค๋‹ˆ๋‹ค.
- ๊ทธ ๋‹ค์Œ์€ ๊ฐ๊ฐ์˜ ์ปดํฌ๋„ŒํŠธ๋“ค์ด ์–ด๋–ค ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์˜ ์–ด๋–ค ํด๋ž˜์Šค๋กœ ๋งŒ๋“ค์–ด์กŒ๋Š”์ง€์— ๋Œ€ํ•ด ์•Œ๋ ค์ค๋‹ˆ๋‹ค. (์•„๋ž˜ ์˜ˆ์‹œ์—์„œ `"feature_extractor" : ["transformers", "CLIPImageProcessor"]`์˜ ๊ฒฝ์šฐ, `feature_extractor` ์ปดํฌ๋„ŒํŠธ๋Š” `transformers` ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์˜ `CLIPImageProcessor` ํด๋ž˜์Šค๋ฅผ ํ†ตํ•ด ๋งŒ๋“ค์–ด์กŒ๋‹ค๋Š” ๊ฒƒ์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค.)
```json
{
  "_class_name": "StableDiffusionPipeline",
  "_diffusers_version": "0.6.0",
  "feature_extractor": [
    "transformers",
    "CLIPImageProcessor"
  ],
  "safety_checker": [
    "stable_diffusion",
    "StableDiffusionSafetyChecker"
  ],
  "scheduler": [
    "diffusers",
    "PNDMScheduler"
  ],
  "text_encoder": [
    "transformers",
    "CLIPTextModel"
  ],
  "tokenizer": [
    "transformers",
    "CLIPTokenizer"
  ],
  "unet": [
    "diffusers",
    "UNet2DConditionModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}
```
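`model_index.json`의 구조를 직접 파싱해 보면 각 컴포넌트가 (라이브러리, 클래스) 쌍으로 기록되어 있음을 확인할 수 있습니다. 아래는 위의 JSON 예시를 그대로 문자열로 옮겨 메타 필드와 컴포넌트 항목을 분리해 보는 간단한 스케치입니다. 실제 파일을 읽는 것이 아니라 본문의 예시 데이터를 사용한다는 점에 유의하세요.

```python
import json

# 본문의 model_index.json 예시를 그대로 옮긴 예시 데이터 (실제 파일 대신 사용)
model_index_text = """
{
  "_class_name": "StableDiffusionPipeline",
  "_diffusers_version": "0.6.0",
  "feature_extractor": ["transformers", "CLIPImageProcessor"],
  "safety_checker": ["stable_diffusion", "StableDiffusionSafetyChecker"],
  "scheduler": ["diffusers", "PNDMScheduler"],
  "text_encoder": ["transformers", "CLIPTextModel"],
  "tokenizer": ["transformers", "CLIPTokenizer"],
  "unet": ["diffusers", "UNet2DConditionModel"],
  "vae": ["diffusers", "AutoencoderKL"]
}
"""

model_index = json.loads(model_index_text)

# "_"로 시작하는 키는 메타 정보이고, 나머지는 (라이브러리, 클래스) 쌍으로 기록된 컴포넌트입니다.
pipeline_class = model_index["_class_name"]
components = {k: v for k, v in model_index.items() if not k.startswith("_")}

print(pipeline_class)
for name, (library, cls) in components.items():
    print(f"{name}: {library}.{cls}")
```

이처럼 파이프라인 클래스와 7개 컴포넌트의 출처가 모두 이 파일 하나에 기록되어 있기 때문에, [`DiffusionPipeline.from_pretrained`]이 적합한 클래스들을 자동으로 찾아 불러올 수 있습니다.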

<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# ๋‹ค์–‘ํ•œ Stable Diffusion ํฌ๋งท ๋ถˆ๋Ÿฌ์˜ค๊ธฐ
Stable Diffusion ๋ชจ๋ธ๋“ค์€ ํ•™์Šต ๋ฐ ์ €์žฅ๋œ ํ”„๋ ˆ์ž„์›Œํฌ์™€ ๋‹ค์šด๋กœ๋“œ ์œ„์น˜์— ๋”ฐ๋ผ ๋‹ค์–‘ํ•œ ํ˜•์‹์œผ๋กœ ์ œ๊ณต๋ฉ๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ํ˜•์‹์„ ๐Ÿค— Diffusers์—์„œ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋„๋ก ๋ณ€ํ™˜ํ•˜๋ฉด ์ถ”๋ก ์„ ์œ„ํ•œ [๋‹ค์–‘ํ•œ ์Šค์ผ€์ค„๋Ÿฌ ์‚ฌ์šฉ](schedulers), ์‚ฌ์šฉ์ž ์ง€์ • ํŒŒ์ดํ”„๋ผ์ธ ๊ตฌ์ถ•, ์ถ”๋ก  ์†๋„ ์ตœ์ ํ™”๋ฅผ ์œ„ํ•œ ๋‹ค์–‘ํ•œ ๊ธฐ๋ฒ•๊ณผ ๋ฐฉ๋ฒ• ๋“ฑ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์—์„œ ์ง€์›ํ•˜๋Š” ๋ชจ๋“  ๊ธฐ๋Šฅ์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
<Tip>
์šฐ๋ฆฌ๋Š” `.safetensors` ํ˜•์‹์„ ์ถ”์ฒœํ•ฉ๋‹ˆ๋‹ค. ์™œ๋ƒํ•˜๋ฉด ๊ธฐ์กด์˜ pickled ํŒŒ์ผ์€ ์ทจ์•ฝํ•˜๊ณ  ๋จธ์‹ ์—์„œ ์ฝ”๋“œ๋ฅผ ์‹คํ–‰ํ•  ๋•Œ ์•…์šฉ๋  ์ˆ˜ ์žˆ๋Š” ๊ฒƒ์— ๋น„ํ•ด ํ›จ์”ฌ ๋” ์•ˆ์ „ํ•ฉ๋‹ˆ๋‹ค. (safetensors ๋ถˆ๋Ÿฌ์˜ค๊ธฐ ๊ฐ€์ด๋“œ์—์„œ ์ž์„ธํžˆ ์•Œ์•„๋ณด์„ธ์š”.)
</Tip>
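위 팁에서 말하는 pickle 파일의 취약점은, 역직렬화(`pickle.loads`) 시점에 파일 안에 기록된 임의의 호출이 그대로 실행될 수 있다는 점입니다. 아래는 이를 무해한 호출로 시연하는 최소한의 스케치입니다. `Payload`, `record`는 설명을 위해 임의로 만든 이름입니다.

```python
import pickle

log = []

def record(msg):
    log.append(msg)

class Payload:
    # __reduce__가 반환하는 (호출 가능 객체, 인자) 쌍은 역직렬화 시점에 그대로 실행됩니다.
    # 악의적인 파일이라면 여기에 os.system 같은 호출이 들어갈 수 있습니다.
    def __reduce__(self):
        return (record, ("pickle.loads 시점에 코드가 실행되었습니다",))

blob = pickle.dumps(Payload())
pickle.loads(blob)  # 단순히 "불러오기"만 했는데도 record()가 호출됩니다.
print(log)
```

반면 `.safetensors` 파일은 텐서 데이터만 저장하므로 이런 방식의 코드 실행이 원천적으로 불가능합니다.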
์ด ๊ฐ€์ด๋“œ์—์„œ๋Š” ๋‹ค๋ฅธ Stable Diffusion ํ˜•์‹์„ ๐Ÿค— Diffusers์™€ ํ˜ธํ™˜๋˜๋„๋ก ๋ณ€ํ™˜ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค.
## PyTorch .ckpt
์ฒดํฌํฌ์ธํŠธ ๋˜๋Š” `.ckpt` ํ˜•์‹์€ ์ผ๋ฐ˜์ ์œผ๋กœ ๋ชจ๋ธ์„ ์ €์žฅํ•˜๋Š” ๋ฐ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค. `.ckpt` ํŒŒ์ผ์€ ์ „์ฒด ๋ชจ๋ธ์„ ํฌํ•จํ•˜๋ฉฐ ์ผ๋ฐ˜์ ์œผ๋กœ ํฌ๊ธฐ๊ฐ€ ๋ช‡ GB์ž…๋‹ˆ๋‹ค. `.ckpt` ํŒŒ์ผ์„ [~StableDiffusionPipeline.from_ckpt] ๋ฉ”์„œ๋“œ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ง์ ‘ ๋ถˆ๋Ÿฌ์™€์„œ ์‚ฌ์šฉํ•  ์ˆ˜๋„ ์žˆ์ง€๋งŒ, ์ผ๋ฐ˜์ ์œผ๋กœ ๋‘ ๊ฐ€์ง€ ํ˜•์‹์„ ๋ชจ๋‘ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋„๋ก `.ckpt` ํŒŒ์ผ์„ ๐Ÿค— Diffusers๋กœ ๋ณ€ํ™˜ํ•˜๋Š” ๊ฒƒ์ด ๋” ์ข‹์Šต๋‹ˆ๋‹ค.
`.ckpt` ํŒŒ์ผ์„ ๋ณ€ํ™˜ํ•˜๋Š” ๋‘ ๊ฐ€์ง€ ์˜ต์…˜์ด ์žˆ์Šต๋‹ˆ๋‹ค. Space๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ฒดํฌํฌ์ธํŠธ๋ฅผ ๋ณ€ํ™˜ํ•˜๊ฑฐ๋‚˜ ์Šคํฌ๋ฆฝํŠธ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ `.ckpt` ํŒŒ์ผ์„ ๋ณ€ํ™˜ํ•ฉ๋‹ˆ๋‹ค.
### Space๋กœ ๋ณ€ํ™˜ํ•˜๊ธฐ
`.ckpt` ํŒŒ์ผ์„ ๋ณ€ํ™˜ํ•˜๋Š” ๊ฐ€์žฅ ์‰ฝ๊ณ  ํŽธ๋ฆฌํ•œ ๋ฐฉ๋ฒ•์€ SD์—์„œ Diffusers๋กœ ์ŠคํŽ˜์ด์Šค๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. Space์˜ ์ง€์นจ์— ๋”ฐ๋ผ .ckpt ํŒŒ์ผ์„ ๋ณ€ํ™˜ ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
์ด ์ ‘๊ทผ ๋ฐฉ์‹์€ ๊ธฐ๋ณธ ๋ชจ๋ธ์—์„œ๋Š” ์ž˜ ์ž‘๋™ํ•˜์ง€๋งŒ ๋” ๋งŽ์€ ์‚ฌ์šฉ์ž ์ •์˜ ๋ชจ๋ธ์—์„œ๋Š” ์–ด๋ ค์›€์„ ๊ฒช์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋นˆ pull request๋‚˜ ์˜ค๋ฅ˜๋ฅผ ๋ฐ˜ํ™˜ํ•˜๋ฉด Space๊ฐ€ ์‹คํŒจํ•œ ๊ฒƒ์ž…๋‹ˆ๋‹ค.
์ด ๊ฒฝ์šฐ ์Šคํฌ๋ฆฝํŠธ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ `.ckpt` ํŒŒ์ผ์„ ๋ณ€ํ™˜ํ•ด ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
### ์Šคํฌ๋ฆฝํŠธ๋กœ ๋ณ€ํ™˜ํ•˜๊ธฐ
๐Ÿค— Diffusers๋Š” `.ckpt`ย  ํŒŒ์ผ ๋ณ€ํ™˜์„ ์œ„ํ•œ ๋ณ€ํ™˜ ์Šคํฌ๋ฆฝํŠธ๋ฅผ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค. ์ด ์ ‘๊ทผ ๋ฐฉ์‹์€ ์œ„์˜ Space๋ณด๋‹ค ๋” ์•ˆ์ •์ ์ž…๋‹ˆ๋‹ค.
์‹œ์ž‘ํ•˜๊ธฐ ์ „์— ์Šคํฌ๋ฆฝํŠธ๋ฅผ ์‹คํ–‰ํ•  ๐Ÿค— Diffusers์˜ ๋กœ์ปฌ ํด๋ก (clone)์ด ์žˆ๋Š”์ง€ ํ™•์ธํ•˜๊ณ  Hugging Face ๊ณ„์ •์— ๋กœ๊ทธ์ธํ•˜์—ฌ pull request๋ฅผ ์—ด๊ณ  ๋ณ€ํ™˜๋œ ๋ชจ๋ธ์„ ํ—ˆ๋ธŒ์— ํ‘ธ์‹œํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•˜์„ธ์š”.
```bash
huggingface-cli login
```
์Šคํฌ๋ฆฝํŠธ๋ฅผ ์‚ฌ์šฉํ•˜๋ ค๋ฉด:
1. ๋ณ€ํ™˜ํ•˜๋ ค๋Š” `.ckpt`ย  ํŒŒ์ผ์ด ํฌํ•จ๋œ ๋ฆฌํฌ์ง€ํ† ๋ฆฌ๋ฅผ Git์œผ๋กœ ํด๋ก (clone)ํ•ฉ๋‹ˆ๋‹ค.
์ด ์˜ˆ์ œ์—์„œ๋Š” TemporalNet .ckpt ํŒŒ์ผ์„ ๋ณ€ํ™˜ํ•ด ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค:
```bash
git lfs install
git clone https://huggingface.co/CiaraRowles/TemporalNet
```
2. ์ฒดํฌํฌ์ธํŠธ๋ฅผ ๋ณ€ํ™˜ํ•  ๋ฆฌํฌ์ง€ํ† ๋ฆฌ์—์„œ pull request๋ฅผ ์—ฝ๋‹ˆ๋‹ค:
```bash
cd TemporalNet && git fetch origin refs/pr/13:pr/13
git checkout pr/13
```
3. ๋ณ€ํ™˜ ์Šคํฌ๋ฆฝํŠธ์—์„œ ๊ตฌ์„ฑํ•  ์ž…๋ ฅ ์ธ์ˆ˜๋Š” ์—ฌ๋Ÿฌ ๊ฐ€์ง€๊ฐ€ ์žˆ์ง€๋งŒ ๊ฐ€์žฅ ์ค‘์š”ํ•œ ์ธ์ˆ˜๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค:
- `checkpoint_path`: ๋ณ€ํ™˜ํ•  `.ckpt` ํŒŒ์ผ์˜ ๊ฒฝ๋กœ๋ฅผ ์ž…๋ ฅํ•ฉ๋‹ˆ๋‹ค.
- `original_config_file`: ์›๋ž˜ ์•„ํ‚คํ…์ฒ˜์˜ ๊ตฌ์„ฑ์„ ์ •์˜ํ•˜๋Š” YAML ํŒŒ์ผ์ž…๋‹ˆ๋‹ค. ์ด ํŒŒ์ผ์„ ์ฐพ์„ ์ˆ˜ ์—†๋Š” ๊ฒฝ์šฐ `.ckpt` ํŒŒ์ผ์„ ์ฐพ์€ GitHub ๋ฆฌํฌ์ง€ํ† ๋ฆฌ์—์„œ YAML ํŒŒ์ผ์„ ๊ฒ€์ƒ‰ํ•ด ๋ณด์„ธ์š”.
- `dump_path`: ๋ณ€ํ™˜๋œ ๋ชจ๋ธ์˜ ๊ฒฝ๋กœ
์˜ˆ๋ฅผ ๋“ค์–ด, TemporalNet ๋ชจ๋ธ์€ Stable Diffusion v1.5 ๋ฐ ControlNet ๋ชจ๋ธ์ด๊ธฐ ๋•Œ๋ฌธ์— ControlNet ๋ฆฌํฌ์ง€ํ† ๋ฆฌ์—์„œ cldm_v15.yaml ํŒŒ์ผ์„ ๊ฐ€์ ธ์˜ฌ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
4. ์ด์ œ ์Šคํฌ๋ฆฝํŠธ๋ฅผ ์‹คํ–‰ํ•˜์—ฌ .ckpt ํŒŒ์ผ์„ ๋ณ€ํ™˜ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:
```bash
python ../diffusers/scripts/convert_original_stable_diffusion_to_diffusers.py --checkpoint_path temporalnetv3.ckpt --original_config_file cldm_v15.yaml --dump_path ./ --controlnet
```
5. ๋ณ€ํ™˜์ด ์™„๋ฃŒ๋˜๋ฉด ๋ณ€ํ™˜๋œ ๋ชจ๋ธ์„ ์—…๋กœ๋“œํ•˜๊ณ  ๊ฒฐ๊ณผ๋ฌผ์„ pull requestย [pull request](https://huggingface.co/CiaraRowles/TemporalNet/discussions/13)๋ฅผ ํ…Œ์ŠคํŠธํ•˜์„ธ์š”!
```bash
git push origin pr/13:refs/pr/13
```
## Keras .pb 또는 .h5
🧪 이 기능은 실험적인 기능입니다. 현재 KerasCV 변환 Space는 Stable Diffusion v1 체크포인트 변환만 지원합니다.
[KerasCV](https://keras.io/keras_cv/)는 [Stable Diffusion](https://github.com/keras-team/keras-cv/blob/master/keras_cv/models/stable_diffusion) v1 및 v2에 대한 학습을 지원합니다. 그러나 KerasCV는 추론 및 배포를 위한 Stable Diffusion 모델 실험을 제한적으로 지원하는 반면, 🤗 Diffusers는 다양한 [노이즈 스케줄러](https://huggingface.co/docs/diffusers/using-diffusers/schedulers), [flash attention](https://huggingface.co/docs/diffusers/optimization/xformers), [기타 최적화 기법](https://huggingface.co/docs/diffusers/optimization/fp16) 등 이러한 목적을 위한 보다 완성도 높은 기능을 갖추고 있습니다.
[Convert KerasCV](https://huggingface.co/spaces/sayakpaul/convert-kerascv-sd-diffusers) Space는 `.pb` 또는 `.h5` 파일을 PyTorch로 변환한 다음, 바로 추론할 수 있도록 [`StableDiffusionPipeline`]으로 감싸서 준비합니다. 변환된 체크포인트는 Hugging Face Hub의 리포지토리에 저장됩니다.
예제로, textual inversion으로 학습된 [sayakpaul/textual-inversion-kerasio](https://huggingface.co/sayakpaul/textual-inversion-kerasio/tree/main) 체크포인트를 변환해 보겠습니다. 이 모델은 특수 토큰 `<my-funny-cat>`을 사용하여 고양이 이미지를 개인화합니다.
KerasCV Space ๋ณ€ํ™˜์—์„œ๋Š” ๋‹ค์Œ์„ ์ž…๋ ฅํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:
- Hugging Face ํ† ํฐ.
- UNet과 텍스트 인코더(text encoder) 가중치를 다운로드할 경로. 모델을 학습한 방식에 따라 UNet과 텍스트 인코더의 경로를 모두 제공할 필요는 없습니다. 예를 들어, textual inversion에는 텍스트 인코더의 임베딩만 필요하고, text-to-image 모델 변환에는 UNet 가중치만 필요합니다.
- Placeholder ํ† ํฐ์€ textual-inversion ๋ชจ๋ธ์—๋งŒ ์ ์šฉ๋ฉ๋‹ˆ๋‹ค.
- `output_repo_prefix`๋Š” ๋ณ€ํ™˜๋œ ๋ชจ๋ธ์ด ์ €์žฅ๋˜๋Š” ๋ฆฌํฌ์ง€ํ† ๋ฆฌ์˜ ์ด๋ฆ„์ž…๋‹ˆ๋‹ค.
**Submit** (제출) 버튼을 클릭하면 KerasCV 체크포인트가 자동으로 변환됩니다! 변환에 성공하면 변환된 체크포인트가 담긴 새 리포지토리로 연결되는 링크가 표시됩니다. 이 링크를 따라가면, 변환된 모델을 바로 사용해 볼 수 있는 추론 위젯이 포함된 모델 카드도 함께 생성되어 있는 것을 확인할 수 있습니다.
์ฝ”๋“œ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ถ”๋ก ์„ ์‹คํ–‰ํ•˜๋ ค๋ฉด ๋ชจ๋ธ ์นด๋“œ์˜ ์˜ค๋ฅธ์ชฝ ์ƒ๋‹จ ๋ชจ์„œ๋ฆฌ์— ์žˆ๋Š” **Use in Diffusers**ย  ๋ฒ„ํŠผ์„ ํด๋ฆญํ•˜์—ฌ ์˜ˆ์‹œ ์ฝ”๋“œ๋ฅผ ๋ณต์‚ฌํ•˜์—ฌ ๋ถ™์—ฌ๋„ฃ์Šต๋‹ˆ๋‹ค:
```py
from diffusers import DiffusionPipeline
pipeline = DiffusionPipeline.from_pretrained("sayakpaul/textual-inversion-cat-kerascv_sd_diffusers_pipeline")
```
๊ทธ๋Ÿฌ๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์€ ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:
```py
from diffusers import DiffusionPipeline
pipeline = DiffusionPipeline.from_pretrained("sayakpaul/textual-inversion-cat-kerascv_sd_diffusers_pipeline")
pipeline.to("cuda")
placeholder_token = "<my-funny-cat-token>"
prompt = f"two {placeholder_token} getting married, photorealistic, high quality"
image = pipeline(prompt, num_inference_steps=50).images[0]
```
## A1111 LoRA 파일
[Automatic1111](https://github.com/AUTOMATIC1111/stable-diffusion-webui) (A1111)은 Stable Diffusion을 위해 널리 사용되는 웹 UI로, [Civitai](https://civitai.com/)와 같은 모델 공유 플랫폼을 지원합니다. 특히 LoRA 기법으로 학습된 모델은 학습 속도가 빠르고 완전히 파인튜닝된 모델보다 파일 크기가 훨씬 작기 때문에 인기가 높습니다.
🤗 Diffusers는 [`~loaders.LoraLoaderMixin.load_lora_weights`]를 사용한 A1111 LoRA 체크포인트 불러오기를 지원합니다:
```py
from diffusers import DiffusionPipeline, UniPCMultistepScheduler
import torch
pipeline = DiffusionPipeline.from_pretrained(
"andite/anything-v4.0", torch_dtype=torch.float16, safety_checker=None
).to("cuda")
pipeline.scheduler = UniPCMultistepScheduler.from_config(pipeline.scheduler.config)
```
Civitai์—์„œ LoRA ์ฒดํฌํฌ์ธํŠธ๋ฅผ ๋‹ค์šด๋กœ๋“œํ•˜์„ธ์š”; ์ด ์˜ˆ์ œ์—์„œ๋Š” ย [Howls Moving Castle,Interior/Scenery LoRA (Ghibli Stlye)](https://civitai.com/models/14605?modelVersionId=19998) ์ฒดํฌํฌ์ธํŠธ๋ฅผ ์‚ฌ์šฉํ–ˆ์ง€๋งŒ, ์–ด๋–ค LoRA ์ฒดํฌํฌ์ธํŠธ๋“  ์ž์œ ๋กญ๊ฒŒ ์‚ฌ์šฉํ•ด ๋ณด์„ธ์š”!
```bash
!wget https://civitai.com/api/download/models/19998 -O howls_moving_castle.safetensors
```
๋ฉ”์„œ๋“œ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํŒŒ์ดํ”„๋ผ์ธ์— LoRA ์ฒดํฌํฌ์ธํŠธ๋ฅผ ๋ถˆ๋Ÿฌ์˜ต๋‹ˆ๋‹ค:
```py
pipeline.load_lora_weights(".", weight_name="howls_moving_castle.safetensors")
```
์ด์ œ ํŒŒ์ดํ”„๋ผ์ธ์„ ์‚ฌ์šฉํ•˜์—ฌ ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:
```py
prompt = "masterpiece, illustration, ultra-detailed, cityscape, san francisco, golden gate bridge, california, bay area, in the snow, beautiful detailed starry sky"
negative_prompt = "lowres, cropped, worst quality, low quality, normal quality, artifacts, signature, watermark, username, blurry, more than one bridge, bad architecture"
images = pipeline(
prompt=prompt,
negative_prompt=negative_prompt,
width=512,
height=512,
num_inference_steps=25,
num_images_per_prompt=4,
generator=torch.manual_seed(0),
).images
```
๋งˆ์ง€๋ง‰์œผ๋กœ, ๋””์Šคํ”Œ๋ ˆ์ด์— ์ด๋ฏธ์ง€๋ฅผ ํ‘œ์‹œํ•˜๋Š” ํ—ฌํผ ํ•จ์ˆ˜๋ฅผ ๋งŒ๋“ญ๋‹ˆ๋‹ค:
```py
from PIL import Image


def image_grid(imgs, rows=2, cols=2):
    w, h = imgs[0].size
    grid = Image.new("RGB", size=(cols * w, rows * h))

    for i, img in enumerate(imgs):
        grid.paste(img, box=(i % cols * w, i // cols * h))

    return grid


image_grid(images)
```
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/a1111-lora-sf.png" />
</div>

# Overview
ํŒŒ์ดํ”„๋ผ์ธ์€ ๋…๋ฆฝ์ ์œผ๋กœ ํ›ˆ๋ จ๋œ ๋ชจ๋ธ๊ณผ ์Šค์ผ€์ค„๋Ÿฌ๋ฅผ ํ•จ๊ป˜ ๋ชจ์•„์„œ ์ถ”๋ก ์„ ์œ„ํ•ด diffusion ์‹œ์Šคํ…œ์„ ๋น ๋ฅด๊ณ  ์‰ฝ๊ฒŒ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋Š” ๋ฐฉ๋ฒ•์„ ์ œ๊ณตํ•˜๋Š” end-to-end ํด๋ž˜์Šค์ž…๋‹ˆ๋‹ค. ๋ชจ๋ธ๊ณผ ์Šค์ผ€์ค„๋Ÿฌ์˜ ํŠน์ • ์กฐํ•ฉ์€ ํŠน์ˆ˜ํ•œ ๊ธฐ๋Šฅ๊ณผ ํ•จ๊ป˜ [`StableDiffusionPipeline`] ๋˜๋Š” [`StableDiffusionControlNetPipeline`]๊ณผ ๊ฐ™์€ ํŠน์ • ํŒŒ์ดํ”„๋ผ์ธ ์œ ํ˜•์„ ์ •์˜ํ•ฉ๋‹ˆ๋‹ค. ๋ชจ๋“  ํŒŒ์ดํ”„๋ผ์ธ ์œ ํ˜•์€ ๊ธฐ๋ณธ [`DiffusionPipeline`] ํด๋ž˜์Šค์—์„œ ์ƒ์†๋ฉ๋‹ˆ๋‹ค. ์–ด๋А ์ฒดํฌํฌ์ธํŠธ๋ฅผ ์ „๋‹ฌํ•˜๋ฉด, ํŒŒ์ดํ”„๋ผ์ธ ์œ ํ˜•์„ ์ž๋™์œผ๋กœ ๊ฐ์ง€ํ•˜๊ณ  ํ•„์š”ํ•œ ๊ตฌ์„ฑ ์š”์†Œ๋“ค์„ ๋ถˆ๋Ÿฌ์˜ต๋‹ˆ๋‹ค.
์ด ์„น์…˜์—์„œ๋Š” unconditional ์ด๋ฏธ์ง€ ์ƒ์„ฑ, text-to-image ์ƒ์„ฑ์˜ ๋‹ค์–‘ํ•œ ํ…Œํฌ๋‹‰๊ณผ ๋ณ€ํ™”๋ฅผ ํŒŒ์ดํ”„๋ผ์ธ์—์„œ ์ง€์›ํ•˜๋Š” ์ž‘์—…๋“ค์„ ์†Œ๊ฐœํ•ฉ๋‹ˆ๋‹ค. ํ”„๋กฌํ”„ํŠธ์— ์žˆ๋Š” ํŠน์ • ๋‹จ์–ด๊ฐ€ ์ถœ๋ ฅ์— ์˜ํ–ฅ์„ ๋ฏธ์น˜๋Š” ๊ฒƒ์„ ์กฐ์ •ํ•˜๊ธฐ ์œ„ํ•ด ์žฌํ˜„์„ฑ์„ ์œ„ํ•œ ์‹œ๋“œ ์„ค์ •๊ณผ ํ”„๋กฌํ”„ํŠธ์— ๊ฐ€์ค‘์น˜๋ฅผ ๋ถ€์—ฌํ•˜๋Š” ๊ฒƒ์œผ๋กœ ์ƒ์„ฑ ํ”„๋กœ์„ธ์Šค๋ฅผ ๋” ์ž˜ ์ œ์–ดํ•˜๋Š” ๋ฐฉ๋ฒ•์— ๋Œ€ํ•ด ๋ฐฐ์šธ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋งˆ์ง€๋ง‰์œผ๋กœ ์Œ์„ฑ์—์„œ๋ถ€ํ„ฐ ์ด๋ฏธ์ง€ ์ƒ์„ฑ๊ณผ ๊ฐ™์€ ์ปค์Šคํ…€ ์ž‘์—…์„ ์œ„ํ•œ ์ปค๋ฎค๋‹ˆํ‹ฐ ํŒŒ์ดํ”„๋ผ์ธ์„ ๋งŒ๋“œ๋Š” ๋ฐฉ๋ฒ•์„ ์•Œ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

# Deterministic(๊ฒฐ์ •์ ) ์ƒ์„ฑ์„ ํ†ตํ•œ ์ด๋ฏธ์ง€ ํ’ˆ์งˆ ๊ฐœ์„ 
์ƒ์„ฑ๋œ ์ด๋ฏธ์ง€์˜ ํ’ˆ์งˆ์„ ๊ฐœ์„ ํ•˜๋Š” ์ผ๋ฐ˜์ ์ธ ๋ฐฉ๋ฒ•์€ *๊ฒฐ์ •์  batch(๋ฐฐ์น˜) ์ƒ์„ฑ*์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์ด ๋ฐฉ๋ฒ•์€ ์ด๋ฏธ์ง€ batch(๋ฐฐ์น˜)๋ฅผ ์ƒ์„ฑํ•˜๊ณ  ๋‘ ๋ฒˆ์งธ ์ถ”๋ก  ๋ผ์šด๋“œ์—์„œ ๋” ์ž์„ธํ•œ ํ”„๋กฌํ”„ํŠธ์™€ ํ•จ๊ป˜ ๊ฐœ์„ ํ•  ์ด๋ฏธ์ง€ ํ•˜๋‚˜๋ฅผ ์„ ํƒํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ํ•ต์‹ฌ์€ ์ผ๊ด„ ์ด๋ฏธ์ง€ ์ƒ์„ฑ์„ ์œ„ํ•ด ํŒŒ์ดํ”„๋ผ์ธ์— [`torch.Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html#generator) ๋ชฉ๋ก์„ ์ „๋‹ฌํ•˜๊ณ , ๊ฐ `Generator`๋ฅผ ์‹œ๋“œ์— ์—ฐ๊ฒฐํ•˜์—ฌ ์ด๋ฏธ์ง€์— ์žฌ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.
์˜ˆ๋ฅผ ๋“ค์–ด [`runwayml/stable-diffusion-v1-5`](runwayml/stable-diffusion-v1-5)๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋‹ค์Œ ํ”„๋กฌํ”„ํŠธ์˜ ์—ฌ๋Ÿฌ ๋ฒ„์ „์„ ์ƒ์„ฑํ•ด ๋ด…์‹œ๋‹ค.
```py
prompt = "Labrador in the style of Vermeer"
```
(๊ฐ€๋Šฅํ•˜๋‹ค๋ฉด) ํŒŒ์ดํ”„๋ผ์ธ์„ [`DiffusionPipeline.from_pretrained`]๋กœ ์ธ์Šคํ„ด์Šคํ™”ํ•˜์—ฌ GPU์— ๋ฐฐ์น˜ํ•ฉ๋‹ˆ๋‹ค.
```python
>>> import torch
>>> from diffusers import DiffusionPipeline

>>> pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
>>> pipe = pipe.to("cuda")
```
์ด์ œ ๋„ค ๊ฐœ์˜ ์„œ๋กœ ๋‹ค๋ฅธ `Generator`๋ฅผ ์ •์˜ํ•˜๊ณ  ๊ฐ `Generator`์— ์‹œ๋“œ(`0` ~ `3`)๋ฅผ ํ• ๋‹นํ•˜์—ฌ ๋‚˜์ค‘์— ํŠน์ • ์ด๋ฏธ์ง€์— ๋Œ€ํ•ด `Generator`๋ฅผ ์žฌ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•ฉ๋‹ˆ๋‹ค.
```python
>>> import torch
>>> generator = [torch.Generator(device="cuda").manual_seed(i) for i in range(4)]
```
์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•˜๊ณ  ์‚ดํŽด๋ด…๋‹ˆ๋‹ค.
```python
>>> images = pipe(prompt, generator=generator, num_images_per_prompt=4).images
>>> images
```
![img](https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/reusabe_seeds.jpg)
์ด ์˜ˆ์ œ์—์„œ๋Š” ์ฒซ ๋ฒˆ์งธ ์ด๋ฏธ์ง€๋ฅผ ๊ฐœ์„ ํ–ˆ์ง€๋งŒ ์‹ค์ œ๋กœ๋Š” ์›ํ•˜๋Š” ๋ชจ๋“  ์ด๋ฏธ์ง€๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค(์‹ฌ์ง€์–ด ๋‘ ๊ฐœ์˜ ๋ˆˆ์ด ์žˆ๋Š” ์ด๋ฏธ์ง€๋„!). ์ฒซ ๋ฒˆ์งธ ์ด๋ฏธ์ง€์—์„œ๋Š” ์‹œ๋“œ๊ฐ€ '0'์ธ '์ƒ์„ฑ๊ธฐ'๋ฅผ ์‚ฌ์šฉํ–ˆ๊ธฐ ๋•Œ๋ฌธ์— ๋‘ ๋ฒˆ์งธ ์ถ”๋ก  ๋ผ์šด๋“œ์—์„œ๋Š” ์ด '์ƒ์„ฑ๊ธฐ'๋ฅผ ์žฌ์‚ฌ์šฉํ•  ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์ด๋ฏธ์ง€์˜ ํ’ˆ์งˆ์„ ๊ฐœ์„ ํ•˜๋ ค๋ฉด ํ”„๋กฌํ”„ํŠธ์— ๋ช‡ ๊ฐ€์ง€ ํ…์ŠคํŠธ๋ฅผ ์ถ”๊ฐ€ํ•ฉ๋‹ˆ๋‹ค:
```python
prompt = [prompt + t for t in [", highly realistic", ", artsy", ", trending", ", colorful"]]
generator = [torch.Generator(device="cuda").manual_seed(0) for i in range(4)]
```
์‹œ๋“œ๊ฐ€ `0`์ธ ์ œ๋„ˆ๋ ˆ์ดํ„ฐ 4๊ฐœ๋ฅผ ์ƒ์„ฑํ•˜๊ณ , ์ด์ „ ๋ผ์šด๋“œ์˜ ์ฒซ ๋ฒˆ์งธ ์ด๋ฏธ์ง€์ฒ˜๋Ÿผ ๋ณด์ด๋Š” ๋‹ค๋ฅธ ์ด๋ฏธ์ง€ batch(๋ฐฐ์น˜)๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค!
```python
>>> images = pipe(prompt, generator=generator).images
>>> images
```
![img](https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/reusabe_seeds_2.jpg)
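위 워크플로우의 핵심인 '시드를 고정한 `Generator` 재사용'이 왜 같은 결과로 이어지는지는, torch 없이 표준 라이브러리의 난수 생성기만으로도 확인해 볼 수 있습니다. 아래는 그 원리를 보여주는 간단한 스케치입니다(`make_noise`는 설명용으로 만든 가상의 함수입니다).

```python
import random

def make_noise(seed, n=4):
    # torch.Generator(device="cuda").manual_seed(seed)로 초기 노이즈를 만드는 것에 대응하는 개념
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

# 첫 번째 라운드: 시드 0~3으로 서로 다른 네 개의 초기 노이즈 생성
first_round = [make_noise(seed) for seed in range(4)]

# 두 번째 라운드: 마음에 든 첫 번째 이미지의 시드(0)를 재사용
second_round = [make_noise(0) for _ in range(4)]

# 같은 시드는 항상 같은 노이즈를 만들고, 다른 시드는 다른 노이즈를 만듭니다.
assert second_round[0] == first_round[0]
assert first_round[0] != first_round[1]
```

diffusion 모델은 같은 초기 노이즈와 같은 프롬프트에서 같은 이미지를 생성하므로, 시드를 기록해 두면 마음에 든 결과를 나중에 그대로 재현하며 개선할 수 있습니다.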

# ์Šค์ผ€์ค„๋Ÿฌ
diffusion ํŒŒ์ดํ”„๋ผ์ธ์€ diffusion ๋ชจ๋ธ, ์Šค์ผ€์ค„๋Ÿฌ ๋“ฑ์˜ ์ปดํฌ๋„ŒํŠธ๋“ค๋กœ ๊ตฌ์„ฑ๋ฉ๋‹ˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ํŒŒ์ดํ”„๋ผ์ธ ์•ˆ์˜ ์ผ๋ถ€ ์ปดํฌ๋„ŒํŠธ๋ฅผ ๋‹ค๋ฅธ ์ปดํฌ๋„ŒํŠธ๋กœ ๊ต์ฒดํ•˜๋Š” ์‹์˜ ์ปค์Šคํ„ฐ๋งˆ์ด์ง• ์—ญ์‹œ ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค. ์ด์™€ ๊ฐ™์€ ์ปดํฌ๋„ŒํŠธ ์ปค์Šคํ„ฐ๋งˆ์ด์ง•์˜ ๊ฐ€์žฅ ๋Œ€ํ‘œ์ ์ธ ์˜ˆ์‹œ๊ฐ€ ๋ฐ”๋กœ [์Šค์ผ€์ค„๋Ÿฌ](../api/schedulers/overview.mdx)๋ฅผ ๊ต์ฒดํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.
์Šค์ผ€์ฅด๋Ÿฌ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์ด diffusion ์‹œ์Šคํ…œ์˜ ์ „๋ฐ˜์ ์ธ ๋””๋…ธ์ด์ง• ํ”„๋กœ์„ธ์Šค๋ฅผ ์ •์˜ํ•ฉ๋‹ˆ๋‹ค.
- ๋””๋…ธ์ด์ง• ์Šคํ…์„ ์–ผ๋งˆ๋‚˜ ๊ฐ€์ ธ๊ฐ€์•ผ ํ• ๊นŒ?
- ํ™•๋ฅ ์ ์œผ๋กœ(stochastic) ํ˜น์€ ํ™•์ •์ ์œผ๋กœ(deterministic)?
- ๋””๋…ธ์ด์ง• ๋œ ์ƒ˜ํ”Œ์„ ์ฐพ์•„๋‚ด๊ธฐ ์œ„ํ•ด ์–ด๋–ค ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์‚ฌ์šฉํ•ด์•ผ ํ• ๊นŒ?
์ด๋Ÿฌํ•œ ํ”„๋กœ์„ธ์Šค๋Š” ๋‹ค์†Œ ๋‚œํ•ดํ•˜๊ณ , ๋””๋…ธ์ด์ง• ์†๋„์™€ ๋””๋…ธ์ด์ง• ํ€„๋ฆฌํ‹ฐ ์‚ฌ์ด์˜ ํŠธ๋ ˆ์ด๋“œ ์˜คํ”„๋ฅผ ์ •์˜ํ•ด์•ผ ํ•˜๋Š” ๋ฌธ์ œ๊ฐ€ ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ฃผ์–ด์ง„ ํŒŒ์ดํ”„๋ผ์ธ์— ์–ด๋–ค ์Šค์ผ€์ค„๋Ÿฌ๊ฐ€ ๊ฐ€์žฅ ์ ํ•ฉํ•œ์ง€๋ฅผ ์ •๋Ÿ‰์ ์œผ๋กœ ํŒ๋‹จํ•˜๋Š” ๊ฒƒ์€ ๋งค์šฐ ์–ด๋ ค์šด ์ผ์ž…๋‹ˆ๋‹ค. ์ด๋กœ ์ธํ•ด ์ผ๋‹จ ํ•ด๋‹น ์Šค์ผ€์ค„๋Ÿฌ๋ฅผ ์ง์ ‘ ์‚ฌ์šฉํ•˜์—ฌ, ์ƒ์„ฑ๋˜๋Š” ์ด๋ฏธ์ง€๋ฅผ ์ง์ ‘ ๋ˆˆ์œผ๋กœ ๋ณด๋ฉฐ, ์ •์„ฑ์ ์œผ๋กœ ์„ฑ๋Šฅ์„ ํŒ๋‹จํ•ด๋ณด๋Š” ๊ฒƒ์ด ์ถ”์ฒœ๋˜๊ณค ํ•ฉ๋‹ˆ๋‹ค.
## ํŒŒ์ดํ”„๋ผ์ธ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ
๋จผ์ € ์Šคํ…Œ์ด๋ธ” diffusion ํŒŒ์ดํ”„๋ผ์ธ์„ ๋ถˆ๋Ÿฌ์˜ค๋„๋ก ํ•ด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. ๋ฌผ๋ก  ์Šคํ…Œ์ด๋ธ” diffusion์„ ์‚ฌ์šฉํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š”, ํ—ˆ๊น…ํŽ˜์ด์Šค ํ—ˆ๋ธŒ์— ๋“ฑ๋ก๋œ ์‚ฌ์šฉ์ž์—ฌ์•ผ ํ•˜๋ฉฐ, ๊ด€๋ จ [๋ผ์ด์„ผ์Šค](https://huggingface.co/runwayml/stable-diffusion-v1-5)์— ๋™์˜ํ•ด์•ผ ํ•œ๋‹ค๋Š” ์ ์„ ์žŠ์ง€ ๋ง์•„์ฃผ์„ธ์š”.
*์—ญ์ž ์ฃผ: ๋‹ค๋งŒ, ํ˜„์žฌ ์‹ ๊ทœ๋กœ ์ƒ์„ฑํ•œ ํ—ˆ๊น…ํŽ˜์ด์Šค ๊ณ„์ •์— ๋Œ€ํ•ด์„œ๋Š” ๋ผ์ด์„ผ์Šค ๋™์˜๋ฅผ ์š”๊ตฌํ•˜์ง€ ์•Š๋Š” ๊ฒƒ์œผ๋กœ ๋ณด์ž…๋‹ˆ๋‹ค!*
```python
from huggingface_hub import login
from diffusers import DiffusionPipeline
import torch
# first we need to login with our access token
login()
# Now we can download the pipeline
pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
```
๋‹ค์Œ์œผ๋กœ, GPU๋กœ ์ด๋™ํ•ฉ๋‹ˆ๋‹ค.
```python
pipeline.to("cuda")
```
## ์Šค์ผ€์ค„๋Ÿฌ ์•ก์„ธ์Šค
์Šค์ผ€์ค„๋Ÿฌ๋Š” ์–ธ์ œ๋‚˜ ํŒŒ์ดํ”„๋ผ์ธ์˜ ์ปดํฌ๋„ŒํŠธ๋กœ์„œ ์กด์žฌํ•˜๋ฉฐ, ์ผ๋ฐ˜์ ์œผ๋กœ ํŒŒ์ดํ”„๋ผ์ธ ์ธ์Šคํ„ด์Šค ๋‚ด์— `scheduler`๋ผ๋Š” ์ด๋ฆ„์˜ ์†์„ฑ(property)์œผ๋กœ ์ •์˜๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.
```python
pipeline.scheduler
```
**Output**:
```
PNDMScheduler {
  "_class_name": "PNDMScheduler",
  "_diffusers_version": "0.8.0.dev0",
  "beta_end": 0.012,
  "beta_schedule": "scaled_linear",
  "beta_start": 0.00085,
  "clip_sample": false,
  "num_train_timesteps": 1000,
  "set_alpha_to_one": false,
  "skip_prk_steps": true,
  "steps_offset": 1,
  "trained_betas": null
}
```
์ถœ๋ ฅ ๊ฒฐ๊ณผ๋ฅผ ํ†ตํ•ด, ์šฐ๋ฆฌ๋Š” ํ•ด๋‹น ์Šค์ผ€์ค„๋Ÿฌ๊ฐ€ [`PNDMScheduler`]์˜ ์ธ์Šคํ„ด์Šค๋ผ๋Š” ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด์ œ [`PNDMScheduler`]์™€ ๋‹ค๋ฅธ ์Šค์ผ€์ค„๋Ÿฌ๋“ค์˜ ์„ฑ๋Šฅ์„ ๋น„๊ตํ•ด๋ณด๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. ๋จผ์ € ํ…Œ์ŠคํŠธ์— ์‚ฌ์šฉํ•  ํ”„๋กฌํ”„ํŠธ๋ฅผ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์ •์˜ํ•ด๋ณด๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.
```python
prompt = "A photograph of an astronaut riding a horse on Mars, high resolution, high definition."
```
๋‹ค์Œ์œผ๋กœ ์œ ์‚ฌํ•œ ์ด๋ฏธ์ง€ ์ƒ์„ฑ์„ ๋ณด์žฅํ•˜๊ธฐ ์œ„ํ•ด์„œ, ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๋žœ๋ค์‹œ๋“œ๋ฅผ ๊ณ ์ •ํ•ด์ฃผ๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.
```python
generator = torch.Generator(device="cuda").manual_seed(8)
image = pipeline(prompt, generator=generator).images[0]
image
```
<p align="center">
<br>
<img src="https://huggingface.co/datasets/patrickvonplaten/images/resolve/main/diffusers_docs/astronaut_pndm.png" width="400"/>
<br>
</p>
## ์Šค์ผ€์ค„๋Ÿฌ ๊ต์ฒดํ•˜๊ธฐ
๋‹ค์Œ์œผ๋กœ ํŒŒ์ดํ”„๋ผ์ธ์˜ ์Šค์ผ€์ค„๋Ÿฌ๋ฅผ ๋‹ค๋ฅธ ์Šค์ผ€์ค„๋Ÿฌ๋กœ ๊ต์ฒดํ•˜๋Š” ๋ฐฉ๋ฒ•์— ๋Œ€ํ•ด ์•Œ์•„๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. ๋ชจ๋“  ์Šค์ผ€์ค„๋Ÿฌ๋Š” [`SchedulerMixin.compatibles`]๋ผ๋Š” ์†์„ฑ(property)์„ ๊ฐ–๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ํ•ด๋‹น ์†์„ฑ์€ **ํ˜ธํ™˜ ๊ฐ€๋Šฅํ•œ** ์Šค์ผ€์ค„๋Ÿฌ๋“ค์— ๋Œ€ํ•œ ์ •๋ณด๋ฅผ ๋‹ด๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.
```python
pipeline.scheduler.compatibles
```
**Output**:
```
[diffusers.schedulers.scheduling_lms_discrete.LMSDiscreteScheduler,
diffusers.schedulers.scheduling_ddim.DDIMScheduler,
diffusers.schedulers.scheduling_dpmsolver_multistep.DPMSolverMultistepScheduler,
diffusers.schedulers.scheduling_euler_discrete.EulerDiscreteScheduler,
diffusers.schedulers.scheduling_pndm.PNDMScheduler,
diffusers.schedulers.scheduling_ddpm.DDPMScheduler,
diffusers.schedulers.scheduling_euler_ancestral_discrete.EulerAncestralDiscreteScheduler]
```
ํ˜ธํ™˜๋˜๋Š” ์Šค์ผ€์ค„๋Ÿฌ๋“ค์„ ์‚ดํŽด๋ณด๋ฉด ์•„๋ž˜์™€ ๊ฐ™์Šต๋‹ˆ๋‹ค.
- [`LMSDiscreteScheduler`],
- [`DDIMScheduler`],
- [`DPMSolverMultistepScheduler`],
- [`EulerDiscreteScheduler`],
- [`PNDMScheduler`],
- [`DDPMScheduler`],
- [`EulerAncestralDiscreteScheduler`].
์•ž์„œ ์ •์˜ํ–ˆ๋˜ ํ”„๋กฌํ”„ํŠธ๋ฅผ ์‚ฌ์šฉํ•ด์„œ ๊ฐ๊ฐ์˜ ์Šค์ผ€์ค„๋Ÿฌ๋“ค์„ ๋น„๊ตํ•ด๋ณด๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.
๋จผ์ € ํŒŒ์ดํ”„๋ผ์ธ ์•ˆ์˜ ์Šค์ผ€์ค„๋Ÿฌ๋ฅผ ๋ฐ”๊พธ๊ธฐ ์œ„ํ•ด [`ConfigMixin.config`] ์†์„ฑ๊ณผ [`ConfigMixin.from_config`] ๋ฉ”์„œ๋“œ๋ฅผ ํ™œ์šฉํ•ด๋ณด๋ ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค.
```python
pipeline.scheduler.config
```
**Output**:
```
FrozenDict([('num_train_timesteps', 1000),
('beta_start', 0.00085),
('beta_end', 0.012),
('beta_schedule', 'scaled_linear'),
('trained_betas', None),
('skip_prk_steps', True),
('set_alpha_to_one', False),
('steps_offset', 1),
('_class_name', 'PNDMScheduler'),
('_diffusers_version', '0.8.0.dev0'),
('clip_sample', False)])
```
๊ธฐ์กด ์Šค์ผ€์ค„๋Ÿฌ์˜ config๋ฅผ ํ˜ธํ™˜ ๊ฐ€๋Šฅํ•œ ๋‹ค๋ฅธ ์Šค์ผ€์ค„๋Ÿฌ์— ์ด์‹ํ•˜๋Š” ๊ฒƒ ์—ญ์‹œ ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค.
๋‹ค์Œ ์˜ˆ์‹œ๋Š” ๊ธฐ์กด ์Šค์ผ€์ค„๋Ÿฌ(`pipeline.scheduler`)๋ฅผ ๋‹ค๋ฅธ ์ข…๋ฅ˜์˜ ์Šค์ผ€์ค„๋Ÿฌ(`DDIMScheduler`)๋กœ ๋ฐ”๊พธ๋Š” ์ฝ”๋“œ์ž…๋‹ˆ๋‹ค. ๊ธฐ์กด ์Šค์ผ€์ค„๋Ÿฌ๊ฐ€ ๊ฐ–๊ณ  ์žˆ๋˜ config๋ฅผ `.from_config` ๋ฉ”์„œ๋“œ์˜ ์ธ์ž๋กœ ์ „๋‹ฌํ•˜๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
```python
from diffusers import DDIMScheduler
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)
```
์ด์ œ ํŒŒ์ดํ”„๋ผ์ธ์„ ์‹คํ–‰ํ•ด์„œ ๋‘ ์Šค์ผ€์ค„๋Ÿฌ ์‚ฌ์ด์˜ ์ƒ์„ฑ๋œ ์ด๋ฏธ์ง€์˜ ํ€„๋ฆฌํ‹ฐ๋ฅผ ๋น„๊ตํ•ด๋ด…์‹œ๋‹ค.
```python
generator = torch.Generator(device="cuda").manual_seed(8)
image = pipeline(prompt, generator=generator).images[0]
image
```
<p align="center">
<br>
<img src="https://huggingface.co/datasets/patrickvonplaten/images/resolve/main/diffusers_docs/astronaut_ddim.png" width="400"/>
<br>
</p>
## ์Šค์ผ€์ค„๋Ÿฌ๋“ค ๋น„๊ตํ•ด๋ณด๊ธฐ
์ง€๊ธˆ๊นŒ์ง€๋Š” [`PNDMScheduler`]์™€ [`DDIMScheduler`] ์Šค์ผ€์ค„๋Ÿฌ๋ฅผ ์‹คํ–‰ํ•ด๋ณด์•˜์Šต๋‹ˆ๋‹ค. ์•„์ง ๋น„๊ตํ•ด๋ณผ ์Šค์ผ€์ค„๋Ÿฌ๋“ค์ด ๋” ๋งŽ์ด ๋‚จ์•„์žˆ์œผ๋‹ˆ ๊ณ„์† ๋น„๊ตํ•ด๋ณด๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.
[`LMSDiscreteScheduler`]는 일반적으로 더 좋은 결과를 보여줍니다.
```python
from diffusers import LMSDiscreteScheduler
pipeline.scheduler = LMSDiscreteScheduler.from_config(pipeline.scheduler.config)
generator = torch.Generator(device="cuda").manual_seed(8)
image = pipeline(prompt, generator=generator).images[0]
image
```
<p align="center">
<br>
<img src="https://huggingface.co/datasets/patrickvonplaten/images/resolve/main/diffusers_docs/astronaut_lms.png" width="400"/>
<br>
</p>
[`EulerDiscreteScheduler`]์™€ [`EulerAncestralDiscreteScheduler`] ๊ณ ์ž‘ 30๋ฒˆ์˜ inference step๋งŒ์œผ๋กœ๋„ ๋†’์€ ํ€„๋ฆฌํ‹ฐ์˜ ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•˜๋Š” ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
```python
from diffusers import EulerDiscreteScheduler
pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config)
generator = torch.Generator(device="cuda").manual_seed(8)
image = pipeline(prompt, generator=generator, num_inference_steps=30).images[0]
image
```
<p align="center">
<br>
<img src="https://huggingface.co/datasets/patrickvonplaten/images/resolve/main/diffusers_docs/astronaut_euler_discrete.png" width="400"/>
<br>
</p>
```python
from diffusers import EulerAncestralDiscreteScheduler
pipeline.scheduler = EulerAncestralDiscreteScheduler.from_config(pipeline.scheduler.config)
generator = torch.Generator(device="cuda").manual_seed(8)
image = pipeline(prompt, generator=generator, num_inference_steps=30).images[0]
image
```
<p align="center">
<br>
<img src="https://huggingface.co/datasets/patrickvonplaten/images/resolve/main/diffusers_docs/astronaut_euler_ancestral.png" width="400"/>
<br>
</p>
์ง€๊ธˆ ์ด ๋ฌธ์„œ๋ฅผ ์ž‘์„ฑํ•˜๋Š” ํ˜„์‹œ์  ๊ธฐ์ค€์—์„ , [`DPMSolverMultistepScheduler`]๊ฐ€ ์‹œ๊ฐ„ ๋Œ€๋น„ ๊ฐ€์žฅ ์ข‹์€ ํ’ˆ์งˆ์˜ ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•˜๋Š” ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค. 20๋ฒˆ ์ •๋„์˜ ์Šคํ…๋งŒ์œผ๋กœ๋„ ์‹คํ–‰๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
```python
from diffusers import DPMSolverMultistepScheduler
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)
generator = torch.Generator(device="cuda").manual_seed(8)
image = pipeline(prompt, generator=generator, num_inference_steps=20).images[0]
image
```
<p align="center">
<br>
<img src="https://huggingface.co/datasets/patrickvonplaten/images/resolve/main/diffusers_docs/astronaut_dpm.png" width="400"/>
<br>
</p>
๋ณด์‹œ๋‹ค์‹œํ”ผ ์ƒ์„ฑ๋œ ์ด๋ฏธ์ง€๋“ค์€ ๋งค์šฐ ๋น„์Šทํ•˜๊ณ , ๋น„์Šทํ•œ ํ€„๋ฆฌํ‹ฐ๋ฅผ ๋ณด์ด๋Š” ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค. ์‹ค์ œ๋กœ ์–ด๋–ค ์Šค์ผ€์ค„๋Ÿฌ๋ฅผ ์„ ํƒํ•  ๊ฒƒ์ธ๊ฐ€๋Š” ์ข…์ข… ํŠน์ • ์ด์šฉ ์‚ฌ๋ก€์— ๊ธฐ๋ฐ˜ํ•ด์„œ ๊ฒฐ์ •๋˜๊ณค ํ•ฉ๋‹ˆ๋‹ค. ๊ฒฐ๊ตญ ์—ฌ๋Ÿฌ ์ข…๋ฅ˜์˜ ์Šค์ผ€์ค„๋Ÿฌ๋ฅผ ์ง์ ‘ ์‹คํ–‰์‹œ์ผœ๋ณด๊ณ  ๋ˆˆ์œผ๋กœ ์ง์ ‘ ๋น„๊ตํ•ด์„œ ํŒ๋‹จํ•˜๋Š” ๊ฒŒ ์ข‹์€ ์„ ํƒ์ผ ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค.
## Flax์—์„œ ์Šค์ผ€์ค„๋Ÿฌ ๊ต์ฒดํ•˜๊ธฐ
JAX/Flax 사용자인 경우 기본 파이프라인 스케줄러를 변경할 수도 있습니다. 다음은 Flax Stable Diffusion 파이프라인과 초고속 [DPM-Solver++ 스케줄러](../api/schedulers/multistep_dpm_solver)를 사용하여 추론을 실행하는 방법에 대한 예시입니다.
```Python
import jax
import numpy as np
from flax.jax_utils import replicate
from flax.training.common_utils import shard
from diffusers import FlaxStableDiffusionPipeline, FlaxDPMSolverMultistepScheduler
model_id = "runwayml/stable-diffusion-v1-5"
scheduler, scheduler_state = FlaxDPMSolverMultistepScheduler.from_pretrained(
    model_id,
    subfolder="scheduler",
)
pipeline, params = FlaxStableDiffusionPipeline.from_pretrained(
    model_id,
    scheduler=scheduler,
    revision="bf16",
    dtype=jax.numpy.bfloat16,
)
params["scheduler"] = scheduler_state
# Generate 1 image per parallel device (8 on TPUv2-8 or TPUv3-8)
prompt = "a photo of an astronaut riding a horse on mars"
num_samples = jax.device_count()
prompt_ids = pipeline.prepare_inputs([prompt] * num_samples)
prng_seed = jax.random.PRNGKey(0)
num_inference_steps = 25
# shard inputs and rng
params = replicate(params)
prng_seed = jax.random.split(prng_seed, jax.device_count())
prompt_ids = shard(prompt_ids)
images = pipeline(prompt_ids, params, prng_seed, num_inference_steps, jit=True).images
images = pipeline.numpy_to_pil(np.asarray(images.reshape((num_samples,) + images.shape[-3:])))
```
<Tip warning={true}>
๋‹ค์Œ Flax ์Šค์ผ€์ค„๋Ÿฌ๋Š” *์•„์ง* Flax Stable Diffusion ํŒŒ์ดํ”„๋ผ์ธ๊ณผ ํ˜ธํ™˜๋˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค.
- `FlaxLMSDiscreteScheduler`
- `FlaxDDPMScheduler`
</Tip>


@@ -0,0 +1,54 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Unconditional ์ด๋ฏธ์ง€ ์ƒ์„ฑ
[[Colab์—์„œ ์—ด๊ธฐ]]
Unconditional ์ด๋ฏธ์ง€ ์ƒ์„ฑ์€ ๋น„๊ต์  ๊ฐ„๋‹จํ•œ ์ž‘์—…์ž…๋‹ˆ๋‹ค. ๋ชจ๋ธ์ด ํ…์ŠคํŠธ๋‚˜ ์ด๋ฏธ์ง€์™€ ๊ฐ™์€ ์ถ”๊ฐ€ ์กฐ๊ฑด ์—†์ด ์ด๋ฏธ ํ•™์Šต๋œ ํ•™์Šต ๋ฐ์ดํ„ฐ์™€ ์œ ์‚ฌํ•œ ์ด๋ฏธ์ง€๋งŒ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.
['DiffusionPipeline']์€ ์ถ”๋ก ์„ ์œ„ํ•ด ๋ฏธ๋ฆฌ ํ•™์Šต๋œ diffusion ์‹œ์Šคํ…œ์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฐ€์žฅ ์‰ฌ์šด ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค.
๋จผ์ € ['DiffusionPipeline']์˜ ์ธ์Šคํ„ด์Šค๋ฅผ ์ƒ์„ฑํ•˜๊ณ  ๋‹ค์šด๋กœ๋“œํ•  ํŒŒ์ดํ”„๋ผ์ธ์˜ [์ฒดํฌํฌ์ธํŠธ](https://huggingface.co/models?library=diffusers&sort=downloads)๋ฅผ ์ง€์ •ํ•ฉ๋‹ˆ๋‹ค. ํ—ˆ๋ธŒ์˜ ๐Ÿงจ diffusion ์ฒดํฌํฌ์ธํŠธ ์ค‘ ํ•˜๋‚˜๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค(์‚ฌ์šฉํ•  ์ฒดํฌํฌ์ธํŠธ๋Š” ๋‚˜๋น„ ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค).
<Tip>
๐Ÿ’ก ๋‚˜๋งŒ์˜ unconditional ์ด๋ฏธ์ง€ ์ƒ์„ฑ ๋ชจ๋ธ์„ ํ•™์Šต์‹œํ‚ค๊ณ  ์‹ถ์œผ์‹ ๊ฐ€์š”? ํ•™์Šต ๊ฐ€์ด๋“œ๋ฅผ ์‚ดํŽด๋ณด๊ณ  ๋‚˜๋งŒ์˜ ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์•Œ์•„๋ณด์„ธ์š”.
</Tip>
์ด ๊ฐ€์ด๋“œ์—์„œ๋Š” unconditional ์ด๋ฏธ์ง€ ์ƒ์„ฑ์— ['DiffusionPipeline']๊ณผ [DDPM](https://arxiv.org/abs/2006.11239)์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค:
```python
>>> from diffusers import DiffusionPipeline
>>> generator = DiffusionPipeline.from_pretrained("anton-l/ddpm-butterflies-128")
```
[diffusion ํŒŒ์ดํ”„๋ผ์ธ]์€ ๋ชจ๋“  ๋ชจ๋ธ๋ง, ํ† ํฐํ™”, ์Šค์ผ€์ค„๋ง ๊ตฌ์„ฑ ์š”์†Œ๋ฅผ ๋‹ค์šด๋กœ๋“œํ•˜๊ณ  ์บ์‹œํ•ฉ๋‹ˆ๋‹ค. ์ด ๋ชจ๋ธ์€ ์•ฝ 14์–ต ๊ฐœ์˜ ํŒŒ๋ผ๋ฏธํ„ฐ๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ๊ธฐ ๋•Œ๋ฌธ์— GPU์—์„œ ์‹คํ–‰ํ•  ๊ฒƒ์„ ๊ฐ•๋ ฅํžˆ ๊ถŒ์žฅํ•ฉ๋‹ˆ๋‹ค. PyTorch์—์„œ์™€ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ ์ œ๋„ˆ๋ ˆ์ดํ„ฐ ๊ฐ์ฒด๋ฅผ GPU๋กœ ์˜ฎ๊ธธ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:
```python
>>> generator.to("cuda")
```
์ด์ œ ์ œ๋„ˆ๋ ˆ์ดํ„ฐ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:
```python
>>> image = generator().images[0]
```
์ถœ๋ ฅ์€ ๊ธฐ๋ณธ์ ์œผ๋กœ [PIL.Image](https://pillow.readthedocs.io/en/stable/reference/Image.html?highlight=image#the-image-class) ๊ฐ์ฒด๋กœ ๊ฐ์‹ธ์ง‘๋‹ˆ๋‹ค.
๋‹ค์Œ์„ ํ˜ธ์ถœํ•˜์—ฌ ์ด๋ฏธ์ง€๋ฅผ ์ €์žฅํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:
```python
>>> image.save("generated_image.png")
```
์•„๋ž˜ ์ŠคํŽ˜์ด์Šค(๋ฐ๋ชจ ๋งํฌ)๋ฅผ ์ด์šฉํ•ด ๋ณด๊ณ , ์ถ”๋ก  ๋‹จ๊ณ„์˜ ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ์ž์œ ๋กญ๊ฒŒ ์กฐ์ ˆํ•˜์—ฌ ์ด๋ฏธ์ง€ ํ’ˆ์งˆ์— ์–ด๋–ค ์˜ํ–ฅ์„ ๋ฏธ์น˜๋Š”์ง€ ํ™•์ธํ•ด ๋ณด์„ธ์š”!
<iframe src="https://stevhliu-ddpm-butterflies-128.hf.space" frameborder="0" width="850" height="500"></iframe>


@@ -0,0 +1,14 @@
# ์„ธ์ดํ”„์„ผ์„œ๋ž€ ๋ฌด์—‡์ธ๊ฐ€์š”?
[์„ธ์ดํ”„ํ…์„œ](https://github.com/huggingface/safetensors)๋Š” ํ”ผํด์„ ์‚ฌ์šฉํ•˜๋Š” ํŒŒ์ดํ† ์น˜๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ธฐ์กด์˜ '.bin'๊ณผ๋Š” ๋‹ค๋ฅธ ํ˜•์‹์ž…๋‹ˆ๋‹ค.
ํ”ผํด์€ ์•…์˜์ ์ธ ํŒŒ์ผ์ด ์ž„์˜์˜ ์ฝ”๋“œ๋ฅผ ์‹คํ–‰ํ•  ์ˆ˜ ์žˆ๋Š” ์•ˆ์ „ํ•˜์ง€ ์•Š์€ ๊ฒƒ์œผ๋กœ ์•…๋ช…์ด ๋†’์Šต๋‹ˆ๋‹ค.
ํ—ˆ๋ธŒ ์ž์ฒด์—์„œ ๋ฌธ์ œ๋ฅผ ๋ฐฉ์ง€ํ•˜๊ธฐ ์œ„ํ•ด ๋…ธ๋ ฅํ•˜๊ณ  ์žˆ์ง€๋งŒ ๋งŒ๋ณ‘ํ†ต์น˜์•ฝ์€ ์•„๋‹™๋‹ˆ๋‹ค.
์„ธ์ดํ”„ํ…์„œ์˜ ๊ฐ€์žฅ ์ค‘์š”ํ•œ ๋ชฉํ‘œ๋Š” ์ปดํ“จํ„ฐ๋ฅผ ํƒˆ์ทจํ•  ์ˆ˜ ์—†๋‹ค๋Š” ์˜๋ฏธ์—์„œ ๋จธ์‹  ๋Ÿฌ๋‹ ๋ชจ๋ธ ๋กœ๋”ฉ์„ *์•ˆ์ „ํ•˜๊ฒŒ* ๋งŒ๋“œ๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.
# ์™œ ์„ธ์ดํ”„์„ผ์„œ๋ฅผ ์‚ฌ์šฉํ•˜๋‚˜์š”?
**์ž˜ ์•Œ๋ ค์ง€์ง€ ์•Š์€ ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•˜๋ ค๋Š” ๊ฒฝ์šฐ, ๊ทธ๋ฆฌ๊ณ  ํŒŒ์ผ์˜ ์ถœ์ฒ˜๊ฐ€ ํ™•์‹คํ•˜์ง€ ์•Š์€ ๊ฒฝ์šฐ "์•ˆ์ „์„ฑ"์ด ํ•˜๋‚˜์˜ ์ด์œ ๊ฐ€ ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
๊ทธ๋ฆฌ๊ณ  ๋‘ ๋ฒˆ์งธ ์ด์œ ๋Š” **๋กœ๋”ฉ ์†๋„**์ž…๋‹ˆ๋‹ค. ์„ธ์ดํ”„์„ผ์„œ๋Š” ์ผ๋ฐ˜ ํ”ผํด ํŒŒ์ผ๋ณด๋‹ค ํ›จ์”ฌ ๋น ๋ฅด๊ฒŒ ๋ชจ๋ธ์„ ํ›จ์”ฌ ๋น ๋ฅด๊ฒŒ ๋กœ๋“œํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋ชจ๋ธ์„ ์ „ํ™˜ํ•˜๋Š” ๋ฐ ๋งŽ์€ ์‹œ๊ฐ„์„ ์†Œ๋น„ํ•˜๋Š” ๊ฒฝ์šฐ, ์ด๋Š” ์—„์ฒญ๋‚œ ์‹œ๊ฐ„ ์ ˆ์•ฝ์ด ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค.


@@ -0,0 +1,290 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# ํŒŒ์ดํ”„๋ผ์ธ, ๋ชจ๋ธ ๋ฐ ์Šค์ผ€์ค„๋Ÿฌ ์ดํ•ดํ•˜๊ธฐ
[[colab์—์„œ ์—ด๊ธฐ]]
๐Ÿงจ Diffusers๋Š” ์‚ฌ์šฉ์ž ์นœํ™”์ ์ด๋ฉฐ ์œ ์—ฐํ•œ ๋„๊ตฌ ์ƒ์ž๋กœ, ์‚ฌ์šฉ ์‚ฌ๋ก€์— ๋งž๊ฒŒ diffusion ์‹œ์Šคํ…œ์„ ๊ตฌ์ถ•ํ•  ์ˆ˜ ์žˆ๋„๋ก ์„ค๊ณ„๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ์ด ๋„๊ตฌ ์ƒ์ž์˜ ํ•ต์‹ฌ์€ ๋ชจ๋ธ๊ณผ ์Šค์ผ€์ค„๋Ÿฌ์ž…๋‹ˆ๋‹ค. [`DiffusionPipeline`]์€ ํŽธ์˜๋ฅผ ์œ„ํ•ด ์ด๋Ÿฌํ•œ ๊ตฌ์„ฑ ์š”์†Œ๋ฅผ ๋ฒˆ๋“ค๋กœ ์ œ๊ณตํ•˜์ง€๋งŒ, ํŒŒ์ดํ”„๋ผ์ธ์„ ๋ถ„๋ฆฌํ•˜๊ณ  ๋ชจ๋ธ๊ณผ ์Šค์ผ€์ค„๋Ÿฌ๋ฅผ ๊ฐœ๋ณ„์ ์œผ๋กœ ์‚ฌ์šฉํ•ด ์ƒˆ๋กœ์šด diffusion ์‹œ์Šคํ…œ์„ ๋งŒ๋“ค ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค.
์ด ํŠœํ† ๋ฆฌ์–ผ์—์„œ๋Š” ๊ธฐ๋ณธ ํŒŒ์ดํ”„๋ผ์ธ๋ถ€ํ„ฐ ์‹œ์ž‘ํ•ด Stable Diffusion ํŒŒ์ดํ”„๋ผ์ธ๊นŒ์ง€ ์ง„ํ–‰ํ•˜๋ฉฐ ๋ชจ๋ธ๊ณผ ์Šค์ผ€์ค„๋Ÿฌ๋ฅผ ์‚ฌ์šฉํ•ด ์ถ”๋ก ์„ ์œ„ํ•œ diffusion ์‹œ์Šคํ…œ์„ ์กฐ๋ฆฝํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ๋ฐฐ์›๋‹ˆ๋‹ค.
## ๊ธฐ๋ณธ ํŒŒ์ดํ”„๋ผ์ธ ํ•ด์ฒดํ•˜๊ธฐ
ํŒŒ์ดํ”„๋ผ์ธ์€ ์ถ”๋ก ์„ ์œ„ํ•ด ๋ชจ๋ธ์„ ์‹คํ–‰ํ•˜๋Š” ๋น ๋ฅด๊ณ  ์‰ฌ์šด ๋ฐฉ๋ฒ•์œผ๋กœ, ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•˜๋Š” ๋ฐ ์ฝ”๋“œ๊ฐ€ 4์ค„ ์ด์ƒ ํ•„์š”ํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค:
```py
>>> from diffusers import DDPMPipeline
>>> ddpm = DDPMPipeline.from_pretrained("google/ddpm-cat-256").to("cuda")
>>> image = ddpm(num_inference_steps=25).images[0]
>>> image
```
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ddpm-cat.png" alt="Image of cat created from DDPMPipeline"/>
</div>
์ •๋ง ์‰ฝ์Šต๋‹ˆ๋‹ค. ๊ทธ๋Ÿฐ๋ฐ ํŒŒ์ดํ”„๋ผ์ธ์€ ์–ด๋–ป๊ฒŒ ์ด๋ ‡๊ฒŒ ํ•  ์ˆ˜ ์žˆ์—ˆ์„๊นŒ์š”? ํŒŒ์ดํ”„๋ผ์ธ์„ ์„ธ๋ถ„ํ™”ํ•˜์—ฌ ๋‚ด๋ถ€์—์„œ ์–ด๋–ค ์ผ์ด ์ผ์–ด๋‚˜๊ณ  ์žˆ๋Š”์ง€ ์‚ดํŽด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.
์œ„ ์˜ˆ์‹œ์—์„œ ํŒŒ์ดํ”„๋ผ์ธ์—๋Š” [`UNet2DModel`] ๋ชจ๋ธ๊ณผ [`DDPMScheduler`]๊ฐ€ ํฌํ•จ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค. ํŒŒ์ดํ”„๋ผ์ธ์€ ์›ํ•˜๋Š” ์ถœ๋ ฅ ํฌ๊ธฐ์˜ ๋žœ๋ค ๋…ธ์ด์ฆˆ๋ฅผ ๋ฐ›์•„ ๋ชจ๋ธ์„ ์—ฌ๋Ÿฌ๋ฒˆ ํ†ต๊ณผ์‹œ์ผœ ์ด๋ฏธ์ง€์˜ ๋…ธ์ด์ฆˆ๋ฅผ ์ œ๊ฑฐํ•ฉ๋‹ˆ๋‹ค. ๊ฐ timestep์—์„œ ๋ชจ๋ธ์€ *noise residual*์„ ์˜ˆ์ธกํ•˜๊ณ  ์Šค์ผ€์ค„๋Ÿฌ๋Š” ์ด๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋…ธ์ด์ฆˆ๊ฐ€ ์ ์€ ์ด๋ฏธ์ง€๋ฅผ ์˜ˆ์ธกํ•ฉ๋‹ˆ๋‹ค. ํŒŒ์ดํ”„๋ผ์ธ์€ ์ง€์ •๋œ ์ถ”๋ก  ์Šคํ…์ˆ˜์— ๋„๋‹ฌํ•  ๋•Œ๊นŒ์ง€ ์ด ๊ณผ์ •์„ ๋ฐ˜๋ณตํ•ฉ๋‹ˆ๋‹ค.
๋ชจ๋ธ๊ณผ ์Šค์ผ€์ค„๋Ÿฌ๋ฅผ ๋ณ„๋„๋กœ ์‚ฌ์šฉํ•˜์—ฌ ํŒŒ์ดํ”„๋ผ์ธ์„ ๋‹ค์‹œ ์ƒ์„ฑํ•˜๊ธฐ ์œ„ํ•ด ์ž์ฒด์ ์ธ ๋…ธ์ด์ฆˆ ์ œ๊ฑฐ ํ”„๋กœ์„ธ์Šค๋ฅผ ์ž‘์„ฑํ•ด ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.
1. ๋ชจ๋ธ๊ณผ ์Šค์ผ€์ค„๋Ÿฌ๋ฅผ ๋ถˆ๋Ÿฌ์˜ต๋‹ˆ๋‹ค:
```py
>>> from diffusers import DDPMScheduler, UNet2DModel
>>> scheduler = DDPMScheduler.from_pretrained("google/ddpm-cat-256")
>>> model = UNet2DModel.from_pretrained("google/ddpm-cat-256").to("cuda")
```
2. ๋…ธ์ด์ฆˆ ์ œ๊ฑฐ ํ”„๋กœ์„ธ์Šค๋ฅผ ์‹คํ–‰ํ•  timestep ์ˆ˜๋ฅผ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค:
```py
>>> scheduler.set_timesteps(50)
```
3. ์Šค์ผ€์ค„๋Ÿฌ์˜ timestep์„ ์„ค์ •ํ•˜๋ฉด ๊ท ๋“ฑํ•œ ๊ฐ„๊ฒฉ์˜ ๊ตฌ์„ฑ ์š”์†Œ๋ฅผ ๊ฐ€์ง„ ํ…์„œ๊ฐ€ ์ƒ์„ฑ๋ฉ๋‹ˆ๋‹ค.(์ด ์˜ˆ์‹œ์—์„œ๋Š” 50๊ฐœ) ๊ฐ ์š”์†Œ๋Š” ๋ชจ๋ธ์ด ์ด๋ฏธ์ง€์˜ ๋…ธ์ด์ฆˆ๋ฅผ ์ œ๊ฑฐํ•˜๋Š” ์‹œ๊ฐ„ ๊ฐ„๊ฒฉ์— ํ•ด๋‹นํ•ฉ๋‹ˆ๋‹ค. ๋‚˜์ค‘์— ๋…ธ์ด์ฆˆ ์ œ๊ฑฐ ๋ฃจํ”„๋ฅผ ๋งŒ๋“ค ๋•Œ ์ด ํ…์„œ๋ฅผ ๋ฐ˜๋ณตํ•˜์—ฌ ์ด๋ฏธ์ง€์˜ ๋…ธ์ด์ฆˆ๋ฅผ ์ œ๊ฑฐํ•ฉ๋‹ˆ๋‹ค:
```py
>>> scheduler.timesteps
tensor([980, 960, 940, 920, 900, 880, 860, 840, 820, 800, 780, 760, 740, 720,
700, 680, 660, 640, 620, 600, 580, 560, 540, 520, 500, 480, 460, 440,
420, 400, 380, 360, 340, 320, 300, 280, 260, 240, 220, 200, 180, 160,
140, 120, 100, 80, 60, 40, 20, 0])
```
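์œ„ `set_timesteps(50)`์ด ๋งŒ๋“œ๋Š” ๊ท ๋“ฑ ๊ฐ„๊ฒฉ timestep์€ ๋Œ€๋žต ๋‹ค์Œ ๊ณ„์‚ฐ์œผ๋กœ ์ดํ•ดํ•  ์ˆ�˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์‹ค์ œ ๊ตฌํ˜„๊ณผ ์„ธ๋ถ€๊ฐ€ ๋‹ค๋ฅผ ์ˆ˜ ์žˆ๋Š”, ๊ฐœ๋… ํ™•์ธ์šฉ ์ˆœ์ˆ˜ ํŒŒ์ด์ฌ ์Šค์ผ€์น˜์ž…๋‹ˆ๋‹ค:

```python
# ๊ฐ€์ •: ํ•™์Šต timestep 1000๊ฐœ๋ฅผ 50๊ฐœ์˜ ์ถ”๋ก  ์Šคํ…์œผ๋กœ ๊ท ๋“ฑ ๋ถ„ํ• 
num_train_timesteps = 1000
num_inference_steps = 50
step = num_train_timesteps // num_inference_steps  # 20

# 980๋ถ€ํ„ฐ 0๊นŒ์ง€ 20์”ฉ ๊ฐ์†Œํ•˜๋Š” 50๊ฐœ์˜ timestep
timesteps = list(range(num_train_timesteps - step, -1, -step))
print(timesteps[:3], timesteps[-1])  # [980, 960, 940] 0
```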
4. ์›ํ•˜๋Š” ์ถœ๋ ฅ๊ณผ ๊ฐ™์€ ๋ชจ์–‘์„ ๊ฐ€์ง„ ๋žœ๋ค ๋…ธ์ด์ฆˆ๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค:
```py
>>> import torch
>>> sample_size = model.config.sample_size
>>> noise = torch.randn((1, 3, sample_size, sample_size)).to("cuda")
```
5. ์ด์ œ timestep์„ ๋ฐ˜๋ณตํ•˜๋Š” ๋ฃจํ”„๋ฅผ ์ž‘์„ฑํ•ฉ๋‹ˆ๋‹ค. ๊ฐ timestep์—์„œ ๋ชจ๋ธ์€ [`UNet2DModel.forward`]๋ฅผ ํ†ตํ•ด noisy residual์„ ๋ฐ˜ํ™˜ํ•ฉ๋‹ˆ๋‹ค. ์Šค์ผ€์ค„๋Ÿฌ์˜ [`~DDPMScheduler.step`] ๋ฉ”์„œ๋“œ๋Š” noisy residual, timestep, ๊ทธ๋ฆฌ๊ณ  ์ž…๋ ฅ์„ ๋ฐ›์•„ ์ด์ „ timestep์—์„œ ์ด๋ฏธ์ง€๋ฅผ ์˜ˆ์ธกํ•ฉ๋‹ˆ๋‹ค. ์ด ์ถœ๋ ฅ์€ ๋…ธ์ด์ฆˆ ์ œ๊ฑฐ ๋ฃจํ”„์˜ ๋ชจ๋ธ์— ๋Œ€ํ•œ ๋‹ค์Œ ์ž…๋ ฅ์ด ๋˜๋ฉฐ, `timesteps` ๋ฐฐ์—ด์˜ ๋์— ๋„๋‹ฌํ•  ๋•Œ๊นŒ์ง€ ๋ฐ˜๋ณต๋ฉ๋‹ˆ๋‹ค.
```py
>>> input = noise
>>> for t in scheduler.timesteps:
... with torch.no_grad():
... noisy_residual = model(input, t).sample
... previous_noisy_sample = scheduler.step(noisy_residual, t, input).prev_sample
... input = previous_noisy_sample
```
์ด๊ฒƒ์ด ์ „์ฒด ๋…ธ์ด์ฆˆ ์ œ๊ฑฐ ํ”„๋กœ์„ธ์Šค์ด๋ฉฐ, ๋™์ผํ•œ ํŒจํ„ด์„ ์‚ฌ์šฉํ•ด ๋ชจ๋“  diffusion ์‹œ์Šคํ…œ์„ ์ž‘์„ฑํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
6. ๋งˆ์ง€๋ง‰ ๋‹จ๊ณ„๋Š” ๋…ธ์ด์ฆˆ๊ฐ€ ์ œ๊ฑฐ๋œ ์ถœ๋ ฅ์„ ์ด๋ฏธ์ง€๋กœ ๋ณ€ํ™˜ํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค:
```py
>>> from PIL import Image
>>> import numpy as np
>>> image = (input / 2 + 0.5).clamp(0, 1)
>>> image = image.cpu().permute(0, 2, 3, 1).numpy()[0]
>>> image = Image.fromarray((image * 255).round().astype("uint8"))
>>> image
```
๋‹ค์Œ ์„น์…˜์—์„œ๋Š” ์—ฌ๋Ÿฌ๋ถ„์˜ ๊ธฐ์ˆ ์„ ์‹œํ—˜ํ•ด๋ณด๊ณ  ์ข€ ๋” ๋ณต์žกํ•œ Stable Diffusion ํŒŒ์ดํ”„๋ผ์ธ์„ ๋ถ„์„ํ•ด ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. ๋ฐฉ๋ฒ•์€ ๊ฑฐ์˜ ๋™์ผํ•ฉ๋‹ˆ๋‹ค. ํ•„์š”ํ•œ ๊ตฌ์„ฑ์š”์†Œ๋“ค์„ ์ดˆ๊ธฐํ™”ํ•˜๊ณ  timestep์ˆ˜๋ฅผ ์„ค์ •ํ•˜์—ฌ `timestep` ๋ฐฐ์—ด์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. ๋…ธ์ด์ฆˆ ์ œ๊ฑฐ ๋ฃจํ”„์—์„œ `timestep` ๋ฐฐ์—ด์ด ์‚ฌ์šฉ๋˜๋ฉฐ, ์ด ๋ฐฐ์—ด์˜ ๊ฐ ์š”์†Œ์— ๋Œ€ํ•ด ๋ชจ๋ธ์€ ๋…ธ์ด์ฆˆ๊ฐ€ ์ ์€ ์ด๋ฏธ์ง€๋ฅผ ์˜ˆ์ธกํ•ฉ๋‹ˆ๋‹ค. ๋…ธ์ด์ฆˆ ์ œ๊ฑฐ ๋ฃจํ”„๋Š” `timestep`์„ ๋ฐ˜๋ณตํ•˜๊ณ  ๊ฐ timestep์—์„œ noise residual์„ ์ถœ๋ ฅํ•˜๊ณ  ์Šค์ผ€์ค„๋Ÿฌ๋Š” ์ด๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ด์ „ timestep์—์„œ ๋…ธ์ด์ฆˆ๊ฐ€ ๋œํ•œ ์ด๋ฏธ์ง€๋ฅผ ์˜ˆ์ธกํ•ฉ๋‹ˆ๋‹ค. ์ด ํ”„๋กœ์„ธ์Šค๋Š” `timestep` ๋ฐฐ์—ด์˜ ๋์— ๋„๋‹ฌํ•  ๋•Œ๊นŒ์ง€ ๋ฐ˜๋ณต๋ฉ๋‹ˆ๋‹ค.
ํ•œ๋ฒˆ ์‚ฌ์šฉํ•ด ๋ด…์‹œ๋‹ค!
## Stable Diffusion ํŒŒ์ดํ”„๋ผ์ธ ํ•ด์ฒดํ•˜๊ธฐ
Stable Diffusion ์€ text-to-image *latent diffusion* ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค. latent diffusion ๋ชจ๋ธ์ด๋ผ๊ณ  ๋ถˆ๋ฆฌ๋Š” ์ด์œ ๋Š” ์‹ค์ œ ํ”ฝ์…€ ๊ณต๊ฐ„ ๋Œ€์‹  ์ด๋ฏธ์ง€์˜ ์ €์ฐจ์›์˜ ํ‘œํ˜„์œผ๋กœ ์ž‘์—…ํ•˜๊ธฐ ๋•Œ๋ฌธ์ด๊ณ , ๋ฉ”๋ชจ๋ฆฌ ํšจ์œจ์ด ๋” ๋†’์Šต๋‹ˆ๋‹ค. ์ธ์ฝ”๋”๋Š” ์ด๋ฏธ์ง€๋ฅผ ๋” ์ž‘์€ ํ‘œํ˜„์œผ๋กœ ์••์ถ•ํ•˜๊ณ , ๋””์ฝ”๋”๋Š” ์••์ถ•๋œ ํ‘œํ˜„์„ ๋‹ค์‹œ ์ด๋ฏธ์ง€๋กœ ๋ณ€ํ™˜ํ•ฉ๋‹ˆ๋‹ค. text-to-image ๋ชจ๋ธ์˜ ๊ฒฝ์šฐ ํ…์ŠคํŠธ ์ž„๋ฒ ๋”ฉ์„ ์ƒ์„ฑํ•˜๊ธฐ ์œ„ํ•ด tokenizer์™€ ์ธ์ฝ”๋”๊ฐ€ ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. ์ด์ „ ์˜ˆ์ œ์—์„œ ์ด๋ฏธ UNet ๋ชจ๋ธ๊ณผ ์Šค์ผ€์ค„๋Ÿฌ๊ฐ€ ํ•„์š”ํ•˜๋‹ค๋Š” ๊ฒƒ์€ ์•Œ๊ณ  ๊ณ„์…จ์„ ๊ฒƒ์ž…๋‹ˆ๋‹ค.
๋ณด์‹œ๋‹ค์‹œํ”ผ, ์ด๊ฒƒ์€ UNet ๋ชจ๋ธ๋งŒ ํฌํ•จ๋œ DDPM ํŒŒ์ดํ”„๋ผ์ธ๋ณด๋‹ค ๋” ๋ณต์žกํ•ฉ๋‹ˆ๋‹ค. Stable Diffusion ๋ชจ๋ธ์—๋Š” ์„ธ ๊ฐœ์˜ ๊ฐœ๋ณ„ ์‚ฌ์ „ํ•™์Šต๋œ ๋ชจ๋ธ์ด ์žˆ์Šต๋‹ˆ๋‹ค.
<Tip>
๐Ÿ’ก VAE, UNet ๋ฐ ํ…์ŠคํŠธ ์ธ์ฝ”๋” ๋ชจ๋ธ์˜ ์ž‘๋™๋ฐฉ์‹์— ๋Œ€ํ•œ ์ž์„ธํ•œ ๋‚ด์šฉ์€ [How does Stable Diffusion work?](https://huggingface.co/blog/stable_diffusion#how-does-stable-diffusion-work) ๋ธ”๋กœ๊ทธ๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.
</Tip>
์ด์ œ Stable Diffusion ํŒŒ์ดํ”„๋ผ์ธ์— ํ•„์š”ํ•œ ๊ตฌ์„ฑ์š”์†Œ๋“ค์ด ๋ฌด์—‡์ธ์ง€ ์•Œ์•˜์œผ๋‹ˆ, [`~ModelMixin.from_pretrained`] ๋ฉ”์„œ๋“œ๋ฅผ ์‚ฌ์šฉํ•ด ๋ชจ๋“  ๊ตฌ์„ฑ์š”์†Œ๋ฅผ ๋ถˆ๋Ÿฌ์˜ต๋‹ˆ๋‹ค. ์‚ฌ์ „ํ•™์Šต๋œ ์ฒดํฌํฌ์ธํŠธ [`CompVis/stable-diffusion-v1-4`](https://huggingface.co/CompVis/stable-diffusion-v1-4)์—์„œ ์ฐพ์„ ์ˆ˜ ์žˆ์œผ๋ฉฐ, ๊ฐ ๊ตฌ์„ฑ์š”์†Œ๋Š” ๋ณ„๋„์˜ ํ•˜์œ„ ํด๋”์— ์ €์žฅ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค:
```py
>>> from PIL import Image
>>> import torch
>>> from transformers import CLIPTextModel, CLIPTokenizer
>>> from diffusers import AutoencoderKL, UNet2DConditionModel, PNDMScheduler
>>> vae = AutoencoderKL.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="vae")
>>> tokenizer = CLIPTokenizer.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="tokenizer")
>>> text_encoder = CLIPTextModel.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="text_encoder")
>>> unet = UNet2DConditionModel.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="unet")
```
๋‹ค๋ฅธ ์Šค์ผ€์ค„๋Ÿฌ๋ฅผ ์–ผ๋งˆ๋‚˜ ์‰ฝ๊ฒŒ ๊ต์ฒดํ•  ์ˆ˜ ์žˆ๋Š”์ง€ ํ™•์ธํ•˜๊ธฐ ์œ„ํ•ด, ๊ธฐ๋ณธ [`PNDMScheduler`]๋ฅผ [`UniPCMultistepScheduler`]๋กœ ๋ฐ”๊ฟ” ๋ด…์‹œ๋‹ค:
```py
>>> from diffusers import UniPCMultistepScheduler
>>> scheduler = UniPCMultistepScheduler.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="scheduler")
```
์ถ”๋ก  ์†๋„๋ฅผ ๋†’์ด๋ ค๋ฉด ์Šค์ผ€์ค„๋Ÿฌ์™€ ๋‹ฌ๋ฆฌ ํ•™์Šต ๊ฐ€๋Šฅํ•œ ๊ฐ€์ค‘์น˜๊ฐ€ ์žˆ์œผ๋ฏ€๋กœ ๋ชจ๋ธ์„ GPU๋กœ ์˜ฎ๊ธฐ์„ธ์š”:
```py
>>> torch_device = "cuda"
>>> vae.to(torch_device)
>>> text_encoder.to(torch_device)
>>> unet.to(torch_device)
```
### ํ…์ŠคํŠธ ์ž„๋ฒ ๋”ฉ ์ƒ์„ฑํ•˜๊ธฐ
๋‹ค์Œ ๋‹จ๊ณ„๋Š” ์ž„๋ฒ ๋”ฉ์„ ์ƒ์„ฑํ•˜๊ธฐ ์œ„ํ•ด ํ…์ŠคํŠธ๋ฅผ ํ† ํฐํ™”ํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์ด ํ…์ŠคํŠธ๋Š” UNet ๋ชจ๋ธ์—์„œ condition์œผ๋กœ ์‚ฌ์šฉ๋˜๊ณ  ์ž…๋ ฅ ํ”„๋กฌํ”„ํŠธ์™€ ์œ ์‚ฌํ•œ ๋ฐฉํ–ฅ์œผ๋กœ diffusion ํ”„๋กœ์„ธ์Šค๋ฅผ ์กฐ์ •ํ•˜๋Š” ๋ฐ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค.
<Tip>
๐Ÿ’ก `guidance_scale` ๋งค๊ฐœ๋ณ€์ˆ˜๋Š” ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•  ๋•Œ ํ”„๋กฌํ”„ํŠธ์— ์–ผ๋งˆ๋‚˜ ๋งŽ์€ ๊ฐ€์ค‘์น˜๋ฅผ ๋ถ€์—ฌํ• ์ง€ ๊ฒฐ์ •ํ•ฉ๋‹ˆ๋‹ค.
</Tip>
๋‹ค๋ฅธ ํ”„๋กฌํ”„ํŠธ๋ฅผ ์ƒ์„ฑํ•˜๊ณ  ์‹ถ๋‹ค๋ฉด ์›ํ•˜๋Š” ํ”„๋กฌํ”„ํŠธ๋ฅผ ์ž์œ ๋กญ๊ฒŒ ์„ ํƒํ•˜์„ธ์š”!
```py
>>> prompt = ["a photograph of an astronaut riding a horse"]
>>> height = 512 # Stable Diffusion์˜ ๊ธฐ๋ณธ ๋†’์ด
>>> width = 512 # Stable Diffusion์˜ ๊ธฐ๋ณธ ๋„ˆ๋น„
>>> num_inference_steps = 25 # ๋…ธ์ด์ฆˆ ์ œ๊ฑฐ ์Šคํ… ์ˆ˜
>>> guidance_scale = 7.5 # classifier-free guidance๋ฅผ ์œ„ํ•œ scale
>>> generator = torch.manual_seed(0) # ์ดˆ๊ธฐ ์ž ์žฌ ๋…ธ์ด์ฆˆ๋ฅผ ์ƒ์„ฑํ•˜๋Š” seed generator
>>> batch_size = len(prompt)
```
ํ…์ŠคํŠธ๋ฅผ ํ† ํฐํ™”ํ•˜๊ณ  ํ”„๋กฌํ”„ํŠธ์—์„œ ์ž„๋ฒ ๋”ฉ์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค:
```py
>>> text_input = tokenizer(
... prompt, padding="max_length", max_length=tokenizer.model_max_length, truncation=True, return_tensors="pt"
... )
>>> with torch.no_grad():
... text_embeddings = text_encoder(text_input.input_ids.to(torch_device))[0]
```
๋˜ํ•œ ํŒจ๋”ฉ ํ† ํฐ์˜ ์ž„๋ฒ ๋”ฉ์ธ *unconditional ํ…์ŠคํŠธ ์ž„๋ฒ ๋”ฉ*์„ ์ƒ์„ฑํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ์ด ์ž„๋ฒ ๋”ฉ์€ ์กฐ๊ฑด๋ถ€ `text_embeddings`๊ณผ ๋™์ผํ•œ shape(`batch_size` ๊ทธ๋ฆฌ๊ณ  `seq_length`)์„ ๊ฐ€์ ธ์•ผ ํ•ฉ๋‹ˆ๋‹ค:
```py
>>> max_length = text_input.input_ids.shape[-1]
>>> uncond_input = tokenizer([""] * batch_size, padding="max_length", max_length=max_length, return_tensors="pt")
>>> uncond_embeddings = text_encoder(uncond_input.input_ids.to(torch_device))[0]
```
๋‘๋ฒˆ์˜ forward pass๋ฅผ ํ”ผํ•˜๊ธฐ ์œ„ํ•ด conditional ์ž„๋ฒ ๋”ฉ๊ณผ unconditional ์ž„๋ฒ ๋”ฉ์„ ๋ฐฐ์น˜(batch)๋กœ ์—ฐ๊ฒฐํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค:
```py
>>> text_embeddings = torch.cat([uncond_embeddings, text_embeddings])
```
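`torch.cat`์ด ๋ฐฐ์น˜ ์ฐจ์›์—์„œ ๋‘ ์ž„๋ฒ ๋”ฉ์„ ์ด์–ด ๋ถ™์ด๋Š” ๋ชจ์Šต์€ ์ค‘์ฒฉ ๋ฆฌ์ŠคํŠธ๋กœ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ํ‰๋‚ด๋‚ผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. shape์™€ ๊ฐ’๋“ค์€ ์„ค๋ช…์„ ์œ„ํ•ด ๊ฐ€์ •ํ•œ ์˜ˆ์‹œ์ž…๋‹ˆ๋‹ค:

```python
# ๊ฐ€์ •: batch_size=1, seq_length=4์ธ ์ž„๋ฒ ๋”ฉ ๋‘ ๊ฐœ
uncond_embeddings = [[0.0, 0.0, 0.0, 0.0]]  # shape (1, 4)
text_embeddings = [[1.0, 2.0, 3.0, 4.0]]    # shape (1, 4)

# torch.cat(dim=0)์— ํ•ด๋‹น: ๋ฐฐ์น˜ ์ฐจ์›์ด 1 + 1 = 2๊ฐ€ ๋ฉ๋‹ˆ๋‹ค
stacked = uncond_embeddings + text_embeddings  # shape (2, 4)
```

์ดํ›„ ๋ชจ๋ธ์„ ํ•œ ๋ฒˆ๋งŒ ํ˜ธ์ถœํ•ด๋„ unconditional/conditional ์˜ˆ์ธก์„ ๋ชจ๋‘ ์–ป์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.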
### ๋žœ๋ค ๋…ธ์ด์ฆˆ ์ƒ์„ฑ
๊ทธ๋‹ค์Œ diffusion ํ”„๋กœ์„ธ์Šค์˜ ์‹œ์ž‘์ ์œผ๋กœ ์ดˆ๊ธฐ ๋žœ๋ค ๋…ธ์ด์ฆˆ๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. ์ด๊ฒƒ์ด ์ด๋ฏธ์ง€์˜ ์ž ์žฌ์  ํ‘œํ˜„์ด๋ฉฐ ์ ์ฐจ์ ์œผ๋กœ ๋…ธ์ด์ฆˆ๊ฐ€ ์ œ๊ฑฐ๋ฉ๋‹ˆ๋‹ค. ์ด ์‹œ์ ์—์„œ `latent` ์ด๋ฏธ์ง€๋Š” ์ตœ์ข… ์ด๋ฏธ์ง€ ํฌ๊ธฐ๋ณด๋‹ค ์ž‘์ง€๋งŒ ๋‚˜์ค‘์— ๋ชจ๋ธ์ด ์ด๋ฅผ 512x512 ์ด๋ฏธ์ง€ ํฌ๊ธฐ๋กœ ๋ณ€ํ™˜ํ•˜๋ฏ€๋กœ ๊ดœ์ฐฎ์Šต๋‹ˆ๋‹ค.
<Tip>
๐Ÿ’ก `vae` ๋ชจ๋ธ์—๋Š” 3๊ฐœ์˜ ๋‹ค์šด ์ƒ˜ํ”Œ๋ง ๋ ˆ์ด์–ด๊ฐ€ ์žˆ๊ธฐ ๋•Œ๋ฌธ์— ๋†’์ด์™€ ๋„ˆ๋น„๊ฐ€ 8๋กœ ๋‚˜๋‰ฉ๋‹ˆ๋‹ค. ๋‹ค์Œ์„ ์‹คํ–‰ํ•˜์—ฌ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:
```py
2 ** (len(vae.config.block_out_channels) - 1) == 8
```
</Tip>
```py
>>> latents = torch.randn(
... (batch_size, unet.in_channels, height // 8, width // 8),
... generator=generator,
... )
>>> latents = latents.to(torch_device)
```
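latent์˜ shape์ด ์™œ `height // 8, width // 8`์ธ์ง€๋Š” ์œ„ Tip์˜ ๊ณ„์‚ฐ์œผ๋กœ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์•„๋ž˜ `block_out_channels` ๊ฐ’์€ SD v1 ๊ณ„์—ด VAE ์„ค์ •์„ ๊ฐ€์ •ํ•œ ์˜ˆ์‹œ์ž…๋‹ˆ๋‹ค:

```python
# ๊ฐ€์ •: 4๋‹จ๊ณ„ block → ๋‹ค์šด์ƒ˜ํ”Œ๋ง ๋ ˆ์ด์–ด 3๊ฐœ → ๊ณ„์ˆ˜ 2**3 = 8
block_out_channels = [128, 256, 512, 512]
factor = 2 ** (len(block_out_channels) - 1)

height = width = 512
latent_shape = (1, 4, height // factor, width // factor)  # (1, 4, 64, 64)
```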
### ์ด๋ฏธ์ง€ ๋…ธ์ด์ฆˆ ์ œ๊ฑฐ
๋จผ์ € ์ž…๋ ฅ์„ ์ดˆ๊ธฐ ๋…ธ์ด์ฆˆ ๋ถ„ํฌ์˜ ๋…ธ์ด์ฆˆ ์Šค์ผ€์ผ ๊ฐ’์ธ *sigma*๋กœ ์Šค์ผ€์ผ๋งํ•ฉ๋‹ˆ๋‹ค. ์ด๋Š” [`UniPCMultistepScheduler`]์™€ ๊ฐ™์€ ๊ฐœ์„ ๋œ ์Šค์ผ€์ค„๋Ÿฌ์— ํ•„์š”ํ•œ ๊ณผ์ •์ž…๋‹ˆ๋‹ค:
```py
>>> latents = latents * scheduler.init_noise_sigma
```
๋งˆ์ง€๋ง‰ ๋‹จ๊ณ„๋Š” `latent`์˜ ์ˆœ์ˆ˜ํ•œ ๋…ธ์ด์ฆˆ๋ฅผ ์ ์ง„์ ์œผ๋กœ ํ”„๋กฌํ”„ํŠธ์— ์„ค๋ช…๋œ ์ด๋ฏธ์ง€๋กœ ๋ณ€ํ™˜ํ•˜๋Š” ๋…ธ์ด์ฆˆ ์ œ๊ฑฐ ๋ฃจํ”„๋ฅผ ์ƒ์„ฑํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ๋…ธ์ด์ฆˆ ์ œ๊ฑฐ ๋ฃจํ”„๋Š” ์„ธ ๊ฐ€์ง€ ์ž‘์—…์„ ์ˆ˜ํ–‰ํ•ด์•ผ ํ•œ๋‹ค๋Š” ์ ์„ ๊ธฐ์–ตํ•˜์„ธ์š”:
1. ๋…ธ์ด์ฆˆ ์ œ๊ฑฐ ์ค‘์— ์‚ฌ์šฉํ•  ์Šค์ผ€์ค„๋Ÿฌ์˜ timesteps๋ฅผ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค.
2. timestep์„ ๋”ฐ๋ผ ๋ฐ˜๋ณตํ•ฉ๋‹ˆ๋‹ค.
3. ๊ฐ timestep์—์„œ UNet ๋ชจ๋ธ์„ ํ˜ธ์ถœํ•˜์—ฌ noise residual์„ ์˜ˆ์ธกํ•˜๊ณ  ์Šค์ผ€์ค„๋Ÿฌ์— ์ „๋‹ฌํ•˜์—ฌ ์ด์ „ ๋…ธ์ด์ฆˆ ์ƒ˜ํ”Œ์„ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค.
```py
>>> from tqdm.auto import tqdm
>>> scheduler.set_timesteps(num_inference_steps)
>>> for t in tqdm(scheduler.timesteps):
... # classifier-free guidance๋ฅผ ์ˆ˜ํ–‰ํ•˜๋Š” ๊ฒฝ์šฐ ๋‘๋ฒˆ์˜ forward pass๋ฅผ ์ˆ˜ํ–‰ํ•˜์ง€ ์•Š๋„๋ก latent๋ฅผ ํ™•์žฅ.
... latent_model_input = torch.cat([latents] * 2)
... latent_model_input = scheduler.scale_model_input(latent_model_input, timestep=t)
... # noise residual ์˜ˆ์ธก
... with torch.no_grad():
... noise_pred = unet(latent_model_input, t, encoder_hidden_states=text_embeddings).sample
... # guidance ์ˆ˜ํ–‰
... noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
... noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)
... # ์ด์ „ ๋…ธ์ด์ฆˆ ์ƒ˜ํ”Œ์„ ๊ณ„์‚ฐ x_t -> x_t-1
... latents = scheduler.step(noise_pred, t, latents).prev_sample
```
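๋ฃจํ”„ ์•ˆ์˜ guidance ๊ณ„์‚ฐ์€ unconditional ์˜ˆ์ธก์—์„œ conditional ์˜ˆ์ธก ๋ฐฉํ–ฅ์œผ๋กœ `guidance_scale`๋ฐฐ๋งŒํผ ๋ฐ€์–ด์ฃผ๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์Šค์นผ๋ผ ๊ฐ’์œผ๋กœ ํ‰๋‚ด๋‚ด๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค(๋…ธ์ด์ฆˆ ๊ฐ’๋“ค์€ ๊ฐ€์ •๋œ ์˜ˆ์‹œ์ž…๋‹ˆ๋‹ค):

```python
guidance_scale = 7.5
noise_pred_uncond = 0.2  # ๊ฐ€์ •๋œ ์˜ˆ์‹œ ๊ฐ’
noise_pred_text = 0.5    # ๊ฐ€์ •๋œ ์˜ˆ์‹œ ๊ฐ’

# classifier-free guidance: uncond ์˜ˆ์ธก + scale * (cond - uncond)
noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)
print(noise_pred)  # ์•ฝ 2.45
```

`guidance_scale`์ด ํด์ˆ˜๋ก ํ”„๋กฌํ”„ํŠธ ์กฐ๊ฑด์ด ๊ฒฐ๊ณผ์— ๋” ๊ฐ•ํ•˜๊ฒŒ ๋ฐ˜์˜๋ฉ๋‹ˆ๋‹ค.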
### ์ด๋ฏธ์ง€ ๋””์ฝ”๋”ฉ
๋งˆ์ง€๋ง‰ ๋‹จ๊ณ„๋Š” `vae`๋ฅผ ์ด์šฉํ•ด ์ž ์žฌ ํ‘œํ˜„์„ ์ด๋ฏธ์ง€๋กœ ๋””์ฝ”๋”ฉํ•˜๊ณ , `sample`๋กœ ๋””์ฝ”๋”ฉ๋œ ์ถœ๋ ฅ์„ ์–ป๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค:
```py
# latent๋ฅผ ์Šค์ผ€์ผ๋งํ•˜๊ณ  vae๋กœ ์ด๋ฏธ์ง€ ๋””์ฝ”๋”ฉ
latents = 1 / 0.18215 * latents
with torch.no_grad():
image = vae.decode(latents).sample
```
๋งˆ์ง€๋ง‰์œผ๋กœ ์ด๋ฏธ์ง€๋ฅผ `PIL.Image`๋กœ ๋ณ€ํ™˜ํ•˜๋ฉด ์ƒ์„ฑ๋œ ์ด๋ฏธ์ง€๋ฅผ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค!
```py
>>> image = (image / 2 + 0.5).clamp(0, 1)
>>> image = image.detach().cpu().permute(0, 2, 3, 1).numpy()
>>> images = (image * 255).round().astype("uint8")
>>> pil_images = [Image.fromarray(image) for image in images]
>>> pil_images[0]
```
<div class="flex justify-center">
<img src="https://huggingface.co/blog/assets/98_stable_diffusion/stable_diffusion_k_lms.png"/>
</div>
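์œ„ ํ›„์ฒ˜๋ฆฌ์—์„œ `(image / 2 + 0.5).clamp(0, 1)` ํ›„ 255๋ฅผ ๊ณฑํ•˜๋Š” ๊ณผ์ •์€ [-1, 1] ๋ฒ”์œ„์˜ ๋ชจ๋ธ ์ถœ๋ ฅ์„ [0, 255] ํ”ฝ์…€ ๊ฐ’์œผ๋กœ ๋ฐ”๊พธ๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์Šค์นผ๋ผ ๊ฐ’ ํ•˜๋‚˜๋กœ ํ‰๋‚ด๋‚ธ ์Šค์ผ€์น˜์ž…๋‹ˆ๋‹ค(`to_uint8`์€ ์„ค๋ช…์šฉ์œผ๋กœ ๋งŒ๋“  ๊ฐ€์ƒ์˜ ํ•จ์ˆ˜์ž…๋‹ˆ๋‹ค):

```python
def to_uint8(v):
    # [-1, 1] ๋ฒ”์œ„์˜ ๊ฐ’์„ [0, 1]๋กœ ์˜ฎ๊ธฐ๊ณ  clampํ•œ ๋’ค 255 ์Šค์ผ€์ผ๋กœ ๋ณ€ํ™˜
    v = min(max(v / 2 + 0.5, 0.0), 1.0)
    return round(v * 255)


print(to_uint8(-1.0), to_uint8(1.0))  # 0 255
```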
## ๋‹ค์Œ ๋‹จ๊ณ„
๊ธฐ๋ณธ ํŒŒ์ดํ”„๋ผ์ธ๋ถ€ํ„ฐ ๋ณต์žกํ•œ ํŒŒ์ดํ”„๋ผ์ธ๊นŒ์ง€, ์ž์‹ ๋งŒ์˜ diffusion ์‹œ์Šคํ…œ์„ ์ž‘์„ฑํ•˜๋Š” ๋ฐ ํ•„์š”ํ•œ ๊ฒƒ์€ ๋…ธ์ด์ฆˆ ์ œ๊ฑฐ ๋ฃจํ”„๋ฟ์ด๋ผ๋Š” ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค. ์ด ๋ฃจํ”„๋Š” ์Šค์ผ€์ค„๋Ÿฌ์˜ timesteps๋ฅผ ์„ค์ •ํ•˜๊ณ , ์ด๋ฅผ ๋ฐ˜๋ณตํ•˜๋ฉฐ, UNet ๋ชจ๋ธ์„ ํ˜ธ์ถœํ•˜์—ฌ noise residual์„ ์˜ˆ์ธกํ•˜๊ณ  ์Šค์ผ€์ค„๋Ÿฌ์— ์ „๋‹ฌํ•˜์—ฌ ์ด์ „ ๋…ธ์ด์ฆˆ ์ƒ˜ํ”Œ์„ ๊ณ„์‚ฐํ•˜๋Š” ๊ณผ์ •์„ ๋ฒˆ๊ฐˆ์•„ ๊ฐ€๋ฉฐ ์ˆ˜ํ–‰ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.
์ด๊ฒƒ์ด ๋ฐ”๋กœ ๐Ÿงจ Diffusers๊ฐ€ ์„ค๊ณ„๋œ ๋ชฉ์ ์ž…๋‹ˆ๋‹ค: ๋ชจ๋ธ๊ณผ ์Šค์ผ€์ค„๋Ÿฌ๋ฅผ ์‚ฌ์šฉํ•ด ์ž์‹ ๋งŒ์˜ diffusion ์‹œ์Šคํ…œ์„ ์ง๊ด€์ ์ด๊ณ  ์‰ฝ๊ฒŒ ์ž‘์„ฑํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•˜๊ธฐ ์œ„ํ•ด์„œ์ž…๋‹ˆ๋‹ค.
๋‹ค์Œ ๋‹จ๊ณ„๋ฅผ ์ž์œ ๋กญ๊ฒŒ ์ง„ํ–‰ํ•˜์„ธ์š”:
* ๐Ÿงจ Diffusers์— [ํŒŒ์ดํ”„๋ผ์ธ ๊ตฌ์ถ• ๋ฐ ๊ธฐ์—ฌ](using-diffusers/#contribute_pipeline)ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์•Œ์•„๋ณด์„ธ์š”. ์—ฌ๋Ÿฌ๋ถ„์ด ์–ด๋–ค ์•„์ด๋””์–ด๋ฅผ ๋‚ด๋†“์„์ง€ ๊ธฐ๋Œ€๋ฉ๋‹ˆ๋‹ค!
* ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์—์„œ [๊ธฐ๋ณธ ํŒŒ์ดํ”„๋ผ์ธ](./api/pipelines/overview)์„ ์‚ดํŽด๋ณด๊ณ , ๋ชจ๋ธ๊ณผ ์Šค์ผ€์ค„๋Ÿฌ๋ฅผ ๋ณ„๋„๋กœ ์‚ฌ์šฉํ•˜์—ฌ ํŒŒ์ดํ”„๋ผ์ธ์„ ์ฒ˜์Œ๋ถ€ํ„ฐ ํ•ด์ฒดํ•˜๊ณ  ๋นŒ๋“œํ•  ์ˆ˜ ์žˆ๋Š”์ง€ ํ™•์ธํ•ด ๋ณด์„ธ์š”.