<!--Copyright 2025 Custom Diffusion authors and The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Custom Diffusion training example

[Custom Diffusion](https://huggingface.co/papers/2212.04488) is a method for customizing text-to-image models like Stable Diffusion given just a few (4~5) images of a subject.
The `train_custom_diffusion.py` script shows how to implement the training procedure and adapt it for Stable Diffusion.

This training example was contributed by [Nupur Kumari](https://nupurkmr9.github.io/) (one of the authors of Custom Diffusion).

## ๋ก์ปฌ์์ PyTorch๋ก ์คํํ๊ธฐ
|
|
|
|
### Dependencies ์ค์นํ๊ธฐ
|
|
|
|
์คํฌ๋ฆฝํธ๋ฅผ ์คํํ๊ธฐ ์ ์ ๋ผ์ด๋ธ๋ฌ๋ฆฌ์ ํ์ต dependencies๋ฅผ ์ค์นํด์ผ ํฉ๋๋ค:
|
|
|
|
**์ค์**
|
|
|
|
์์ ์คํฌ๋ฆฝํธ์ ์ต์ ๋ฒ์ ์ ์ฑ๊ณต์ ์ผ๋ก ์คํํ๋ ค๋ฉด **์์ค๋ก๋ถํฐ ์ค์น**ํ๋ ๊ฒ์ ๋งค์ฐ ๊ถ์ฅํ๋ฉฐ, ์์ ์คํฌ๋ฆฝํธ๋ฅผ ์์ฃผ ์
๋ฐ์ดํธํ๋ ๋งํผ ์ผ๋ถ ์์ ๋ณ ์๊ตฌ ์ฌํญ์ ์ค์นํ๊ณ ์ค์น๋ฅผ ์ต์ ์ํ๋ก ์ ์งํ๋ ๊ฒ์ด ์ข์ต๋๋ค. ์ด๋ฅผ ์ํด ์ ๊ฐ์ ํ๊ฒฝ์์ ๋ค์ ๋จ๊ณ๋ฅผ ์คํํ์ธ์:
|
|
|
|
|
|
```bash
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install -e .
```

Then cd into the [example folder](https://github.com/huggingface/diffusers/tree/main/examples/custom_diffusion):

```bash
cd examples/custom_diffusion
```

์ด์ ์คํ
|
|
|
|
```bash
|
|
pip install -r requirements.txt
|
|
pip install clip-retrieval
|
|
```
|
|
|
|
๊ทธ๋ฆฌ๊ณ [๐คAccelerate](https://github.com/huggingface/accelerate/) ํ๊ฒฝ์ ์ด๊ธฐํ:
|
|
|
|
```bash
|
|
accelerate config
|
|
```
|
|
|
|
๋๋ ์ฌ์ฉ์ ํ๊ฒฝ์ ๋ํ ์ง๋ฌธ์ ๋ตํ์ง ์๊ณ ๊ธฐ๋ณธ ๊ฐ์ ๊ตฌ์ฑ์ ์ฌ์ฉํ๋ ค๋ฉด ๋ค์๊ณผ ๊ฐ์ด ํ์ธ์.
|
|
|
|
```bash
|
|
accelerate config default
|
|
```
|
|
|
|
๋๋ ์ฌ์ฉ ์ค์ธ ํ๊ฒฝ์ด ๋ํํ ์
ธ์ ์ง์ํ์ง ์๋ ๊ฒฝ์ฐ(์: jupyter notebook)
|
|
|
|
```python
|
|
from accelerate.utils import write_basic_config
|
|
|
|
write_basic_config()
|
|
```
|
|
### ๊ณ ์์ด ์์ ๐บ
|
|
|
|
์ด์ ๋ฐ์ดํฐ์
์ ๊ฐ์ ธ์ต๋๋ค. [์ฌ๊ธฐ](https://www.cs.cmu.edu/~custom-diffusion/assets/data.zip)์์ ๋ฐ์ดํฐ์
์ ๋ค์ด๋ก๋ํ๊ณ ์์ถ์ ํ๋๋ค. ์ง์ ๋ฐ์ดํฐ์
์ ์ฌ์ฉํ๋ ค๋ฉด [ํ์ต์ฉ ๋ฐ์ดํฐ์
์์ฑํ๊ธฐ](create_dataset) ๊ฐ์ด๋๋ฅผ ์ฐธ๊ณ ํ์ธ์.
|
|
|
We also collect 200 real images with `clip-retrieval` and combine them with the target images in the training dataset as regularization, which helps prevent overfitting to the few given target images. The following flags enable the `prior_preservation` and `real_prior` regularization with `prior_loss_weight=1.`.
The `class_prompt` should be the category name, the same as that of the target images. The collected real images come with text captions similar to the `class_prompt`, and the retrieved images are saved in `class_data_dir`. You can disable `real_prior` to use generated images as regularization instead. To collect the real images, run this command first, before training:

```bash
pip install clip-retrieval
python retrieve.py --class_prompt cat --class_data_dir real_reg/samples_cat --num_class_images 200
```

**___Note: Change the `resolution` to 768 if you are using the [stable-diffusion-2](https://huggingface.co/stabilityai/stable-diffusion-2) 768x768 model.___**

The script creates and saves model checkpoints and a `pytorch_custom_diffusion_weights.bin` file in your repository.

```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export OUTPUT_DIR="path-to-save-model"
export INSTANCE_DIR="./data/cat"

accelerate launch train_custom_diffusion.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --class_data_dir=./real_reg/samples_cat/ \
  --with_prior_preservation --real_prior --prior_loss_weight=1.0 \
  --class_prompt="cat" --num_class_images=200 \
  --instance_prompt="photo of a <new1> cat" \
  --resolution=512 \
  --train_batch_size=2 \
  --learning_rate=1e-5 \
  --lr_warmup_steps=0 \
  --max_train_steps=250 \
  --scale_lr --hflip \
  --modifier_token "<new1>" \
  --push_to_hub
```

**๋ ๋ฎ์ VRAM ์๊ตฌ ์ฌํญ(GPU๋น 16GB)์ผ๋ก ๋ ๋น ๋ฅด๊ฒ ํ๋ จํ๋ ค๋ฉด `--enable_xformers_memory_efficient_attention`์ ์ฌ์ฉํ์ธ์. ์ค์น ๋ฐฉ๋ฒ์ [๊ฐ์ด๋](https://github.com/facebookresearch/xformers)๋ฅผ ๋ฐ๋ฅด์ธ์.**
|
|
|
|
๊ฐ์ค์น ๋ฐ ํธํฅ(`wandb`)์ ์ฌ์ฉํ์ฌ ์คํ์ ์ถ์ ํ๊ณ ์ค๊ฐ ๊ฒฐ๊ณผ๋ฅผ ์ ์ฅํ๋ ค๋ฉด(๊ฐ๋ ฅํ ๊ถ์ฅํฉ๋๋ค) ๋ค์ ๋จ๊ณ๋ฅผ ๋ฐ๋ฅด์ธ์:
|
|
|
|
* `wandb` ์ค์น: `pip install wandb`.
|
|
* ๋ก๊ทธ์ธ : `wandb login`.
|
|
* ๊ทธ๋ฐ ๋ค์ ํธ๋ ์ด๋์ ์์ํ๋ ๋์ `validation_prompt`๋ฅผ ์ง์ ํ๊ณ `report_to`๋ฅผ `wandb`๋ก ์ค์ ํฉ๋๋ค. ๋ค์๊ณผ ๊ฐ์ ๊ด๋ จ ์ธ์๋ฅผ ๊ตฌ์ฑํ ์๋ ์์ต๋๋ค:
|
|
* `num_validation_images`
|
|
* `validation_steps`
|
|
|
|
```bash
accelerate launch train_custom_diffusion.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --class_data_dir=./real_reg/samples_cat/ \
  --with_prior_preservation --real_prior --prior_loss_weight=1.0 \
  --class_prompt="cat" --num_class_images=200 \
  --instance_prompt="photo of a <new1> cat" \
  --resolution=512 \
  --train_batch_size=2 \
  --learning_rate=1e-5 \
  --lr_warmup_steps=0 \
  --max_train_steps=250 \
  --scale_lr --hflip \
  --modifier_token "<new1>" \
  --validation_prompt="<new1> cat sitting in a bucket" \
  --report_to="wandb" \
  --push_to_hub
```

๋ค์์ [Weights and Biases page](https://wandb.ai/sayakpaul/custom-diffusion/runs/26ghrcau)์ ์์์ด๋ฉฐ, ์ฌ๋ฌ ํ์ต ์ธ๋ถ ์ ๋ณด์ ํจ๊ป ์ค๊ฐ ๊ฒฐ๊ณผ๋ค์ ํ์ธํ ์ ์์ต๋๋ค.
|
|
|
|
`--push_to_hub`๋ฅผ ์ง์ ํ๋ฉด ํ์ต๋ ํ๋ผ๋ฏธํฐ๊ฐ ํ๊น
ํ์ด์ค ํ๋ธ์ ๋ฆฌํฌ์งํ ๋ฆฌ์ ํธ์๋ฉ๋๋ค. ๋ค์์ [์์ ๋ฆฌํฌ์งํ ๋ฆฌ](https://huggingface.co/sayakpaul/custom-diffusion-cat)์
๋๋ค.
|
|
|
|
### ๋ฉํฐ ์ปจ์
์ ๋ํ ํ์ต ๐ฑ๐ชต
|
|
|
|
[this](https://github.com/ShivamShrirao/diffusers/blob/main/examples/dreambooth/train_dreambooth.py)์ ์ ์ฌํ๊ฒ ๊ฐ ์ปจ์
์ ๋ํ ์ ๋ณด๊ฐ ํฌํจ๋ [json](https://github.com/adobe-research/custom-diffusion/blob/main/assets/concept_list.json) ํ์ผ์ ์ ๊ณตํฉ๋๋ค.
|
|
|
|
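For reference, each entry in the linked `concept_list.json` uses the `instance_prompt`, `class_prompt`, `instance_data_dir`, and `class_data_dir` fields. The snippet below writes a hypothetical two-concept file; the prompts and paths are placeholders you would adapt to your own data:

```python
import json

# Hypothetical two-concept configuration (placeholder prompts and paths);
# the field names follow the concept_list.json linked above.
concepts_list = [
    {
        "instance_prompt": "photo of a <new1> cat",
        "class_prompt": "cat",
        "instance_data_dir": "./data/cat",
        "class_data_dir": "./real_reg/samples_cat/",
    },
    {
        "instance_prompt": "photo of a <new2> wooden pot",
        "class_prompt": "wooden pot",
        "instance_data_dir": "./data/wooden_pot",
        "class_data_dir": "./real_reg/samples_wooden_pot/",
    },
]

# Write the file that --concepts_list points at.
with open("concept_list.json", "w") as f:
    json.dump(concepts_list, f, indent=4)
```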
์ค์ ์ด๋ฏธ์ง๋ฅผ ์์งํ๋ ค๋ฉด json ํ์ผ์ ๊ฐ ์ปจ์
์ ๋ํด ์ด ๋ช
๋ น์ ์คํํฉ๋๋ค.
|
|
|
|
```bash
pip install clip-retrieval
python retrieve.py --class_prompt {} --class_data_dir {} --num_class_images 200
```

๊ทธ๋ผ ์ฐ๋ฆฌ๋ ํ์ต์ํฌ ์ค๋น๊ฐ ๋์์ต๋๋ค!
|
|
|
|
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export OUTPUT_DIR="path-to-save-model"

accelerate launch train_custom_diffusion.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --output_dir=$OUTPUT_DIR \
  --concepts_list=./concept_list.json \
  --with_prior_preservation --real_prior --prior_loss_weight=1.0 \
  --resolution=512 \
  --train_batch_size=2 \
  --learning_rate=1e-5 \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --num_class_images=200 \
  --scale_lr --hflip \
  --modifier_token "<new1>+<new2>" \
  --push_to_hub
```

๋ค์์ [Weights and Biases page](https://wandb.ai/sayakpaul/custom-diffusion/runs/3990tzkg)์ ์์์ด๋ฉฐ, ๋ค๋ฅธ ํ์ต ์ธ๋ถ ์ ๋ณด์ ํจ๊ป ์ค๊ฐ ๊ฒฐ๊ณผ๋ค์ ํ์ธํ ์ ์์ต๋๋ค.
|
|
|
|
### ์ฌ๋ ์ผ๊ตด์ ๋ํ ํ์ต
|
|
|
|
์ฌ๋ ์ผ๊ตด์ ๋ํ ํ์ธํ๋์ ์ํด ๋ค์๊ณผ ๊ฐ์ ์ค์ ์ด ๋ ํจ๊ณผ์ ์ด๋ผ๋ ๊ฒ์ ํ์ธํ์ต๋๋ค: `learning_rate=5e-6`, `max_train_steps=1000 to 2000`, `freeze_model=crossattn`์ ์ต์ 15~20๊ฐ์ ์ด๋ฏธ์ง๋ก ์ค์ ํฉ๋๋ค.
|
|
|
|
์ค์ ์ด๋ฏธ์ง๋ฅผ ์์งํ๋ ค๋ฉด ํ๋ จ ์ ์ ์ด ๋ช
๋ น์ ๋จผ์ ์ฌ์ฉํ์ญ์์ค.
|
|
|
|
```bash
pip install clip-retrieval
python retrieve.py --class_prompt person --class_data_dir real_reg/samples_person --num_class_images 200
```

์ด์ ํ์ต์ ์์ํ์ธ์!
|
|
|
|
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export OUTPUT_DIR="path-to-save-model"
export INSTANCE_DIR="path-to-images"

accelerate launch train_custom_diffusion.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --class_data_dir=./real_reg/samples_person/ \
  --with_prior_preservation --real_prior --prior_loss_weight=1.0 \
  --class_prompt="person" --num_class_images=200 \
  --instance_prompt="photo of a <new1> person" \
  --resolution=512 \
  --train_batch_size=2 \
  --learning_rate=5e-6 \
  --lr_warmup_steps=0 \
  --max_train_steps=1000 \
  --scale_lr --hflip --noaug \
  --freeze_model crossattn \
  --modifier_token "<new1>" \
  --enable_xformers_memory_efficient_attention \
  --push_to_hub
```

## ์ถ๋ก
|
|
|
|
์ ํ๋กฌํํธ๋ฅผ ์ฌ์ฉํ์ฌ ๋ชจ๋ธ์ ํ์ต์ํจ ํ์๋ ์๋ ํ๋กฌํํธ๋ฅผ ์ฌ์ฉํ์ฌ ์ถ๋ก ์ ์คํํ ์ ์์ต๋๋ค. ํ๋กฌํํธ์ 'modifier token'(์: ์ ์์ ์์๋ \<new1\>)์ ๋ฐ๋์ ํฌํจํด์ผ ํฉ๋๋ค.
|
|
|
|
```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16).to("cuda")
pipe.unet.load_attn_procs("path-to-save-model", weight_name="pytorch_custom_diffusion_weights.bin")
pipe.load_textual_inversion("path-to-save-model", weight_name="<new1>.bin")

image = pipe(
    "<new1> cat sitting in a bucket",
    num_inference_steps=100,
    guidance_scale=6.0,
    eta=1.0,
).images[0]
image.save("cat.png")
```

ํ๋ธ ๋ฆฌํฌ์งํ ๋ฆฌ์์ ์ด๋ฌํ ๋งค๊ฐ๋ณ์๋ฅผ ์ง์ ๋ก๋ํ ์ ์์ต๋๋ค:
|
|
|
|
```python
import torch
from huggingface_hub.repocard import RepoCard
from diffusers import DiffusionPipeline

model_id = "sayakpaul/custom-diffusion-cat"
card = RepoCard.load(model_id)
base_model_id = card.data.to_dict()["base_model"]

pipe = DiffusionPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16).to("cuda")
pipe.unet.load_attn_procs(model_id, weight_name="pytorch_custom_diffusion_weights.bin")
pipe.load_textual_inversion(model_id, weight_name="<new1>.bin")

image = pipe(
    "<new1> cat sitting in a bucket",
    num_inference_steps=100,
    guidance_scale=6.0,
    eta=1.0,
).images[0]
image.save("cat.png")
```

๋ค์์ ์ฌ๋ฌ ์ปจ์
์ผ๋ก ์ถ๋ก ์ ์ํํ๋ ์์ ์
๋๋ค:
|
|
|
|
```python
import torch
from huggingface_hub.repocard import RepoCard
from diffusers import DiffusionPipeline

model_id = "sayakpaul/custom-diffusion-cat-wooden-pot"
card = RepoCard.load(model_id)
base_model_id = card.data.to_dict()["base_model"]

pipe = DiffusionPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16).to("cuda")
pipe.unet.load_attn_procs(model_id, weight_name="pytorch_custom_diffusion_weights.bin")
pipe.load_textual_inversion(model_id, weight_name="<new1>.bin")
pipe.load_textual_inversion(model_id, weight_name="<new2>.bin")

image = pipe(
    "the <new1> cat sculpture in the style of a <new2> wooden pot",
    num_inference_steps=100,
    guidance_scale=6.0,
    eta=1.0,
).images[0]
image.save("multi-subject.png")
```

์ฌ๊ธฐ์ '๊ณ ์์ด'์ '๋๋ฌด ๋๋น'๋ ์ฌ๋ฌ ์ปจ์
์ ๋งํฉ๋๋ค.
|
|
|
|
### ํ์ต๋ ์ฒดํฌํฌ์ธํธ์์ ์ถ๋ก ํ๊ธฐ
|
|
|
|
`--checkpointing_steps` ์ธ์๋ฅผ ์ฌ์ฉํ ๊ฒฝ์ฐ ํ์ต ๊ณผ์ ์์ ์ ์ฅ๋ ์ ์ฒด ์ฒดํฌํฌ์ธํธ ์ค ํ๋์์ ์ถ๋ก ์ ์ํํ ์๋ ์์ต๋๋ค.
|
|
|
|
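As a rough sketch, if an intermediate checkpoint directory contains the same weight files as the final output directory (inspect `checkpoint-*/` first to confirm; the exact layout depends on your diffusers version, so this is an assumption), inference would look like the earlier examples with only the path swapped:

```python
import torch
from diffusers import DiffusionPipeline

# Hypothetical checkpoint directory created by --checkpointing_steps; verify
# it actually contains these weight files before loading.
ckpt_dir = "path-to-save-model/checkpoint-250"

pipe = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16).to("cuda")
pipe.unet.load_attn_procs(ckpt_dir, weight_name="pytorch_custom_diffusion_weights.bin")
pipe.load_textual_inversion(ckpt_dir, weight_name="<new1>.bin")

image = pipe("<new1> cat sitting in a bucket", num_inference_steps=100).images[0]
image.save("cat-checkpoint.png")
```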
## Set grads to None

To save even more memory, pass the `--set_grads_to_none` argument to the script. This will set grads to None instead of zero. However, be aware that it changes certain behaviors, so if you run into any issues, remove this argument.

More info: https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.zero_grad.html

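As a minimal illustration of what this flag does under the hood (this is plain PyTorch, not the training script): `Optimizer.zero_grad(set_to_none=True)` releases the gradient tensors instead of filling them with zeros.

```python
import torch

# A toy model and optimizer, purely for demonstration.
model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

loss = model(torch.randn(8, 4)).sum()
loss.backward()
# After backward, each parameter holds a gradient tensor.
assert isinstance(model.weight.grad, torch.Tensor)

# set_to_none=True frees the gradient tensors instead of zero-filling them.
opt.zero_grad(set_to_none=True)
print(model.weight.grad)  # None: the memory is released
```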

## Experimental results

For more details on our experiments, refer to [our webpage](https://www.cs.cmu.edu/~custom-diffusion/).