mirror of https://github.com/huggingface/diffusers.git synced 2026-01-27 17:22:53 +03:00

Use HF Papers (#11567)

* Use HF Papers

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Quentin Gallouédec
2025-05-19 09:22:33 -07:00
committed by GitHub
parent 799adf4a10
commit c8bb1ff53e
507 changed files with 2312 additions and 2293 deletions
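
Every hunk shown below makes the same substitution: an `https://arxiv.org/abs/<id>` link becomes `https://huggingface.co/papers/<id>`. The commit does not say how the rewrite was produced, so the following is only a minimal sketch of an equivalent transformation, assuming a plain regular expression is enough. The names `ARXIV_ABS` and `to_hf_papers` are hypothetical and do not come from the repository.

```python
import re

# Assumption: arXiv identifiers in these docs use the modern YYMM.NNNNN form,
# as in every link visible in this diff.
ARXIV_ABS = re.compile(r"https://arxiv\.org/abs/(\d{4}\.\d{4,5})")


def to_hf_papers(text: str) -> str:
    """Rewrite arXiv abstract links to Hugging Face Papers links (hypothetical helper)."""
    return ARXIV_ABS.sub(r"https://huggingface.co/papers/\1", text)


print(to_hf_papers("[Paper](https://arxiv.org/abs/2211.09800)"))
# → [Paper](https://huggingface.co/papers/2211.09800)
```

Text without an arXiv abstract link passes through unchanged, which is why each hunk below touches only the lines that contain a link.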


@@ -64,7 +64,7 @@ To control diffusion model generation, `diffusers` supports a few
## Pix2Pix Instruct
[Paper](https://arxiv.org/abs/2211.09800)
[Paper](https://huggingface.co/papers/2211.09800)
[Instruct Pix2Pix](../api/pipelines/stable_diffusion/pix2pix) is fine-tuned from Stable Diffusion to support editing input images. It takes an image and a prompt describing the edit as inputs, and outputs the edited image.
Instruct Pix2Pix has been explicitly trained to work well with [InstructGPT](https://openai.com/blog/instruction-following/)-like prompts.
@@ -73,7 +73,7 @@ Instruct Pix2Pix has been explicitly trained to work well with [InstructGPT](https://openai.com/blog/instruction-following/
## Pix2Pix Zero
[Paper](https://arxiv.org/abs/2302.03027)
[Paper](https://huggingface.co/papers/2302.03027)
[Pix2Pix Zero](../api/pipelines/stable_diffusion/pix2pix_zero) lets you modify an image so that one concept or subject is translated into another while preserving the general image semantics.
@@ -98,7 +98,7 @@ Pix2Pix Zero is the first to allow 'zero-shot' image editing
## Attend and Excite
[Paper](https://arxiv.org/abs/2301.13826)
[Paper](https://huggingface.co/papers/2301.13826)
[Attend and Excite](../api/pipelines/stable_diffusion/attend_and_excite) lets the subjects in the prompt be faithfully represented in the final image.
@@ -110,7 +110,7 @@ Like Pix2Pix Zero, Attend and Excite also, in its pipeline,
## Semantic Guidance (SEGA)
[Paper](https://arxiv.org/abs/2301.12247)
[Paper](https://huggingface.co/papers/2301.12247)
Semantic Guidance (SEGA) lets you apply or remove one or more concepts from an image. The strength of a concept can also be controlled. That is, the smile concept can be used to incrementally increase or decrease the smile in a portrait.
@@ -122,7 +122,7 @@ Unlike Pix2Pix Zero or Attend and Excite, SEGA, explicit gradient
## Self-attention Guidance (SAG)
[Paper](https://arxiv.org/abs/2210.00939)
[Paper](https://huggingface.co/papers/2210.00939)
[Self-attention Guidance](../api/pipelines/stable_diffusion/self_attention_guidance) improves the general quality of images.
@@ -150,7 +150,7 @@ An important distinction between methods like InstructPix2Pix and Pix2Pix Zero is that
## MultiDiffusion Panorama
[Paper](https://arxiv.org/abs/2302.08113)
[Paper](https://huggingface.co/papers/2302.08113)
MultiDiffusion defines a new generation process over a pre-trained diffusion model. This process binds together multiple diffusion generation methods that can be readily applied to generate high-quality, diverse images. Results adhere to user-provided controls, such as a desired aspect ratio (e.g., panorama) and spatial guiding signals ranging from tight segmentation masks to bounding boxes.
[MultiDiffusion Panorama](../api/pipelines/stable_diffusion/panorama) lets you generate high-quality images at arbitrary aspect ratios (e.g., panoramas).
@@ -175,7 +175,7 @@ MultiDiffusion defines, over a pre-trained diffusion model, a new generation
## ControlNet
[Paper](https://arxiv.org/abs/2302.05543)
[Paper](https://huggingface.co/papers/2302.05543)
[ControlNet](../api/pipelines/stable_diffusion/controlnet) is an auxiliary network that adds an extra condition.
There are 8 canonical pre-trained ControlNets trained on different conditionings such as edge detection, scribbles, depth maps, and semantic segmentations.
@@ -200,7 +200,7 @@ Like DreamBooth and Textual Inversion, Custom Diffusion is
## Model Editing
[Paper](https://arxiv.org/abs/2303.08084)
[Paper](https://huggingface.co/papers/2303.08084)
The [text-to-image model editing pipeline](../api/pipelines/model_editing) helps mitigate incorrect implicit assumptions that a pre-trained text-to-image diffusion model might make about the subjects in the input prompt.
For example, if you prompt Stable Diffusion to generate images for "A pack of roses", the roses in the generated images are likely to be red. This pipeline helps you change that assumption.
@@ -209,7 +209,7 @@ Like DreamBooth and Textual Inversion, Custom Diffusion is
## DiffEdit
[Paper](https://arxiv.org/abs/2210.11427)
[Paper](https://huggingface.co/papers/2210.11427)
[DiffEdit](../api/pipelines/diffedit) allows semantic editing of input images guided by input prompts while preserving the original input image as much as possible.
@@ -218,7 +218,7 @@ Like DreamBooth and Textual Inversion, Custom Diffusion is
## T2I-Adapter
[Paper](https://arxiv.org/abs/2302.08453)
[Paper](https://huggingface.co/papers/2302.08453)
[T2I-Adapter](../api/pipelines/stable_diffusion/adapter) is an auxiliary network that adds an extra condition.
There are 8 canonical pre-trained adapters trained on different conditionings such as edge detection, sketch, depth maps, and semantic segmentations.


@@ -14,7 +14,7 @@ specific language governing permissions and limitations under the License.
[[open-in-colab]]
Community pipelines are any [`DiffusionPipeline`] class implemented differently from the original implementation specified in its paper (for example, [`StableDiffusionControlNetPipeline`] corresponds to ["Text-to-Image Generation with ControlNet Conditioning"](https://arxiv.org/abs/2302.05543)). They provide additional functionality or extend the original implementation of a pipeline.
Community pipelines are any [`DiffusionPipeline`] class implemented differently from the original implementation specified in its paper (for example, [`StableDiffusionControlNetPipeline`] corresponds to ["Text-to-Image Generation with ControlNet Conditioning"](https://huggingface.co/papers/2302.05543)). They provide additional functionality or extend the original implementation of a pipeline.
There are many cool community pipelines, such as [Speech to Image](https://github.com/huggingface/diffusers/tree/main/examples/community#speech-to-image) or [Composable Stable Diffusion](https://github.com/huggingface/diffusers/tree/main/examples/community#composable-stable-diffusion), and you can find all the official community pipelines [here](https://github.com/huggingface/diffusers/tree/main/examples/community).


@@ -27,7 +27,7 @@ Unconditional image generation is a relatively straightforward task. The model
</Tip>
In this guide, you'll use ['DiffusionPipeline'] and [DDPM](https://arxiv.org/abs/2006.11239) for unconditional image generation:
In this guide, you'll use ['DiffusionPipeline'] and [DDPM](https://huggingface.co/papers/2006.11239) for unconditional image generation:
```python
>>> from diffusers import DiffusionPipeline