1
0
mirror of https://github.com/huggingface/diffusers.git synced 2026-01-27 17:22:53 +03:00

add translated docs (#2587)

* add translated docs

* improve translated content

* improve translated content

* Modify the translation content
This commit is contained in:
Sian
2023-03-10 20:55:12 +08:00
committed by GitHub
parent d761b58bfc
commit 4aa68291a9
4 changed files with 794 additions and 0 deletions

238
docs/source/zh/_toctree.yml Normal file
View File

@@ -0,0 +1,238 @@
- sections:
- local: index
title: 🧨 Diffusers
- local: quicktour
title: 快速入门
- local: stable_diffusion
title: Stable Diffusion
- local: installation
title: 安装
title: 开始
- sections:
- local: tutorials/basic_training
title: Train a diffusion model
title: Tutorials
- sections:
- sections:
- local: using-diffusers/loading
title: Loading Pipelines, Models, and Schedulers
- local: using-diffusers/schedulers
title: Using different Schedulers
- local: using-diffusers/configuration
title: Configuring Pipelines, Models, and Schedulers
- local: using-diffusers/custom_pipeline_overview
title: Loading and Adding Custom Pipelines
- local: using-diffusers/kerascv
title: Using KerasCV Stable Diffusion Checkpoints in Diffusers
title: Loading & Hub
- sections:
- local: using-diffusers/unconditional_image_generation
title: Unconditional Image Generation
- local: using-diffusers/conditional_image_generation
title: Text-to-Image Generation
- local: using-diffusers/img2img
title: Text-Guided Image-to-Image
- local: using-diffusers/inpaint
title: Text-Guided Image-Inpainting
- local: using-diffusers/depth2img
title: Text-Guided Depth-to-Image
- local: using-diffusers/controlling_generation
title: Controlling generation
- local: using-diffusers/reusing_seeds
title: Reusing seeds for deterministic generation
- local: using-diffusers/reproducibility
title: Reproducibility
- local: using-diffusers/custom_pipeline_examples
title: Community Pipelines
- local: using-diffusers/contribute_pipeline
title: How to contribute a Pipeline
- local: using-diffusers/using_safetensors
title: Using safetensors
title: Pipelines for Inference
- sections:
- local: using-diffusers/rl
title: Reinforcement Learning
- local: using-diffusers/audio
title: Audio
- local: using-diffusers/other-modalities
title: Other Modalities
title: Taking Diffusers Beyond Images
title: Using Diffusers
- sections:
- local: optimization/fp16
title: Memory and Speed
- local: optimization/torch2.0
title: Torch2.0 support
- local: optimization/xformers
title: xFormers
- local: optimization/onnx
title: ONNX
- local: optimization/open_vino
title: OpenVINO
- local: optimization/mps
title: MPS
- local: optimization/habana
title: Habana Gaudi
title: Optimization/Special Hardware
- sections:
- local: training/overview
title: Overview
- local: training/unconditional_training
title: Unconditional Image Generation
- local: training/text_inversion
title: Textual Inversion
- local: training/dreambooth
title: DreamBooth
- local: training/text2image
title: Text-to-image
- local: training/lora
title: Low-Rank Adaptation of Large Language Models (LoRA)
title: Training
- sections:
- local: conceptual/philosophy
title: Philosophy
- local: conceptual/contribution
title: How to contribute?
- local: conceptual/ethical_guidelines
title: Diffusers' Ethical Guidelines
title: Conceptual Guides
- sections:
- sections:
- local: api/models
title: Models
- local: api/diffusion_pipeline
title: Diffusion Pipeline
- local: api/logging
title: Logging
- local: api/configuration
title: Configuration
- local: api/outputs
title: Outputs
- local: api/loaders
title: Loaders
title: Main Classes
- sections:
- local: api/pipelines/overview
title: Overview
- local: api/pipelines/alt_diffusion
title: AltDiffusion
- local: api/pipelines/audio_diffusion
title: Audio Diffusion
- local: api/pipelines/cycle_diffusion
title: Cycle Diffusion
- local: api/pipelines/dance_diffusion
title: Dance Diffusion
- local: api/pipelines/ddim
title: DDIM
- local: api/pipelines/ddpm
title: DDPM
- local: api/pipelines/dit
title: DiT
- local: api/pipelines/latent_diffusion
title: Latent Diffusion
- local: api/pipelines/paint_by_example
title: PaintByExample
- local: api/pipelines/pndm
title: PNDM
- local: api/pipelines/repaint
title: RePaint
- local: api/pipelines/stable_diffusion_safe
title: Safe Stable Diffusion
- local: api/pipelines/score_sde_ve
title: Score SDE VE
- local: api/pipelines/semantic_stable_diffusion
title: Semantic Guidance
- sections:
- local: api/pipelines/stable_diffusion/overview
title: Overview
- local: api/pipelines/stable_diffusion/text2img
title: Text-to-Image
- local: api/pipelines/stable_diffusion/img2img
title: Image-to-Image
- local: api/pipelines/stable_diffusion/inpaint
title: Inpaint
- local: api/pipelines/stable_diffusion/depth2img
title: Depth-to-Image
- local: api/pipelines/stable_diffusion/image_variation
title: Image-Variation
- local: api/pipelines/stable_diffusion/upscale
title: Super-Resolution
- local: api/pipelines/stable_diffusion/latent_upscale
title: Stable-Diffusion-Latent-Upscaler
- local: api/pipelines/stable_diffusion/pix2pix
title: InstructPix2Pix
- local: api/pipelines/stable_diffusion/attend_and_excite
title: Attend and Excite
- local: api/pipelines/stable_diffusion/pix2pix_zero
title: Pix2Pix Zero
- local: api/pipelines/stable_diffusion/self_attention_guidance
title: Self-Attention Guidance
- local: api/pipelines/stable_diffusion/panorama
title: MultiDiffusion Panorama
- local: api/pipelines/stable_diffusion/controlnet
title: Text-to-Image Generation with ControlNet Conditioning
title: Stable Diffusion
- local: api/pipelines/stable_diffusion_2
title: Stable Diffusion 2
- local: api/pipelines/stable_unclip
title: Stable unCLIP
- local: api/pipelines/stochastic_karras_ve
title: Stochastic Karras VE
- local: api/pipelines/unclip
title: UnCLIP
- local: api/pipelines/latent_diffusion_uncond
title: Unconditional Latent Diffusion
- local: api/pipelines/versatile_diffusion
title: Versatile Diffusion
- local: api/pipelines/vq_diffusion
title: VQ Diffusion
title: Pipelines
- sections:
- local: api/schedulers/overview
title: Overview
- local: api/schedulers/ddim
title: DDIM
- local: api/schedulers/ddim_inverse
title: DDIMInverse
- local: api/schedulers/ddpm
title: DDPM
- local: api/schedulers/deis
title: DEIS
- local: api/schedulers/dpm_discrete
title: DPM Discrete Scheduler
- local: api/schedulers/dpm_discrete_ancestral
title: DPM Discrete Scheduler with ancestral sampling
- local: api/schedulers/euler_ancestral
title: Euler Ancestral Scheduler
- local: api/schedulers/euler
title: Euler scheduler
- local: api/schedulers/heun
title: Heun Scheduler
- local: api/schedulers/ipndm
title: IPNDM
- local: api/schedulers/lms_discrete
title: Linear Multistep
- local: api/schedulers/multistep_dpm_solver
title: Multistep DPM-Solver
- local: api/schedulers/pndm
title: PNDM
- local: api/schedulers/repaint
title: RePaint Scheduler
- local: api/schedulers/singlestep_dpm_solver
title: Singlestep DPM-Solver
- local: api/schedulers/stochastic_karras_ve
title: Stochastic Kerras VE
- local: api/schedulers/unipc
title: UniPCMultistepScheduler
- local: api/schedulers/score_sde_ve
title: VE-SDE
- local: api/schedulers/score_sde_vp
title: VP-SDE
- local: api/schedulers/vq_diffusion
title: VQDiffusionScheduler
title: Schedulers
- sections:
- local: api/experimental/rl
title: RL Planning
title: Experimental Features
title: API

78
docs/source/zh/index.mdx Normal file
View File

@@ -0,0 +1,78 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
<p align="center">
<br>
<img src="https://raw.githubusercontent.com/huggingface/diffusers/77aadfee6a891ab9fcfb780f87c693f7a5beeb8e/docs/source/imgs/diffusers_library.jpg" width="400"/>
<br>
</p>
# 🧨 Diffusers
🤗Diffusers提供了预训练好的视觉和音频扩散模型并可以作为推理和训练的模块化工具箱。
更准确地说🤗Diffusers提供了
- 最先进的扩散管道,可以在推理中仅用几行代码运行(详情看[**Using Diffusers**](./using-diffusers/conditional_image_generation))或看[**管道**](#pipelines) 以获取所有支持的管道及其对应的论文的概述。
- 可以在推理中交替使用的各种噪声调度程序,以便在推理过程中权衡如何选择速度和质量。有关更多信息,可以看[**Schedulers**](./api/schedulers/overview)。
- 多种类型的模型如U-Net可用作端到端扩散系统中的构建模块。有关更多详细信息可以看 [**Models**](./api/models) 。
- 训练示例,展示如何训练最流行的扩散模型任务。更多相关信息,可以看[**Training**](./training/overview)。
## 🧨 Diffusers pipelines
下表总结了所有官方支持的pipelines及其对应的论文部分提供了colab可以直接尝试一下。
| 管道 | 论文 | 任务 | Colab
|---|---|:---:|:---:|
| [alt_diffusion](./api/pipelines/alt_diffusion) | [**AltDiffusion**](https://arxiv.org/abs/2211.06679) | Image-to-Image Text-Guided Generation |
| [audio_diffusion](./api/pipelines/audio_diffusion) | [**Audio Diffusion**](https://github.com/teticio/audio-diffusion.git) | Unconditional Audio Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/teticio/audio-diffusion/blob/master/notebooks/audio_diffusion_pipeline.ipynb)
| [controlnet](./api/pipelines/stable_diffusion/controlnet) | [**ControlNet with Stable Diffusion**](https://arxiv.org/abs/2302.05543) | Image-to-Image Text-Guided Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/controlnet.ipynb)
| [cycle_diffusion](./api/pipelines/cycle_diffusion) | [**Cycle Diffusion**](https://arxiv.org/abs/2210.05559) | Image-to-Image Text-Guided Generation |
| [dance_diffusion](./api/pipelines/dance_diffusion) | [**Dance Diffusion**](https://github.com/williamberman/diffusers.git) | Unconditional Audio Generation |
| [ddpm](./api/pipelines/ddpm) | [**Denoising Diffusion Probabilistic Models**](https://arxiv.org/abs/2006.11239) | Unconditional Image Generation |
| [ddim](./api/pipelines/ddim) | [**Denoising Diffusion Implicit Models**](https://arxiv.org/abs/2010.02502) | Unconditional Image Generation |
| [latent_diffusion](./api/pipelines/latent_diffusion) | [**High-Resolution Image Synthesis with Latent Diffusion Models**](https://arxiv.org/abs/2112.10752)| Text-to-Image Generation |
| [latent_diffusion](./api/pipelines/latent_diffusion) | [**High-Resolution Image Synthesis with Latent Diffusion Models**](https://arxiv.org/abs/2112.10752)| Super Resolution Image-to-Image |
| [latent_diffusion_uncond](./api/pipelines/latent_diffusion_uncond) | [**High-Resolution Image Synthesis with Latent Diffusion Models**](https://arxiv.org/abs/2112.10752) | Unconditional Image Generation |
| [paint_by_example](./api/pipelines/paint_by_example) | [**Paint by Example: Exemplar-based Image Editing with Diffusion Models**](https://arxiv.org/abs/2211.13227) | Image-Guided Image Inpainting |
| [pndm](./api/pipelines/pndm) | [**Pseudo Numerical Methods for Diffusion Models on Manifolds**](https://arxiv.org/abs/2202.09778) | Unconditional Image Generation |
| [score_sde_ve](./api/pipelines/score_sde_ve) | [**Score-Based Generative Modeling through Stochastic Differential Equations**](https://openreview.net/forum?id=PxTIG12RRHS) | Unconditional Image Generation |
| [score_sde_vp](./api/pipelines/score_sde_vp) | [**Score-Based Generative Modeling through Stochastic Differential Equations**](https://openreview.net/forum?id=PxTIG12RRHS) | Unconditional Image Generation |
| [semantic_stable_diffusion](./api/pipelines/semantic_stable_diffusion) | [**Semantic Guidance**](https://arxiv.org/abs/2301.12247) | Text-Guided Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ml-research/semantic-image-editing/blob/main/examples/SemanticGuidance.ipynb)
| [stable_diffusion_text2img](./api/pipelines/stable_diffusion/text2img) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | Text-to-Image Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb)
| [stable_diffusion_img2img](./api/pipelines/stable_diffusion/img2img) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | Image-to-Image Text-Guided Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/image_2_image_using_diffusers.ipynb)
| [stable_diffusion_inpaint](./api/pipelines/stable_diffusion/inpaint) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | Text-Guided Image Inpainting | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/in_painting_with_stable_diffusion_using_diffusers.ipynb)
| [stable_diffusion_panorama](./api/pipelines/stable_diffusion/panorama) | [**MultiDiffusion**](https://multidiffusion.github.io/) | Text-to-Panorama Generation |
| [stable_diffusion_pix2pix](./api/pipelines/stable_diffusion/pix2pix) | [**InstructPix2Pix**](https://github.com/timothybrooks/instruct-pix2pix) | Text-Guided Image Editing|
| [stable_diffusion_pix2pix_zero](./api/pipelines/stable_diffusion/pix2pix_zero) | [**Zero-shot Image-to-Image Translation**](https://pix2pixzero.github.io/) | Text-Guided Image Editing |
| [stable_diffusion_attend_and_excite](./api/pipelines/stable_diffusion/attend_and_excite) | [**Attend and Excite for Stable Diffusion**](https://attendandexcite.github.io/Attend-and-Excite/) | Text-to-Image Generation |
| [stable_diffusion_self_attention_guidance](./api/pipelines/stable_diffusion/self_attention_guidance) | [**Self-Attention Guidance**](https://ku-cvlab.github.io/Self-Attention-Guidance) | Text-to-Image Generation |
| [stable_diffusion_image_variation](./stable_diffusion/image_variation) | [**Stable Diffusion Image Variations**](https://github.com/LambdaLabsML/lambda-diffusers#stable-diffusion-image-variations) | Image-to-Image Generation |
| [stable_diffusion_latent_upscale](./stable_diffusion/latent_upscale) | [**Stable Diffusion Latent Upscaler**](https://twitter.com/StabilityAI/status/1590531958815064065) | Text-Guided Super Resolution Image-to-Image |
| [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [**Stable Diffusion 2**](https://stability.ai/blog/stable-diffusion-v2-release) | Text-to-Image Generation |
| [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [**Stable Diffusion 2**](https://stability.ai/blog/stable-diffusion-v2-release) | Text-Guided Image Inpainting |
| [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [**Depth-Conditional Stable Diffusion**](https://github.com/Stability-AI/stablediffusion#depth-conditional-stable-diffusion) | Depth-to-Image Generation |
| [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [**Stable Diffusion 2**](https://stability.ai/blog/stable-diffusion-v2-release) | Text-Guided Super Resolution Image-to-Image |
| [stable_diffusion_safe](./api/pipelines/stable_diffusion_safe) | [**Safe Stable Diffusion**](https://arxiv.org/abs/2211.05105) | Text-Guided Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ml-research/safe-latent-diffusion/blob/main/examples/Safe%20Latent%20Diffusion.ipynb)
| [stable_unclip](./stable_unclip) | **Stable unCLIP** | Text-to-Image Generation |
| [stable_unclip](./stable_unclip) | **Stable unCLIP** | Image-to-Image Text-Guided Generation |
| [stochastic_karras_ve](./api/pipelines/stochastic_karras_ve) | [**Elucidating the Design Space of Diffusion-Based Generative Models**](https://arxiv.org/abs/2206.00364) | Unconditional Image Generation |
| [unclip](./api/pipelines/unclip) | [Hierarchical Text-Conditional Image Generation with CLIP Latents](https://arxiv.org/abs/2204.06125) | Text-to-Image Generation |
| [versatile_diffusion](./api/pipelines/versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Text-to-Image Generation |
| [versatile_diffusion](./api/pipelines/versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Image Variations Generation |
| [versatile_diffusion](./api/pipelines/versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Dual Image and Text Guided Generation |
| [vq_diffusion](./api/pipelines/vq_diffusion) | [Vector Quantized Diffusion Model for Text-to-Image Synthesis](https://arxiv.org/abs/2111.14822) | Text-to-Image Generation |
**注意**: 管道是如何使用相应论文中提出的扩散模型的简单示例。

View File

@@ -0,0 +1,147 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# 安装
安装🤗 Diffusers 到你正在使用的任何深度学习框架中。
🤗 Diffusers已在Python 3.7+、PyTorch 1.7.0+和Flax上进行了测试。按照下面的安装说明针对你正在使用的深度学习框架进行安装
- [PyTorch](https://pytorch.org/get-started/locally/) installation instructions.
- [Flax](https://flax.readthedocs.io/en/latest/) installation instructions.
## 使用pip安装
你需要在[虚拟环境](https://docs.python.org/3/library/venv.html)中安装🤗 Diffusers 。
如果你对 Python 虚拟环境不熟悉,可以看看这个[教程](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/).
使用虚拟环境你可以轻松管理不同的项目,避免了依赖项之间的兼容性问题。
首先,在你的项目目录下创建一个虚拟环境:
```bash
python -m venv .env
```
激活虚拟环境:
```bash
source .env/bin/activate
```
现在你就可以安装 🤗 Diffusers了使用下边这个命令
**PyTorch**
```bash
pip install diffusers["torch"]
```
**Flax**
```bash
pip install diffusers["flax"]
```
## 从源代码安装
在从源代码安装 `diffusers` 之前,你先确定你已经安装了 `torch` 和 `accelerate`。
`torch`的安装教程可以看 `torch` [文档](https://pytorch.org/get-started/locally/#start-locally).
安装 `accelerate`
```bash
pip install accelerate
```
从源码安装 🤗 Diffusers 使用以下命令:
```bash
pip install git+https://github.com/huggingface/diffusers
```
这个命令安装的是最新的 `main`版本,而不是最近的`stable`版。
`main`是一直和最新进展保持一致的。比如上次正式版发布了有bug新的正式版还没推出但是`main`中可以看到这个bug被修复了。
但是这也意味着 `main`版本并不总是稳定的。
我们努力保持`main`版本正常运行,大多数问题都能在几个小时或一天之内解决
如果你遇到了问题,可以提 [Issue](https://github.com/huggingface/transformers/issues),这样我们就能更快修复问题了。
## 可修改安装
如果你想做以下两件事,那你可能需要一个可修改代码的安装方式:
* 使用 `main`版本的源代码。
* 为 🤗 Diffusers 贡献,需要测试代码中的变化。
使用以下命令克隆并安装 🤗 Diffusers:
```bash
git clone https://github.com/huggingface/diffusers.git
cd diffusers
```
**PyTorch**
```
pip install -e ".[torch]"
```
**Flax**
```
pip install -e ".[flax]"
```
这些命令将连接你克隆的版本库和你的 Python 库路径。
现在除了正常的库路径外Python 还会在你克隆的文件夹内寻找。
例如,如果你的 Python 包通常安装在 `~/anaconda3/envs/main/lib/python3.7/Site-packages/`Python 也会搜索你克隆到的文件夹。`~/diffusers/`。
<Tip warning={true}>
如果你想继续使用这个库,你必须保留 `diffusers` 文件夹。
</Tip>
现在你可以用下面的命令轻松地将你克隆的🤗Diffusers仓库更新到最新版本。
```bash
cd ~/diffusers/
git pull
```
你的Python环境将在下次运行时找到`main`版本的🤗 Diffusers。
## 注意遥测日志
我们的库会在使用`from_pretrained()`请求期间收集信息。这些数据包括Diffusers和PyTorch/Flax的版本请求的模型或管道以及预训练检查点的路径如果它被托管在Hub上
这些使用数据有助于我们调试问题并优先考虑新功能。
当从HuggingFace Hub加载模型和管道时才会发送遥测数据并且在本地使用时不会收集数据。
我们知道并不是每个人都想分享这些的信息,我们尊重您的隐私,
因此您可以通过在终端中设置“DISABLE_TELEMETRY”环境变量来禁用遥测数据的收集
在Linux/MacOS中:
```bash
export DISABLE_TELEMETRY=YES
```
在Windows中:
```bash
set DISABLE_TELEMETRY=YES
```

View File

@@ -0,0 +1,331 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
[[open-in-colab]]
# 快速上手
训练扩散模型,是为了对随机高斯噪声进行逐步去噪,以生成令人感兴趣的样本,比如图像或者语音。
扩散模型的发展引起了人们对生成式人工智能的极大兴趣,你可能已经在网上见过扩散生成的图像了。🧨 Diffusers库的目的是让大家更易上手扩散模型。
无论你是开发人员还是普通用户,本文将向你介绍🧨 Diffusers 并帮助你快速开始生成内容!
🧨 Diffusers 库的三个主要组件:
无论你是开发者还是普通用户,这个快速指南将向你介绍🧨 Diffusers并帮助你快速使用和生成该库三个主要部分如下
* [`DiffusionPipeline`]是一个高级的端到端类,旨在通过预训练的扩散模型快速生成样本进行推理。
* 作为创建扩散系统做组件的流行的预训练[模型](./api/models)框架和模块。
* 许多不同的[调度器](./api/schedulers/overview):控制如何在训练过程中添加噪声的算法,以及如何在推理过程中生成去噪图像的算法。
快速入门将告诉你如何使用[`DiffusionPipeline`]进行推理,然后指导你如何结合模型和调度器以复现[`DiffusionPipeline`]内部发生的事情。
<Tip>
快速入门是🧨[Diffusers入门](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/diffusers_intro.ipynb)的简化版,可以帮助你快速上手。如果你想了解更多关于🧨 Diffusers的目标、设计理念以及关于它的核心API的更多细节可以点击🧨[Diffusers入门](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/diffusers_intro.ipynb)查看。
</Tip>
在开始之前,确认一下你已经安装好了所需要的库:
```bash
pip install --upgrade diffusers accelerate transformers
```
- [🤗 Accelerate](https://huggingface.co/docs/accelerate/index) 在推理和训练过程中加速模型加载。
- [🤗 Transformers](https://huggingface.co/docs/transformers/index) 是运行最流行的扩散模型所必须的库,比如[Stable Diffusion](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/overview).
## 扩散模型管道
[`DiffusionPipeline`]是用预训练的扩散系统进行推理的最简单方法。它是一个包含模型和调度器的端到端系统。你可以直接使用[`DiffusionPipeline`]完成许多任务。请查看下面的表格以了解一些支持的任务,要获取完整的支持任务列表,请查看[🧨 Diffusers 总结](./api/pipelines/overview#diffusers-summary) 。
| **任务** | **描述** | **管道**
|------------------------------|--------------------------------------------------------------------------------------------------------------|-----------------|
| Unconditional Image Generation | 从高斯噪声中生成图片 | [unconditional_image_generation](./using-diffusers/unconditional_image_generation) |
| Text-Guided Image Generation | 给定文本提示生成图像 | [conditional_image_generation](./using-diffusers/conditional_image_generation) |
| Text-Guided Image-to-Image Translation | 在文本提示的指导下调整图像 | [img2img](./using-diffusers/img2img) |
| Text-Guided Image-Inpainting | 给出图像、遮罩和文本提示,填充图像的遮罩部分 | [inpaint](./using-diffusers/inpaint) |
| Text-Guided Depth-to-Image Translation | 在文本提示的指导下调整图像的部分内容,同时通过深度估计保留其结构 | [depth2img](./using-diffusers/depth2img) |
首先创建一个[`DiffusionPipeline`]的实例并指定要下载的pipeline检查点。
你可以使用存储在Hugging Face Hub上的任何[`DiffusionPipeline`][检查点](https://huggingface.co/models?library=diffusers&sort=downloads)。
在教程中,你将加载[`stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5)检查点,用于文本到图像的生成。
首先创建一个[DiffusionPipeline]实例,并指定要下载的管道检查点。
您可以在Hugging Face Hub上使用[DiffusionPipeline]的任何检查点。
在本快速入门中您将加载stable-diffusion-v1-5检查点用于文本到图像生成。
<Tip warning={true}>。
对于[Stable Diffusion](https://huggingface.co/CompVis/stable-diffusion)模型,在运行该模型之前,请先仔细阅读[许可证](https://huggingface.co/spaces/CompVis/stable-diffusion-license)。🧨 Diffusers实现了一个[`safety_checker`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/safety_checker.py)以防止有攻击性的或有害的内容但Stable Diffusion模型改进图像的生成能力仍有可能产生潜在的有害内容。
</Tip>
用[`~DiffusionPipeline.from_pretrained`]方法加载模型。
```python
>>> from diffusers import DiffusionPipeline
>>> pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
```
[`DiffusionPipeline`]会下载并缓存所有的建模、标记化和调度组件。你可以看到Stable Diffusion的pipeline是由[`UNet2DConditionModel`]和[`PNDMScheduler`]等组件组成的:
```py
>>> pipeline
StableDiffusionPipeline {
"_class_name": "StableDiffusionPipeline",
"_diffusers_version": "0.13.1",
...,
"scheduler": [
"diffusers",
"PNDMScheduler"
],
...,
"unet": [
"diffusers",
"UNet2DConditionModel"
],
"vae": [
"diffusers",
"AutoencoderKL"
]
}
```
我们强烈建议你在GPU上运行这个pipeline因为该模型由大约14亿个参数组成。
你可以像在Pytorch里那样把生成器对象移到GPU上
```python
>>> pipeline.to("cuda")
```
现在你可以向`pipeline`传递一个文本提示来生成图像,然后获得去噪的图像。默认情况下,图像输出被放在一个[`PIL.Image`](https://pillow.readthedocs.io/en/stable/reference/Image.html?highlight=image#the-image-class)对象中。
```python
>>> image = pipeline("An image of a squirrel in Picasso style").images[0]
>>> image
```
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/image_of_squirrel_painting.png"/>
</div>
调用`save`保存图像:
```python
>>> image.save("image_of_squirrel_painting.png")
```
### 本地管道
你也可以在本地使用管道。唯一的区别是你需提前下载权重:
```
git lfs install
git clone https://huggingface.co/runwayml/stable-diffusion-v1-5
```
将下载好的权重加载到管道中:
```python
>>> pipeline = DiffusionPipeline.from_pretrained("./stable-diffusion-v1-5")
```
现在你可以像上一节中那样运行管道了。
### 更换调度器
不同的调度器对去噪速度和质量的权衡是不同的。要想知道哪种调度器最适合你,最好的办法就是试用一下。🧨 Diffusers的主要特点之一是允许你轻松切换不同的调度器。例如要用[`EulerDiscreteScheduler`]替换默认的[`PNDMScheduler`],用[`~diffusers.ConfigMixin.from_config`]方法加载即可:
```py
>>> from diffusers import EulerDiscreteScheduler
>>> pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
>>> pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config)
```
试着用新的调度器生成一个图像,看看你能否发现不同之处。
在下一节中,你将仔细观察组成[`DiffusionPipeline`]的组件——模型和调度器,并学习如何使用这些组件来生成猫咪的图像。
## 模型
大多数模型取一个噪声样本,在每个时间点预测*噪声残差*(其他模型则直接学习预测前一个样本或速度或[`v-prediction`](https://github.com/huggingface/diffusers/blob/5e5ce13e2f89ac45a0066cb3f369462a3cf1d9ef/src/diffusers/schedulers/scheduling_ddim.py#L110)),即噪声较小的图像与输入图像的差异。你可以混搭模型创建其他扩散系统。
模型是用[`~ModelMixin.from_pretrained`]方法启动的,该方法还在本地缓存了模型权重,所以下次加载模型时更快。对于快速入门,你默认加载的是[`UNet2DModel`],这是一个基础的无条件图像生成模型,该模型有一个在猫咪图像上训练的检查点:
```py
>>> from diffusers import UNet2DModel
>>> repo_id = "google/ddpm-cat-256"
>>> model = UNet2DModel.from_pretrained(repo_id)
```
想知道模型的参数,调用 `model.config`:
```py
>>> model.config
```
模型配置是一个🧊冻结的🧊字典,意思是这些参数在模型创建后就不变了。这是特意设置的,确保在开始时用于定义模型架构的参数保持不变,其他参数仍然可以在推理过程中进行调整。
一些最重要的参数:
* `sample_size`:输入样本的高度和宽度尺寸。
* `in_channels`:输入样本的输入通道数。
* `down_block_types`和`up_block_types`用于创建U-Net架构的下采样和上采样块的类型。
* `block_out_channels`:下采样块的输出通道数;也以相反的顺序用于上采样块的输入通道数。
* `layers_per_block`每个U-Net块中存在的ResNet块的数量。
为了使用该模型进行推理,用随机高斯噪声生成图像形状。它应该有一个`batch`轴,因为模型可以接收多个随机噪声,一个`channel`轴,对应于输入通道的数量,以及一个`sample_size`轴,对应图像的高度和宽度。
```py
>>> import torch
>>> torch.manual_seed(0)
>>> noisy_sample = torch.randn(1, model.config.in_channels, model.config.sample_size, model.config.sample_size)
>>> noisy_sample.shape
torch.Size([1, 3, 256, 256])
```
对于推理,将噪声图像和一个`timestep`传递给模型。`timestep` 表示输入图像的噪声程度,开始时噪声更多,结束时噪声更少。这有助于模型确定其在扩散过程中的位置,是更接近开始还是结束。使用 `sample` 获得模型输出:
```py
>>> with torch.no_grad():
... noisy_residual = model(sample=noisy_sample, timestep=2).sample
```
想生成实际的样本,你需要一个调度器指导去噪过程。在下一节中,你将学习如何把模型与调度器结合起来。
## 调度器
调度器管理一个噪声样本到一个噪声较小的样本的处理过程,给出模型输出 —— 在这种情况下,它是`noisy_residual`。
<Tip>
🧨 Diffusers是一个用于构建扩散系统的工具箱。预定义好的扩散系统[`DiffusionPipeline`]能方便你快速试用,你也可以单独选择自己的模型和调度器组件来建立一个自定义的扩散系统。
</Tip>
在快速入门教程中,你将用它的[`~diffusers.ConfigMixin.from_config`]方法实例化[`DDPMScheduler`]
```py
>>> from diffusers import DDPMScheduler
>>> scheduler = DDPMScheduler.from_config(repo_id)
>>> scheduler
DDPMScheduler {
"_class_name": "DDPMScheduler",
"_diffusers_version": "0.13.1",
"beta_end": 0.02,
"beta_schedule": "linear",
"beta_start": 0.0001,
"clip_sample": true,
"clip_sample_range": 1.0,
"num_train_timesteps": 1000,
"prediction_type": "epsilon",
"trained_betas": null,
"variance_type": "fixed_small"
}
```
<Tip>
💡 注意调度器是如何从配置中实例化的。与模型不同,调度器没有可训练的权重,而且是无参数的。
</Tip>
* `num_train_timesteps`:去噪过程的长度,或者换句话说,将随机高斯噪声处理成数据样本所需的时间步数。
* `beta_schedule`:用于推理和训练的噪声表。
* `beta_start`和`beta_end`:噪声表的开始和结束噪声值。
要预测一个噪音稍小的图像,请将 模型输出、`timestep`和当前`sample` 传递给调度器的[`~diffusers.DDPMScheduler.step`]方法:
```py
>>> less_noisy_sample = scheduler.step(model_output=noisy_residual, timestep=2, sample=noisy_sample).prev_sample
>>> less_noisy_sample.shape
```
这个 `less_noisy_sample` 去噪样本 可以被传递到下一个`timestep` ,处理后会将变得噪声更小。现在让我们把所有步骤合起来,可视化整个去噪过程。
首先,创建一个函数,对去噪后的图像进行后处理并显示为`PIL.Image`
```py
>>> import PIL.Image
>>> import numpy as np
>>> def display_sample(sample, i):
... image_processed = sample.cpu().permute(0, 2, 3, 1)
... image_processed = (image_processed + 1.0) * 127.5
... image_processed = image_processed.numpy().astype(np.uint8)
... image_pil = PIL.Image.fromarray(image_processed[0])
... display(f"Image at step {i}")
... display(image_pil)
```
将输入和模型移到GPU上加速去噪过程
```py
>>> model.to("cuda")
>>> noisy_sample = noisy_sample.to("cuda")
```
现在创建一个去噪循环,该循环预测噪声较少样本的残差,并使用调度程序计算噪声较少的样本:
```py
>>> import tqdm
>>> sample = noisy_sample
>>> for i, t in enumerate(tqdm.tqdm(scheduler.timesteps)):
... # 1. predict noise residual
... with torch.no_grad():
... residual = model(sample, t).sample
... # 2. compute less noisy image and set x_t -> x_t-1
... sample = scheduler.step(residual, t, sample).prev_sample
... # 3. optionally look at image
... if (i + 1) % 50 == 0:
... display_sample(sample, i + 1)
```
看!这样就从噪声中生成出一只猫了!😻
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/diffusion-quicktour.png"/>
</div>
## 下一步
希望你在这次快速入门教程中用🧨Diffuser 生成了一些很酷的图像! 下一步你可以:
* 在[训练](./tutorials/basic_training)教程中训练或微调一个模型来生成你自己的图像。
* 查看官方和社区的[训练或微调脚本](https://github.com/huggingface/diffusers/tree/main/examples#-diffusers-examples)的例子,了解更多使用情况。
* 在[使用不同的调度器](./using-diffusers/schedulers)指南中了解更多关于加载、访问、更改和比较调度器的信息。
* 在[Stable Diffusion](./stable_diffusion)教程中探索提示工程、速度和内存优化,以及生成更高质量图像的技巧。
* 通过[在GPU上优化PyTorch](./optimization/fp16)指南,以及运行[Apple (M1/M2)上的Stable Diffusion](./optimization/mps)和[ONNX Runtime](./optimization/onnx)的教程更深入地了解如何加速🧨Diffuser。