From 4aa68291a9671491521733da647cb7dd2cabb236 Mon Sep 17 00:00:00 2001 From: Sian Date: Fri, 10 Mar 2023 20:55:12 +0800 Subject: [PATCH] add translated docs (#2587) * add translated docs * improve translated content * improve translated content * Modify the translation content --- docs/source/zh/_toctree.yml | 238 +++++++++++++++++++++++ docs/source/zh/index.mdx | 78 ++++++++ docs/source/zh/installation.mdx | 147 ++++++++++++++ docs/source/zh/quicktour.mdx | 331 ++++++++++++++++++++++++++++++++ 4 files changed, 794 insertions(+) create mode 100644 docs/source/zh/_toctree.yml create mode 100644 docs/source/zh/index.mdx create mode 100644 docs/source/zh/installation.mdx create mode 100644 docs/source/zh/quicktour.mdx diff --git a/docs/source/zh/_toctree.yml b/docs/source/zh/_toctree.yml new file mode 100644 index 0000000000..2d67d9c4a0 --- /dev/null +++ b/docs/source/zh/_toctree.yml @@ -0,0 +1,238 @@ +- sections: + - local: index + title: 🧨 Diffusers + - local: quicktour + title: 快速入门 + - local: stable_diffusion + title: Stable Diffusion + - local: installation + title: 安装 + title: 开始 +- sections: + - local: tutorials/basic_training + title: Train a diffusion model + title: Tutorials +- sections: + - sections: + - local: using-diffusers/loading + title: Loading Pipelines, Models, and Schedulers + - local: using-diffusers/schedulers + title: Using different Schedulers + - local: using-diffusers/configuration + title: Configuring Pipelines, Models, and Schedulers + - local: using-diffusers/custom_pipeline_overview + title: Loading and Adding Custom Pipelines + - local: using-diffusers/kerascv + title: Using KerasCV Stable Diffusion Checkpoints in Diffusers + title: Loading & Hub + - sections: + - local: using-diffusers/unconditional_image_generation + title: Unconditional Image Generation + - local: using-diffusers/conditional_image_generation + title: Text-to-Image Generation + - local: using-diffusers/img2img + title: Text-Guided Image-to-Image + - local: using-diffusers/inpaint + title: Text-Guided Image-Inpainting + - local: using-diffusers/depth2img + title: Text-Guided Depth-to-Image + - local: using-diffusers/controlling_generation + title: Controlling generation + - local: using-diffusers/reusing_seeds + title: Reusing seeds for deterministic generation + - local: using-diffusers/reproducibility + title: Reproducibility + - local: using-diffusers/custom_pipeline_examples + title: Community Pipelines + - local: using-diffusers/contribute_pipeline + title: How to contribute a Pipeline + - local: using-diffusers/using_safetensors + title: Using safetensors + title: Pipelines for Inference + - sections: + - local: using-diffusers/rl + title: Reinforcement Learning + - local: using-diffusers/audio + title: Audio + - local: using-diffusers/other-modalities + title: Other Modalities + title: Taking Diffusers Beyond Images + title: Using Diffusers +- sections: + - local: optimization/fp16 + title: Memory and Speed + - local: optimization/torch2.0 + title: Torch2.0 support + - local: optimization/xformers + title: xFormers + - local: optimization/onnx + title: ONNX + - local: optimization/open_vino + title: OpenVINO + - local: optimization/mps + title: MPS + - local: optimization/habana + title: Habana Gaudi + title: Optimization/Special Hardware +- sections: + - local: training/overview + title: Overview + - local: training/unconditional_training + title: Unconditional Image Generation + - local: training/text_inversion + title: Textual Inversion + - local: training/dreambooth + title: 
DreamBooth + - local: training/text2image + title: Text-to-image + - local: training/lora + title: Low-Rank Adaptation of Large Language Models (LoRA) + title: Training +- sections: + - local: conceptual/philosophy + title: Philosophy + - local: conceptual/contribution + title: How to contribute? + - local: conceptual/ethical_guidelines + title: Diffusers' Ethical Guidelines + title: Conceptual Guides +- sections: + - sections: + - local: api/models + title: Models + - local: api/diffusion_pipeline + title: Diffusion Pipeline + - local: api/logging + title: Logging + - local: api/configuration + title: Configuration + - local: api/outputs + title: Outputs + - local: api/loaders + title: Loaders + title: Main Classes + - sections: + - local: api/pipelines/overview + title: Overview + - local: api/pipelines/alt_diffusion + title: AltDiffusion + - local: api/pipelines/audio_diffusion + title: Audio Diffusion + - local: api/pipelines/cycle_diffusion + title: Cycle Diffusion + - local: api/pipelines/dance_diffusion + title: Dance Diffusion + - local: api/pipelines/ddim + title: DDIM + - local: api/pipelines/ddpm + title: DDPM + - local: api/pipelines/dit + title: DiT + - local: api/pipelines/latent_diffusion + title: Latent Diffusion + - local: api/pipelines/paint_by_example + title: PaintByExample + - local: api/pipelines/pndm + title: PNDM + - local: api/pipelines/repaint + title: RePaint + - local: api/pipelines/stable_diffusion_safe + title: Safe Stable Diffusion + - local: api/pipelines/score_sde_ve + title: Score SDE VE + - local: api/pipelines/semantic_stable_diffusion + title: Semantic Guidance + - sections: + - local: api/pipelines/stable_diffusion/overview + title: Overview + - local: api/pipelines/stable_diffusion/text2img + title: Text-to-Image + - local: api/pipelines/stable_diffusion/img2img + title: Image-to-Image + - local: api/pipelines/stable_diffusion/inpaint + title: Inpaint + - local: api/pipelines/stable_diffusion/depth2img + title: Depth-to-Image + - local: api/pipelines/stable_diffusion/image_variation + title: Image-Variation + - local: api/pipelines/stable_diffusion/upscale + title: Super-Resolution + - local: api/pipelines/stable_diffusion/latent_upscale + title: Stable-Diffusion-Latent-Upscaler + - local: api/pipelines/stable_diffusion/pix2pix + title: InstructPix2Pix + - local: api/pipelines/stable_diffusion/attend_and_excite + title: Attend and Excite + - local: api/pipelines/stable_diffusion/pix2pix_zero + title: Pix2Pix Zero + - local: api/pipelines/stable_diffusion/self_attention_guidance + title: Self-Attention Guidance + - local: api/pipelines/stable_diffusion/panorama + title: MultiDiffusion Panorama + - local: api/pipelines/stable_diffusion/controlnet + title: Text-to-Image Generation with ControlNet Conditioning + title: Stable Diffusion + - local: api/pipelines/stable_diffusion_2 + title: Stable Diffusion 2 + - local: api/pipelines/stable_unclip + title: Stable unCLIP + - local: api/pipelines/stochastic_karras_ve + title: Stochastic Karras VE + - local: api/pipelines/unclip + title: UnCLIP + - local: api/pipelines/latent_diffusion_uncond + title: Unconditional Latent Diffusion + - local: api/pipelines/versatile_diffusion + title: Versatile Diffusion + - local: api/pipelines/vq_diffusion + title: VQ Diffusion + title: Pipelines + - sections: + - local: api/schedulers/overview + title: Overview + - local: api/schedulers/ddim + title: DDIM + - local: api/schedulers/ddim_inverse + title: DDIMInverse + - local: api/schedulers/ddpm + title: DDPM + - local: 
api/schedulers/deis + title: DEIS + - local: api/schedulers/dpm_discrete + title: DPM Discrete Scheduler + - local: api/schedulers/dpm_discrete_ancestral + title: DPM Discrete Scheduler with ancestral sampling + - local: api/schedulers/euler_ancestral + title: Euler Ancestral Scheduler + - local: api/schedulers/euler + title: Euler scheduler + - local: api/schedulers/heun + title: Heun Scheduler + - local: api/schedulers/ipndm + title: IPNDM + - local: api/schedulers/lms_discrete + title: Linear Multistep + - local: api/schedulers/multistep_dpm_solver + title: Multistep DPM-Solver + - local: api/schedulers/pndm + title: PNDM + - local: api/schedulers/repaint + title: RePaint Scheduler + - local: api/schedulers/singlestep_dpm_solver + title: Singlestep DPM-Solver + - local: api/schedulers/stochastic_karras_ve + title: Stochastic Kerras VE + - local: api/schedulers/unipc + title: UniPCMultistepScheduler + - local: api/schedulers/score_sde_ve + title: VE-SDE + - local: api/schedulers/score_sde_vp + title: VP-SDE + - local: api/schedulers/vq_diffusion + title: VQDiffusionScheduler + title: Schedulers + - sections: + - local: api/experimental/rl + title: RL Planning + title: Experimental Features + title: API diff --git a/docs/source/zh/index.mdx b/docs/source/zh/index.mdx new file mode 100644 index 0000000000..4f952c5db7 --- /dev/null +++ b/docs/source/zh/index.mdx @@ -0,0 +1,78 @@ + + +

<!-- 🧨 Diffusers 库横幅图片 -->
+ +# 🧨 Diffusers + +🤗Diffusers提供了预训练好的视觉和音频扩散模型,并可以作为推理和训练的模块化工具箱。 + +更准确地说,🤗Diffusers提供了: + +- 最先进的扩散管道,可以在推理中仅用几行代码运行(详情看[**Using Diffusers**](./using-diffusers/conditional_image_generation))或看[**管道**](#pipelines) 以获取所有支持的管道及其对应的论文的概述。 +- 可以在推理中交替使用的各种噪声调度程序,以便在推理过程中权衡如何选择速度和质量。有关更多信息,可以看[**Schedulers**](./api/schedulers/overview)。 +- 多种类型的模型,如U-Net,可用作端到端扩散系统中的构建模块。有关更多详细信息,可以看 [**Models**](./api/models) 。 +- 训练示例,展示如何训练最流行的扩散模型任务。更多相关信息,可以看[**Training**](./training/overview)。 + + +## 🧨 Diffusers pipelines + +下表总结了所有官方支持的pipelines及其对应的论文,部分提供了colab,可以直接尝试一下。 + + +| 管道 | 论文 | 任务 | Colab +|---|---|:---:|:---:| +| [alt_diffusion](./api/pipelines/alt_diffusion) | [**AltDiffusion**](https://arxiv.org/abs/2211.06679) | Image-to-Image Text-Guided Generation | +| [audio_diffusion](./api/pipelines/audio_diffusion) | [**Audio Diffusion**](https://github.com/teticio/audio-diffusion.git) | Unconditional Audio Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/teticio/audio-diffusion/blob/master/notebooks/audio_diffusion_pipeline.ipynb) +| [controlnet](./api/pipelines/stable_diffusion/controlnet) | [**ControlNet with Stable Diffusion**](https://arxiv.org/abs/2302.05543) | Image-to-Image Text-Guided Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/controlnet.ipynb) +| [cycle_diffusion](./api/pipelines/cycle_diffusion) | [**Cycle Diffusion**](https://arxiv.org/abs/2210.05559) | Image-to-Image Text-Guided Generation | +| [dance_diffusion](./api/pipelines/dance_diffusion) | [**Dance Diffusion**](https://github.com/williamberman/diffusers.git) | Unconditional Audio Generation | +| [ddpm](./api/pipelines/ddpm) | [**Denoising Diffusion Probabilistic Models**](https://arxiv.org/abs/2006.11239) | Unconditional Image Generation | +| [ddim](./api/pipelines/ddim) | [**Denoising Diffusion Implicit Models**](https://arxiv.org/abs/2010.02502) | Unconditional Image Generation | +| [latent_diffusion](./api/pipelines/latent_diffusion) | [**High-Resolution Image Synthesis with Latent Diffusion Models**](https://arxiv.org/abs/2112.10752)| Text-to-Image Generation | +| [latent_diffusion](./api/pipelines/latent_diffusion) | [**High-Resolution Image Synthesis with Latent Diffusion Models**](https://arxiv.org/abs/2112.10752)| Super Resolution Image-to-Image | +| [latent_diffusion_uncond](./api/pipelines/latent_diffusion_uncond) | [**High-Resolution Image Synthesis with Latent Diffusion Models**](https://arxiv.org/abs/2112.10752) | Unconditional Image Generation | +| [paint_by_example](./api/pipelines/paint_by_example) | [**Paint by Example: Exemplar-based Image Editing with Diffusion Models**](https://arxiv.org/abs/2211.13227) | Image-Guided Image Inpainting | +| [pndm](./api/pipelines/pndm) | [**Pseudo Numerical Methods for Diffusion Models on Manifolds**](https://arxiv.org/abs/2202.09778) | Unconditional Image Generation | +| [score_sde_ve](./api/pipelines/score_sde_ve) | [**Score-Based Generative Modeling through Stochastic Differential Equations**](https://openreview.net/forum?id=PxTIG12RRHS) | Unconditional Image Generation | +| [score_sde_vp](./api/pipelines/score_sde_vp) | [**Score-Based Generative Modeling through Stochastic Differential Equations**](https://openreview.net/forum?id=PxTIG12RRHS) | Unconditional Image Generation | +| [semantic_stable_diffusion](./api/pipelines/semantic_stable_diffusion) | [**Semantic 
Guidance**](https://arxiv.org/abs/2301.12247) | Text-Guided Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ml-research/semantic-image-editing/blob/main/examples/SemanticGuidance.ipynb) +| [stable_diffusion_text2img](./api/pipelines/stable_diffusion/text2img) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | Text-to-Image Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) +| [stable_diffusion_img2img](./api/pipelines/stable_diffusion/img2img) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | Image-to-Image Text-Guided Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/image_2_image_using_diffusers.ipynb) +| [stable_diffusion_inpaint](./api/pipelines/stable_diffusion/inpaint) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | Text-Guided Image Inpainting | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/in_painting_with_stable_diffusion_using_diffusers.ipynb) +| [stable_diffusion_panorama](./api/pipelines/stable_diffusion/panorama) | [**MultiDiffusion**](https://multidiffusion.github.io/) | Text-to-Panorama Generation | +| [stable_diffusion_pix2pix](./api/pipelines/stable_diffusion/pix2pix) | [**InstructPix2Pix**](https://github.com/timothybrooks/instruct-pix2pix) | Text-Guided Image Editing| +| [stable_diffusion_pix2pix_zero](./api/pipelines/stable_diffusion/pix2pix_zero) | [**Zero-shot Image-to-Image Translation**](https://pix2pixzero.github.io/) | Text-Guided Image Editing | +| [stable_diffusion_attend_and_excite](./api/pipelines/stable_diffusion/attend_and_excite) | [**Attend and Excite for Stable Diffusion**](https://attendandexcite.github.io/Attend-and-Excite/) | Text-to-Image Generation | +| [stable_diffusion_self_attention_guidance](./api/pipelines/stable_diffusion/self_attention_guidance) | [**Self-Attention Guidance**](https://ku-cvlab.github.io/Self-Attention-Guidance) | Text-to-Image Generation | +| [stable_diffusion_image_variation](./stable_diffusion/image_variation) | [**Stable Diffusion Image Variations**](https://github.com/LambdaLabsML/lambda-diffusers#stable-diffusion-image-variations) | Image-to-Image Generation | +| [stable_diffusion_latent_upscale](./stable_diffusion/latent_upscale) | [**Stable Diffusion Latent Upscaler**](https://twitter.com/StabilityAI/status/1590531958815064065) | Text-Guided Super Resolution Image-to-Image | +| [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [**Stable Diffusion 2**](https://stability.ai/blog/stable-diffusion-v2-release) | Text-to-Image Generation | +| [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [**Stable Diffusion 2**](https://stability.ai/blog/stable-diffusion-v2-release) | Text-Guided Image Inpainting | +| [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [**Depth-Conditional Stable Diffusion**](https://github.com/Stability-AI/stablediffusion#depth-conditional-stable-diffusion) | Depth-to-Image Generation | +| [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [**Stable Diffusion 
2**](https://stability.ai/blog/stable-diffusion-v2-release) | Text-Guided Super Resolution Image-to-Image | +| [stable_diffusion_safe](./api/pipelines/stable_diffusion_safe) | [**Safe Stable Diffusion**](https://arxiv.org/abs/2211.05105) | Text-Guided Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ml-research/safe-latent-diffusion/blob/main/examples/Safe%20Latent%20Diffusion.ipynb) +| [stable_unclip](./stable_unclip) | **Stable unCLIP** | Text-to-Image Generation | +| [stable_unclip](./stable_unclip) | **Stable unCLIP** | Image-to-Image Text-Guided Generation | +| [stochastic_karras_ve](./api/pipelines/stochastic_karras_ve) | [**Elucidating the Design Space of Diffusion-Based Generative Models**](https://arxiv.org/abs/2206.00364) | Unconditional Image Generation | +| [unclip](./api/pipelines/unclip) | [Hierarchical Text-Conditional Image Generation with CLIP Latents](https://arxiv.org/abs/2204.06125) | Text-to-Image Generation | +| [versatile_diffusion](./api/pipelines/versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Text-to-Image Generation | +| [versatile_diffusion](./api/pipelines/versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Image Variations Generation | +| [versatile_diffusion](./api/pipelines/versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Dual Image and Text Guided Generation | +| [vq_diffusion](./api/pipelines/vq_diffusion) | [Vector Quantized Diffusion Model for Text-to-Image Synthesis](https://arxiv.org/abs/2111.14822) | Text-to-Image Generation | + + +**注意**: 管道是如何使用相应论文中提出的扩散模型的简单示例。 \ No newline at end of file diff --git a/docs/source/zh/installation.mdx b/docs/source/zh/installation.mdx new file mode 100644 index 0000000000..cda91df8a6 --- /dev/null +++ b/docs/source/zh/installation.mdx @@ -0,0 +1,147 @@ + + +# 安装 + +安装🤗 Diffusers 到你正在使用的任何深度学习框架中。 + +🤗 Diffusers已在Python 3.7+、PyTorch 1.7.0+和Flax上进行了测试。按照下面的安装说明,针对你正在使用的深度学习框架进行安装: + +- [PyTorch](https://pytorch.org/get-started/locally/) installation instructions. +- [Flax](https://flax.readthedocs.io/en/latest/) installation instructions. + +## 使用pip安装 + +你需要在[虚拟环境](https://docs.python.org/3/library/venv.html)中安装🤗 Diffusers 。 + +如果你对 Python 虚拟环境不熟悉,可以看看这个[教程](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/). + +使用虚拟环境你可以轻松管理不同的项目,避免了依赖项之间的兼容性问题。 + +首先,在你的项目目录下创建一个虚拟环境: + +```bash +python -m venv .env +``` + +激活虚拟环境: + +```bash +source .env/bin/activate +``` + +现在你就可以安装 🤗 Diffusers了!使用下边这个命令: + +**PyTorch** + +```bash +pip install diffusers["torch"] +``` + +**Flax** + +```bash +pip install diffusers["flax"] +``` + +## 从源代码安装 + +在从源代码安装 `diffusers` 之前,你先确定你已经安装了 `torch` 和 `accelerate`。 + +`torch`的安装教程可以看 `torch` [文档](https://pytorch.org/get-started/locally/#start-locally). 
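如果想确认 `torch` 是否已经安装成功,可以运行下面这个简单的检查片段(仅作示意,假设你已经按照官方文档完成了安装):

```python
import torch

print(torch.__version__)          # 打印已安装的 torch 版本
print(torch.cuda.is_available())  # 为 True 时表示可以使用 GPU
```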
+ +安装 `accelerate` + +```bash +pip install accelerate +``` + +从源码安装 🤗 Diffusers 使用以下命令: + +```bash +pip install git+https://github.com/huggingface/diffusers +``` + +这个命令安装的是最新的 `main`版本,而不是最近的`stable`版。 +`main`是一直和最新进展保持一致的。比如,上次正式版发布了,有bug,新的正式版还没推出,但是`main`中可以看到这个bug被修复了。 +但是这也意味着 `main`版本并不总是稳定的。 + +我们努力保持`main`版本正常运行,大多数问题都能在几个小时或一天之内解决 + +如果你遇到了问题,可以提 [Issue](https://github.com/huggingface/transformers/issues),这样我们就能更快修复问题了。 + +## 可修改安装 + +如果你想做以下两件事,那你可能需要一个可修改代码的安装方式: + +* 使用 `main`版本的源代码。 +* 为 🤗 Diffusers 贡献,需要测试代码中的变化。 + +使用以下命令克隆并安装 🤗 Diffusers: + +```bash +git clone https://github.com/huggingface/diffusers.git +cd diffusers +``` + +**PyTorch** + +``` +pip install -e ".[torch]" +``` + +**Flax** + +``` +pip install -e ".[flax]" +``` + +这些命令将连接你克隆的版本库和你的 Python 库路径。 +现在,除了正常的库路径外,Python 还会在你克隆的文件夹内寻找。 +例如,如果你的 Python 包通常安装在 `~/anaconda3/envs/main/lib/python3.7/Site-packages/`,Python 也会搜索你克隆到的文件夹。`~/diffusers/`。 + + + +如果你想继续使用这个库,你必须保留 `diffusers` 文件夹。 + + + + +现在你可以用下面的命令轻松地将你克隆的🤗Diffusers仓库更新到最新版本。 + +```bash +cd ~/diffusers/ +git pull +``` + +你的Python环境将在下次运行时找到`main`版本的🤗 Diffusers。 + +## 注意遥测日志 + +我们的库会在使用`from_pretrained()`请求期间收集信息。这些数据包括Diffusers和PyTorch/Flax的版本,请求的模型或管道,以及预训练检查点的路径(如果它被托管在Hub上)。 + +这些使用数据有助于我们调试问题并优先考虑新功能。 +当从HuggingFace Hub加载模型和管道时才会发送遥测数据,并且在本地使用时不会收集数据。 + +我们知道并不是每个人都想分享这些的信息,我们尊重您的隐私, +因此您可以通过在终端中设置“DISABLE_TELEMETRY”环境变量来禁用遥测数据的收集: + + +在Linux/MacOS中: +```bash +export DISABLE_TELEMETRY=YES +``` + +在Windows中: +```bash +set DISABLE_TELEMETRY=YES +``` \ No newline at end of file diff --git a/docs/source/zh/quicktour.mdx b/docs/source/zh/quicktour.mdx new file mode 100644 index 0000000000..68ab56c55a --- /dev/null +++ b/docs/source/zh/quicktour.mdx @@ -0,0 +1,331 @@ + + +[[open-in-colab]] + +# 快速上手 + +训练扩散模型,是为了对随机高斯噪声进行逐步去噪,以生成令人感兴趣的样本,比如图像或者语音。 + +扩散模型的发展引起了人们对生成式人工智能的极大兴趣,你可能已经在网上见过扩散生成的图像了。🧨 Diffusers库的目的是让大家更易上手扩散模型。 + +无论你是开发人员还是普通用户,本文将向你介绍🧨 Diffusers 并帮助你快速开始生成内容! + +🧨 Diffusers 库的三个主要组件: + + +无论你是开发者还是普通用户,这个快速指南将向你介绍🧨 Diffusers,并帮助你快速使用和生成!该库三个主要部分如下: + +* [`DiffusionPipeline`]是一个高级的端到端类,旨在通过预训练的扩散模型快速生成样本进行推理。 +* 作为创建扩散系统做组件的流行的预训练[模型](./api/models)框架和模块。 +* 许多不同的[调度器](./api/schedulers/overview):控制如何在训练过程中添加噪声的算法,以及如何在推理过程中生成去噪图像的算法。 + +快速入门将告诉你如何使用[`DiffusionPipeline`]进行推理,然后指导你如何结合模型和调度器以复现[`DiffusionPipeline`]内部发生的事情。 + + + +快速入门是🧨[Diffusers入门](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/diffusers_intro.ipynb)的简化版,可以帮助你快速上手。如果你想了解更多关于🧨 Diffusers的目标、设计理念以及关于它的核心API的更多细节,可以点击🧨[Diffusers入门](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/diffusers_intro.ipynb)查看。 + + + +在开始之前,确认一下你已经安装好了所需要的库: + +```bash +pip install --upgrade diffusers accelerate transformers +``` + +- [🤗 Accelerate](https://huggingface.co/docs/accelerate/index) 在推理和训练过程中加速模型加载。 +- [🤗 Transformers](https://huggingface.co/docs/transformers/index) 是运行最流行的扩散模型所必须的库,比如[Stable Diffusion](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/overview). 
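安装完成后,可以用下面这段简短的自检代码确认各个库都能正常导入(仅作示意,假设上面的 pip 命令已成功执行):

```python
import accelerate
import diffusers
import transformers

print("diffusers:", diffusers.__version__)
print("transformers:", transformers.__version__)
print("accelerate:", accelerate.__version__)
```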
+ +## 扩散模型管道 + +[`DiffusionPipeline`]是用预训练的扩散系统进行推理的最简单方法。它是一个包含模型和调度器的端到端系统。你可以直接使用[`DiffusionPipeline`]完成许多任务。请查看下面的表格以了解一些支持的任务,要获取完整的支持任务列表,请查看[🧨 Diffusers 总结](./api/pipelines/overview#diffusers-summary) 。 + +| **任务** | **描述** | **管道** +|------------------------------|--------------------------------------------------------------------------------------------------------------|-----------------| +| Unconditional Image Generation | 从高斯噪声中生成图片 | [unconditional_image_generation](./using-diffusers/unconditional_image_generation) | +| Text-Guided Image Generation | 给定文本提示生成图像 | [conditional_image_generation](./using-diffusers/conditional_image_generation) | +| Text-Guided Image-to-Image Translation | 在文本提示的指导下调整图像 | [img2img](./using-diffusers/img2img) | +| Text-Guided Image-Inpainting | 给出图像、遮罩和文本提示,填充图像的遮罩部分 | [inpaint](./using-diffusers/inpaint) | +| Text-Guided Depth-to-Image Translation | 在文本提示的指导下调整图像的部分内容,同时通过深度估计保留其结构 | [depth2img](./using-diffusers/depth2img) | + +首先创建一个[`DiffusionPipeline`]的实例,并指定要下载的pipeline检查点。 +你可以使用存储在Hugging Face Hub上的任何[`DiffusionPipeline`][检查点](https://huggingface.co/models?library=diffusers&sort=downloads)。 +在教程中,你将加载[`stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5)检查点,用于文本到图像的生成。 + +首先创建一个[DiffusionPipeline]实例,并指定要下载的管道检查点。 +您可以在Hugging Face Hub上使用[DiffusionPipeline]的任何检查点。 +在本快速入门中,您将加载stable-diffusion-v1-5检查点,用于文本到图像生成。 + +。 + +对于[Stable Diffusion](https://huggingface.co/CompVis/stable-diffusion)模型,在运行该模型之前,请先仔细阅读[许可证](https://huggingface.co/spaces/CompVis/stable-diffusion-license)。🧨 Diffusers实现了一个[`safety_checker`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/safety_checker.py),以防止有攻击性的或有害的内容,但Stable Diffusion模型改进图像的生成能力仍有可能产生潜在的有害内容。 + + + +用[`~DiffusionPipeline.from_pretrained`]方法加载模型。 + +```python +>>> from diffusers import DiffusionPipeline + +>>> pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5") +``` +[`DiffusionPipeline`]会下载并缓存所有的建模、标记化和调度组件。你可以看到Stable Diffusion的pipeline是由[`UNet2DConditionModel`]和[`PNDMScheduler`]等组件组成的: + +```py +>>> pipeline +StableDiffusionPipeline { + "_class_name": "StableDiffusionPipeline", + "_diffusers_version": "0.13.1", + ..., + "scheduler": [ + "diffusers", + "PNDMScheduler" + ], + ..., + "unet": [ + "diffusers", + "UNet2DConditionModel" + ], + "vae": [ + "diffusers", + "AutoencoderKL" + ] +} +``` + +我们强烈建议你在GPU上运行这个pipeline,因为该模型由大约14亿个参数组成。 + +你可以像在Pytorch里那样把生成器对象移到GPU上: + +```python +>>> pipeline.to("cuda") +``` + +现在你可以向`pipeline`传递一个文本提示来生成图像,然后获得去噪的图像。默认情况下,图像输出被放在一个[`PIL.Image`](https://pillow.readthedocs.io/en/stable/reference/Image.html?highlight=image#the-image-class)对象中。 + +```python +>>> image = pipeline("An image of a squirrel in Picasso style").images[0] +>>> image +``` + +
+ +
+ + +调用`save`保存图像: + +```python +>>> image.save("image_of_squirrel_painting.png") +``` + +### 本地管道 + +你也可以在本地使用管道。唯一的区别是你需提前下载权重: + +``` +git lfs install +git clone https://huggingface.co/runwayml/stable-diffusion-v1-5 +``` + +将下载好的权重加载到管道中: + +```python +>>> pipeline = DiffusionPipeline.from_pretrained("./stable-diffusion-v1-5") +``` + +现在你可以像上一节中那样运行管道了。 + +### 更换调度器 + +不同的调度器对去噪速度和质量的权衡是不同的。要想知道哪种调度器最适合你,最好的办法就是试用一下。🧨 Diffusers的主要特点之一是允许你轻松切换不同的调度器。例如,要用[`EulerDiscreteScheduler`]替换默认的[`PNDMScheduler`],用[`~diffusers.ConfigMixin.from_config`]方法加载即可: + +```py +>>> from diffusers import EulerDiscreteScheduler + +>>> pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5") +>>> pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config) +``` + + +试着用新的调度器生成一个图像,看看你能否发现不同之处。 + +在下一节中,你将仔细观察组成[`DiffusionPipeline`]的组件——模型和调度器,并学习如何使用这些组件来生成猫咪的图像。 + +## 模型 + +大多数模型取一个噪声样本,在每个时间点预测*噪声残差*(其他模型则直接学习预测前一个样本或速度或[`v-prediction`](https://github.com/huggingface/diffusers/blob/5e5ce13e2f89ac45a0066cb3f369462a3cf1d9ef/src/diffusers/schedulers/scheduling_ddim.py#L110)),即噪声较小的图像与输入图像的差异。你可以混搭模型创建其他扩散系统。 + +模型是用[`~ModelMixin.from_pretrained`]方法启动的,该方法还在本地缓存了模型权重,所以下次加载模型时更快。对于快速入门,你默认加载的是[`UNet2DModel`],这是一个基础的无条件图像生成模型,该模型有一个在猫咪图像上训练的检查点: + + +```py +>>> from diffusers import UNet2DModel + +>>> repo_id = "google/ddpm-cat-256" +>>> model = UNet2DModel.from_pretrained(repo_id) +``` + +想知道模型的参数,调用 `model.config`: + +```py +>>> model.config +``` + +模型配置是一个🧊冻结的🧊字典,意思是这些参数在模型创建后就不变了。这是特意设置的,确保在开始时用于定义模型架构的参数保持不变,其他参数仍然可以在推理过程中进行调整。 + +一些最重要的参数: + +* `sample_size`:输入样本的高度和宽度尺寸。 +* `in_channels`:输入样本的输入通道数。 +* `down_block_types`和`up_block_types`:用于创建U-Net架构的下采样和上采样块的类型。 +* `block_out_channels`:下采样块的输出通道数;也以相反的顺序用于上采样块的输入通道数。 +* `layers_per_block`:每个U-Net块中存在的ResNet块的数量。 + +为了使用该模型进行推理,用随机高斯噪声生成图像形状。它应该有一个`batch`轴,因为模型可以接收多个随机噪声,一个`channel`轴,对应于输入通道的数量,以及一个`sample_size`轴,对应图像的高度和宽度。 + + +```py +>>> import torch + +>>> torch.manual_seed(0) + +>>> noisy_sample = torch.randn(1, model.config.in_channels, model.config.sample_size, model.config.sample_size) +>>> noisy_sample.shape +torch.Size([1, 3, 256, 256]) +``` + +对于推理,将噪声图像和一个`timestep`传递给模型。`timestep` 表示输入图像的噪声程度,开始时噪声更多,结束时噪声更少。这有助于模型确定其在扩散过程中的位置,是更接近开始还是结束。使用 `sample` 获得模型输出: + + +```py +>>> with torch.no_grad(): +... 
noisy_residual = model(sample=noisy_sample, timestep=2).sample +``` + +想生成实际的样本,你需要一个调度器指导去噪过程。在下一节中,你将学习如何把模型与调度器结合起来。 + +## 调度器 + +调度器管理一个噪声样本到一个噪声较小的样本的处理过程,给出模型输出 —— 在这种情况下,它是`noisy_residual`。 + + + + + +🧨 Diffusers是一个用于构建扩散系统的工具箱。预定义好的扩散系统[`DiffusionPipeline`]能方便你快速试用,你也可以单独选择自己的模型和调度器组件来建立一个自定义的扩散系统。 + + + +在快速入门教程中,你将用它的[`~diffusers.ConfigMixin.from_config`]方法实例化[`DDPMScheduler`]: + +```py +>>> from diffusers import DDPMScheduler + +>>> scheduler = DDPMScheduler.from_config(repo_id) +>>> scheduler +DDPMScheduler { + "_class_name": "DDPMScheduler", + "_diffusers_version": "0.13.1", + "beta_end": 0.02, + "beta_schedule": "linear", + "beta_start": 0.0001, + "clip_sample": true, + "clip_sample_range": 1.0, + "num_train_timesteps": 1000, + "prediction_type": "epsilon", + "trained_betas": null, + "variance_type": "fixed_small" +} +``` + + + + +💡 注意调度器是如何从配置中实例化的。与模型不同,调度器没有可训练的权重,而且是无参数的。 + + + +* `num_train_timesteps`:去噪过程的长度,或者换句话说,将随机高斯噪声处理成数据样本所需的时间步数。 +* `beta_schedule`:用于推理和训练的噪声表。 +* `beta_start`和`beta_end`:噪声表的开始和结束噪声值。 + +要预测一个噪音稍小的图像,请将 模型输出、`timestep`和当前`sample` 传递给调度器的[`~diffusers.DDPMScheduler.step`]方法: + + +```py +>>> less_noisy_sample = scheduler.step(model_output=noisy_residual, timestep=2, sample=noisy_sample).prev_sample +>>> less_noisy_sample.shape +``` + +这个 `less_noisy_sample` 去噪样本 可以被传递到下一个`timestep` ,处理后会将变得噪声更小。现在让我们把所有步骤合起来,可视化整个去噪过程。 + +首先,创建一个函数,对去噪后的图像进行后处理并显示为`PIL.Image`: + +```py +>>> import PIL.Image +>>> import numpy as np + + +>>> def display_sample(sample, i): +... image_processed = sample.cpu().permute(0, 2, 3, 1) +... image_processed = (image_processed + 1.0) * 127.5 +... image_processed = image_processed.numpy().astype(np.uint8) + +... image_pil = PIL.Image.fromarray(image_processed[0]) +... display(f"Image at step {i}") +... display(image_pil) +``` + +将输入和模型移到GPU上加速去噪过程: + +```py +>>> model.to("cuda") +>>> noisy_sample = noisy_sample.to("cuda") +``` + +现在创建一个去噪循环,该循环预测噪声较少样本的残差,并使用调度程序计算噪声较少的样本: + +```py +>>> import tqdm + +>>> sample = noisy_sample + +>>> for i, t in enumerate(tqdm.tqdm(scheduler.timesteps)): +... # 1. predict noise residual +... with torch.no_grad(): +... residual = model(sample, t).sample + +... # 2. compute less noisy image and set x_t -> x_t-1 +... sample = scheduler.step(residual, t, sample).prev_sample + +... # 3. optionally look at image +... if (i + 1) % 50 == 0: +... display_sample(sample, i + 1) +``` + +看!这样就从噪声中生成出一只猫了!😻 + +
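如果想把最终的去噪结果保存为图片文件,可以参考下面的示意片段(沿用上面 `display_sample` 中同样的后处理方式,文件名仅为示例):

```py
>>> # 将最终样本从 [-1, 1] 映射到 [0, 255] 并转换为 PIL.Image
>>> image_processed = sample.cpu().permute(0, 2, 3, 1)
>>> image_processed = (image_processed + 1.0) * 127.5
>>> image_processed = image_processed.numpy().astype(np.uint8)

>>> image_pil = PIL.Image.fromarray(image_processed[0])
>>> image_pil.save("generated_cat.png")
```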
+ +
+ +## 下一步 + +希望你在这次快速入门教程中用🧨Diffuser 生成了一些很酷的图像! 下一步你可以: + +* 在[训练](./tutorials/basic_training)教程中训练或微调一个模型来生成你自己的图像。 +* 查看官方和社区的[训练或微调脚本](https://github.com/huggingface/diffusers/tree/main/examples#-diffusers-examples)的例子,了解更多使用情况。 +* 在[使用不同的调度器](./using-diffusers/schedulers)指南中了解更多关于加载、访问、更改和比较调度器的信息。 +* 在[Stable Diffusion](./stable_diffusion)教程中探索提示工程、速度和内存优化,以及生成更高质量图像的技巧。 +* 通过[在GPU上优化PyTorch](./optimization/fp16)指南,以及运行[Apple (M1/M2)上的Stable Diffusion](./optimization/mps)和[ONNX Runtime](./optimization/onnx)的教程,更深入地了解如何加速🧨Diffuser。 \ No newline at end of file