mirror of
https://github.com/huggingface/diffusers.git
synced 2026-01-27 17:22:53 +03:00
[Pipiline] Wuerstchen v3 aka Stable Cascasde pipeline (#6487)
* initial diffNext v3 * move to v3 folder * imports * dry up the unets * no switch_level * fix init * add switch_level tp config * Fixed some things * Added pooled text embeddings * Initial work on adding image encoder * changes from @dome272 * Stuff for the image encoder processing and variable naming in decoder * fix arg name * inference fixes * inference fixes * default TimestepBlock without conds * c_skip=0 by default * fix bfloat16 to cpu * use config * undo temp change * fix gen_c_embeddings args * change text encoding * text encoding * undo print * undo .gitignore change * Allow WuerstchenV3PriorPipeline to use the base DDPM & DDIM schedulers * use WuerstchenV3Unet in both pipelines * fix imports * initial failing tests * cleanup * use scheduler.timesterps * some fixes to the tests, still not fully working * fix tests * fix prior tests * add dropout to the model_kwargs * more tests passing * update expected_slice * initial rename * rename tests * rename class names * make fix-copies * initial docs * autodocs * typos * fix arg docs * add text_encoder info * combined pipeline has optional image arg * fix documentation * Update src/diffusers/pipelines/stable_cascade/modeling_stable_cascade_common.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/diffusers/pipelines/stable_cascade/modeling_stable_cascade_common.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/diffusers/pipelines/stable_cascade/modeling_stable_cascade_common.py Co-authored-by: YiYi Xu <yixu310@gmail.com> * Update src/diffusers/pipelines/stable_cascade/modeling_stable_cascade_common.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/diffusers/pipelines/stable_cascade/pipeline_stable_cascade.py Co-authored-by: YiYi Xu <yixu310@gmail.com> * Update src/diffusers/pipelines/stable_cascade/modeling_stable_cascade_common.py Co-authored-by: YiYi Xu <yixu310@gmail.com> * use self.config * Update src/diffusers/pipelines/stable_cascade/modeling_stable_cascade_common.py Co-authored-by: YiYi Xu <yixu310@gmail.com> * c_in -> in_channels * removed kwargs from unet's forward * Update src/diffusers/pipelines/stable_cascade/pipeline_stable_cascade.py Co-authored-by: YiYi Xu <yixu310@gmail.com> * Update src/diffusers/pipelines/stable_cascade/pipeline_stable_cascade.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * remove older callback api * removed kwargs and fixed decoder guidance > 1 * decoder takes emeds * check and use image_embeds * fixed all but one decoder test * fix decoder tests * update callback api * fix some more combined tests * push combined pipeline * initial docs * fix doc_string * update combined api * no test_callback_inputs test for combined pipeline * add optional components * fix ordering of components * fix combined tests * update convert script * Update src/diffusers/pipelines/stable_cascade/pipeline_stable_cascade_prior.py Co-authored-by: YiYi Xu <yixu310@gmail.com> * Update src/diffusers/pipelines/stable_cascade/pipeline_stable_cascade_prior.py Co-authored-by: YiYi Xu <yixu310@gmail.com> * Update src/diffusers/pipelines/stable_cascade/pipeline_stable_cascade_prior.py Co-authored-by: YiYi Xu <yixu310@gmail.com> * fix imports * move effnet out of deniosing loop * prompt_embeds_pooled only when doing guidance * Fix repeat shape * move StableCascadeUnet to models/unets/ * more descriptive names * converted when numpy() * StableCascadePriorPipelineOutput docs * rename StableCascadeUNet * add slow tests * fix slow tests * update * update * updated model_path * add args for weights * set push_to_hub to false * update * update * update * update * update * update * update * update * update * update * update * update * update * update --------- Co-authored-by: Dominic Rampas <d6582533@gmail.com> Co-authored-by: Pablo Pernias <pablo@pernias.com> Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: YiYi Xu <yixu310@gmail.com> Co-authored-by: 99991 <99991@users.noreply.github.com> Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
This commit is contained in:
@@ -318,6 +318,8 @@
|
||||
title: Semantic Guidance
|
||||
- local: api/pipelines/shap_e
|
||||
title: Shap-E
|
||||
- local: api/pipelines/stable_cascade
|
||||
title: Stable Cascade
|
||||
- sections:
|
||||
- local: api/pipelines/stable_diffusion/overview
|
||||
title: Overview
|
||||
|
||||
88
docs/source/en/api/pipelines/stable_cascade.md
Normal file
88
docs/source/en/api/pipelines/stable_cascade.md
Normal file
@@ -0,0 +1,88 @@
|
||||
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# Stable Cascade
|
||||
|
||||
This model is built upon the [Würstchen](https://openreview.net/forum?id=gU58d5QeGv) architecture and its main
|
||||
difference to other models like Stable Diffusion is that it is working at a much smaller latent space. Why is this
|
||||
important? The smaller the latent space, the **faster** you can run inference and the **cheaper** the training becomes.
|
||||
How small is the latent space? Stable Diffusion uses a compression factor of 8, resulting in a 1024x1024 image being
|
||||
encoded to 128x128. Stable Cascade achieves a compression factor of 42, meaning that it is possible to encode a
|
||||
1024x1024 image to 24x24, while maintaining crisp reconstructions. The text-conditional model is then trained in the
|
||||
highly compressed latent space. Previous versions of this architecture, achieved a 16x cost reduction over Stable
|
||||
Diffusion 1.5.
|
||||
|
||||
Therefore, this kind of model is well suited for usages where efficiency is important. Furthermore, all known extensions
|
||||
like finetuning, LoRA, ControlNet, IP-Adapter, LCM etc. are possible with this method as well.
|
||||
|
||||
The original codebase can be found at [Stability-AI/StableCascade](https://github.com/Stability-AI/StableCascade).
|
||||
|
||||
## Model Overview
|
||||
Stable Cascade consists of three models: Stage A, Stage B and Stage C, representing a cascade to generate images,
|
||||
hence the name "Stable Cascade".
|
||||
|
||||
Stage A & B are used to compress images, similar to what the job of the VAE is in Stable Diffusion.
|
||||
However, with this setup, a much higher compression of images can be achieved. While the Stable Diffusion models use a
|
||||
spatial compression factor of 8, encoding an image with resolution of 1024 x 1024 to 128 x 128, Stable Cascade achieves
|
||||
a compression factor of 42. This encodes a 1024 x 1024 image to 24 x 24, while being able to accurately decode the
|
||||
image. This comes with the great benefit of cheaper training and inference. Furthermore, Stage C is responsible
|
||||
for generating the small 24 x 24 latents given a text prompt.
|
||||
|
||||
## Uses
|
||||
|
||||
### Direct Use
|
||||
|
||||
The model is intended for research purposes for now. Possible research areas and tasks include
|
||||
|
||||
- Research on generative models.
|
||||
- Safe deployment of models which have the potential to generate harmful content.
|
||||
- Probing and understanding the limitations and biases of generative models.
|
||||
- Generation of artworks and use in design and other artistic processes.
|
||||
- Applications in educational or creative tools.
|
||||
|
||||
Excluded uses are described below.
|
||||
|
||||
### Out-of-Scope Use
|
||||
|
||||
The model was not trained to be factual or true representations of people or events,
|
||||
and therefore using the model to generate such content is out-of-scope for the abilities of this model.
|
||||
The model should not be used in any way that violates Stability AI's [Acceptable Use Policy](https://stability.ai/use-policy).
|
||||
|
||||
## Limitations and Bias
|
||||
|
||||
### Limitations
|
||||
- Faces and people in general may not be generated properly.
|
||||
- The autoencoding part of the model is lossy.
|
||||
|
||||
|
||||
## StableCascadeCombinedPipeline
|
||||
|
||||
[[autodoc]] StableCascadeCombinedPipeline
|
||||
- all
|
||||
- __call__
|
||||
|
||||
## StableCascadePriorPipeline
|
||||
|
||||
[[autodoc]] StableCascadePriorPipeline
|
||||
- all
|
||||
- __call__
|
||||
|
||||
## StableCascadePriorPipelineOutput
|
||||
|
||||
[[autodoc]] pipelines.stable_cascade.pipeline_stable_cascade_prior.StableCascadePriorPipelineOutput
|
||||
|
||||
## StableCascadeDecoderPipeline
|
||||
|
||||
[[autodoc]] StableCascadeDecoderPipeline
|
||||
- all
|
||||
- __call__
|
||||
|
||||
Reference in New Issue
Block a user