
docs: promote the usage of automodel.

sayakpaul committed 2025-04-12 12:39:34 +05:30
parent 0ef29355c9
commit 61287def48
7 changed files with 35 additions and 22 deletions


@@ -26,13 +26,13 @@ The example below only quantizes the weights to int8.
```python
import torch
-from diffusers import FluxPipeline, FluxTransformer2DModel, TorchAoConfig
+from diffusers import FluxPipeline, AutoModel, TorchAoConfig
model_id = "black-forest-labs/FLUX.1-dev"
dtype = torch.bfloat16
quantization_config = TorchAoConfig("int8wo")
-transformer = FluxTransformer2DModel.from_pretrained(
+transformer = AutoModel.from_pretrained(
model_id,
subfolder="transformer",
quantization_config=quantization_config,
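    torch_dtype=dtype,  # hedged guess: the truncated hunk most likely passes the dtype here
)

# Hedged sketch of how the example continues (the hunk is cut off above):
# plug the quantized transformer into the pipeline and move it to the GPU.
pipe = FluxPipeline.from_pretrained(model_id, transformer=transformer, torch_dtype=dtype)
pipe.to("cuda")
```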
@@ -99,10 +99,11 @@ To serialize a quantized model in a given dtype, first load the model with the d
```python
import torch
-from diffusers import FluxTransformer2DModel, TorchAoConfig
+from diffusers import AutoModel, TorchAoConfig
quantization_config = TorchAoConfig("int8wo")
-transformer = FluxTransformer2DModel.from_pretrained(
+transformer = AutoModel.from_pretrained(
"black-forest-labs/Flux.1-Dev",
subfolder="transformer",
quantization_config=quantization_config,
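    torch_dtype=torch.bfloat16,  # hedged: "in a given dtype" per the surrounding prose
)
# Hedged continuation (the hunk is cut off above): serialize the quantized
# model with save_pretrained; TorchAO tensors are pickled rather than stored
# as safetensors, hence safe_serialization=False, as in the uint4wo hunk below.
transformer.save_pretrained("/path/to/flux_int8wo", safe_serialization=False)
```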
@@ -115,9 +116,9 @@ To load a serialized quantized model, use the [`~ModelMixin.from_pretrained`] me
```python
import torch
-from diffusers import FluxPipeline, FluxTransformer2DModel
+from diffusers import FluxPipeline, AutoModel
transformer = FluxTransformer2DModel.from_pretrained("/path/to/flux_int8wo", torch_dtype=torch.bfloat16, use_safetensors=False)
transformer = AutoModel.from_pretrained("/path/to/flux_int8wo", torch_dtype=torch.bfloat16, use_safetensors=False)
pipe = FluxPipeline.from_pretrained("black-forest-labs/Flux.1-Dev", transformer=transformer, torch_dtype=torch.bfloat16)
pipe.to("cuda")
@@ -131,10 +132,10 @@ If you are using `torch<=2.6.0`, some quantization methods, such as `uint4wo`, c
```python
import torch
from accelerate import init_empty_weights
-from diffusers import FluxPipeline, FluxTransformer2DModel, TorchAoConfig
+from diffusers import FluxPipeline, AutoModel, TorchAoConfig
# Serialize the model
-transformer = FluxTransformer2DModel.from_pretrained(
+transformer = AutoModel.from_pretrained(
"black-forest-labs/Flux.1-Dev",
subfolder="transformer",
quantization_config=TorchAoConfig("uint4wo"),
@@ -146,10 +147,16 @@ transformer.save_pretrained("/path/to/flux_uint4wo", safe_serialization=False, m
# Load the model
state_dict = torch.load("/path/to/flux_uint4wo/diffusion_pytorch_model.bin", weights_only=False, map_location="cpu")
with init_empty_weights():
-    transformer = FluxTransformer2DModel.from_config("/path/to/flux_uint4wo/config.json")
+    transformer = AutoModel.from_config("/path/to/flux_uint4wo/config.json")
transformer.load_state_dict(state_dict, strict=True, assign=True)
```
+<Tip>
+With torch 2.6 or higher, you can load the serialized checkpoint directly with `transformer = AutoModel.from_pretrained("/path/to/flux_uint4wo/")`.
+</Tip>
## Resources
- [TorchAO Quantization API](https://github.com/pytorch/ao/blob/main/torchao/quantization/README.md)


@@ -163,6 +163,12 @@ Models are initiated with the [`~ModelMixin.from_pretrained`] method which also
>>> model = UNet2DModel.from_pretrained(repo_id, use_safetensors=True)
```
+<Tip>
+Use `AutoModel` (`from diffusers import AutoModel`) if you are unsure which model class to use.
+</Tip>
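A minimal sketch of what this Tip suggests, reusing the `repo_id` from the snippet above:

```py
from diffusers import AutoModel

# AutoModel reads the checkpoint's config and dispatches to the matching
# class (UNet2DModel for this checkpoint), so you don't need to know it upfront.
model = AutoModel.from_pretrained(repo_id, use_safetensors=True)
```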
To access the model parameters, call `model.config`:
```py
model.config
```

@@ -31,10 +31,10 @@ To adapt your text-to-image model for inpainting, you'll need to change the numb
Initialize a [`UNet2DConditionModel`] with the pretrained text-to-image model weights, and change `in_channels` to 9. Changing the number of `in_channels` means you need to set `ignore_mismatched_sizes=True` and `low_cpu_mem_usage=False` to avoid a size mismatch error because the shape is different now.
```py
-from diffusers import UNet2DConditionModel
+from diffusers import AutoModel
model_id = "stable-diffusion-v1-5/stable-diffusion-v1-5"
-unet = UNet2DConditionModel.from_pretrained(
+unet = AutoModel.from_pretrained(
model_id,
subfolder="unet",
in_channels=9,
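    # Hedged completion of the truncated call: the two kwargs the prose above
    # says are required when changing in_channels.
    low_cpu_mem_usage=False,
    ignore_mismatched_sizes=True,
)
```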


@@ -165,10 +165,10 @@ flush()
Next, load the diffusion transformer, which has 12.5B parameters. This time, set `device_map="auto"` to automatically distribute the model across two 16GB GPUs. The `auto` strategy is backed by [Accelerate](https://hf.co/docs/accelerate/index) and available as part of the [Big Model Inference](https://hf.co/docs/accelerate/concept_guides/big_model_inference) feature. It distributes a model across the fastest device first (the GPU) before moving to slower devices such as the CPU and hard drive if needed. The trade-off of storing model parameters on slower devices is higher inference latency.
```py
-from diffusers import FluxTransformer2DModel
+from diffusers import AutoModel
import torch
-transformer = FluxTransformer2DModel.from_pretrained(
+transformer = AutoModel.from_pretrained(
"black-forest-labs/FLUX.1-dev",
subfolder="transformer",
device_map="auto",


@@ -32,9 +32,9 @@ The denoiser checkpoint can also have multiple shards and supports inference tha
For example, let's save a sharded checkpoint for the [SDXL UNet](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/tree/main/unet):
```python
-from diffusers import UNet2DConditionModel
+from diffusers import AutoModel
-unet = UNet2DConditionModel.from_pretrained(
+unet = AutoModel.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0", subfolder="unet"
)
unet.save_pretrained("sdxl-unet-sharded", max_shard_size="5GB")
@@ -43,10 +43,10 @@ unet.save_pretrained("sdxl-unet-sharded", max_shard_size="5GB")
The size of the fp32 variant of the SDXL UNet checkpoint is ~10.4GB. Set the `max_shard_size` parameter to 5GB to create 3 shards. After saving, you can load them in [`StableDiffusionXLPipeline`]:
```python
-from diffusers import UNet2DConditionModel, StableDiffusionXLPipeline
+from diffusers import AutoModel, StableDiffusionXLPipeline
import torch
-unet = UNet2DConditionModel.from_pretrained(
+unet = AutoModel.from_pretrained(
"sayakpaul/sdxl-unet-sharded", torch_dtype=torch.float16
)
pipeline = StableDiffusionXLPipeline.from_pretrained(
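    # Hedged completion of the truncated call: pass the sharded UNet in
    # alongside the base checkpoint.
    "stabilityai/stable-diffusion-xl-base-1.0",
    unet=unet,
    torch_dtype=torch.float16,
).to("cuda")
```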


@@ -134,7 +134,7 @@ The [`~loaders.StableDiffusionLoraLoaderMixin.load_lora_weights`] method loads L
- the LoRA weights don't have separate identifiers for the UNet and text encoder
- the LoRA weights have separate identifiers for the UNet and text encoder
-To directly load (and save) a LoRA adapter at the *model-level*, use [`~PeftAdapterMixin.load_lora_adapter`], which builds and prepares the necessary model configuration for the adapter. Like [`~loaders.StableDiffusionLoraLoaderMixin.load_lora_weights`], [`PeftAdapterMixin.load_lora_adapter`] can load LoRAs for both the UNet and text encoder. For example, if you're loading a LoRA for the UNet, [`PeftAdapterMixin.load_lora_adapter`] ignores the keys for the text encoder.
+To directly load (and save) a LoRA adapter at the *model-level*, use [`~loaders.PeftAdapterMixin.load_lora_adapter`], which builds and prepares the necessary model configuration for the adapter. Like [`~loaders.StableDiffusionLoraLoaderMixin.load_lora_weights`], [`~loaders.PeftAdapterMixin.load_lora_adapter`] can load LoRAs for both the UNet and text encoder. For example, if you're loading a LoRA for the UNet, [`~loaders.PeftAdapterMixin.load_lora_adapter`] ignores the keys for the text encoder.
Use the `weight_name` parameter to select a specific weight file and the `prefix` parameter to filter the state dict for the appropriate keys to load (`"unet"` in this case), as sketched below.
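For illustration, a hedged sketch of such a call (the LoRA repository and weight file name are hypothetical; the base checkpoint reuses one from earlier in this commit):

```py
from diffusers import AutoModel

unet = AutoModel.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", subfolder="unet"
)
# Load only the UNet keys from the LoRA checkpoint; weight_name picks the
# file, prefix filters the state dict down to the "unet" keys.
unet.load_lora_adapter(
    "someuser/some-sd15-lora",
    weight_name="pytorch_lora_weights.safetensors",
    prefix="unet",
)
```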
@@ -155,7 +155,7 @@ image
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/load_attn_proc.png" />
</div>
-Save an adapter with [`~PeftAdapterMixin.save_lora_adapter`].
+Save an adapter with [`~loaders.PeftAdapterMixin.save_lora_adapter`].
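A minimal sketch (the output directory is hypothetical):

```py
# Persist the currently loaded adapter at the model level.
unet.save_lora_adapter("path/to/unet-lora")
```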
To unload the LoRA weights, use the [`~loaders.StableDiffusionLoraLoaderMixin.unload_lora_weights`] method to discard the LoRA weights and restore the model to its original weights:
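For example, assuming a `pipeline` with LoRA weights already loaded:

```py
pipeline.unload_lora_weights()
```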


@@ -66,10 +66,10 @@ Let's dive deeper into what these steps entail.
1. Load a UNet that corresponds to the UNet in the LoRA checkpoint. In this case, both LoRAs use the SDXL UNet as their base model.
```python
-from diffusers import UNet2DConditionModel
+from diffusers import AutoModel
import torch
-unet = UNet2DConditionModel.from_pretrained(
+unet = AutoModel.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0",
torch_dtype=torch.float16,
use_safetensors=True,
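    subfolder="unet",  # hedged completion of the truncated call
).to("cuda")
```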
@@ -136,7 +136,7 @@ feng_peft_model.load_state_dict(original_state_dict, strict=True)
```python
from peft import PeftModel
-base_unet = UNet2DConditionModel.from_pretrained(
+base_unet = AutoModel.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0",
torch_dtype=torch.float16,
use_safetensors=True,
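    subfolder="unet",  # hedged completion of the truncated call
).to("cuda")

# Hedged sketch of the step this snippet builds toward: wrap the freshly
# loaded base UNet with a previously saved PEFT adapter (the directory and
# adapter name are hypothetical).
model = PeftModel.from_pretrained(base_unet, "path/to/saved-lora", adapter_name="feng")
```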