[docs] minor updates to bitsandbytes docs. (#11509)
* minor updates to bitsandbytes docs.
* Apply suggestions from code review
@@ -48,7 +48,7 @@ For Ada and higher-series GPUs, we recommend changing `torch_dtype` to `torch.bf
```py
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig

import torch
from diffusers import AutoModel
from transformers import T5EncoderModel
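# --- Hedged continuation (not part of this diff): a sketch of how these imports
# are typically combined. Each library quantizes its own model: transformers
# handles the T5 text encoder, diffusers handles the Flux transformer. The
# checkpoint, subfolder names, and dtype below are assumptions mirroring the
# surrounding guide.
text_encoder_quant_config = TransformersBitsAndBytesConfig(load_in_8bit=True)
transformer_quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)

text_encoder_2_8bit = T5EncoderModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="text_encoder_2",
    quantization_config=text_encoder_quant_config,
    torch_dtype=torch.float16,
)

transformer_8bit = AutoModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=transformer_quant_config,
    torch_dtype=torch.float16,
)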
@@ -88,6 +88,8 @@ Setting `device_map="auto"` automatically fills all available space on the GPU(s
CPU, and finally, the hard drive (the absolute slowest option) if there is still not enough memory.

```py
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer_8bit,
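    # --- Hedged completion (not part of this diff): the call above is cut off
    # by the diff context. text_encoder_2_8bit is assumed from the earlier 8-bit
    # loading sketch; device_map and the generation settings are illustrative.
    text_encoder_2=text_encoder_2_8bit,
    torch_dtype=torch.float16,
    device_map="balanced",  # assumed pipeline-level placement across available devices
)

image = pipe(
    "a photo of an astronaut riding a horse on the moon",
    guidance_scale=3.5,
    num_inference_steps=50,
).images[0]
image.save("flux_8bit.png")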
@@ -132,7 +134,7 @@ For Ada and higher-series GPUs, we recommend changing `torch_dtype` to `torch.bf
```py
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig

import torch
from diffusers import AutoModel
from transformers import T5EncoderModel
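# --- Hedged continuation (not part of this diff): a sketch of the 4-bit path.
# NF4 weights with bfloat16 compute follow the Ada-and-newer recommendation in
# the hunk header; the exact values and checkpoint here are assumptions.
text_encoder_quant_config = TransformersBitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer_quant_config = DiffusersBitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

text_encoder_2_4bit = T5EncoderModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="text_encoder_2",
    quantization_config=text_encoder_quant_config,
    torch_dtype=torch.bfloat16,
)

transformer_4bit = AutoModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=transformer_quant_config,
    torch_dtype=torch.bfloat16,
)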
@@ -171,6 +173,8 @@ Let's generate an image using our quantized models.
Setting `device_map="auto"` automatically fills all available space on the GPU(s) first, then the CPU, and finally, the hard drive (the absolute slowest option) if there is still not enough memory.

```py
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer_4bit,
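    # --- Hedged completion (not part of this diff): the call above is truncated
    # by the diff context. text_encoder_2_4bit is assumed from the earlier 4-bit
    # sketch; device_map and the generation settings are illustrative.
    text_encoder_2=text_encoder_2_4bit,
    torch_dtype=torch.bfloat16,
    device_map="balanced",  # assumed pipeline-level placement
)

image = pipe(
    "a whimsical illustration of a cat reading a book",
    guidance_scale=3.5,
    num_inference_steps=50,
).images[0]
image.save("flux_4bit.png")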
@@ -214,6 +218,8 @@ Check your memory footprint with the `get_memory_footprint` method:
print(model.get_memory_footprint())
```

Note that this only tells you the memory footprint of the model params and does _not_ estimate the inference memory requirements.

Quantized models can be loaded from the [`~ModelMixin.from_pretrained`] method without needing to specify the `quantization_config` parameters:

```py
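# --- Hedged sketch (not part of this diff) of the reload claim above: a 4-bit
# quantized model saved with save_pretrained keeps its quantization settings in
# the saved config, so from_pretrained can restore it without quantization_config.
# Assumes transformer_4bit from the 4-bit section; the local path is hypothetical
# and 4-bit serialization support is assumed.
from diffusers import AutoModel

print(transformer_4bit.get_memory_footprint())  # parameter memory only, not peak inference memory

transformer_4bit.save_pretrained("./flux-transformer-nf4")  # hypothetical local path
transformer_4bit_reloaded = AutoModel.from_pretrained("./flux-transformer-nf4")
print(transformer_4bit_reloaded.get_memory_footprint())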
@@ -413,4 +419,4 @@ transformer_4bit.dequantize()
## Resources

* [End-to-end notebook showing Flux.1 Dev inference in a free-tier Colab](https://gist.github.com/sayakpaul/c76bd845b48759e11687ac550b99d8b4)
* [Training](https://gist.github.com/sayakpaul/05afd428bc089b47af7c016e42004527)
* [Training](https://github.com/huggingface/diffusers/blob/8c661ea586bf11cb2440da740dd3c4cf84679b85/examples/dreambooth/README_hidream.md#using-quantization)