mirror of https://github.com/huggingface/diffusers.git synced 2026-01-27 17:22:53 +03:00

Steven Liu 7d4db57037 [docs] Fix quantization links (#10323 )

Update overview.md

2024-12-20 08:30:21 -08:00

Quantization

Quantization techniques represent data with less information while trying to preserve as much accuracy as possible. This often means converting a data type to represent the same information with fewer bits. For example, if your model weights are stored as 32-bit floating-point values and are quantized to 16-bit floating-point values, the model size is halved, which makes the model easier to store and reduces memory usage. Lower precision can also speed up inference because calculations with fewer bits take less time.
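The size arithmetic above can be checked with a small sketch. It is not how Diffusers quantizes a model. It uses only Python's standard `struct` module to store a handful of made-up values (the `weights` list is hypothetical) as 32-bit and then 16-bit floats, and measures both the storage saved and the rounding error introduced.

```python
import struct

# Hypothetical toy "weights"; real model weights would be large tensors.
weights = [0.1, -1.5, 3.14159, 0.0002, 42.0]
n = len(weights)

# Serialize the same values as 32-bit floats ("f") and 16-bit floats ("e").
fp32_bytes = struct.pack(f"<{n}f", *weights)
fp16_bytes = struct.pack(f"<{n}e", *weights)

# Read the 16-bit values back to see how much precision was lost.
restored = struct.unpack(f"<{n}e", fp16_bytes)
max_abs_error = max(abs(w - r) for w, r in zip(weights, restored))

print(f"fp32 size: {len(fp32_bytes)} bytes")
print(f"fp16 size: {len(fp16_bytes)} bytes")
print(f"largest rounding error: {max_abs_error:.6f}")
```

The 16-bit copy takes half the bytes, and the values read back differ slightly from the originals. That small difference is the accuracy cost that quantization trades for the smaller footprint.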

Interested in adding a new quantization method to Diffusers? Refer to the Contribute new quantization method guide to learn how.

If you are new to quantization, we recommend checking out these beginner-friendly courses about quantization in collaboration with DeepLearning.AI:

When to use what?

Diffusers currently supports the following quantization methods.

This resource provides a good overview of the pros and cons of different quantization techniques.
