
# Quantization

Quantization techniques reduce memory and computational costs by representing weights and activations with lower-precision data types like 8-bit integers (int8). This lets you load larger models that normally wouldn't fit into memory, and it speeds up inference.
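As a rough illustration of the savings, the sketch below computes the weights-only footprint of a hypothetical 12B-parameter model at a few precisions (the parameter count is an assumption for illustration; real memory use also includes activations and framework overhead):

```python
# Weights-only memory footprint of a hypothetical 12B-parameter model
# at different precisions (ignores activations and framework overhead).
num_params = 12e9

for name, bytes_per_param in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1), ("int4", 0.5)]:
    gib = num_params * bytes_per_param / 1024**3
    print(f"{name:>9}: ~{gib:.1f} GiB")
```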

Learn how to quantize models in the Quantization guide.

## PipelineQuantizationConfig

[[autodoc]] quantizers.PipelineQuantizationConfig
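A minimal sketch of pipeline-level quantization, assuming the bitsandbytes 4-bit backend is installed; the checkpoint and the `components_to_quantize` list are illustrative choices:

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.quantizers import PipelineQuantizationConfig

# Quantize selected pipeline components with the bitsandbytes 4-bit backend.
quant_config = PipelineQuantizationConfig(
    quant_backend="bitsandbytes_4bit",
    quant_kwargs={
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_compute_dtype": torch.bfloat16,
    },
    components_to_quantize=["transformer", "text_encoder_2"],
)

pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
```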

## BitsAndBytesConfig

[[autodoc]] BitsAndBytesConfig
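A minimal sketch of quantizing a single model component with bitsandbytes 4-bit NF4 quantization (the checkpoint and model class are illustrative; `bitsandbytes` must be installed):

```python
import torch
from diffusers import BitsAndBytesConfig, FluxTransformer2DModel

# 4-bit NF4 quantization for a single model component.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
```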

## GGUFQuantizationConfig

[[autodoc]] GGUFQuantizationConfig
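A minimal sketch of loading a GGUF-quantized checkpoint via `from_single_file`; the checkpoint URL below is an illustrative community quantization, not a canonical source:

```python
import torch
from diffusers import FluxTransformer2DModel, GGUFQuantizationConfig

# Illustrative GGUF checkpoint; substitute any compatible single-file
# GGUF checkpoint for the model class you are loading.
ckpt_path = "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q2_K.gguf"

transformer = FluxTransformer2DModel.from_single_file(
    ckpt_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
```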

## QuantoConfig

[[autodoc]] QuantoConfig
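A minimal sketch of int8 weight quantization with the optimum-quanto backend; the `weights_dtype` argument name and the checkpoint are assumptions here (check the autodoc above for the exact signature):

```python
import torch
from diffusers import FluxTransformer2DModel, QuantoConfig

# int8 weight quantization via optimum-quanto (must be installed).
quant_config = QuantoConfig(weights_dtype="int8")

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
```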

## TorchAoConfig

[[autodoc]] TorchAoConfig
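A minimal sketch of weight-only int8 quantization with the torchao backend; the `"int8wo"` quantization type and the checkpoint are illustrative choices (see the autodoc above for supported types):

```python
import torch
from diffusers import FluxTransformer2DModel, TorchAoConfig

# "int8wo" = int8 weight-only quantization via torchao (must be installed).
quant_config = TorchAoConfig("int8wo")

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
```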

## DiffusersQuantizer

[[autodoc]] quantizers.base.DiffusersQuantizer