From cb004ad5e6955f0622422b7ce1c13bc20086f201 Mon Sep 17 00:00:00 2001
From: Dhruv Nair
Date: Thu, 24 Jul 2025 11:03:39 +0200
Subject: [PATCH] update

---
 docs/source/en/quantization/gguf.md | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/docs/source/en/quantization/gguf.md b/docs/source/en/quantization/gguf.md
index aec0875c65..cb4be67122 100644
--- a/docs/source/en/quantization/gguf.md
+++ b/docs/source/en/quantization/gguf.md
@@ -53,6 +53,16 @@ image = pipe(prompt, generator=torch.manual_seed(0)).images[0]
 image.save("flux-gguf.png")
 ```
 
+## Using Optimized CUDA Kernels with GGUF
+
+Optimized CUDA kernels can accelerate GGUF quantized model inference by approximately 10%. This functionality requires a compatible GPU with a CUDA compute capability greater than 7 (as reported by `torch.cuda.get_device_capability`) and the [kernels](https://github.com/huggingface/kernels) library:
+
+```shell
+pip install -U kernels
+```
+
+Once installed, GGUF inference automatically uses the optimized kernels whenever they are available. Note that the CUDA kernels may introduce minor numerical differences compared to the original GGUF implementation, potentially causing subtle visual variations in generated images. To disable them, set the environment variable `DIFFUSERS_GGUF_CUDA_KERNELS=false`.
+
 ## Supported Quantization Types
 
 - BF16
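
As a quick illustration of the toggle described in the added section, the sketch below checks the documented hardware requirement and opts out of the kernel path. This is a minimal sketch, not a diffusers API: only `DIFFUSERS_GGUF_CUDA_KERNELS` and the capability threshold come from the text above, and reading the variable at import time is an assumption, so it is set before any diffusers import.

```py
import os

# Documented opt-out: disable the optimized CUDA kernels. Setting the
# variable before any diffusers import is an assumption made so the flag
# is already visible whenever the library reads it.
os.environ["DIFFUSERS_GGUF_CUDA_KERNELS"] = "false"

import torch

# Sanity-check the documented hardware requirement: the optimized
# kernels target GPUs with a CUDA compute capability greater than 7,
# e.g. (8, 0) on an A100.
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability()
    print(f"Compute capability {major}.{minor}; "
          f"meets requirement: {(major, minor) > (7, 0)}")
else:
    print("No CUDA device detected; the optimized kernels do not apply.")
```

Running the same prompt and seed with and without the variable set is a simple way to attribute any subtle visual differences to the kernel path.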