From b389f339ec016cb83f0975c1c9cc0d7965e411f8 Mon Sep 17 00:00:00 2001
From: Dhruv Nair
Date: Wed, 18 Dec 2024 18:32:36 +0530
Subject: [PATCH] Fix Doc links in GGUF and Quantization overview docs (#10279)

* update

* Update docs/source/en/quantization/gguf.md

Co-authored-by: Aryan

---------

Co-authored-by: Aryan
---
 docs/source/en/quantization/gguf.md     | 4 ++--
 docs/source/en/quantization/overview.md | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/source/en/quantization/gguf.md b/docs/source/en/quantization/gguf.md
index dbcd1b1486..2ff2a92931 100644
--- a/docs/source/en/quantization/gguf.md
+++ b/docs/source/en/quantization/gguf.md
@@ -25,9 +25,9 @@ pip install -U gguf
 
 Since GGUF is a single file format, use [`~FromSingleFileMixin.from_single_file`] to load the model and pass in the [`GGUFQuantizationConfig`].
 
-When using GGUF checkpoints, the quantized weights remain in a low memory `dtype`(typically `torch.unint8`) and are dynamically dequantized and cast to the configured `compute_dtype` during each module's forward pass through the model. The `GGUFQuantizationConfig` allows you to set the `compute_dtype`.
+When using GGUF checkpoints, the quantized weights remain in a low-memory `dtype` (typically `torch.uint8`) and are dynamically dequantized and cast to the configured `compute_dtype` during each module's forward pass through the model. The `GGUFQuantizationConfig` allows you to set the `compute_dtype`.
 
-The functions used for dynamic dequantizatation are based on the great work done by [city96](https://github.com/city96/ComfyUI-GGUF), who created the Pytorch ports of the original (`numpy`)[https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/gguf/quants.py] implementation by [compilade](https://github.com/compilade).
+The functions used for dynamic dequantization are based on the great work done by [city96](https://github.com/city96/ComfyUI-GGUF), who created the PyTorch ports of the original [`numpy`](https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/gguf/quants.py) implementation by [compilade](https://github.com/compilade).
 
 ```python
 import torch
diff --git a/docs/source/en/quantization/overview.md b/docs/source/en/quantization/overview.md
index 6c2df7514d..3eef5238f1 100644
--- a/docs/source/en/quantization/overview.md
+++ b/docs/source/en/quantization/overview.md
@@ -33,8 +33,8 @@ If you are new to the quantization field, we recommend you to check out these be
 ## When to use what?
 
 Diffusers currently supports the following quantization methods.
-- [BitsandBytes]()
-- [TorchAO]()
-- [GGUF]()
+- [BitsandBytes](./bitsandbytes.md)
+- [TorchAO](./torchao.md)
+- [GGUF](./gguf.md)
 
 [This resource](https://huggingface.co/docs/transformers/main/en/quantization/overview#when-to-use-what) provides a good overview of the pros and cons of different quantization techniques.
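
For reviewers: the gguf.md code block is truncated in the hunk context above (only its opening lines are visible). The `from_single_file` + `GGUFQuantizationConfig` flow the edited paragraph describes looks roughly like the sketch below; the `FluxTransformer2DModel` class and the city96 checkpoint URL follow the example used in the diffusers GGUF docs and are illustrative, not part of this patch.

```python
import torch

from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

# GGUF checkpoints are single files, so they load via `from_single_file`.
# This Q2_K quant by city96 is the checkpoint referenced in the diffusers docs.
ckpt_path = "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q2_K.gguf"

# The quantized weights stay in their low-memory dtype; `compute_dtype` is what
# each module dequantizes and casts to on the fly during its forward pass.
transformer = FluxTransformer2DModel.from_single_file(
    ckpt_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)

# Plug the GGUF-quantized transformer into the regular pipeline.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()

image = pipe("A cat holding a sign that says hello world").images[0]
image.save("flux-gguf.png")
```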