* quantization config. * fix-copies * fix * modules_to_not_convert * add bitsandbytes utilities. * make progress. * fixes * quality * up * up rotary embedding refactor 2: update comments, fix dtype for use_real=False (#9312) fix notes and dtype up up * minor * up * up * fix * provide credits where due. * make configurations work. * fixes * fix * update_missing_keys * fix * fix * make it work. * fix * provide credits to transformers. * empty commit * handle to() better. * tests * change to bnb from bitsandbytes * fix tests fix slow quality tests SD3 remark fix complete int4 tests add a readme to the test files. add model cpu offload tests warning test * better safeguard. * change merging status * courtesy to transformers. * move upper. * better * make the unused kwargs warning friendlier. * harmonize changes with https://github.com/huggingface/transformers/pull/33122 * style * trainin tests * feedback part i. * Add Flux inpainting and Flux Img2Img (#9135) --------- Co-authored-by: yiyixuxu <yixu310@gmail.com> Update `UNet2DConditionModel`'s error messages (#9230) * refactor [CI] Update Single file Nightly Tests (#9357) * update * update feedback. improve README for flux dreambooth lora (#9290) * improve readme * improve readme * improve readme * improve readme fix one uncaught deprecation warning for accessing vae_latent_channels in VaeImagePreprocessor (#9372) deprecation warning vae_latent_channels add mixed int8 tests and more tests to nf4. [core] Freenoise memory improvements (#9262) * update * implement prompt interpolation * make style * resnet memory optimizations * more memory optimizations; todo: refactor * update * update animatediff controlnet with latest changes * refactor chunked inference changes * remove print statements * update * chunk -> split * remove changes from incorrect conflict resolution * remove changes from incorrect conflict resolution * add explanation of SplitInferenceModule * update docs * Revert "update docs" This reverts commitc55a50a271. * update docstring for freenoise split inference * apply suggestions from review * add tests * apply suggestions from review quantization docs. docs. * Revert "Add Flux inpainting and Flux Img2Img (#9135)" This reverts commit5799954dd4. * tests * don * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * contribution guide. * changes * empty * fix tests * harmonize with https://github.com/huggingface/transformers/pull/33546. * numpy_cosine_distance * config_dict modification. * remove if config comment. * note for load_state_dict changes. * float8 check. * quantizer. * raise an error for non-True low_cpu_mem_usage values when using quant. * low_cpu_mem_usage shenanigans when using fp32 modules. * don't re-assign _pre_quantization_type. * make comments clear. * remove comments. * handle mixed types better when moving to cpu. * add tests to check if we're throwing warning rightly. * better check. * fix 8bit test_quality. * handle dtype more robustly. * better message when keep_in_fp32_modules. * handle dtype casting. * fix dtype checks in pipeline. * fix warning message. * Update src/diffusers/models/modeling_utils.py Co-authored-by: YiYi Xu <yixu310@gmail.com> * mitigate the confusing cpu warning --------- Co-authored-by: Vishnu V Jaddipal <95531133+Gothos@users.noreply.github.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: YiYi Xu <yixu310@gmail.com>
2.1 KiB
Quantization
Quantization techniques focus on representing data with less information while also trying to not lose too much accuracy. This often means converting a data type to represent the same information with fewer bits. For example, if your model weights are stored as 32-bit floating points and they're quantized to 16-bit floating points, this halves the model size which makes it easier to store and reduces memory-usage. Lower precision can also speedup inference because it takes less time to perform calculations with fewer bits.
Interested in adding a new quantization method to Transformers? Refer to the Contribute new quantization method guide to learn more about adding a new quantization method.
If you are new to the quantization field, we recommend you to check out these beginner-friendly courses about quantization in collaboration with DeepLearning.AI:
When to use what?
This section will be expanded once Diffusers has multiple quantization backends. Currently, we only support bitsandbytes. This resource provides a good overview of the pros and cons of different quantization techniques.