From 00b179fb1afc147f87bd311f03b1ef7d747e1792 Mon Sep 17 00:00:00 2001
From: Sayak Paul
Date: Thu, 12 Jun 2025 08:49:24 +0530
Subject: [PATCH] [docs] add compilation bits to the bitsandbytes docs.
 (#11693)

* add compilation bits to the bitsandbytes docs.

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* finish

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---
 docs/source/en/quantization/bitsandbytes.md | 48 ++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/docs/source/en/quantization/bitsandbytes.md b/docs/source/en/quantization/bitsandbytes.md
index b1c130b792..dc095054e1 100644
--- a/docs/source/en/quantization/bitsandbytes.md
+++ b/docs/source/en/quantization/bitsandbytes.md
@@ -416,6 +416,54 @@ text_encoder_2_4bit.dequantize()
 transformer_4bit.dequantize()
 ```
 
+## torch.compile
+
+Speed up inference with `torch.compile`. Make sure the latest `bitsandbytes` is installed, and we also recommend installing [PyTorch nightly](https://pytorch.org/get-started/locally/).
+
+<hfoptions id="bnb">
+<hfoption id="8-bit">
+
+```py
+import torch
+from diffusers import AutoModel, BitsAndBytesConfig as DiffusersBitsAndBytesConfig
+
+# capture bitsandbytes' dynamic output shapes to avoid graph breaks with fullgraph=True
+torch._dynamo.config.capture_dynamic_output_shape_ops = True
+
+quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
+transformer_8bit = AutoModel.from_pretrained(
+    "black-forest-labs/FLUX.1-dev",
+    subfolder="transformer",
+    quantization_config=quant_config,
+    torch_dtype=torch.float16,
+)
+transformer_8bit.compile(fullgraph=True)
+```
+
+</hfoption>
+<hfoption id="4-bit">
+
+```py
+import torch
+from diffusers import AutoModel, BitsAndBytesConfig as DiffusersBitsAndBytesConfig
+
+quant_config = DiffusersBitsAndBytesConfig(load_in_4bit=True)
+transformer_4bit = AutoModel.from_pretrained(
+    "black-forest-labs/FLUX.1-dev",
+    subfolder="transformer",
+    quantization_config=quant_config,
+    torch_dtype=torch.float16,
+)
+transformer_4bit.compile(fullgraph=True)
+```
+
+</hfoption>
+</hfoptions>
+
+On an RTX 4090, compiled 4-bit Flux generation completed in 25.809 seconds versus 32.570 seconds without compilation.
+
+Check out the [benchmarking script](https://gist.github.com/sayakpaul/0db9d8eeeb3d2a0e5ed7cf0d9ca19b7d) for more details.
+
 ## Resources
 
 * [End-to-end notebook showing Flux.1 Dev inference in a free-tier Colab](https://gist.github.com/sayakpaul/c76bd845b48759e11687ac550b99d8b4)
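
For reference, below is a minimal sketch of how the compiled 4-bit transformer added in this patch might be wired into a pipeline and timed end to end. It is not the linked benchmarking script: the use of `FluxPipeline`, `enable_model_cpu_offload`, the prompt, and the warmup-then-time pattern are illustrative assumptions, and absolute numbers will vary by GPU.

```py
# Illustrative timing sketch (an assumption, not the author's gist): pair the
# compiled 4-bit transformer with FluxPipeline and time a steady-state run.
import time

import torch
from diffusers import AutoModel, FluxPipeline
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig

quant_config = DiffusersBitsAndBytesConfig(load_in_4bit=True)
transformer_4bit = AutoModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)
transformer_4bit.compile(fullgraph=True)

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer_4bit,
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()  # keeps peak VRAM manageable on a 24GB card

prompt = "a photo of a dog wearing sunglasses"

# the first call pays the one-time torch.compile cost, so time the second call
pipe(prompt, num_inference_steps=50)

torch.cuda.synchronize()
start = time.perf_counter()
image = pipe(prompt, num_inference_steps=50).images[0]
torch.cuda.synchronize()
print(f"generation: {time.perf_counter() - start:.3f}s")

image.save("flux_4bit_compiled.png")
```

Timing only the second call is the standard pattern for `torch.compile` benchmarks, since the first call includes compilation overhead that does not recur in steady state.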