add a tip for compile + offload
@@ -302,6 +302,13 @@ compute-bound, [group-offloading](#group-offloading) tends to be better. Group o
</Tip>

<Tip>

When using offloading, you can additionally compile the diffusion transformer/unet to get a good speed-memory trade-off. First set `torch._dynamo.config.cache_size_limit=1000`, then call `pipeline.transformer.compile()` before invoking the pipeline.

</Tip>
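
A minimal sketch of how this could look end to end, assuming a Flux checkpoint (the model name is illustrative; any pipeline with a compilable `transformer` or `unet` works the same way, and `enable_model_cpu_offload` stands in for whichever offloading method is in use):

```py
import torch
from diffusers import FluxPipeline

# The tip recommends raising torch.compile's cache size limit when
# combining compilation with offloading.
torch._dynamo.config.cache_size_limit = 1000

pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipeline.enable_model_cpu_offload()  # or another offloading method

# Compile the transformer before the first pipeline call.
pipeline.transformer.compile()

image = pipeline("a photo of an astronaut riding a horse on mars").images[0]
```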
## Layerwise casting
Layerwise casting stores weights in a smaller data format (for example, `torch.float8_e4m3fn` and `torch.float8_e5m2`) to use less memory and upcasts those weights to a higher precision like `torch.float16` or `torch.bfloat16` for computation. Certain layers (normalization- and modulation-related weights) are skipped because storing them in fp8 can degrade generation quality.
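
A hedged sketch of enabling this on a pipeline's transformer, assuming the `enable_layerwise_casting` helper exposed by diffusers model classes and an illustrative Flux checkpoint:

```py
import torch
from diffusers import FluxPipeline

pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)

# Store the transformer weights in fp8 and upcast them to bf16 for computation;
# normalization/modulation layers are skipped to preserve generation quality.
pipeline.transformer.enable_layerwise_casting(
    storage_dtype=torch.float8_e4m3fn, compute_dtype=torch.bfloat16
)

pipeline.to("cuda")
image = pipeline("a photo of an astronaut riding a horse on mars").images[0]
```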