[docs] add notes for stateful model changes (#3252)
* [docs] add notes for stateful model changes

* Update docs/source/en/optimization/fp16.mdx

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* link to accelerate docs for discarding hooks

---------

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
@@ -202,6 +202,8 @@ image = pipe(prompt).images[0]
**Note**: When using `enable_sequential_cpu_offload()`, it is important **not** to move the pipeline to CUDA beforehand, otherwise the memory savings will only be minimal. See [this issue](https://github.com/huggingface/diffusers/issues/1934) for more information.

**Note**: `enable_sequential_cpu_offload()` is a stateful operation that installs hooks on the models.
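A minimal sketch of the intended usage (the checkpoint name and prompt are illustrative): the pipeline is never moved to CUDA explicitly, and the installed hooks move each submodule to the GPU only for the duration of its forward pass.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)

# Do NOT call `pipe.to("cuda")` first -- the hooks installed below handle
# moving each submodule to the GPU only while it runs.
pipe.enable_sequential_cpu_offload()

image = pipe("a photo of an astronaut riding a horse on mars").images[0]
```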
<a name="model_offloading"></a>
## Model offloading for fast inference and memory savings
@@ -251,6 +253,11 @@ image = pipe(prompt).images[0]
This feature requires `accelerate` version 0.17.0 or higher.
</Tip>
**Note**: `enable_model_cpu_offload()` is a stateful operation that installs hooks on the models and state on the pipeline. To offload the models properly after they are called, the entire pipeline must be run, with the models invoked in the order the pipeline expects them. Exercise caution if models are reused outside the context of the pipeline after hooks have been installed. See the [accelerate documentation](https://huggingface.co/docs/accelerate/v0.18.0/en/package_reference/big_modeling#accelerate.hooks.remove_hook_from_module) for removing hooks.
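As a hedged sketch (assuming `pipe` is a pipeline on which `enable_model_cpu_offload()` was previously called), a component such as the UNet could have its hooks discarded with `accelerate` before being reused on its own:

```python
from accelerate.hooks import remove_hook_from_module

# Remove the offload hook (and hooks on submodules) so the UNet can be used
# outside the pipeline without being moved back to the CPU afterwards.
remove_hook_from_module(pipe.unet, recurse=True)

# The component can now be placed and used manually.
pipe.unet.to("cuda")
```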
## Using Channels Last memory format
Channels last memory format is an alternative way of ordering NCHW tensors in memory that preserves the dimension ordering. Channels last tensors are ordered so that the channels become the densest dimension (i.e. images are stored pixel-per-pixel). Since not all operators currently support the channels last format, using it may result in worse performance, so it's best to try it and check whether it helps for your model.
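For example, the UNet of a loaded pipeline (here assumed to be `pipe`) can be converted in place:

```python
import torch

# Convert the UNet's tensors to channels last memory format (in-place).
pipe.unet.to(memory_format=torch.channels_last)

# A stride of 1 on the channel dimension of a conv weight confirms the change.
print(pipe.unet.conv_out.state_dict()["weight"].stride())
```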