[docs] add notes for stateful model changes (#3252)
* [docs] add notes for stateful model changes

* Update docs/source/en/optimization/fp16.mdx

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* link to accelerate docs for discarding hooks

---------

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
@@ -202,6 +202,8 @@ image = pipe(prompt).images[0]
**Note**: When using `enable_sequential_cpu_offload()`, it is important **not** to move the pipeline to CUDA beforehand, otherwise the memory savings will only be minimal. See [this issue](https://github.com/huggingface/diffusers/issues/1934) for more information.

**Note**: `enable_sequential_cpu_offload()` is a stateful operation that installs hooks on the models.
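A minimal sketch of the intended usage (the checkpoint name and prompt are illustrative): the pipeline is never moved to CUDA explicitly, and the installed hooks move each submodule to the GPU only for the duration of its forward pass.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)

# Do NOT call `pipe.to("cuda")` first -- the hooks installed below handle
# moving each submodule to the GPU only while it runs.
pipe.enable_sequential_cpu_offload()

image = pipe("a photo of an astronaut riding a horse on mars").images[0]
```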
<a name="model_offloading"></a>
## Model offloading for fast inference and memory savings
@@ -251,6 +253,11 @@ image = pipe(prompt).images[0]
This feature requires `accelerate` version 0.17.0 or higher.
</Tip>
**Note**: `enable_model_cpu_offload()` is a stateful operation that installs hooks on the models and state on the pipeline. To offload the models properly after they are called, the entire pipeline must be run, with the models invoked in the order the pipeline expects them. Exercise caution if models are reused outside the context of the pipeline after hooks have been installed. See the [accelerate documentation](https://huggingface.co/docs/accelerate/v0.18.0/en/package_reference/big_modeling#accelerate.hooks.remove_hook_from_module) for removing hooks.
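As a hedged sketch (assuming `pipe` is a pipeline on which `enable_model_cpu_offload()` was previously called), a component such as the UNet could have its hooks discarded with `accelerate` before being reused on its own:

```python
from accelerate.hooks import remove_hook_from_module

# Remove the offload hook (and hooks on submodules) so the UNet can be used
# outside the pipeline without being moved back to the CPU afterwards.
remove_hook_from_module(pipe.unet, recurse=True)

# The component can now be placed and used manually.
pipe.unet.to("cuda")
```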
## Using Channels Last memory format
Channels last memory format is an alternative way of ordering NCHW tensors in memory that preserves the dimension ordering. Channels last tensors are ordered so that the channels become the densest dimension (i.e. images are stored pixel-per-pixel). Since not all operators currently support the channels last format, using it may result in worse performance, so it's best to try it and check whether it helps for your model.
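For example, the UNet of a loaded pipeline (here assumed to be `pipe`) can be converted in place:

```python
import torch

# Convert the UNet's tensors to channels last memory format (in-place).
pipe.unet.to(memory_format=torch.channels_last)

# A stride of 1 on the channel dimension of a conv weight confirms the change.
print(pipe.unet.conv_out.state_dict()["weight"].stride())
```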