[docs] slight edits to the attention backends docs. (#12394)
* slight edits to the attention backends docs.
* Update docs/source/en/optimization/attention_backends.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
@@ -11,7 +11,7 @@ specific language governing permissions and limitations under the License. -->

# Attention backends

> [!TIP]
> [!NOTE]
> The attention dispatcher is an experimental feature. Please open an issue if you have any feedback or encounter any problems.

Diffusers provides several optimized attention algorithms that are more memory- and compute-efficient through its *attention dispatcher*. The dispatcher acts as a router for managing and switching between different attention implementations and provides a unified interface for interacting with them.
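The paragraph above describes the dispatcher abstractly; the following is a minimal sketch of what routing a model through a different backend looks like in practice. The checkpoint, prompt, and the choice of the `flash` backend are illustrative assumptions, not taken from the diff.

```py
# Minimal sketch: swap the attention implementation used by a pipeline's transformer.
# Checkpoint and prompt are illustrative; "flash" requires flash-attn to be installed.
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # hypothetical checkpoint, not named in the diff
    torch_dtype=torch.bfloat16,
).to("cuda")

# The dispatcher routes every attention module in the transformer to the chosen backend.
pipeline.transformer.set_attention_backend("flash")

image = pipeline("a photo of a cat").images[0]
```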
@@ -33,7 +33,7 @@ The [`~ModelMixin.set_attention_backend`] method iterates through all the module

The example below demonstrates how to enable the `_flash_3_hub` implementation for FlashAttention-3 from the [kernels](https://github.com/huggingface/kernels) library, which lets you instantly use optimized compute kernels from the Hub without any additional setup.

> [!TIP]
> [!NOTE]
> FlashAttention-3 is not supported on non-Hopper architectures. In that case, use FlashAttention with `set_attention_backend("flash")`.

```py
@@ -78,10 +78,16 @@ with attention_backend("_flash_3_hub"):
    image = pipeline(prompt).images[0]
```
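The diff above only shows a fragment of the snippet, so here is a minimal, self-contained sketch of the two usage patterns it refers to: setting `_flash_3_hub` persistently with `set_attention_backend`, and switching it temporarily with the `attention_backend` context manager. The checkpoint and the import path for `attention_backend` are assumptions, not taken from the diff.

```py
# A sketch under assumptions: the checkpoint is illustrative, and FlashAttention-3
# via the kernels hub requires a Hopper GPU (otherwise fall back to "flash").
import torch
from diffusers import DiffusionPipeline
from diffusers import attention_backend  # assumed export; the context manager may live under diffusers.models.attention_dispatch

pipeline = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # hypothetical checkpoint, not named in the diff
    torch_dtype=torch.bfloat16,
).to("cuda")

prompt = "a photo of a cat"

# Persistently route attention through the FlashAttention-3 kernel pulled from the Hub.
pipeline.transformer.set_attention_backend("_flash_3_hub")
image = pipeline(prompt).images[0]

# Or switch the backend only for the duration of this block, as in the snippet above.
with attention_backend("_flash_3_hub"):
    image = pipeline(prompt).images[0]
```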
> [!TIP]
> Most attention backends support `torch.compile` without graph breaks and can be used to further speed up inference.
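A sketch of pairing a dispatched backend with `torch.compile`, as the tip suggests; the checkpoint, backend choice, and `fullgraph=True` are assumptions for illustration.

```py
# Sketch: compile the transformer after selecting an attention backend.
# fullgraph=True will surface any graph breaks a backend might introduce.
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # hypothetical checkpoint, not named in the diff
    torch_dtype=torch.bfloat16,
).to("cuda")

pipeline.transformer.set_attention_backend("flash")
pipeline.transformer = torch.compile(pipeline.transformer, fullgraph=True)

image = pipeline("a photo of a cat").images[0]
```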
## Available backends
Refer to the table below for a complete list of available attention backends and their variants.
<details>
<summary>Expand</summary>
| Backend Name | Family | Description |
|--------------|--------|-------------|
| `native` | [PyTorch native](https://docs.pytorch.org/docs/stable/generated/torch.nn.attention.SDPBackend.html#torch.nn.attention.SDPBackend) | Default backend using PyTorch's `scaled_dot_product_attention` |
@@ -104,3 +110,5 @@ Refer to the table below for a complete list of available attention backends and
| `_sage_qk_int8_pv_fp16_cuda` | [SageAttention](https://github.com/thu-ml/SageAttention) | INT8 QK + FP16 PV (CUDA) |
| `_sage_qk_int8_pv_fp16_triton` | [SageAttention](https://github.com/thu-ml/SageAttention) | INT8 QK + FP16 PV (Triton) |
| `xformers` | [xFormers](https://github.com/facebookresearch/xformers) | Memory-efficient attention |
</details>
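As a closing illustration, here is a sketch that tries a few of the backends listed in the table on the same prompt. Which names actually work depends on the installed packages (for example flash-attn, xformers, sageattention) and the GPU; the checkpoint is again an assumption.

```py
# Sketch: compare a handful of the backends from the table above.
# Each name is passed straight to set_attention_backend; unsupported ones will error.
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # hypothetical checkpoint, not named in the diff
    torch_dtype=torch.bfloat16,
).to("cuda")

for backend in ["native", "flash", "xformers"]:
    pipeline.transformer.set_attention_backend(backend)
    image = pipeline("a photo of a cat").images[0]
    image.save(f"cat_{backend}.png")
```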