<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
-->
# EasyAnimate
[EasyAnimate](https://github.com/aigc-apps/EasyAnimate) by Alibaba PAI.

The description from its GitHub page:

*EasyAnimate is a pipeline based on the transformer architecture, designed for generating AI images and videos, and for training baseline models and Lora models for Diffusion Transformer. We support direct prediction from pre-trained EasyAnimate models, allowing for the generation of videos with various resolutions, approximately 6 seconds in length, at 8fps (EasyAnimateV5.1, 1 to 49 frames). Additionally, users can train their own baseline and Lora models for specific style transformations.*

This pipeline was contributed by [bubbliiiing](https://github.com/bubbliiiing). The original codebase can be found [here](https://github.com/aigc-apps/EasyAnimate). The original weights can be found under [hf.co/alibaba-pai](https://huggingface.co/alibaba-pai).

There are two official EasyAnimate checkpoints for text-to-video and video-to-video.

| checkpoints | recommended inference dtype |
|:---:|:---:|
| [`alibaba-pai/EasyAnimateV5.1-12b-zh`](https://huggingface.co/alibaba-pai/EasyAnimateV5.1-12b-zh) | torch.float16 |
| [`alibaba-pai/EasyAnimateV5.1-12b-zh-InP`](https://huggingface.co/alibaba-pai/EasyAnimateV5.1-12b-zh-InP) | torch.float16 |

There is one official EasyAnimate checkpoint available for image-to-video and video-to-video.

| checkpoints | recommended inference dtype |
|:---:|:---:|
| [`alibaba-pai/EasyAnimateV5.1-12b-zh-InP`](https://huggingface.co/alibaba-pai/EasyAnimateV5.1-12b-zh-InP) | torch.float16 |

There are two official EasyAnimate checkpoints available for control-to-video.

| checkpoints | recommended inference dtype |
|:---:|:---:|
| [`alibaba-pai/EasyAnimateV5.1-12b-zh-Control`](https://huggingface.co/alibaba-pai/EasyAnimateV5.1-12b-zh-Control) | torch.float16 |
| [`alibaba-pai/EasyAnimateV5.1-12b-zh-Control-Camera`](https://huggingface.co/alibaba-pai/EasyAnimateV5.1-12b-zh-Control-Camera) | torch.float16 |

For the EasyAnimateV5.1 series:
- Text-to-video (T2V) and image-to-video (I2V) work at multiple resolutions; the width and height can vary from 256 to 1024.
- Both T2V and I2V models support generation with 1 to 49 frames and work best within this range. Exporting videos at 8 FPS is recommended (see the sketch below).
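
The notes above map directly onto pipeline arguments. The following is a minimal text-to-video sketch, not an official recipe: the resolution, frame count, and step count are illustrative values from the supported ranges, and the `height`/`width` arguments are assumed to behave as in other Diffusers video pipelines.

```py
import torch
from diffusers import EasyAnimatePipeline
from diffusers.utils import export_to_video

pipeline = EasyAnimatePipeline.from_pretrained(
    "alibaba-pai/EasyAnimateV5.1-12b-zh", torch_dtype=torch.float16
)
pipeline.to("cuda")

prompt = "A cat walks on the grass, realistic style."
video = pipeline(
    prompt=prompt,
    num_frames=49,           # 1 to 49 frames are supported
    height=512,              # width and height can be anywhere in the 256-1024 range
    width=512,
    num_inference_steps=30,  # illustrative value, not a tuned recommendation
).frames[0]
export_to_video(video, "cat.mp4", fps=8)  # export at the recommended 8 FPS
```

Loading the full 12B model in float16 without offloading requires a high-memory GPU; the quantization example below is a lower-memory alternative.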
## Quantization
Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model.

Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`EasyAnimatePipeline`] for inference with bitsandbytes.
```py
import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, EasyAnimateTransformer3DModel, EasyAnimatePipeline
from diffusers.utils import export_to_video

# Quantize the 12B transformer to 8-bit with bitsandbytes
quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
transformer_8bit = EasyAnimateTransformer3DModel.from_pretrained(
    "alibaba-pai/EasyAnimateV5.1-12b-zh",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

# Load the pipeline with the quantized transformer
pipeline = EasyAnimatePipeline.from_pretrained(
    "alibaba-pai/EasyAnimateV5.1-12b-zh",
    transformer=transformer_8bit,
    torch_dtype=torch.float16,
    device_map="balanced",
)

prompt = "A cat walks on the grass, realistic style."
negative_prompt = "bad detailed"
video = pipeline(prompt=prompt, negative_prompt=negative_prompt, num_frames=49, num_inference_steps=30).frames[0]
export_to_video(video, "cat.mp4", fps=8)
```
## EasyAnimatePipeline
[[autodoc]] EasyAnimatePipeline
  - all
  - __call__
## EasyAnimatePipelineOutput
[[autodoc]] pipelines.easyanimate.pipeline_output.EasyAnimatePipelineOutput