# EasyAnimate

EasyAnimate by Alibaba PAI.

The description from its GitHub page: *EasyAnimate is a pipeline based on the transformer architecture, designed for generating AI images and videos, and for training baseline models and Lora models for Diffusion Transformer. We support direct prediction from pre-trained EasyAnimate models, allowing for the generation of videos with various resolutions, approximately 6 seconds in length, at 8fps (EasyAnimateV5.1, 1 to 49 frames). Additionally, users can train their own baseline and Lora models for specific style transformations.*

This pipeline was contributed by bubbliiiing. The original codebase can be found here. The original weights can be found under hf.co/alibaba-pai.

There are two official EasyAnimate checkpoints for text-to-video and video-to-video.

| checkpoints | recommended inference dtype |
|:---:|:---:|
| `alibaba-pai/EasyAnimateV5.1-12b-zh` | `torch.float16` |
| `alibaba-pai/EasyAnimateV5.1-12b-zh-InP` | `torch.float16` |

There is one official EasyAnimate checkpoint available for image-to-video and video-to-video.

| checkpoints | recommended inference dtype |
|:---:|:---:|
| `alibaba-pai/EasyAnimateV5.1-12b-zh-InP` | `torch.float16` |

There are two official EasyAnimate checkpoints available for control-to-video.

| checkpoints | recommended inference dtype |
|:---:|:---:|
| `alibaba-pai/EasyAnimateV5.1-12b-zh-Control` | `torch.float16` |
| `alibaba-pai/EasyAnimateV5.1-12b-zh-Control-Camera` | `torch.float16` |

For the EasyAnimateV5.1 series:

- Text-to-video (T2V) and image-to-video (I2V) work at multiple resolutions; the width and height can vary from 256 to 1024.
- Both T2V and I2V models support generation with 1 to 49 frames and work best within this range. Exporting videos at 8 FPS is recommended (see the example after this list).
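To make these settings concrete, the sketch below generates a 49-frame clip with the text-to-video checkpoint and exports it at 8 FPS. It reuses the same [`EasyAnimatePipeline`] call shown in the quantization example further down; the `height`/`width` values and the `pipeline.to("cuda")` placement are illustrative assumptions for a single-GPU setup, not prescriptions from the original documentation.

```py
import torch
from diffusers import EasyAnimatePipeline
from diffusers.utils import export_to_video

# Load the text-to-video checkpoint in the recommended inference dtype.
pipeline = EasyAnimatePipeline.from_pretrained(
    "alibaba-pai/EasyAnimateV5.1-12b-zh", torch_dtype=torch.float16
)
pipeline.to("cuda")

prompt = "A cat walks on the grass, realistic style."
negative_prompt = "bad detailed"

# 49 frames is the upper end of the supported 1-49 range (roughly 6 seconds at 8 fps).
# height/width here are assumed values inside the supported 256-1024 range.
video = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=512,
    width=512,
    num_frames=49,
    num_inference_steps=30,
).frames[0]
export_to_video(video, "cat.mp4", fps=8)
```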

## Quantization

Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model.

Refer to the Quantization overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`EasyAnimatePipeline`] for inference with bitsandbytes.

```py
import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, EasyAnimateTransformer3DModel, EasyAnimatePipeline
from diffusers.utils import export_to_video

quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
transformer_8bit = EasyAnimateTransformer3DModel.from_pretrained(
    "alibaba-pai/EasyAnimateV5.1-12b-zh",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

pipeline = EasyAnimatePipeline.from_pretrained(
    "alibaba-pai/EasyAnimateV5.1-12b-zh",
    transformer=transformer_8bit,
    torch_dtype=torch.float16,
    device_map="balanced",
)

prompt = "A cat walks on the grass, realistic style."
negative_prompt = "bad detailed"
video = pipeline(prompt=prompt, negative_prompt=negative_prompt, num_frames=49, num_inference_steps=30).frames[0]
export_to_video(video, "cat.mp4", fps=8)
```
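Only the transformer is quantized in this example; the text encoder and VAE remain in `torch.float16`, and `device_map="balanced"` distributes the pipeline's components across the available devices. If further memory savings are needed, the text encoder could in principle be quantized in the same way with a matching `transformers` `BitsAndBytesConfig`, though that is not shown here.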

## EasyAnimatePipeline

[[autodoc]] EasyAnimatePipeline
  - all
  - __call__

## EasyAnimatePipelineOutput

[[autodoc]] pipelines.easyanimate.pipeline_output.EasyAnimatePipelineOutput