mirror of https://github.com/huggingface/diffusers.git synced 2026-01-27 17:22:53 +03:00

Files

Tolga Cangöz b88fef4785 [Research Project] Add AnyText: Multilingual Visual Text Generation And Editing (#8998)

* Add initial template

* Second template

* feat: Add TextEmbeddingModule to AnyTextPipeline

* feat: Add AuxiliaryLatentModule template to AnyTextPipeline

* Add bert tokenizer from the anytext repo for now

* feat: Update AnyTextPipeline's modify_prompt method

This commit adds improvements to the modify_prompt method in the AnyTextPipeline class. The method now handles special characters and replaces selected string prompts with a placeholder. Additionally, it includes a check for Chinese text and translation using the trans_pipe.

* Fill in the `forward` pass of `AuxiliaryLatentModule`

* `make style && make quality`

* `chore: Update bert_tokenizer.py with a TODO comment suggesting the use of the transformers library`

* Update error handling to raise and logging

* Add `create_glyph_lines` function into `TextEmbeddingModule`

* make style

* Up

* Up

* Up

* Up

* Remove several comments

* refactor: Remove ControlNetConditioningEmbedding and update code accordingly

* Up

* Up

* up

* refactor: Update AnyTextPipeline to include new optional parameters

* up

* feat: Add OCR model and its components

* chore: Update `TextEmbeddingModule` to include OCR model components and dependencies

* chore: Update `AuxiliaryLatentModule` to include VAE model and its dependencies for masked image in the editing task

* `make style`

* refactor: Update `AnyTextPipeline`'s docstring

* Update `AuxiliaryLatentModule` to include info dictionary so that text processing is done once

* simplify

* `make style`

* Converting `TextEmbeddingModule` to ordinary `encode_prompt()` function

* Simplify for now

* `make style`

* Up

* feat: Add scripts to convert AnyText controlnet to diffusers

* `make style`

* Fix: Move glyph rendering to `TextEmbeddingModule` from `AuxiliaryLatentModule`

* make style

* Up

* Simplify

* Up

* feat: Add safetensors module for loading model file

* Fix device issues

* Up

* Up

* refactor: Simplify

* refactor: Simplify code for loading models and handling data types

* `make style`

* refactor: Update to() method in FrozenCLIPEmbedderT3 and TextEmbeddingModule

* refactor: Update dtype in embedding_manager.py to match proj.weight

* Up

* Add attribution and adaptation information to pipeline_anytext.py

* Update usage example

* Will refactor `controlnet_cond_embedding` initialization

* Add `AnyTextControlNetConditioningEmbedding` template

* Refactor organization

* style

* style

* Move custom blocks from `AuxiliaryLatentModule` to `AnyTextControlNetConditioningEmbedding`

* Follow one-file policy

* style

* [Docs] Update README and pipeline_anytext.py to use AnyTextControlNetModel

* [Docs] Update import statement for AnyTextControlNetModel in pipeline_anytext.py

* [Fix] Update import path for ControlNetModel, ControlNetOutput in anytext_controlnet.py

* Refactor AnyTextControlNet to use configurable conditioning embedding channels

* Complete control net conditioning embedding in AnyTextControlNetModel

* up

* [FIX] Ensure embeddings use correct device in AnyTextControlNetModel

* up

* up

* style

* [UPDATE] Revise README and example code for AnyTextPipeline integration with DiffusionPipeline

* [UPDATE] Update example code in anytext.py to use correct font file and improve clarity

* down

* [UPDATE] Refactor BasicTokenizer usage to a new Checker class for text processing

* update pillow

* [UPDATE] Remove commented-out code and unnecessary docstring in anytext.py and anytext_controlnet.py for improved clarity

* [REMOVE] Delete frozen_clip_embedder_t3.py as it is in the anytext.py file

* [UPDATE] Replace edict with dict for configuration in anytext.py and RecModel.py for consistency

* 🆙

* style

* [UPDATE] Revise README.md for clarity, remove unused imports in anytext.py, and add author credits in anytext_controlnet.py

* style

* Update examples/research_projects/anytext/README.md

Co-authored-by: Aryan <contact.aryanvs@gmail.com>

* Remove commented-out image preparation code in AnyTextPipeline

* Remove unnecessary blank line in README.md

2025-03-11 01:49:37 +05:30

| Name | Last commit | Date |
| --- | --- | --- |
| anytext | [Research Project] Add AnyText: Multilingual Visual Text Generation And Editing (#8998) | 2025-03-11 01:49:37 +05:30 |
| autoencoderkl | create a script to train autoencoderkl (#10605) | 2025-01-27 16:41:34 +05:30 |
| colossalai | fix: Fixed few docstrings according to the Google Style Guide (#7717) | 2024-05-20 10:26:05 -07:00 |
| consistency_training | [chore] change licensing to 2025 from 2024. (#10615) | 2025-01-20 16:57:27 -10:00 |
| controlnet | [chore] change licensing to 2025 from 2024. (#10615) | 2025-01-20 16:57:27 -10:00 |
| diffusion_dpo | [docs] Replace runwayml/stable-diffusion-v1-5 with Lykon/dreamshaper-8 (#9428) | 2024-09-16 10:18:45 -07:00 |
| diffusion_orpo | [chore] change licensing to 2025 from 2024. (#10615) | 2025-01-20 16:57:27 -10:00 |
| dreambooth_inpaint | Errata - Trim trailing white space in the whole repo (#8575) | 2024-06-24 18:39:15 +05:30 |
| flux_lora_quantization | [chore] change licensing to 2025 from 2024. (#10615) | 2025-01-20 16:57:27 -10:00 |
| geodiff | #7535 Update FloatTensor type hints to Tensor (#7883) | 2024-05-10 09:53:31 -10:00 |
| gligen | Update CLIPFeatureExtractor to CLIPImageProcessor and DPTFeatureExtractor to DPTImageProcessor (#9002) | 2024-08-05 09:20:29 -10:00 |
| instructpix2pix_lora | [chore] change licensing to 2025 from 2024. (#10615) | 2025-01-20 16:57:27 -10:00 |
| intel_opts | Errata - Fix typos and improve style (#8571) | 2024-06-24 10:07:22 -07:00 |
| ip_adapter | Move IP Adapter Scripts to research project (#9960) | 2024-11-19 10:37:22 -08:00 |
| lora | [chore] change licensing to 2025 from 2024. (#10615) | 2025-01-20 16:57:27 -10:00 |
| multi_subject_dreambooth | Errata - Fix typos and improve style (#8571) | 2024-06-24 10:07:22 -07:00 |
| multi_subject_dreambooth_inpainting | changed w&b report link (#6387) | 2023-12-29 19:49:11 +05:30 |
| multi_token_textual_inversion | [chore] change licensing to 2025 from 2024. (#10615) | 2025-01-20 16:57:27 -10:00 |
| onnxruntime | [chore] change licensing to 2025 from 2024. (#10615) | 2025-01-20 16:57:27 -10:00 |
| pixart | Refactor gradient checkpointing (#10611) | 2025-01-28 06:51:46 +05:30 |
| promptdiffusion | bugfix for npu not support float64 (#10123) | 2025-01-20 09:35:24 -10:00 |
| pytorch_xla | implementing flux on TPUs with ptxla (#10515) | 2025-01-16 08:46:02 -10:00 |
| rdm | Use pipelines without vae (#10441) | 2025-01-07 13:26:51 -10:00 |
| realfill | Bump jinja2 from 3.1.5 to 3.1.6 in /examples/research_projects/realfill (#10984) | 2025-03-06 11:59:51 +00:00 |
| scheduled_huber_loss_training | [chore] change licensing to 2025 from 2024. (#10615) | 2025-01-20 16:57:27 -10:00 |
| sd3_lora_colab | [chore] change licensing to 2025 from 2024. (#10615) | 2025-01-20 16:57:27 -10:00 |
| sdxl_flax | Fix typos (#9077) | 2024-08-05 09:00:08 -10:00 |
| vae | [chore] change licensing to 2025 from 2024. (#10615) | 2025-01-20 16:57:27 -10:00 |
| wuerstchen/text_to_image | [chore] change licensing to 2025 from 2024. (#10615) | 2025-01-20 16:57:27 -10:00 |
| README.md | Errata (#8322) | 2024-06-05 13:59:09 -07:00 |

README.md

Research projects

This folder contains various research projects using 🧨 Diffusers. They are not actively maintained by the core maintainers of this library, and they often require a specific version of Diffusers, which is indicated in the requirements file of each folder. Updating them to the most recent version of the library will require some work.

To use any of them, just run the command

pip install -r requirements.txt

inside the folder of your choice.
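
Because each project may pin its own Diffusers version, it can be useful to look at the pin before installing. The following is a minimal sketch, not part of the repository: the helper name `find_diffusers_requirement` and the sample file contents are hypothetical, and it only recognizes plain `diffusers` lines with an optional extras list and version specifier (not VCS or URL installs).

```python
import re
from typing import Optional


def find_diffusers_requirement(requirements_text: str) -> Optional[str]:
    """Return the first requirement line that names diffusers, or None.

    Hypothetical helper for illustration; handles only simple lines such as
    "diffusers", "diffusers[torch]" or "diffusers==X.Y.Z".
    """
    pattern = re.compile(
        r"^diffusers(\[[^\]]*\])?\s*([=<>!~]=?.*)?$", flags=re.IGNORECASE
    )
    for raw_line in requirements_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if pattern.match(line):
            return line
    return None


# Hypothetical file contents, not copied from any real project folder.
sample = """\
accelerate
diffusers==0.20.1  # pinned by the project author
torch
"""

print(find_diffusers_requirement(sample))
```

In practice the text would come from reading the `requirements.txt` inside the chosen project folder. If the helper returns `None`, the project does not pin Diffusers in a form this sketch understands.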

If you need help with any of them, please open an issue and directly ping the author(s) indicated at the top of the README in each folder.