* DataLoader will now bake in any transforms or image manipulations contained in the EXIF
Images may have rotations stored in EXIF. Training using such images will cause those transforms to be ignored while training and thus produce unexpected results
* Fixed the Dataloading EXIF issue in main DreamBooth training as well
* Run make style (black & isort)
* update IF stage I pipelines
add fixed variance schedulers and lora loading
* added kv lora attn processor
* allow loading into alternative lora attn processor
* make vae optional
* throw away predicted variance
* allow loading into added kv lora layer
* allow load T5
* allow pre compute text embeddings
* set new variance type in schedulers
* fix copies
* refactor all prompt embedding code
class prompts are now included in pre-encoding code
max tokenizer length is now configurable
embedding attention mask is now configurable
* fix for when variance type is not defined on scheduler
* do not pre compute validation prompt if not present
* add example test for if lora dreambooth
* add check for train text encoder and pre compute text embeddings
* Set --only_save_embeds to False by default
Due to how the option is named, it makes more sense to behave like this.
* Refactor only_save_embeds to save_as_full_pipeline
* EDICT pipeline initial commit
- Starting point taking from https://github.com/Joqsan/edict-diffusion
* refactor __init__() method
* minor refactoring
* refactor scheduler code
- remove scheduler and move its methods to the EDICTPipeline class
* make CFG optional
- refactor encode_prompt().
- include optional generator for sampling with vae.
- minor variable renaming
* add EDICT pipeline description to README.md
* replace preprocess() with VaeImageProcessor
* run make style and make quality commands
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* 👽 qol improvements for LoRA.
* better function name?
* fix: LoRA weight loading with the new format.
* address Patrick's comments.
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* change wording around encouraging the use of load_lora_weights().
* fix: function name.
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add: LoRA text encoder support for DreamBooth example.
* fix initialization.
* fix: modification call.
* add: entry in the readme.
* use dog dataset from hub.
* fix: params to clip.
* add entry to the LoRA doc.
* add: tests for lora.
* remove unnecessary list comprehension./
* Added distillation for quantization example on textual inversion.
Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>
* refined readme and code style.
Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>
* Update text2images.py
* refined code of model load and added compatibility check.
Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>
* fixed code style.
Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>
* fix C403 [*] Unnecessary `list` comprehension (rewrite as a `set` comprehension)
Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>
---------
Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>
controlnet training center crop input images to multiple of 8
The pipeline code resizes inputs to multiples of 8.
Not doing this resizing in the training script is causing
the encoded image to have different height/width dimensions
than the encoded conditioning image (which uses a separate
encoder that's part of the controlnet model).
We resize and center crop the inputs to make sure they're the
same size (as well as all other images in the batch). We also
check that the initial resolution is a multiple of 8.
* Add SD/txt2img Community Pipeline to diffusers along with TensorRT utils
Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>
* update installation command
Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>
* update tensorrt installation
Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>
* changes
1. Update setting of cache directory
2. Address comments: merge utils and pipeline code.
3. Address comments: Add section in README
Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>
* apply make style
Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>
---------
Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* WIP controlnet training
- bugfix --streaming
- bugfix running report_to!='wandb'
- adds memory profile before validation
* Adds final logging statement.
* Sets train epochs to 11.
Looking at a longer ~16ep run, we see only good validation images
after ~11ep:
https://wandb.ai/andsteing/controlnet_fill50k/runs/3j2hx6n8
* Removes --logging_dir (it's not used).
* Adds --profile flags.
* Updates --output_dir=runs/fill-circle-{timestamp}.
* Compute mean of `train_metrics`.
Previously `train_metrics[-1]` was logged, resulting in very bumpy train
metrics.
* Improves logging a bit.
- adds l2_grads gradient norm logging
- adds steps_per_sec
- sets walltime as x coordinate of train/step
- logs controlnet_params config
* Adds --ccache (doesn't really help though).
* minor fix in controlnet flax example (#2986)
* fix the error when push_to_hub but not log validation
* contronet_from_pt & controlnet_revision
* add intermediate checkpointing to the guide
* Bugfix --profile_steps
* Sets `RACKER_PROJECT_NAME='controlnet_fill50k'`.
* Logs fractional epoch.
* Adds relative `walltime` metric.
* Adds `StepTraceAnnotation` and uses `global_step` insetad of `step`.
* Applied `black`.
* Streamlines commands in README a bit.
* Removes `--ccache`.
This makes only a very small difference (~1 min) with this model size, so removing
the option introduced in cdb3cc.
* Re-ran `black`.
* Update examples/controlnet/README.md
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Converts spaces to tab.
* Removes repeated args.
* Skips first step (compilation) in profiling
* Updates README with profiling instructions.
* Unifies tabs/spaces in README.
* Re-ran style & quality.
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>