* WIP controlnet training
- bugfix --streaming
- bugfix running report_to!='wandb'
- adds memory profile before validation
* Adds final logging statement.
* Sets train epochs to 11.
Looking at a longer ~16ep run, we see only good validation images
after ~11ep:
https://wandb.ai/andsteing/controlnet_fill50k/runs/3j2hx6n8
* Removes --logging_dir (it's not used).
* Adds --profile flags.
* Updates --output_dir=runs/fill-circle-{timestamp}.
* Compute mean of `train_metrics`.
Previously `train_metrics[-1]` was logged, resulting in very bumpy train
metrics.
* Improves logging a bit.
- adds l2_grads gradient norm logging
- adds steps_per_sec
- sets walltime as x coordinate of train/step
- logs controlnet_params config
* Adds --ccache (doesn't really help though).
* minor fix in controlnet flax example (#2986)
* fix the error when push_to_hub but not log validation
* contronet_from_pt & controlnet_revision
* add intermediate checkpointing to the guide
* Bugfix --profile_steps
* Sets `RACKER_PROJECT_NAME='controlnet_fill50k'`.
* Logs fractional epoch.
* Adds relative `walltime` metric.
* Adds `StepTraceAnnotation` and uses `global_step` insetad of `step`.
* Applied `black`.
* Streamlines commands in README a bit.
* Removes `--ccache`.
This makes only a very small difference (~1 min) with this model size, so removing
the option introduced in cdb3cc.
* Re-ran `black`.
* Update examples/controlnet/README.md
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Converts spaces to tab.
* Removes repeated args.
* Skips first step (compilation) in profiling
* Updates README with profiling instructions.
* Unifies tabs/spaces in README.
* Re-ran style & quality.
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>