From ed2a3584ab2ee59676eb95c884ba2d7bb831f41d Mon Sep 17 00:00:00 2001
From: Zhao Shenyang
Date: Wed, 19 Jul 2023 02:56:13 +0800
Subject: [PATCH] Docs/bentoml integration (#4090)

* docs: first draft of BentoML integration

* Update the diffusers doc

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* add BentoML integration guide under Optimization section

* restyle codes

---------

Co-authored-by: Sherlock113
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---
 docs/source/en/_toctree.yml             |   2 +
 docs/source/en/optimization/bentoml.mdx | 200 ++++++++++++++++++++++++
 2 files changed, 202 insertions(+)
 create mode 100644 docs/source/en/optimization/bentoml.mdx

diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml
index b60933c052..4cf4cf36e4 100644
--- a/docs/source/en/_toctree.yml
+++ b/docs/source/en/_toctree.yml
@@ -117,6 +117,8 @@
     title: Habana Gaudi
   - local: optimization/tome
     title: Token Merging
+  - local: optimization/bentoml
+    title: BentoML Integration
   title: Optimization/Special Hardware
 - sections:
   - local: conceptual/philosophy

diff --git a/docs/source/en/optimization/bentoml.mdx b/docs/source/en/optimization/bentoml.mdx
new file mode 100644
index 0000000000..b75ff84311
--- /dev/null
+++ b/docs/source/en/optimization/bentoml.mdx
@@ -0,0 +1,200 @@

# BentoML Integration Guide

[[open-in-colab]]

[BentoML](https://github.com/bentoml/BentoML/) is an open-source framework for building, shipping, and scaling AI applications. It lets users easily package and serve diffusion models for production, ensuring reliable and efficient deployments. It features out-of-the-box operational management tools such as monitoring and tracing, and simplifies deployment to a variety of cloud platforms.
BentoML's distributed architecture and its separation of API server logic from model inference logic enable efficient scaling of deployments, even under budget constraints. As a result, integrating it with Diffusers provides a valuable tool for real-world deployments.

This tutorial demonstrates how to integrate BentoML with Diffusers.

## Prerequisites

- Install [Diffusers](https://huggingface.co/docs/diffusers/installation).
- Install BentoML by running `pip install bentoml`. For more information, see the [BentoML documentation](https://docs.bentoml.com).

## Import a diffusion model

First, you need to prepare the model. BentoML has its own [Model Store](https://docs.bentoml.com/en/latest/concepts/model.html) for model management. Create a `download_model.py` file as follows to import a diffusion model into BentoML's Model Store:

```py
import bentoml

bentoml.diffusers.import_model(
    "sd2.1",  # Model tag in the BentoML Model Store
    "stabilityai/stable-diffusion-2-1",  # Hugging Face model identifier
)
```

This code snippet downloads the Stable Diffusion 2.1 model (using its repo id `stabilityai/stable-diffusion-2-1`) from the Hugging Face Hub, or reuses the cached files if the model has already been downloaded, and imports it into the BentoML Model Store under the name `sd2.1`.

For models already fine-tuned and stored on disk, you can provide the path instead of the repo id:

```py
import bentoml

bentoml.diffusers.import_model(
    "sd2.1-local",
    "./local_stable_diffusion_2.1/",
)
```

You can view the model in the Model Store:

```
bentoml models list

Tag                     Module             Size       Creation Time
sd2.1:ysrlmubascajwnry  bentoml.diffusers  33.85 GiB  2023-07-12 16:47:44
```

## Turn a diffusion model into a RESTful service with BentoML

Once the diffusion model is in BentoML's Model Store, you can implement a text-to-image service with it.
The Stable Diffusion model accepts various arguments in addition to the required prompt to guide the image generation process. To validate these input arguments, use BentoML's [pydantic](https://github.com/pydantic/pydantic) integration. Create a `sdargs.py` file with an example pydantic model:

```py
import typing as t

from pydantic import BaseModel


class SDArgs(BaseModel):
    prompt: str
    negative_prompt: t.Optional[str] = None
    height: t.Optional[int] = 512
    width: t.Optional[int] = 512

    class Config:
        extra = "allow"
```

This pydantic model requires a string field `prompt` and three optional fields, `height`, `width`, and `negative_prompt`, each with a corresponding type. The `extra = "allow"` line supports adding additional fields not defined in the `SDArgs` class. In a real-world scenario, you may define all the desired fields and not allow extra ones.

Next, create a BentoML Service file that defines a Stable Diffusion service:

```py
import bentoml
from bentoml.io import Image, JSON

from sdargs import SDArgs

bento_model = bentoml.diffusers.get("sd2.1:latest")
sd21_runner = bento_model.to_runner(name="sd21-runner")

svc = bentoml.Service("stable-diffusion-21", runners=[sd21_runner])


@svc.api(input=JSON(pydantic_model=SDArgs), output=Image())
async def txt2img(input_data):
    kwargs = input_data.dict()
    res = await sd21_runner.async_run(**kwargs)
    images = res[0]
    return images[0]
```

Save the file as `service.py`, and spin up a BentoML Service endpoint with:

```
bentoml serve service:svc
```

An HTTP server with a `/txt2img` endpoint that accepts a JSON dictionary should be up at port 3000. Go to http://127.0.0.1:3000 in your web browser to access the Swagger UI.

You can also test the text-to-image generation using `curl` and write the returned image to `output.jpg`.
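If you prefer Python over `curl`, the same request can be sent with only the standard library. This is a minimal client sketch, not part of the BentoML API; the `build_payload` and `txt2img` helpers are hypothetical names, and the URL assumes the service above is running locally on the default port 3000:

```py
import json
import urllib.request

# Hypothetical client for the /txt2img endpoint served above.
ENDPOINT = "http://127.0.0.1:3000/txt2img"


def build_payload(prompt, height=768, width=768, negative_prompt=None):
    """Build the JSON body that the SDArgs pydantic model expects."""
    payload = {"prompt": prompt, "height": height, "width": width}
    if negative_prompt is not None:
        payload["negative_prompt"] = negative_prompt
    return payload


def txt2img(prompt, out_path="output.jpg", **kwargs):
    """POST a prompt to the service and save the returned image bytes."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(build_payload(prompt, **kwargs)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())


if __name__ == "__main__":
    txt2img("a black cat")
```

The `curl` command below issues the same request from the shell.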
+ +``` +curl -X POST http://127.0.0.1:3000/txt2img \ + -H 'Content-Type: application/json' \ + -d "{\"prompt\":\"a black cat\", \"height\":768, \"width\":768}" \ + --output output.jpg +``` + +## Package a BentoML Service for cloud deployment + +To deploy a BentoML Service, you need to pack it into a BentoML +[Bento](https://docs.bentoml.com/en/latest/concepts/bento.html), a file archive with all the source code, +models, data files, and dependencies. This can be done by providing a `bentofile.yaml` file as follows: + +```yaml +service: "service.py:svc" +include: + - "service.py" +python: + packages: + - torch + - transformers + - accelerate + - diffusers + - triton + - xformers + - pydantic +docker: + distro: debian + cuda_version: "11.6" +``` + +The `bentofile.yaml` file contains [Bento build +options](https://docs.bentoml.com/en/latest/concepts/bento.html#bento-build-options), +such as package dependencies and Docker options. + +Then you build a Bento using: + +``` +bentoml build +``` + +The output looks like: + +``` +Successfully built Bento(tag="stable-diffusion-21:crkuh7a7rw5bcasc"). + +Possible next steps: + + * Containerize your Bento with `bentoml containerize`: + $ bentoml containerize stable-diffusion-21:crkuh7a7rw5bcasc + + * Push to BentoCloud with `bentoml push`: + $ bentoml push stable-diffusion-21:crkuh7a7rw5bcasc +``` + +You can create a Docker image based on the Bento by running the following command and deploy it to a cloud provider. + +``` +bentoml containerize stable-diffusion-21:crkuh7a7rw5bcasc +``` + +If you want an end-to-end solution for deploying and managing models, you can push the Bento to [Yatai](https://github.com/bentoml/Yatai) or +[BentoCloud](https://bentoml.com/cloud) for a distributed deployment. + +For more information about BentoML's integration with Diffusers, see the [BentoML Diffusers +Guide](https://docs.bentoml.com/en/latest/frameworks/diffusers.html).