From ed2a3584ab2ee59676eb95c884ba2d7bb831f41d Mon Sep 17 00:00:00 2001
From: Zhao Shenyang
Date: Wed, 19 Jul 2023 02:56:13 +0800
Subject: [PATCH] Docs/bentoml integration (#4090)

* docs: first draft of BentoML integration

* Update the diffusers doc

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* add BentoML integration guide under Optimization section

* restyle codes

---------

Co-authored-by: Sherlock113
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---
 docs/source/en/_toctree.yml             |   2 +
 docs/source/en/optimization/bentoml.mdx | 200 ++++++++++++++++++++++++
 2 files changed, 202 insertions(+)
 create mode 100644 docs/source/en/optimization/bentoml.mdx

diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml
index b60933c052..4cf4cf36e4 100644
--- a/docs/source/en/_toctree.yml
+++ b/docs/source/en/_toctree.yml
@@ -117,6 +117,8 @@
     title: Habana Gaudi
   - local: optimization/tome
     title: Token Merging
+  - local: optimization/bentoml
+    title: BentoML Integration
   title: Optimization/Special Hardware
 - sections:
   - local: conceptual/philosophy

diff --git a/docs/source/en/optimization/bentoml.mdx b/docs/source/en/optimization/bentoml.mdx
new file mode 100644
index 0000000000..b75ff84311
--- /dev/null
+++ b/docs/source/en/optimization/bentoml.mdx
@@ -0,0 +1,200 @@

# BentoML Integration Guide

[[open-in-colab]]

[BentoML](https://github.com/bentoml/BentoML/) is an open-source framework for building, shipping, and scaling AI applications. It lets users easily package and serve diffusion models for production, ensuring reliable and efficient deployments. It features out-of-the-box operational management tools such as monitoring and tracing, and simplifies deployment to a variety of cloud platforms.
BentoML's distributed architecture and its separation of API server logic from model inference logic enable efficient scaling of deployments, even under budget constraints. As a result, integrating it with Diffusers provides a valuable tool for real-world deployments.

This tutorial demonstrates how to integrate BentoML with Diffusers.

## Prerequisites

- Install [Diffusers](https://huggingface.co/docs/diffusers/installation).
- Install BentoML by running `pip install bentoml`. For more information, see the [BentoML documentation](https://docs.bentoml.com).

## Import a diffusion model

First, you need to prepare the model. BentoML has its own [Model Store](https://docs.bentoml.com/en/latest/concepts/model.html) for model management. Create a `download_model.py` file as follows to import a diffusion model into BentoML's Model Store:

```py
import bentoml

bentoml.diffusers.import_model(
    "sd2.1",  # Model tag in the BentoML Model Store
    "stabilityai/stable-diffusion-2-1",  # Hugging Face model identifier
)
```

This code snippet downloads the Stable Diffusion 2.1 model (using its repo id `stabilityai/stable-diffusion-2-1`) from the Hugging Face Hub, or reuses the cached files if the model has already been downloaded, and imports it into the BentoML Model Store under the name `sd2.1`.

For models already fine-tuned and stored on disk, you can provide the path instead of the repo id:

```py
import bentoml

bentoml.diffusers.import_model(
    "sd2.1-local",
    "./local_stable_diffusion_2.1/",
)
```

You can view the model in the Model Store:

```
bentoml models list

Tag                     Module             Size       Creation Time
sd2.1:ysrlmubascajwnry  bentoml.diffusers  33.85 GiB  2023-07-12 16:47:44
```

## Turn a diffusion model into a RESTful service with BentoML

Once the diffusion model is in BentoML's Model Store, you can implement a text-to-image service with it.
The Stable Diffusion model accepts various arguments in addition to the required prompt to guide the image generation process. To validate these input arguments, use BentoML's [pydantic](https://github.com/pydantic/pydantic) integration. Create a `sdargs.py` file with an example pydantic model:

```py
import typing as t

from pydantic import BaseModel


class SDArgs(BaseModel):
    prompt: str
    negative_prompt: t.Optional[str] = None
    height: t.Optional[int] = 512
    width: t.Optional[int] = 512

    class Config:
        extra = "allow"
```

This pydantic model requires a string field `prompt` and three optional fields, `height`, `width`, and `negative_prompt`, each with a corresponding type. The `extra = "allow"` line supports adding additional fields not defined in the `SDArgs` class. In a real-world scenario, you may define all the desired fields and not allow extra ones.

Next, create a BentoML Service file that defines a Stable Diffusion service:

```py
import bentoml
from bentoml.io import Image, JSON

from sdargs import SDArgs

bento_model = bentoml.diffusers.get("sd2.1:latest")
sd21_runner = bento_model.to_runner(name="sd21-runner")

svc = bentoml.Service("stable-diffusion-21", runners=[sd21_runner])


@svc.api(input=JSON(pydantic_model=SDArgs), output=Image())
async def txt2img(input_data):
    kwargs = input_data.dict()
    res = await sd21_runner.async_run(**kwargs)
    images = res[0]
    return images[0]
```

Save the file as `service.py`, and spin up a BentoML Service endpoint with:

```
bentoml serve service:svc
```

An HTTP server with a `/txt2img` endpoint that accepts a JSON dictionary should be up at port 3000. Go to http://127.0.0.1:3000 in your web browser to access the Swagger UI.

You can also test the text-to-image generation using `curl` and write the returned image to `output.jpg`.
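If you prefer Python over `curl`, the same request can be sent with only the standard library. This is a minimal client sketch, not part of the BentoML API; the `build_payload` and `txt2img` helpers are hypothetical names, and the URL assumes the service above is running locally on the default port 3000:

```py
import json
import urllib.request

# Hypothetical client for the /txt2img endpoint served above.
ENDPOINT = "http://127.0.0.1:3000/txt2img"


def build_payload(prompt, height=768, width=768, negative_prompt=None):
    """Build the JSON body that the SDArgs pydantic model expects."""
    payload = {"prompt": prompt, "height": height, "width": width}
    if negative_prompt is not None:
        payload["negative_prompt"] = negative_prompt
    return payload


def txt2img(prompt, out_path="output.jpg", **kwargs):
    """POST a prompt to the service and save the returned image bytes."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(build_payload(prompt, **kwargs)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())


if __name__ == "__main__":
    txt2img("a black cat")
```

The `curl` command below issues the same request from the shell.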
+ +``` +curl -X POST http://127.0.0.1:3000/txt2img \ + -H 'Content-Type: application/json' \ + -d "{\"prompt\":\"a black cat\", \"height\":768, \"width\":768}" \ + --output output.jpg +``` + +## Package a BentoML Service for cloud deployment + +To deploy a BentoML Service, you need to pack it into a BentoML +[Bento](https://docs.bentoml.com/en/latest/concepts/bento.html), a file archive with all the source code, +models, data files, and dependencies. This can be done by providing a `bentofile.yaml` file as follows: + +```yaml +service: "service.py:svc" +include: + - "service.py" +python: + packages: + - torch + - transformers + - accelerate + - diffusers + - triton + - xformers + - pydantic +docker: + distro: debian + cuda_version: "11.6" +``` + +The `bentofile.yaml` file contains [Bento build +options](https://docs.bentoml.com/en/latest/concepts/bento.html#bento-build-options), +such as package dependencies and Docker options. + +Then you build a Bento using: + +``` +bentoml build +``` + +The output looks like: + +``` +Successfully built Bento(tag="stable-diffusion-21:crkuh7a7rw5bcasc"). + +Possible next steps: + + * Containerize your Bento with `bentoml containerize`: + $ bentoml containerize stable-diffusion-21:crkuh7a7rw5bcasc + + * Push to BentoCloud with `bentoml push`: + $ bentoml push stable-diffusion-21:crkuh7a7rw5bcasc +``` + +You can create a Docker image based on the Bento by running the following command and deploy it to a cloud provider. + +``` +bentoml containerize stable-diffusion-21:crkuh7a7rw5bcasc +``` + +If you want an end-to-end solution for deploying and managing models, you can push the Bento to [Yatai](https://github.com/bentoml/Yatai) or +[BentoCloud](https://bentoml.com/cloud) for a distributed deployment. + +For more information about BentoML's integration with Diffusers, see the [BentoML Diffusers +Guide](https://docs.bentoml.com/en/latest/frameworks/diffusers.html).