1
0
mirror of https://github.com/huggingface/diffusers.git synced 2026-01-27 17:22:53 +03:00

Docs/bentoml integration (#4090)

* docs: first draft of BentoML integration

* Update the diffusers doc

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* add BentoML integration guide under Optimization section

* restyle codes

---------

Co-authored-by: Sherlock113 <sherlockxu07@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
This commit is contained in:
Zhao Shenyang
2023-07-19 02:56:13 +08:00
committed by GitHub
parent 3eb498e7b4
commit ed2a3584ab
2 changed files with 202 additions and 0 deletions

View File

@@ -117,6 +117,8 @@
title: Habana Gaudi
- local: optimization/tome
title: Token Merging
- local: optimization/bentoml
title: BentoML Integration
title: Optimization/Special Hardware
- sections:
- local: conceptual/philosophy

View File

@@ -0,0 +1,200 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# BentoML Integration Guide
[[open-in-colab]]
[BentoML](https://github.com/bentoml/BentoML/) is an open-source framework designed for building,
shipping, and scaling AI applications. It allows users to easily package and serve diffusion models
for production, ensuring reliable and efficient deployments. It features out-of-the-box operational
management tools like monitoring and tracing, and facilitates the deployment to various cloud platforms
with ease. BentoML's distributed architecture and the separation of API server logic from
model inference logic enable efficient scaling of deployments, even with budget constraints.
As a result, integrating it with Diffusers provides a valuable tool for real-world deployments.
This tutorial demonstrates how to integrate BentoML with Diffusers.
## Prerequisites
- Install [Diffusers](https://huggingface.co/docs/diffusers/installation).
- Install BentoML by running `pip install bentoml`. For more information, see the [BentoML documentation](https://docs.bentoml.com).
## Import a diffusion model
First, you need to prepare the model. BentoML has its own [Model Store](https://docs.bentoml.com/en/latest/concepts/model.html)
for model management. Create a `download_model.py` file as below to import a diffusion model into BentoML's Model
Store:
```py
import bentoml
bentoml.diffusers.import_model(
"sd2.1", # Model tag in the BentoML Model Store
"stabilityai/stable-diffusion-2-1", # Hugging Face model identifier
)
```
This code snippet downloads the Stable Diffusion 2.1 model (using it's repo id
`stabilityai/stable-diffusion-2-1`) from the Hugging Face Hub (or use the cached download
files if the model is already downloaded) and imports it into the BentoML Model
Store with the name `sd2.1`.
For models already fine-tuned and stored on disk, you can provide the path instead of
the repo id.
```py
import bentoml
bentoml.diffusers.import_model(
"sd2.1-local",
"./local_stable_diffusion_2.1/",
)
```
You can view the model in the Model Store:
```
bentoml models list
Tag Module Size Creation Time
sd2.1:ysrlmubascajwnry bentoml.diffusers 33.85 GiB 2023-07-12 16:47:44
```
## Turn a diffusion model into a RESTful service with BentoML
Once the diffusion model is in BentoML's Model Store, you can implement a text-to-image
service with it. The Stable Diffusion model accepts various arguments
in addition to the required prompt to guide the image generation process.
To validate these input arguments, use BentoML's [pydantic](https://github.com/pydantic/pydantic) integration.
Create a `sdargs.py` file with an example pydantic model:
```py
import typing as t
from pydantic import BaseModel
class SDArgs(BaseModel):
prompt: str
negative_prompt: t.Optional[str] = None
height: t.Optional[int] = 512
width: t.Optional[int] = 512
class Config:
extra = "allow"
```
This pydantic model requires a string field `prompt` and three optional fields: `height`, `width`, and `negative_prompt`,
each with corresponding types. The `extra = "allow"` line supports adding additional fields not defined in the `SDArgs` class.
In a real-world scenario, you may define all the desired fields and not allow extra ones.
Next, create a BentoML Service file that defines a Stable Diffusion service:
```py
import bentoml
from bentoml.io import Image, JSON
from sdargs import SDArgs
bento_model = bentoml.diffusers.get("sd2.1:latest")
sd21_runner = bento_model.to_runner(name="sd21-runner")
svc = bentoml.Service("stable-diffusion-21", runners=[sd21_runner])
@svc.api(input=JSON(pydantic_model=SDArgs), output=Image())
async def txt2img(input_data):
kwargs = input_data.dict()
res = await sd21_runner.async_run(**kwargs)
images = res[0]
return images[0]
```
Save the file as `service.py`, and spin up a BentoML Service endpoint using:
```
bentoml serve service:svc
```
An HTTP server with `/txt2img` endpoint that accepts a JSON dictionary should be up at
port 3000. Go to <http://127.0.0.1:3000> in your web browser to access the Swagger UI.
You can also test the text-to-image generation using `curl` and write the returned image to
`output.jpg`.
```
curl -X POST http://127.0.0.1:3000/txt2img \
-H 'Content-Type: application/json' \
-d "{\"prompt\":\"a black cat\", \"height\":768, \"width\":768}" \
--output output.jpg
```
## Package a BentoML Service for cloud deployment
To deploy a BentoML Service, you need to pack it into a BentoML
[Bento](https://docs.bentoml.com/en/latest/concepts/bento.html), a file archive with all the source code,
models, data files, and dependencies. This can be done by providing a `bentofile.yaml` file as follows:
```yaml
service: "service.py:svc"
include:
- "service.py"
python:
packages:
- torch
- transformers
- accelerate
- diffusers
- triton
- xformers
- pydantic
docker:
distro: debian
cuda_version: "11.6"
```
The `bentofile.yaml` file contains [Bento build
options](https://docs.bentoml.com/en/latest/concepts/bento.html#bento-build-options),
such as package dependencies and Docker options.
Then you build a Bento using:
```
bentoml build
```
The output looks like:
```
Successfully built Bento(tag="stable-diffusion-21:crkuh7a7rw5bcasc").
Possible next steps:
* Containerize your Bento with `bentoml containerize`:
$ bentoml containerize stable-diffusion-21:crkuh7a7rw5bcasc
* Push to BentoCloud with `bentoml push`:
$ bentoml push stable-diffusion-21:crkuh7a7rw5bcasc
```
You can create a Docker image based on the Bento by running the following command and deploy it to a cloud provider.
```
bentoml containerize stable-diffusion-21:crkuh7a7rw5bcasc
```
If you want an end-to-end solution for deploying and managing models, you can push the Bento to [Yatai](https://github.com/bentoml/Yatai) or
[BentoCloud](https://bentoml.com/cloud) for a distributed deployment.
For more information about BentoML's integration with Diffusers, see the [BentoML Diffusers
Guide](https://docs.bentoml.com/en/latest/frameworks/diffusers.html).