.. _deploying_with_bentoml:

Deploying with BentoML
======================

`BentoML `_ allows you to deploy a large language model (LLM) server with vLLM as the backend, which exposes OpenAI-compatible endpoints. You can serve the model locally, or containerize it as an OCI-compliant image and deploy it on Kubernetes.

For details, see the tutorial `vLLM inference `_ in the BentoML documentation.
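Because the deployed server exposes OpenAI-compatible endpoints, any OpenAI-style client can talk to it. The sketch below builds a standard ``/v1/chat/completions`` request body; the base URL, port, and model name are illustrative assumptions, not values from this page.

```python
import json

# Assumption: the BentoML-served vLLM backend is reachable locally.
# The actual host/port depend on how you launch the service.
BASE_URL = "http://localhost:3000/v1"


def chat_completion_payload(model: str, prompt: str) -> dict:
    """Build an OpenAI-compatible /v1/chat/completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


# "my-model" is a placeholder; use the model id your server reports.
payload = chat_completion_payload("my-model", "Hello!")
print(json.dumps(payload))
# POST this JSON to f"{BASE_URL}/chat/completions" with any HTTP client.
```

The same payload works whether the server runs locally or inside a Kubernetes deployment, since the wire protocol is unchanged.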