Skip to main content

🐳 Docker, Deploying LiteLLM Proxy

You can find the Dockerfile to build litellm proxy here

Quick Start​

To start using Litellm, run the following commands in a shell:

# Get the code
git clone https://github.com/BerriAI/litellm

# Go to folder
cd litellm

# Add the master key - you can change this after setup
echo 'LITELLM_MASTER_KEY="sk-1234"' > .env

# Add the litellm salt key - you cannot change this after adding a model
# It is used to encrypt / decrypt your LLM API Key credentials
# We recommned - https://1password.com/password-generator/
# password generator to get a random hash for litellm salt key
echo 'LITELLM_SALT_KEY="sk-1234"' > .env

source .env

# Start
docker-compose up

Step 1. CREATE config.yaml​

Example litellm_config.yaml

model_list:
- model_name: azure-gpt-3.5
litellm_params:
model: azure/<your-azure-model-deployment>
api_base: os.environ/AZURE_API_BASE # runs os.getenv("AZURE_API_BASE")
api_key: os.environ/AZURE_API_KEY # runs os.getenv("AZURE_API_KEY")
api_version: "2023-07-01-preview"

Step 2. RUN Docker Image​

docker run \
-v $(pwd)/litellm_config.yaml:/app/config.yaml \
-e AZURE_API_KEY=d6*********** \
-e AZURE_API_BASE=https://openai-***********/ \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-latest \
--config /app/config.yaml --detailed_debug

Get Latest Image πŸ‘‰ here

Step 3. TEST Request​

Pass model=azure-gpt-3.5 this was set on step 1

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
"model": "azure-gpt-3.5",
"messages": [
{
"role": "user",
"content": "what llm are you"
}
]
}'

That's it ! That's the quick start to deploy litellm

Use with Langchain, OpenAI SDK, LlamaIndex, Instructor, Curl​

info

πŸ’‘ Go here πŸ‘‰ to make your first LLM API Request

LiteLLM is compatible with several SDKs - including OpenAI SDK, Anthropic SDK, Mistral SDK, LLamaIndex, Langchain (Js, Python)

Options to deploy LiteLLM​

DocsWhen to Use
Quick Startcall 100+ LLMs + Load Balancing
Deploy with Database+ use Virtual Keys + Track Spend (Note: When deploying with a database providing a DATABASE_URL and LITELLM_MASTER_KEY are required in your env )
LiteLLM container + Redis+ load balance across multiple litellm containers
LiteLLM Database container + PostgresDB + Redis+ use Virtual Keys + Track Spend + load balance across multiple litellm containers

Deploy with Database​

Docker, Kubernetes, Helm Chart​

Requirements:

  • Need a postgres database (e.g. Supabase, Neon, etc) Set DATABASE_URL=postgresql://<user>:<password>@<host>:<port>/<dbname> in your env
  • Set a LITELLM_MASTER_KEY, this is your Proxy Admin key - you can use this to create other keys (🚨 must start with sk-)

We maintain a separate Dockerfile for reducing build time when running LiteLLM proxy with a connected Postgres Database

docker pull ghcr.io/berriai/litellm-database:main-latest
docker run \
-v $(pwd)/litellm_config.yaml:/app/config.yaml \
-e LITELLM_MASTER_KEY=sk-1234 \
-e DATABASE_URL=postgresql://<user>:<password>@<host>:<port>/<dbname> \
-e AZURE_API_KEY=d6*********** \
-e AZURE_API_BASE=https://openai-***********/ \
-p 4000:4000 \
ghcr.io/berriai/litellm-database:main-latest \
--config /app/config.yaml --detailed_debug

Your LiteLLM Proxy Server is now running on http://0.0.0.0:4000.

LiteLLM container + Redis​

Use Redis when you need litellm to load balance across multiple litellm containers

The only change required is setting Redis on your config.yaml LiteLLM Proxy supports sharing rpm/tpm shared across multiple litellm instances, pass redis_host, redis_password and redis_port to enable this. (LiteLLM will use Redis to track rpm/tpm usage )

model_list:
- model_name: gpt-3.5-turbo
litellm_params:
model: azure/<your-deployment-name>
api_base: <your-azure-endpoint>
api_key: <your-azure-api-key>
rpm: 6 # Rate limit for this deployment: in requests per minute (rpm)
- model_name: gpt-3.5-turbo
litellm_params:
model: azure/gpt-turbo-small-ca
api_base: https://my-endpoint-canada-berri992.openai.azure.com/
api_key: <your-azure-api-key>
rpm: 6
router_settings:
redis_host: <your redis host>
redis_password: <your redis password>
redis_port: 1992

Start docker container with config

docker run ghcr.io/berriai/litellm:main-latest --config your_config.yaml

LiteLLM Database container + PostgresDB + Redis​

The only change required is setting Redis on your config.yaml LiteLLM Proxy supports sharing rpm/tpm shared across multiple litellm instances, pass redis_host, redis_password and redis_port to enable this. (LiteLLM will use Redis to track rpm/tpm usage )

model_list:
- model_name: gpt-3.5-turbo
litellm_params:
model: azure/<your-deployment-name>
api_base: <your-azure-endpoint>
api_key: <your-azure-api-key>
rpm: 6 # Rate limit for this deployment: in requests per minute (rpm)
- model_name: gpt-3.5-turbo
litellm_params:
model: azure/gpt-turbo-small-ca
api_base: https://my-endpoint-canada-berri992.openai.azure.com/
api_key: <your-azure-api-key>
rpm: 6
router_settings:
redis_host: <your redis host>
redis_password: <your redis password>
redis_port: 1992

Start litellm-databasedocker container with config

docker run --name litellm-proxy \
-e DATABASE_URL=postgresql://<user>:<password>@<host>:<port>/<dbname> \
-p 4000:4000 \
ghcr.io/berriai/litellm-database:main-latest --config your_config.yaml

LiteLLM without Internet Connection​

By default prisma generate downloads prisma's engine binaries. This might cause errors when running without internet connection.

Use this dockerfile to build an image which pre-generates the prisma binaries.

# Use the provided base image
FROM ghcr.io/berriai/litellm:main-latest

# Set the working directory to /app
WORKDIR /app

### [πŸ‘‡ KEY STEP] ###
# Install Prisma CLI and generate Prisma client
RUN pip install prisma
RUN prisma generate
### FIN ####


# Expose the necessary port
EXPOSE 4000

# Override the CMD instruction with your desired command and arguments
# WARNING: FOR PROD DO NOT USE `--detailed_debug` it slows down response times, instead use the following CMD
# CMD ["--port", "4000", "--config", "config.yaml"]

# Define the command to run your app
ENTRYPOINT ["litellm"]

CMD ["--port", "4000"]

Advanced Deployment Settings​

1. Customization of the server root path (custom Proxy base url)​

πŸ’₯ Use this when you want to serve LiteLLM on a custom base url path like https://localhost:4000/api/v1

info

In a Kubernetes deployment, it's possible to utilize a shared DNS to host multiple applications by modifying the virtual service

Customize the root path to eliminate the need for employing multiple DNS configurations during deployment.

Step 1. πŸ‘‰ Set SERVER_ROOT_PATH in your .env and this will be set as your server root path

export SERVER_ROOT_PATH="/api/v1"

Step 2 (If you want the Proxy Admin UI to work with your root path you need to use this dockerfile)

  • Use the dockerfile below (it uses litellm as a base image)
  • πŸ‘‰ Set UI_BASE_PATH=$SERVER_ROOT_PATH/ui in the Dockerfile, example UI_BASE_PATH=/api/v1/ui

Dockerfile

# Use the provided base image
FROM ghcr.io/berriai/litellm:main-latest

# Set the working directory to /app
WORKDIR /app

# Install Node.js and npm (adjust version as needed)
RUN apt-get update && apt-get install -y nodejs npm

# Copy the UI source into the container
COPY ./ui/litellm-dashboard /app/ui/litellm-dashboard

# Set an environment variable for UI_BASE_PATH
# This can be overridden at build time
# set UI_BASE_PATH to "<your server root path>/ui"
# πŸ‘‡πŸ‘‡ Enter your UI_BASE_PATH here
ENV UI_BASE_PATH="/api/v1/ui"

# Build the UI with the specified UI_BASE_PATH
WORKDIR /app/ui/litellm-dashboard
RUN npm install
RUN UI_BASE_PATH=$UI_BASE_PATH npm run build

# Create the destination directory
RUN mkdir -p /app/litellm/proxy/_experimental/out

# Move the built files to the appropriate location
# Assuming the build output is in ./out directory
RUN rm -rf /app/litellm/proxy/_experimental/out/* && \
mv ./out/* /app/litellm/proxy/_experimental/out/

# Switch back to the main app directory
WORKDIR /app

# Make sure your entrypoint.sh is executable
RUN chmod +x entrypoint.sh

# Expose the necessary port
EXPOSE 4000/tcp

# Override the CMD instruction with your desired command and arguments
# only use --detailed_debug for debugging
CMD ["--port", "4000", "--config", "config.yaml"]

Step 3 build this Dockerfile

docker build -f Dockerfile -t litellm-prod-build . --progress=plain

Step 4. Run Proxy with SERVER_ROOT_PATH set in your env

docker run \
-v $(pwd)/proxy_config.yaml:/app/config.yaml \
-p 4000:4000 \
-e LITELLM_LOG="DEBUG"\
-e SERVER_ROOT_PATH="/api/v1"\
-e DATABASE_URL=postgresql://<user>:<password>@<host>:<port>/<dbname> \
-e LITELLM_MASTER_KEY="sk-1234"\
litellm-prod-build \
--config /app/config.yaml

After running the proxy you can access it on http://0.0.0.0:4000/api/v1/ (since we set SERVER_ROOT_PATH="/api/v1")

Step 5. Verify Running on correct path

That's it, that's all you need to run the proxy on a custom root path

2. Setting SSL Certification​

Use this, If you need to set ssl certificates for your on prem litellm proxy

Pass ssl_keyfile_path (Path to the SSL keyfile) and ssl_certfile_path (Path to the SSL certfile) when starting litellm proxy

docker run ghcr.io/berriai/litellm:main-latest \
--ssl_keyfile_path ssl_test/keyfile.key \
--ssl_certfile_path ssl_test/certfile.crt

Provide an ssl certificate when starting litellm proxy server

3. Providing LiteLLM config.yaml file as a s3, GCS Bucket Object/url​

Use this if you cannot mount a config file on your deployment service (example - AWS Fargate, Railway etc)

LiteLLM Proxy will read your config.yaml from an s3 Bucket or GCS Bucket

Set the following .env vars

LITELLM_CONFIG_BUCKET_TYPE = "gcs"                              # set this to "gcs"         
LITELLM_CONFIG_BUCKET_NAME = "litellm-proxy" # your bucket name on GCS
LITELLM_CONFIG_BUCKET_OBJECT_KEY = "proxy_config.yaml" # object key on GCS

Start litellm proxy with these env vars - litellm will read your config from GCS

docker run --name litellm-proxy \
-e DATABASE_URL=<database_url> \
-e LITELLM_CONFIG_BUCKET_NAME=<bucket_name> \
-e LITELLM_CONFIG_BUCKET_OBJECT_KEY="<object_key>> \
-e LITELLM_CONFIG_BUCKET_TYPE="gcs" \
-p 4000:4000 \
ghcr.io/berriai/litellm-database:main-latest --detailed_debug

Platform-specific Guide​

Kubernetes - Deploy on EKS​

Step1. Create an EKS Cluster with the following spec

eksctl create cluster --name=litellm-cluster --region=us-west-2 --node-type=t2.small

Step 2. Mount litellm proxy config on kub cluster

This will mount your local file called proxy_config.yaml on kubernetes cluster

kubectl create configmap litellm-config --from-file=proxy_config.yaml

Step 3. Apply kub.yaml and service.yaml Clone the following kub.yaml and service.yaml files and apply locally

Apply kub.yaml

kubectl apply -f kub.yaml

Apply service.yaml - creates an AWS load balancer to expose the proxy

kubectl apply -f service.yaml

# service/litellm-service created

Step 4. Get Proxy Base URL

kubectl get services

# litellm-service LoadBalancer 10.100.6.31 a472dc7c273fd47fd******.us-west-2.elb.amazonaws.com 4000:30374/TCP 63m

Proxy Base URL = a472dc7c273fd47fd******.us-west-2.elb.amazonaws.com:4000

That's it, now you can start using LiteLLM Proxy

Extras​

Run with docker compose​

Step 1

Here's an example docker-compose.yml file

version: "3.9"
services:
litellm:
build:
context: .
args:
target: runtime
image: ghcr.io/berriai/litellm:main-latest
ports:
- "4000:4000" # Map the container port to the host, change the host port if necessary
volumes:
- ./litellm-config.yaml:/app/config.yaml # Mount the local configuration file
# You can change the port or number of workers as per your requirements or pass any new supported CLI augument. Make sure the port passed here matches with the container port defined above in `ports` value
command: [ "--config", "/app/config.yaml", "--port", "4000", "--num_workers", "8" ]

# ...rest of your docker-compose config if any

Step 2

Create a litellm-config.yaml file with your LiteLLM config relative to your docker-compose.yml file.

Check the config doc here

Step 3

Run the command docker-compose up or docker compose up as per your docker installation.

Use -d flag to run the container in detached mode (background) e.g. docker compose up -d

Your LiteLLM container should be running now on the defined port e.g. 4000.

IAM-based Auth for RDS DB​

  1. Set AWS env var
export AWS_WEB_IDENTITY_TOKEN='/path/to/token'
export AWS_ROLE_NAME='arn:aws:iam::123456789012:role/MyRole'
export AWS_SESSION_NAME='MySession'

See all Auth options

  1. Add RDS credentials to env
export DATABASE_USER="db-user"
export DATABASE_PORT="5432"
export DATABASE_HOST="database-1-instance-1.cs1ksmwz2xt3.us-west-2.rds.amazonaws.com"
export DATABASE_NAME="database-1-instance-1"
  1. Run proxy with iam+rds
litellm --config /path/to/config.yaml --iam_token_db_auth