Getting Started with SGLang
This guide shows how to set up a minimal deployment to use the TensorZero Gateway with self-hosted LLMs using SGLang.
We’re using Llama-3.1-8B-Instruct in this example, but you can use virtually any model supported by SGLang.
Setup
This guide assumes that you are running SGLang locally with the following command (from https://docs.sglang.ai/start/install.html):
# --shm-size: set the shared memory size - needed for loading large models and processing requests
# -v: mount the host's ~/.cache/huggingface directory to the container's /root/.cache/huggingface directory
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --host 0.0.0.0 --port 30000
Make sure to update the api_base in the configuration below to match your SGLang server.
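Before configuring the gateway, it's worth confirming that the SGLang server is reachable. A quick check (assuming the default port 30000 from the command above and SGLang's health endpoint):

# Confirm the SGLang server is up and responding
curl http://localhost:30000/health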
For this minimal setup, you’ll need just two files in your project directory:
- config/
  - tensorzero.toml
- docker-compose.yml
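To create this layout, a minimal sketch (assuming you're starting from an empty project directory):

# Create the configuration directory and empty files for this guide
mkdir -p config
touch config/tensorzero.toml docker-compose.yml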
For production deployments, see our Deployment Guide.
Configuration
Create a minimal configuration file that defines a model and a simple chat function:
[models.llama]
routing = ["sglang"]

[models.llama.providers.sglang]
type = "sglang"
api_base = "http://host.docker.internal:30000/v1/"  # for SGLang running locally on the host
api_key_location = "none"  # by default, SGLang requires no API key
model_name = "meta-llama/Llama-3.1-8B-Instruct"  # must match the model served by your SGLang server

[functions.my_function_name]
type = "chat"

[functions.my_function_name.variants.my_variant_name]
type = "chat_completion"
model = "llama"
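The model_name should match the name your SGLang server serves the model under; by default this is the --model-path it was launched with. If you're unsure, you can list the served models through SGLang's OpenAI-compatible API (assuming port 30000 as above):

# List the models the SGLang server is serving
curl http://localhost:30000/v1/models
# The "id" field in the response should match model_name in tensorzero.toml.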
Credentials
The api_key_location field in your model provider configuration specifies how to handle API key authentication:

- If your endpoint does not require an API key (e.g. SGLang by default):
  api_key_location = "none"
- If your endpoint requires an API key, you have two options:
  - Configure it in advance through an environment variable:
    api_key_location = "env::ENVIRONMENT_VARIABLE_NAME"
    You’ll need to set the environment variable before starting the gateway.
  - Provide it at inference time:
    api_key_location = "dynamic::ARGUMENT_NAME"
    The API key can then be passed in the inference request (see the example below).

See the Credential Management guide, the Configuration Reference, and the API reference for more details.

In this example, SGLang is running locally without authentication, so we use api_key_location = "none".
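For reference, if you did use dynamic credentials (e.g. api_key_location = "dynamic::sglang_api_key", where sglang_api_key is a hypothetical argument name), the key would be passed alongside the inference request, roughly like this sketch:

# Pass a dynamic API key in the request's credentials field (sglang_api_key is a hypothetical argument name)
curl -X POST http://localhost:3000/inference \
  -H "Content-Type: application/json" \
  -d '{
    "function_name": "my_function_name",
    "credentials": { "sglang_api_key": "YOUR_API_KEY" },
    "input": {
      "messages": [{ "role": "user", "content": "Hello!" }]
    }
  }'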
Deployment (Docker Compose)
Create a minimal Docker Compose configuration:
# This is a simplified example for learning purposes. Do not use this in production.
# For production-ready deployments, see: https://www.tensorzero.com/docs/gateway/deployment

services:
  gateway:
    image: tensorzero/gateway
    volumes:
      - ./config:/app/config:ro
    # environment:
    #   - SGLANG_API_KEY=${SGLANG_API_KEY:?Environment variable SGLANG_API_KEY must be set.}
    ports:
      - "3000:3000"
    extra_hosts:
      - "host.docker.internal:host-gateway"
You can start the gateway with docker compose up.
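Once the gateway container is running, you can confirm it is reachable before sending traffic (assuming the default port mapping above and the gateway's health endpoint):

# Check that the TensorZero Gateway is up
curl http://localhost:3000/health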
Inference
Make an inference request to the gateway:
curl -X POST http://localhost:3000/inference \
  -H "Content-Type: application/json" \
  -d '{
    "function_name": "my_function_name",
    "input": {
      "messages": [
        {
          "role": "user",
          "content": "What is the capital of Japan?"
        }
      ]
    }
  }'
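If the request succeeds, the gateway forwards it to your SGLang server and returns the generated response. As a rough sketch, you may also be able to target the configured model directly by name instead of going through a function, assuming the inference endpoint accepts a model_name field:

# Call the "llama" model directly, skipping the function (model_name support assumed)
curl -X POST http://localhost:3000/inference \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "llama",
    "input": {
      "messages": [{ "role": "user", "content": "What is the capital of Japan?" }]
    }
  }'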