Getting Started with SGLang
This guide shows how to set up a minimal deployment to use the TensorZero Gateway with self-hosted LLMs using SGLang.
We’re using Llama-3.1-8B-Instruct in this example, but you can use virtually any model supported by SGLang.
Setup
This guide assumes that you are running SGLang locally with this command (from https://docs.sglang.ai/start/install.html)
Make sure to update the api_base
in the configuration below to match your SGLang server.
For this minimal setup, you’ll need just two files in your project directory:
Directoryconfig/
- tensorzero.toml
- docker-compose.yml
For production deployments, see our Deployment Guide.
Configuration
Create a minimal configuration file that defines a model and a simple chat function:
Credentials
The api_key_location
field in your model provider configuration specifies how to handle API key authentication:
-
If your endpoint does not require an API key (e.g. SGLang by default):
-
If your endpoint requires an API key, you have two options:
-
Configure it in advance through an environment variable:
You’ll need to set the environment variable before starting the gateway.
-
Provide it at inference time:
The API key can then be passed in the inference request.
-
See the Configuration Reference and the API reference for more details.
In this example, SGLang is running locally without authentication, so we use api_key_location = "none"
.
Deployment (Docker Compose)
Create a minimal Docker Compose configuration:
You can start the gateway with docker compose up
.
Inference
Make an inference request to the gateway: