Deployment

It’s easy to get started with the TensorZero Gateway.

To deploy the TensorZero Gateway, you need to:

  • Set up a ClickHouse database to store inference and feedback data
  • Run the gateway itself (with Docker, or built from source)

ClickHouse

The TensorZero Gateway stores inference and feedback data in a ClickHouse database. This data is later used for model observability, experimentation, and optimization.

For production deployments, the easiest setup is to use a managed service like ClickHouse Cloud. ClickHouse Cloud is also available through the AWS Marketplace, GCP Marketplace, and Azure Marketplace. You can alternatively run your own self-managed ClickHouse instance or cluster.

For development purposes, you can run a single-node ClickHouse instance locally (e.g. using Homebrew or Docker) or a cheap Development-tier cluster on ClickHouse Cloud.
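
For example, here's a minimal sketch of a local single-node setup using the official clickhouse/clickhouse-server Docker image (the CLICKHOUSE_DB variable, which pre-creates the database, is a feature of that image rather than of TensorZero):

Running ClickHouse locally with Docker
docker run \
  --name clickhouse \
  -d \
  -p 8123:8123 \
  -e CLICKHOUSE_DB=tensorzero \
  clickhouse/clickhouse-server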

See the ClickHouse documentation for more details on configuring your ClickHouse deployment.

After setting up your database, configure the CLICKHOUSE_URL environment variable with your connection details. The variable uses the following format:

.env
CLICKHOUSE_URL="http[s]://[username]:[password]@[hostname]:[port]/[database]"
# Example: ClickHouse running locally
CLICKHOUSE_URL="http://localhost:8123/tensorzero"
# Example: ClickHouse Cloud
CLICKHOUSE_URL="https://USERNAME:[email protected]:8443/tensorzero"
# Example: TensorZero Gateway running in a container, ClickHouse running on host machine
CLICKHOUSE_URL="http://host.docker.internal:8123/tensorzero"

The TensorZero Gateway automatically applies database migrations on startup.

Disabling Observability (Not Recommended)

You can disable observability features if you’re not interested in storing any data for experimentation and optimization. In this case, you won’t need to set up ClickHouse, and the TensorZero Gateway will act as a simple model gateway.

To disable observability, set the following configuration in the tensorzero.toml file:

tensorzero.toml
[gateway]
disable_observability = true

If you only need to disable observability temporarily, you can pass a dryrun: true parameter to the inference and feedback API endpoints.
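
For example, here's a sketch of an inference request with dryrun enabled, assuming the gateway is running on localhost:3000 and my_function is a hypothetical function defined in your configuration:

Inference request with dryrun
curl -X POST http://localhost:3000/inference \
  -H "Content-Type: application/json" \
  -d '{
    "function_name": "my_function",
    "input": {"messages": [{"role": "user", "content": "Hello, world!"}]},
    "dryrun": true
  }'

The request is handled normally, but nothing is written to ClickHouse.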

TensorZero Gateway

Configuration

To run the TensorZero Gateway, you first need to create a tensorzero.toml configuration file. Read more about the configuration file here.
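
For illustration, here's a minimal sketch of a configuration file (the model, function, and variant names are placeholders; see the configuration reference for the full schema):

tensorzero.toml
[models.my_model]
routing = ["openai"]

[models.my_model.providers.openai]
type = "openai"
model_name = "gpt-4o-mini"

[functions.my_function]
type = "chat"

[functions.my_function.variants.baseline]
type = "chat_completion"
model = "my_model"
weight = 1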

Once you have a configuration file, you can run the TensorZero Gateway using one of the following methods.

Model Provider Credentials

In addition to the CLICKHOUSE_URL environment variable discussed above, the TensorZero Gateway accepts the following environment variables for model provider credentials. A credential is required for every provider used by a variant with a positive weight; if a required credential is missing, the gateway will fail on startup.

Provider        Environment Variable(s)
Anthropic       ANTHROPIC_API_KEY
AWS Bedrock     AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION (optional)
Azure OpenAI    AZURE_OPENAI_API_KEY
Fireworks       FIREWORKS_API_KEY
GCP Vertex      GCP_VERTEX_CREDENTIALS_PATH (see below for details)
Mistral         MISTRAL_API_KEY
OpenAI          OPENAI_API_KEY
Together        TOGETHER_API_KEY
vLLM            VLLM_API_KEY

Notes:

  • AWS Bedrock supports many authentication methods, including environment variables, IAM roles, and more. See the AWS documentation for more details.
  • If you’re using the GCP Vertex provider, you also need to mount the credentials for a service account in JWT form (described here) to /app/gcp-credentials.json using an additional -v flag.
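
Putting this together, a .env file for a hypothetical deployment using the OpenAI and Anthropic providers might look like this (the key values are placeholders):

.env
CLICKHOUSE_URL="http://localhost:8123/tensorzero"
OPENAI_API_KEY="sk-..."
ANTHROPIC_API_KEY="sk-ant-..."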

Local Development

Running with Docker (Recommended)

You can easily run the TensorZero Gateway locally using Docker.

You need to provide it with a path to a folder containing your tensorzero.toml file and its dependencies (e.g. schemas and templates), along with the environment variables discussed above.

Running with Docker
docker run \
  --name tensorzero-gateway \
  -v "./config:/app/config" \
  --env-file .env \
  -p 3000:3000 \
  -d \
  tensorzero/gateway
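
Once the container is running, you can verify that it's up. This sketch assumes the gateway exposes a health check endpoint at /health on the port mapped above:

Checking gateway health
curl http://localhost:3000/health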
Building from source

Alternatively, you can build the TensorZero Gateway from source and run it directly on your host machine using Cargo:

Building from source
cargo run --release --bin gateway -- path/to/your/tensorzero.toml

Production Deployment

You can deploy the TensorZero Gateway alongside your application (e.g. as a sidecar container) or as a standalone service.

A single gateway instance can handle over 1k QPS/core with sub-millisecond latency (see Benchmarks), so a simple deployment should suffice for the vast majority of applications. If you deploy it as an independent service, we recommend deploying at least two instances behind a load balancer for high availability. The gateway is stateless, so you can easily scale horizontally and don’t need to worry about persistence.

Running with Docker (Recommended)

The recommended way to run the TensorZero Gateway in production is to use Docker.

There are many ways to run Docker containers in production. A simple solution is to use Docker Compose. We provide an example docker-compose.yml for reference.
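
A minimal sketch of such a file might look like the following (the service layout mirrors the docker run example above; refer to the linked example for a complete setup including ClickHouse):

docker-compose.yml
services:
  gateway:
    image: tensorzero/gateway
    volumes:
      - ./config:/app/config
    env_file: .env
    ports:
      - "3000:3000"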

Building from source

Alternatively, you can build the TensorZero Gateway from source and run it directly on your host machine using Cargo. For production deployments, we recommend enabling performance optimizations:

Building from source
cargo run --profile performance --bin gateway -- path/to/your/tensorzero.toml

Python Client

We provide an asynchronous Python client (built on asyncio) for interacting with the TensorZero Gateway.

The library is available on PyPI under the name tensorzero. You can install it using your favorite Python package manager (e.g. pip or uv).
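
As a quick sketch, an inference call with the async client might look like this (generate_haiku is a hypothetical function defined in your tensorzero.toml, and the gateway is assumed to be running on localhost:3000):

Python client example
import asyncio

from tensorzero import AsyncTensorZeroGateway


async def main():
    # Connect to a locally running TensorZero Gateway (assumed URL)
    async with AsyncTensorZeroGateway("http://localhost:3000") as client:
        response = await client.inference(
            function_name="generate_haiku",  # hypothetical function from tensorzero.toml
            input={
                "messages": [
                    {"role": "user", "content": "Write a haiku about deployment."}
                ]
            },
        )
        print(response)


asyncio.run(main())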