Getting Started with TGI

This guide shows how to set up a minimal deployment to use the TensorZero Gateway with self-hosted LLMs using TGI.

We’re using Phi-4 in this example, but you can use virtually any model supported by TGI.

Setup

This guide assumes that you are running TGI locally with the following command:

Run TGI locally
# --shm-size: set the shared memory size, needed for loading large models and processing requests
# -p: map the host's port 8080 to the container's port 80
# -v: mount the host's ./data directory at /data in the container
docker run \
  --gpus all \
  --shm-size 64g \
  -p 8080:80 \
  -v $PWD/data:/data \
  ghcr.io/huggingface/text-generation-inference:3.0.1 \
  --model-id microsoft/phi-4

Make sure to update the api_base in the configuration below to match your TGI server.

For this minimal setup, you’ll need just two files in your project directory:

  • config/
    • tensorzero.toml
  • docker-compose.yml

For production deployments, see our Deployment Guide.

Configuration

Create a minimal configuration file that defines a model and a simple chat function:

config/tensorzero.toml
[models.phi_4]
routing = ["tgi"]

[models.phi_4.providers.tgi]
type = "tgi"
api_base = "http://host.docker.internal:8080/v1/" # for TGI running locally on the host
api_key_location = "none" # by default, TGI requires no API key

[functions.my_function_name]
type = "chat"

[functions.my_function_name.variants.my_variant_name]
type = "chat_completion"
model = "phi_4"

# Disable observability to keep this example minimal (not recommended in production)
[gateway]
disable_observability = true

Credentials

The api_key_location field in your model provider configuration specifies how to handle API key authentication:

  • If your endpoint does not require an API key (e.g. TGI by default):

    api_key_location = "none"
  • If your endpoint requires an API key, you have two options:

    1. Configure it in advance through an environment variable:

      api_key_location = "env::ENVIRONMENT_VARIABLE_NAME"

      You’ll need to set the environment variable before starting the gateway.

    2. Provide it at inference time:

      api_key_location = "dynamic::ARGUMENT_NAME"

      The API key can then be passed in the inference request.

See the Configuration Reference and the API reference for more details.

In this example, TGI is running locally without authentication, so we use api_key_location = "none".
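If you opt for dynamic credentials instead, the key is passed in the body of the inference request. As a minimal sketch (the argument name tgi_api_key and the placeholder key are hypothetical, and assume a provider configured with api_key_location = "dynamic::tgi_api_key"):

```python
import json

# Hypothetical example: a provider configured with
# api_key_location = "dynamic::tgi_api_key" expects the key at inference time.
payload = {
    "function_name": "my_function_name",
    "credentials": {
        # Maps the dynamic argument name to the actual API key.
        "tgi_api_key": "sk-...your-key...",
    },
    "input": {
        "messages": [{"role": "user", "content": "What is the capital of Japan?"}]
    },
}

# This JSON body would be POSTed to the gateway's inference endpoint.
print(json.dumps(payload, indent=2))
```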

Deployment (Docker Compose)

Create a minimal Docker Compose configuration:

docker-compose.yml
# This is a simplified example for learning purposes. Do not use this in production.
# For production-ready deployments, see: https://www.tensorzero.com/docs/gateway/deployment
services:
  gateway:
    image: tensorzero/gateway
    volumes:
      - ./config:/app/config:ro
    # environment:
    #   - TGI_API_KEY=${TGI_API_KEY:?Environment variable TGI_API_KEY must be set.}
    ports:
      - "3000:3000"
    # The following entry is needed if TGI runs on the host machine. If it runs on a separate server, you can remove it.
    extra_hosts:
      - "host.docker.internal:host-gateway"

You can start the gateway with docker compose up.
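Before sending inference traffic, you can check that the gateway is up by polling its health endpoint. A minimal sketch in Python (it assumes the gateway's /health endpoint and the port 3000 mapping from the Docker Compose file above):

```python
import urllib.error
import urllib.request

GATEWAY_BASE = "http://localhost:3000"  # matches the port mapping above

def gateway_healthy(base_url: str = GATEWAY_BASE, timeout: float = 2.0) -> bool:
    """Return True if the gateway responds to GET /health with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, etc. -> gateway not (yet) reachable.
        return False

if __name__ == "__main__":
    print("gateway healthy:", gateway_healthy())
```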

Inference

Make an inference request to the gateway:

Terminal window
curl -X POST http://localhost:3000/inference \
  -H "Content-Type: application/json" \
  -d '{
    "function_name": "my_function_name",
    "input": {
      "messages": [
        {
          "role": "user",
          "content": "What is the capital of Japan?"
        }
      ]
    }
  }'
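The same request can be made from Python. Below is a minimal sketch using only the standard library; the endpoint, function name, and request body mirror the curl example above:

```python
import json
import urllib.request

# Same request body as the curl example above.
payload = {
    "function_name": "my_function_name",
    "input": {
        "messages": [{"role": "user", "content": "What is the capital of Japan?"}]
    },
}

request = urllib.request.Request(
    "http://localhost:3000/inference",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Requires the gateway from the Docker Compose setup above to be running:
# with urllib.request.urlopen(request) as resp:
#     print(json.loads(resp.read()))
```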