Quick Start — From 0 to Observability

This Quick Start guide shows how we’d upgrade an OpenAI wrapper to a minimal TensorZero deployment with built-in observability — in just 5 minutes.

From there, you can take advantage of dozens of features to build best-in-class LLM applications. Some of our favorites include built-in inference-time optimizations and experimentation (A/B testing).

Status Quo: OpenAI Wrapper

Imagine we’re building an LLM application that writes haikus.

Today, our integration with OpenAI might look like this:

run_with_openai.py
from openai import OpenAI

response = OpenAI().chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": "Write a haiku about artificial intelligence.",
        }
    ],
)

print(response)
Sample Output
ChatCompletion(
    id='chatcmpl-A5wr5WennQNF6nzF8gDo3SPIVABse',
    choices=[
        Choice(
            finish_reason='stop',
            index=0,
            logprobs=None,
            message=ChatCompletionMessage(
                content='Silent minds awaken, \nPatterns dance in code and wire, \nDreams of thought unfold.',
                role='assistant',
                function_call=None,
                tool_calls=None,
                refusal=None
            )
        )
    ],
    created=1725981243,
    model='gpt-4o-mini',
    object='chat.completion',
    system_fingerprint='fp_483d39d857',
    usage=CompletionUsage(
        completion_tokens=19,
        prompt_tokens=22,
        total_tokens=41
    )
)
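
If we only need the haiku itself, we can pull it out of the response object (a small follow-up based on the sample output above):

# Extract just the generated haiku from the ChatCompletion object
haiku = response.choices[0].message.content
print(haiku)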

Migrating to TensorZero

TensorZero offers dozens of features covering inference, observability, optimization, and experimentation.

But a minimal setup requires just a single configuration file: tensorzero.toml.

tensorzero.toml
# A model is a specific LLM (e.g. GPT-4o Mini)...
[models.gpt_4o_mini]
routing = ["openai"]

# ... and a provider is an API endpoint that serves it (e.g. OpenAI, Azure).
# (You can define multiple providers per model to enable fallbacks for high availability.)
[models.gpt_4o_mini.providers.openai]
type = "openai"
model_name = "gpt-4o-mini"

# A function is the interface for the task we're tackling (e.g. generating a haiku)...
[functions.generate_haiku]
type = "chat"

# ... and a variant is one of many implementations to achieve it (a choice of model, prompt templates, parameters, etc.).
# Since we only have one variant for this function, the gateway will always select it.
[functions.generate_haiku.variants.gpt_4o_mini]
type = "chat_completion"
model = "gpt_4o_mini"

This minimal configuration file tells the TensorZero Gateway everything it needs to replicate our original OpenAI call with added observability. For now it’s not doing much else, but we could enable additional features with just a few extra lines of configuration. We’ll cover that later.

Deploying TensorZero

We’re almost ready to start making API calls. Let’s launch the TensorZero Gateway.

You need to:

  • Set the environment variable OPENAI_API_KEY.
  • Download the following docker-compose.yml file.
  • Place your tensorzero.toml configuration file in ./config/tensorzero.toml.

This Docker Compose configuration sets up a development ClickHouse database and the TensorZero Gateway. The gateway will store inference data in the database.

docker-compose.yml
# This is a simplified example for learning purposes. Do not use this in production.
# For production-ready deployments, see: https://www.tensorzero.com/docs/gateway/deployment
services:
  clickhouse:
    image: clickhouse/clickhouse-server
    ports:
      - "8123:8123"
    healthcheck:
      test: wget --spider --tries 1 http://localhost:8123/ping
      start_period: 30s
      start_interval: 1s
      timeout: 1s

  gateway:
    image: tensorzero/gateway
    volumes:
      # Mount our tensorzero.toml file into the container
      - ./config:/app/config:ro
    environment:
      - CLICKHOUSE_URL=http://clickhouse:8123/tensorzero
      - OPENAI_API_KEY=${OPENAI_API_KEY:?Environment variable OPENAI_API_KEY must be set.}
    ports:
      - "3000:3000"
    depends_on:
      clickhouse:
        condition: service_healthy

Your setup should look like:

  • config/
    • tensorzero.toml
  • docker-compose.yml
  • run_with_openai.py
  • run_with_tensorzero.py (see below)

Let’s launch everything!

Terminal window
docker compose up

Our First TensorZero API Call

The gateway will replicate our original OpenAI call and store the data in our database — with less than 1ms latency overhead thanks to Rust 🦀.

run_with_tensorzero.py
from tensorzero import TensorZeroGateway

response = TensorZeroGateway("http://localhost:3000").inference(
    function_name="generate_haiku",
    input={
        "messages": [
            {
                "role": "user",
                "content": "Write a haiku about artificial intelligence.",
            }
        ]
    },
)

print(response)
Sample Output
ChatInferenceResponse(
    inference_id=UUID('0191ddb2-2c02-7641-8525-494f01bcc468'),
    episode_id=UUID('0191ddb2-28f3-7cc2-b0cc-07f504d37e59'),
    variant_name='gpt_4o_mini',
    content=[
        Text(
            type='text',
            text='Wires hum with intent, \nThoughts born from code and structure, \nGhost in silicon.'
        )
    ],
    usage=Usage(
        input_tokens=15,
        output_tokens=20
    )
)
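
The response is a structured object rather than a raw OpenAI payload. Based on the sample output above, here is a minimal sketch of pulling out the haiku text and the identifiers we'll want later for observability and feedback:

# Grab the generated haiku from the first content block
haiku = response.content[0].text
print(haiku)

# Keep the IDs around: they identify this call in ClickHouse and can be used
# to attach feedback to this inference (or its episode) later.
inference_id = response.inference_id
episode_id = response.episode_id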

Querying Observability Data

The gateway stored our inference data in ClickHouse. Let’s query it.

Terminal window
curl "http://localhost:8123/" \
-d "SELECT * FROM tensorzero.ChatInference
WHERE function_name = 'generate_haiku'
ORDER BY timestamp DESC
LIMIT 1
FORMAT Vertical"
Sample Output
Row 1:
──────
id: 0191ddb2-2c02-7641-8525-494f01bcc468
function_name: generate_haiku
variant_name: gpt_4o_mini
episode_id: 0191ddb2-28f3-7cc2-b0cc-07f504d37e59
input: {"messages":[{"role":"user","content":[{"type":"text","value":"Write a haiku about artificial intelligence."}]}]}
output: [{"type":"text","text":"Wires hum with intent, \nThoughts born from code and structure, \nGhost in silicon."}]
tool_params:
inference_params: {"chat_completion":{}}
processing_time_ms: 782
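
If you'd rather query from Python than curl, the same ClickHouse HTTP interface works with any HTTP client. Here's a rough sketch using the requests library, assuming the development ClickHouse above (default user, no password); JSONEachRow is simply a ClickHouse output format that's easy to parse:

import json

import requests

# Same query as the curl example, but returning one JSON object per row
query = """
    SELECT * FROM tensorzero.ChatInference
    WHERE function_name = 'generate_haiku'
    ORDER BY timestamp DESC
    LIMIT 1
    FORMAT JSONEachRow
"""

response = requests.post("http://localhost:8123/", data=query)
response.raise_for_status()

for line in response.text.splitlines():
    row = json.loads(line)
    print(row["variant_name"], row["output"])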

Conclusion & Next Steps

The Quick Start guide gives a tiny taste of what TensorZero is capable of.

We strongly encourage you to check out the section on prompt templates & schemas. Though optional, they unlock many of the downstream features TensorZero offers in experimentation and optimization.

From here, you can explore features like built-in support for experimentation (A/B testing) with prompts and models, inference-time optimizations, multi-step LLM workflows (episodes), routing & fallbacks, JSON generation, tool use, and a lot more.

As we collect data with the gateway, we’ll start building a dataset we can use for optimization, especially if we incorporate metrics & feedback. For example, we could use the haikus that received positive feedback to fine-tune a custom model with TensorZero Recipes.
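
For instance, once a metric is declared in tensorzero.toml, feedback can be attached to a specific inference through the gateway. A rough sketch of what that might look like with the Python client, assuming a boolean metric named haiku_rating has been configured (the metric name and its configuration are illustrative, not part of this guide's setup):

from tensorzero import TensorZeroGateway

client = TensorZeroGateway("http://localhost:3000")

# Generate a haiku, then attach feedback to that specific inference
response = client.inference(
    function_name="generate_haiku",
    input={
        "messages": [
            {"role": "user", "content": "Write a haiku about artificial intelligence."}
        ]
    },
)

# `haiku_rating` is a hypothetical boolean metric; it would need to be declared
# in tensorzero.toml before this call succeeds.
client.feedback(
    metric_name="haiku_rating",
    inference_id=response.inference_id,
    value=True,
)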

What should we try next? We can dive deeper into the TensorZero Gateway, or skip ahead to optimizing our haiku generator.