
Quick Start (5min) — From 0 to Observability

The Quick Start guide shows how to get up and running with TensorZero in 5 minutes.

We’ll migrate from an example OpenAI integration to a TensorZero deployment with built-in observability. Over time, this system will collect data that can be used to optimize our prompts and models using more advanced TensorZero functionality.

Status Quo

Imagine we’re building an LLM application that writes haikus.

Today, our integration with OpenAI looks like this:

run_with_openai.py
from openai import OpenAI


def run_with_openai(topic):
    client = OpenAI()

    result = client.chat.completions.create(
        model="gpt-4o-mini-2024-07-18",
        messages=[
            {
                "role": "user",
                "content": f"Write a haiku about '{topic}'. Don't write anything else.",
            }
        ],
    )

    print(result)


if __name__ == "__main__":
    run_with_openai("artificial intelligence")
Sample Output
ChatCompletion(
    id='chatcmpl-A5wr5WennQNF6nzF8gDo3SPIVABse',
    choices=[
        Choice(
            finish_reason='stop',
            index=0,
            logprobs=None,
            message=ChatCompletionMessage(
                content='Silent minds awaken, \nPatterns dance in code and wire, \nDreams of thought unfold.',
                role='assistant',
                function_call=None,
                tool_calls=None,
                refusal=None
            )
        )
    ],
    created=1725981243,
    model='gpt-4o-mini-2024-07-18',
    object='chat.completion',
    system_fingerprint='fp_483d39d857',
    usage=CompletionUsage(
        completion_tokens=19,
        prompt_tokens=22,
        total_tokens=41
    )
)

Migrating to TensorZero

To migrate to TensorZero, we just need to set up a few configuration files that specify the behavior of our LLM application.

First, we’ll create a file for each of our prompt templates. In this case, we only have the user prompt, so let’s put it in a file called user_template.minijinja.

user_template.minijinja
Write a haiku about "{{ topic }}". Don't write anything else.

Second, we’ll define a schema for the template in user_schema.json.

user_schema.json
{
  "$schema": "http://json-schema.org/draft-07/schema",
  "type": "object",
  "properties": {
    "topic": {
      "type": "string"
    }
  },
  "required": ["topic"],
  "additionalProperties": false
}

Finally, let’s configure our application in tensorzero.toml.

tensorzero.toml
# A model is a specific LLM (e.g. GPT-4o Mini)...
[models.gpt_4o_mini]
routing = ["openai"]

# ... and a provider is an API endpoint for it (e.g. OpenAI, Azure)
[models.gpt_4o_mini.providers.openai]
type = "openai"
model_name = "gpt-4o-mini-2024-07-18"

# A function is a task we're tackling (e.g. generating a haiku)...
[functions.generate_haiku]
type = "chat"
user_schema = "user_schema.json"

# ... and a variant is a specific way of doing it (e.g. our prompt + GPT-4o Mini)
[functions.generate_haiku.variants.gpt_4o_mini]
type = "chat_completion"
model = "gpt_4o_mini"
weight = 1
user_template = "user_template.minijinja"

# A metric is a way to evaluate the performance of our system
[metrics.thumbs_up]
type = "boolean"
optimize = "max"
level = "inference"

3… 2… 1… Launch!

We’re almost ready to start making API calls. Let’s launch the TensorZero Gateway.

Set the environment variable OPENAI_API_KEY and download the following docker-compose.yml file. This Docker Compose configuration sets up a development ClickHouse database and the TensorZero Gateway. The gateway will store structured inference and feedback data in the database.

docker-compose.yml
# This is a simplified example for learning purposes. Do not use this in production.
# For production-ready deployments, see: https://www.tensorzero.com/docs/gateway/deployment
services:
  clickhouse:
    image: clickhouse/clickhouse-server
    ports:
      - "8123:8123"
    healthcheck:
      test: wget --spider --tries 1 http://localhost:8123/ping
      start_period: 30s
      start_interval: 1s
      timeout: 1s

  gateway:
    image: tensorzero/gateway
    volumes:
      - .:/app/config:ro
    environment:
      - CLICKHOUSE_URL=http://clickhouse:8123/tensorzero
      - OPENAI_API_KEY=${OPENAI_API_KEY:?Environment variable OPENAI_API_KEY must be set.}
    ports:
      - "3000:3000"
    depends_on:
      clickhouse:
        condition: service_healthy

Let’s launch them!

Terminal window
docker compose up

Our First TensorZero API Call

It’s time for our first (and second) TensorZero API call.

First, we’ll make an API call to generate a haiku. The behavior will be identical to our original OpenAI call, but the gateway will store structured inference data in our database.

Second, we’ll associate feedback with the inference, which we could use later on to optimize our prompt or model. We previously defined a boolean metric called thumbs_up, and we’ll assume here that the haiku was good enough to warrant a 👍.

run_with_tensorzero.py
import asyncio

from tensorzero import AsyncTensorZeroGateway


async def run_with_tensorzero(topic):
    async with AsyncTensorZeroGateway("http://localhost:3000") as client:
        # Run the inference API call...
        inference_result = await client.inference(
            function_name="generate_haiku",
            input={
                "messages": [
                    {"role": "user", "content": {"topic": topic}},
                ],
            },
        )

        print(inference_result)

        # ... and associate feedback with that inference using its ID
        feedback_result = await client.feedback(
            metric_name="thumbs_up",
            inference_id=inference_result.inference_id,
            value=True,  # 👍
        )

        print(feedback_result)


if __name__ == "__main__":
    asyncio.run(run_with_tensorzero("artificial intelligence"))
Sample Output
ChatInferenceResponse(
    inference_id=UUID('0191ddb2-2c02-7641-8525-494f01bcc468'),
    episode_id=UUID('0191ddb2-28f3-7cc2-b0cc-07f504d37e59'),
    variant_name='gpt_4o_mini',
    content=[
        Text(
            type='text',
            text='Wires hum with intent, \nThoughts born from code and structure, \nGhost in silicon.'
        )
    ],
    usage=Usage(
        input_tokens=22,
        output_tokens=20
    )
)
FeedbackResponse(
    feedback_id='0191dc7d-f5e5-7370-b956-1608f5f9937e'
)

The TensorZero Gateway stored structured information about our inference in ClickHouse. Let’s query it to make sure everything worked properly.

Terminal window
curl "http://localhost:8123/" \
-d "SELECT * FROM tensorzero.ChatInference
WHERE function_name = 'generate_haiku'
ORDER BY timestamp DESC
LIMIT 1
FORMAT Vertical"
Sample Output
Row 1:
──────
id: 0191ddb2-2c02-7641-8525-494f01bcc468
function_name: generate_haiku
variant_name: gpt_4o_mini
episode_id: 0191ddb2-28f3-7cc2-b0cc-07f504d37e59
input: {"messages":[{"role":"user","content":[{"type":"text","value":{"topic":"artificial intelligence"}}]}]}
output: [{"type":"text","text":"Wires hum with intent, \nThoughts born from code and structure, \nGhost in silicon."}]
tool_params:
inference_params: {"chat_completion":{}}
processing_time_ms: 782

The same goes for the feedback we just added. Later, we could join these tables to curate datasets for fine-tuning, prompt optimization, and other workflows.

Terminal window
curl "http://localhost:8123/" \
-d "SELECT *
FROM tensorzero.BooleanMetricFeedback
WHERE metric_name = 'thumbs_up'
ORDER BY timestamp DESC
LIMIT 1
FORMAT Vertical"
Sample Output
Row 1:
──────
id: 0191ddb2-2c0e-7fd0-9745-2abfc12a1e76
target_id: 0191ddb2-2c02-7641-8525-494f01bcc468
metric_name: thumbs_up
value: true
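
For example, a single query can join the two tables we just inspected to pull every haiku alongside its feedback. This is a minimal sketch based on the columns shown above; exact schemas can differ across TensorZero versions, and repeated feedback on the same inference would need deduplication in practice.

Terminal window
curl "http://localhost:8123/" \
    -d "SELECT i.variant_name, i.output, f.value
        FROM tensorzero.ChatInference i
        JOIN tensorzero.BooleanMetricFeedback f ON f.target_id = i.id
        WHERE i.function_name = 'generate_haiku'
          AND f.metric_name = 'thumbs_up'
        FORMAT Vertical"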

We have blazing-fast structured observability!

Conclusion & Next Steps

This Quick Start guide gives a tiny taste of what TensorZero is capable of. The TensorZero Gateway also has built-in support for experimentation (A/B testing), multi-step LLM workflows, provider routing, fallbacks, JSON generation, tool use, and a lot more. But even with this tiny example, we have a fully-fledged observability platform for our LLM application that is blazing fast and extremely scalable (see Benchmarks).

Why should we care about structured inference data?

A schema-based interface simplifies engineering iteration, experimentation, and optimization, especially as application complexity and team size grow. For example, the prompt template becomes an optimization variable that is easy to experiment with, and later counterfactual values can be used for evaluation and fine-tuning. This choice also neatly fits into longstanding traditions in the sequential decision making literature, which we’ll discuss in detail in an upcoming blog post.

As we collect inference and feedback data with the gateway, we’ll start building the perfect dataset for optimizing our application. For example, we could use the haikus that received positive feedback to fine-tune a custom model. TensorZero Recipes streamline many common LLM workflows like this, or we could create a custom recipe with complete flexibility (e.g. a Jupyter notebook that reads from the database).
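
As a tiny illustration of such a custom recipe, the sketch below reads the positive-feedback haikus straight out of ClickHouse over the same HTTP interface we used with curl and writes them to a JSONL file. It assumes the requests package is installed and reuses the table and column names from the queries above; treat it as a starting point rather than an official recipe.

curate_haikus.py
# A hypothetical custom recipe, not part of TensorZero itself.
# Assumes `pip install requests` and the ClickHouse instance from docker-compose.yml.
import json

import requests

QUERY = """
SELECT i.input AS input, i.output AS output
FROM tensorzero.ChatInference i
JOIN tensorzero.BooleanMetricFeedback f ON f.target_id = i.id
WHERE i.function_name = 'generate_haiku'
  AND f.metric_name = 'thumbs_up'
  AND f.value = true
FORMAT JSONEachRow
"""


def curate_dataset(path="haiku_dataset.jsonl"):
    # Query ClickHouse over its HTTP interface (the same endpoint as the curl examples).
    response = requests.post("http://localhost:8123/", data=QUERY)
    response.raise_for_status()

    # Each line is one inference that received a 👍; keep its input and output.
    with open(path, "w") as f:
        for line in response.text.splitlines():
            row = json.loads(line)
            f.write(json.dumps({"input": row["input"], "output": row["output"]}) + "\n")


if __name__ == "__main__":
    curate_dataset()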

Now, what should we try next? Should we dive deeper into the TensorZero Gateway, or skip ahead to optimizing our haiku generator with TensorZero Recipes?