Quick Start — From 0 to Observability & Fine-Tuning
This Quick Start guide shows how we’d upgrade an OpenAI wrapper to a minimal TensorZero deployment with built-in observability and fine-tuning capabilities — in just 5 minutes. From there, you can take advantage of dozens of features to build best-in-class LLM applications.
Status Quo: OpenAI Wrapper
Imagine we’re building an LLM application that writes haikus.
Today, our integration with OpenAI might look like this:
from openai import OpenAI
with OpenAI() as client:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "user",
                "content": "Write a haiku about artificial intelligence.",
            }
        ],
    )
print(response)
Sample Output
ChatCompletion(
    id='chatcmpl-A5wr5WennQNF6nzF8gDo3SPIVABse',
    choices=[
        Choice(
            finish_reason='stop',
            index=0,
            logprobs=None,
            message=ChatCompletionMessage(
                content='Silent minds awaken, \nPatterns dance in code and wire, \nDreams of thought unfold.',
                role='assistant',
                function_call=None,
                tool_calls=None,
                refusal=None
            )
        )
    ],
    created=1725981243,
    model='gpt-4o-mini',
    object='chat.completion',
    system_fingerprint='fp_483d39d857',
    usage=CompletionUsage(
        completion_tokens=19,
        prompt_tokens=22,
        total_tokens=41
    )
)
Migrating to TensorZero
TensorZero offers dozens of features covering inference, observability, optimization, and experimentation.
But a minimal setup requires just a simple configuration file: tensorzero.toml.
# A function defines the task we're tackling (e.g. generating a haiku)...
[functions.generate_haiku]
type = "chat"

# ... and a variant is one of many implementations we can use to tackle it (a choice of prompt, model, etc.).
# Since we only have one variant for this function, the gateway will always use it.
[functions.generate_haiku.variants.gpt_4o_mini]
type = "chat_completion"
model = "openai::gpt-4o-mini"
This minimal configuration file tells the TensorZero Gateway everything it needs to replicate our original OpenAI call.
Deploying TensorZero
We’re almost ready to start making API calls. Let’s launch TensorZero.
- Set the environment variable OPENAI_API_KEY.
- Place our tensorzero.toml in the ./config directory.
- Download the following sample docker-compose.yml file. This Docker Compose configuration sets up a development ClickHouse database (where TensorZero stores data), the TensorZero Gateway, and the TensorZero UI.
curl -LO "https://raw.githubusercontent.com/tensorzero/tensorzero/refs/heads/main/examples/quickstart/docker-compose.yml"
docker-compose.yml
# This is a simplified example for learning purposes. Do not use this in production.
# For production-ready deployments, see: https://www.tensorzero.com/docs/gateway/deployment

services:
  clickhouse:
    image: clickhouse/clickhouse-server:24.12-alpine
    environment:
      - CLICKHOUSE_USER=chuser
      - CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT=1
      - CLICKHOUSE_PASSWORD=chpassword
    ports:
      - "8123:8123"
    healthcheck:
      test: wget --spider --tries 1 http://chuser:chpassword@clickhouse:8123/ping
      start_period: 30s
      start_interval: 1s
      timeout: 1s

  gateway:
    image: tensorzero/gateway
    volumes:
      # Mount our tensorzero.toml file into the container
      - ./config:/app/config:ro
    environment:
      - TENSORZERO_CLICKHOUSE_URL=http://chuser:chpassword@clickhouse:8123/tensorzero
      - OPENAI_API_KEY=${OPENAI_API_KEY:?Environment variable OPENAI_API_KEY must be set.}
    ports:
      - "3000:3000"
    depends_on:
      clickhouse:
        condition: service_healthy

  ui:
    image: tensorzero/ui
    volumes:
      # Mount our tensorzero.toml file into the container
      - ./config:/app/config:ro
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY:?Environment variable OPENAI_API_KEY must be set.}
      - TENSORZERO_CLICKHOUSE_URL=http://chuser:chpassword@clickhouse:8123/tensorzero
      - TENSORZERO_GATEWAY_URL=http://gateway:3000
    ports:
      - "4000:4000"
    depends_on:
      clickhouse:
        condition: service_healthy
Our setup should look like:
- config/
  - tensorzero.toml
- after.py (see below)
- before.py
- docker-compose.yml
Let’s launch everything!
docker compose up
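Before moving on, it can be handy to confirm that the gateway is reachable. The snippet below is a quick sanity check rather than part of the official setup; it assumes the gateway exposes a health endpoint at /health (check the deployment docs if your version differs).

import requests

# Optional sanity check (a sketch): ping the gateway started by docker-compose.yml.
# The /health path is an assumption; adjust it if your gateway version differs.
response = requests.get("http://localhost:3000/health")
print(response.status_code, response.text)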
Our First TensorZero API Call
The gateway will replicate our original OpenAI call and store the data in our database — with less than 1ms latency overhead thanks to Rust 🦀.
TensorZero can be used with its native Python client, with OpenAI’s client, or via its HTTP API in any programming language.
from tensorzero import TensorZeroGateway
with TensorZeroGateway("http://localhost:3000") as client:
    response = client.inference(
        function_name="generate_haiku",
        input={
            "messages": [
                {
                    "role": "user",
                    "content": "Write a haiku about artificial intelligence.",
                }
            ]
        },
    )
print(response)
Sample Output
ChatInferenceResponse(
    inference_id=UUID('0191ddb2-2c02-7641-8525-494f01bcc468'),
    episode_id=UUID('0191ddb2-28f3-7cc2-b0cc-07f504d37e59'),
    variant_name='gpt_4o_mini',
    content=[
        Text(
            type='text',
            text='Wires hum with intent, \nThoughts born from code and structure, \nGhost in silicon.'
        )
    ],
    usage=Usage(
        input_tokens=15,
        output_tokens=20
    )
)
import asyncio
from tensorzero import AsyncTensorZeroGateway
async def main():
    async with AsyncTensorZeroGateway("http://localhost:3000") as gateway:
        response = await gateway.inference(
            function_name="generate_haiku",
            input={
                "messages": [
                    {
                        "role": "user",
                        "content": "Write a haiku about artificial intelligence.",
                    }
                ]
            },
        )

        print(response)
asyncio.run(main())
Sample Output
ChatInferenceResponse(
    inference_id=UUID('01940622-d215-7111-9ca7-4995ef2c43f8'),
    episode_id=UUID('01940622-cba0-7db3-832b-273aff72f95f'),
    variant_name='gpt_4o_mini',
    content=[
        Text(
            type='text',
            text='Wires whisper secrets, \nLogic dances with the light— \nDreams of thoughts unfurl.'
        )
    ],
    usage=Usage(
        input_tokens=15,
        output_tokens=21
    )
)
from openai import OpenAI
with OpenAI(base_url="http://localhost:3000/openai/v1") as client:
    response = client.chat.completions.create(
        model="tensorzero::generate_haiku",
        messages=[
            {
                "role": "user",
                "content": "Write a haiku about artificial intelligence.",
            }
        ],
    )
print(response)
Sample Output
ChatCompletion(
    id='0194061e-2211-7a90-9087-1c255d060b59',
    choices=[
        Choice(
            finish_reason='stop',
            index=0,
            logprobs=None,
            message=ChatCompletionMessage(
                content='Circuit dreams awake, \nSilent minds in metal form— \nWisdom coded deep.',
                refusal=None,
                role='assistant',
                audio=None,
                function_call=None,
                tool_calls=[]
            )
        )
    ],
    created=1735269425,
    model='gpt_4o_mini',
    object='chat.completion',
    service_tier=None,
    system_fingerprint='',
    usage=CompletionUsage(
        completion_tokens=18,
        prompt_tokens=15,
        total_tokens=33,
        completion_tokens_details=None,
        prompt_tokens_details=None
    ),
    episode_id='0194061e-1fab-7411-9931-576b067cf0c5'
)
curl -X POST "http://localhost:3000/inference" \
  -H "Content-Type: application/json" \
  -d '{
    "function_name": "generate_haiku",
    "input": {
      "messages": [
        {
          "role": "user",
          "content": "Write a haiku about artificial intelligence."
        }
      ]
    }
  }'
Sample Output
{ "inference_id": "01940627-935f-7fa1-a398-e1f57f18064a", "episode_id": "01940627-8fe2-75d3-9b65-91be2c7ba622", "variant_name": "gpt_4o_mini", "content": [ { "type": "text", "text": "Wires hum with pure thought, \nDreams of codes in twilight's glow, \nBeyond human touch." } ], "usage": { "input_tokens": 15, "output_tokens": 23 }}
TensorZero UI
The TensorZero UI streamlines LLM engineering workflows like observability and optimization (e.g. fine-tuning).
The Docker Compose file we used above also launched the TensorZero UI.
You can visit the UI at http://localhost:4000.
Observability
The TensorZero UI provides a dashboard for observability data. We can inspect data about individual inferences, entire functions, and more.
![TensorZero UI Observability - Function Detail Page - Screenshot](/_astro/quickstart-observability-function.CAucMXW9.png)
![TensorZero UI Observability - Inference Detail Page - Screenshot](/_astro/quickstart-observability-inference.B-Qx3jGB.png)
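If you prefer working programmatically, you can also query the underlying ClickHouse database directly. The snippet below is a sketch rather than part of the official quickstart: it uses the development credentials from the docker-compose.yml above, and the ChatInference table name is an assumption based on TensorZero's default schema, so verify it against your deployment.

import requests

# A sketch: list recent haiku inferences from the development ClickHouse instance
# defined in docker-compose.yml. Table name "ChatInference" is an assumption;
# check your schema if it differs.
query = """
    SELECT id, variant_name, output
    FROM ChatInference
    WHERE function_name = 'generate_haiku'
    ORDER BY id DESC
    LIMIT 5
    FORMAT JSONEachRow
"""

response = requests.post(
    "http://localhost:8123/",
    params={"database": "tensorzero", "user": "chuser", "password": "chpassword"},
    data=query,
)
print(response.text)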
Fine-Tuning
The TensorZero UI also provides a workflow for fine-tuning models like GPT-4o and Llama 3.
With a few clicks, you can launch a fine-tuning job.
Once the job is complete, the TensorZero UI will provide a configuration snippet you can add to your tensorzero.toml.
![TensorZero UI Fine-Tuning Screenshot](/_astro/quickstart-sft.CZlPToKA.png)
Conclusion & Next Steps
This Quick Start guide gives just a taste of what TensorZero is capable of.
We strongly encourage you to check out the guides on metrics & feedback and prompt templates & schemas. Though optional, they unlock many of the downstream features TensorZero offers in experimentation and optimization.
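As a preview of that workflow, sending feedback from the native Python client might look like the sketch below. It assumes a hypothetical boolean metric named haiku_rating has already been defined in tensorzero.toml; see the metrics & feedback guide for the exact configuration.

from tensorzero import TensorZeroGateway

with TensorZeroGateway("http://localhost:3000") as client:
    response = client.inference(
        function_name="generate_haiku",
        input={
            "messages": [
                {"role": "user", "content": "Write a haiku about artificial intelligence."}
            ]
        },
    )

    # Attach feedback to this inference using a hypothetical boolean metric
    # ("haiku_rating"), which would need to be defined in tensorzero.toml first.
    client.feedback(
        metric_name="haiku_rating",
        inference_id=response.inference_id,
        value=True,  # e.g. the user liked this haiku
    )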
From here, you can explore features like built-in support for inference-time optimizations, retries & fallbacks, experimentation (A/B testing) with prompts and models, and a lot more.
What should we try next? We could dive deeper into the TensorZero Gateway, or skip ahead to optimizing our haiku generator.
Learn how to build better LLM applications with TensorZero. We’ll build complete examples involving copilots, RAG, and data extraction. Along the way, we’ll cover features like experimentation, routing & fallbacks, and multi-step LLM workflows.
This complete runnable example fine-tunes GPT-4o Mini to generate haikus tailored to a judge with hidden preferences. Continuous improvement over successive fine-tuning runs demonstrates TensorZero’s data & learning flywheel.