Quick Start (5min) — From 0 to Observability
The Quick Start guide shows how to get up and running with TensorZero in 5 minutes.
We’ll migrate from an example OpenAI integration to a TensorZero deployment with built-in observability. Over time, this system will collect data that can be used to optimize our prompts and models using more advanced TensorZero functionality.
Status Quo
Imagine we’re building an LLM application that writes haikus.
Today, our integration with OpenAI looks like this:
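Here's a minimal sketch of such an integration (the model name, the topic, and the helper names are assumptions for illustration; it requires `pip install openai` and an `OPENAI_API_KEY` in the environment):

```python
import os


def build_messages(topic: str) -> list[dict]:
    # The prompt template lives hardcoded in application code.
    return [{"role": "user", "content": f"Write a haiku about: {topic}"}]


def generate_haiku(topic: str) -> str:
    # Imported lazily so this sketch can be loaded without the SDK installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=build_messages(topic),
    )
    return response.choices[0].message.content


if __name__ == "__main__" and os.environ.get("OPENAI_API_KEY"):
    print(generate_haiku("artificial intelligence"))
```

Notice that the prompt is buried in application code, and nothing about the call is recorded anywhere.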
Sample Output
Migrating to TensorZero
To migrate to TensorZero, we just need to set up a few configuration files that specify the behavior of our LLM application.
First, we’ll create a file for each of our prompt templates.
In this case, we only have the user prompt, so let's put it in a file called user_template.minijinja.
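For example, the template might interpolate a single variable (a sketch; the variable name `topic` is our choice):

```minijinja
Write a haiku about: {{ topic }}
```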
Second, we'll define a schema for the template in user_schema.json.
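A matching JSON Schema for the template variable above might look like this (a sketch, assuming the single `topic` variable):

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "topic": { "type": "string" }
  },
  "required": ["topic"],
  "additionalProperties": false
}
```

The gateway can now validate inference inputs against this schema before rendering the template.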
Finally, let's configure our application in tensorzero.toml.
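Here's a sketch of what that configuration might contain; the function and variant names, model identifier, and file paths are assumptions, while the `thumbs_up` metric matches the one we use later in this guide (check the TensorZero docs for the exact fields your gateway version expects):

```toml
# A chat function whose user message is validated by our schema
[functions.generate_haiku_with_topic]
type = "chat"
user_schema = "functions/generate_haiku_with_topic/user_schema.json"

# A variant: one concrete (model, template) choice for this function
[functions.generate_haiku_with_topic.variants.gpt_4o_mini]
type = "chat_completion"
model = "openai::gpt-4o-mini"
user_template = "functions/generate_haiku_with_topic/user_template.minijinja"

# A boolean metric we'll attach as feedback on inferences
[metrics.thumbs_up]
type = "boolean"
optimize = "max"
level = "inference"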
3… 2… 1… Launch!
We’re almost ready to start making API calls. Let’s launch the TensorZero Gateway.
Set the environment variable OPENAI_API_KEY and download the following docker-compose.yml file.
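A sketch of such a Compose file is below; the ClickHouse credentials, mount path, and environment variable names are assumptions, so compare against the file shipped with the version of TensorZero you're using:

```yaml
services:
  clickhouse:
    image: clickhouse/clickhouse-server
    environment:
      CLICKHOUSE_USER: chuser
      CLICKHOUSE_PASSWORD: chpassword
    ports:
      - "8123:8123"

  gateway:
    image: tensorzero/gateway
    volumes:
      # Mount tensorzero.toml, templates, and schemas into the container
      - ./config:/app/config:ro
    environment:
      TENSORZERO_CLICKHOUSE_URL: http://chuser:chpassword@clickhouse:8123/tensorzero
      OPENAI_API_KEY: ${OPENAI_API_KEY:?Environment variable OPENAI_API_KEY must be set.}
    ports:
      - "3000:3000"
    depends_on:
      - clickhouse
```

From the directory containing this file, a single `docker compose up` brings up both services.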
This Docker Compose configuration sets up a development ClickHouse database and the TensorZero Gateway.
The gateway will store structured inference and feedback data in the database.
Let’s launch them!
Our First TensorZero API Call
It’s time for our first (and second) TensorZero API call.
First, we’ll make an API call to generate a haiku. The behavior will be identical to our original OpenAI call, but the gateway will store structured inference data in our database.
Second, we’ll associate feedback with the inference, which we could use later on to optimize our prompt or model.
We previously defined a boolean metric called thumbs_up, and we'll assume here that the haiku was good enough to warrant a 👍.
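The two calls can be sketched with raw HTTP against the gateway (TensorZero also ships client libraries that wrap these endpoints; the function name `generate_haiku_with_topic`, the payload shapes, and the helper names here are assumptions, while `thumbs_up` is the metric we configured):

```python
import json
import urllib.error
import urllib.request

GATEWAY_URL = "http://localhost:3000"


def build_inference_payload(topic: str) -> dict:
    # Schema-backed templates take structured arguments, not a raw prompt string.
    return {
        "function_name": "generate_haiku_with_topic",
        "input": {"messages": [{"role": "user", "content": {"topic": topic}}]},
    }


def build_feedback_payload(inference_id: str) -> dict:
    # Attach a 👍 to the inference we just made.
    return {"metric_name": "thumbs_up", "inference_id": inference_id, "value": True}


def post(path: str, payload: dict) -> dict:
    req = urllib.request.Request(
        GATEWAY_URL + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        inference = post("/inference", build_inference_payload("artificial intelligence"))
        print(inference)
        post("/feedback", build_feedback_payload(inference["inference_id"]))
    except urllib.error.URLError:
        print(f"TensorZero Gateway not reachable at {GATEWAY_URL}")
```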
Sample Output
The TensorZero Gateway stored structured information about our inference in ClickHouse. Let’s query it to make sure everything worked properly.
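A verification query might look like the following sketch (the database and table names assume the default TensorZero ClickHouse schema; ordering by `id` works because inference IDs are time-ordered UUIDs):

```sql
SELECT * FROM tensorzero.ChatInference
ORDER BY id DESC
LIMIT 1
FORMAT Vertical
```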
Sample Output
The same goes for the feedback we just added. Later, we could join these tables to curate datasets for fine-tuning, prompt optimization, and other workflows.
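A sketch of the corresponding feedback query, under the same naming assumptions:

```sql
SELECT * FROM tensorzero.BooleanMetricFeedback
ORDER BY id DESC
LIMIT 1
FORMAT Vertical
```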
Sample Output
We have blazing-fast structured observability!
Conclusion & Next Steps
This Quick Start guide gives just a taste of what TensorZero is capable of. The TensorZero Gateway also has built-in support for experimentation (A/B testing), multi-step LLM workflows, provider routing, fallbacks, JSON generation, tool use, and a lot more. But even with this small example, we have a full-fledged observability platform for our LLM application that is blazing fast and extremely scalable (see Benchmarks).
Why should we care about structured inference data?
A schema-based interface simplifies engineering iteration, experimentation, and optimization, especially as application complexity and team size grow. For example, the prompt template becomes an optimization variable that is easy to experiment with, and later counterfactual values can be used for evaluation and fine-tuning. This choice also neatly fits into longstanding traditions in the sequential decision making literature, which we’ll discuss in detail in an upcoming blog post.
As we collect inference and feedback data with the gateway, we’ll start building the perfect dataset for optimizing our application. For example, we could use the haikus that received positive feedback to fine-tune a custom model. TensorZero Recipes streamline many common LLM workflows like this, or we could create a custom recipe with complete flexibility (e.g. a Jupyter notebook that reads from the database).
Now, what should we try next? We can dive deeper into the TensorZero Gateway, or skip ahead to optimizing our haiku generator with TensorZero Recipes.
The tutorial dives deeper into the TensorZero Gateway through four complete examples: a simple chatbot, an email copilot, a RAG system, and a data extraction pipeline. We’ll cover major features of the gateway and explain the technical concepts powering them.
This complete runnable example fine-tunes GPT-4o Mini to generate haikus tailored to a judge with hidden preferences. Continuous improvement over successive fine-tuning runs demonstrates TensorZero’s data & learning flywheel.