Metrics & Feedback

The TensorZero Gateway allows you to assign feedback to inferences or sequences of inferences (episodes).

Feedback captures the downstream outcomes of your LLM application, and drives the experimentation and optimization workflows in TensorZero. For example, you can fine-tune models using data from inferences that led to positive downstream behavior.

Feedback

TensorZero currently supports the following types of feedback:

| Feedback Type | Examples |
| --- | --- |
| Boolean Metric | Thumbs up, task success |
| Float Metric | Star rating, clicks, number of mistakes made |
| Comment | Natural-language feedback from users or developers |
| Demonstration | Edited drafts, labels, human-generated content |

You can send feedback data to the gateway by using the /feedback endpoint.
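If you're not using one of the client SDKs, you can POST JSON to the endpoint directly. A sketch of the request body (the field names mirror the Python client shown later in this guide; the metric name and UUID here are placeholders):

```json
{
  "metric_name": "my_metric_name",
  "inference_id": "00000000-0000-0000-0000-000000000000",
  "value": true
}
```

For episode-level metrics, target an `episode_id` instead of an `inference_id`.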

Metrics

You can define metrics in your tensorzero.toml configuration file.

The skeleton of a metric looks like the following configuration entry.

tensorzero.toml

```toml
[metrics.my_metric_name]
level = "..." # "inference" or "episode"
optimize = "..." # "min" or "max"
type = "..." # "boolean" or "float"
```
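For instance, a metric counting the mistakes made across an entire multi-inference workflow might be declared as an episode-level float that we want to minimize (the metric name here is hypothetical):

```toml
# Hypothetical metric: number of mistakes made over a whole episode
[metrics.num_mistakes]
level = "episode"  # aggregated per episode, not per individual inference
optimize = "min"   # fewer mistakes is better
type = "float"     # counts and ratings are float metrics
```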

Rating Haikus

In the Quick Start, we built a simple LLM application that writes haikus about artificial intelligence.

Imagine we wanted to assign 👍 or 👎 to these haikus. Later, we can use this data to fine-tune a model using only haikus that match our tastes.

We should use a metric of type boolean to capture this behavior since we’re optimizing for a binary outcome: whether we liked the haikus or not. The metric applies to individual inference requests, so we’ll set level = "inference". And finally, we’ll set optimize = "max" because we want to maximize this metric.

Our metric configuration should look like this:

tensorzero.toml

```toml
[metrics.haiku_rating]
type = "boolean"
optimize = "max"
level = "inference"
```
Full Configuration
tensorzero.toml

```toml
[models.gpt_4o_mini]
routing = ["openai"]

[models.gpt_4o_mini.providers.openai]
type = "openai"
model_name = "gpt-4o-mini"

[functions.generate_haiku]
type = "chat"

[functions.generate_haiku.variants.gpt_4o_mini]
type = "chat_completion"
model = "gpt_4o_mini"

[metrics.haiku_rating]
type = "boolean"
optimize = "max"
level = "inference"
```

Let’s make an inference call like we did in the Quick Start, and then assign some (positive) feedback to it. We’ll use the `inference_id` from the inference response to link the two.

run.py

```python
from tensorzero import TensorZeroGateway

with TensorZeroGateway("http://localhost:3000") as client:
    inference_response = client.inference(
        function_name="generate_haiku",
        input={
            "messages": [
                {
                    "role": "user",
                    "content": "Write a haiku about artificial intelligence.",
                }
            ]
        },
    )

    print(inference_response)

    feedback_response = client.feedback(
        metric_name="haiku_rating",
        inference_id=inference_response.inference_id,
        value=True,  # let's assume it deserves a 👍
    )

    print(feedback_response)
```
Sample Output

```
ChatInferenceResponse(
    inference_id=UUID('01920c75-d114-7aa1-aadb-26a31bb3c7a0'),
    episode_id=UUID('01920c75-cdcb-7fa3-bd69-fd28cf615f91'),
    variant_name='gpt_4o_mini',
    content=[
        Text(type='text', text='Silent circuits hum, \nWisdom spun from lines of code, \nDreams in data bloom.')
    ],
    usage=Usage(
        input_tokens=15,
        output_tokens=20,
    ),
)
FeedbackResponse(feedback_id='01920c75-d11a-7150-81d8-15d497ce7eb8')
```
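Note that these identifiers are UUIDv7s, which embed a millisecond Unix timestamp in their first 48 bits; that is what makes time-ordered queries over inference and feedback IDs cheap. A minimal sketch (plain Python, no TensorZero dependency) for recovering that timestamp:

```python
from datetime import datetime, timezone
from uuid import UUID

def uuid7_timestamp(u: UUID) -> datetime:
    """Extract the timestamp embedded in a UUIDv7 identifier."""
    ms = int.from_bytes(u.bytes[:6], "big")  # first 48 bits = ms since Unix epoch
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

# The inference_id from the sample output above:
print(uuid7_timestamp(UUID("01920c75-d114-7aa1-aadb-26a31bb3c7a0")))
```

This can be handy for sanity-checking which run a given feedback record belongs to without touching the database.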

Querying Feedback Data

The TensorZero Gateway stores feedback data in the database, just like with inferences. Let’s query it!

Terminal window

```bash
curl "http://localhost:8123/" \
  -d "SELECT * FROM tensorzero.BooleanMetricFeedback
      WHERE metric_name = 'haiku_rating'
      ORDER BY timestamp DESC
      LIMIT 1
      FORMAT Vertical"
```
Sample Output

```
Row 1:
──────
id:          01920c75-d11a-7150-81d8-15d497ce7eb8
target_id:   01920c75-d114-7aa1-aadb-26a31bb3c7a0
metric_name: haiku_rating
value:       true
```

You can easily join feedback data with inference data (using the inference ID or episode ID) in ClickHouse. That’s how TensorZero Recipes collect the data for optimization.

Terminal window

```bash
curl "http://localhost:8123/" \
  -d "SELECT *
      FROM tensorzero.ChatInference i
      LEFT JOIN tensorzero.BooleanMetricFeedback f ON i.id = f.target_id
      WHERE f.metric_name = 'haiku_rating'
      ORDER BY i.timestamp DESC
      LIMIT 1
      FORMAT Vertical"
```
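For episode-level metrics, the same pattern applies with the episode ID as the join key. A sketch of such a query (the metric name `solved_task` is a hypothetical episode-level boolean metric, not one defined in this guide):

```sql
-- Sketch: join episode-level boolean feedback to inferences by episode ID
SELECT *
FROM tensorzero.ChatInference i
LEFT JOIN tensorzero.BooleanMetricFeedback f ON i.episode_id = f.target_id
WHERE f.metric_name = 'solved_task'
ORDER BY i.timestamp DESC
LIMIT 1
FORMAT Vertical
```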

Conclusion & Next Steps

Feedback unlocks powerful workflows in observability, optimization, and experimentation. For example, you might want to fine-tune a model with inference data from haikus that receive positive ratings.

This is exactly what we demonstrate in Writing Haikus to Satisfy a Judge with Hidden Preferences! This complete, runnable example fine-tunes GPT-4o Mini to generate haikus tailored to an AI judge with hidden preferences. Continuous improvement over successive fine-tuning runs demonstrates TensorZero’s data and learning flywheel.