Overview

TensorZero Recipes are a set of pre-built workflows for optimizing your LLM applications. You can also create your own recipes to customize the workflow to your needs.

The TensorZero Gateway collects structured inference data and the downstream feedback associated with it. This dataset sets the perfect foundation for building and optimizing LLM applications. As this dataset builds up, you can use these recipes to generate powerful variants for your functions. For example, you can use this dataset to curate data to fine-tune a custom LLM, or run an automated prompt engineering workflow.

In other words, TensorZero Recipes optimize TensorZero functions by generating new variants from historical inference and feedback data.

Model Optimizations

Supervised Fine-tuning

A fine-tuning recipe curates a dataset from your historical inferences and fine-tunes an LLM on it. You can use the feedback associated with those inferences to select the right subset of data. A simple example is to use only inferences that led to good outcomes according to a metric you defined.

We present sample fine-tuning recipes:

See complete examples using the recipes below.

RLHF

DPO (Preference Fine-tuning)

A direct preference optimization (DPO) — also known as preference fine-tuning — recipe fine-tunes an LLM on a dataset of preference pairs. You can use demonstration feedback collected with TensorZero to curate a dataset of preference pairs and fine-tune an LLM on it.

We present a sample DPO recipe for OpenAI:

DPO (Preference Fine-tuning) with OpenAI

Dynamic In-Context Learning

Dynamic In-Context Learning (DICL) is a technique that leverages historical examples to enhance LLM performance at inference time. It involves selecting relevant examples from a database of past interactions and including them in the prompt, allowing the model to learn from similar contexts on-the-fly. This approach can significantly improve the model’s ability to handle specific tasks or domains without the need for fine-tuning.

We provide a sample recipe for DICL with OpenAI. The recipe supports selecting examples based on boolean metrics, float metrics, and demonstrations.

Dynamic In-Context Learning with OpenAI

Prompt Optimization

TensorZero offers a prompt optimization recipe, MIPRO, which jointly optimizes instructions and few-shot examples. More recipes for prompt optimization are planned.

MIPRO

MIPRO (Multi-prompt Instruction PRoposal Optimizer) is a method for automatically improving system instructions and few-shot demonstrations in LLM applications — including ones with multiple LLM functions or calls.

MIPRO Diagram

MIPRO can optimize prompts across an entire LLM pipeline without needing fine-grained labels or gradients. Instead, it uses a Bayesian optimizer to figure out which instructions and demonstrations actually improve end-to-end performance. By combining application-aware prompt proposals and stochastic mini-batch evaluations, MIPRO can improve downstream task performance compared to traditional prompt engineering approaches.

See Automated Prompt Engineering with MIPRO on GitHub for more details.

Inference-Time Optimization

The TensorZero Gateway offers built-in inference-time optimizations like dynamic in-context learning and best/mixture-of-N sampling.

See Inference-Time Optimizations for more information.

Custom Recipes

You can also create your own recipes.

Put simply, a recipe takes inference and feedback data stored that the TensorZero Gateway stored in your ClickHouse database, and generates a new set of variants for your functions. You should should be able to use virtually any LLM engineering workflow with TensorZero, ranging from automated prompt engineering to advanced RLHF workflows. See an example of a custom recipe using DSPy below.

Examples

We are working on a series of complete runnable examples illustrating TensorZero’s data & learning flywheel.

Optimizing Data Extraction (NER) with TensorZero — This example shows how to use TensorZero to optimize a data extraction pipeline. We demonstrate techniques like fine-tuning and dynamic in-context learning (DICL). In the end, an optimized GPT-4o Mini model outperforms GPT-4o on this task — at a fraction of the cost and latency — using a small amount of training data.
Agentic RAG — Multi-Hop Question Answering with LLMs — This example shows how to build a multi-hop retrieval agent using TensorZero. The agent iteratively searches Wikipedia to gather information, and decides when it has enough context to answer a complex question.
Writing Haikus to Satisfy a Judge with Hidden Preferences — This example fine-tunes GPT-4o Mini to generate haikus tailored to a specific taste. You’ll see TensorZero’s “data flywheel in a box” in action: better variants leads to better data, and better data leads to better variants. You’ll see progress by fine-tuning the LLM multiple times.
Improving LLM Chess Ability with Best/Mixture-of-N Sampling — This example showcases how best-of-N sampling and mixture-of-N sampling can significantly enhance an LLM’s chess-playing abilities by selecting the most promising moves from multiple generated options.
Improving Math Reasoning with a Custom Recipe for Automated Prompt Engineering (DSPy) — TensorZero provides a number of pre-built optimization recipes covering common LLM engineering workflows. But you can also easily create your own recipes and workflows! This example shows how to optimize a TensorZero function using an arbitrary tool — here, DSPy.