Overview

The TensorZero Gateway is a high-performance model gateway that provides a unified interface for all your LLM applications.
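
For a concrete feel, here is a minimal sketch of an inference request over the gateway's HTTP API. It assumes the gateway is listening on localhost:3000 and that a function named `generate_haiku` is defined in your configuration; both the address and the function name are assumptions for this example:

```python
import requests

# A minimal inference request. "generate_haiku" is a placeholder function
# name; it must be defined in the gateway's configuration.
response = requests.post(
    "http://localhost:3000/inference",
    json={
        "function_name": "generate_haiku",
        "input": {
            "messages": [
                {"role": "user", "content": "Write a haiku about AI."}
            ]
        },
    },
)
response.raise_for_status()
print(response.json())  # structured response for this inference
```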

  • Blazing Fast. The gateway (written in Rust 🦀) achieves <1ms P99 latency overhead under extreme load. In benchmarks, LiteLLM @ 100 QPS adds 25-100x+ more latency than our gateway @ 10,000 QPS.

  • Structured Inferences. The gateway enforces schemas for inputs and outputs, ensuring robustness for your application. Structured inference data is later used for powerful optimization recipes (e.g. swapping historical prompts before fine-tuning).

  • Multi-Step LLM Workflows. The gateway provides first-class support for complex multi-step LLM workflows by associating multiple inferences with an episode. Feedback can be assigned at the inference or episode level, allowing for end-to-end optimization of compound LLM systems (see the first sketch after this list).

  • Built-in Observability. The gateway collects structured inference traces along with associated downstream metrics and natural-language feedback. Everything is stored in a ClickHouse database for real-time, scalable, and developer-friendly analytics (see the second sketch after this list). TensorZero Recipes leverage this dataset to optimize your LLMs.

  • Built-in Experimentation. The gateway automatically routes traffic between variants to enable A/B tests. It ensures consistent variants within an episode in multi-step workflows. More advanced experimentation techniques (e.g. asynchronous multi-armed bandits) are coming soon.

  • Built-in Fallbacks. The gateway automatically falls back from failed inferences to other inference providers, or even entirely different variants, ensuring that misconfiguration, provider downtime, and other edge cases don’t affect your availability.

  • GitOps Orchestration. Orchestrate prompts, models, parameters, tools, experiments, and more with GitOps-friendly configuration. Manage a few LLMs manually with human-readable configuration files, or thousands of prompts and LLMs entirely programmatically.
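
For illustration, here is a minimal sketch of a two-step episode with episode-level feedback. It assumes the gateway is listening on localhost:3000; the function name (`draft_email`) and metric name (`email_sent`) are hypothetical placeholders that would need to be defined in your configuration:

```python
import requests

GATEWAY = "http://localhost:3000"

# Step 1: the first inference starts a new episode.
# "draft_email" is a hypothetical function name for this sketch.
first = requests.post(
    f"{GATEWAY}/inference",
    json={
        "function_name": "draft_email",
        "input": {
            "messages": [{"role": "user", "content": "Draft a follow-up email."}]
        },
    },
).json()
episode_id = first["episode_id"]

# Step 2: later inferences join the same episode by passing its ID.
requests.post(
    f"{GATEWAY}/inference",
    json={
        "function_name": "draft_email",
        "episode_id": episode_id,
        "input": {
            "messages": [{"role": "user", "content": "Make it more formal."}]
        },
    },
)

# Feedback can target the episode as a whole ("email_sent" is a
# hypothetical boolean metric defined in your configuration).
requests.post(
    f"{GATEWAY}/feedback",
    json={"metric_name": "email_sent", "episode_id": episode_id, "value": True},
)
```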

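As a sketch of the observability side, the structured data can be queried directly from ClickHouse. The connection details and the `ChatInference` table name below are illustrative; check your deployment for the actual values:

```python
import clickhouse_connect

# Connect to the ClickHouse instance backing the gateway (host/port are
# illustrative and depend on your deployment).
client = clickhouse_connect.get_client(host="localhost", port=8123)

# Count inferences per function; "ChatInference" is an illustrative table name.
result = client.query(
    "SELECT function_name, count() AS inferences "
    "FROM ChatInference "
    "GROUP BY function_name "
    "ORDER BY inferences DESC"
)
for function_name, inferences in result.result_rows:
    print(function_name, inferences)
```
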
Next Steps