Overview
The TensorZero Gateway is a high-performance model gateway that provides a unified interface for all your LLM applications.
-
One API for All LLMs. The gateway provides a unified interface for all major LLM providers, allowing for seamless cross-platform integration and fallbacks. TensorZero natively supports Anthropic, AWS Bedrock, Azure OpenAI Service, Fireworks, GCP Vertex AI Anthropic, GCP Vertex AI Gemini, Google AI Studio (Gemini API), Hyperbolic, Mistral, OpenAI, Together, vLLM, and xAI. Need something else? Your provider is most likely supported because TensorZero integrates with any OpenAI-compatible API (e.g. Ollama). Still not supported? Open an issue on GitHub and we’ll integrate it!
-
Blazing Fast. The gateway (written in Rust 🦀) achieves <1ms P99 latency overhead under extreme load. In benchmarks, LiteLLM @ 100 QPS adds 25-100x+ more latency than our gateway @ 10,000 QPS.
-
Structured Inferences. The gateway enforces schemas for inputs and outputs, ensuring robustness for your application. Structured inference data is later used for powerful optimization recipes (e.g. swapping historical prompts before fine-tuning). Learn more about prompt templates & schemas.
-
Multi-Step LLM Workflows. The gateway provides first-class support for complex multi-step LLM workflows by associating multiple inferences with an episode. Feedback can be assigned at the inference or episode level, allowing for end-to-end optimization of compound LLM systems. Learn more about episodes.
-
Built-in Observability. The gateway collects structured inference traces along with associated downstream metrics and natural-language feedback. Everything is stored in a ClickHouse database for real-time, scalable, and developer-friendly analytics. TensorZero Recipes leverage this dataset to optimize your LLMs.
-
Built-in Experimentation. The gateway automatically routes traffic between variants to enable A/B tests. It ensures consistent variants within an episode in multi-step workflows. More advanced experimentation techniques (e.g. asynchronous multi-armed bandits) are coming soon.
-
Built-in Fallbacks. The gateway automatically falls back to different inference providers, or even completely different variants, when an inference fails. This ensures that misconfiguration, provider downtime, and other edge cases don’t affect your availability.
-
GitOps Orchestration. Orchestrate prompts, models, parameters, tools, experiments, and more with GitOps-friendly configuration. Manage a few LLMs manually with human-readable configuration files, or thousands of prompts and LLMs entirely programmatically.
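
The experimentation and fallback behavior described above can be illustrated with a small, self-contained sketch. This is not the gateway's implementation or API. It only shows one plausible way to keep a variant consistent within an episode (hashing the episode ID) and to fall back across providers in order. All names here (`pick_variant`, `infer_with_fallback`, `VARIANTS`, `flaky`, `stable`) are hypothetical, and the hashing scheme is an assumption made for illustration.

```python
import hashlib
from typing import Callable, Sequence

# Hypothetical variants with traffic weights (not taken from any real config).
VARIANTS = [("baseline", 0.5), ("candidate", 0.5)]


def pick_variant(episode_id: str, variants: Sequence[tuple[str, float]]) -> str:
    """Deterministically map an episode ID to a variant, in proportion to weights.

    Because the choice depends only on the episode ID, every inference in the
    same episode is routed to the same variant.
    """
    total = sum(weight for _, weight in variants)
    digest = hashlib.sha256(episode_id.encode("utf-8")).digest()
    # Turn the first 8 bytes of the hash into a point in [0, total).
    point = int.from_bytes(digest[:8], "big") / 2**64 * total
    cumulative = 0.0
    for name, weight in variants:
        cumulative += weight
        if point < cumulative:
            return name
    return variants[-1][0]  # guard against floating-point rounding


def infer_with_fallback(
    prompt: str, providers: Sequence[tuple[str, Callable[[str], str]]]
) -> tuple[str, str]:
    """Try each provider in order and return (provider_name, output) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real system would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stand-in providers used only to exercise the sketch.
def flaky(prompt: str) -> str:
    raise TimeoutError("provider down")


def stable(prompt: str) -> str:
    return f"echo: {prompt}"


if __name__ == "__main__":
    print(pick_variant("episode-1", VARIANTS))
    print(infer_with_fallback("hi", [("primary", flaky), ("backup", stable)]))
```

In this sketch, calling `pick_variant` twice with the same episode ID always yields the same variant, and `infer_with_fallback` skips the failing `primary` provider and returns the result from `backup`. With the gateway, this logic runs server-side, so application code does not need to implement it.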