Overview

The TensorZero Gateway is a high-performance model gateway that provides a unified interface for all your LLM applications.
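
For a concrete feel, here is a minimal sketch of an inference request over the gateway's HTTP API. It assumes the gateway is listening on localhost:3000 and that a function named `generate_haiku` is defined in your configuration; both the address and the function name are assumptions for this example:

```python
import requests

# A minimal inference request. "generate_haiku" is a placeholder function
# name; it must be defined in the gateway's configuration.
response = requests.post(
    "http://localhost:3000/inference",
    json={
        "function_name": "generate_haiku",
        "input": {
            "messages": [
                {"role": "user", "content": "Write a haiku about AI."}
            ]
        },
    },
)
response.raise_for_status()
print(response.json())  # structured response for this inference
```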

  • Blazing Fast. The gateway (written in Rust 🦀) achieves <1ms P99 latency overhead under extreme load. In benchmarks, LiteLLM @ 100 QPS adds 25-100x+ more latency than our gateway @ 10,000 QPS.

  • Structured Inferences. The gateway enforces schemas for inputs and outputs, ensuring robustness for your application. Structured inference data is later used for powerful optimization recipes (e.g. swapping historical prompts before fine-tuning).

  • Multi-Step LLM Workflows. The gateway provides first-class support for complex multi-step LLM workflows by associating multiple inferences with an episode. Feedback can be assigned at the inference or episode level, allowing for end-to-end optimization of compound LLM systems (see the first sketch after this list).

  • Built-in Observability. The gateway collects structured inference traces along with associated downstream metrics and natural-language feedback. Everything is stored in a ClickHouse database for real-time, scalable, and developer-friendly analytics (see the second sketch after this list). TensorZero Recipes leverage this dataset to optimize your LLMs.

  • Built-in Experimentation. The gateway automatically routes traffic between variants to enable A/B tests. It ensures consistent variants within an episode in multi-step workflows. More advanced experimentation techniques (e.g. asynchronous multi-armed bandits) are coming soon.

  • Built-in Fallbacks. The gateway automatically falls back from failed inferences to other inference providers, or even entirely different variants, ensuring that misconfiguration, provider downtime, and other edge cases don’t affect your availability.

  • GitOps Orchestration. Orchestrate prompts, models, parameters, tools, experiments, and more with GitOps-friendly configuration. Manage a few LLMs manually with human-readable configuration files, or thousands of prompts and LLMs entirely programmatically.
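
For illustration, here is a minimal sketch of a two-step episode with episode-level feedback. It assumes the gateway is listening on localhost:3000; the function name (`draft_email`) and metric name (`email_sent`) are hypothetical placeholders that would need to be defined in your configuration:

```python
import requests

GATEWAY = "http://localhost:3000"

# Step 1: the first inference starts a new episode.
# "draft_email" is a hypothetical function name for this sketch.
first = requests.post(
    f"{GATEWAY}/inference",
    json={
        "function_name": "draft_email",
        "input": {
            "messages": [{"role": "user", "content": "Draft a follow-up email."}]
        },
    },
).json()
episode_id = first["episode_id"]

# Step 2: later inferences join the same episode by passing its ID.
requests.post(
    f"{GATEWAY}/inference",
    json={
        "function_name": "draft_email",
        "episode_id": episode_id,
        "input": {
            "messages": [{"role": "user", "content": "Make it more formal."}]
        },
    },
)

# Feedback can target the episode as a whole ("email_sent" is a
# hypothetical boolean metric defined in your configuration).
requests.post(
    f"{GATEWAY}/feedback",
    json={"metric_name": "email_sent", "episode_id": episode_id, "value": True},
)
```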

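As a sketch of the observability side, the structured data can be queried directly from ClickHouse. The connection details and the `ChatInference` table name below are illustrative; check your deployment for the actual values:

```python
import clickhouse_connect

# Connect to the ClickHouse instance backing the gateway (host/port are
# illustrative and depend on your deployment).
client = clickhouse_connect.get_client(host="localhost", port=8123)

# Count inferences per function; "ChatInference" is an illustrative table name.
result = client.query(
    "SELECT function_name, count() AS inferences "
    "FROM ChatInference "
    "GROUP BY function_name "
    "ORDER BY inferences DESC"
)
for function_name, inferences in result.result_rows:
    print(function_name, inferences)
```
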
Next Steps