Overview
TensorZero is an open-source platform that helps LLM applications graduate from API wrappers into defensible AI products.
- Integrate our model gateway
- Send metrics or feedback
- Unlock compounding improvements in quality, cost, and latency
TensorZero enables a data and learning flywheel for LLMs by unifying:
- Inference: one API for all LLMs, with <1ms P99 overhead
- Observability: inference & feedback → your database
- Optimization: better prompts, models, inference strategies
- Experimentation: built-in A/B testing, routing, fallbacks
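To make the experimentation item concrete, here is a minimal sketch of weighted variant sampling, the basic mechanism behind an A/B test between two prompt variants. It is a generic illustration, not TensorZero's implementation: the variant names, the weights, and the `sample_variant` helper are all assumptions made up for this example.

```python
import random
from collections import Counter

# Hypothetical variants and traffic weights (illustrative names only).
VARIANT_WEIGHTS = {"baseline_prompt": 0.8, "candidate_prompt": 0.2}


def sample_variant(weights: dict[str, float], rng: random.Random) -> str:
    """Pick one variant name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Simulate routing 10,000 requests and count how many each variant receives.
rng = random.Random(0)  # seeded so the simulation is repeatable
counts = Counter(sample_variant(VARIANT_WEIGHTS, rng) for _ in range(10_000))
print(counts)
```

In a real system the chosen variant would be recorded alongside each inference so that feedback can later be compared per variant.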
How It Works
- The TensorZero Gateway is a high-performance model gateway written in Rust 🦀 that provides a unified API for all major LLM providers, so you can integrate across providers and fall back from one to another.
- It handles structured, schema-based inference with <1ms P99 latency overhead (see Benchmarks) and includes built-in observability and experimentation (and soon, inference-time optimizations).
- It also collects downstream metrics and feedback associated with these inferences, with first-class support for multi-step LLM systems.
- Everything is stored in a ClickHouse data warehouse that you control for real-time, scalable, and developer-friendly analytics.
- Over time, TensorZero Recipes use this structured dataset to optimize your prompts and models. You can run pre-built recipes for common workflows like fine-tuning, or create your own in any language and on any platform.
- Finally, the gateway’s experimentation features and GitOps orchestration let you iterate and deploy with confidence, whether you run a single LLM or thousands.
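The flow described above (one entry point, provider fallbacks, and feedback tied back to a specific inference) can be sketched with a toy in-memory model. This is not the TensorZero Gateway or its client API: `ToyGateway`, `InferenceRecord`, `FeedbackRecord`, the provider functions, and the `task_success` metric name are hypothetical, and the in-memory lists merely stand in for the data warehouse.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class InferenceRecord:
    inference_id: str
    provider: str
    output: str


@dataclass
class FeedbackRecord:
    inference_id: str
    metric_name: str
    value: float


@dataclass
class ToyGateway:
    """Hypothetical stand-in for a model gateway; not a real API."""

    # Ordered (name, callable) pairs: earlier entries are tried first.
    providers: list[tuple[str, Callable[[str], str]]]
    inferences: list[InferenceRecord] = field(default_factory=list)
    feedback: list[FeedbackRecord] = field(default_factory=list)

    def infer(self, prompt: str) -> InferenceRecord:
        errors = []
        for name, call in self.providers:
            try:
                output = call(prompt)
            except Exception as exc:  # fall back to the next provider
                errors.append(f"{name}: {exc}")
                continue
            record = InferenceRecord(str(uuid.uuid4()), name, output)
            self.inferences.append(record)  # observability: store every inference
            return record
        raise RuntimeError("all providers failed: " + "; ".join(errors))

    def send_feedback(self, inference_id: str, metric_name: str, value: float) -> None:
        # Feedback is only accepted for an inference the gateway has recorded.
        if not any(r.inference_id == inference_id for r in self.inferences):
            raise KeyError(f"unknown inference_id: {inference_id}")
        self.feedback.append(FeedbackRecord(inference_id, metric_name, value))


def flaky_provider(prompt: str) -> str:
    raise TimeoutError("simulated outage")


def backup_provider(prompt: str) -> str:
    return f"echo: {prompt}"


gateway = ToyGateway([("primary", flaky_provider), ("backup", backup_provider)])
result = gateway.infer("Hello")
gateway.send_feedback(result.inference_id, "task_success", 1.0)
print(result.provider, result.output)
```

The key design point is the shared `inference_id`: because each feedback record carries the identifier of the inference it describes, the two tables can be joined later to build datasets for optimization.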
Our goal is to help engineers build, manage, and optimize the next generation of LLM applications: systems that learn from real-world experience. Read more about our Vision & Roadmap.