Frequently Asked Questions
Technical
Why is the TensorZero Gateway a proxy instead of a library?
TensorZero’s proxy pattern makes it agnostic to the application’s tech stack, isolated from the business logic, more composable with other tools, and easy to deploy and manage.
Many engineers are (correctly) wary of the marginal latency such a proxy adds, so we built the gateway from the ground up with performance in mind. In Benchmarks, it achieves sub-millisecond P99 latency overhead under extreme load. This makes the gateway fast and lightweight enough to be unnoticeable even in the most demanding LLM applications, especially if deployed as a sidecar container.
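To make the pattern concrete, here is a minimal sketch of an application calling a sidecar-deployed gateway over plain HTTP. The port, endpoint, and payload shape are assumptions for illustration; consult the API reference for the exact interface.

```python
# A minimal sketch of calling a sidecar-deployed gateway over HTTP.
# The port (3000), endpoint (/inference), function name, and payload
# shape are all assumptions for illustration.
import requests

response = requests.post(
    "http://localhost:3000/inference",  # gateway running as a sidecar
    json={
        "function_name": "generate_haiku",  # hypothetical function
        "input": {
            "messages": [{"role": "user", "content": "Write a haiku about Rust."}]
        },
    },
    timeout=10,
)
print(response.json())
```

Because the application only speaks HTTP to a local process, the same code works regardless of which model provider the gateway routes to behind the scenes.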
How is the TensorZero Gateway so fast?
The TensorZero Gateway was built from the ground up with performance in mind. It was written in Rust 🦀 and optimizes many common bottlenecks by efficiently managing connections to model providers, pre-compiling schemas and templates, logging data asynchronously, and more.
It achieves <1ms P99 latency overhead under extreme load. In Benchmarks, LiteLLM @ 100 QPS adds 25-100x+ more latency than the TensorZero Gateway @ 10,000 QPS.
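For intuition about what "latency overhead" means here, the sketch below times repeated requests against a locally deployed gateway and reports percentiles. The URL and payload are assumptions, and note that this measures end-to-end latency rather than overhead (overhead is the difference versus calling the model provider directly); see the Benchmarks page for the actual methodology.

```python
# A rough latency-percentile measurement against a local gateway.
# URL, function name, and payload shape are assumptions for illustration.
import time
import requests

PAYLOAD = {
    "function_name": "generate_haiku",  # hypothetical function
    "input": {"messages": [{"role": "user", "content": "Write a haiku."}]},
}

latencies_ms = []
for _ in range(1000):
    start = time.perf_counter()
    requests.post("http://localhost:3000/inference", json=PAYLOAD, timeout=10)
    latencies_ms.append((time.perf_counter() - start) * 1000)

latencies_ms.sort()
print(f"p50: {latencies_ms[499]:.2f} ms")  # nearest-rank percentiles
print(f"p99: {latencies_ms[989]:.2f} ms")  # over 1000 sorted samples
```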
Why did you choose ClickHouse as TensorZero's analytics database?
ClickHouse is open source, extremely fast, and versatile. It supports diverse storage backends, query patterns, and data types, including vector search (which will be important for upcoming TensorZero features). From the start, we designed TensorZero to be easy to deploy but able to grow to massive scale. ClickHouse is the best tool for the job.
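Because observability data lands in a standard ClickHouse database, you can analyze it with ordinary SQL. The sketch below uses the clickhouse-connect Python client; the database, table, and column names are assumptions based on a typical deployment, so check the data model documentation before relying on them.

```python
# A minimal sketch of querying inference data in ClickHouse.
# Connection settings and the table/column names (ChatInference,
# timestamp, ...) are assumptions; see the data model docs.
import clickhouse_connect

client = clickhouse_connect.get_client(
    host="localhost", port=8123, username="default", database="tensorzero"
)

# Hypothetical query: inference volume per function variant per day.
result = client.query(
    """
    SELECT function_name, variant_name, toDate(timestamp) AS day, count() AS n
    FROM ChatInference
    GROUP BY function_name, variant_name, day
    ORDER BY day DESC, n DESC
    """
)
for function_name, variant_name, day, n in result.result_rows:
    print(function_name, variant_name, day, n)
```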
Why doesn't the TensorZero Gateway have an OpenAI-compatible API?
When we set out to build TensorZero, we took a step back to think from first principles about what the interface between applications and generative models should look like. We concluded that it should be structured (defined by schemas) rather than unstructured (free-form text for prompts and generations). A schema-based interface simplifies engineering iteration, experimentation, and optimization, especially as application complexity and team size grow. For example, the prompt template becomes an optimization variable that is easy to experiment with, and counterfactual values can later be used for evaluation and fine-tuning. This choice also fits neatly into longstanding traditions in the sequential decision-making literature, which we’ll discuss in detail in an upcoming blog post.
With all this in mind, we decided against following the trend of OpenAI-compatible interfaces and instead built an interface that is best-suited to the next stage of LLM-based application development.
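To illustrate the difference, here is a schematic contrast between a free-form prompt and a schema-based call. The payload shapes are illustrative only, not TensorZero's exact wire format.

```python
# Unstructured: the prompt text is hardcoded in application code, so
# experimenting with wording means changing (and redeploying) the app.
unstructured = {
    "model": "some-model",
    "prompt": "Write a playful product description for a mechanical keyboard.",
}

# Structured: the application sends only the variables defined by the
# function's schema; the template that renders them into a prompt lives
# in the gateway's configuration. (Hypothetical function and fields.)
structured = {
    "function_name": "draft_product_description",
    "input": {
        "messages": [{
            "role": "user",
            "content": {
                "product_name": "mechanical keyboard",  # schema-defined
                "tone": "playful",                      # variables
            },
        }]
    },
}
```

With the structured version, changing the prompt template is a configuration change in the gateway that can be versioned and experimented with, rather than a code change in every client.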
Project
Who is behind TensorZero?
We’re a small technical team based in NYC.
Viraj Mehta (CTO) recently completed his PhD at CMU, with an emphasis on reinforcement learning for LLMs and nuclear fusion, and previously worked in machine learning at KKR and a fintech startup; he holds a BS in math and an MS in computer science from Stanford.
Gabriel Bianconi (CEO) was the chief product officer at Ondo Finance ($14B+ valuation in 2024) and previously spent years consulting on machine learning for companies ranging from early-stage tech startups to some of the largest financial firms; he holds BS and MS degrees in computer science from Stanford.
How is TensorZero licensed?
TensorZero is open source under the permissive Apache 2.0 License.
How does TensorZero make money?
We’re lucky to have investors who are aligned with our long-term vision, so we’re able to focus on building and snooze this question for a while.
We’re inspired by companies like Databricks and ClickHouse. One day, we’ll launch a managed service that further streamlines LLM engineering, especially in enterprise settings, but open source will always be at the core of our business.