Optimizations
If you care about extreme concurrency and low latency, we recommend the following settings and workflows.
TensorZero Gateway
- Enable gateway.observability.async_writes so that inference data is written to ClickHouse by a background task, instead of the gateway waiting for the ClickHouse write to finish before returning the inference response.
- Ensure your application, the TensorZero Gateway, and ClickHouse are deployed in the same region to minimize network latency.
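As a rough illustration of the first setting, the fragment below shows how the dotted name gateway.observability.async_writes would map onto a TOML configuration file. It assumes the gateway is configured through a TOML file and that the option is a boolean; check the gateway configuration reference for the exact file name and defaults.

```toml
# Assumed layout: the dotted setting name maps to a TOML table and key.
[gateway.observability]
async_writes = true  # write to ClickHouse in the background
```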
Python Client
- Initialize the client once and reuse it as much as possible, to avoid initialization overhead and to keep the connection alive.
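The reuse pattern above can be sketched as follows. This is a minimal illustration, not the TensorZero client API: GatewayClient, get_client, handle_request, and the URL are hypothetical stand-ins, and a real client would take their place. The point is only that the client is built on first use and the same instance is returned afterwards.

```python
from functools import lru_cache


class GatewayClient:
    """Hypothetical stand-in for a real gateway client (not the TensorZero API)."""

    instances_created = 0  # counts how many times the expensive setup ran

    def __init__(self, gateway_url: str) -> None:
        # A real client would set up its connection pool here.
        GatewayClient.instances_created += 1
        self.gateway_url = gateway_url

    def infer(self, prompt: str) -> str:
        # Placeholder for a real network call to the gateway.
        return f"response to {prompt!r} via {self.gateway_url}"


@lru_cache(maxsize=None)
def get_client(gateway_url: str = "http://localhost:3000") -> GatewayClient:
    """Build the client on first use, then return the same instance."""
    return GatewayClient(gateway_url)


def handle_request(prompt: str) -> str:
    # Every request reuses the cached client instead of constructing a new one.
    return get_client().infer(prompt)
```

Caching the factory rather than creating a client inside each request handler keeps the setup cost to a single call and lets the underlying connection stay open between requests.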