TensorZero exposes the following Prometheus-compatible metrics at its /metrics endpoint.
tensorzero_inference_latency_overhead_seconds
This metric tracks the latency overhead introduced by TensorZero on inference requests.
It measures the total request duration minus the time spent waiting for external model provider HTTP requests.
This is useful for understanding how much latency TensorZero adds to your inference requests, independently of model provider latency.
This metric is reported as a summary with quantiles (e.g. p50, p90, p99).
GET /metrics
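As an illustrative sketch (metric values and label sets are hypothetical, not real output), a scrape of this summary in the Prometheus exposition format might look like:

```
# TYPE tensorzero_inference_latency_overhead_seconds summary
tensorzero_inference_latency_overhead_seconds{quantile="0.5"} 0.0009
tensorzero_inference_latency_overhead_seconds{quantile="0.9"} 0.0021
tensorzero_inference_latency_overhead_seconds{quantile="0.99"} 0.0043
tensorzero_inference_latency_overhead_seconds_sum 1.24
tensorzero_inference_latency_overhead_seconds_count 1024
```

Note that summary quantiles are computed per instance and cannot be meaningfully aggregated across replicas; see the histogram variant below if you need that.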
tensorzero_inference_latency_overhead_seconds_histogram
This metric is an optional histogram variant of tensorzero_inference_latency_overhead_seconds (see above).
It provides traditional histogram buckets instead of pre-computed quantiles, which is useful if you want to compute custom quantiles or aggregate across multiple instances.
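To make the "compute custom quantiles" point concrete, here is a minimal sketch of quantile estimation from cumulative histogram buckets, using the same linear-interpolation approach as Prometheus's `histogram_quantile()`. The bucket boundaries and counts are hypothetical, and the sketch omits the `+Inf` bucket handling of the real function:

```python
def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count), sorted by bound.
    Linearly interpolates within the bucket containing the target rank,
    mirroring Prometheus's histogram_quantile() (minus +Inf handling).
    """
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if count == prev_count:
                return bound
            # Interpolate between the bucket's lower and upper bounds.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return buckets[-1][0]

# Hypothetical buckets: bounds in seconds, cumulative observation counts.
buckets = [(0.001, 50), (0.005, 90), (0.01, 100)]
print(histogram_quantile(0.5, buckets))  # p50 lands in the first bucket: 0.001
```

Because buckets are plain counters, they can be summed across gateway instances before computing the quantile, which is what makes histograms aggregatable where summaries are not.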
To enable it, configure the histogram buckets in your configuration file:
tensorzero.toml
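As a hypothetical sketch only (the section and key names below are assumptions; check the TensorZero configuration reference for the exact syntax in your version), the bucket boundaries would be declared in seconds:

```toml
# Hypothetical sketch — verify the exact key in the configuration reference.
# Bucket upper bounds are in seconds.
[gateway.observability]
inference_latency_overhead_seconds_histogram_buckets = [0.001, 0.0025, 0.005, 0.01, 0.025, 0.05, 0.1]
```

Choose boundaries that bracket your expected overhead: buckets that are too coarse lose resolution, while unused buckets add scrape-payload size for no benefit.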
GET /metrics
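Once enabled, a scrape might include output along these lines (bucket bounds and counts are illustrative, not real output):

```
# TYPE tensorzero_inference_latency_overhead_seconds_histogram histogram
tensorzero_inference_latency_overhead_seconds_histogram_bucket{le="0.001"} 50
tensorzero_inference_latency_overhead_seconds_histogram_bucket{le="0.005"} 90
tensorzero_inference_latency_overhead_seconds_histogram_bucket{le="+Inf"} 100
tensorzero_inference_latency_overhead_seconds_histogram_sum 0.19
tensorzero_inference_latency_overhead_seconds_histogram_count 100
```

Each `le` bucket is cumulative: it counts all observations at or below that upper bound, which is why the counts are non-decreasing.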
tensorzero_inferences_total
This metric counts the total number of inferences performed by TensorZero.
GET /metrics
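A sketch of what a scrape might return for this counter (the value is illustrative, and any labels attached to the metric depend on your deployment and TensorZero version):

```
# TYPE tensorzero_inferences_total counter
tensorzero_inferences_total 42
```

As a monotonic counter, it is typically queried as a rate (e.g. `rate(tensorzero_inferences_total[5m])` in PromQL) rather than read as a raw value.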
tensorzero_requests_total
This metric counts the total number of requests handled by TensorZero.
GET /metrics
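Likewise, a sketch of illustrative scrape output for this counter (value and any labels are hypothetical):

```
# TYPE tensorzero_requests_total counter
tensorzero_requests_total 57
```

Comparing the rates of `tensorzero_requests_total` and `tensorzero_inferences_total` can help distinguish overall gateway traffic from inference traffic specifically.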