Configuration Reference

The configuration file is the backbone of TensorZero. It defines the behavior of the gateway, including the models and their providers, functions and their variants, tools, metrics, and more. Developers express the behavior of LLM calls by defining the relevant prompt templates, schemas, and other parameters in this configuration file.

You can see an example configuration file here.

The configuration file is a TOML file with a few major sections (TOML tables): gateway, clickhouse, models, model_providers, functions, variants, tools, and metrics.

[gateway]

The [gateway] section defines the behavior of the TensorZero Gateway.

bind_address

  • Type: string
  • Required: no (default: 0.0.0.0:3000)

Defines the socket address to bind the TensorZero Gateway to.

tensorzero.toml
[gateway]
# ...
bind_address = "0.0.0.0:3000"
# ...

disable_observability

  • Type: boolean
  • Required: no (default: false)

Disable the observability features of the TensorZero Gateway (not recommended).

tensorzero.toml
[gateway]
# ...
disable_observability = true # not recommended
# ...

[models.model_name]

The [models.model_name] section defines the behavior of a model. You can define multiple models by including multiple [models.model_name] sections.

A model is provider-agnostic, and the relevant providers are defined in the providers sub-section (see below).

If your model_name is not a valid TOML bare key (ASCII letters, digits, underscores, and dashes), wrap it in quotation marks. For example, a period in an unquoted key is parsed as a table separator, so you can define llama-3.1-8b-instruct as [models."llama-3.1-8b-instruct"].

tensorzero.toml
[models.claude-3-haiku-20240307]
# fieldA = ...
# fieldB = ...
# ...
[models."llama-3.1-8b-instruct"]
# fieldA = ...
# fieldB = ...
# ...

routing

  • Type: array of strings
  • Required: yes

A list of provider names to route requests to. The providers must be defined in the providers sub-section (see below). The TensorZero Gateway attempts to route a request to the first provider in the list, and falls back to subsequent providers in order if the request is not successful.

tensorzero.toml
[models.gpt-4o]
# ...
routing = ["openai", "azure"]
# ...
[models.gpt-4o.providers.openai]
# ...
[models.gpt-4o.providers.azure]
# ...
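
The fallback order described above can be pictured with a short Python sketch. It is illustrative only and is not the gateway's implementation: route_request, the provider callables, and the error type are hypothetical names, and a real gateway would likely be more selective about which errors trigger a fallback.

```python
from typing import Callable, Dict, List


class AllProvidersFailed(Exception):
    """Raised when every provider in the routing list has failed."""


def route_request(
    routing: List[str],
    providers: Dict[str, Callable[[str], str]],
    prompt: str,
) -> str:
    """Try each provider in `routing` order and return the first success."""
    errors = {}
    for name in routing:
        try:
            return providers[name](prompt)
        except Exception as exc:  # assumption: any error triggers a fallback
            errors[name] = exc
    raise AllProvidersFailed(errors)


# Hypothetical stand-ins for real provider clients.
def failing_openai(prompt: str) -> str:
    raise TimeoutError("openai timed out")


def working_azure(prompt: str) -> str:
    return f"azure handled: {prompt}"


result = route_request(
    ["openai", "azure"],
    {"openai": failing_openai, "azure": working_azure},
    "hello",
)
print(result)
```

Here the first provider fails, so the request is answered by the second entry in routing.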

[models.model_name.providers.provider_name]

The providers sub-section defines the behavior of a specific provider for a model. You can define multiple providers by including multiple [models.model_name.providers.provider_name] sections.

If your provider_name is not a valid TOML bare key (ASCII letters, digits, underscores, and dashes), wrap it in quotation marks. For example, a period in an unquoted key is parsed as a table separator, so you can define vllm.internal as [models.model_name.providers."vllm.internal"].

tensorzero.toml
[models.gpt-4o]
# ...
routing = ["openai", "azure"]
# ...
[models.gpt-4o.providers.openai]
# ...
[models.gpt-4o.providers.azure]
# ...

type

  • Type: string
  • Required: yes

Defines the type of the provider. See Integrations » Model Providers for details.

The supported provider types are anthropic, aws_bedrock, azure, fireworks, gcp_vertex_anthropic, gcp_vertex_gemini, google_ai_studio_gemini, hyperbolic, mistral, openai, together, vllm, and xai.

The other fields in the provider sub-section depend on the provider type.

tensorzero.toml
[models.gpt-4o.providers.azure]
# ...
type = "azure"
# ...
type: "anthropic"
model_name
  • Type: string
  • Required: yes

Defines the model name to use with the Anthropic API. See Anthropic’s documentation for the list of available model names.

tensorzero.toml
[models.claude-3-haiku.providers.anthropic]
# ...
type = "anthropic"
model_name = "claude-3-haiku-20240307"
# ...
api_key_location
  • Type: string
  • Required: no (default: env::ANTHROPIC_API_KEY)

Defines the location of the API key for the Anthropic provider. The supported locations are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).

tensorzero.toml
[models.claude-3-haiku.providers.anthropic]
# ...
type = "anthropic"
api_key_location = "dynamic::anthropic_api_key"
# api_key_location = "env::ALTERNATE_ANTHROPIC_API_KEY"
# ...
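
To make the two location formats concrete, here is a minimal Python sketch of how a string such as env::NAME or dynamic::NAME could be resolved. It is an illustration, not the gateway's code: resolve_api_key is a hypothetical helper, and it assumes that dynamic credentials arrive with the request as a mapping from argument name to value.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    location: str,
    dynamic_credentials: Optional[Mapping[str, str]] = None,
) -> str:
    """Resolve an `env::NAME` or `dynamic::NAME` location to a key value."""
    kind, separator, name = location.partition("::")
    if not separator or not name:
        raise ValueError(f"malformed location: {location!r}")
    if kind == "env":
        # The key is read from an environment variable of the gateway process.
        if name not in os.environ:
            raise KeyError(f"environment variable {name} is not set")
        return os.environ[name]
    if kind == "dynamic":
        # Assumption: the key is supplied per request under this argument name.
        if dynamic_credentials is None or name not in dynamic_credentials:
            raise KeyError(f"request did not provide credential {name!r}")
        return dynamic_credentials[name]
    raise ValueError(f"unsupported location type: {kind!r}")


os.environ["ALTERNATE_ANTHROPIC_API_KEY"] = "placeholder-from-env"  # demo only
print(resolve_api_key("env::ALTERNATE_ANTHROPIC_API_KEY"))
print(
    resolve_api_key(
        "dynamic::anthropic_api_key",
        {"anthropic_api_key": "placeholder-from-request"},
    )
)
```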
type: "aws_bedrock"
model_id
  • Type: string
  • Required: yes

Defines the model ID to use with the AWS Bedrock API. See AWS Bedrock’s documentation for the list of available model IDs.

tensorzero.toml
[models.claude-3-haiku.providers.aws_bedrock]
# ...
type = "aws_bedrock"
model_id = "anthropic.claude-3-haiku-20240307-v1:0"
# ...
region
  • Type: string
  • Required: no (default: based on credentials if set, otherwise us-east-1)

Defines the AWS region to use with the AWS Bedrock API.

tensorzero.toml
[models.claude-3-haiku.providers.aws_bedrock]
# ...
type = "aws_bedrock"
region = "us-east-2"
# ...
type: "azure"

The TensorZero Gateway handles the API version under the hood (currently 2024-06-01). You only need to set the deployment_id and endpoint fields.

deployment_id
  • Type: string
  • Required: yes

Defines the deployment ID of the Azure OpenAI deployment.

See Azure OpenAI’s documentation for the list of available models.

tensorzero.toml
[models.gpt-4o-mini.providers.azure]
# ...
type = "azure"
deployment_id = "gpt4o-mini-20240718"
# ...
endpoint
  • Type: string
  • Required: yes

Defines the endpoint of the Azure OpenAI deployment (protocol and hostname).

tensorzero.toml
[models.gpt-4o-mini.providers.azure]
# ...
type = "azure"
endpoint = "https://<your-endpoint>.openai.azure.com"
# ...
api_key_location
  • Type: string
  • Required: no (default: env::AZURE_OPENAI_API_KEY)

Defines the location of the API key for the Azure OpenAI provider. The supported locations are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).

tensorzero.toml
[models.gpt-4o-mini.providers.azure]
# ...
type = "azure"
api_key_location = "dynamic::azure_openai_api_key"
# api_key_location = "env::ALTERNATE_AZURE_OPENAI_API_KEY"
# ...
type: "fireworks"
model_name
  • Type: string
  • Required: yes

Defines the model name to use with the Fireworks API.

See Fireworks’ documentation for the list of available model names. You can also deploy your own models on Fireworks AI.

tensorzero.toml
[models."llama-3.1-8b-instruct".providers.fireworks]
# ...
type = "fireworks"
model_name = "accounts/fireworks/models/llama-v3p1-8b-instruct"
# ...
api_key_location
  • Type: string
  • Required: no (default: env::FIREWORKS_API_KEY)

Defines the location of the API key for the Fireworks provider. The supported locations are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).

tensorzero.toml
[models."llama-3.1-8b-instruct".providers.fireworks]
# ...
type = "fireworks"
api_key_location = "dynamic::fireworks_api_key"
# api_key_location = "env::ALTERNATE_FIREWORKS_API_KEY"
# ...
type: "gcp_vertex_anthropic"
location
  • Type: string
  • Required: yes

Defines the location (region) of the GCP Vertex AI Anthropic model.

tensorzero.toml
[models.claude-3-haiku.providers.gcp_vertex]
# ...
type = "gcp_vertex_anthropic"
location = "us-central1"
# ...
model_id
  • Type: string
  • Required: yes

Defines the model ID of the GCP Vertex AI model.

See Anthropic’s GCP documentation for the list of available model IDs.

tensorzero.toml
[models.claude-3-haiku.providers.gcp_vertex]
# ...
type = "gcp_vertex_anthropic"
model_id = "claude-3-haiku@20240307"
# ...
project_id
  • Type: string
  • Required: yes

Defines the project ID of the GCP Vertex AI model.

tensorzero.toml
[models.claude-3-haiku.providers.gcp_vertex]
# ...
type = "gcp_vertex_anthropic"
project_id = "your-project-id"
# ...
credential_location
  • Type: string
  • Required: no (default: env::GCP_CREDENTIALS_PATH)

Defines the location of the credentials for the GCP Vertex Anthropic provider. The supported locations are env::PATH_TO_CREDENTIALS_FILE, dynamic::CREDENTIALS_ARGUMENT_NAME (see the API reference for more details), and file::PATH_TO_CREDENTIALS_FILE.

tensorzero.toml
[models.claude-3-haiku.providers.gcp_vertex]
# ...
type = "gcp_vertex_anthropic"
credential_location = "dynamic::gcp_credentials_path"
# credential_location = "env::ALTERNATE_GCP_CREDENTIALS_PATH"
# credential_location = "file::PATH_TO_CREDENTIALS_FILE"
# ...
type: "gcp_vertex_gemini"
location
  • Type: string
  • Required: yes

Defines the location (region) of the GCP Vertex Gemini model.

tensorzero.toml
[models."gemini-1.5-flash".providers.gcp_vertex]
# ...
type = "gcp_vertex_gemini"
location = "us-central1"
# ...
model_id
  • Type: string
  • Required: yes

Defines the model ID of the GCP Vertex AI model.

See GCP Vertex AI’s documentation for the list of available model IDs.

tensorzero.toml
[models."gemini-1.5-flash".providers.gcp_vertex]
# ...
type = "gcp_vertex_gemini"
model_id = "gemini-1.5-flash-001"
# ...
project_id
  • Type: string
  • Required: yes

Defines the project ID of the GCP Vertex AI model.

tensorzero.toml
[models."gemini-1.5-flash".providers.gcp_vertex]
# ...
type = "gcp_vertex_gemini"
project_id = "your-project-id"
# ...
credential_location
  • Type: string
  • Required: no (default: env::GCP_CREDENTIALS_PATH)

Defines the location of the credentials for the GCP Vertex Gemini provider. The supported locations are env::PATH_TO_CREDENTIALS_FILE, dynamic::CREDENTIALS_ARGUMENT_NAME (see the API reference for more details), and file::PATH_TO_CREDENTIALS_FILE.

tensorzero.toml
[models."gemini-1.5-flash".providers.gcp_vertex]
# ...
type = "gcp_vertex_gemini"
credential_location = "dynamic::gcp_credentials_path"
# credential_location = "env::ALTERNATE_GCP_CREDENTIALS_PATH"
# credential_location = "file::PATH_TO_CREDENTIALS_FILE"
# ...
type: "google_ai_studio_gemini"
model_name
  • Type: string
  • Required: yes

Defines the model name to use with the Google AI Studio Gemini API. See Google AI Studio’s documentation for the list of available model names.

tensorzero.toml
[models."gemini-1.5-flash".providers.google_ai_studio_gemini]
# ...
type = "google_ai_studio_gemini"
model_name = "gemini-1.5-flash-001"
# ...
api_key_location
  • Type: string
  • Required: no (default: env::GOOGLE_AI_STUDIO_API_KEY)

Defines the location of the API key for the Google AI Studio Gemini provider. The supported locations are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).

tensorzero.toml
[models."gemini-1.5-flash".providers.google_ai_studio_gemini]
# ...
type = "google_ai_studio_gemini"
api_key_location = "dynamic::google_ai_studio_api_key"
# api_key_location = "env::ALTERNATE_GOOGLE_AI_STUDIO_API_KEY"
# ...
type: "hyperbolic"
model_name
  • Type: string
  • Required: yes

Defines the model name to use with the Hyperbolic API.

See Hyperbolic’s documentation for the list of available model names.

tensorzero.toml
[models."meta-llama/Meta-Llama-3-70B-Instruct".providers.hyperbolic]
# ...
type = "hyperbolic"
model_name = "meta-llama/Meta-Llama-3-70B-Instruct"
# ...
api_key_location
  • Type: string
  • Required: no (default: env::HYPERBOLIC_API_KEY)

Defines the location of the API key for the Hyperbolic provider. The supported locations are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).

tensorzero.toml
[models."meta-llama/Meta-Llama-3-70B-Instruct".providers.hyperbolic]
# ...
type = "hyperbolic"
api_key_location = "dynamic::hyperbolic_api_key"
# api_key_location = "env::ALTERNATE_HYPERBOLIC_API_KEY"
# ...
type: "mistral"
model_name
  • Type: string
  • Required: yes

Defines the model name to use with the Mistral API.

See Mistral’s documentation for the list of available model names.

tensorzero.toml
[models."open-mistral-nemo".providers.mistral]
# ...
type = "mistral"
model_name = "open-mistral-nemo-2407"
# ...
api_key_location
  • Type: string
  • Required: no (default: env::MISTRAL_API_KEY)

Defines the location of the API key for the Mistral provider. The supported locations are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).

tensorzero.toml
[models."open-mistral-nemo".providers.mistral]
# ...
type = "mistral"
api_key_location = "dynamic::mistral_api_key"
# api_key_location = "env::ALTERNATE_MISTRAL_API_KEY"
# ...
type: "openai"
api_base
  • Type: string
  • Required: no (default: https://api.openai.com/v1/)

Defines the base URL of the OpenAI API.

You can use the api_base field to use an API provider that is compatible with the OpenAI API. However, many providers are only “approximately compatible” with the OpenAI API, so you might need to use a specialized model provider in those cases.

tensorzero.toml
[models."gpt-4o".providers.openai]
# ...
type = "openai"
api_base = "https://api.openai.com/v1/"
# ...
model_name
  • Type: string
  • Required: yes

Defines the model name to use with the OpenAI API.

See OpenAI’s documentation for the list of available model names.

tensorzero.toml
[models.gpt-4o-mini.providers.openai]
# ...
type = "openai"
model_name = "gpt-4o-mini-2024-07-18"
# ...
api_key_location
  • Type: string
  • Required: no (default: env::OPENAI_API_KEY)

Defines the location of the API key for the OpenAI provider. The supported locations are env::ENVIRONMENT_VARIABLE, dynamic::ARGUMENT_NAME, and none (see the API reference for more details).

tensorzero.toml
[models.gpt-4o-mini.providers.openai]
# ...
type = "openai"
api_key_location = "dynamic::openai_api_key"
# api_key_location = "env::ALTERNATE_OPENAI_API_KEY"
# api_key_location = "none"
# ...
type: "together"
model_name
  • Type: string
  • Required: yes

Defines the model name to use with the Together API.

See Together’s documentation for the list of available model names. You can also deploy your own models on Together AI.

tensorzero.toml
[models.llama3_1_8b_instruct_turbo.providers.together]
# ...
type = "together"
model_name = "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo"
# ...
api_key_location
  • Type: string
  • Required: no (default: env::TOGETHER_API_KEY)

Defines the location of the API key for the Together AI provider. The supported locations are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).

tensorzero.toml
[models.llama3_1_8b_instruct_turbo.providers.together]
# ...
type = "together"
api_key_location = "dynamic::together_api_key"
# api_key_location = "env::ALTERNATE_TOGETHER_API_KEY"
# ...
type: "vllm"
api_base
  • Type: string
  • Required: yes (default: http://localhost:8000/v1/)

Defines the base URL of the vLLM API.

tensorzero.toml
[models."phi-3.5-mini-instruct".providers.vllm]
# ...
type = "vllm"
api_base = "http://localhost:8000/v1/"
# ...
model_name
  • Type: string
  • Required: yes

Defines the model name to use with the vLLM API.

tensorzero.toml
[models."phi-3.5-mini-instruct".providers.vllm]
# ...
type = "vllm"
model_name = "microsoft/Phi-3.5-mini-instruct"
# ...
api_key_location
  • Type: string
  • Required: no (default: env::VLLM_API_KEY)

Defines the location of the API key for the vLLM provider. The supported locations are env::ENVIRONMENT_VARIABLE, dynamic::ARGUMENT_NAME, and none (see the API reference for more details).

tensorzero.toml
[models."phi-3.5-mini-instruct".providers.vllm]
# ...
type = "vllm"
api_key_location = "dynamic::vllm_api_key"
# api_key_location = "env::ALTERNATE_VLLM_API_KEY"
# api_key_location = "none"
# ...
type: "xai"
model_name
  • Type: string
  • Required: yes

Defines the model name to use with the xAI API.

See xAI’s documentation for the list of available model names.

tensorzero.toml
[models.grok_2_1212.providers.xai]
# ...
type = "xai"
model_name = "grok-2-1212"
# ...
api_key_location
  • Type: string
  • Required: no (default: env::XAI_API_KEY)

Defines the location of the API key for the xAI provider. The supported locations are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).

tensorzero.toml
[models.grok_2_1212.providers.xai]
# ...
type = "xai"
api_key_location = "dynamic::xai_api_key"
# api_key_location = "env::ALTERNATE_XAI_API_KEY"
# ...

[embedding_models.model_name]

The [embedding_models.model_name] section defines the behavior of an embedding model. You can define multiple models by including multiple [embedding_models.model_name] sections.

A model is provider-agnostic, and the relevant providers are defined in the providers sub-section (see below).

If your model_name is not a valid TOML bare key (ASCII letters, digits, underscores, and dashes), wrap it in quotation marks. For example, a period in an unquoted key is parsed as a table separator, so you can define embedding-0.1 as [embedding_models."embedding-0.1"].

tensorzero.toml
[embedding_models.openai-text-embedding-3-small]
# fieldA = ...
# fieldB = ...
# ...
[embedding_models."t0-text-embedding-3.5-massive"]
# fieldA = ...
# fieldB = ...
# ...

routing

  • Type: array of strings
  • Required: yes

A list of provider names to route requests to. The providers must be defined in the providers sub-section (see below). The TensorZero Gateway attempts to route a request to the first provider in the list, and falls back to subsequent providers in order if the request is not successful.

tensorzero.toml
[embedding_models.model-name]
# ...
routing = ["openai", "alternative-provider"]
# ...
[embedding_models.model-name.providers.openai]
# ...
[embedding_models.model-name.providers.alternative-provider]
# ...

[embedding_models.model_name.providers.provider_name]

The providers sub-section defines the behavior of a specific provider for a model. You can define multiple providers by including multiple [embedding_models.model_name.providers.provider_name] sections.

If your provider_name is not a valid TOML bare key (ASCII letters, digits, underscores, and dashes), wrap it in quotation marks. For example, a period in an unquoted key is parsed as a table separator, so you can define vllm.internal as [embedding_models.model_name.providers."vllm.internal"].

tensorzero.toml
[embedding_models.model-name]
# ...
routing = ["openai", "alternative-provider"]
# ...
[embedding_models.model-name.providers.openai]
# ...
[embedding_models.model-name.providers.alternative-provider]
# ...

type

  • Type: string
  • Required: yes

Defines the type of the provider. See Integrations » Model Providers for details.

TensorZero currently only supports openai as a provider for embedding models. More integrations are on the way.

The other fields in the provider sub-section depend on the provider type.

tensorzero.toml
[embedding_models.model-name.providers.openai]
# ...
type = "openai"
# ...
type: "openai"
api_base
  • Type: string
  • Required: no (default: https://api.openai.com/v1/)

Defines the base URL of the OpenAI API.

You can use the api_base field to use an API provider that is compatible with the OpenAI API. However, many providers are only “approximately compatible” with the OpenAI API, so you might need to use a specialized model provider in those cases.

tensorzero.toml
[embedding_models.openai-text-embedding-3-small.providers.openai]
# ...
type = "openai"
api_base = "https://api.openai.com/v1/"
# ...
model_name
  • Type: string
  • Required: yes

Defines the model name to use with the OpenAI API.

See OpenAI’s documentation for the list of available model names.

tensorzero.toml
[embedding_models.openai-text-embedding-3-small.providers.openai]
# ...
type = "openai"
model_name = "text-embedding-3-small"
# ...
api_key_location
  • Type: string
  • Required: no (default: env::OPENAI_API_KEY)

Defines the location of the API key for the OpenAI provider. The supported locations are env::ENVIRONMENT_VARIABLE, dynamic::ARGUMENT_NAME, and none (see the API reference for more details).

tensorzero.toml
[embedding_models.openai-text-embedding-3-small.providers.openai]
# ...
type = "openai"
api_key_location = "dynamic::openai_api_key"
# api_key_location = "env::ALTERNATE_OPENAI_API_KEY"
# api_key_location = "none"
# ...

[functions.function_name]

The [functions.function_name] section defines the behavior of a function. You can define multiple functions by including multiple [functions.function_name] sections.

A function can have multiple variants, and each variant is defined in the variants sub-section (see below). A function expresses the abstract behavior of an LLM call (e.g. the schemas for the messages), and its variants express concrete instantiations of that LLM call (e.g. specific templates and models).

If your function_name is not a valid TOML bare key (ASCII letters, digits, underscores, and dashes), wrap it in quotation marks. For example, a period in an unquoted key is parsed as a table separator, so you can define summarize-2.0 as [functions."summarize-2.0"].

tensorzero.toml
[functions.draft-email]
# fieldA = ...
# fieldB = ...
# ...
[functions.summarize-email]
# fieldA = ...
# fieldB = ...
# ...

assistant_schema

  • Type: string (path)
  • Required: no

Defines the path to the assistant schema file. The path is relative to the configuration file.

If provided, the assistant schema file should contain a JSON Schema for the assistant messages. The variables in the schema are used for templating the assistant messages. If a schema is provided, all function variants must also provide an assistant template (see below).

tensorzero.toml
[functions.draft-email]
# ...
assistant_schema = "./functions/draft-email/assistant_schema.json"
# ...
[functions.draft-email.variants.prompt-v1]
# ...
assistant_template = "./functions/draft-email/prompt-v1/assistant_template.minijinja"
# ...

system_schema

  • Type: string (path)
  • Required: no

Defines the path to the system schema file. The path is relative to the configuration file.

If provided, the system schema file should contain a JSON Schema for the system message. The variables in the schema are used for templating the system message. If a schema is provided, all function variants must also provide a system template (see below).

tensorzero.toml
[functions.draft-email]
# ...
system_schema = "./functions/draft-email/system_schema.json"
# ...
[functions.draft-email.variants.prompt-v1]
# ...
system_template = "./functions/draft-email/prompt-v1/system_template.minijinja"
# ...

type

  • Type: string
  • Required: yes

Defines the type of the function.

The supported function types are chat and json.

Most other fields in the function section depend on the function type.

tensorzero.toml
[functions.draft-email]
# ...
type = "chat"
# ...
type: "chat"
parallel_tool_calls
  • Type: boolean
  • Required: no (default: false)

Determines whether the function should be allowed to call multiple tools in a single conversation turn.

Most model providers do not support this feature. In those cases, this field will be ignored.

tensorzero.toml
[functions.draft-email]
# ...
type = "chat"
parallel_tool_calls = true
# ...
tool_choice
  • Type: string
  • Required: no (default: auto)

Determines the tool choice strategy for the function.

The supported tool choice strategies are:

  • none: The function should not use any tools.
  • auto: The model decides whether or not to use a tool. If it decides to use a tool, it also decides which tools to use.
  • required: The model should use a tool. If multiple tools are available, the model decides which tool to use.
  • { specific = "tool_name" }: The model should use a specific tool. The tool must be defined in the tools field (see below).
tensorzero.toml
[functions.solve-math-problem]
# ...
type = "chat"
tool_choice = "auto"
tools = [
# ...
"run-python"
# ...
]
# ...
[tools.run-python]
# ...
tensorzero.toml
[functions.generate-query]
# ...
type = "chat"
tool_choice = { specific = "query-database" }
tools = [
# ...
"query-database"
# ...
]
# ...
[tools.query-database]
# ...
tools
  • Type: array of strings
  • Required: no (default: [])

Determines the tools that the function can use.

The supported tools are defined in [tools.tool_name] sections (see below).

tensorzero.toml
[functions.draft-email]
# ...
type = "chat"
tools = [
# ...
"query-database"
# ...
]
# ...
[tools.query-database]
# ...
type: "json"
output_schema
  • Type: string (path)
  • Required: no (default: {}, the empty JSON schema that accepts any valid JSON output)

Defines the path to the output schema file, which should contain a JSON Schema for the output of the function. The path is relative to the configuration file.

This schema is used for validating the output of the function.

tensorzero.toml
[functions.extract-customer-info]
# ...
type = "json"
output_schema = "./functions/extract-customer-info/output_schema.json"
# ...

user_schema

  • Type: string (path)
  • Required: no

Defines the path to the user schema file. The path is relative to the configuration file.

If provided, the user schema file should contain a JSON Schema for the user messages. The variables in the schema are used for templating the user messages. If a schema is provided, all function variants must also provide a user template (see below).

tensorzero.toml
[functions.draft-email]
# ...
user_schema = "./functions/draft-email/user_schema.json"
# ...
[functions.draft-email.variants.prompt-v1]
# ...
user_template = "./functions/draft-email/prompt-v1/user_template.minijinja"
# ...

[functions.function_name.variants.variant_name]

The variants sub-section defines the behavior of a specific variant of a function. You can define multiple variants by including multiple [functions.function_name.variants.variant_name] sections.

If your variant_name is not a valid TOML bare key (ASCII letters, digits, underscores, and dashes), wrap it in quotation marks. For example, a period in an unquoted key is parsed as a table separator, so you can define llama-3.1-8b-instruct as [functions.function_name.variants."llama-3.1-8b-instruct"].

tensorzero.toml
[functions.draft-email]
# ...
[functions.draft-email.variants."llama-3.1-8b-instruct"]
# ...
[functions.draft-email.variants.claude-3-haiku]
# ...

type

  • Type: string
  • Required: yes

Defines the type of the variant.

TensorZero currently supports the following variant types:

  • chat_completion: Uses a chat completion model to generate responses by processing a series of messages in a conversational format. This is typically what you use out of the box with most LLMs.
  • experimental_best_of_n: Generates multiple response candidates with other variants, and selects the best one using an evaluator model.
  • experimental_dynamic_in_context_learning: Selects similar high-quality examples using an embedding of the input, and incorporates them into the prompt to enhance context and improve response quality.
  • experimental_mixture_of_n: Generates multiple response candidates with other variants, and combines the responses using a fuser model.
tensorzero.toml
[functions.draft-email.variants.prompt-v1]
# ...
type = "chat_completion"
# ...
type: "chat_completion"
assistant_template
  • Type: string (path)
  • Required: no

Defines the path to the assistant template file. The path is relative to the configuration file.

This file should contain a MiniJinja template for the assistant messages. If the template uses any variables, the variables should be defined in the function’s assistant_schema field.

tensorzero.toml
[functions.draft-email]
# ...
assistant_schema = "./functions/draft-email/assistant_schema.json"
# ...
[functions.draft-email.variants.prompt-v1]
# ...
assistant_template = "./functions/draft-email/prompt-v1/assistant_template.minijinja"
# ...
frequency_penalty
  • Type: float
  • Required: no (default: null)

Penalizes new tokens based on their frequency in the text so far if positive, encourages them if negative.

tensorzero.toml
[functions.draft-email.variants.prompt-v1]
# ...
frequency_penalty = 0.2
# ...
json_mode
  • Type: string
  • Required: no (default: on)

Defines the strategy for generating JSON outputs.

This parameter is only supported for variants of functions with type = "json".

The supported modes are:

  • off: Make a chat completion request without any special JSON handling (not recommended).
  • on: Make a chat completion request with JSON mode (if supported by the provider).
  • strict: Make a chat completion request with strict JSON mode (if supported by the provider). For example, the TensorZero Gateway uses Structured Outputs for OpenAI.
  • implicit_tool: Make a special-purpose tool use request under the hood, and convert the tool call into a JSON response.
tensorzero.toml
[functions.draft-email.variants.prompt-v1]
# ...
json_mode = "strict"
# ...
max_tokens
  • Type: integer
  • Required: no (default: null)

Defines the maximum number of tokens to generate.

tensorzero.toml
[functions.draft-email.variants.prompt-v1]
# ...
max_tokens = 100
# ...
model
  • Type: string
  • Required: yes

Defines the model to use for the variant. The model must be defined in the [models.model_name] section (see above).

tensorzero.toml
[models.gpt-4o-mini]
# ...
[functions.draft-email.variants.prompt-v1]
# ...
model = "gpt-4o-mini"
# ...
presence_penalty
  • Type: float
  • Required: no (default: null)

Penalizes new tokens based on whether they have already appeared in the text so far if positive, encourages them if negative.

tensorzero.toml
[functions.draft-email.variants.prompt-v1]
# ...
presence_penalty = 0.5
# ...
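
The difference between frequency_penalty and presence_penalty is easiest to see side by side. The Python sketch below follows the logit adjustment described in OpenAI's API documentation; it is an assumption that other providers behave the same way, and apply_penalties is a hypothetical helper rather than part of TensorZero.

```python
from collections import Counter
from typing import Dict, List


def apply_penalties(
    logits: Dict[str, float],
    generated: List[str],
    frequency_penalty: float = 0.0,
    presence_penalty: float = 0.0,
) -> Dict[str, float]:
    """Adjust next-token logits based on the tokens generated so far."""
    counts = Counter(generated)
    adjusted = {}
    for token, logit in logits.items():
        # Frequency penalty grows with every repetition of the token.
        frequency_term = counts[token] * frequency_penalty
        # Presence penalty is a flat cost once the token has appeared at all.
        presence_term = presence_penalty if counts[token] > 0 else 0.0
        adjusted[token] = logit - frequency_term - presence_term
    return adjusted


logits = {"the": 2.0, "cat": 1.0, "sat": 0.5}
generated_so_far = ["the", "the", "cat"]
adjusted = apply_penalties(
    logits, generated_so_far, frequency_penalty=0.2, presence_penalty=0.5
)
for token, value in adjusted.items():
    print(token, round(value, 3))
```

A token that has not appeared yet ("sat" here) keeps its original logit, while a token repeated twice is penalized more heavily by the frequency term than one seen once.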
retries
  • Type: object with optional keys num_retries and max_delay_s
  • Required: no (defaults to num_retries = 0 and max_delay_s = 10)

TensorZero’s retry strategy is truncated exponential backoff with jitter. The num_retries parameter defines the number of retries (not including the initial request). The max_delay_s parameter defines the maximum delay between retries.

tensorzero.toml
[functions.draft-email.variants.prompt-v1]
# ...
retries = { num_retries = 3, max_delay_s = 10 }
# ...
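
As a rough illustration of truncated exponential backoff with jitter, the Python sketch below doubles the delay ceiling on each retry, caps it at max_delay_s, and then picks a random delay below that ceiling. The 1-second base delay, the "full jitter" choice, and the helper names are assumptions made for this example; the gateway's exact formula may differ.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def backoff_delay(
    attempt: int,
    max_delay_s: float,
    base_delay_s: float = 1.0,  # assumption: not specified in this reference
    rng: Optional[random.Random] = None,
) -> float:
    """Return a jittered delay for the given zero-based retry attempt."""
    if rng is None:
        rng = random.Random()
    # Exponential growth, truncated at max_delay_s.
    ceiling = min(max_delay_s, base_delay_s * (2 ** attempt))
    # Full jitter: pick uniformly between zero and the ceiling.
    return rng.uniform(0.0, ceiling)


def call_with_retries(
    request: Callable[[], T],
    num_retries: int = 0,
    max_delay_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying up to `num_retries` times after failures."""
    if num_retries < 0:
        raise ValueError("num_retries must be non-negative")
    attempt = 0
    while True:
        try:
            return request()
        except Exception:
            if attempt == num_retries:
                raise  # retries exhausted: surface the last error
            sleep(backoff_delay(attempt, max_delay_s))
            attempt += 1


# Demo: a request that fails twice before succeeding.
attempts = {"count": 0}


def flaky_request() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


delays = []  # record delays instead of actually sleeping
result = call_with_retries(
    flaky_request, num_retries=3, max_delay_s=10.0, sleep=delays.append
)
print(result, len(delays))
```

With num_retries = 3 the request may be sent up to four times in total, since the initial request does not count as a retry.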
seed
  • Type: integer
  • Required: no (default: null)

Defines the seed to use for the variant.

tensorzero.toml
[functions.draft-email.variants.prompt-v1]
# ...
seed = 42
# ...
system_template
  • Type: string (path)
  • Required: no

Defines the path to the system template file. The path is relative to the configuration file.

This file should contain a MiniJinja template for the system messages. If the template uses any variables, the variables should be defined in the function’s system_schema field.

tensorzero.toml
[functions.draft-email]
# ...
system_schema = "./functions/draft-email/system_schema.json"
# ...
[functions.draft-email.variants.prompt-v1]
# ...
system_template = "./functions/draft-email/prompt-v1/system_template.minijinja"
# ...
temperature
  • Type: float
  • Required: no (default: null)

Defines the temperature to use for the variant.

tensorzero.toml
[functions.draft-email.variants.prompt-v1]
# ...
temperature = 0.5
# ...
top_p
  • Type: float, between 0 and 1
  • Required: no (default: null)

Defines the top_p to use for the variant during nucleus sampling. Typically at most one of top_p and temperature is set.

tensorzero.toml
[functions.draft-email.variants.prompt-v1]
# ...
top_p = 0.3
# ...
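
For readers unfamiliar with these sampling parameters, the Python sketch below shows the standard textbook definitions: temperature rescales the logits before the softmax, and top_p keeps the smallest set of most likely tokens whose cumulative probability reaches the threshold. It describes the general technique, not any specific provider's implementation, and the function names are chosen for this example.

```python
import math
from typing import Dict


def softmax_with_temperature(
    logits: Dict[str, float], temperature: float
) -> Dict[str, float]:
    """Convert logits to probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = {token: value / temperature for token, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {token: math.exp(value - peak) for token, value in scaled.items()}
    total = sum(exps.values())
    return {token: value / total for token, value in exps.items()}


def nucleus_filter(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Keep the most likely tokens until their mass reaches `top_p`."""
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1] in this sketch")
    kept = {}
    cumulative = 0.0
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    for token, prob in ranked:
        kept[token] = prob
        cumulative += prob
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    # Renormalize so the surviving tokens form a distribution again.
    return {token: prob / total for token, prob in kept.items()}


probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
narrow = nucleus_filter(probs, top_p=0.3)
wide = nucleus_filter(probs, top_p=0.9)
print(sorted(narrow), sorted(wide))

sharp = softmax_with_temperature({"a": 2.0, "b": 1.0}, temperature=0.5)
flat = softmax_with_temperature({"a": 2.0, "b": 1.0}, temperature=2.0)
print(sharp["a"] > flat["a"])
```

Both parameters narrow or widen the same distribution, which is why they are usually not tuned together.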
user_template
  • Type: string (path)
  • Required: no

Defines the path to the user template file. The path is relative to the configuration file.

This file should contain a MiniJinja template for the user messages. If the template uses any variables, the variables should be defined in the function’s user_schema field.

tensorzero.toml
[functions.draft-email]
# ...
user_schema = "./functions/draft-email/user_schema.json"
# ...
[functions.draft-email.variants.prompt-v1]
# ...
user_template = "./functions/draft-email/prompt-v1/user_template.minijinja"
# ...
weight
  • Type: float
  • Required: no (default: 0)

Defines the weight of the variant. When you call a function, the weight determines the relative importance of the variant when sampling.

Variants will be sampled with a probability proportional to their weight. For example, if variant A has a weight of 1.0 and variant B has a weight of 3.0, variant A will be sampled with probability 1.0 / (1.0 + 3.0) = 25% and variant B will be sampled with probability 3.0 / (1.0 + 3.0) = 75%.

You can disable a variant by setting its weight to 0. The variant will only be used if there are no other variants available for sampling or if the variant is requested explicitly in the request with variant_name. This is useful for defining fallback variants, which won’t be used unless no other variants are available.

tensorzero.toml
[functions.draft-email.variants.prompt-v1]
# ...
weight = 1.0
# ...
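The sampling rules above can be sketched with three variants of the same function. This is only an illustration: the variant names prompt-v2 and prompt-fallback are hypothetical, and all other fields are elided.

```toml
# Sampled with probability 1.0 / (1.0 + 3.0) = 25%
[functions.draft-email.variants.prompt-v1]
type = "chat_completion"
weight = 1.0
# ...

# Sampled with probability 3.0 / (1.0 + 3.0) = 75%
[functions.draft-email.variants.prompt-v2] # hypothetical variant name
type = "chat_completion"
weight = 3.0
# ...

# Never sampled while the variants above are available;
# still reachable by passing variant_name explicitly in the request.
[functions.draft-email.variants.prompt-fallback] # hypothetical variant name
type = "chat_completion"
weight = 0.0
# ...
```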
type: "experimental_best_of_n"
candidates
  • Type: list of strings
  • Required: yes

This inference strategy generates N candidate responses, and an evaluator model selects the best one. This approach allows you to leverage multiple prompts or variants to increase the likelihood of getting a high-quality response.

The candidates parameter specifies a list of variant names used to generate candidate responses. A variant name can appear more than once. For example, if you have two variants defined (promptA and promptB), the snippet below sets up the candidates list to generate two responses with promptA and one with promptB. The evaluator then chooses the best response from these three candidates.

tensorzero.toml
[functions.draft-email.variants.promptA]
type = "chat_completion"
# ...
[functions.draft-email.variants.promptB]
type = "chat_completion"
# ...
[functions.draft-email.variants.best-of-n]
type = "experimental_best_of_n"
candidates = ["promptA", "promptA", "promptB"] # 3 candidate generations
# ...
evaluator
  • Type: object
  • Required: yes

The evaluator parameter specifies the configuration for the model that will evaluate and select the best response from the generated candidates.

The evaluator is configured similarly to a chat_completion variant, but without the type field. The prompts here should be the ones you would use to solve the original problem; the gateway has special-purpose handling and templates that convert them into evaluator prompts.

tensorzero.toml
[functions.draft-email.variants.best-of-n]
type = "experimental_best_of_n"
# ...
[functions.draft-email.variants.best-of-n.evaluator]
# Same fields as a `chat_completion` variant (excl.`type`), e.g.:
# user_template = "functions/draft-email/best-of-n/user.minijinja"
# ...
timeout_s
  • Type: float
  • Required: no (default: 300s)

The timeout_s parameter specifies the maximum time in seconds allowed for generating candidate responses. Any candidate that takes longer than this duration to generate a response will be dropped from consideration.

tensorzero.toml
[functions.draft-email.variants.best-of-n]
type = "experimental_best_of_n"
timeout_s = 60
# ...
weight
  • Type: float
  • Required: no (default: 0)

Defines the weight of the variant. When you call a function, the gateway samples one of its variants, and the weight sets how likely this variant is to be chosen relative to the others.

Variants will be sampled with a probability proportional to their weight. For example, if variant A has a weight of 1.0 and variant B has a weight of 3.0, variant A will be sampled with probability 1.0 / (1.0 + 3.0) = 25% and variant B will be sampled with probability 3.0 / (1.0 + 3.0) = 75%.

You can disable a variant by setting its weight to 0. The variant will only be used if there are no other variants available for sampling or if the variant is requested explicitly in the request with variant_name. This is useful for defining fallback variants, which won’t be used unless no other variants are available.

tensorzero.toml
[functions.draft-email.variants.best-of-n]
# ...
weight = 1.0
# ...
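Putting the fields of this variant type together, a complete best-of-n variant might look like the sketch below. The evaluator's model and temperature values are assumptions chosen for illustration (the evaluator accepts the same fields as a chat_completion variant), and the fields of the candidate variants are elided.

```toml
# Candidate generators: ordinary chat_completion variants.
[functions.draft-email.variants.promptA]
type = "chat_completion"
# ...

[functions.draft-email.variants.promptB]
type = "chat_completion"
# ...

[functions.draft-email.variants.best-of-n]
type = "experimental_best_of_n"
candidates = ["promptA", "promptA", "promptB"] # 3 candidate generations
timeout_s = 60 # drop any candidate that takes longer than 60 seconds
weight = 1.0

# Same fields as a `chat_completion` variant, excluding `type`.
[functions.draft-email.variants.best-of-n.evaluator]
model = "gpt-4o-mini" # assumption: [models.gpt-4o-mini] is defined elsewhere
user_template = "functions/draft-email/best-of-n/user.minijinja"
temperature = 0.5 # assumption: illustrative value
```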
type: "experimental_mixture_of_n"
candidates
  • Type: list of strings
  • Required: yes

This inference strategy generates N candidate responses, and a fuser model combines them to produce a final answer. This approach allows you to leverage multiple prompts or variants to increase the likelihood of getting a high-quality response.

The candidates parameter specifies a list of variant names used to generate candidate responses. A variant name can appear more than once. For example, if you have two variants defined (promptA and promptB), the snippet below sets up the candidates list to generate two responses with promptA and one with promptB. The fuser then combines the three responses.

tensorzero.toml
[functions.draft-email.variants.promptA]
type = "chat_completion"
# ...
[functions.draft-email.variants.promptB]
type = "chat_completion"
# ...
[functions.draft-email.variants.mixture-of-n]
type = "experimental_mixture_of_n"
candidates = ["promptA", "promptA", "promptB"] # 3 candidate generations
# ...
fuser
  • Type: object
  • Required: yes

The fuser parameter specifies the configuration for the model that will combine the candidate responses into a final answer.

The fuser is configured similarly to a chat_completion variant, but without the type field. The prompts here should be the ones you would use to solve the original problem; the gateway has special-purpose handling and templates that convert them into fuser prompts.

tensorzero.toml
[functions.draft-email.variants.mixture-of-n]
type = "experimental_mixture_of_n"
# ...
[functions.draft-email.variants.mixture-of-n.fuser]
# Same fields as a `chat_completion` variant (excl.`type`), e.g.:
# user_template = "functions/draft-email/mixture-of-n/user.minijinja"
# ...
timeout_s
  • Type: float
  • Required: no (default: 300s)

The timeout_s parameter specifies the maximum time in seconds allowed for generating candidate responses. Any candidate that takes longer than this duration to generate a response will be dropped from consideration.

tensorzero.toml
[functions.draft-email.variants.mixture-of-n]
type = "experimental_mixture_of_n"
timeout_s = 60
# ...
weight
  • Type: float
  • Required: no (default: 0)

Defines the weight of the variant. When you call a function, the gateway samples one of its variants, and the weight sets how likely this variant is to be chosen relative to the others.

Variants will be sampled with a probability proportional to their weight. For example, if variant A has a weight of 1.0 and variant B has a weight of 3.0, variant A will be sampled with probability 1.0 / (1.0 + 3.0) = 25% and variant B will be sampled with probability 3.0 / (1.0 + 3.0) = 75%.

You can disable a variant by setting its weight to 0. The variant will only be used if there are no other variants available for sampling or if the variant is requested explicitly in the request with variant_name. This is useful for defining fallback variants, which won’t be used unless no other variants are available.

tensorzero.toml
[functions.draft-email.variants.mixture-of-n]
# ...
weight = 1.0
# ...
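Putting the fields of this variant type together, a complete mixture-of-n variant might look like the sketch below. The fuser's model value is an assumption chosen for illustration (the fuser accepts the same fields as a chat_completion variant), and the fields of the candidate variants are elided.

```toml
# Candidate generators: ordinary chat_completion variants.
[functions.draft-email.variants.promptA]
type = "chat_completion"
# ...

[functions.draft-email.variants.promptB]
type = "chat_completion"
# ...

[functions.draft-email.variants.mixture-of-n]
type = "experimental_mixture_of_n"
candidates = ["promptA", "promptA", "promptB"] # 3 candidate generations
timeout_s = 60 # drop any candidate that takes longer than 60 seconds
weight = 1.0

# Same fields as a `chat_completion` variant, excluding `type`.
[functions.draft-email.variants.mixture-of-n.fuser]
model = "gpt-4o-mini" # assumption: [models.gpt-4o-mini] is defined elsewhere
user_template = "functions/draft-email/mixture-of-n/user.minijinja"
```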
type: "experimental_dynamic_in_context_learning"
embedding_model
  • Type: string
  • Required: yes

Defines the embedding model used to retrieve similar examples. The model must be defined in the [embedding_models.model_name] section (see above).

The embedding model used for inference should be the same model previously used to generate the embeddings stored in ClickHouse.

tensorzero.toml
[embedding_models.openai-text-embedding-3-small]
# ...
[functions.draft-email.variants.dicl]
# ...
embedding_model = "openai-text-embedding-3-small"
# ...
json_mode
  • Type: string
  • Required: no (default: on)

Defines the strategy for generating JSON outputs.

This parameter is only supported for variants of functions with type = "json".

The supported modes are:

  • off: Make a chat completion request without any special JSON handling (not recommended).
  • on: Make a chat completion request with JSON mode (if supported by the provider).
  • strict: Make a chat completion request with strict JSON mode (if supported by the provider). For example, the TensorZero Gateway uses Structured Outputs for OpenAI.
  • implicit_tool: Make a special-purpose tool use request under the hood, and convert the tool call into a JSON response.
tensorzero.toml
[functions.draft-email.variants.prompt-v1]
# ...
json_mode = "strict"
# ...
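As a sketch of how modes can differ between variants, the fragment below defines two variants of one JSON function. The function name extract-invoice and the variant names are hypothetical, and the remaining function and variant fields are elided.

```toml
[functions.extract-invoice] # hypothetical function name
type = "json" # json_mode is only supported for functions of this type
# ...

# Requests strict JSON mode where the provider supports it.
[functions.extract-invoice.variants.strict-json] # hypothetical variant name
# ...
json_mode = "strict"

# Issues a tool use request under the hood and converts the tool call to JSON.
[functions.extract-invoice.variants.via-tool] # hypothetical variant name
# ...
json_mode = "implicit_tool"
```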
k
  • Type: non-negative integer
  • Required: yes

Defines the number of examples to retrieve for the inference.

tensorzero.toml
[functions.draft-email.variants.dicl]
# ...
k = 10
# ...
max_tokens
  • Type: integer
  • Required: no (default: null)

Defines the maximum number of tokens to generate.

tensorzero.toml
[functions.draft-email.variants.prompt-v1]
# ...
max_tokens = 100
# ...
model
  • Type: string
  • Required: yes

Defines the model to use for the variant. The model must be defined in the [models.model_name] section (see above).

tensorzero.toml
[models.gpt-4o-mini]
# ...
[functions.draft-email.variants.prompt-v1]
# ...
model = "gpt-4o-mini"
# ...
retries
  • Type: object with optional keys num_retries and max_delay_s
  • Required: no (defaults to num_retries = 0 and max_delay_s = 10)

TensorZero’s retry strategy is truncated exponential backoff with jitter. The num_retries parameter defines the number of retries (not including the initial request). The max_delay_s parameter defines the maximum delay between retries, in seconds.

tensorzero.toml
[functions.draft-email.variants.prompt-v1]
# ...
retries = { num_retries = 3, max_delay_s = 10 }
# ...
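Because both keys are optional, you can set only one of them. A minimal sketch that relies on the documented default for the omitted key:

```toml
[functions.draft-email.variants.prompt-v1]
# ...
# Up to 2 retries after the initial request (3 attempts in total).
# max_delay_s is omitted, so it falls back to the documented default of 10.
retries = { num_retries = 2 }
# ...
```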
seed
  • Type: integer
  • Required: no (default: null)

Defines the seed to use for the variant.

tensorzero.toml
[functions.draft-email.variants.prompt-v1]
# ...
seed = 42
# ...
system_instructions
  • Type: string (path)
  • Required: no

Defines the path to the system instructions file. The path is relative to the configuration file.

The system instructions file is a text file whose contents will be added to the system prompt. Unlike system_template, it doesn’t support variables. This file contains static instructions that define the behavior and role of the AI assistant for the specific function variant.

tensorzero.toml
[functions.draft-email.variants.dicl]
# ...
system_instructions = "./functions/draft-email/prompt-v1/system_template.txt"
# ...
temperature
  • Type: float
  • Required: no (default: null)

Defines the temperature to use for the variant.

tensorzero.toml
[functions.draft-email.variants.prompt-v1]
# ...
temperature = 0.5
# ...
weight
  • Type: float
  • Required: no (default: 0)

Defines the weight of the variant. When you call a function, the gateway samples one of its variants, and the weight sets how likely this variant is to be chosen relative to the others.

Variants will be sampled with a probability proportional to their weight. For example, if variant A has a weight of 1.0 and variant B has a weight of 3.0, variant A will be sampled with probability 1.0 / (1.0 + 3.0) = 25% and variant B will be sampled with probability 3.0 / (1.0 + 3.0) = 75%.

You can disable a variant by setting its weight to 0. The variant will only be used if there are no other variants available for sampling or if the variant is requested explicitly in the request with variant_name. This is useful for defining fallback variants, which won’t be used unless no other variants are available.

tensorzero.toml
[functions.draft-email.variants.dicl]
# ...
weight = 1.0
# ...
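Putting the fields of this variant type together, a complete dynamic in-context learning variant might look like the sketch below. The system_instructions path is hypothetical, the numeric values are illustrative, and the model and embedding model definitions are elided.

```toml
[embedding_models.openai-text-embedding-3-small]
# ...

[models.gpt-4o-mini]
# ...

[functions.draft-email.variants.dicl]
type = "experimental_dynamic_in_context_learning"
# Should match the model that generated the embeddings stored in ClickHouse.
embedding_model = "openai-text-embedding-3-small"
k = 10 # number of similar examples to retrieve
model = "gpt-4o-mini"
system_instructions = "./functions/draft-email/dicl/system_instructions.txt" # hypothetical path
temperature = 0.5
max_tokens = 100
retries = { num_retries = 3, max_delay_s = 10 }
weight = 1.0
```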

[metrics.metric_name]

The [metrics.metric_name] section defines the behavior of a metric. You can define multiple metrics by including multiple [metrics.metric_name] sections.

The metric name can’t be comment or demonstration, as those names are reserved for internal use.

If your metric_name is not a basic string, it can be escaped with quotation marks. For example, periods are not allowed in basic strings, so you can define beats-gpt-3.5 as [metrics."beats-gpt-3.5"].

tensorzero.toml
[metrics.task-completed]
# fieldA = ...
# fieldB = ...
# ...
[metrics.user-rating]
# fieldA = ...
# fieldB = ...
# ...

level

  • Type: string
  • Required: yes

Defines whether the metric applies to individual inferences or to entire episodes.

The supported levels are inference and episode.

tensorzero.toml
[metrics.valid-output]
# ...
level = "inference"
# ...
[metrics.task-completed]
# ...
level = "episode"
# ...

optimize

  • Type: string
  • Required: yes

Defines whether the metric should be maximized or minimized.

The supported values are max and min.

tensorzero.toml
[metrics.mistakes-made]
# ...
optimize = "min"
# ...
[metrics.user-rating]
# ...
optimize = "max"
# ...

type

  • Type: string
  • Required: yes

Defines the type of the metric.

The supported metric types are boolean and float.

tensorzero.toml
[metrics.user-rating]
# ...
type = "float"
# ...
[metrics.task-completed]
# ...
type = "boolean"
# ...
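Since level, optimize, and type are all required, a complete metric definition sets all three. The sketch below combines them; the particular pairings (for example, treating mistakes-made as an inference-level float) are assumptions made for illustration.

```toml
# A boolean metric recorded once per episode; higher (true) is better.
[metrics.task-completed]
type = "boolean"
optimize = "max"
level = "episode"

# A float metric recorded per inference; lower is better.
[metrics.mistakes-made]
type = "float"    # assumption: illustrative type
optimize = "min"
level = "inference" # assumption: illustrative level
```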

[tools.tool_name]

The [tools.tool_name] section defines the behavior of a tool. You can define multiple tools by including multiple [tools.tool_name] sections.

If your tool_name is not a basic string, it can be escaped with quotation marks. For example, periods are not allowed in basic strings, so you can define run-python-3.10 as [tools."run-python-3.10"].

You can enable a tool for a function by adding it to the function’s tools field.

tensorzero.toml
[functions.weather-chatbot]
# ...
type = "chat"
tools = [
# ...
"get-temperature"
# ...
]
# ...
[tools.get-temperature]
# ...

description

  • Type: string
  • Required: yes

Defines the description of the tool provided to the model.

Providing a detailed description of the tool typically leads to noticeably better responses.

tensorzero.toml
[tools.get-temperature]
# ...
description = "Get the current temperature in a given location (e.g. \"Tokyo\") using the specified unit (must be \"celsius\" or \"fahrenheit\")."
# ...

parameters

  • Type: string (path)
  • Required: yes

Defines the path to the parameters file. The path is relative to the configuration file.

This file should contain a JSON Schema for the parameters of the tool.

tensorzero.toml
[tools.get-temperature]
# ...
parameters = "./tools/get-temperature.json"
# ...

strict

  • Type: boolean
  • Required: no (default: false)

If set to true, the TensorZero Gateway attempts to use strict JSON generation for the tool parameters. This typically improves the quality of responses.

Only a few providers support strict JSON generation. For example, the TensorZero Gateway uses Structured Outputs for OpenAI. If the provider does not support strict mode, the TensorZero Gateway ignores this field.

tensorzero.toml
[tools.get-temperature]
# ...
strict = true
# ...
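Combining the fields above, a complete tool definition and the function that enables it might look like this sketch, which reuses the values from the snippets in this section:

```toml
[functions.weather-chatbot]
type = "chat"
tools = ["get-temperature"] # enables the tool defined below
# ...

[tools.get-temperature]
description = "Get the current temperature in a given location (e.g. \"Tokyo\") using the specified unit (must be \"celsius\" or \"fahrenheit\")."
parameters = "./tools/get-temperature.json" # JSON Schema, path relative to this file
strict = true # ignored if the provider does not support strict mode
```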