
Getting Started with AWS SageMaker

This guide shows how to set up a minimal deployment to use the TensorZero Gateway with the AWS SageMaker API.

The AWS SageMaker model provider is a wrapper around other TensorZero model providers that handles AWS SageMaker-specific logic (e.g. authentication). For example, you can use it to run inference against self-hosted model providers like Ollama deployed on AWS SageMaker.

Setup

For this minimal setup, you’ll need just two files in your project directory:

  • config/
    • tensorzero.toml
  • docker-compose.yml

For production deployments, see our Deployment Guide.

You’ll also need to deploy a SageMaker endpoint for your LLM. For this example, we’re using a container running Ollama.
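Before continuing, you can confirm that the endpoint is in service with the AWS CLI. The endpoint name below is the placeholder used throughout this guide; substitute your own:

Terminal window
aws sagemaker describe-endpoint \
  --endpoint-name my-sagemaker-endpoint \
  --query EndpointStatus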

Configuration

Create a minimal configuration file that defines a model and a simple chat function:

config/tensorzero.toml
[models.gemma_3]
routing = ["aws_sagemaker"]

[models.gemma_3.providers.aws_sagemaker]
type = "aws_sagemaker"
model_name = "gemma3:1b"
endpoint_name = "my-sagemaker-endpoint"
region = "us-east-1"
# ... or use `allow_auto_detect_region = true` to infer region with the AWS SDK
hosted_provider = "openai" # Ollama is OpenAI-compatible

[functions.my_function_name]
type = "chat"

[functions.my_function_name.variants.my_variant_name]
type = "chat_completion"
model = "gemma_3"

The hosted_provider field specifies the model provider that you deployed on AWS SageMaker. For example, Ollama is OpenAI-compatible, so we use openai as the hosted provider. Alternatively, you can use hosted_provider = "tgi" if you deployed TGI instead.

You can specify the endpoint’s region explicitly, or set allow_auto_detect_region = true to let the AWS SDK infer the region.
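For example, the provider block above could rely on auto-detection instead of an explicit region (a minimal sketch; all other fields unchanged):

config/tensorzero.toml
[models.gemma_3.providers.aws_sagemaker]
type = "aws_sagemaker"
model_name = "gemma3:1b"
endpoint_name = "my-sagemaker-endpoint"
allow_auto_detect_region = true # let the AWS SDK determine the region
hosted_provider = "openai" # Ollama is OpenAI-compatible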

See the Configuration Reference for optional fields. The relevant fields will depend on the hosted_provider.

Credentials

Make sure the gateway has the necessary permissions to access AWS SageMaker. The TensorZero Gateway uses the AWS SDK to retrieve the relevant credentials.

The simplest way is to set the following environment variables before running the gateway:

Terminal window
AWS_ACCESS_KEY_ID=...
AWS_REGION=us-east-1
AWS_SECRET_ACCESS_KEY=...

Alternatively, you can use other authentication methods supported by the AWS SDK.
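For example, if your credentials live in a shared AWS credentials file (~/.aws/credentials), you can point the AWS SDK at a named profile instead of setting keys directly. This is a sketch using the standard AWS_PROFILE variable with a placeholder profile name; note that if you run the gateway in Docker (as in the next section), the ~/.aws directory must also be mounted into the container:

Terminal window
AWS_PROFILE=my-profile
AWS_REGION=us-east-1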

Deployment (Docker Compose)

Create a minimal Docker Compose configuration:

docker-compose.yml
# This is a simplified example for learning purposes. Do not use this in production.
# For production-ready deployments, see: https://www.tensorzero.com/docs/gateway/deployment
services:
  gateway:
    image: tensorzero/gateway
    volumes:
      - ./config:/app/config:ro
    command: --config-file /app/config/tensorzero.toml
    environment:
      - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID:?Environment variable AWS_ACCESS_KEY_ID must be set.}
      - AWS_REGION=${AWS_REGION:?Environment variable AWS_REGION must be set.}
      - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY:?Environment variable AWS_SECRET_ACCESS_KEY must be set.}
    ports:
      - "3000:3000"
    extra_hosts:
      - "host.docker.internal:host-gateway"

You can start the gateway with docker compose up.
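From the directory containing docker-compose.yml:

Terminal window
docker compose up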

Inference

Make an inference request to the gateway:

Terminal window
curl -X POST http://localhost:3000/inference \
  -H "Content-Type: application/json" \
  -d '{
    "function_name": "my_function_name",
    "input": {
      "messages": [
        {
          "role": "user",
          "content": "What is the capital of Japan?"
        }
      ]
    }
  }'
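If you prefer a client library over raw HTTP, you can make the same request with the TensorZero Python client. This is a sketch assuming the tensorzero package is installed (pip install tensorzero) and that your version exposes TensorZeroGateway.build_http; the exact constructor may differ between client versions:

# Sketch: the same inference request via the TensorZero Python client.
from tensorzero import TensorZeroGateway

# Connect to the gateway started with Docker Compose above.
with TensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
    response = client.inference(
        function_name="my_function_name",
        input={
            "messages": [
                {"role": "user", "content": "What is the capital of Japan?"}
            ]
        },
    )
    print(response)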