Getting Started with AWS SageMaker
This guide shows how to set up a minimal deployment to use the TensorZero Gateway with the AWS SageMaker API.
The AWS SageMaker model provider is a wrapper around other TensorZero model providers that handles AWS SageMaker-specific logic (e.g. auth). For example, you can use it to infer self-hosted model providers like Ollama deployed on AWS SageMaker.
Setup
For this minimal setup, you’ll need just two files in your project directory:
- config/
  - tensorzero.toml
- docker-compose.yml
For production deployments, see our Deployment Guide.
You’ll also need to deploy a SageMaker endpoint serving your LLM. For this example, we’re using a container running Ollama.
Configuration
Create a minimal configuration file that defines a model and a simple chat function:
```toml
[models.gemma_3]
routing = ["aws_sagemaker"]

[models.gemma_3.providers.aws_sagemaker]
type = "aws_sagemaker"
model_name = "gemma3:1b"
endpoint_name = "my-sagemaker-endpoint"
region = "us-east-1"
# ... or use `allow_auto_detect_region = true` to infer region with the AWS SDK
hosted_provider = "openai" # Ollama is OpenAI-compatible

[functions.my_function_name]
type = "chat"

[functions.my_function_name.variants.my_variant_name]
type = "chat_completion"
model = "gemma_3"
```
The `hosted_provider` field specifies the model provider that you deployed on AWS SageMaker. For example, Ollama is OpenAI-compatible, so we use `openai` as the hosted provider. Alternatively, you can use `hosted_provider = "tgi"` if you deployed TGI instead.

You can specify the endpoint’s `region` explicitly, or use `allow_auto_detect_region = true` to infer the region with the AWS SDK.

See the Configuration Reference for optional fields. The relevant fields will depend on the `hosted_provider`.
Credentials
Make sure the gateway has the necessary permissions to access AWS SageMaker. The TensorZero Gateway uses the AWS SDK to retrieve the relevant credentials.
The simplest way is to set the following environment variables before running the gateway:
```shell
AWS_ACCESS_KEY_ID=...
AWS_REGION=us-east-1
AWS_SECRET_ACCESS_KEY=...
```
Alternatively, you can use other authentication methods supported by the AWS SDK.
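For example, the AWS SDK also reads the shared credentials file at `~/.aws/credentials`, so instead of environment variables you could store the same keys there (a sketch; the profile name and values are placeholders):

```ini
[default]
aws_access_key_id = ...
aws_secret_access_key = ...
```

IAM roles (e.g. when running the gateway on EC2 or ECS) avoid long-lived keys entirely and are generally preferable in production.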
Deployment (Docker Compose)
Create a minimal Docker Compose configuration:
```yaml
# This is a simplified example for learning purposes. Do not use this in production.
# For production-ready deployments, see: https://www.tensorzero.com/docs/gateway/deployment

services:
  gateway:
    image: tensorzero/gateway
    volumes:
      - ./config:/app/config:ro
    command: --config-file /app/config/tensorzero.toml
    environment:
      - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID:?Environment variable AWS_ACCESS_KEY_ID must be set.}
      - AWS_REGION=${AWS_REGION:?Environment variable AWS_REGION must be set.}
      - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY:?Environment variable AWS_SECRET_ACCESS_KEY must be set.}
    ports:
      - "3000:3000"
    extra_hosts:
      - "host.docker.internal:host-gateway"
```
You can start the gateway with `docker compose up`.
Inference
Make an inference request to the gateway:
```shell
curl -X POST http://localhost:3000/inference \
  -H "Content-Type: application/json" \
  -d '{
    "function_name": "my_function_name",
    "input": {
      "messages": [
        {
          "role": "user",
          "content": "What is the capital of Japan?"
        }
      ]
    }
  }'
```
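The same request can be made from Python using only the standard library. This is a minimal sketch mirroring the curl command above, assuming the gateway is running locally on port 3000 (`post_inference` is a hypothetical helper, not part of any TensorZero client library):

```python
import json
import urllib.request

# Assumes the gateway from the Docker Compose setup above is reachable here.
GATEWAY_URL = "http://localhost:3000/inference"

# The same request body as the curl example.
payload = {
    "function_name": "my_function_name",
    "input": {
        "messages": [
            {"role": "user", "content": "What is the capital of Japan?"}
        ]
    },
}

def post_inference(body: dict) -> dict:
    """POST the request body to the gateway and return the parsed JSON response."""
    request = urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())

# With the gateway running, call: print(post_inference(payload))
```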