Multimodal Inference (VLMs)
TensorZero Gateway supports multimodal inference (e.g. image inputs) with vision-language models (VLMs).
See Integrations for a list of supported models.
Setup
Object Storage
TensorZero uses object storage to store images used during multimodal inference.
It supports any S3-compatible object storage service, including AWS S3, GCP Cloud Storage, Cloudflare R2, and many more.
You can configure the object storage service in the `object_storage` section of the configuration file.
In this example, we’ll use a local deployment of MinIO, an open-source S3-compatible object storage service.
```toml
[object_storage]
type = "s3_compatible"
endpoint = "http://minio:9000" # optional: defaults to AWS S3
# region = "us-east-1" # optional: depends on your S3-compatible storage provider
bucket_name = "tensorzero" # optional: depends on your S3-compatible storage provider
# IMPORTANT: for production environments, remove the following setting and use a secure method of authentication in
# combination with a production-grade object storage service.
allow_http = true
```
You can also store images in a local directory (`type = "filesystem"`) or disable image storage (`type = "disabled"`).
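For example, a filesystem-backed configuration might look like the following sketch (the `path` field name here is an assumption, not something confirmed by this page):

```toml
[object_storage]
# Store images in a local directory instead of an S3-compatible service.
type = "filesystem"
# Assumed field name: the directory where the gateway writes stored images.
path = "/data/tensorzero/object-storage"
```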
See Configuration Reference for more details.
The TensorZero Gateway will attempt to retrieve credentials from the following sources, in order of priority:

1. `S3_ACCESS_KEY_ID` and `S3_SECRET_ACCESS_KEY` environment variables
2. `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables
3. Credentials from the AWS SDK (default profile)
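The lookup order is equivalent to this illustrative Python sketch (the gateway implements this logic internally; the sketch only mirrors the documented priority):

```python
import os


def resolve_s3_credentials() -> tuple[str, str] | None:
    """Illustrative sketch of the gateway's documented credential lookup order."""
    for key_var, secret_var in [
        ("S3_ACCESS_KEY_ID", "S3_SECRET_ACCESS_KEY"),    # checked first
        ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"),  # checked second
    ]:
        key = os.environ.get(key_var)
        secret = os.environ.get(secret_var)
        if key and secret:
            return key, secret
    return None  # fall back to the AWS SDK's default profile
```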
Docker Compose
We’ll use Docker Compose to deploy the TensorZero Gateway, ClickHouse, and MinIO.
docker-compose.yml
```yaml
# This is a simplified example for learning purposes. Do not use this in production.
# For production-ready deployments, see: https://www.tensorzero.com/docs/gateway/deployment

services:
  clickhouse:
    image: clickhouse/clickhouse-server:24.12-alpine
    environment:
      - CLICKHOUSE_USER=chuser
      - CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT=1
      - CLICKHOUSE_PASSWORD=chpassword
    ports:
      - "8123:8123"
    healthcheck:
      test: wget --spider --tries 1 http://chuser:chpassword@clickhouse:8123/ping
      start_period: 30s
      start_interval: 1s
      timeout: 1s

  gateway:
    image: tensorzero/gateway
    volumes:
      # Mount our tensorzero.toml file into the container
      - ./config:/app/config:ro
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY:?Environment variable OPENAI_API_KEY must be set.}
      - S3_ACCESS_KEY_ID=miniouser
      - S3_SECRET_ACCESS_KEY=miniopassword
      - TENSORZERO_CLICKHOUSE_URL=http://chuser:chpassword@clickhouse:8123/tensorzero
    ports:
      - "3000:3000"
    extra_hosts:
      - "host.docker.internal:host-gateway"
    depends_on:
      clickhouse:
        condition: service_healthy
      minio:
        condition: service_healthy

  # For a production deployment, you can use AWS S3, GCP Cloud Storage, Cloudflare R2, etc.
  minio:
    image: bitnami/minio
    ports:
      - "9000:9000" # API port
      - "9001:9001" # Console port
    environment:
      - MINIO_ROOT_USER=miniouser
      - MINIO_ROOT_PASSWORD=miniopassword
      - MINIO_DEFAULT_BUCKETS=tensorzero
    healthcheck:
      test: "mc ls local/tensorzero || exit 1"
      start_period: 30s
      start_interval: 1s
      timeout: 1s
```
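Once the stack is up (`docker compose up`), you can sanity-check that the gateway is reachable before sending multimodal requests. A minimal sketch, assuming the `requests` package is installed and that the gateway serves a health endpoint at `/health` (check the deployment docs for the exact route):

```python
import requests

# Assumption: the gateway exposes a health endpoint at /health on port 3000.
response = requests.get("http://localhost:3000/health", timeout=5)
response.raise_for_status()  # raises if the gateway isn't responding yet
print("Gateway is up")
```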
Inference
With the setup out of the way, you can now use the TensorZero Gateway to perform multimodal inference.
The TensorZero Gateway accepts both embedded images (encoded as base64 strings) and remote images (specified by a URL).
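If your images live on disk, you'll need to base64-encode them before embedding them in a request. A minimal sketch (the file path is hypothetical):

```python
import base64


def encode_image(path: str) -> str:
    """Read a local image file and return its contents as a base64 string."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


# Hypothetical local file; pass the result as the `data` field shown below.
image_data = encode_image("ferris.png")
```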
Using the TensorZero Python client:

```python
from tensorzero import TensorZeroGateway

with TensorZeroGateway.build_http(
    gateway_url="http://localhost:3000",
) as client:
    response = client.inference(
        model_name="openai::gpt-4o-mini",
        input={
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "text",
                            "text": "Do the images share any common features?",
                        },
                        # Remote image of Ferris the crab
                        {
                            "type": "image",
                            "url": "https://raw.githubusercontent.com/tensorzero/tensorzero/ff3e17bbd3e32f483b027cf81b54404788c90dc1/tensorzero-internal/tests/e2e/providers/ferris.png",
                        },
                        # One-pixel orange image encoded as a base64 string
                        {
                            "type": "image",
                            "mime_type": "image/png",
                            "data": "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAA1JREFUGFdj+O/P8B8ABe0CTsv8mHgAAAAASUVORK5CYII=",
                        },
                    ],
                }
            ],
        },
    )

    print(response)
```
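This hedged sketch shows one way to pull the generated text out of the response (the `content` and `text` field names are assumptions based on the TensorZero Python client's chat response type; verify them against your client version):

```python
# Continues from the example above: `response` is the chat inference response.
# Assumed fields: `response.content` is a list of content blocks, and text
# blocks carry a `text` attribute.
for block in response.content:
    text = getattr(block, "text", None)
    if text is not None:
        print(text)
```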
Using the OpenAI Python client pointed at the gateway's OpenAI-compatible endpoint (note the `tensorzero::`-prefixed model name, which this endpoint expects):

```python
from openai import OpenAI

with OpenAI(base_url="http://localhost:3000/openai/v1") as client:
    response = client.chat.completions.create(
        model="tensorzero::model_name::openai::gpt-4o-mini",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Do the images share any common features?",
                    },
                    # Remote image of Ferris the crab
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://raw.githubusercontent.com/tensorzero/tensorzero/ff3e17bbd3e32f483b027cf81b54404788c90dc1/tensorzero-internal/tests/e2e/providers/ferris.png",
                        },
                    },
                    # One-pixel orange image encoded as a base64 string
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAA1JREFUGFdj+O/P8B8ABe0CTsv8mHgAAAAASUVORK5CYII=",
                        },
                    },
                ],
            }
        ],
    )

    print(response)
```
Using the HTTP API directly with curl:

```bash
curl -X POST http://localhost:3000/inference \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "openai::gpt-4o-mini",
    "input": {
      "messages": [
        {
          "role": "user",
          "content": [
            {
              "type": "text",
              "text": "Do the images share any common features?"
            },
            {
              "type": "image",
              "url": "https://raw.githubusercontent.com/tensorzero/tensorzero/ff3e17bbd3e32f483b027cf81b54404788c90dc1/tensorzero-internal/tests/e2e/providers/ferris.png"
            },
            {
              "type": "image",
              "mime_type": "image/png",
              "data": "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAA1JREFUGFdj+O/P8B8ABe0CTsv8mHgAAAAASUVORK5CYII="
            }
          ]
        }
      ]
    }
  }'
```