Voice Cloud API

The Voice Cloud API is the curated, external-facing subset of the Telbox API. It exposes Telbox's voice and AI primitives (capture audio, transcribe, understand, ask, translate, and synthesise voice) to third-party applications, without the internal app, admin, and operator surfaces.

Curated subset

This is a deliberate slice of the full API, not a separate service. Every Voice Cloud endpoint is a Telbox /v1 endpoint. The Voice Cloud reference renders only this subset; the full reference shows everything.

Early access

The Voice Cloud product surface is still firming up. The endpoint set below is the proposed initial surface, drawn from endpoints that are already shipping. Dedicated server-to-server API keys are on the roadmap; for now, authenticate with a JWT obtained from the phone-OTP flow. To get involved, email developers@telbox.ai.
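As a rough illustration of the phone-OTP flow, the sketch below calls `POST /v1/auth/phone/start` and then `POST /v1/auth/phone/verify` using only the Python standard library. The host in `BASE_URL`, the request field names (`phone`, `code`), and the helper names are assumptions made for illustration; check the interactive reference for the actual request and response schemas.

```python
import json
import urllib.request

BASE_URL = "https://api.telbox.example"  # hypothetical host; substitute your real base URL


def build_json_post(path, payload, token=None):
    """Build (but do not send) a JSON POST request for a Telbox /v1 path."""
    headers = {"Content-Type": "application/json"}
    if token is not None:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send(request):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def login_with_otp(phone, read_code, transport=send):
    """Run the phone-OTP flow and return the decoded verify response."""
    # ASSUMPTION: the field names "phone" and "code" are illustrative only.
    transport(build_json_post("/v1/auth/phone/start", {"phone": phone}))
    code = read_code()  # e.g. prompt the user for the code they received
    return transport(
        build_json_post("/v1/auth/phone/verify", {"phone": phone, "code": code})
    )
```

The verify response is expected to carry the JWT; pass it as `token` to `build_json_post` so later calls send an `Authorization: Bearer` header.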

What's included

Capture & transcribe

Upload audio, attach it to a message, and get a transcript from the AI pipeline.

Understand & ask

Extract structure and answer questions grounded in conversation history (RAG), including by voice.

Translate

Stream a translation of any transcript token-by-token.

Synthesise

Create and preview voice clones for voice output.

Endpoint map

Capability	Endpoints
Auth & identity	`POST /v1/auth/phone/start`, `POST /v1/auth/phone/verify`, `POST /v1/auth/refresh`, `GET /v1/me`
Media	`POST /v1/media/upload`, `GET /v1/media/{object_key}`
Conversations	`GET /v1/threads`, `POST /v1/threads`, `GET /v1/threads/{thread_id}`
Messaging	`POST /v1/messages/send`, `GET /v1/messages`, `GET /v1/messages/{message_id}`, `GET /v1/messages/{message_id}/info`, `PATCH /v1/messages/{message_id}/transcription`
AI	`POST /v1/messages/{message_id}/insights`, `POST /v1/messages/{message_id}/translate-stream`, `POST /v1/ask`, `POST /v1/ask/stream`, `POST /v1/ai/ask-by-voice`
Voice synthesis	`GET /v1/voice-clones`, `POST /v1/voice-clones`, `GET /v1/voice-clones/{clone_id}`, `GET /v1/voice-clones/{clone_id}/preview`
System	`GET /v1/healthz`

→ Browse the interactive Voice Cloud API reference.

A typical voice-to-insight flow

sequenceDiagram
    participant C as Your app
    participant A as Telbox API
    C->>A: POST /v1/media/upload (audio)
    A-->>C: { object_key }
    C->>A: POST /v1/messages/send (voice, object_key)
    A-->>C: { message_id }
    C->>A: POST /v1/messages/{id}/insights
    A-->>C: 202 (pipeline started)
    Note over C,A: watch ai.processed on the WebSocket,<br/>or poll GET /v1/messages/{id}
    C->>A: GET /v1/messages/{id}
    A-->>C: { transcript, insights }
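
The sequence above could be driven from client code roughly as follows. This is a minimal sketch that uses the polling option rather than the WebSocket. The `api(method, path, body)` callable is a hypothetical transport that sends an authenticated request and returns decoded JSON, and the `thread_id` and `type` payload fields are assumptions; only `object_key`, `message_id`, `transcript`, and `insights` come from the diagram.

```python
import time


def voice_to_insight(api, audio_bytes, thread_id,
                     poll_interval=2.0, max_polls=30, sleep=time.sleep):
    """Walk the upload -> send -> insights -> poll sequence and return the message.

    `api(method, path, body)` is a hypothetical transport callable supplied
    by the caller; it is not part of any Telbox SDK.
    """
    # 1. Upload the audio and keep the returned object key.
    upload = api("POST", "/v1/media/upload", audio_bytes)
    object_key = upload["object_key"]

    # 2. Send a voice message that references the uploaded object.
    # ASSUMPTION: "thread_id" and "type" are illustrative field names.
    sent = api("POST", "/v1/messages/send",
               {"thread_id": thread_id, "type": "voice", "object_key": object_key})
    message_id = sent["message_id"]

    # 3. Start the pipeline; the 202 response means results arrive later.
    api("POST", f"/v1/messages/{message_id}/insights", None)

    # 4. Poll the message until both transcript and insights are present.
    for _ in range(max_polls):
        message = api("GET", f"/v1/messages/{message_id}", None)
        if message.get("transcript") and message.get("insights"):
            return message
        sleep(poll_interval)
    raise TimeoutError(f"message {message_id} not processed after {max_polls} polls")
```

Polling keeps the example self-contained; an app that already holds a WebSocket connection would more likely wait for `ai.processed` and then issue a single `GET`.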

Before you build

Auth: Authentication — JWT bearer; refresh before expiry.
Errors: Errors — branch on the detail code.
Limits: Rate Limits & Quotas — AI is metered on the free tier.
AI semantics: AI & Privacy — consent gate, no-train, confirm cards for writes.
Streaming: Realtime — SSE for ask/translate, WebSocket for push.
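
Since the ask and translate streams use SSE, a client needs to split the response into events. The sketch below is a minimal parser for the standard `text/event-stream` framing (`event:` and `data:` fields, blank-line dispatch, `:` comments). The function name is illustrative, and the event names and payload shape Telbox actually emits are not specified here, so treat the sample input as made up.

```python
def iter_sse_events(lines):
    """Yield (event, data) tuples from an iterable of decoded SSE lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if any data was seen.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the format strips one leading space
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
        # Other fields (id, retry) are ignored in this sketch.


# Usage with made-up input; real event names may differ.
sample = ["event: token\n", "data: Hel\n", "\n", ": ping\n", "data: lo\n", "\n"]
tokens = [payload for _, payload in iter_sse_events(sample)]
```

In a real client, `lines` would be the decoded lines of the streaming HTTP response body rather than a list.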