Voice Cloud API
The Voice Cloud API is the curated, external-facing subset of the Telbox API. It exposes Telbox's voice and AI primitives (capture audio, transcribe, understand, ask, translate, and synthesise voice) to third-party applications, without the internal app, admin, and operator surface.
Curated subset
This is a deliberate slice of the full API, not a separate service. Every
Voice Cloud endpoint is a Telbox /v1 endpoint. The
Voice Cloud reference renders only this subset; the
full reference shows everything.
Early access
The Voice Cloud product surface is still firming up. The endpoint set below
is the proposed initial surface drawn from shipping endpoints. Dedicated
server-to-server API keys are on the roadmap — today, authenticate with a
JWT from the phone-OTP flow. To get involved,
email developers@telbox.ai.
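As a minimal sketch of the phone-OTP flow described above, the helper below chains the two auth endpoints and then calls GET /v1/me with the resulting JWT as a bearer token. The request and response field names (`phone`, `code`, `access_token`) are assumptions made for illustration and are not taken from the reference. The `send` transport is a hypothetical callable that you would back with your HTTP client of choice.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical transport: (method, path, json_body, headers) -> decoded JSON reply.
Transport = Callable[
    [str, str, Optional[Dict[str, Any]], Dict[str, str]], Dict[str, Any]
]


def obtain_jwt(send: Transport, phone: str, ask_code: Callable[[], str]) -> str:
    """Run the phone-OTP flow and return a JWT.

    Assumed (not documented here): the body fields "phone" and "code",
    and the response field "access_token".
    """
    # Step 1: ask the API to send a one-time code to the phone number.
    send("POST", "/v1/auth/phone/start", {"phone": phone}, {})
    # Step 2: exchange the code the user received for a token.
    reply = send(
        "POST",
        "/v1/auth/phone/verify",
        {"phone": phone, "code": ask_code()},
        {},
    )
    return reply["access_token"]


def bearer_headers(jwt: str) -> Dict[str, str]:
    """Build the Authorization header for authenticated calls."""
    return {"Authorization": f"Bearer {jwt}"}


def whoami(send: Transport, jwt: str) -> Dict[str, Any]:
    """Call GET /v1/me with the bearer token."""
    return send("GET", "/v1/me", None, bearer_headers(jwt))
```

Keeping the transport injectable means the same functions can be exercised against a stub in tests and against a real HTTP client in production.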
What's included
- Capture & transcribe: Upload audio, attach it to a message, and get a transcript from the AI pipeline.
- Understand & ask: Extract structure and answer questions grounded in conversation history (RAG), including by voice.
- Translate: Stream a translation of any transcript token-by-token.
- Synthesise: Create and preview voice clones for voice output.
Endpoint map
| Capability | Endpoints |
|---|---|
| Auth & identity | POST /v1/auth/phone/start, POST /v1/auth/phone/verify, POST /v1/auth/refresh, GET /v1/me |
| Media | POST /v1/media/upload, GET /v1/media/{object_key} |
| Conversations | GET /v1/threads, POST /v1/threads, GET /v1/threads/{thread_id} |
| Messaging | POST /v1/messages/send, GET /v1/messages, GET /v1/messages/{message_id}, GET /v1/messages/{message_id}/info, PATCH /v1/messages/{message_id}/transcription |
| AI | POST /v1/messages/{message_id}/insights, POST /v1/messages/{message_id}/translate-stream, POST /v1/ask, POST /v1/ask/stream, POST /v1/ai/ask-by-voice |
| Voice synthesis | GET /v1/voice-clones, POST /v1/voice-clones, GET /v1/voice-clones/{clone_id}, GET /v1/voice-clones/{clone_id}/preview |
| System | GET /v1/healthz |
→ Browse the interactive Voice Cloud API reference.
A typical voice-to-insight flow
sequenceDiagram
participant C as Your app
participant A as Telbox API
C->>A: POST /v1/media/upload (audio)
A-->>C: { object_key }
C->>A: POST /v1/messages/send (voice, object_key)
A-->>C: { message_id }
C->>A: POST /v1/messages/{id}/insights
A-->>C: 202 (pipeline started)
Note over C,A: watch ai.processed on the WebSocket,<br/>or poll GET /v1/messages/{id}
C->>A: GET /v1/messages/{id}
A-->>C: { transcript, insights }
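The sequence above can be sketched in Python as follows, using polling rather than the WebSocket. The field names `object_key`, `message_id`, `transcript`, and `insights` come from the diagram. The `thread_id` and `type` body fields, the `file` payload key, and the readiness check are assumptions. The `send` transport is hypothetical and is expected to handle authentication and multipart encoding.

```python
import time
from typing import Any, Callable, Dict, Optional

# Hypothetical transport: (method, path, payload) -> decoded JSON reply.
Transport = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def voice_to_insight(
    send: Transport,
    audio: bytes,
    thread_id: str,
    poll_interval: float = 1.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Upload audio, send it as a voice message, and wait for insights."""
    # 1. Upload the audio and keep the returned object key.
    upload = send("POST", "/v1/media/upload", {"file": audio})
    object_key = upload["object_key"]

    # 2. Send a voice message that references the uploaded object.
    #    "thread_id" and "type" are assumed field names.
    message = send(
        "POST",
        "/v1/messages/send",
        {"thread_id": thread_id, "type": "voice", "object_key": object_key},
    )
    message_id = message["message_id"]

    # 3. Start the insights pipeline (the diagram shows a 202 reply).
    send("POST", f"/v1/messages/{message_id}/insights", None)

    # 4. Poll the message until both transcript and insights are present.
    for _ in range(max_polls):
        result = send("GET", f"/v1/messages/{message_id}", None)
        if result.get("transcript") is not None and result.get("insights") is not None:
            return result
        sleep(poll_interval)
    raise TimeoutError(f"insights not ready after {max_polls} polls")
```

Polling is the simpler of the two options in the diagram. A client that already holds a WebSocket connection would likely wait for the ai.processed event instead and fetch the message once.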
Before you build
- Auth: Authentication — JWT bearer; refresh before expiry.
- Errors: Errors — branch on the detail code.
- Limits: Rate Limits & Quotas — AI is metered on the free tier.
- AI semantics: AI & Privacy — consent gate, no-train, confirm cards for writes.
- Streaming: Realtime — SSE for ask/translate, WebSocket for push.
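Since the streaming endpoints use SSE, a client needs to split the response body into events. The sketch below parses the standard server-sent events line format (`event:` and `data:` fields, comment lines starting with `:`, a blank line ending each event). It ignores the `id` and `retry` fields for brevity. What Telbox puts in each event's name and data is not specified here, so the payload is returned as raw text.

```python
from typing import Dict, Iterable, Iterator, List


def iter_sse_events(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one {"event", "data"} dict per server-sent event.

    `lines` is any iterable of decoded text lines, for example the
    line iterator of a streaming HTTP response.
    """
    event = "message"  # default event type in the SSE format
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if it has data.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
            continue
        if line.startswith(":"):
            # Comment line, often used as a keep-alive.
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # strip the single optional leading space
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
        # Other fields (id, retry) are ignored in this sketch.
```

A caller would typically iterate over the events from POST /v1/ask/stream or the translate-stream endpoint and append each `data` value to the text shown to the user.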