
Voice Cloud API

The Voice Cloud API is the curated, external-facing subset of the Telbox API. It exposes Telbox's voice and AI primitives (capture audio, transcribe, understand, ask, translate, and synthesise voice) to third-party applications, without exposing the internal app, admin, and operator endpoints.

Curated subset

This is a deliberate slice of the full API, not a separate service. Every Voice Cloud endpoint is a Telbox /v1 endpoint. The Voice Cloud reference renders only this subset; the full reference shows everything.

Early access

The Voice Cloud product surface is still firming up. The endpoints below are the proposed initial surface, drawn from endpoints that already ship. Dedicated server-to-server API keys are on the roadmap; until then, authenticate with a JWT obtained through the phone-OTP flow. To get involved, email developers@telbox.ai.

What's included

  • Capture & transcribe

    Upload audio, attach it to a message, and get a transcript from the AI pipeline.

  • Understand & ask

    Extract structure and answer questions grounded in conversation history (RAG), including by voice.

  • Translate

    Stream a translation of a message's transcript token-by-token.

  • Synthesise

    Create and preview voice clones for voice output.
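Streaming translation (like /v1/ask/stream) is delivered as Server-Sent Events, per the Realtime guide. The sketch below is a minimal SSE line parser that reassembles the streamed tokens. It assumes, hypothetically, that each token arrives as the raw text of a default message event's data field and that other event types (such as a final done event) can be skipped; the real event names and payload format (for example JSON-wrapped tokens) are not specified here, so adjust collect_translation to match the actual stream.

```python
from typing import Iterable, Iterator, List, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_type, data) pairs from decoded SSE lines.

    Implements the basic SSE framing rules: a blank line ends an event,
    lines starting with ':' are comments (often keep-alives), one leading
    space after the colon is dropped, and multiple 'data' lines in one
    event are joined with newlines. 'id' and 'retry' fields are ignored.
    """
    event_type = "message"
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch the pending event if it has any data.
            if data_lines:
                yield event_type, "\n".join(data_lines)
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)
    # An event not terminated by a blank line is discarded, as SSE requires.


def collect_translation(lines: Iterable[str]) -> str:
    """Join streamed tokens into the full translation.

    Hypothetical payload format: each default 'message' event carries one
    raw text token; other event types (e.g. a final 'done') are skipped.
    """
    return "".join(data for kind, data in iter_sse_events(lines) if kind == "message")
```

Any iterable of decoded text lines works, so the parser does not depend on a particular HTTP client library.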

Endpoint map

| Capability | Endpoints |
| --- | --- |
| Auth & identity | POST /v1/auth/phone/start, POST /v1/auth/phone/verify, POST /v1/auth/refresh, GET /v1/me |
| Media | POST /v1/media/upload, GET /v1/media/{object_key} |
| Conversations | GET /v1/threads, POST /v1/threads, GET /v1/threads/{thread_id} |
| Messaging | POST /v1/messages/send, GET /v1/messages, GET /v1/messages/{message_id}, GET /v1/messages/{message_id}/info, PATCH /v1/messages/{message_id}/transcription |
| AI | POST /v1/messages/{message_id}/insights, POST /v1/messages/{message_id}/translate-stream, POST /v1/ask, POST /v1/ask/stream, POST /v1/ai/ask-by-voice |
| Voice synthesis | GET /v1/voice-clones, POST /v1/voice-clones, GET /v1/voice-clones/{clone_id}, GET /v1/voice-clones/{clone_id}/preview |
| System | GET /v1/healthz |
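Because Voice Cloud is a curated slice of the /v1 API rather than a separate service, a client or gateway may want to reject calls outside the subset before sending them. A minimal sketch of that allowlist, transcribed from the endpoint map above. VOICE_CLOUD_ENDPOINTS and is_voice_cloud_endpoint are illustrative names, not part of any Telbox SDK, and the matcher assumes each {param} placeholder fills exactly one URL-encoded path segment.

```python
import re

# The Voice Cloud subset, transcribed from the endpoint map.
VOICE_CLOUD_ENDPOINTS = (
    "POST /v1/auth/phone/start",
    "POST /v1/auth/phone/verify",
    "POST /v1/auth/refresh",
    "GET /v1/me",
    "POST /v1/media/upload",
    "GET /v1/media/{object_key}",
    "GET /v1/threads",
    "POST /v1/threads",
    "GET /v1/threads/{thread_id}",
    "POST /v1/messages/send",
    "GET /v1/messages",
    "GET /v1/messages/{message_id}",
    "GET /v1/messages/{message_id}/info",
    "PATCH /v1/messages/{message_id}/transcription",
    "POST /v1/messages/{message_id}/insights",
    "POST /v1/messages/{message_id}/translate-stream",
    "POST /v1/ask",
    "POST /v1/ask/stream",
    "POST /v1/ai/ask-by-voice",
    "GET /v1/voice-clones",
    "POST /v1/voice-clones",
    "GET /v1/voice-clones/{clone_id}",
    "GET /v1/voice-clones/{clone_id}/preview",
    "GET /v1/healthz",
)


def _compile(template: str) -> re.Pattern:
    # Assumption: each {param} placeholder matches exactly one path segment.
    parts = [
        "[^/]+" if seg.startswith("{") and seg.endswith("}") else re.escape(seg)
        for seg in template.strip("/").split("/")
    ]
    return re.compile("/" + "/".join(parts))


# (METHOD, compiled path pattern) pairs built once at import time.
_ROUTES = [
    (method, _compile(path))
    for method, path in (entry.split(" ", 1) for entry in VOICE_CLOUD_ENDPOINTS)
]


def is_voice_cloud_endpoint(method: str, path: str) -> bool:
    """Return True if (method, path) belongs to the Voice Cloud subset."""
    path = path.split("?", 1)[0]  # ignore any query string
    method = method.upper()
    return any(m == method and rx.fullmatch(path) is not None for m, rx in _ROUTES)
```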

→ Browse the interactive Voice Cloud API reference.

A typical voice-to-insight flow

```mermaid
sequenceDiagram
    participant C as Your app
    participant A as Telbox API
    C->>A: POST /v1/media/upload (audio)
    A-->>C: { object_key }
    C->>A: POST /v1/messages/send (voice, object_key)
    A-->>C: { message_id }
    C->>A: POST /v1/messages/{id}/insights
    A-->>C: 202 (pipeline started)
    Note over C,A: watch ai.processed on the WebSocket,<br/>or poll GET /v1/messages/{id}
    C->>A: GET /v1/messages/{id}
    A-->>C: { transcript, insights }
```
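The voice-to-insight flow in the diagram can be sketched in Python as a single function that uses the polling branch. To keep it runnable without a live server, it takes a hypothetical http transport callable instead of a real HTTP client; the transport is assumed to handle auth headers and request encoding (the upload is likely multipart). The send-message body, the 200/201 success codes for upload and send, and the rule that a message is ready once both transcript and insights are present are all assumptions, not documented behaviour.

```python
import time
from typing import Any, Callable, Dict, Tuple

# Hypothetical transport: (method, path, body) -> (status_code, parsed JSON body).
# Encoding (JSON vs multipart for the upload) and auth headers are its job.
Transport = Callable[[str, str, Any], Tuple[int, Dict[str, Any]]]


def _expect(status: int, *ok: int) -> None:
    if status not in ok:
        raise RuntimeError(f"unexpected HTTP status {status}")


def voice_to_insight(
    http: Transport,
    audio: Any,
    *,
    poll_interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Upload audio, send it as a voice message, start insights, then poll for results."""
    status, upload = http("POST", "/v1/media/upload", audio)
    _expect(status, 200, 201)  # assumed success codes
    object_key = upload["object_key"]

    # Hypothetical body: the diagram only says this call carries "voice" and the object_key.
    status, sent = http("POST", "/v1/messages/send", {"type": "voice", "object_key": object_key})
    _expect(status, 200, 201)
    message_id = sent["message_id"]

    # The diagram shows 202: the AI pipeline runs asynchronously.
    status, _ = http("POST", f"/v1/messages/{message_id}/insights", None)
    _expect(status, 202)

    # Poll GET /v1/messages/{id}; listening for ai.processed on the WebSocket avoids this loop.
    deadline = clock() + timeout
    while True:
        status, message = http("GET", f"/v1/messages/{message_id}", None)
        _expect(status, 200)
        # Assumed readiness rule: both fields are present and non-null.
        if message.get("transcript") is not None and message.get("insights") is not None:
            return message
        if clock() >= deadline:
            raise TimeoutError(f"insights for {message_id} not ready after {timeout}s")
        sleep(poll_interval)
```

Injecting sleep and clock keeps the polling loop testable; in production, the WebSocket push path is usually preferable to polling.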

Before you build

  • Auth: see Authentication. Send a JWT bearer token and refresh it before it expires.
  • Errors: see Errors. Branch on the detail code in error responses.
  • Limits: see Rate Limits & Quotas. AI calls are metered on the free tier.
  • AI semantics: see AI & Privacy for the consent gate, the no-train policy, and confirm cards for write actions.
  • Streaming: see Realtime. Ask and translate stream over SSE; server push uses the WebSocket.
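To refresh before expiry rather than after a 401, a client can read the token's expiry and refresh slightly early. A minimal sketch, assuming the access token carries the standard JWT exp claim (seconds since the Unix epoch); the 60-second skew is an arbitrary illustrative margin, and jwt_expiry and needs_refresh are hypothetical helper names. When needs_refresh returns True, call POST /v1/auth/refresh as described in Authentication.

```python
import base64
import json
import time
from typing import Optional


def jwt_expiry(token: str) -> int:
    """Return the 'exp' claim of a JWT as Unix seconds.

    Decodes the payload only; the signature is NOT verified, so use this
    for scheduling a refresh, never for trusting the token's contents.
    """
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore base64url padding
    claims = json.loads(base64.urlsafe_b64decode(payload))
    return int(claims["exp"])


def needs_refresh(token: str, now: Optional[float] = None, skew: float = 60.0) -> bool:
    """True once the token is within `skew` seconds of expiring (or already expired)."""
    current = time.time() if now is None else now
    return current >= jwt_expiry(token) - skew
```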