Realtime (WebSocket & SSE)

Telbox pushes live updates over a WebSocket and streams token-by-token AI output over Server-Sent Events (SSE).

Two channels, two jobs

  • WebSocket (/v1/ws) — persistent, bidirectional, one per device. Carries every server→client event (new messages, read receipts, presence, call signaling, AI progress).
  • SSE — short-lived, unidirectional, one per request. Streams the output of a single AI call (translation, ask, assistant).

WebSocket

Connect

wss://api.telbox.ai/v1/ws?token=<access_token>

The server accepts the socket first, then authenticates the token, so a successful open does not yet mean the token was accepted. On failure it closes the socket with code 4401 and puts the error code in the close reason. Pass the same access token you use for REST; refresh it before it expires, then reconnect with the new token.
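A minimal sketch of the connect step and of reacting to the close code, using only the standard library. The helper names and the return values of next_step_after_close are hypothetical, and the reason string in the usage note is a made-up example; only the URL, the token query parameter, and close code 4401 come from the text above.

```python
from urllib.parse import urlencode

WS_BASE = "wss://api.telbox.ai/v1/ws"  # endpoint from the Connect section
AUTH_FAILED_CLOSE_CODE = 4401          # close code sent when the token is rejected


def build_ws_url(access_token: str) -> str:
    """Return the connect URL with the token as a URL-encoded query parameter."""
    return f"{WS_BASE}?{urlencode({'token': access_token})}"


def next_step_after_close(code: int, reason: str) -> str:
    """Pick a follow-up action after the socket closes (hypothetical policy).

    The server authenticates after accepting the socket, so a rejected token
    shows up as a close with code 4401 rather than a failed handshake.
    Reconnecting with the same token would fail again, so refresh first.
    """
    if code == AUTH_FAILED_CLOSE_CODE:
        return f"refresh_token:{reason}"
    return "reconnect"
```

For example, next_step_after_close(4401, "token_expired") returns "refresh_token:token_expired", while any other close code maps to a plain reconnect.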

Heartbeat

Send the text frame ping; the server replies pong. Use this to keep intermediaries from idling the connection and to detect a half-open socket.
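The heartbeat can be sketched as a small timing tracker that leaves the socket I/O to the caller. The class name, the 25-second interval, and the 10-second timeout are assumptions for illustration; the docs only specify the ping and pong text frames.

```python
import time
from typing import Callable, Optional


class Heartbeat:
    """Tracks ping/pong timing; the caller performs the actual sends."""

    def __init__(self, interval_s: float = 25.0, timeout_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval_s = interval_s  # assumed value, not from the docs
        self.timeout_s = timeout_s    # assumed value, not from the docs
        self._clock = clock
        self._last_ping: Optional[float] = None
        self._awaiting_pong = False

    def should_ping(self) -> bool:
        """True when it is time to send the text frame "ping"."""
        if self._awaiting_pong:
            return False
        if self._last_ping is None:
            return True
        return self._clock() - self._last_ping >= self.interval_s

    def on_ping_sent(self) -> None:
        """Record that a ping was just written to the socket."""
        self._last_ping = self._clock()
        self._awaiting_pong = True

    def on_frame(self, text: str) -> bool:
        """Feed every incoming text frame; returns True if it was a pong."""
        if text == "pong":
            self._awaiting_pong = False
            return True
        return False

    def is_half_open(self) -> bool:
        """True when a ping has gone unanswered for longer than timeout_s."""
        return (self._awaiting_pong and self._last_ping is not None
                and self._clock() - self._last_ping > self.timeout_s)
```

A client loop would call should_ping() on a timer, send "ping" and call on_ping_sent(), pass incoming frames through on_frame(), and drop and reconnect the socket once is_half_open() returns True.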

Event envelope

Every server→client message is JSON with this outer shape:

{
  "type": "message.new",
  "thread_id": "uuid",
  "message_id": "uuid (optional)",
  "workspace_id": "uuid (optional)",
  "payload": { }
}

Switch on type; route on thread_id. UUID fields are bare strings.
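A minimal sketch of parsing the envelope, assuming frames arrive as text. The Envelope dataclass and parse_envelope are hypothetical names; treating thread_id as possibly absent is a defensive assumption, since the docs only mark message_id and workspace_id as optional.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Envelope:
    type: str
    thread_id: Optional[str] = None      # UUID as a bare string
    message_id: Optional[str] = None     # optional per the docs
    workspace_id: Optional[str] = None   # optional per the docs
    payload: Dict[str, Any] = field(default_factory=dict)


def parse_envelope(frame: str) -> Optional[Envelope]:
    """Parse one text frame; returns None for non-envelope frames such as "pong"."""
    try:
        data = json.loads(frame)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not isinstance(data.get("type"), str):
        return None
    return Envelope(
        type=data["type"],
        thread_id=data.get("thread_id"),
        message_id=data.get("message_id"),
        workspace_id=data.get("workspace_id"),
        payload=data.get("payload") or {},
    )
```

Returning None for anything that is not a JSON object with a string type lets the heartbeat reply and the event stream share one receive loop.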

Event catalog

type                       Direction  Meaning
message.new                S→C        A new message arrived in a thread (E2E-encrypted payload).
message.read               S→sender   Recipient marked your message(s) read.
message_reactions          S→C        Reaction set on a message changed.
message_deleted            S→C        A message was deleted.
message.transcript_edited  S→C        A voice-note transcript was edited.
presence                   S→C        Typing / recording indicator for a thread.
recording_progress         S→C        Live progress while a voice note records/uploads.
ai.stage_done              S→C        One AI pipeline stage finished (triage/extract/summarize/replies).
ai.processed               S→C        The AI pipeline for a message completed.
call.invited               S→C        Incoming call invitation.
call.state_changed         S→C        Call transitioned (ringing→active→ended, etc.).
call.signal                S→C        WaveLink SDP/ICE handshake signal for a call.
call.relay                 S→C        Relay coordination for a call.
handoff.event              S→sender   Live view / screenshot receipt on a hand-off message.

Additional state pushes you may receive: subscription.lapsed, voice_clone.status_changed, task_due, invite_redeemed. Treat any unknown type as a no-op (forward-compatible) — the apps do.
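One way to get the forward-compatible behavior is a handler table in which a missing entry means "ignore". This is a sketch; EventRouter and its methods are hypothetical names, not part of any Telbox SDK.

```python
from typing import Any, Callable, Dict, List, Optional

Handler = Callable[[Dict[str, Any]], None]


class EventRouter:
    """Dispatches decoded events by type; unknown types are a no-op."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}
        self.ignored: List[Optional[str]] = []  # kept only for debugging

    def on(self, event_type: str, handler: Handler) -> None:
        """Register the handler for one event type."""
        self._handlers[event_type] = handler

    def dispatch(self, event: Dict[str, Any]) -> bool:
        """Run the matching handler; return False if the type is unknown."""
        event_type = event.get("type")
        handler = self._handlers.get(event_type) if isinstance(event_type, str) else None
        if handler is None:
            self.ignored.append(event_type)  # never raise on unknown types
            return False
        handler(event)
        return True
```

Because dispatch never raises on an unregistered type, a server that starts sending a new event does not break older clients.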

Payloads are encrypted

message.new carries the E2E-encrypted message (encapsulated_key_b64, iv, ciphertext, aad, sender_eph_public). Decrypt client-side with your device keys. The server never sees plaintext on the transport path.

Reconnect & replay

The WebSocket is best-effort while connected; durable delivery is guaranteed by an outbox with at-least-once semantics and idempotency keys. After any disconnect:

  1. Reconnect the socket.
  2. Catch up on anything missed via REST: GET /v1/messages?after=<last_seen_ts>.

Because events are idempotent by design (e.g. message.new is keyed by message_id), replaying a window you partially saw is safe — dedupe on the id.
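The catch-up and dedupe logic can be sketched as a small state object. ReplayState is a hypothetical name, and the sketch assumes timestamps are strings in one fixed UTC format so that they sort lexicographically; the actual format of last_seen_ts is not specified here.

```python
from typing import Optional, Set
from urllib.parse import quote


class ReplayState:
    """Remembers which messages were seen and where to resume from."""

    def __init__(self) -> None:
        self.seen_ids: Set[str] = set()
        self.last_seen_ts: Optional[str] = None

    def accept(self, message_id: str, ts: str) -> bool:
        """Return True for a new message, False for a duplicate delivery."""
        if message_id in self.seen_ids:
            return False
        self.seen_ids.add(message_id)
        # Assumption: timestamps share one sortable string format.
        if self.last_seen_ts is None or ts > self.last_seen_ts:
            self.last_seen_ts = ts
        return True

    def catch_up_path(self) -> Optional[str]:
        """REST path for step 2, or None if nothing has been seen yet."""
        if self.last_seen_ts is None:
            return None
        return f"/v1/messages?after={quote(self.last_seen_ts, safe='')}"
```

Run both live message.new events and the REST catch-up results through accept(), so a message that arrives on both paths is applied once.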

Server-Sent Events (SSE)

SSE responses set Content-Type: text/event-stream, Cache-Control: no-cache, and X-Accel-Buffering: no. Each event is a data: line; the stream terminates with a literal data: [DONE].
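A minimal sketch of reading such a stream, assuming the response body is already available as an iterable of text lines (for example from an HTTP client's line iterator) and that every data: line other than the terminator carries one JSON object, as in the examples in this section. The function name is hypothetical.

```python
import json
from typing import Any, Dict, Iterable, Iterator

DONE = "[DONE]"  # literal terminator described above


def iter_sse_events(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield the JSON object from each data: line until data: [DONE]."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # blank separator lines and any other fields are skipped
        data = line[len("data:"):].strip()
        if data == DONE:
            return  # stop reading; nothing after the terminator is consumed
        yield json.loads(data)
```

The generator stops at the terminator, so the caller can close the connection as soon as iteration ends.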

Streaming translation — POST /v1/messages/{message_id}/translate-stream

Request:

{ "target_lang": "en" }

target_lang is an optional BCP-47 tag; it defaults to your preferred_language. You must be a member of the message's thread.

Stream:

data: {"text": "Translated fragment"}

data: {"text": " continues…"}

data: [DONE]

  • {"text": "…"}: one translated fragment per token.
  • {"error": "upstream_failed"}: terminal upstream failure (the HTTP status is still 200, and [DONE] follows).
  • Client errors (403, 404, 429) arrive as a normal HTTP error with no SSE body.
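Consuming this stream can be sketched as follows: concatenate the text fragments and remember an in-stream error if one appears. The function name is hypothetical, and the input is assumed to be the response body split into text lines.

```python
import json
from typing import Iterable, List, Optional, Tuple


def collect_translation(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Return (translated_text_so_far, error_code_or_None)."""
    parts: List[str] = []
    error: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        event = json.loads(data)
        if "error" in event:
            # The HTTP status is still 200, so failures must be read here.
            error = event["error"]
        elif "text" in event:
            parts.append(event["text"])
    return "".join(parts), error
```

Checking the returned error matters because a 200 response alone does not mean the translation completed.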

Streaming ask — POST /v1/ask/stream

Streams the answer to an Ask query token-by-token. It may also emit a pending_action event carrying an HMAC-signed confirm card; redeem it at POST /v1/agent/confirm-action. Terminates with [DONE].

Streaming assistant — POST /v1/threads/{thread_id}/assistant-invocations/stream

Drives the personal assistant and streams live tool-call progress that a client can render as it happens:

data: {"step": "tool_call_start", "name": "search_messages", "call_id": "c1"}

data: {"step": "tool_call_done", "name": "search_messages", "ok": true, "summary": "2 hit(s)"}

data: {"event": "result", "visibility": "private", "action": { }, "run": { }}

data: [DONE]

  • {"step": …}: a progress marker; step is one of tool_call_start, tool_call_done, done.
  • {"action": {…}}: a write-tool confirm card (same token path as Ask).
  • {"event": "result", …}: terminal result with the finalized action/run.
  • {"event": "error", "error": "<code>"}: mid-stream failure (terminal).
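The event shapes above can be folded into a single run state, as in this sketch. AssistantRun and apply_event are hypothetical names, and the events are assumed to be already decoded from their data: lines.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class AssistantRun:
    steps: List[Tuple[str, Optional[str]]] = field(default_factory=list)
    confirm_card: Optional[Dict[str, Any]] = None
    result: Optional[Dict[str, Any]] = None
    error: Optional[str] = None

    @property
    def finished(self) -> bool:
        """True once a terminal result or error event has been applied."""
        return self.result is not None or self.error is not None


def apply_event(run: AssistantRun, event: Dict[str, Any]) -> AssistantRun:
    """Fold one decoded stream event into the run state."""
    # Check "event" first: a result event also carries an "action" key.
    if event.get("event") == "error":
        run.error = event.get("error")
    elif event.get("event") == "result":
        run.result = event
    elif "step" in event:
        run.steps.append((event["step"], event.get("name")))
    elif "action" in event:
        run.confirm_card = event["action"]
    return run
```

The ordering of the checks is the main design choice: classifying by the event key before looking for action keeps a terminal result from being mistaken for a confirm card.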

transcribe-stream is removed

There is no streaming-transcription endpoint. Voice-note transcripts are produced one-shot by the AI pipeline and delivered when ready — watch for the ai.stage_done / ai.processed WebSocket events, or poll GET /v1/messages/{id}.