Authority & autonomy (the leash)

An agent is defined less by what it can do than by when it may do it alone. Telbox encodes that as a per-capability authority level plus per-capability limits. At runtime the resolver turns (capability, level, limits, action) into one of four outcomes — refuse, draft, ask, or auto-act — so every write a granted agent attempts is checked against the leash the owner set. This page documents that gradient honestly, including what auto-execution does not protect against.

The autonomy gradient

Each capability an agent touches (calendar, email, thread_replies, reminders, tasks, mute, purchases, …) carries one level. The level is the coarse leash; limits tighten it.

Level	Decision	What the agent does
`disabled`	REFUSE	The capability is off-limits — the agent can't use it at all.
`draft_only`	DRAFT	Prepares the action; the owner finishes / sends it.
`ask_before_action`	ASK	Proposes the action; one confirm executes it (human-in-the-loop).
`auto_act_limited`	AUTO (within limits)	Does it now, within the stated limits, then logs it and offers undo.
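
As a rough illustration, the table above amounts to a closed lookup from level to coarse decision. This is a minimal sketch, not the resolver itself: the `Decision` enum and the `coarse_decision` helper are hypothetical names introduced here, and limits are not applied yet at this stage.

```python
from enum import Enum


class Decision(Enum):
    REFUSE = "refuse"
    DRAFT = "draft"
    ASK = "ask"
    AUTO = "auto"


# The four levels and decisions are taken from the table; the mapping
# structure itself is an illustrative assumption.
LEVEL_TO_DECISION = {
    "disabled": Decision.REFUSE,
    "draft_only": Decision.DRAFT,
    "ask_before_action": Decision.ASK,
    "auto_act_limited": Decision.AUTO,  # still subject to limits
}


def coarse_decision(level: str) -> Decision:
    """Map an authority level to its coarse decision, rejecting unknown levels."""
    try:
        return LEVEL_TO_DECISION[level]
    except KeyError:
        raise ValueError(f"unknown authority level: {level!r}") from None
```

Rejecting unknown levels outright, rather than guessing a fallback, keeps a typo in a grant from silently widening the leash.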

Default is draft_only

An agent auto-acts on nothing until the owner explicitly grants auto_act_limited for a specific capability. A capability with no grant resolves to DRAFT (the draft_only default), never auto. The visual builder and SDK both default unset capabilities to the safe end of the gradient.

An AUTO decision carries an undo window (undo_window_s, 45s by default, env-tunable via TELBOX_AGENT_UNDO_WINDOW_S) — the frictionless pattern is act + undo, not confirm first. Every other decision has a zero undo window because nothing irreversible happened yet.
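
The undo-window rule can be sketched as a small function. Only the 45-second default and the TELBOX_AGENT_UNDO_WINDOW_S variable come from this page; the helper name `undo_window_s`, the string decision values, and the fallback to the default on an unparsable or negative value are assumptions made for illustration.

```python
import os
from typing import Mapping

DEFAULT_UNDO_WINDOW_S = 45


def undo_window_s(decision: str, env: Mapping[str, str] = os.environ) -> int:
    """Return the undo window in seconds for a resolver decision."""
    # Only AUTO has already acted, so only AUTO needs an undo window.
    if decision != "auto":
        return 0
    raw = env.get("TELBOX_AGENT_UNDO_WINDOW_S")
    if raw is None:
        return DEFAULT_UNDO_WINDOW_S
    try:
        value = int(raw)
    except ValueError:
        # Assumption: a malformed override falls back to the default.
        return DEFAULT_UNDO_WINDOW_S
    return value if value >= 0 else DEFAULT_UNDO_WINDOW_S
```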

Per-capability limits

auto_act_limited only auto-acts when the concrete action falls inside the capability's limits. These are the limit keys the resolver actually honors:

Capability	Limit key	Checked against	Auto-acts when
`calendar`	`max_duration_min`	the event's `duration_min`	duration ≤ max
`calendar`	`known_contacts_only`	the event's `invitees_known`	all invitees are known contacts
`thread_replies`	`max_chars`	the reply's `char_count`	reply length ≤ max
`email`	`approved_domains`	the message's `recipient_domains`	every recipient domain is in the list
`purchases`	`max_amount_cents`	the order's `amount_cents`	amount ≤ max

Over-limit never silently becomes AUTO — it falls back to ASK

Bounds are hard. A calendar event longer than max_duration_min, a reply over max_chars, an email to an unapproved domain, or a purchase over max_amount_cents does not quietly proceed — it escalates to ASK so a human confirms. The fallback reason is surfaced (e.g. calendar_over_limit:duration_exceeds_max).
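
To make the over-limit fallback concrete, here is a minimal sketch of a limit check for an auto_act_limited grant. The limit keys and action fields are the ones listed in the table above, and `calendar_over_limit:duration_exceeds_max` is the reason string quoted on this page. Everything else is an assumption: the function name `check_limits`, the tuple return shape, treating `invitees_known` as a boolean, and every other reason string (marked hypothetical in the code).

```python
from typing import Any, Mapping, Optional, Tuple


def check_limits(
    capability: str, limits: Mapping[str, Any], action: Mapping[str, Any]
) -> Tuple[str, Optional[str]]:
    """Return ("auto", None) if the action is inside every set limit, else ("ask", reason)."""
    if capability == "calendar":
        max_min = limits.get("max_duration_min")
        if max_min is not None and action["duration_min"] > max_min:
            return "ask", "calendar_over_limit:duration_exceeds_max"
        if limits.get("known_contacts_only") and not action["invitees_known"]:
            return "ask", "calendar_over_limit:unknown_invitee"  # hypothetical reason
    elif capability == "thread_replies":
        max_chars = limits.get("max_chars")
        if max_chars is not None and action["char_count"] > max_chars:
            return "ask", "thread_replies_over_limit:too_long"  # hypothetical reason
    elif capability == "email":
        approved = limits.get("approved_domains")
        # Every recipient domain must be in the approved list.
        if approved is not None and not set(action["recipient_domains"]) <= set(approved):
            return "ask", "email_over_limit:unapproved_domain"  # hypothetical reason
    elif capability == "purchases":
        max_cents = limits.get("max_amount_cents")
        if max_cents is not None and action["amount_cents"] > max_cents:
            return "ask", "purchases_over_limit:amount_exceeds_max"  # hypothetical reason
    return "auto", None
```

Note that the comparisons are inclusive: an action exactly at the bound still auto-acts, matching the "≤ max" column in the table.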

High-risk capabilities require an explicit, satisfied limit

purchases and email are treated as high-risk (money / outbound mail are external and hard to reverse). They never auto-act on an empty limit set: email with no approved_domains and purchases with no max_amount_cents both fall back to ASK even at auto_act_limited. You must set a real bound before either can auto-act.
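
The high-risk rule can be read as a precondition checked before any limit comparison. The sketch below assumes a hypothetical helper `lacks_required_limit` and assumes that an absent key, `None`, or an empty `approved_domains` list all count as "no bound"; the page only states that an empty limit set falls back to ASK.

```python
from typing import Any, Mapping

# The required limit key per high-risk capability, as named in the text above.
REQUIRED_LIMIT = {
    "email": "approved_domains",
    "purchases": "max_amount_cents",
}


def lacks_required_limit(capability: str, limits: Mapping[str, Any]) -> bool:
    """True if a high-risk capability has no real bound and must fall back to ASK."""
    key = REQUIRED_LIMIT.get(capability)
    if key is None:
        return False  # not a high-risk capability
    value = limits.get(key)
    # Assumption: a missing key, None, or an empty list all mean "no bound set".
    return value is None or value == []
```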

External & irreversible tools never auto-act

Above the authority resolver sits a harder gate. A tool whose side-effect class is external is removed from auto-execution before the resolver is even consulted — so it can never auto-act, regardless of grant or limits.

The clearest example is request_uber_ride (side_effects: "external"): even an auto_act_limited grant on its capability cannot make it fire on its own. It always produces a confirm step. Email-send-equivalents are likewise hard-gated. The rule: if the result reaches outside Telbox or can't be cleanly undone, a human confirms.

Two independent backstops

Auto-execution survives only when both are true: (1) the tool is not an external/irreversible side effect, and (2) the owner granted auto_act_limited and the action is within limits. Either one failing → no auto.
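
A minimal sketch of how the two backstops compose, assuming a hypothetical `can_auto_execute` helper. The `"external"` side-effect value is the one shown for request_uber_ride; the `"reversible"` value used in the usage lines is a placeholder for any non-external class, not a documented value.

```python
def can_auto_execute(side_effects: str, level: str, within_limits: bool) -> bool:
    """Auto-execution survives only if both independent gates pass."""
    # Gate 1: external tools are removed before the resolver is consulted.
    if side_effects == "external":
        return False
    # Gate 2: the owner granted auto_act_limited and the action is inside limits.
    return level == "auto_act_limited" and within_limits


# An external tool never auto-acts, even with a full grant and satisfied limits.
ride = can_auto_execute("external", "auto_act_limited", True)
# A non-external tool auto-acts only with the grant and within limits.
reminder = can_auto_execute("reversible", "auto_act_limited", True)
```

Checking the side-effect class first mirrors the ordering described above: the grant is never even examined for an external tool.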

Injection posture (honest)

This is the part most platforms gloss over. We don't.

A message_arrival trigger fires on content the agent did not author and cannot fully trust. With the gradient wired into the triggered runtime, a tool the owner granted auto_act_limited will auto-execute when its trigger fires — which means a reversible granted tool can act on instructions embedded in a message it reads. "A VIP-watcher agent that turns an incoming message into a task" is exactly this behavior, by design.

There is no taint/provenance gate that blocks auto-acting on freshly-read untrusted content for reversible tools; a blanket one would neuter every message-triggered auto-agent. The agent's system prompt does carry a standing instruction to treat message text, transcripts, and tool results as untrusted data, never instructions, but that is prompt-level steering, not a hard gate. The real, hard backstops are the first two items below; the last two deliberately bound the blast radius:

External / irreversible tools never auto-execute — injected content or not. request_uber_ride, email sends, anything external is hard-gated out of auto.
Default is draft_only — nothing autos until an explicit auto_act_limited grant for a specific capability.
Reversible + own-workspace only — the tools that can auto-fire (create_task, create_reminder, create_calendar_event, mute_thread) are undoable and scoped to the grantor's own data. To inject at all, an attacker must already be a member of a thread the agent watches.
Bounded volume — one trigger ⇒ one deduped run ⇒ a per-run tool cap; the reply path additionally carries a per-agent send budget.
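
The "bounded volume" item can be pictured as a deduplicating runner with a per-run cap. This is only a sketch under stated assumptions: the `TriggerRunner` class, the trigger ID used as the dedup key, and the cap value are all hypothetical, and the per-agent send budget on the reply path is not modeled.

```python
from typing import List, Sequence, Set


class TriggerRunner:
    """One trigger yields at most one run, and each run at most N tool calls."""

    def __init__(self, max_tools_per_run: int) -> None:
        self.max_tools_per_run = max_tools_per_run
        self._seen: Set[str] = set()

    def run(self, trigger_id: str, tool_calls: Sequence[str]) -> List[str]:
        """Return the tool calls allowed to execute for this trigger."""
        if trigger_id in self._seen:
            return []  # duplicate delivery of the same trigger: no second run
        self._seen.add(trigger_id)
        # Anything past the cap is dropped, however much the content asks for.
        return list(tool_calls[: self.max_tools_per_run])


runner = TriggerRunner(max_tools_per_run=2)
first = runner.run("msg-1", ["create_task", "create_reminder", "mute_thread"])
replay = runner.run("msg-1", ["create_task"])
```

The point of the sketch is that a single injected message cannot fan out: it buys at most one run and a fixed number of reversible actions.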

Adding a new irreversible auto-capability MUST come with a real taint gate

The current "reversible content can auto-act" contract is only safe because the actions are undoable and own-workspace. Do not widen auto-execution to a new irreversible capability without adding a real provenance/taint gate first.

Setting the leash in the IR

Authority lives in the agent's IR as a per-capability guard. Both SDKs expose ir.guard(level, ...limits); the visual builder (Studio) renders the same guards on each write node. The example below grants reminders full auto (reversible, low-risk), bounds thread_replies to short auto-replies, requires confirmation for calendar, and leaves email at draft-only — then previews exactly what would auto-act with dry_run (no LLM, no side effects) before publishing.

The same agent in curl, Python, and TypeScript:

# Preview what this IR would do — no LLM, no side effects.
curl -X POST https://api.telbox.ai/v1/agents/dry-run \
  -H "Authorization: Bearer tb_live_…" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Reply Nudge",
    "persona": "Watch the thread; turn asks into reminders and short replies.",
    "triggers": [{ "kind": "message_arrival" }],
    "steps": [
      { "id": "s1", "type": "tool", "tool": "create_reminder",
        "args": { "title": { "prompt": "follow up" } } },
      { "id": "s2", "type": "tool", "tool": "compose_email_draft",
        "args": { "to": { "prompt": "the sender" } } }
    ],
    "guards": { "capabilities": {
      "reminders":      { "level": "auto_act_limited" },
      "thread_replies": { "level": "auto_act_limited", "limits": { "max_chars": 280 } },
      "calendar":       { "level": "ask_before_action" },
      "email":          { "level": "draft_only" }
    } }
  }'

# Then create it (same IR body):
# curl -X POST https://api.telbox.ai/v1/agents  -H "Authorization: Bearer tb_live_…" -d '{ … }'

from telbox import TelboxClient, ir

tb = TelboxClient(api_key="tb_live_…")

agent = ir.agent(
    "Reply Nudge",
    persona="Watch the thread; turn asks into reminders and short replies.",
    triggers=[ir.on_message()],
    steps=[
        ir.tool("s1", "create_reminder", title=ir.prompt("follow up")),
        ir.tool("s2", "compose_email_draft", to=ir.prompt("the sender")),
    ],
    guards={
        # reversible + low-risk → auto-acts on a watched message
        "reminders": ir.guard("auto_act_limited"),
        # auto, but only short replies; longer ones fall back to ASK
        "thread_replies": ir.guard("auto_act_limited", max_chars=280),
        # one confirm before any calendar write
        "calendar": ir.guard("ask_before_action"),
        # high-risk: stays draft-only (and would need approved_domains to ever auto)
        "email": ir.guard("draft_only"),
    },
)

# Preview the leash before publishing — no LLM, no side effects.
preview = tb.dry_run(agent)
print(preview.effects)

created = tb.create_agent(agent)

import { TelboxClient, ir } from "@telbox/sdk";

const tb = new TelboxClient({ apiKey: "tb_live_…" });

const agent = ir.agent("Reply Nudge", {
  persona: "Watch the thread; turn asks into reminders and short replies.",
  triggers: [ir.onMessage()],
  steps: [
    ir.tool("s1", "create_reminder", { title: ir.prompt("follow up") }),
    ir.tool("s2", "compose_email_draft", { to: ir.prompt("the sender") }),
  ],
  guards: {
    // reversible + low-risk → auto-acts on a watched message
    reminders: ir.guard("auto_act_limited"),
    // auto, but only short replies; longer ones fall back to ASK
    thread_replies: ir.guard("auto_act_limited", { max_chars: 280 }),
    // one confirm before any calendar write
    calendar: ir.guard("ask_before_action"),
    // high-risk: stays draft-only (and would need approved_domains to ever auto)
    email: ir.guard("draft_only"),
  },
});

// Preview the leash before publishing — no LLM, no side effects.
const preview = await tb.dryRun(agent);
console.log(preview.effects);

const created = await tb.createAgent(agent);

Same resolver, three surfaces

The dry-run preview, the interactive test-run, and the triggered worker all consult the same authority resolver — so what dry_run shows is what the live agent actually does. A preview that diverged from the runtime would be worse than no preview.

Where this fits

The four levels are validated against the closed authority_levels vocabulary returned by GET /v1/agent-tools (alongside each tool's side_effects and the capability its guard is keyed on).
Authority gates writes; it does not gate reads. A read-only tool has no governing capability.
See AI & Privacy for the consent + quota gates that sit in front of any model call, and the confirm-card flow (POST /v1/agent/confirm-action) that an ASK decision produces.
See Errors for the machine-readable codes a blocked or escalated action returns.
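
As a companion to the vocabulary note above, a client could check its guards against the closed set of levels before submitting an IR. The sketch below is an assumption-heavy illustration: `invalid_levels` is a hypothetical helper, the guard shape mirrors the `capabilities` object in the curl example, and the vocabulary is hard-coded from the levels table rather than fetched, since the response shape of GET /v1/agent-tools is not shown on this page.

```python
from typing import Any, Dict, Mapping, Set

# In practice this set would come from the authority_levels vocabulary;
# it is hard-coded here from the levels table for illustration.
AUTHORITY_LEVELS: Set[str] = {
    "disabled",
    "draft_only",
    "ask_before_action",
    "auto_act_limited",
}


def invalid_levels(
    guards: Mapping[str, Mapping[str, Any]], vocabulary: Set[str]
) -> Dict[str, Any]:
    """Return {capability: level} for every guard whose level is not in the vocabulary."""
    return {
        capability: guard.get("level")
        for capability, guard in guards.items()
        if guard.get("level") not in vocabulary
    }


guards = {
    "reminders": {"level": "auto_act_limited"},
    "thread_replies": {"level": "auto_act_limited", "limits": {"max_chars": 280}},
    "email": {"level": "auto"},  # typo: not a valid level
}
problems = invalid_levels(guards, AUTHORITY_LEVELS)
```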