Build by talking (`@build`) + the compiler eval gate¶

@build turns a plain-English intent — "every Monday at 9, nudge whoever still owes me a reply" — into a draft agent definition you review before anything is created. One endpoint, POST /v1/agents/compile, runs the natural-language → IR compiler and hands back a draft AgentDefinitionIR plus a plain-language rendering of its steps. It is the front door to building an agent without writing the IR by hand. This page explains exactly what it does today, the seam that lets it get smarter, and the eval gate that decides when a smarter compiler is allowed to act on its own.

What `compile` does¶

POST /v1/agents/compile takes one field, text (3–2000 chars), and returns a draft — it creates nothing, persists nothing, and never auto-runs. The caller is expected to inspect the draft and then explicitly create the agent (POST /v1/agents with the IR, or POST /v1/agents/from-template).

The response is a CompiledAgentView:

Field	Type	Meaning
`ir`	object	The drafted `AgentDefinitionIR` — the same shape you'd post to `POST /v1/agents`.
`source`	string	Provenance of the draft. `"template-match"` today (see below).
`matched_template`	string \| null	The curated template id the intent matched, when `source` is `"template-match"`.
`rendered_steps`	string[]	Plain-language, one-line-per-step rendering of the drafted plan — the surface you show a user to confirm.

Compile drafts; it never creates

compile is read-only. Nothing exists until you take the returned ir and call createAgent / create_agent (or install a template). Treat the draft as a proposal to confirm, not a fait accompli.

Compile via the API¶

curlPythonTypeScript

curl -X POST https://api.telbox.ai/v1/agents/compile \
  -H "Authorization: Bearer tb_live_…" \
  -H "Content-Type: application/json" \
  -d '{"text": "every Monday at 9, nudge whoever still owes me a reply"}'

from telbox import TelboxClient

tb = TelboxClient(api_key="tb_live_…")

draft = tb.compile("every Monday at 9, nudge whoever still owes me a reply")

print(draft.source)            # "template-match"
print(draft.matched_template)  # "nudge-non-repliers"
for line in draft.rendered_steps:
    print(line)                # one plain-language line per step

# Confirm, then create — compile created nothing on its own.
agent = tb.create_agent(draft.ir)

import { TelboxClient } from "@telbox/sdk";

const tb = new TelboxClient({ apiKey: "tb_live_…" });

const draft = await tb.compile(
  "every Monday at 9, nudge whoever still owes me a reply",
);

console.log(draft.source);           // "template-match"
console.log(draft.matched_template); // "nudge-non-repliers"
draft.rendered_steps.forEach((l) => console.log(l));

// Confirm, then create — compile created nothing on its own.
const agent = await tb.createAgent(draft.ir);

Preview before you commit

Once you have a draft ir, you don't have to create blind. POST /v1/agents/dry-run (tb.dry_run(ir) / tb.dryRun(ir)) returns a deterministic, side-effect-free preview of every step, each write's authority decision, and whether the agent would auto-reply — so the confirm step can show what it would actually do.

Deterministic today — honestly¶

It would be easy to claim @build is "AI that understands what you want." It isn't, and we'd rather be straight about it. Today the compiler is deterministic keyword → curated-template matching, personalized with your own words as the agent's persona. There is no LLM in the compile path: it's instant, it never hallucinates a tool, and it's reproducible.

Concretely, compile_intent lowercases the text, scores it against a fixed keyword map (e.g. remind / nudge / follow up / chase → the nudge-non-repliers template; summary / recap / digest → daily-thread-summary), picks the template with the most hits (defaulting to a thread summary on no signal), builds that template's IR, and overwrites its persona with your verbatim text. That's why the draft's source is "template-match" and matched_template names the chosen template.

The honest limitation: keyword matching can't reach paraphrase or non-English intent. "Make sure I don't leave people hanging" won't match the nudge keywords. That ceiling is documented, measured, and exactly what a smarter compiler is meant to clear.

The swappable seam¶

compile_intent (in telbox.modules.agent.nl_compiler) is deliberately a swappable seam. A free-form LLM compiler — a model emitting a validated AgentDefinitionIR — can replace it without touching the API endpoint or the SDKs: same POST /v1/agents/compile, same CompiledAgentView (the source field flips to indicate LLM provenance). More expressive, reaches paraphrase and other languages — but a model can also pick the wrong tool or bolt on tools the intent never asked for.

A smarter compiler doesn't get autonomy for free

Expressiveness cuts both ways: a free-form compiler can over-fire. That's precisely the risk the compiler eval gate guards. A new compiler must earn the right to act on its own — it doesn't inherit it by being newer.

The compiler eval gate¶

The gate (telbox.modules.agent.eval.compiler_eval) grades any compiler against a small, versioned golden corpus of labeled intent → {expected_tools, forbidden_tools, expected_trigger} cases (~20). It is compiler-agnostic: it scores a compile_fn, so the production deterministic compiler and a candidate LLM compiler are graded identically. It measures per-class precision/recall (not overall agreement), with a hard zero on forbidden-tool over-firing, and it's deterministic so it runs in seconds.

The bar (the tunable constants in compiler_eval):

Metric	Bar	Why
Recall (expected tools present)	≥ 0.90	The compiler must include the tools the agent actually needs.
Precision (no spurious tools)	≥ 0.85	It must not bolt on tools the intent didn't ask for.
Forbidden-tool violations	0	The over-fire guard — never wire, say, web/email into a plain summary.
Trigger-kind failures	0	The right trigger (schedule vs. message vs. manual).

Corpus cases flagged "gate": false are aspirational — the paraphrase / non-English intents the keyword matcher can't yet reach. They're evaluated and reported, but excluded from the pass/fail bar, so they document the ceiling a free-form compiler should beat without failing today's deterministic one.

The gate's two jobs¶

Regression shield (CI). The production compiler is graded on every run; the test suite fails the build if it drops below the bar — for example, if an edit to the keyword map quietly breaks matching.
Graduation gate. compiler_gate_passes(compile_fn) is the predicate a future free-form compiler must clear. Pass it the candidate compiler's function; it's graded by the same corpus and the same bar.

How a compiler graduates to auto-act

Free-form @build stays draft-only — it can propose an IR, but a human confirms — until compiler_gate_passes(...) returns True for it. Only then is a compiler allowed to graduate from draft-only to can-auto-act. The compiler earns autonomy the same way an agent does: by clearing a measurable bar, not by assertion.

Where this fits¶

The drafted ir is an AgentDefinitionIR — the same object you create agents from. See the Voice Cloud / Agents API reference for the full IR shape and the create/dry-run/run endpoints.
New to the platform? Start with Getting Started to provision a key, then come back and compile your first intent.
Both SDKs (telbox for Python, @telbox/sdk for TypeScript) expose compile directly, as shown above.

The bar is only as good as the corpus

When a real @build miss surfaces, it's added to the golden corpus — gated if the production compiler should handle it, aspirational if it's a target for the free-form compiler. The corpus grows as the surface grows; the gated set stays small enough that the suite remains sub-second.

Build by talking (@build) + the compiler eval gate¶

What compile does¶