Build a Long-Horizon Agent
Run multi-hour agent loops with cost ceilings, resumable state, and voice input
This cookbook assumes you have an OpenRouter API key and are using the Agent
SDK (@openrouter/agent). If you are starting from scratch, read the
Agent SDK overview and the
callModel reference first.
Goal: Run an agent that can keep working for hours, not seconds — research
projects, multi-stage migrations, voice-driven assistants, or background jobs
that span days. The same callModel loop works for all of them once you wire
up four primitives.
Outcome: A long-horizon agent that:
- Caps total cost and step count so it always terminates.
- Persists conversation state so it can be resumed after a crash, deploy, or human approval.
- Streams progress events so dashboards and UIs stay live during the run.
- Runs a self-ask loop (research, adversarial review, repeat) until the agent emits a [DONE] sentinel.
- Optionally accepts voice input via OpenRouter’s Speech-to-Text endpoint and replies with Text-to-Speech.
You can hand this page to your coding agent as the implementation brief. Adapt the storage, ceilings, and surface (CLI, API, queue worker) to your app rather than scaffolding a separate project.
Prerequisites
- Node.js 20+ or Bun
- An OpenRouter API key in OPENROUTER_API_KEY
- A project with @openrouter/agent installed
- A place to persist state: a database, Redis, S3, or the local filesystem
- Optional: a microphone or audio file for the voice section
1. Set hard ceilings on every run
Long-horizon agents must terminate. Combine multiple stop conditions so the
loop ends as soon as the first one fires. The most useful for long runs are
maxCost, stepCountIs, and maxTokensUsed.
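To make the "first one fires" behavior concrete, here is a minimal sketch of how several ceilings compose. The helpers below are local stand-ins that reuse the SDK's names for readability; they are not imports from @openrouter/agent, and the RunSnapshot shape, the shouldStop function, and the limit values are assumptions made for illustration.

```typescript
// Local stand-ins that model how stop conditions compose. These are NOT
// imports from @openrouter/agent; the real helpers may inspect different data.
interface RunSnapshot {
  steps: number; // completed model steps so far
  costUsd: number; // cumulative spend in USD
  tokensUsed: number; // cumulative tokens consumed
}

type StopCondition = (snapshot: RunSnapshot) => boolean;

const stepCountIs = (limit: number): StopCondition => (s) => s.steps >= limit;
const maxCost = (usd: number): StopCondition => (s) => s.costUsd >= usd;
const maxTokensUsed = (limit: number): StopCondition => (s) => s.tokensUsed >= limit;

// The run stops as soon as ANY condition fires.
function shouldStop(conditions: StopCondition[], snapshot: RunSnapshot): boolean {
  return conditions.some((condition) => condition(snapshot));
}

// Illustrative ceilings: 50 steps, 5 USD, or 2M tokens, whichever comes first.
const ceilings: StopCondition[] = [stepCountIs(50), maxCost(5.0), maxTokensUsed(2_000_000)];

const midRun: RunSnapshot = { steps: 12, costUsd: 1.25, tokensUsed: 300_000 };
const overBudget: RunSnapshot = { steps: 12, costUsd: 5.1, tokensUsed: 300_000 };

console.log(shouldStop(ceilings, midRun)); // false
console.log(shouldStop(ceilings, overBudget)); // true
```

Because the conditions are combined with a logical OR, adding another ceiling can only make the run stop earlier, never later.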
See the Stop Conditions reference
for the full list (stepCountIs, hasToolCall, maxTokensUsed, maxCost,
finishReasonIs) and how to compose custom predicates.
Long-horizon runs spend real credits. Always set both a step ceiling and a cost ceiling before you start a multi-hour run, and start small while you are iterating.
2. Persist state for resumability
A multi-hour run must survive restarts, deploys, and human approvals.
callModel accepts a StateAccessor that loads and saves
ConversationState between steps. Back it with whatever storage your app
already uses.
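As a starting point, a file-backed accessor might look like the sketch below. The load/save method names and the ConversationState alias are assumptions for illustration; check the SDK's exported types for the actual StateAccessor contract. The createFileStateAccessor helper is hypothetical.

```typescript
import { mkdir, readFile, rename, writeFile } from "node:fs/promises";
import { dirname } from "node:path";

// Assumed shapes for illustration; the SDK's real types may differ.
type ConversationState = Record<string, unknown>;

interface StateAccessor {
  load(): Promise<ConversationState | null>;
  save(state: ConversationState): Promise<void>;
}

// Hypothetical helper: one JSON file per run.
function createFileStateAccessor(path: string): StateAccessor {
  return {
    async load() {
      try {
        return JSON.parse(await readFile(path, "utf8")) as ConversationState;
      } catch (error) {
        // A missing file means this is a fresh run, not a failure.
        if ((error as { code?: string }).code === "ENOENT") return null;
        throw error;
      }
    },
    async save(state) {
      await mkdir(dirname(path), { recursive: true });
      // Write to a temp file and rename it into place, so a crash mid-write
      // cannot leave a truncated checkpoint behind.
      const tmp = `${path}.tmp`;
      await writeFile(tmp, JSON.stringify(state), "utf8");
      await rename(tmp, path);
    },
  };
}
```

The write-then-rename step matters for long runs: a checkpoint that is corrupted by a crash is worse than no checkpoint, because the resume path would fail on load.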
To resume after a crash, deploy, or human review, call callModel again with
the same StateAccessor. Pass input: [] to signal “no new user turn —
continue from saved state”; the SDK loads the checkpoint and keeps going.
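The start-or-resume decision can be sketched as follows. The run parameter is a hypothetical stand-in for your callModel invocation, and the StateAccessor shape and the startOrResume helper are assumptions for illustration; only the empty-input convention comes from the text above.

```typescript
// Assumed minimal shapes; the SDK's real types are richer.
type ConversationState = Record<string, unknown>;

interface StateAccessor {
  load(): Promise<ConversationState | null>;
  save(state: ConversationState): Promise<void>;
}

type RunInput = string | unknown[];

// Hypothetical wrapper: `run` stands in for the real callModel call.
async function startOrResume(
  state: StateAccessor,
  initialPrompt: string,
  run: (args: { input: RunInput; state: StateAccessor }) => Promise<void>,
): Promise<"started" | "resumed"> {
  const checkpoint = await state.load();
  if (checkpoint === null) {
    // No checkpoint yet: begin with the user's prompt.
    await run({ input: initialPrompt, state });
    return "started";
  }
  // Empty input: no new user turn, continue from the saved checkpoint.
  await run({ input: [], state });
  return "resumed";
}
```

Calling this at process startup means the same entry point covers both a first launch and a restart after a crash or deploy.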
For production, swap the file accessor for one backed by Postgres, Redis, or an object store. See Tool Approval & State for the full StateAccessor and resumption contract.
3. Stream progress instead of waiting
A run that lasts an hour should not block your UI for an hour. callModel
returns a result object with several streams you can consume independently:
- result.getTextStream(): token deltas for the user-facing response.
- result.getToolCallsStream(): tool calls as they complete.
- result.getFullResponsesStream(): the full event stream, including tool preliminary results.
- result.getResponse(): the final, fully-resolved response with usage data.
See the callModel API reference for every stream method and event type.
Wire publishToDashboard to whatever transport you already use — Server-Sent
Events, WebSockets, a database table, or a pubsub channel.
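One way to wire this up is sketched below. It assumes the stream methods return async iterables, which you should confirm against the callModel reference. The DashboardEvent envelope and the forward and streamProgress helpers are hypothetical names, and publishToDashboard is whatever transport function your app provides.

```typescript
// Hypothetical event envelope and transport signature; adapt both to your app.
type DashboardEvent = { runId: string; kind: "text" | "tool_call"; payload: unknown };
type Publish = (event: DashboardEvent) => Promise<void> | void;

// Drain one stream and forward every item as soon as it arrives.
async function forward<T>(
  stream: AsyncIterable<T>,
  runId: string,
  kind: DashboardEvent["kind"],
  publishToDashboard: Publish,
): Promise<void> {
  for await (const payload of stream) {
    await publishToDashboard({ runId, kind, payload });
  }
}

// Consume both streams concurrently so a quiet stream never stalls the other.
async function streamProgress(
  runId: string,
  textStream: AsyncIterable<string>,
  toolCallsStream: AsyncIterable<unknown>,
  publishToDashboard: Publish,
): Promise<void> {
  await Promise.all([
    forward(textStream, runId, "text", publishToDashboard),
    forward(toolCallsStream, runId, "tool_call", publishToDashboard),
  ]);
}
```

Under the assumption above, you would call streamProgress(runId, result.getTextStream(), result.getToolCallsStream(), publishToDashboard) right after callModel returns, and await result.getResponse() separately for the final usage data.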
4. Loop with adversarial self-review
A single pass through callModel often leaves gaps — unverified citations,
missing edge cases, or stale data. Wrap the run in an outer self-ask loop:
research, adversarial review, repeat until the agent emits a [DONE]
sentinel. Each iteration appends a new user turn to the persisted
StateAccessor, so the agent builds on its prior work instead of starting
over.
The [DONE] sentinel is intentionally cheap: any model can produce it, and a
plain String.includes check keeps the control flow obvious. Swap the review
prompt or the reviewer model (for example a faster
~anthropic/claude-sonnet-latest critiquing an Opus researcher) without
changing the loop. Three layers of ceilings keep cost bounded:
SELF_ASK_MAX_ITERATIONS caps the number of review rounds, and each round
runs under its own stepCountIs and maxCost budgets.
Pair this with the state accessor from step 2 so the loop survives crashes
mid-review. On resume, re-enter the loop from the saved state and continue
reviewing.
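The outer loop described in this section can be sketched as follows. The runTurn parameter is a hypothetical stand-in for one callModel run against the persisted state with its own per-round ceilings; the value of SELF_ASK_MAX_ITERATIONS and the wording of the review prompt are assumptions, not values from the SDK.

```typescript
const SELF_ASK_MAX_ITERATIONS = 5; // assumed value; tune for your workload
const DONE_SENTINEL = "[DONE]";

// Stand-in for one callModel run. Hypothetical: the real call would pass the
// StateAccessor and the per-round stop conditions, then return the reply text.
type RunTurn = (userTurn: string) => Promise<string>;

// Illustrative review prompt; replace with your own wording.
const REVIEW_PROMPT =
  "Review your previous answer adversarially. List unverified claims, missing " +
  "edge cases, and stale data, then fix them. " +
  `Reply with ${DONE_SENTINEL} only when nothing remains to fix.`;

async function selfAskLoop(
  task: string,
  runTurn: RunTurn,
): Promise<{ reply: string; reviews: number; done: boolean }> {
  let reply = await runTurn(task); // initial research pass
  let reviews = 0;
  // Keep reviewing until the sentinel appears or the round ceiling is hit.
  while (!reply.includes(DONE_SENTINEL) && reviews < SELF_ASK_MAX_ITERATIONS) {
    reply = await runTurn(REVIEW_PROMPT); // adversarial review pass
    reviews += 1;
  }
  return { reply, reviews, done: reply.includes(DONE_SENTINEL) };
}
```

Returning done separately from reply lets the caller distinguish a run that finished on its own from one that was cut off by the round ceiling, which is worth surfacing in the completion notification.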
5. Add voice input
Drive the same agent loop from a voice memo, phone call, or push-to-talk app.
OpenRouter exposes a dedicated
/api/v1/audio/transcriptions
endpoint with a single STT model parameter. Hand the transcript to
callModel exactly like a text prompt.
For a streaming microphone, capture audio chunks on the client, send them to
your server, and call createTranscription once silence is detected. Use the
STT cookbook for the full request and
response shape.
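For the silence-detection step on the server, a simple energy threshold is often enough to get started. The sketch below assumes 16-bit signed PCM chunks. The threshold, the number of quiet chunks that ends an utterance, and the createUtteranceBuffer helper are illustrative choices, not part of the OpenRouter API. The returned audio is what you would then encode and send for transcription.

```typescript
// Root-mean-square level of a 16-bit PCM chunk, normalized to the range 0..1.
function rms(chunk: Int16Array): number {
  if (chunk.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < chunk.length; i++) {
    const normalized = chunk[i] / 32768;
    sumSquares += normalized * normalized;
  }
  return Math.sqrt(sumSquares / chunk.length);
}

// Hypothetical helper: buffers chunks and returns the whole utterance once
// enough consecutive quiet chunks follow some speech. Defaults are assumptions.
function createUtteranceBuffer(silenceThreshold = 0.01, quietChunksToFlush = 3) {
  let chunks: Int16Array[] = [];
  let quietChunks = 0;
  let heardSpeech = false;

  return {
    push(chunk: Int16Array): Int16Array | null {
      const quiet = rms(chunk) < silenceThreshold;
      if (quiet && !heardSpeech) return null; // ignore leading silence
      chunks.push(chunk);
      if (!quiet) {
        heardSpeech = true;
        quietChunks = 0;
        return null;
      }
      quietChunks += 1;
      if (quietChunks < quietChunksToFlush) return null;

      // Enough trailing silence: concatenate the buffered audio and reset.
      const total = chunks.reduce((sum, c) => sum + c.length, 0);
      const utterance = new Int16Array(total);
      let offset = 0;
      for (const c of chunks) {
        utterance.set(c, offset);
        offset += c.length;
      }
      chunks = [];
      quietChunks = 0;
      heardSpeech = false;
      return utterance;
    },
  };
}
```

A fixed threshold works poorly in noisy rooms; if that becomes a problem, a dedicated voice-activity detector is the usual next step.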
6. Speak the response back (optional)
For voice-out, pipe the agent’s reply through
/api/v1/audio/speech and write the
resulting bytes to a file or stream them to the caller.
7. Notify on completion
Long-horizon jobs usually run somewhere the user is not watching. Notify them
when the run terminates — by webhook, email, Slack message, or whatever your
stack uses. Trigger the notification once getResponse() resolves so the
agent has fully completed and ceilings have been honored.
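A minimal sketch of that trigger follows. It assumes getResponse() returns a promise, as the text above implies. The RunNotification payload, the notifyWhenFinished helper, and the send function are hypothetical placeholders for your own webhook, email, or Slack client.

```typescript
// Hypothetical payload; shape it to match your notification channel.
type RunNotification =
  | { runId: string; status: "completed"; summary: string }
  | { runId: string; status: "failed"; error: string };

type SendNotification = (notification: RunNotification) => Promise<void>;

// Waits for the final response, then sends exactly one notification,
// whether the run completed or threw.
async function notifyWhenFinished<T>(
  runId: string,
  finalResponse: Promise<T>,
  summarize: (response: T) => string,
  send: SendNotification,
): Promise<void> {
  let notification: RunNotification;
  try {
    const response = await finalResponse;
    notification = { runId, status: "completed", summary: summarize(response) };
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    notification = { runId, status: "failed", error: message };
  }
  // Sent outside the try block so a delivery failure is not misreported
  // as a failed run.
  await send(notification);
}
```

Under that assumption, you would pass result.getResponse() as finalResponse. Reporting failures as well as successes keeps a crashed run from looking like one that is still in progress.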
For agents that pause mid-run (for example, human-in-the-loop approvals), see Add Human-in-the-Loop Controls.
Check your work
A correct long-horizon implementation should pass all of the following:
- A run with a low maxCost (for example, maxCost(0.10)) returns from callModel once the ceiling is hit, even if the agent has more work queued.
- Killing the process mid-run and starting a new callModel invocation with the same StateAccessor resumes from the saved ConversationState. The message history grows rather than starting over.
- getToolCallsStream() and getTextStream() yield events while the agent is still running, not only at the end.
- Sending a voice file through sdk.stt.createTranscription returns the expected text, and feeding that text into callModel produces a response that references the spoken request.
- A webhook (or other notification) fires after getResponse() resolves.