Build a Token-Efficient Review Agent

Use a cheap executor model for routine work and call Advisor only for compact uncertainty checks

Goal: Build a review agent that drafts routine answers with a cheap model and asks a stronger Advisor model only after the executor compresses uncertainty into a short packet.

Outcome: Your app sends normal review tasks to a low-cost executor, adds one compact plan-reviewer Advisor entry when a second model is worth the spend, and logs whether Advisor was offered without storing prompts or advice.

Advisor is a beta server tool. It runs an inner model call, so it can add cost and latency. This recipe keeps that call behind a budget gate and sends a compact review prompt instead of forwarding the full transcript. Use returned usage.cost when present, or estimate spend from the selected advisor model’s current pricing before widening the gate.
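
The spend check can be sketched as a small helper. This is a minimal sketch, not part of the recipe's sample: estimateSpendUsd and the pricing object are hypothetical names, and the rates shown are placeholders rather than real prices. It assumes the response's usage object carries prompt_tokens and completion_tokens, and it prefers usage.cost whenever that field is a number.

```javascript
// Hypothetical helper: prefer the cost the API reports, otherwise estimate it.
// `pricing` holds USD per million tokens. The values used below are
// placeholders; fill them in from the advisor model's current published rates.
const estimateSpendUsd = (usage, pricing) => {
  if (typeof usage?.cost === "number") {
    return { usd: usage.cost, source: "reported" };
  }
  const promptTokens = usage?.prompt_tokens ?? 0;
  const completionTokens = usage?.completion_tokens ?? 0;
  const usd =
    (promptTokens * pricing.promptUsdPerMillion +
      completionTokens * pricing.completionUsdPerMillion) /
    1_000_000;
  return { usd, source: "estimated" };
};

// Placeholder rates for illustration only.
const examplePricing = { promptUsdPerMillion: 10, completionUsdPerMillion: 30 };

console.log(
  estimateSpendUsd({ prompt_tokens: 400, completion_tokens: 200 }, examplePricing)
);
console.log(estimateSpendUsd({ cost: 0.0123 }, examplePricing));
```

Tracking the source field alongside the amount makes it easy to tell reported spend from estimated spend before you widen the gate.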

Before you start

You need:

  • Node.js 20 or newer
  • An OpenRouter API key in OPENROUTER_API_KEY
  • A review, planning, or agent workflow that already calls OpenRouter
  • A cheap executor model for routine work
  • A stronger advisor model for compact second opinions

If you are starting a new TypeScript agent, use the Agent SDK callModel API for the executor loop. The sample below uses Chat Completions so the server-tool request shape is visible, but the budget-gate pattern is the same inside an Agent SDK workflow.

What you’re building

This recipe builds a tiny budget-aware implementation-plan reviewer.

The executor model handles the normal response and writes most plans by itself. The app only offers the plan-reviewer Advisor when the task has uncertainty signals like a large diff, missing tests, a schema change, or unfamiliar ownership. When Advisor is available, the executor can ask it for one focused second opinion before writing the final answer.

Review task
→ app checks whether a second model earns its cost
→ cheap executor drafts the answer
→ uncertain task: executor can call plan-reviewer with a compact prompt
→ final answer includes the plan, open questions, and next action

For this workflow, give the Advisor the decision being reviewed, the changed files, the uncertainty signals, and a short diff summary.

1. Define the compact review packet

Start with the smallest object the agent needs. This is the data your app already knows before it calls OpenRouter.

const reviewTask = {
  title: "Move usage-event writes to a monthly partitioned table",
  userQuestion:
    "Should we ship this migration plan, or ask for another design pass?",
  changedFiles: [
    "packages/db/migrations/2026-06-10-usage-partitions.sql",
    "services/cfw-api/src/usage/write-usage-event.ts",
  ],
  diffSummary:
    "Adds monthly partitions for usage_events and routes new writes by workspace_id and created_at.",
  uncertaintySignals: ["schema-change", "missing-rollback", "billing-path"],
};

Keep this packet small. Full diffs, raw conversation history, logs, and customer data belong in your own review UI or trace system, not in the Advisor prompt by default.
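
One way to enforce the size rule is a guard that runs before the request is built. This is a sketch under stated assumptions: assertCompactPacket and MAX_PACKET_CHARS are hypothetical names, and the 1,200-character budget is an arbitrary placeholder, not an OpenRouter limit. Character count is only a rough proxy for token count.

```javascript
// Hypothetical guard: reject packets that have grown past a character budget.
// The 1,200-character limit is an arbitrary placeholder; tune it for your app.
const MAX_PACKET_CHARS = 1200;

// Serialized length is a rough stand-in for prompt size.
const measurePacket = (task) => JSON.stringify(task).length;

const assertCompactPacket = (task, maxChars = MAX_PACKET_CHARS) => {
  const size = measurePacket(task);
  if (size > maxChars) {
    throw new Error(
      `Review packet is ${size} characters; budget is ${maxChars}. Summarize instead of pasting raw content.`
    );
  }
  return size;
};

// Example packet with the same fields as the recipe's reviewTask.
const smallTask = {
  title: "Move usage-event writes to a monthly partitioned table",
  userQuestion: "Should we ship this migration plan?",
  changedFiles: ["packages/db/migrations/2026-06-10-usage-partitions.sql"],
  diffSummary: "Adds monthly partitions for usage_events.",
  uncertaintySignals: ["schema-change"],
};

console.log(`Packet size: ${assertCompactPacket(smallTask)} characters`);
```

Failing loudly here keeps a pasted diff from silently turning a cheap second opinion into an expensive one.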

2. Add the cheap executor and budget-gated Advisor

The sample below shows the routing pattern and Chat Completions request shape. Adapt the same budget gate and tools shape inside your existing OpenRouter call, including Agent SDK callModel if that is your agent loop.

// Signals that justify paying for a second model.
const ADVISOR_WORTHY_SIGNALS = new Set([
  "billing-path",
  "large-diff",
  "missing-rollback",
  "missing-tests",
  "schema-change",
  "unknown-owner",
]);

const reviewTask = {
  title: "Move usage-event writes to a monthly partitioned table",
  userQuestion:
    "Should we ship this migration plan, or ask for another design pass?",
  changedFiles: [
    "packages/db/migrations/2026-06-10-usage-partitions.sql",
    "services/cfw-api/src/usage/write-usage-event.ts",
  ],
  diffSummary:
    "Adds monthly partitions for usage_events and routes new writes by workspace_id and created_at.",
  uncertaintySignals: ["schema-change", "missing-rollback", "billing-path"],
};

// Budget gate: offer Advisor only when at least one signal is Advisor-worthy.
const shouldEnableAdvisor = (task) =>
  task.uncertaintySignals.some((signal) => ADVISOR_WORTHY_SIGNALS.has(signal));

// One named Advisor entry with flat parameters and no transcript forwarding.
const createAdvisorTool = (advisorModel) => ({
  type: "openrouter:advisor",
  parameters: {
    name: "plan-reviewer",
    model: advisorModel,
    instructions:
      "You are a senior engineering reviewer. Review only the compact task packet. Identify hidden assumptions, missing rollback steps, missing tests, and cheaper alternatives. Be concise.",
    forward_transcript: false,
    max_completion_tokens: 220,
    temperature: 0,
  },
});

// Render the packet as short labeled lines for the user message.
const formatTaskPacket = (task) =>
  [
    `Title: ${task.title}`,
    `Question: ${task.userQuestion}`,
    `Changed files: ${task.changedFiles.join(", ")}`,
    `Uncertainty signals: ${task.uncertaintySignals.join(", ")}`,
    `Diff summary: ${task.diffSummary}`,
  ].join("\n");

const buildReviewRequest = ({ task, executorModel, advisorModel }) => {
  const isAdvisorEnabled = shouldEnableAdvisor(task);

  return {
    requestBody: {
      model: executorModel,
      messages: [
        {
          role: "system",
          content:
            "You are a token-efficient implementation-plan reviewer. Use the cheap executor model for routine reasoning. If the plan-reviewer tool is available, call it at most once when a compact second-model check can change the answer. Send the advisor a compact prompt only. Do not paste full diffs, logs, secrets, or chat transcripts into the advisor prompt.",
        },
        {
          role: "user",
          content: formatTaskPacket(task),
        },
      ],
      // Routine tasks omit tools entirely and stay on the executor model.
      ...(isAdvisorEnabled
        ? {
            tools: [createAdvisorTool(advisorModel)],
            tool_choice: "auto",
          }
        : {}),
      max_tokens: 500,
      temperature: 0.2,
    },
    // Routing metadata only; no prompt or advice text.
    telemetryContext: {
      executor_model: executorModel,
      advisor_model: isAdvisorEnabled ? advisorModel : null,
      did_enable_advisor: isAdvisorEnabled,
    },
  };
};

const { requestBody, telemetryContext } = buildReviewRequest({
  task: reviewTask,
  executorModel: "openai/gpt-4o-mini",
  advisorModel: "~anthropic/claude-opus-latest",
});

Send requestBody through the request path your app already uses. For routine tasks, tools is omitted and the request stays on the cheap executor model. For uncertain tasks, the request offers one named Advisor tool:

{
  "type": "openrouter:advisor",
  "parameters": {
    "name": "plan-reviewer",
    "model": "~anthropic/claude-opus-latest",
    "instructions": "You are a senior engineering reviewer. Review only the compact task packet. Identify hidden assumptions, missing rollback steps, missing tests, and cheaper alternatives. Be concise.",
    "forward_transcript": false,
    "max_completion_tokens": 220,
    "temperature": 0
  }
}

The executor chooses whether to call plan-reviewer. It passes only prompt in the tool-call arguments because the advisor model is pinned in parameters.model.
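
If your app does not already have a request path, a request body like the one built in this step could be sent as sketched here. sendReviewRequest is a hypothetical helper, not part of the recipe's sample. It assumes the standard Chat Completions endpoint and response shape (choices[0].message.content, choices[0].finish_reason, usage), and it takes an injectable fetchImpl so the function can be exercised without a network call.

```javascript
// Sketch: POST a request body to the Chat Completions endpoint.
// `fetchImpl` defaults to the global fetch available in Node.js 20.
const sendReviewRequest = async (
  requestBody,
  { apiKey, fetchImpl = fetch } = {}
) => {
  if (!apiKey) throw new Error("Missing OpenRouter API key");

  const response = await fetchImpl(
    "https://openrouter.ai/api/v1/chat/completions",
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(requestBody),
    }
  );

  if (!response.ok) {
    // Surface only the status; error bodies can echo prompt content.
    throw new Error(`OpenRouter request failed with status ${response.status}`);
  }

  const data = await response.json();
  const choice = data.choices?.[0];
  return {
    answer: choice?.message?.content ?? "",
    finishReason: choice?.finish_reason ?? null,
    usage: data.usage ?? null,
  };
};
```

Call it with the request body from this step and apiKey set to process.env.OPENROUTER_API_KEY. The Advisor call resolves server-side, so the caller sees a single response either way.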

3. Make the Advisor prompt earn its tokens

The cost control comes from three choices:

  • The app decides whether a second model is worth the spend.
  • The executor stays cheap.
  • The Advisor sees a compact prompt, not the whole transcript.

In this workflow, forward_transcript: false is deliberate. The Advisor receives the executor’s compact prompt argument, plus its own instructions. If you set forward_transcript: true, the Advisor can see the full parent conversation, which is useful for some agents but often defeats the token-saving goal.

Use the system prompt to tell the executor what belongs in the Advisor call:

When calling plan-reviewer, include:
- the decision you want reviewed
- the changed files or affected modules
- the uncertainty signals
- the shortest useful plan summary
Do not include:
- full diffs
- secrets
- logs
- the entire conversation
- unrelated implementation details

That keeps the expensive model focused on the part where it changes the outcome.
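
The include and exclude lists can live in code so the system prompt stays in sync with your policy. This is a minimal sketch: buildExecutorSystemPrompt and the two constant names are hypothetical, and the list items are copied from the guidance in this step.

```javascript
// What the executor should put in the Advisor prompt.
const ADVISOR_PROMPT_INCLUDE = [
  "the decision you want reviewed",
  "the changed files or affected modules",
  "the uncertainty signals",
  "the shortest useful plan summary",
];

// What the executor should keep out of the Advisor prompt.
const ADVISOR_PROMPT_EXCLUDE = [
  "full diffs",
  "secrets",
  "logs",
  "the entire conversation",
  "unrelated implementation details",
];

// Hypothetical helper: append the Advisor guidance to a base system prompt.
const buildExecutorSystemPrompt = (basePrompt) =>
  [
    basePrompt,
    "When calling plan-reviewer, include:",
    ...ADVISOR_PROMPT_INCLUDE.map((item) => `- ${item}`),
    "Do not include:",
    ...ADVISOR_PROMPT_EXCLUDE.map((item) => `- ${item}`),
  ].join("\n");

console.log(
  buildExecutorSystemPrompt(
    "You are a token-efficient implementation-plan reviewer."
  )
);
```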

4. Add specialist reviewers only when uncertainty splits

If the executor can identify different kinds of uncertainty, give it separate Advisor entries. Each entry is its own tool. Do not use a nested parameters.advisors roster.

const tools = [
  {
    type: "openrouter:advisor",
    parameters: {
      name: "schema-reviewer",
      model: process.env.SCHEMA_ADVISOR_MODEL ?? "~anthropic/claude-opus-latest",
      instructions:
        "Review the compact task packet for data-model assumptions, migration order, rollback gaps, and tests. Return the most useful correction.",
      forward_transcript: false,
      max_completion_tokens: 180,
    },
  },
  {
    type: "openrouter:advisor",
    parameters: {
      name: "cost-reviewer",
      model: process.env.COST_ADVISOR_MODEL ?? "openai/gpt-4o-mini",
      instructions:
        "Review the compact task packet for token, latency, and infrastructure cost. Suggest a cheaper path if it preserves the requested behavior.",
      forward_transcript: false,
      max_completion_tokens: 180,
    },
  },
];

The executor chooses the matching tool without passing a name argument. At most one Advisor entry can omit name; that unnamed entry becomes the default Advisor.
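
The one-unnamed-entry rule is easy to check before a request goes out. This is a sketch with a hypothetical helper name, assertSingleDefaultAdvisor; it only inspects the tools array your app builds and does not call OpenRouter.

```javascript
// Count Advisor entries that omit `name`.
const countUnnamedAdvisors = (tools) =>
  tools.filter(
    (tool) => tool.type === "openrouter:advisor" && !tool.parameters?.name
  ).length;

// Hypothetical pre-flight check: at most one Advisor entry may be unnamed.
const assertSingleDefaultAdvisor = (tools) => {
  const unnamed = countUnnamedAdvisors(tools);
  if (unnamed > 1) {
    throw new Error(
      `${unnamed} Advisor entries omit name; at most one can be the default.`
    );
  }
  return unnamed;
};

// Two named entries: passes with zero unnamed advisors.
const namedTools = [
  { type: "openrouter:advisor", parameters: { name: "schema-reviewer" } },
  { type: "openrouter:advisor", parameters: { name: "cost-reviewer" } },
];
console.log(`Unnamed advisors: ${assertSingleDefaultAdvisor(namedTools)}`);
```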

If you replay the conversation across requests, keep Advisor entries in a stable order. Advisor identity comes from each entry’s index in the tools array, so reordering or inserting entries can make schema-reviewer and cost-reviewer reconstruct each other’s memory.
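
Because identity follows the array index, a replay path can compare the previous request's tools with the next one before sending. This is a sketch under that assumption: findMovedAdvisors is a hypothetical helper, and "(default)" is just a local label for an unnamed entry.

```javascript
// List Advisor entries with the index that identifies them.
const advisorSlots = (tools) =>
  tools.flatMap((tool, index) =>
    tool.type === "openrouter:advisor"
      ? [{ index, name: tool.parameters?.name ?? "(default)" }]
      : []
  );

// Hypothetical check: return every previous Advisor entry that no longer
// sits at the same index under the same name in the next request.
const findMovedAdvisors = (previousTools, nextTools) =>
  advisorSlots(previousTools).filter(({ index, name }) => {
    const next = nextTools[index];
    const nextName = next?.parameters?.name ?? "(default)";
    return !(next?.type === "openrouter:advisor" && nextName === name);
  });

const firstRequest = [
  { type: "openrouter:advisor", parameters: { name: "schema-reviewer" } },
  { type: "openrouter:advisor", parameters: { name: "cost-reviewer" } },
];
const reordered = [firstRequest[1], firstRequest[0]];

console.log(findMovedAdvisors(firstRequest, firstRequest).length);
console.log(findMovedAdvisors(firstRequest, reordered).map((slot) => slot.name));
```

An empty result means every earlier Advisor still sits at its original index. Any returned entry is a reason to stop the replay or restore the original order.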

5. Log cost and routing, not private content

Add telemetry where your app already records model calls. This recipe’s sample logs only model names, whether Advisor was enabled, finish reason, and usage keys.

Log:

  • executor_model
  • advisor_model or Advisor entry name
  • did_enable_advisor
  • finish_reason
  • usage.prompt_tokens, usage.completion_tokens, usage.total_tokens, and usage.cost when returned
  • route or feature name, such as budgeted_plan_review

Do not log:

  • API keys
  • cookies
  • full prompts
  • full advisor advice
  • raw diffs
  • user content unless your product already has an explicit retention policy
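
An allow-list makes the redaction boundary explicit: the telemetry record is built only from named fields, so prompt and advice text never reach the logger. This is a minimal sketch: buildTelemetry is a hypothetical helper, and it assumes a Chat Completions style response with choices[0].finish_reason and a usage object.

```javascript
// Usage fields that are safe to log when the provider returns them.
const USAGE_FIELDS = ["prompt_tokens", "completion_tokens", "total_tokens", "cost"];

// Hypothetical helper: copy only allow-listed routing and usage fields.
const buildTelemetry = (
  telemetryContext,
  response,
  route = "budgeted_plan_review"
) => {
  const usage = response?.usage ?? {};
  return {
    route,
    executor_model: telemetryContext.executor_model,
    advisor_model: telemetryContext.advisor_model,
    did_enable_advisor: telemetryContext.did_enable_advisor,
    finish_reason: response?.choices?.[0]?.finish_reason ?? null,
    // Keep numeric usage values only; skip anything the provider omitted.
    usage: Object.fromEntries(
      USAGE_FIELDS.filter((key) => typeof usage[key] === "number").map(
        (key) => [key, usage[key]]
      )
    ),
  };
};

const exampleTelemetry = buildTelemetry(
  {
    executor_model: "openai/gpt-4o-mini",
    advisor_model: "~anthropic/claude-opus-latest",
    did_enable_advisor: true,
  },
  {
    choices: [{ message: { content: "private answer" }, finish_reason: "stop" }],
    usage: { prompt_tokens: 310, completion_tokens: 120, total_tokens: 430 },
  }
);

console.log(JSON.stringify(exampleTelemetry, null, 2));
```

Copying fields in rather than deleting fields out means a new response property is excluded by default.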

If your app uses the Responses API and your UI benefits from showing the second opinion as it arrives, set stream: true on that Advisor entry. The entry then streams advice deltas to Responses clients and still returns the completed advice item. Chat Completions ignores stream, and Messages streaming is planned.

Check your work

Validate the workflow through the path you’re adding.

Confirm:

  • Routine tasks send no Advisor tool, or leave Advisor disabled by your budget gate.
  • Advisor-worthy tasks include one openrouter:advisor entry named plan-reviewer.
  • The Advisor entry uses flat parameters, not parameters.advisors.
  • The executor model is the cheap model from your config.
  • The Advisor model is separately configurable.
  • The Advisor prompt contains a compact uncertainty packet, not full diffs or the full conversation.
  • Logs include model names, Advisor enablement, finish reason, and usage fields when present.
  • Logs do not include OPENROUTER_API_KEY, raw prompts, cookies, full diffs, or full generated content.
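
Several of these checks can run as assertions against the request body before anything is sent. This is a sketch, not a complete test suite: checkReviewRequest and its expectAdvisor and executorModel options are hypothetical names, and it covers only the request-shape items from the list.

```javascript
// Hypothetical checker: return a list of checklist failures for one request.
const checkReviewRequest = (requestBody, { expectAdvisor, executorModel }) => {
  const failures = [];
  const advisors = (requestBody.tools ?? []).filter(
    (tool) => tool.type === "openrouter:advisor"
  );

  if (requestBody.model !== executorModel) {
    failures.push("executor model does not match config");
  }
  if (!expectAdvisor && advisors.length > 0) {
    failures.push("routine task offered an Advisor tool");
  }
  if (expectAdvisor && advisors.length !== 1) {
    failures.push(`expected 1 Advisor entry, found ${advisors.length}`);
  }
  for (const { parameters = {} } of expectAdvisor ? advisors : []) {
    if (parameters.name !== "plan-reviewer") {
      failures.push("Advisor entry is not named plan-reviewer");
    }
    if ("advisors" in parameters) {
      failures.push("Advisor entry uses nested parameters.advisors");
    }
    if (parameters.forward_transcript !== false) {
      failures.push("Advisor entry forwards the transcript");
    }
  }
  return failures;
};

// Example: an Advisor-worthy request that satisfies the checks.
const exampleRequest = {
  model: "openai/gpt-4o-mini",
  tools: [
    {
      type: "openrouter:advisor",
      parameters: {
        name: "plan-reviewer",
        model: "~anthropic/claude-opus-latest",
        forward_transcript: false,
      },
    },
  ],
};

console.log(
  checkReviewRequest(exampleRequest, {
    expectAdvisor: true,
    executorModel: "openai/gpt-4o-mini",
  })
);
```

An empty array means the request shape matches the checklist. The prompt-content and logging items still need a review of real traffic.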

After you wire the pattern into your app, log routing telemetry like this. Treat usage_keys as provider-dependent. Assert the routing fields, then check that usage includes the billing fields your app depends on. finish_reason is typically stop after the server-side tool call resolves, but don’t treat the literal value as a fixed contract.

{
  "telemetry": {
    "executor_model": "openai/gpt-4o-mini",
    "advisor_model": "~anthropic/claude-opus-latest",
    "did_enable_advisor": true,
    "finish_reason": "stop",
    "usage_keys": [
      "prompt_tokens",
      "completion_tokens",
      "total_tokens",
      "cost"
    ]
  }
}

Token counts, cost, answer text, and provider-specific usage detail keys vary by model and prompt. Treat the routing fields and redaction boundary as the contract.

Next steps

  • Read the Advisor reference for exact parameters, multiple-advisor rules, memory, streaming, and API-surface details.
  • Add Web Search as an Advisor sub-tool when the reviewer needs current sources.
  • Use Response Caching for repeated stable prefixes in the executor prompt.
  • Add Human-in-the-Loop controls when a second opinion should pause for a person instead of another model.