EU AI Act Compliance: Human Oversight for AI Agents
Kenny Rogers

Three frameworks, two binding laws and one voluntary standard, converge on the same obligation: a human must be able to oversee, intervene in, and override AI-driven decisions that affect people. The Agent SDK already has the primitives to implement this.
| Regulation | Effective | Core requirement |
|---|---|---|
| EU AI Act, Article 14 | Aug 2026 (high-risk obligations) | Human oversight with ability to intervene and override. Audit trail of oversight actions. |
| Colorado AI Act (SB 205) | Feb 2026 | Human review before consequential decisions. Disclosure of AI involvement. |
| NIST AI RMF (GOVERN 1) | Voluntary, referenced by US regulators | Human oversight proportional to risk. Documentation of oversight controls. |
The common thread: if your agent makes or influences decisions that materially affect people (credit, employment, healthcare, safety), you need a reviewable gate between the model's recommendation and the action's execution.
Below are five patterns for meeting those requirements with @openrouter/agent. They build on the HITL tools cookbook, which covers the SDK mechanics; this post covers the compliance layer you add on top.
Note: This post provides engineering patterns, not legal advice. Consult legal counsel to determine which regulations apply to your specific use case and jurisdiction.
1. Classify your tools by risk tier
These regulations require human review of consequential actions. Start by sorting your tools into risk tiers:
| Tier | Example actions | Control |
|---|---|---|
| High-risk | Financial transactions, PII processing, access decisions, medical recommendations | HITL tool with mandatory pause (return null) |
| Medium-risk | Bulk emails, content moderation, data exports | requireApproval with conditional predicate |
| Low-risk | Search, read-only queries, formatting | No gate needed |
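One way to make the tier table enforceable is a single registry that every tool call is checked against. This is a minimal sketch under stated assumptions: the tool names (processCreditDecision, sendBulkEmail, and a hypothetical searchDocs) and the Control labels are illustrative, not SDK values. The design choice worth keeping is fail-closed classification: a tool missing from the registry is treated as high-risk, so a newly added tool cannot skip review by omission.

```typescript
// Risk tiers from the table above.
type RiskTier = "high" | "medium" | "low";

// Oversight control per tier. These labels mirror the table; they are not SDK values.
type Control = "hitl_pause" | "conditional_approval" | "none";

const CONTROL_BY_TIER: Readonly<Record<RiskTier, Control>> = {
  high: "hitl_pause",             // HITL tool: execution pauses for a human
  medium: "conditional_approval", // requireApproval with a predicate
  low: "none",                    // no gate
};

// Hypothetical registry: replace these tool names with your own.
const TOOL_RISK_TIERS = new Map<string, RiskTier>([
  ["processCreditDecision", "high"],
  ["sendBulkEmail", "medium"],
  ["searchDocs", "low"],
]);

// Fail closed: a tool missing from the registry is treated as high-risk.
function classifyTool(toolName: string): RiskTier {
  return TOOL_RISK_TIERS.get(toolName) ?? "high";
}

// Look up which oversight control applies to a given tool.
function controlFor(toolName: string): Control {
  return CONTROL_BY_TIER[classifyTool(toolName)];
}
```

Keeping the registry in one place also gives you a single artifact to point to when documenting your oversight controls.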
For medium-risk tools, use a conditional predicate that gates on context:
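A minimal sketch of such a predicate for sendBulkEmail, assuming requireApproval accepts a function over the tool's parsed arguments that returns true when a human must approve (check the HITL tools documentation for the exact signature). The argument shape, the `audience` field, and the 100-recipient threshold are hypothetical.

```typescript
// Hypothetical argument shape for the sendBulkEmail tool.
interface SendBulkEmailArgs {
  recipients: string[];
  subject: string;
  body: string;
  audience: "internal" | "external";
}

// Illustrative threshold; set it from your own risk assessment.
const BULK_EMAIL_APPROVAL_THRESHOLD = 100;

// Returns true when a human must approve before the email is sent.
function bulkEmailNeedsApproval(args: SendBulkEmailArgs): boolean {
  // Anything leaving the organization is gated regardless of size.
  if (args.audience === "external") return true;
  // Internal sends are gated only above the volume threshold.
  return args.recipients.length > BULK_EMAIL_APPROVAL_THRESHOLD;
}
```

Under that assumption, the tool definition would pass the function as its requireApproval value, so small internal sends run immediately while large or external ones pause for review.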
2. Add audit logging to every oversight event
Regulations require you to prove that human oversight happened. That means logging who reviewed what, when, and what they decided. Wire this into onResponseReceived:
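A sketch of what the hook body could record. The HumanResponseContext shape is an assumption for illustration, not the SDK's documented payload; map its fields from whatever onResponseReceived actually provides. writeAuditLog and the clock are injected so the handler can be tested in isolation.

```typescript
// Hypothetical payload: map these fields from what the SDK hook actually provides.
interface HumanResponseContext {
  conversationId: string;
  toolCallId: string;
  toolName: string;
  toolArgs: unknown;
  reviewerId: string;
  decision: "approved" | "rejected";
  reason?: string;
  requestedAt: Date; // when the pause began
}

// One record per oversight event: who reviewed what, when, and what they decided.
interface AuditRecord {
  eventType: "human_review_decision";
  conversationId: string;
  toolCallId: string;
  toolName: string;
  toolArgs: unknown;
  reviewerId: string;
  decision: "approved" | "rejected";
  reason: string | null;
  requestedAt: string; // ISO 8601
  decidedAt: string;   // ISO 8601
  latencyMs: number;   // how long the action waited for a human
}

type WriteAuditLog = (record: AuditRecord) => Promise<void>;

// Factory: storage and clock are injected so the handler is testable.
function makeOnResponseReceived(
  writeAuditLog: WriteAuditLog,
  now: () => Date = () => new Date(),
): (ctx: HumanResponseContext) => Promise<void> {
  return async (ctx) => {
    const decidedAt = now();
    // Awaited, so a failed write propagates to the caller instead of being dropped.
    await writeAuditLog({
      eventType: "human_review_decision",
      conversationId: ctx.conversationId,
      toolCallId: ctx.toolCallId,
      toolName: ctx.toolName,
      toolArgs: ctx.toolArgs,
      reviewerId: ctx.reviewerId,
      decision: ctx.decision,
      reason: ctx.reason ?? null,
      requestedAt: ctx.requestedAt.toISOString(),
      decidedAt: decidedAt.toISOString(),
      latencyMs: decidedAt.getTime() - ctx.requestedAt.getTime(),
    });
  };
}
```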
The writeAuditLog function should write to append-only storage. A minimal interface:
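A minimal sketch of that interface. The store exposes only `append` and `readAll`, so there is no update or delete path to misuse; the in-memory class is a stand-in for durable append-only storage such as an insert-only database table or an object store with write-once retention. All names here are hypothetical.

```typescript
// Hypothetical minimal record; extend with your own fields.
interface AuditRecord {
  eventType: string;
  actorId: string;   // human reviewer or system component
  subjectId: string; // e.g. the tool call ID under review
  detail: Record<string, unknown>;
  occurredAt: string; // ISO 8601
}

// Stored entries get a monotonically increasing sequence number.
interface StoredAuditRecord extends AuditRecord {
  readonly seq: number;
}

// Append-only by construction: the interface offers no update or delete.
interface AuditLogStore {
  append(record: AuditRecord): Promise<StoredAuditRecord>;
  readAll(): Promise<readonly StoredAuditRecord[]>;
}

// In-memory stand-in for durable storage; swap for a real backend in production.
class InMemoryAuditLogStore implements AuditLogStore {
  private readonly entries: StoredAuditRecord[] = [];

  async append(record: AuditRecord): Promise<StoredAuditRecord> {
    // Shallow freeze so stored entries cannot be reassigned in place.
    const stored = Object.freeze({ ...record, seq: this.entries.length + 1 });
    this.entries.push(stored);
    return stored;
  }

  async readAll(): Promise<readonly StoredAuditRecord[]> {
    return [...this.entries]; // copy so callers cannot mutate the log array
  }
}

// writeAuditLog bound to a concrete store.
function createWriteAuditLog(store: AuditLogStore): (record: AuditRecord) => Promise<void> {
  return async (record) => {
    await store.append(record);
  };
}
```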
EU AI Act Article 12 (Record-Keeping) requires high-risk systems to technically allow the automatic recording of events (logs) over their lifetime. Store audit logs in durable, append-only storage with retention policies that match your regulatory requirements.
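Retention rules are easiest to audit when they live in code rather than in a cron script's comments. A sketch, assuming a hypothetical `legalHold` flag on each record and a `retentionDays` value you set from legal guidance (the numbers in any example are placeholders, not regulatory figures): a purge job calls `isPurgeable` and skips anything that returns false.

```typescript
// Placeholder policy: set retentionDays from your counsel's reading of the rules.
interface RetentionPolicy {
  retentionDays: number;
}

// Minimal view of a stored audit record for retention decisions.
interface RetainedRecord {
  occurredAt: string; // ISO 8601
  legalHold: boolean; // hypothetical flag for litigation or regulator holds
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// A record may be purged only after the full retention window and never while on hold.
function isPurgeable(record: RetainedRecord, policy: RetentionPolicy, now: Date): boolean {
  if (record.legalHold) return false;
  const ageMs = now.getTime() - Date.parse(record.occurredAt);
  return ageMs >= policy.retentionDays * MS_PER_DAY;
}
```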
3. Implement timeout-based escalation
A human review gate that nobody responds to is worse than no gate at all: it gives the appearance of oversight without the substance. Your oversight design has to account for unresponsive reviewers, so implement a timeout that either escalates to a supervisor or rejects the action by default.
This pattern runs outside the callModel loop, in whatever service polls for stale pending reviews:
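A sketch of that polling service, assuming a hypothetical ReviewStore backed by the durable state from step 4. Both options discussed next are expressed as a TimeoutPolicy value, and the timeout length is yours to choose. One design choice worth copying: under Option A, a review that also times out at the supervisor falls through to rejection, so escalation can never end in silent execution.

```typescript
type ReviewStatus = "pending" | "escalated" | "rejected_timeout";

interface PendingReview {
  id: string;
  toolName: string;
  assignedTo: string;
  status: ReviewStatus;
  createdAt: Date;
  escalatedAt?: Date;
}

// Option A escalates once to a supervisor; Option B rejects by default.
type TimeoutPolicy =
  | { kind: "escalate"; supervisorId: string }
  | { kind: "default_deny" };

// Hypothetical storage interface; back it with the durable state from step 4.
interface ReviewStore {
  listOpen(): Promise<PendingReview[]>; // reviews in "pending" or "escalated"
  update(review: PendingReview): Promise<void>;
}

type AuditSink = (event: { reviewId: string; action: ReviewStatus; actor: string; at: string }) => Promise<void>;

// Run on a schedule; returns the reviews whose status changed.
async function processStaleReviews(
  store: ReviewStore,
  policy: TimeoutPolicy,
  timeoutMs: number,
  writeAuditLog: AuditSink,
  now: Date = new Date(),
): Promise<PendingReview[]> {
  const changed: PendingReview[] = [];
  for (const review of await store.listOpen()) {
    // The clock restarts when a review is escalated.
    const clockStart = review.escalatedAt ?? review.createdAt;
    if (now.getTime() - clockStart.getTime() < timeoutMs) continue;

    const next: PendingReview =
      policy.kind === "escalate" && review.status === "pending"
        ? { ...review, status: "escalated", assignedTo: policy.supervisorId, escalatedAt: now }
        : // Option B, or an escalation that also timed out: the gated action never executes.
          { ...review, status: "rejected_timeout" };

    await store.update(next);
    // Timeout actions are oversight events too, so they go into the audit trail.
    await writeAuditLog({ reviewId: review.id, action: next.status, actor: "system:timeout", at: now.toISOString() });
    changed.push(next);
  }
  return changed;
}
```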
Which option to pick depends on your risk appetite. For EU AI Act compliance with high-risk systems, default-deny (Option B) is safer: the action never executes without explicit human approval. For lower-risk systems where delays have operational cost, escalation to a supervisor (Option A) keeps things moving while preserving the oversight chain.
4. Back your StateAccessor with durable storage
In-memory state disappears on process restart. For compliance, your StateAccessor must use durable storage so that pending reviews, conversation history, and audit context survive crashes, deploys, and horizontal scaling.
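The SDK's real StateAccessor interface is described in the HITL tools documentation; the StateAccessor, ConversationState, and DurableKV types below are simplified hypotheticals so the sketch runs on its own. What it shows: the full conversation state is written on every save, and a separate pending-review index is maintained for paused conversations, so a restarted process or a second replica sees the same pending reviews.

```typescript
// Hypothetical, simplified state; the SDK's real types may differ.
type ConversationStatus = "running" | "awaiting_hitl" | "awaiting_approval" | "completed";

interface ConversationState {
  conversationId: string;
  status: ConversationStatus;
  messages: unknown[];
  pendingToolCall?: { toolCallId: string; toolName: string; args: unknown };
  updatedAt: string; // ISO 8601
}

interface StateAccessor {
  load(conversationId: string): Promise<ConversationState | null>;
  save(state: ConversationState): Promise<void>;
}

// Minimal durable key-value contract; implement over your database of choice.
interface DurableKV {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
  delete(key: string): Promise<void>;
}

const PAUSED_STATUSES: ReadonlySet<ConversationStatus> = new Set<ConversationStatus>(["awaiting_hitl", "awaiting_approval"]);
const stateKey = (id: string) => `state:${id}`;
const pendingKey = (id: string) => `pending:${id}`;

function createDurableStateAccessor(kv: DurableKV): StateAccessor {
  return {
    async load(conversationId) {
      const raw = await kv.get(stateKey(conversationId));
      return raw === null ? null : (JSON.parse(raw) as ConversationState);
    },
    async save(state) {
      // A production version should write both keys in one transaction.
      await kv.put(stateKey(state.conversationId), JSON.stringify(state));
      const indexKey = pendingKey(state.conversationId);
      const call = state.pendingToolCall;
      if (PAUSED_STATUSES.has(state.status) && call) {
        // Write the index entry only when a new tool call pauses, so re-saves keep "since".
        const existing = await kv.get(indexKey);
        const sameCall = existing !== null && (JSON.parse(existing) as { toolCallId: string }).toolCallId === call.toolCallId;
        if (!sameCall) {
          await kv.put(indexKey, JSON.stringify({ ...call, status: state.status, since: state.updatedAt }));
        }
      } else {
        // No longer paused: remove it from the escalation service's view.
        await kv.delete(indexKey);
      }
    },
  };
}
```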
Every time state transitions to 'awaiting_hitl' or 'awaiting_approval', the pending review is persisted. Your escalation service (step 3) queries these persisted records to find stale reviews.
5. Wire it all together
Here's the complete flow: classify, gate, log, timeout, resume. This assumes processCreditDecision and sendBulkEmail from steps 1-2, writeAuditLog from step 2, and createDurableStateAccessor from step 4.
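A compact model of the decision order (classify, gate, persist, log) using hypothetical names, gateToolCall and GateDeps. In the real SDK, onToolCalled and requireApproval perform the pause themselves; this sketch only illustrates how the pieces from steps 1 to 4 plug together through injected dependencies, and why the pending review is persisted before anything else so that a crash leaves it discoverable.

```typescript
type RiskTier = "high" | "medium" | "low";
type PauseStatus = "awaiting_hitl" | "awaiting_approval";

interface ToolCall {
  conversationId: string;
  toolCallId: string;
  toolName: string;
  args: Record<string, unknown>;
}

type GateOutcome = { kind: "execute" } | { kind: "paused"; awaiting: PauseStatus };

interface GateEvent {
  toolCallId: string;
  toolName: string;
  tier: RiskTier;
  outcome: "allowed" | PauseStatus;
  at: string; // ISO 8601
}

// Each dependency corresponds to one earlier step.
interface GateDeps {
  classify: (toolName: string) => RiskTier;                          // step 1: tier registry
  needsApproval: (call: ToolCall) => boolean;                         // step 1: medium-tier predicate
  writeAuditLog: (event: GateEvent) => Promise<void>;                 // step 2: audit log
  persistPending: (call: ToolCall, status: PauseStatus) => Promise<void>; // step 4: durable state
  now?: () => Date;
}

async function gateToolCall(call: ToolCall, deps: GateDeps): Promise<GateOutcome> {
  const at = (deps.now ?? (() => new Date()))().toISOString();
  const tier = deps.classify(call.toolName);

  // High-risk always pauses; medium-risk pauses only when the predicate says so.
  let awaiting: PauseStatus | null = null;
  if (tier === "high") awaiting = "awaiting_hitl";
  else if (tier === "medium" && deps.needsApproval(call)) awaiting = "awaiting_approval";

  if (awaiting === null) {
    await deps.writeAuditLog({ toolCallId: call.toolCallId, toolName: call.toolName, tier, outcome: "allowed", at });
    return { kind: "execute" };
  }

  // Persist first: if the process dies after this line, step 3 can still find the review.
  await deps.persistPending(call, awaiting);
  await deps.writeAuditLog({ toolCallId: call.toolCallId, toolName: call.toolName, tier, outcome: awaiting, at });
  return { kind: "paused", awaiting };
}
```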
When the reviewer responds (through your admin UI, Slack action, queue consumer, etc.):
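A sketch of the entry point your admin UI or queue consumer could call. The SDK validates human responses against your schema; this version hand-rolls the check so it runs without dependencies. getOpenReview and resume are hypothetical stand-ins for your review lookup and for whatever SDK call continues the paused conversation. Beyond shape validation, it refuses decisions from someone other than the assigned reviewer and decisions that arrive after the timeout service (step 3) has closed the review.

```typescript
interface ReviewerDecision {
  toolCallId: string;
  reviewerId: string;
  decision: "approved" | "rejected";
  reason: string;
}

// Validate untrusted input (form post, Slack payload, queue message).
function parseReviewerDecision(raw: unknown): ReviewerDecision {
  if (typeof raw !== "object" || raw === null) throw new Error("decision payload must be an object");
  const { toolCallId, reviewerId, decision, reason } = raw as Record<string, unknown>;
  if (typeof toolCallId !== "string" || toolCallId === "") throw new Error("toolCallId is required");
  if (typeof reviewerId !== "string" || reviewerId === "") throw new Error("reviewerId is required");
  if (decision !== "approved" && decision !== "rejected") throw new Error("decision must be 'approved' or 'rejected'");
  // Require a reason so the audit trail records why, not just what.
  if (typeof reason !== "string" || reason.trim() === "") throw new Error("reason is required");
  return { toolCallId, reviewerId, decision, reason };
}

// Hypothetical lookup: returns null once a review is decided or timed out.
interface PendingReviewLookup {
  getOpenReview(toolCallId: string): Promise<{ assignedTo: string } | null>;
}

// Hypothetical stand-in for the SDK call that resumes the paused conversation.
type Resume = (decision: ReviewerDecision) => Promise<void>;

async function submitReviewerDecision(raw: unknown, reviews: PendingReviewLookup, resume: Resume): Promise<void> {
  const decision = parseReviewerDecision(raw);
  const review = await reviews.getOpenReview(decision.toolCallId);
  // In production, close the review with a compare-and-set to avoid racing the timeout service.
  if (review === null) throw new Error(`review ${decision.toolCallId} is no longer open`);
  if (review.assignedTo !== decision.reviewerId) throw new Error("reviewer is not assigned to this review");
  await resume(decision);
}
```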
The onResponseReceived hook fires, stamps the audit record, and the model receives the validated decision.
Start building today
Colorado SB 205 took effect February 2026 and is already enforceable. EU AI Act high-risk obligations land August 2026. One implementation (risk classification, audit logging, timeout escalation, durable state) addresses the oversight requirements common to all three frameworks.
The Agent SDK handles pausing execution, persisting state across restarts, validating human responses against schemas, and resuming cleanly. Your job is to wire it into your review workflows and audit storage.
For related governance controls (budget caps, data retention policies, model restrictions), see Guardrails.
Full SDK reference and working examples: HITL tools documentation.
FAQ
What does EU AI Act Article 14 require?
Article 14 mandates that high-risk AI systems include human oversight measures. Humans must be able to understand the system's capabilities, monitor its operation, interpret outputs, and intervene or override decisions. Automatic event logging falls under Article 12 (Record-Keeping), log retention periods under Article 19 (providers) and Article 26 (deployers), and the broader risk-management process under Article 9 (Risk Management).
When does the EU AI Act take effect?
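The AI Act entered into force in August 2024, but the high-risk obligations (including Article 14 human oversight) apply from August 2026 for most high-risk systems, namely those listed in Annex III. High-risk AI embedded in products covered by the EU harmonisation legislation in Annex I has until August 2027. For Annex III systems, August 2026 is the deadline to demonstrate compliant oversight controls.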
Is the Colorado AI Act enforceable yet?
Yes. Colorado SB 205 took effect February 2026. Deployers of high-risk AI systems must provide human review before consequential decisions in employment, finance, housing, insurance, and education.
What is human-in-the-loop (HITL) for AI agents?
HITL means a human reviews and approves (or rejects) an AI agent's proposed action before it executes. In the Agent SDK, this is implemented through onToolCalled (which pauses execution and waits for human input) and requireApproval (which conditionally gates tool execution based on parameters).