Sensitive Info Guardrail

The Sensitive Info Guardrail lets you automatically detect and handle sensitive information — such as email addresses, phone numbers, credit card numbers, and names — before requests reach the model provider. You can choose to redact (replace with a placeholder) or block (reject the request entirely) when sensitive data is detected. This feature is part of Guardrails and can be configured alongside budget limits, model restrictions, and other guardrail settings.

How It Works

When a Sensitive Info Guardrail is active, every API request is scanned before it is forwarded to the model provider:

Detection — The request content is checked against your configured patterns and presets.
Action — If a match is found, the configured action is applied:
- Redact: The matched text is replaced with a labeled placeholder (e.g., [EMAIL], [PHONE], [REDACTED]) and the modified request is forwarded to the provider.
- Block: The entire request is rejected with an HTTP 403 Forbidden error.
Forwarding — If no sensitive info is detected (or all matches were redacted), the request proceeds to the model provider as normal.

Sensitive info detection runs on the input (prompt) side of requests. It scans message content, tool call arguments, and prompt strings. It does not scan model responses.

Detection Methods

OpenRouter uses two complementary detection methods:

Regex-Based Detection

Most built-in presets and all custom patterns use regular expression matching. This is fast, deterministic, and adds negligible latency to requests. Regex-based presets include:

Email addresses
Phone numbers
Social Security numbers (SSNs)
Credit card numbers
IP addresses

NLP-Based Detection

Some types of sensitive information — like person names and physical addresses — cannot be reliably detected with simple patterns. For these, OpenRouter uses NLP-powered entity recognition (via Presidio), which analyzes text contextually. NLP-based presets include:

Person names (beta)
Physical addresses / locations (beta)

The “Person Name” and “Address” presets are currently in beta. Detection accuracy may vary — especially for uncommon name formats and partial or non-standard addresses. If the check times out, the request proceeds (not blocked). We’re actively improving these models.NLP-based detection also adds latency proportional to the size of the input text. These presets are marked with an Adds latency label in the dashboard.

Built-In Presets

The following presets are available out of the box. Each can be individually enabled and configured with either the Redact or Block action.

Preset	Detection Method	Redaction Label	Example Matches
Email address	Regex	`[EMAIL]`	`user@example.com`, `name+tag@domain.co`
Phone number	Regex	`[PHONE]`	`914-309-4996`, `914.309.4996`, `9143094996`
Social Security number	Regex	`[SSN]`	`123-45-6789`
Credit card number	Regex	`[CREDIT_CARD]`	`4265 5256 0839 8752`, `4265-5256-0839-8752`
IP address	Regex	`[IP_ADDRESS]`	`192.168.0.1`, `10.0.0.1`
Person name (beta)	NLP	`[PERSON_NAME]`	`John Smith`, `Dr. Sarah Johnson`, `Maria Garcia-Lopez`
Address (beta)	NLP	`[ADDRESS]`	`123 Main Street, Springfield`, `London, United Kingdom`

NLP Preset Limitations

NLP-based detection is contextual and probabilistic. Keep the following in mind: Person Name:

May not catch names without surrounding context
Uncommon or non-Western names may be missed
Single-word names (e.g., “Cher”) are harder to detect

Address:

Partial addresses without city/state may be missed
Ambiguous location names (e.g., “Paris” as a name vs. a city) depend on context
Non-standard or abbreviated formats may not be detected

Custom Patterns

In addition to built-in presets, you can define your own custom regex patterns to detect domain-specific sensitive information. Each custom pattern requires:

Pattern — A valid regular expression
Action — Either redact or block

When a custom pattern matches with the Redact action, the matched text is replaced with [REDACTED]. When set to Block, the entire request is rejected.

Example Custom Patterns

Use Case	Pattern	Action
Internal project codes	`PROJ-\d{4,6}`	Redact
AWS access keys	`AKIA[0-9A-Z]{16}`	Block
Internal URLs	`https?://internal\.company\.com\S*`	Redact

Pattern Safety

Patterns are validated for:

Syntax — Must be a valid JavaScript regular expression.
Safety — Must not be vulnerable to catastrophic backtracking (ReDoS). Patterns with nested quantifiers like (a+)+ or (a|a)* are rejected.

Invalid or unsafe patterns are rejected at creation time with a descriptive error message.

Configuring Sensitive Info Guardrails

Via the Dashboard

Navigate to your workspace’s Privacy & Guardrails page, or go to Settings > Privacy.
Create a new guardrail or edit an existing one.
Expand the Sensitive Info section.
Enable the desired built-in presets and/or add custom patterns.
For each preset or pattern, choose the action: Redact or Block.
Save the guardrail.

You can use the Enable all / Disable all buttons to quickly toggle all built-in presets.

Via the API

Sensitive info filters are configured as part of the guardrail object using the content_filter_builtins and content_filters fields. Built-in presets use the content_filter_builtins field:

{
  "name": "PII Protection",
  "content_filter_builtins": [
    { "slug": "email", "action": "redact" },
    { "slug": "phone", "action": "redact" },
    { "slug": "ssn", "action": "block" },
    { "slug": "credit-card", "action": "block" },
    { "slug": "ip-address", "action": "redact" },
    { "slug": "person-name", "action": "redact" },
    { "slug": "address", "action": "redact" }
  ]
}

Available slugs: email, phone, ssn, credit-card, ip-address, person-name, address. Custom patterns use the content_filters field:

{
  "name": "Custom Filters",
  "content_filters": [
    { "pattern": "AKIA[0-9A-Z]{16}", "action": "block", "label": "AWS Key" },
    { "pattern": "PROJ-\\d{4,6}", "action": "redact" }
  ]
}

Each custom filter supports an optional label field for descriptive error messages when blocking. See the Guardrails API reference for full endpoint documentation.

How Sensitive Info Interacts with Other Guardrails

Sensitive info filters follow the same guardrail hierarchy as other guardrail settings. When multiple guardrails apply to a request:

Content filters are unioned — If a member guardrail has an email filter and an API key guardrail has a phone filter, both filters apply.
Block wins over redact — If the same entity type appears in multiple guardrails with different actions, the stricter action (block) takes precedence.
Custom and built-in filters combine — Filters from all applicable guardrails (default, member, and API key level) are merged together.

Error Responses

When a request is blocked by a content filter, the API returns:

{
  "error": {
    "code": 403,
    "message": "Request blocked by content filter: [LABEL]"
  }
}

The [LABEL] in the error message depends on what triggered the block:

For built-in presets: the preset label (e.g., Email address, Social Security number)
For custom patterns with a label field: the custom label
For custom patterns without a label: [BLOCKED]
For NLP-detected entities: the entity type (e.g., Blocked PII detected: PERSON)

Reporting False Positives

If a detection incorrectly flags legitimate content, you can mark it as a false positive from the Logs page. Generations with a guardrail event show a shield icon on the row; hover it to open the guardrail popover. When a single entity type was detected, click Mark as false positive directly in the popover:

Guardrail popover with a Mark as false positive button for a single detected sensitive info entity type

When multiple entity types were detected, the popover instead links to the generation detail view, where you can select the specific entity types to report:

Guardrail popover linking to Review entity types in detail for a multi-entity sensitive info detection

In the detail view, check the entity types that were flagged incorrectly under Mark as false positive, then click Submit:

Generation detail view with per-entity Mark as false positive checkboxes for a sensitive info detection

The event is visually marked and your feedback is recorded for future detection improvements.

Marking a detection as false positive does not retroactively unblock the request. If the action was block, the original request was already rejected.

Best Practices

Start with Redact — Use Redact as the default action when getting started. This lets requests proceed while protecting sensitive data, giving you time to evaluate detection accuracy before switching to Block.
Use built-in presets for common PII — The built-in presets are tuned for common formats and are the easiest way to get started. Add custom patterns for domain-specific data.
Be aware of NLP latency — The Person Name and Address presets use NLP-based detection, which adds latency proportional to input size. If latency is critical, consider using only regex-based presets.
Test before deploying — Use the Test Preview in the guardrail editor to verify your filters work as expected before saving and assigning the guardrail. If a detection misfires, you can report false positives from the Logs page.
Combine with other guardrail settings — Sensitive info filters work alongside budget limits, model allowlists, provider restrictions, and ZDR enforcement. Use them together for comprehensive governance.
Use labels on custom block patterns — Adding a label to custom patterns that use the Block action provides clearer error messages to API consumers, making it easier to understand why a request was rejected.

Overview

Models & Routing

Features

Privacy

Best Practices

Community

How It Works

Detection Methods

Regex-Based Detection

NLP-Based Detection

Built-In Presets

NLP Preset Limitations

Custom Patterns

Example Custom Patterns

Pattern Safety

Configuring Sensitive Info Guardrails

Via the Dashboard

Via the API

How Sensitive Info Interacts with Other Guardrails

Error Responses

Reporting False Positives

Best Practices

​How It Works

​Detection Methods

​Regex-Based Detection

​NLP-Based Detection

​Built-In Presets

​NLP Preset Limitations

​Custom Patterns

​Example Custom Patterns

​Pattern Safety

​Configuring Sensitive Info Guardrails

​Via the Dashboard

​Via the API

​How Sensitive Info Interacts with Other Guardrails

​Error Responses

​Reporting False Positives

​Best Practices

How It Works

Detection Methods

Regex-Based Detection

NLP-Based Detection

Built-In Presets

NLP Preset Limitations

Custom Patterns

Example Custom Patterns

Pattern Safety

Configuring Sensitive Info Guardrails

Via the Dashboard

Via the API

How Sensitive Info Interacts with Other Guardrails

Error Responses

Reporting False Positives

Best Practices