Sensitive Info Guardrail
The Sensitive Info Guardrail lets you automatically detect and handle sensitive information — such as email addresses, phone numbers, credit card numbers, and names — before requests reach the model provider. You can choose to redact (replace with a placeholder) or block (reject the request entirely) when sensitive data is detected.
This feature is part of Guardrails and can be configured alongside budget limits, model restrictions, and other guardrail settings.
How It Works
When a Sensitive Info Guardrail is active, every API request is scanned before it is forwarded to the model provider:
- Detection: The request content is checked against your configured patterns and presets.
- Action: If a match is found, the configured action is applied:
  - Redact: The matched text is replaced with a labeled placeholder (e.g., [EMAIL], [PHONE], [REDACTED]) and the modified request is forwarded to the provider.
  - Block: The entire request is rejected with an HTTP 403 Forbidden error.
- Forwarding: If no sensitive info is detected (or all matches were redacted), the request proceeds to the model provider as normal.
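The detect, act, forward sequence above can be sketched in a few lines of Python. This is a minimal illustration and not OpenRouter's implementation: the `Filter` class, the `scan` function, the `RequestBlocked` exception, and the email regex are all hypothetical names and patterns chosen for the example.

```python
import re
from dataclasses import dataclass


class RequestBlocked(Exception):
    """Raised when a block filter matches (the gateway would answer 403)."""


@dataclass
class Filter:
    pattern: re.Pattern
    action: str        # "redact" or "block"
    placeholder: str   # text substituted on redact, e.g. "[EMAIL]"


# Illustrative pattern only; the real preset patterns are not shown in this doc.
EMAIL = Filter(re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "redact", "[EMAIL]")


def scan(text: str, filters: list[Filter]) -> str:
    """Return the text to forward to the provider, or raise RequestBlocked."""
    # Check block filters first so a rejected request is never partially redacted.
    for f in filters:
        if f.action == "block" and f.pattern.search(text):
            raise RequestBlocked("request rejected by content filter")
    # Apply every redact filter; unmatched text passes through unchanged.
    for f in filters:
        if f.action == "redact":
            text = f.pattern.sub(f.placeholder, text)
    return text


print(scan("Contact me at jane@example.com today", [EMAIL]))
# Contact me at [EMAIL] today
```

Checking block filters before redacting is a design choice of this sketch; the document does not state the order in which the two actions are evaluated.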
Sensitive info detection runs on the input (prompt) side of requests. It scans message content, tool call arguments, and prompt strings. It does not scan model responses.
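To make the scanned surface concrete, here is a rough sketch of how the input-side strings could be collected from a request body. It assumes an OpenAI-style chat payload (`messages`, `tool_calls[].function.arguments`, `prompt`); the function name `scannable_strings` is hypothetical, and the actual traversal used by the service may differ.

```python
from typing import Any, Iterator


def scannable_strings(body: dict[str, Any]) -> Iterator[str]:
    """Yield the input-side strings a scanner would inspect."""
    # Plain prompt string (completion-style requests).
    prompt = body.get("prompt")
    if isinstance(prompt, str):
        yield prompt

    for message in body.get("messages", []):
        content = message.get("content")
        if isinstance(content, str):
            yield content
        elif isinstance(content, list):
            # Multi-part content: only text parts carry scannable strings.
            for part in content:
                if isinstance(part, dict) and part.get("type") == "text":
                    yield part.get("text", "")
        # Tool call arguments are JSON-encoded strings in this payload shape.
        for call in message.get("tool_calls") or []:
            yield call.get("function", {}).get("arguments", "")
```

Model responses never pass through this function, which mirrors the statement that only the input side is scanned.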
Detection Methods
OpenRouter uses two complementary detection methods:
Regex-Based Detection
Most built-in presets and all custom patterns use regular expression matching. This is fast, deterministic, and adds negligible latency to requests.
Regex-based presets include:
- Email addresses
- Phone numbers
- Social Security numbers (SSNs)
- Credit card numbers
- IP addresses
NLP-Based Detection
Some types of sensitive information — like person names and physical addresses — cannot be reliably detected with simple patterns. For these, OpenRouter uses NLP-powered entity recognition (via Presidio), which analyzes text contextually.
NLP-based presets include:
- Person names
- Physical addresses / locations
NLP-based detection adds latency to requests proportional to the size of the input text. The “Person Name” and “Address” presets are marked with an Adds latency label in the dashboard to indicate this.
Built-In Presets
The presets listed under Detection Methods are available out of the box. Each can be individually enabled and configured with either the Redact or Block action.
NLP Preset Limitations
NLP-based detection is contextual and probabilistic. Keep the following in mind:
Person Name:
- May not catch names without surrounding context
- Uncommon or non-Western names may be missed
- Single-word names (e.g., “Cher”) are harder to detect
Address:
- Partial addresses without city/state may be missed
- Ambiguous location names (e.g., “Paris” as a name vs. a city) depend on context
- Non-standard or abbreviated formats may not be detected
Custom Patterns
In addition to built-in presets, you can define your own custom regex patterns to detect domain-specific sensitive information. Each custom pattern requires:
- Pattern: A valid regular expression
- Action: Either redact or block
When a custom pattern matches with the Redact action, the matched text is replaced with [REDACTED]. When set to Block, the entire request is rejected.
Example Custom Patterns
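As an illustration, the sketch below defines two made-up custom patterns and applies the Redact behavior described above. The identifier formats (`EMP-` employee IDs, `PRJ-` project codes), the `CUSTOM_PATTERNS` list, and the `redact_custom` helper are hypothetical. Python's `re` module stands in for the JavaScript regex engine here; these simple patterns behave the same in both.

```python
import re

# Hypothetical custom patterns; the identifier formats are invented.
CUSTOM_PATTERNS = [
    {"pattern": r"\bEMP-\d{6}\b", "action": "redact"},                       # employee ID
    {"pattern": r"\bPRJ-[A-Z]{3}-\d{4}\b", "action": "block", "label": "Project code"},
]


def redact_custom(text: str) -> str:
    """Replace every match of a redact-action custom pattern with [REDACTED]."""
    for entry in CUSTOM_PATTERNS:
        if entry["action"] == "redact":
            text = re.sub(entry["pattern"], "[REDACTED]", text)
    return text


print(redact_custom("Ticket opened by EMP-123456"))
# Ticket opened by [REDACTED]
```

The word boundaries (`\b`) keep the pattern from matching inside longer tokens such as `TEMP-1234567`.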
Pattern Safety
Patterns are validated for:
- Syntax: Must be a valid JavaScript regular expression.
- Safety: Must not be vulnerable to catastrophic backtracking (ReDoS). Patterns with nested quantifiers like (a+)+ or (a|a)* are rejected.
Invalid or unsafe patterns are rejected at creation time with a descriptive error message.
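If you want to pre-check patterns before submitting them, a rough local check might look like the following. This is only a sketch: `validate_pattern` is a hypothetical helper, Python's `re.compile` is used as a stand-in for JavaScript syntax validation (the two dialects differ in places), and the safety test is a deliberately crude heuristic that will also reject some harmless patterns such as `(a|b)*`. The validator the service actually uses is not described here.

```python
import re
from typing import Optional

# Crude heuristic: a parenthesized group containing a quantifier or an
# alternation, itself followed by an unbounded quantifier (+, *, or {n,}).
_RISKY_GROUP = re.compile(r"\([^()]*[+*|][^()]*\)(?:[+*]|\{\d+,\})")


def validate_pattern(pattern: str) -> Optional[str]:
    """Return an error message, or None if the pattern passes both checks."""
    # Syntax check (Python dialect, as an approximation of JavaScript).
    try:
        re.compile(pattern)
    except re.error as exc:
        return f"invalid syntax: {exc}"
    # Safety check: flag quantified groups that can backtrack heavily.
    if _RISKY_GROUP.search(pattern):
        return "unsafe: quantified group may backtrack catastrophically"
    return None


for candidate in [r"\bEMP-\d{6}\b", "(a+)+", "(a|a)*", "("]:
    print(repr(candidate), "->", validate_pattern(candidate))
```

A conservative heuristic like this trades false positives for simplicity; a production check would analyze the parsed pattern rather than its text.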
Configuring Sensitive Info Guardrails
Via the Dashboard
- Navigate to your workspace’s Privacy & Guardrails page, or go to Settings > Privacy.
- Create a new guardrail or edit an existing one.
- Expand the Sensitive Info section.
- Enable the desired built-in presets and/or add custom patterns.
- For each preset or pattern, choose the action: Redact or Block.
- Save the guardrail.
You can use the Enable all / Disable all buttons to quickly toggle all built-in presets.
Via the API
Sensitive info filters are configured as part of the guardrail object using the content_filter_builtins and content_filters fields.
Built-in presets use the content_filter_builtins field:
Available slugs: email, phone, ssn, credit-card, ip-address, person-name, address.
Custom patterns use the content_filters field:
Each custom filter supports an optional label field for descriptive error messages when blocking.
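The request bodies are not reproduced in this section, so the following is an assumed shape rather than the documented schema. The field names `content_filter_builtins` and `content_filters`, the slugs, and the `pattern`, `action`, and `label` keys come from the text above; the per-entry layout of the built-in list (`slug` plus `action`) and the example pattern are guesses for illustration. Check the Guardrails API reference before relying on it.

```python
import json

# Assumed shape of the sensitive-info portion of a guardrail object.
guardrail_fields = {
    # Built-in presets, referenced by slug. The {"slug", "action"} entry
    # layout is an assumption, not confirmed by this document.
    "content_filter_builtins": [
        {"slug": "email", "action": "redact"},
        {"slug": "ssn", "action": "block"},
    ],
    # Custom regex patterns; "label" is optional and used in block errors.
    "content_filters": [
        {"pattern": r"\bEMP-\d{6}\b", "action": "block", "label": "Employee ID"},
    ],
}

# Serialize as it would appear in a JSON request body.
print(json.dumps(guardrail_fields, indent=2))
```

Note that backslashes in the regex are doubled once serialized to JSON (`\\bEMP-\\d{6}\\b`), which is easy to get wrong when writing the body by hand.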
See the Guardrails API reference for full endpoint documentation.
How Sensitive Info Interacts with Other Guardrails
Sensitive info filters follow the same guardrail hierarchy as other guardrail settings. When multiple guardrails apply to a request:
- Content filters are unioned — If a member guardrail has an email filter and an API key guardrail has a phone filter, both filters apply.
- Block wins over redact — If the same entity type appears in multiple guardrails with different actions, the stricter action (block) takes precedence.
- Custom and built-in filters combine — Filters from all applicable guardrails (default, member, and API key level) are merged together.
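The merge rules above amount to a union keyed by filter, with block overriding redact. A minimal sketch, assuming each guardrail's filters are represented as a simple mapping from filter key to action (the `merge_filters` function and this representation are hypothetical):

```python
def merge_filters(*guardrails: dict[str, str]) -> dict[str, str]:
    """Union filters across guardrails; block overrides redact for the same key."""
    merged: dict[str, str] = {}
    for filters in guardrails:
        for key, action in filters.items():
            # Once a key is set to block, no later guardrail can relax it.
            if merged.get(key) != "block":
                merged[key] = action
    return merged


default_level = {"email": "redact"}
member_level = {"email": "block"}
api_key_level = {"phone": "redact"}

print(merge_filters(default_level, member_level, api_key_level))
# {'email': 'block', 'phone': 'redact'}
```

Because block always wins, the result does not depend on the order in which the guardrail levels are passed in.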
Error Responses
When a request is blocked by a content filter, the API returns an HTTP 403 Forbidden error whose message identifies the filter that matched.
The [LABEL] in the error message depends on what triggered the block:
- For built-in presets: the preset label (e.g., Email address, Social Security number)
- For custom patterns with a label field: the custom label
- For custom patterns without a label: [BLOCKED]
- For NLP-detected entities: the entity type (e.g., Blocked PII detected: PERSON)
Best Practices
- Start with Redact: Use Redact as the default action when getting started. This lets requests proceed while protecting sensitive data, giving you time to evaluate detection accuracy before switching to Block.
- Use built-in presets for common PII: The built-in presets are tuned for common formats and are the easiest way to get started. Add custom patterns for domain-specific data.
- Be aware of NLP latency: The Person Name and Address presets use NLP-based detection, which adds latency proportional to input size. If latency is critical, consider using only regex-based presets.
- Test before deploying: Use the Test Preview in the guardrail editor to verify your filters work as expected before saving and assigning the guardrail.
- Combine with other guardrail settings: Sensitive info filters work alongside budget limits, model allowlists, provider restrictions, and ZDR enforcement. Use them together for comprehensive governance.
- Use labels on custom block patterns: Adding a label to custom patterns that use the Block action provides clearer error messages to API consumers, making it easier to understand why a request was rejected.