12/18/2025 by Alex Atallah

Response Healing: Reduce JSON Defects by 80%+

    We expect our APIs to have 99.999% uptime. We'd never tolerate a payment processor that failed 2% of the time. So why do we accept LLMs that routinely break JSON syntax in structured output requests?

    Today we're launching Response Healing: a new feature on OpenRouter that automatically fixes malformed JSON responses from LLMs before they reach your application.

    Two standout improvements from a week of data:

    • Gemini 2.0 Flash, our most popular model for structured output with over 1.6 million requests in the past week, saw its defect rate decline by 80%.
    • Qwen3 235B, one of the most capable open-weight models available, saw its defect rate decline by 99.8%.

    The Math That Should Keep You Up at Night

Here's something most developers overlook: if an LLM has a 2% JSON defect rate, and Response Healing drops that to 1%, you haven't just made a one-percentage-point improvement. You've cut your defects, bugs, and support tickets in half.
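A quick back-of-envelope check (the request volume here is hypothetical, not OpenRouter data):

```python
requests_per_day = 1_000_000                 # hypothetical volume
failures_before = requests_per_day * 0.02    # 2% defect rate -> 20,000 failures/day
failures_after = requests_per_day * 0.01     # 1% defect rate -> 10,000 failures/day

# A one-percentage-point change removes half of all failures.
print(1 - failures_after / failures_before)  # 0.5
```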

    At OpenRouter's scale, we see this compounding effect across billions of tokens daily. A "small" improvement in structured output reliability translates to dramatically fewer 3am pages, fewer angry users, and fewer hours debugging why your agent suddenly stopped working.

This is why we obsess over this problem more than any other gateway does. Reliability at the margins is where real production systems succeed or fail.

    What We're Fixing

LLMs make surprisingly creative mistakes when generating JSON. Common issues include trailing commas after the last element, unescaped control characters in strings, missing closing brackets, and conversational preambles wrapped around an otherwise valid payload:

    Here’s the data you requested: {…}

    That’s not something that should ever take you down.
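To make the failure class concrete, here is a minimal sketch of the kinds of repairs involved. This is an illustration only, not OpenRouter's actual healing implementation:

```python
import json
import re

def heal_json(raw: str):
    """Minimal sketch of the defect classes healing targets."""
    # Strip conversational preamble: keep from the first '{' or '[' onward.
    starts = [i for i in (raw.find("{"), raw.find("[")) if i != -1]
    s = raw[min(starts):] if starts else raw
    # Remove trailing commas before a closing brace or bracket.
    s = re.sub(r",\s*([}\]])", r"\1", s)
    # Close unclosed brackets, innermost first (naive: ignores brackets
    # that appear inside string literals).
    closers = []
    for ch in s:
        if ch in "{[":
            closers.append("}" if ch == "{" else "]")
        elif ch in "}]" and closers:
            closers.pop()
    return json.loads(s + "".join(reversed(closers)))

print(heal_json('Here\'s the data you requested: {"items": [1, 2, 3,]'))
# -> {'items': [1, 2, 3]}
```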

    For a detailed breakdown of the failure modes we handle, check out our Response Healing documentation:

[Image: example healings]

    The Benchmarks

    We analyzed millions of structured output generations across our platform. We did this on the fly, at inference time, without logging any completions or storing results.

    Here are the results for the highest-volume models:

| Model | Requests | Success Before | Success After | Defects Resolved |
|---|---|---|---|---|
| Gemini 2.0 Flash | 1.62M | 99.61% | 99.92% | 80.0% |
| Gemini 2.5 Flash | 772k | 98.97% | 99.65% | 66.3% |
| Gemini 2.5 Flash Lite | 703k | 99.64% | 99.89% | 68.7% |
| GPT-4o Mini | 494k | 99.98% | 100.00% | 80.7% |
| Grok 4 Fast | 488k | 92.89% | 94.87% | 27.8% |
| Grok 4.1 Fast | 284k | 98.70% | 99.17% | 36.4% |
| Gemini 2.0 Flash Lite | 282k | 99.94% | 100.00% | 98.9% |
| Deepseek Chat v3.1 | 196k | 82.54% | 97.39% | 85.0% |
| GPT-4.1 | 155k | 98.22% | 98.40% | 10.4% |
| Qwen3 235B | 113k | 88.02% | 99.98% | 99.8% |
| GPT-oss-120b | 112k | 99.53% | 99.82% | 62.2% |
| Devstral 2512 | 104k | 96.59% | 99.99% | 99.6% |
| Gemini 2.5 Flash Lite Preview | 93k | 99.14% | 99.86% | 83.7% |
| Llama 3.1 8B Instruct | 79k | 99.68% | 99.91% | 72.4% |
| GPT-oss-20b | 58k | 99.01% | 99.36% | 34.8% |
| Mistral Small 3.2 24B | 57k | 98.82% | 99.99% | 99.3% |
| GPT-5 Nano | 52k | 99.96% | 99.96% | 8.7% |
| Ministral 3B | 52k | 99.99% | 100.00% | 100.0% |
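For clarity, "Defects Resolved" is the fraction of previously failing requests that now succeed; the table's percentages are consistent with this formula:

```python
def defects_resolved(before: float, after: float) -> float:
    """Fraction of previously failing requests that now parse."""
    return (after - before) / (1.0 - before)

# Qwen3 235B: (0.9998 - 0.8802) / (1 - 0.8802) ≈ 0.998 -> 99.8%
print(f"{defects_resolved(0.8802, 0.9998):.1%}")
```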

    Some highlights worth noting since we soft-rolled this out a week ago:

• mistralai/devstral-2512: customers who turned on the plugin saw their valid-JSON rate climb from 97% to 99.99%, a 99.7% defect reduction
• google/gemini-2.5-flash: success rate increased from 97.5% to 99.88%, a 95.2% reduction
• meta-llama/llama-3.1-8b-instruct: success rate increased from 99.9% to 100%, a 100% reduction
• Qwen3 235B: 87.97% valid to 99.98% valid, a 99.85% reduction
• Deepseek Chat v3.1: 83.16% valid to 97.46% valid, an 84.89% reduction
• Several models (Ministral 3B, Devstral 2512, Mistral Small 3.2) achieve near-perfect healing rates above 99%
• Even models that already perform well see meaningful gains: Gemini 2.0 Flash Lite went from 99.94% to 100% validity

    How to Enable It

    Response Healing is opt-in. You can configure it through the new Plugins section in your settings:

    openrouter.ai/settings/plugins

[Image: plugins screenshot]

    Toggle it on, and every structured output request will automatically pass through our healing layer before returning to your application.
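No code changes are required once the toggle is on. For example, a typical structured-output call through OpenRouter's OpenAI-compatible endpoint comes back healed (the model slug and prompt here are just examples):

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",
)

response = client.chat.completions.create(
    model="google/gemini-2.0-flash-001",
    messages=[{"role": "user", "content": "Return the three primary colors as a JSON object."}],
    response_format={"type": "json_object"},
)

# With healing enabled, parsing this should no longer fail on syntax errors.
print(response.choices[0].message.content)
```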

    Cost

The plugin is free to use. As for latency, we measured the CPU time the plugin adds across all production data:

| Category | Mean Time | Ops/Second |
|---|---|---|
| Schema-less Repair | 0.018ms | 54,700 |
| Unified API | 0.019ms | 51,500 |
| Type Coercion | 0.041ms | 32,600 |
| Basic Parsing | 0.133ms | 16,900 |
| Large Payloads (10KB) | 2.3ms | 437 |

In reality, factors outside the plugin will dominate any real-world latency. For typical responses, healing adds less than 1ms, negligible compared to LLM inference time.

    What This Doesn't Fix

    To be clear about scope: Response Healing fixes JSON syntax errors, not schema adherence. If a model returns valid JSON that doesn't match your expected schema (wrong field names, missing required properties, wrong types), healing won't catch that.
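So keep validating the parsed payload against your schema in application code. A minimal sketch using pydantic, with a hypothetical Invoice model:

```python
import json
from pydantic import BaseModel, ValidationError

class Invoice(BaseModel):
    id: str
    total: float

raw = '{"id": "inv_42", "amount": 19.99}'  # valid JSON, wrong field name

try:
    invoice = Invoice.model_validate(json.loads(raw))
except ValidationError as err:
    # Missing or misnamed fields and wrong types are semantic issues that
    # healing intentionally leaves to your application logic.
    print(err)
```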

It also only works for non-streaming requests today. If you need healing for streaming responses, contact us with your use case.

    That said, you should still see a meaningful drop in your overall error rate. Syntax errors are one of the most common failure modes, and eliminating them lets you focus your error handling on the semantic issues that actually require application logic to resolve.

    What about tool calling and schema adherence? Tool calling has very few structural JSON issues, but schema adherence has many defects across most models. We’ll evaluate schema adherence soon.

What about XML? The plugin can heal XML output as well; contact us if you'd like access.

    Ship with Confidence

    We built OpenRouter to be the infrastructure layer you don't have to think about. Response Healing is another step toward that goal: structured outputs that just work, every time.

    Enable it today at openrouter.ai/settings/plugins, and let us know what you're building.