Integrate your inference API with the OpenRouter network.
OpenRouter routes requests across 70+ providers to serve 10M+ developers. We review every application to maintain API reliability and performance standards across the network.
We currently have a large backlog of provider applications and are prioritizing providers with proprietary models.
Your endpoints are accessible to 10M+ developers through a single OpenAI-compatible API, with no additional integration work on their side.
Requests are routed based on latency, throughput, uptime, and price. Providers that perform well receive proportionally more traffic.
Time to first token (TTFT), throughput, and uptime are tracked publicly on every model page. These metrics are visible to developers choosing providers.
Usage-based billing is handled via monthly invoicing. Token counts are reconciled automatically.
Endpoint reliability is continuously monitored. Providers maintaining 95%+ uptime retain standard routing priority.
Declare your datacenter locations in the /models endpoint. Routing respects geographic preferences and data residency requirements.
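As a rough illustration of declaring datacenter locations alongside other model metadata, the sketch below builds a /models response in Python. Every field name (`context_length`, `max_output_tokens`, `pricing`, `supported_features`, `datacenters`), the `data` wrapper, the country-code format, and the sample values are assumptions made for this example, not the schema OpenRouter requires; consult the full technical requirements for the actual format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ModelEntry:
    # Hypothetical field names; the real schema may differ.
    id: str
    context_length: int
    max_output_tokens: int
    pricing: dict             # assumed: price per token, as decimal strings
    supported_features: list  # assumed: free-form feature labels
    datacenters: list         # assumed: two-letter country codes


def build_models_response(entries):
    """Wrap model entries in a JSON-serializable dict (assumed `data` envelope)."""
    return {"data": [asdict(entry) for entry in entries]}


# Made-up values, for illustration only.
example = ModelEntry(
    id="example-org/example-model",
    context_length=32768,
    max_output_tokens=4096,
    pricing={"prompt": "0.0000005", "completion": "0.0000015"},
    supported_features=["tools", "json_mode"],
    datacenters=["US", "DE"],
)

payload = build_models_response([example])
print(json.dumps(payload, indent=2))
```

Keeping the entries in a typed structure like this makes it harder to publish a model without its datacenter list, which matters if routing depends on that field for geographic and data residency preferences.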
All providers must meet these requirements before being considered for the network. Applications that don't meet these criteria will not be reviewed.
Your /chat/completions endpoint must be OpenAI-compatible, support streaming, and return usage tokens for both streaming and non-streaming requests.
Publish a /models endpoint returning your available models with pricing, context length, max output tokens, supported features, and datacenter locations.
Support monthly invoicing so OpenRouter can pay for inference without manual intervention.
Have a published privacy policy and clear data retention terms. Providers must disclose whether prompts are logged and whether data is used for training.
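To make the usage-token requirement for streamed responses more concrete, here is a minimal sketch of how a client might read usage from an OpenAI-style server-sent event stream. It assumes the stream follows the OpenAI convention of `data: {json}` lines terminated by `data: [DONE]`, with token counts in a `usage` object on a late chunk. The `extract_usage` helper and the sample stream are hypothetical and hand-written for illustration; this is not OpenRouter's validation logic.

```python
import json


def extract_usage(sse_lines):
    """Return the last `usage` object seen in an OpenAI-style SSE stream, or None."""
    usage = None
    for raw in sse_lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage


# Hand-written sample stream (assumed shape, for illustration only).
sample_stream = [
    'data: {"choices": [{"index": 0, "delta": {"content": "Hi"}}]}',
    "",
    'data: {"choices": [], "usage": {"prompt_tokens": 5, "completion_tokens": 1, "total_tokens": 6}}',
    "",
    "data: [DONE]",
]

usage = extract_usage(sample_stream)
print(usage)
```

A provider could run a check like this against its own streaming endpoint before applying: if the helper returns `None`, the stream is not reporting token counts.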
Provide details about your infrastructure, API endpoints, supported models, and data policies.
Our team evaluates your API compatibility, endpoint reliability, pricing, and performance against network standards.
Accepted providers are onboarded with test traffic to validate latency, throughput, and error handling.
Your models become available on the network and begin receiving production requests routed by performance and price.
Can't find what you need?
Reach out to [email protected]