Skip to content
No models found
OpenRouter
© 2026 OpenRouter, Inc

Product

  • Chat
  • Rankings
  • Apps
  • Models
  • Providers
  • Pricing
  • Enterprise
  • Labs

Company

  • About
  • Announcements
  • CareersHiring
  • Privacy
  • Terms of Service
  • Support
  • State of AI
  • Works With OR
  • Data

Developer

  • Documentation
  • API Reference
  • SDK
  • Status

Connect

  • Discord
  • GitHub
  • LinkedIn
  • X
  • YouTube
Collections/Image Models

Image Generation Models

Model rankings updated June 2026 based on real usage data.

OpenRouter provides access to leading image generation models through a single, unified API gateway. Compare pricing, capabilities, and performance across multiple image generation APIs to find the best fit for models that transform text descriptions into high-quality images.

Image Generation Models on OpenRouter

Favicon for google

Google: Nano Banana (Gemini 2.5 Flash Image)

8.55B tokens

Gemini 2.5 Flash Image, a.k.a. "Nano Banana," is now generally available. It is a state of the art image generation model with contextual understanding. It is capable of image generation, edits, and multi-turn conversations. Aspect ratios can be controlled with the image_config API Parameter

by google33K context$0.30/M input tokens$2.50/M output tokens$30/M tokens$1/M audio tokens
Favicon for google

Google: Nano Banana 2 (Gemini 3.1 Flash Image Preview)

5.65B tokens

Gemini 3.1 Flash Image Preview, a.k.a. "Nano Banana 2," is Google’s latest state of the art image generation and editing model, delivering Pro-level visual quality at Flash speed. It combines advanced contextual understanding with fast, cost-efficient inference, making complex image generation and iterative edits significantly more accessible. Aspect ratios can be controlled with the image_config API Parameter

by google131K context$0.50/M input tokens$3/M output tokens$60/M tokens
Favicon for openai

OpenAI: GPT-5.4 Image 2

4.22B tokens

GPT-5.4 Image 2 combines OpenAI's GPT-5.4 model with state-of-the-art image generation capabilities from GPT Image 2. It enables rich multimodal workflows, allowing users to seamlessly move between reasoning, coding, and visual generation within the same interaction.

by openai272K context$8/M input tokens$15/M output tokens$30/M tokens
Favicon for bytedance-seed

ByteDance Seed: Seedream 4.5

2.71B tokens

Seedream 4.5 is the latest in-house image generation model developed by ByteDance. Compared with Seedream 4.0, it delivers comprehensive improvements, especially in editing consistency, including better preservation of subject details, lighting, and color tone. It also enhances portrait refinement and small-text rendering. The model’s multi-image composition capabilities have been significantly strengthened, and both reasoning performance and visual aesthetics continue to advance, enabling more accurate and artistically expressive image generation.

Pricing is $0.04 per output image, regardless of size.

by bytedance-seed4K context$0/M input tokens$0/M output tokens$9.581/M tokens
Favicon for google

Google: Nano Banana Pro (Gemini 3 Pro Image Preview)

2.59B tokens

Nano Banana Pro is Google’s most advanced image-generation and editing model, built on Gemini 3 Pro. It extends the original Nano Banana with significantly improved multimodal reasoning, real-world grounding, and high-fidelity visual synthesis. The model generates context-rich graphics, from infographics and diagrams to cinematic composites, and can incorporate real-time information via Search grounding.

It offers industry-leading text rendering in images (including long passages and multilingual layouts), consistent multi-image blending, and accurate identity preservation across up to five subjects. Nano Banana Pro adds fine-grained creative controls such as localized edits, lighting and focus adjustments, camera transformations, and support for 2K/4K outputs and flexible aspect ratios. It is designed for professional-grade design, product visualization, storyboarding, and complex multi-element compositions while remaining efficient for general image creation workflows.

by google66K context$2/M input tokens$12/M output tokens$120/M tokens$2/M audio tokens
Favicon for x-ai

xAI: Grok Imagine Image Quality

859M tokens

Grok Imagine Image Quality is xAI's fast, high-fidelity image generation and editing model. It accepts text prompts and optional reference images, producing photorealistic outputs at 1K or 2K across a range of aspect ratios, including flexible adjustment of reference images.

The model emphasizes realistic detail — natural lighting and physics, accurate textures, and consistent rendering of named entities such as brands, public figures, and specific locations. It supports clean multilingual text rendering inside images, making it the top choice for posters, packaging, ads, menus, and social graphics. When given reference images, it preserves identity and structure for product placement, brand-aligned variations, and character continuity across scenes.

by x-ai66K context$0/M input tokens$0/M output tokens$11.98/M tokens
Favicon for black-forest-labs

Black Forest Labs: FLUX.2 Pro

740M tokens

A high-end image generation and editing model focused on frontier-level visual quality and reliability. It delivers strong prompt adherence, stable lighting, sharp textures, and consistent character/style reproduction across multi-reference inputs. Designed for production workloads, it balances speed and quality while supporting text-to-image and image editing up to 4 MP resolution.

Pricing is as follows, per the docs: Input: We charge $0.015 for each megapixel on the input (i.e. reference images for editing) Output: The first megapixel is charged $0.03 and then each subsequent MP will be charged $0.015.

by black-forest-labs47K context$0/M input tokens$0/M output tokens$7.324/M tokens
Favicon for black-forest-labs

Black Forest Labs: FLUX.2 Klein 4B

398M tokens

FLUX.2 [klein] 4B is the fastest and most cost-effective model in the FLUX.2 family, optimized for high-throughput use cases while maintaining excellent image quality.

Pricing is based on the output image. The first generated megapixel is charged $0.014. Each subsequent megapixel is charged $0.001.

by black-forest-labs41K context$0/M input tokens$0/M output tokens$3.418/M tokens
Favicon for openai

OpenAI: GPT-5 Image Mini

261M tokens

GPT-5 Image Mini combines OpenAI's advanced language capabilities, powered by GPT-5 Mini, with GPT Image 1 Mini for efficient image generation. This natively multimodal model features superior instruction following, text rendering, and detailed image editing with reduced latency and cost. It excels at high-quality visual creation while maintaining strong text understanding, making it ideal for applications that require both efficient image generation and text processing at scale.

by openai400K context$2.50/M input tokens$2/M output tokens$8/M tokens
Favicon for black-forest-labs

Black Forest Labs: FLUX.2 Max

201M tokens

FLUX.2 [max] is the new top-tier image model from Black Forest Labs, pushing image quality, prompt understanding, and editing consistency to the highest level yet.

Pricing is as follows, per the docs: Input: We charge $0.03 for each megapixel on the input (i.e. reference images for editing) Output: The first generated megapixel is charged $0.07. Each subsequent megapixel is charged $0.03.

by black-forest-labs47K context$0/M input tokens$0/M output tokens$17.09/M tokens
Favicon for openai

OpenAI: GPT-5 Image

105M tokens

GPT-5 Image combines OpenAI's GPT-5 model with state-of-the-art image generation capabilities. It offers major improvements in reasoning, code quality, and user experience while incorporating GPT Image 1's superior instruction following, text rendering, and detailed image editing.

by openai400K context$10/M input tokens$10/M output tokens$40/M tokens
Favicon for black-forest-labs

Black Forest Labs: FLUX.2 Flex

97.4M tokens

FLUX.2 [flex] excels at rendering complex text, typography, and fine details, and supports multi-reference editing in the same unified architecture.

Pricing is as follows, per the docs: We charge $0.06 for each megapixel on both input and output side.

by black-forest-labs67K context$0/M input tokens$0/M output tokens$14.65/M tokens