Best Rerank Models for Search and RAG

Model rankings updated July 2026 based on real usage data.

Rerank models improve retrieval systems by reordering candidate documents, passages, or search results according to relevance. They are commonly used in semantic search, retrieval-augmented generation (RAG), recommendations, and knowledge-base applications where the first retrieval step returns too many possible matches. Compare top reranking models on OpenRouter to find the best fit for your search or RAG pipeline.
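The reordering step described above can be sketched in a few lines. This is a minimal illustration, not any provider's API: `overlap_score` is a hypothetical toy scorer standing in for a real rerank model, and `rerank`, `candidates`, and `top_n` are names chosen for this example only.

```python
from typing import Callable, List, Tuple


def overlap_score(query: str, document: str) -> float:
    """Toy relevance score: fraction of query terms that appear in the document.

    Hypothetical stand-in for a rerank model, which would instead score the
    (query, document) pair with a learned network.
    """
    query_terms = set(query.lower().split())
    doc_terms = set(document.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & doc_terms) / len(query_terms)


def rerank(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float] = overlap_score,
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Score every first-stage candidate against the query and keep the best top_n."""
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    # Highest score first; Python's sort is stable, so ties keep retrieval order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Candidates as a first-stage retriever might return them (illustrative data).
candidates = [
    "Billing FAQ for enterprise plans",
    "How to reset your account password",
    "Password policy and account lockout rules",
    "Release notes for version 2.1",
]
top = rerank("reset account password", candidates, top_n=2)
for doc, score in top:
    print(f"{score:.2f}  {doc}")
```

In a real pipeline the scorer would be a call to a hosted or local rerank model, and only the top results would be passed on to the generator or shown to the user.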

Top Rerank Models on OpenRouter

NVIDIA: Llama Nemotron Rerank VL 1B V2 (free)

21B tokens

Llama Nemotron Rerank VL 1B V2 is a 1.7B-parameter multimodal reranking model from NVIDIA. It evaluates the relevance of document images and text against user queries and is designed for vision RAG pipelines that handle charts, tables, infographics, and mixed-media documents. It functions as a cross-encoder that accepts text queries paired with image, text, or combined document inputs, and delivers approximately 6-7% recall improvements over embedding-only baselines on visual document retrieval benchmarks.

by nvidia | 10K context | $0/M input tokens | $0/M output tokens
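The listing does not say which recall metric or cutoff the 6-7% figure refers to, or whether it is an absolute or relative gain. Assuming it is a recall@k style measure, the sketch below shows how such a number is typically computed before and after reranking. The document IDs and orderings are made-up illustration data, and `recall_at_k` is a name chosen for this example.

```python
from typing import Iterable, List


def recall_at_k(ranked_ids: List[str], relevant_ids: Iterable[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    top_k = set(ranked_ids[:k])
    return len(top_k & relevant) / len(relevant)


# Illustrative data only: two relevant documents among five candidates.
relevant = {"d2", "d5"}
embedding_order = ["d1", "d2", "d3", "d4", "d5"]  # first-stage ranking
reranked_order = ["d2", "d5", "d1", "d3", "d4"]   # same candidates after reranking

baseline = recall_at_k(embedding_order, relevant, k=2)
after = recall_at_k(reranked_order, relevant, k=2)
print(f"recall@2 before: {baseline:.2f}, after: {after:.2f}")
```

A benchmark figure would average this value over many queries; a single toy query like this one exaggerates the difference.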

Cohere: Rerank 4 Pro

Rerank 4 Pro is Cohere's AI search foundation model for improving the relevance of information surfaced within search and RAG systems. It features a 32K context window, multilingual support across 100+ languages, no required data pre-processing, and state-of-the-art performance with low latency.

by cohere | 33K context | $0/M input tokens | $0/M output tokens

Cohere: Rerank 4 Fast

by cohere | 33K context | $0/M input tokens | $0/M output tokens

Cohere: Rerank v3.5

Rerank v3.5 is designed to reorder search results for improved relevance. It supports reranking of multi-aspect and semi-structured data across 100+ languages, and is well suited to refining results from semantic or keyword search pipelines.

by cohere | 4K context | $0/M input tokens | $0/M output tokens
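Semi-structured records, such as emails or product rows, usually need to be turned into text before a reranker can score them. One common approach is to flatten each record into "key: value" lines, in a YAML-like layout, with the most important fields first. The sketch below assumes that approach; `record_to_text`, the field names, and the sample record are hypothetical, and the output is a simple flattening of scalar fields rather than full YAML.

```python
from typing import Any, Dict, List


def record_to_text(record: Dict[str, Any], field_order: List[str]) -> str:
    """Flatten selected scalar fields into 'key: value' lines, in the given order.

    Fields that are missing, None, or not listed in field_order are skipped,
    so internal bookkeeping columns never reach the reranker.
    """
    lines = []
    for field in field_order:
        value = record.get(field)
        if value is not None:
            lines.append(f"{field}: {value}")
    return "\n".join(lines)


# Illustrative record; internal_id is deliberately left out of the field order.
email = {
    "from": "ops@example.com",
    "subject": "Invoice overdue",
    "body": "Payment is 30 days late.",
    "internal_id": 991,
}
doc = record_to_text(email, ["subject", "from", "body"])
print(doc)
```

Putting the most informative fields first matters when a document may be truncated to fit the model's context window, since the tail is what gets cut.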