LLM Rankings

Compare models by tokens processed

The NeverSleep team is back, with a Llama 3 70B finetune trained on their curated roleplay data. Striking a balance between eRP and RP, Lumimaid was designed to be serious, yet uncensored when necessary. To enhance it's overall intelligence and chat capability, roughly 40% of the training data was not roleplay. This provides a breadth of knowledge to access, while still keeping roleplay as the primary strength. Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/). • 8192 context

12.7M tokens

new

Perplexity: Llama3 Sonar 8B

Llama3 Sonar is Perplexity's latest model family. It surpasses their earlier Sonar models in cost-efficiency, speed, and performance. This is a normal offline LLM, but the [online version](/models/perplexity/llama-3-sonar-small-32k-online) of this model has Internet access. • 32768 context

18.3M tokens

11488%

Google: Gemini Flash 1.5 (preview)

Gemini 1.5 Flash is a foundation model that performs well at a variety of multimodal tasks such as visual understanding, classification, summarization, and creating content from image, audio and video. It's adept at processing visual and text inputs such as photographs, documents, infographics, and screenshots. Gemini 1.5 Flash is designed for high-volume, high-frequency tasks where cost and latency matter. On most common tasks, Flash achieves comparable quality to other Gemini Pro models at a significantly reduced cost. Flash is well-suited for applications like chat assistants and on-demand content generation where speed and scale matter. #multimodal • 2800000 context

282M tokens

4343%

Perplexity: Llama3 Sonar 70B Online

Llama3 Sonar is Perplexity's latest model family. It surpasses their earlier Sonar models in cost-efficiency, speed, and performance. This is the online version of the [offline chat model](/models/perplexity/llama-3-sonar-large-32k-chat). It is focused on delivering helpful, up-to-date, and factual responses. #online • 28000 context

15.8M tokens

3132%

Perplexity: Llama3 Sonar 8B Online

Llama3 Sonar is Perplexity's latest model family. It surpasses their earlier Sonar models in cost-efficiency, speed, and performance. This is the online version of the [offline chat model](/models/perplexity/llama-3-sonar-small-32k-chat). It is focused on delivering helpful, up-to-date, and factual responses. #online • 28000 context

3.14M tokens

2819%

Anthropic: Claude Instant (older v1)

Anthropic's model for low-latency, high throughput text generation. Supports hundreds of pages of text. • 100000 context

2.99M tokens

1973%

OpenAI: GPT-4o (2024-05-13)

GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities. For benchmarking against other models, it was briefly called ["im-also-a-good-gpt2-chatbot"](https://twitter.com/LiamFedus/status/1790064963966370209) #multimodal • 128000 context

182M tokens

1463%

Qwen 1.5 14B Chat

Qwen1.5 14B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. In comparison with the previous released Qwen, the improvements include: - Significant performance improvement in human preference for chat models - Multilingual support of both base and chat models - Stable support of 32K context length for models of all sizes For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE). • 32768 context

79M tokens

1051%

Meta: Llama 3 8B

Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This is the base 8B pre-trained version. It has demonstrated strong performance compared to leading closed-source models in human evaluations. To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/). • 8192 context

3.97M tokens

791%

10.

Perplexity: Llama3 Sonar 70B

Llama3 Sonar is Perplexity's latest model family. It surpasses their earlier Sonar models in cost-efficiency, speed, and performance. This is a normal offline LLM, but the [online version](/models/perplexity/llama-3-sonar-large-32k-online) of this model has Internet access. • 32768 context

20M tokens

636%

11.

Google: PaLM 2 Code Chat

PaLM 2 fine-tuned for chatbot conversations that help with code-related questions. • 20070 context

567K tokens

539%

12.

DeepSeek-V2 Chat

DeepSeek-V2 Chat is a conversational finetune of DeepSeek-V2, a Mixture-of-Experts (MoE) language model. It comprises 236B total parameters, of which 21B are activated for each token. Compared with DeepSeek 67B, DeepSeek-V2 achieves stronger performance, and meanwhile saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times. DeepSeek-V2 achieves remarkable performance on both standard benchmarks and open-ended generation evaluations. • 128000 context

53.6M tokens

527%

13.

Qwen 1.5 7B Chat

Qwen1.5 7B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. In comparison with the previous released Qwen, the improvements include: - Significant performance improvement in human preference for chat models - Multilingual support of both base and chat models - Stable support of 32K context length for models of all sizes For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE). • 32768 context

7.27M tokens

371%

14.

Cohere: Command

Command is an instruction-following conversational model that performs language tasks with high quality, more reliably and with a longer context than our base generative models. Use of this model is subject to Cohere's [Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy). • 4096 context

8.88M tokens

335%

15.

Meta: Llama 3 70B

Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This is the base 70B pre-trained version. It has demonstrated strong performance compared to leading closed-source models in human evaluations. To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/). • 8192 context

7.5M tokens

311%

16.

StripedHyena Hessian 7B (base)

This is the base model variant of the [StripedHyena series](/models?q=stripedhyena), developed by Together. StripedHyena uses a new architecture that competes with traditional Transformers, particularly in long-context data processing. It combines attention mechanisms with gated convolutions for improved speed, efficiency, and scaling. This model marks an advancement in AI architecture for sequence modeling tasks. • 32768 context

457K tokens

310%

17.

OpenAI: GPT-4o

1.52B tokens

299%

18.

Meta: CodeLlama 70B Instruct

Code Llama is a family of large language models for code. This one is based on [Llama 2 70B](/models/meta-llama/llama-2-70b-chat) and provides zero-shot instruction-following ability for programming tasks. • 2048 context

2.53M tokens

271%

19.

Google: Gemini Pro 1.5 (preview)

Google's latest multimodal model, supporting image and video in text or chat prompts. Optimized for language tasks including: - Code generation - Text generation - Text editing - Problem solving - Recommendations - Information extraction - Data extraction or generation - AI agents Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms). *Note: Preview models are offered for testing purposes and should not be used in production apps. This model is **heavily rate limited**.* #multimodal • 2800000 context

851M tokens

226%

20.

Google: PaLM 2 Chat

PaLM 2 is a language model by Google with improved multilingual, reasoning and coding capabilities. • 25804 context

16.8M tokens

181%

LLM Rankings

Compare models by tokens processed

Weekly active models

LLM Rankings

Compare models by tokens processed

Weekly active models