
    inference.net

    Browse models provided by inference.net (Terms of Service)

    8 models

    Tokens processed on OpenRouter
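All of the models below are served through OpenRouter's OpenAI-compatible chat completions API, so any of them can be queried with a plain HTTP request. A minimal sketch in Python (the `openai/gpt-oss-120b` slug is assumed from the first entry below; `OPENROUTER_API_KEY` is a placeholder environment variable):

```python
import os
import requests

# Minimal OpenRouter chat completion request.
# Assumptions: OPENROUTER_API_KEY is set in the environment, and the
# slug "openai/gpt-oss-120b" matches the first entry in this listing.
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "openai/gpt-oss-120b",
        "messages": [
            {"role": "user", "content": "In two sentences, what is a Mixture-of-Experts model?"}
        ],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```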

  1. OpenAI: gpt-oss-120b

    gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized to run on a single H100 GPU with native MXFP4 quantization. The model supports configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation.

    by openai · 131K context · $0.05/M input tokens · $0.45/M output tokens
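At per-million-token rates like these, request cost is a simple linear function of token counts. A quick sketch using this entry's listed prices (the token counts themselves are hypothetical):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of a single request at per-million-token rates."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# gpt-oss-120b as listed: $0.05/M input, $0.45/M output.
# A hypothetical 2,000-token prompt with a 500-token completion:
print(f"${request_cost(2_000, 500, 0.05, 0.45):.6f}")  # $0.000325
```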
  2. OpenAI: gpt-oss-20b

    gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for lower-latency inference and deployability on consumer or single-GPU hardware. The model is trained in OpenAI's Harmony response format and supports reasoning level configuration, fine-tuning, and agentic capabilities including function calling, tool use, and structured outputs.

    by openai · 131K context · $0.03/M input tokens · $0.15/M output tokens
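Both gpt-oss entries advertise native function calling, which OpenRouter exposes through the standard OpenAI-style `tools` parameter. A hedged sketch (the `get_weather` tool is invented for illustration; only the schema shape follows the OpenAI-compatible format):

```python
import os
import requests

# The tool schema follows the OpenAI-compatible "tools" format that
# OpenRouter passes through; get_weather itself is invented for the sketch.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "openai/gpt-oss-20b",  # slug assumed from this entry
        "messages": [{"role": "user", "content": "What's the weather in Oslo?"}],
        "tools": tools,
    },
    timeout=60,
)
resp.raise_for_status()
# If the model decided to call the tool, the call and its JSON arguments
# appear here; otherwise this prints None.
print(resp.json()["choices"][0]["message"].get("tool_calls"))
```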
  3. Meta: Llama 3.2 1B Instruct

    Llama 3.2 1B is a 1-billion-parameter language model focused on efficiently performing natural language tasks, such as summarization, dialogue, and multilingual text analysis. Its smaller size allows it to operate efficiently in low-resource environments while maintaining strong task performance. Supporting eight core languages and fine-tunable for more, Llama 3.2 1B is ideal for businesses or developers seeking lightweight yet powerful AI solutions that can operate in diverse multilingual settings without the high computational demand of larger models. Usage of this model is subject to Meta's Acceptable Use Policy.

    by meta-llama · 131K context · $0.01/M input tokens · $0.01/M output tokens
  4. Meta: Llama 3.2 11B Vision Instruct

    Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and visual question answering, bridging the gap between language generation and visual reasoning. Pre-trained on a massive dataset of image-text pairs, it performs well in complex, high-accuracy image analysis. Its ability to integrate visual understanding with language processing makes it an ideal solution for industries requiring comprehensive visual-linguistic AI applications, such as content creation, AI-driven customer service, and research. Usage of this model is subject to Meta's Acceptable Use Policy.

    by meta-llama · 131K context · $0.055/M input tokens · $0.055/M output tokens
  5. Meta: Llama 3.2 3B Instruct

    Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with the latest transformer architecture, it supports eight languages, including English, Spanish, and Hindi, and is adaptable for additional languages. Trained on 9 trillion tokens, the Llama 3.2 3B model excels in instruction-following, complex reasoning, and tool use. Its balanced performance makes it ideal for applications needing accuracy and efficiency in text generation across multilingual settings. Usage of this model is subject to Meta's Acceptable Use Policy.

    by meta-llama · 131K context · $0.02/M input tokens · $0.02/M output tokens
  6. Qwen: Qwen2.5-VL 7B Instruct

    Qwen2.5-VL 7B is a multimodal LLM from the Qwen Team with the following key enhancements:

    - SoTA understanding of images of various resolutions and ratios: Qwen2.5-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc.
    - Understanding of videos of 20 min+: Qwen2.5-VL can understand videos over 20 minutes long for high-quality video-based question answering, dialogue, content creation, etc.
    - Agentic operation of phones, robots, and other devices: with complex reasoning and decision-making abilities, Qwen2.5-VL can be integrated with devices such as mobile phones and robots for automatic operation based on the visual environment and text instructions.
    - Multilingual support: to serve global users, beyond English and Chinese, Qwen2.5-VL understands text in many languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc.

    For more details, see the Qwen2.5-VL blog post and GitHub repo. Usage of this model is subject to the Tongyi Qianwen LICENSE AGREEMENT.

    by qwen · 33K context · $0.20/M input tokens · $0.20/M output tokens
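Because Qwen2.5-VL accepts images, a request uses the OpenAI-style multipart `content` array rather than a plain string. A sketch (the model slug and image URL are assumptions; confirm the exact id in the live catalog):

```python
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        # Slug assumed from this entry; confirm the exact id in the catalog.
        "model": "qwen/qwen2.5-vl-7b-instruct",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "What does this chart show?"},
                # Placeholder image URL; any publicly reachable image works.
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/chart.png"}},
            ],
        }],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```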
  7. Meta: Llama 3.1 8B Instruct

    Meta's latest class of model (Llama 3.1) launched with a variety of sizes and flavors. This 8B instruct-tuned version is fast and efficient. It has demonstrated strong performance compared to leading closed-source models in human evaluations. Usage of this model is subject to Meta's Acceptable Use Policy.

    by meta-llama · 131K context · $0.02/M input tokens · $0.03/M output tokens
  8. Mistral: Mistral Nemo

    A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA. The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi. It supports function calling and is released under the Apache 2.0 license.

    by mistralai · 131K context · $0.0375/M input tokens · $0.10/M output tokens