Skip to content
  1. Status
  2. Announcements
  3. Docs
  4. Support
  5. About
  6. Partners
  7. Enterprise
  8. Careers
  9. Pricing
  10. Privacy
  11. Terms
  12.  
  13. © 2025 OpenRouter, Inc

    NVIDIA: Llama 3.1 Nemotron Nano 8B v1

    nvidia/llama-3.1-nemotron-nano-8b-v1

    Created Apr 8, 2025131,072 context

    Llama-3.1-Nemotron-Nano-8B-v1 is a compact large language model (LLM) derived from Meta's Llama-3.1-8B-Instruct, specifically optimized for reasoning tasks, conversational interactions, retrieval-augmented generation (RAG), and tool-calling applications. It balances accuracy and efficiency, fitting comfortably onto a single consumer-grade RTX GPU for local deployment. The model supports extended context lengths of up to 128K tokens.

    Note: you must include detailed thinking on in the system prompt to enable reasoning. Please see Usage Recommendations for more.

    Providers for Llama 3.1 Nemotron Nano 8B v1

    OpenRouter routes requests to the best providers that are able to handle your prompt size and parameters, with fallbacks to maximize uptime.