Llava 13B

haotian-liu/llava-13b

Updated Nov 16
2,048 context
$10/M input tokens
$10/M output tokens

LLaVA is a large multimodal model that combines a vision encoder with the Vicuna language model for general-purpose visual and language understanding. It achieves impressive chat capabilities that mimic GPT-4 and sets a new state-of-the-art accuracy on Science QA.

#multimodal

OpenRouter attempts providers in this order unless you set dynamic routing preferences. Prices displayed per million tokens.
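
Because prices are quoted per million tokens, the cost of a single request can be estimated with simple arithmetic. The sketch below is illustrative only: the function name `estimate_cost_usd` is hypothetical, the default prices are the $10/M figures listed above, and it assumes billing is purely by token count (how image inputs are counted is not stated here).

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float = 10.0,   # USD per million input tokens (listed price)
    output_price_per_m: float = 10.0,  # USD per million output tokens (listed price)
) -> float:
    """Estimate the cost of one request in USD from per-million-token prices."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# Example: a request that fills the 2,048-token context
# (1,500 input tokens + 548 output tokens).
cost = estimate_cost_usd(1_500, 548)
print(f"${cost:.5f}")  # prints "$0.02048"
```

At these prices, a request that uses the full 2,048-token context costs roughly two cents.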