LLaVA is a large multimodal model that combines a vision encoder with the Vicuna language model for general-purpose visual and language understanding. It shows strong multimodal chat capabilities and set a new state-of-the-art accuracy on ScienceQA.
#multimodal
Modalities
text+image → text
Context
2K
Released
Nov 16, 2023
Knowledge Cutoff
Jun 2023