Qwen: Qwen2.5-VL 7B Instruct

qwen/qwen-2.5-vl-7b-instruct

Created Aug 28, 202432,768 context
$0.20/M input tokens$0.20/M output tokens$0.145/K input imgs

Qwen2.5 VL 7B is a multimodal LLM from the Qwen Team with the following key enhancements:

  • SoTA understanding of images of various resolution & ratio: Qwen2.5-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc.

  • Understanding videos of 20min+: Qwen2.5-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc.

  • Agent that can operate your mobiles, robots, etc.: with the abilities of complex reasoning and decision making, Qwen2.5-VL can be integrated with devices like mobile phones, robots, etc., for automatic operation based on visual environment and text instructions.

  • Multilingual Support: to serve global users, besides English and Chinese, Qwen2.5-VL now supports the understanding of texts in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc.

For more details, see this blog post and GitHub repo.

Usage of this model is subject to Tongyi Qianwen LICENSE AGREEMENT.

Recent activity on Qwen2.5-VL 7B Instruct

Tokens processed per day

Jan 29Feb 4Feb 10Feb 16Feb 22Feb 28Mar 6Mar 12Mar 18Mar 24Mar 30Apr 5Apr 11Apr 17Apr 23Apr 290150M300M450M600M

More models from Qwen

    Qwen: Qwen2.5-VL 7B Instruct – Recent Activity | OpenRouter