DeepSeek R1 is here: Performance on par with OpenAI o1, but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass.
Fully open-source model & technical report.
MIT licensed: Distill & commercialize freely!
Qwen2 VL 7B is a multimodal LLM from the Qwen Team with the following key enhancements:
For more details, see this blog post and GitHub repo.
Usage of this model is subject to Tongyi Qianwen LICENSE AGREEMENT.
MiniMax-01 is a combines MiniMax-Text-01 for text generation and MiniMax-VL-01 for image understanding. It has 456 billion parameters, with 45.9 billion parameters activated per inference, and can handle a context of up to 4 million tokens.
The text model adopts a hybrid architecture that combines Lightning Attention, Softmax Attention, and Mixture-of-Experts (MoE). The image model adopts the “ViT-MLP-LLM” framework and is trained on top of the text model.
To read more about the release, see: https://www.minimaxi.com/en/news/minimax-01-series-2