The first multi-modal, text+image-to-text model from Mistral AI. Its weights were launched via torrent: https://x.com/mistralai/status/1833758285167722836.
Prompt tokens measure input size. Reasoning tokens show internal thinking before a response. Completion tokens reflect total output length.