Google: Gemini Pro Vision 1.0
google/gemini-pro-vision
Updated Aug 13
65,536 context
$0.125/M input tokens
$0.375/M output tokens
$2.5/K input images
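As a rough illustration of how these rates combine, the sketch below estimates the cost of a single request. It assumes billing is a simple linear sum of the three listed rates, with no minimum charge, rounding, or other fees; the function name `estimate_cost` is hypothetical and not part of any official SDK.

```python
from decimal import Decimal

# Rates copied from the listing (USD). Everything else here is an assumption.
INPUT_PER_M_TOKENS = Decimal("0.125")   # per 1,000,000 input tokens
OUTPUT_PER_M_TOKENS = Decimal("0.375")  # per 1,000,000 output tokens
PER_K_IMAGES = Decimal("2.5")           # per 1,000 input images


def estimate_cost(input_tokens: int, output_tokens: int, input_images: int = 0) -> Decimal:
    """Estimate the USD cost of one request, assuming purely linear pricing."""
    return (
        INPUT_PER_M_TOKENS * input_tokens / Decimal(1_000_000)
        + OUTPUT_PER_M_TOKENS * output_tokens / Decimal(1_000_000)
        + PER_K_IMAGES * input_images / Decimal(1_000)
    )


if __name__ == "__main__":
    # Example: 2,000 input tokens, 500 output tokens, one image.
    print(estimate_cost(2_000, 500, 1))
```

Under these assumptions a single image costs as much as 20,000 input tokens, so image count tends to dominate the cost of short multimodal prompts.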
Google's flagship multimodal model. It accepts images and video within text or chat prompts and returns a text or code response.
See the benchmarks and prompting guidelines from DeepMind.
Usage of Gemini is subject to Google's Gemini Terms of Use.
#multimodal