Nous: Hermes 2 Mistral 7B DPO

nousresearch/nous-hermes-2-mistral-7b-dpo

Updated Feb 21
32,768 context
$0.18/M input tokens
$0.18/M output tokens
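
As a rough illustration of what the listed rates mean in practice, the sketch below estimates the cost of a single request. It assumes the two $0.18 per million token rates above are the only charges and that billing is linear in token count. The function name `estimate_cost_usd` and the example token counts are hypothetical and not part of any provider API.

```python
from decimal import Decimal

# Rates copied from the listing above, in USD per million tokens.
INPUT_USD_PER_M = Decimal("0.18")
OUTPUT_USD_PER_M = Decimal("0.18")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request (hypothetical helper).

    Assumes purely linear per-token pricing with no minimum charge.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / TOKENS_PER_M


if __name__ == "__main__":
    # Example: a 10,000-token prompt with a 2,000-token reply.
    print(estimate_cost_usd(10_000, 2_000))
```

Decimal is used instead of float so that small per-token amounts do not pick up binary rounding noise.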

This is the flagship 7B Hermes model, a Direct Preference Optimization (DPO) fine-tune of Teknium/OpenHermes-2.5-Mistral-7B. It shows improvement on all benchmarks tested: AGIEval, BigBench Reasoning, GPT4All, and TruthfulQA.

Before DPO, the model was trained on 1,000,000 instructions/chats of GPT-4 quality or better, primarily synthetic data along with other high-quality datasets.