Nous: Hermes 2 Mistral 7B DPO
nousresearch/nous-hermes-2-mistral-7b-dpo
Updated Feb 218,192 context
This is the flagship 7B Hermes model, a Direct Preference Optimization (DPO) of Teknium/OpenHermes-2.5-Mistral-7B. It shows improvement across the board on all benchmarks tested - AGIEval, BigBench Reasoning, GPT4All, and TruthfulQA.
The model prior to DPO was trained on 1,000,000 instructions/chats of GPT-4 quality or better, primarily synthetic data as well as other high quality datasets.