moffkalast 10 months ago | on: Show HN: LLMs Playing Mafia games – See them lie, ...

Wow, I'm surprised to see Mistral 24B that high up, or on this chart at all, with NeMo at the absolute bottom. Maybe they accidentally mislabeled the ratings, because I sure haven't seen the 24B hold a coherent conversation beyond half a dozen back-and-forth messages without having a mental breakdown and starting to repeat itself like Howard Hughes.

We definitely need to run many more simulations to get an accurate dashboard.