Wow, I'm surprised to see Mistral 24B that high up, or on this chart at all, with NeMo at the absolute bottom. Maybe they accidentally mislabeled the ratings, because I sure haven't seen the 24B hold a coherent conversation beyond half a dozen back-and-forth messages without having a mental breakdown and starting to repeat itself like Howard Hughes.


We definitely need to run many more simulations to get an accurate dashboard.



