
The article you linked says that GPT-4 performed better than crowdsourced workers, not better than experts. The experts performed better than GPT-4 in all but 1 or 2 cases. And in my experience with Mechanical Turk, the workers there are often barely better than random chance.


Fair on the wording, I suppose, but:

First of all, the dataset used for evaluation was created by those researchers, which weights it in their favor.

Second, GPT-4 still performs better in 6 of those. Hardly 1 or 2. And when it doesn't, it's usually very close.

All of this is to say that GPT-4 will smoke any bespoke NLP model/API, which is the main point.



