The article you linked says that GPT-4 performed better than crowdsourced workers, not better than experts. The experts outperformed GPT-4 in all but one or two cases. And in my experience with Mechanical Turk, the workers there are often barely better than random chance.