But there are fairly good models for NER that are not LLMs: open-source models you can even run on a CPU, with parameter counts in the hundreds of millions, not billions.
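As a concrete sketch of what that looks like, assuming the `transformers` library and the roughly 110M-parameter `dslim/bert-base-NER` checkpoint (both my picks, not something from this thread):

```python
# CPU-only NER with a small open model -- a sketch, assuming
# `transformers` is installed and the dslim/bert-base-NER
# checkpoint (~110M parameters) is available.
from transformers import pipeline

def run_ner(text: str):
    # device=-1 forces CPU; aggregation_strategy="simple" merges
    # word-piece tokens back into whole entity spans.
    ner = pipeline(
        "ner",
        model="dslim/bert-base-NER",
        aggregation_strategy="simple",
        device=-1,
    )
    return ner(text)

# Example: run_ner("Tim Cook announced new products in Cupertino.")
# yields spans tagged PER, ORG, LOC, etc., each with a confidence score.
```

Nothing fancy, and inference latency on a laptop CPU is entirely workable for many pipelines.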
The article you linked says that GPT-4 performed better than crowdsourced workers, not experts. The experts beat GPT-4 in all but 1 or 2 cases. And in my experience with Mechanical Turk, its workers are often barely better than random chance.
While true, GPT-4 kinda just gets a lot of the classic NLP tasks, such as NER, right with zero fine-tuning and minimal prompt engineering (or whatever you want to call it). I haven't done an extensive study, but I do NLP daily as part of my current job. I often reach for GPT-4 now, and so far it does a better job than any other pretrained models or ones I've trained/fine-tuned, at least on the data I work with.
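For anyone who hasn't tried it, a zero-shot NER prompt really can be just an instruction plus the text. A minimal sketch: the prompt wording, label set, and example response below are my own illustrative assumptions, and in a real setup the prompt would be sent to the OpenAI chat completions API (here a canned response stands in so the parsing step runs end to end):

```python
import json

def build_ner_prompt(text: str) -> str:
    # Zero-shot: no examples, just an instruction and an output format.
    return (
        "Extract the named entities from the text below. Respond with a "
        'JSON list of {"text": ..., "label": ...} objects, using the '
        "labels PER, ORG, LOC, or MISC.\n\n"
        f"Text: {text}"
    )

prompt = build_ner_prompt("Satya Nadella spoke at Microsoft Build in Seattle.")

# Canned stand-in for what the model would return:
canned_response = (
    '[{"text": "Satya Nadella", "label": "PER"}, '
    '{"text": "Microsoft Build", "label": "MISC"}, '
    '{"text": "Seattle", "label": "LOC"}]'
)
entities = json.loads(canned_response)
labels = [e["label"] for e in entities]  # ["PER", "MISC", "LOC"]
```

The whole "model" is the prompt, which is exactly why it's so convenient to reach for.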
But what about cost? There was a recent article saying that DoorDash makes 40 billion predictions per day, which would come to 40 million dollars per day if using GPT-4.
Sure, GPT-4 is great for experimenting, and I often try it out, but at the end of the day, when deploying a widely used model, the cost-benefit analysis will favor bespoke models a lot of the time.