Open source LLMs might do that, but I very much doubt those models will be small enough to run even on high-end consumer hardware (like, say, an RTX 3090 or 4090).
The way they'll do it, if they do it at all, is to find a way to squeeze the capability into smaller models and get much faster at executing them. That's where the market forces are.
That's exactly the core of the email that leaked out of Google: it's proving far better to have lots of people iterating quickly (which necessarily means broad access to the necessary hardware) than to rely on massive models and bespoke hardware.
I'd anticipate something along the lines of a breakthrough in guided model shrinking, or some trick in partial model application that radically reduces the number of calculations needed. Otherwise, whatever does happen is less likely to come out of the open source LLM community.
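To make "guided model shrinking" concrete, here's a minimal sketch of one existing technique in that family: global magnitude pruning with PyTorch's built-in pruning utilities. The model and the 50% sparsity target are placeholders of my own, not anything from the thread or the leaked memo.

```python
# Sketch: "guided shrinking" via global magnitude pruning (placeholder model).
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(          # stand-in for a transformer MLP block
    nn.Linear(1024, 4096),
    nn.GELU(),
    nn.Linear(4096, 1024),
)

# Zero out the 50% of weights with the smallest magnitude, measured
# globally across all Linear layers rather than per layer.
parameters_to_prune = [
    (m, "weight") for m in model.modules() if isinstance(m, nn.Linear)
]
prune.global_unstructured(
    parameters_to_prune,
    pruning_method=prune.L1Unstructured,
    amount=0.5,
)

# Fold the pruning masks into the weights permanently.
for module, name in parameters_to_prune:
    prune.remove(module, name)

total = sum(p.numel() for p in model.parameters())
zeros = sum((p == 0).sum().item() for p in model.parameters())
print(f"sparsity: {zeros / total:.1%}")
```

Unstructured pruning like this only pays off in wall-clock terms with sparse kernels or structured variants, which is exactly why a real breakthrough in reducing the calculations needed would matter so much.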
> it's proving far better to have lots of people iterating quickly (which necessarily means broad access to the necessary hardware) than to rely on massive models and bespoke hardware
Very true, but can't Google just wait, take the findings from the open-source LLM community, and then quickly update their models on their huge clusters? It's not like they will lose the top position; they've already done that.
Yes and no. Some of the optimisation techniques being researched at the moment use the output of larger models to fine-tune smaller ones, and that sort of improvement obviously only flows one way. The same goes for quantising a model beyond the point where the network is trainable. But anything that helps smaller models run faster without appealing to properties of a bigger model that has to already exist? Absolutely yes.
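For anyone unfamiliar with the "use the big model's output to fine-tune a smaller one" idea, here's a minimal sketch of classic knowledge distillation, where a student is trained to match a teacher's softened output distribution. The tiny Linear "teacher" and "student", the temperature, and the learning rate are all illustrative stand-ins I've made up, not a real setup; the point is the one-way dependency on an already-existing larger model.

```python
# Sketch: knowledge distillation loss (toy teacher/student stand-ins).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions."""
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradients keep roughly the same magnitude as a hard-label loss.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature**2

teacher = torch.nn.Linear(128, 1000).eval()   # stand-in for the large, frozen model
student = torch.nn.Linear(128, 1000)          # stand-in for the small model being trained
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)

x = torch.randn(8, 128)                       # a toy batch of inputs
with torch.no_grad():
    teacher_logits = teacher(x)               # the one-way dependency: requires the big model
loss = distillation_loss(student(x), teacher_logits)
loss.backward()
optimizer.step()
```

The small model improves only because the big model already exists, so the gain can't flow back the other way, which is the asymmetry being described above.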
I'll give you the answer for every open source model over the next 2 years: it's far worse.