
> How does this compare to GPT-4?

I'll give you the answer for every open source model over the next 2 years: It's far worse



If you'd said that about OpenAI's DALL-E 2, you'd have been wrong.

I suspect open source LLMs will outpace the release version of GPT-4 before the end of this year.

It's less likely they will outpace whatever version of GPT-4 is shipped later this year, but still very much possible.



Open source LLMs might do that, but I very much doubt those models will be small enough to run even on high-end consumer hardware (like, say, an RTX 3090 or 4090).


The way they'll do it, if they do it at all, is to find a way to squeeze the capability into smaller models and get much faster at executing them. That's where the market forces are.

That's exactly the core of the email that leaked out of Google: it's proving far better to be able to have lots of people iterating quickly (which necessarily means broad access to the necessary hardware) than to rely on massive models and bespoke hardware.

I'd anticipate something along the lines of a breakthrough in guided model shrinking, or some trick in partial model application that radically reduces the number of calculations needed. Otherwise, whatever progress happens is less likely to come out of the open source LLM community.
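
For illustration, here's a rough sketch of the best-known version of that "shrinking" idea, knowledge distillation: train a small student model to imitate the output distribution of a frozen, larger teacher. The function name, the HuggingFace-style model interface (models returning .logits and .loss), and a shared vocabulary between teacher and student are assumptions made for the sake of the example, not a specific recipe:

    import torch
    import torch.nn.functional as F

    def distillation_step(student, teacher, batch, optimizer, T=2.0, alpha=0.5):
        # One training step nudging a small "student" model toward the
        # output distribution of a frozen, larger "teacher" model.
        with torch.no_grad():
            teacher_logits = teacher(input_ids=batch["input_ids"]).logits
        student_out = student(input_ids=batch["input_ids"], labels=batch["labels"])
        # Soft targets: KL divergence between temperature-scaled distributions.
        soft_loss = F.kl_div(
            F.log_softmax(student_out.logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        # Blend with the ordinary next-token cross-entropy loss on real labels.
        loss = alpha * soft_loss + (1 - alpha) * student_out.loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

The point of the sketch is that the expensive teacher is only needed once, at training time; the artifact people actually run afterwards is the small student.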


> it's proving far better to be able to have lots of people iterating quickly (which necessarily means broad access to the necessary hardware) than to rely on massive models and bespoke hardware

Very true, but can't Google just wait, take the findings from the open-source LLM community, and then quickly apply them to their own models on their huge clusters? It's not like they will lose the top position; they've already done that.


Yes and no. Some of the optimisation techniques that are being researched at the moment use the output of larger models to fine-tune smaller ones, and that sort of improvement can obviously only be one-way. Same with quantising a model beyond the point where the network is trainable. But anything that helps smaller models run faster without appealing to properties of a bigger model that has to already exist? Absolutely yes.
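
To make the quantisation point concrete, here's a toy sketch of what low-bit quantisation does to a single weight matrix. Real schemes quantise per-channel or per-block and use calibration data, so this is only meant to show why the reconstruction error grows as the bit-width drops, which is also why very low-bit models stop being usefully trainable:

    import torch

    def fake_quantize(w: torch.Tensor, bits: int) -> torch.Tensor:
        # Symmetric per-tensor quantisation: snap float weights onto a small
        # integer grid, then map them back to floats. Fewer bits means fewer
        # representable values and therefore more reconstruction error.
        qmax = 2 ** (bits - 1) - 1          # 127 for int8, 7 for int4, 1 for int2
        scale = w.abs().max() / qmax
        q = torch.clamp(torch.round(w / scale), -qmax, qmax)
        return q * scale

    w = torch.randn(4096, 4096)             # stand-in for one transformer weight matrix
    for bits in (8, 4, 2):
        err = (w - fake_quantize(w, bits)).abs().mean().item()
        print(f"{bits}-bit quantization, mean abs weight error: {err:.4f}")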


That seems way off the mark.

Open source models can already approximate GPT-3.5 for most tasks on common home hardware.
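
For context, this is roughly what "running an open model on home hardware" looks like in practice. The specific model name and the choice to load it in 8-bit via bitsandbytes so a 7B model fits on a single consumer GPU are just one plausible setup, not a recommendation:

    # Assumes the transformers, accelerate and bitsandbytes packages are installed
    # and a single consumer GPU (e.g. a 3090/4090) is available.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "openlm-research/open_llama_7b"   # example open model; pick your own
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        load_in_8bit=True,    # 8-bit weights: a 7B model fits in roughly 8 GB of VRAM
        device_map="auto",
    )

    prompt = "Explain the difference between a process and a thread."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=200)
    print(tokenizer.decode(output[0], skip_special_tokens=True))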


Okay, so "ignore my out-of-touch opinion of language models". Got it.



