Hacker News

I don't really think this is true. You can't really extrapolate the strengths and weaknesses of bigger models from the behavior of smaller/quantized models, and in fact a lot of small models are actually great at lots of things, and some are better at creative writing. If you want to know how they work, just learn how they work; it takes like 5 hours of watching YouTube videos if you're a programmer.


Sure, you can't extrapolate the strengths and weaknesses of the larger ones from the smaller ones - but you still get a much firmer idea of what "they're fancy autocomplete" actually means.

If nothing else it does a great job of demystifying them. They feel a lot less intimidating once you've seen a small one running on your computer write a terrible haiku and hallucinate some non-existent API methods.
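For anyone who wants to see what that looks like concretely, here's a minimal sketch of the token-by-token loop that "fancy autocomplete" refers to. It assumes the Hugging Face transformers and torch packages, and uses gpt2 purely as a stand-in for whatever small model fits on your laptop:

    # A minimal sketch of the "fancy autocomplete" loop: greedy next-token
    # prediction with a tiny local model. Assumes `transformers` and `torch`
    # are installed; "gpt2" is just a stand-in small model.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    prompt = "Write a haiku about autumn leaves:"
    ids = tokenizer(prompt, return_tensors="pt").input_ids

    with torch.no_grad():
        for _ in range(40):                      # generate up to 40 tokens
            logits = model(ids).logits           # scores for every vocab token
            next_id = logits[0, -1].argmax()     # greedy: take the most likely one
            ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)

    print(tokenizer.decode(ids[0], skip_special_tokens=True))

Run it and you'll most likely get something that only vaguely resembles a haiku, which is kind of the point.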


It's funny that you say this, because the first thing I tried after ChatGPT came out (3.5-turbo, was it?) was writing a haiku. It couldn't do it at all. Also, after 4 came out, it hallucinated an API that wasted a day for me, an API that absolutely should have existed, but didn't. Now I frequently apply LLMs to things that are easily verifiable, and just double-check everything.


>but you still get a much firmer idea of what "they're fancy autocomplete" actually means.

Interesting how you can have the same experience and come to opposite conclusions.

Seeing so many failure modes of the smaller models fall by the wayside as compute goes brrr just made me realize how utterly meaningless that phrase is.



