Anyone who uses these models for more than ten minutes will immediately realize that they're really, really bad compared to other free, OSS models. Even Phi-2 was giving me "on par" results, and it's a model in a completely different league.
Many models are being released now, which is good for keeping OpenAI on their toes so they don't mess up, but, truth be told, I've yet to see _any_ OSS model I can run on my machine that's as good as ChatGPT-3 (not 3.5, not 4, but the original one from when everyone went crazy).
My hopes for ChatGPT-3.5-level quality on consumer hardware within 2024 probably lie with whatever Meta keeps building.
Google was great, once. Now they're a mere bystander in the larger scheme of things. I think that's a good thing. Everything in the world is cyclic and ephemeral; Google enjoyed their time while it lasted, but newer and better things are coming, and will keep on coming.
PS: Completely unrelated, but Gmail is now the only Google product I actively use. I genuinely don't remember the last time I did a Google Search... When I need to do my own digging, I use Phind these days.
Times are changing and that's great for tech and future generations joining the field and workforce!
Yi 34B 200K finetunes (like Tess 1.5), DeepSeek Coder 33B, and Miqu 70B definitely outpace ChatGPT-3.5, at least for me.
They don't have the augmentations that come with being a service, but they're generally smarter, have bigger context windows, and (perhaps most importantly) are truly unbound.
I am on a single-3090 desktop, for reference. Admittedly, that's a much more expensive setup now than it was a few months ago, given the insane prices used 3090s are going for.
Damn, I see. How many tokens per second do you get on that setup?
On an M2 MacBook I get ~10-12 tokens/sec, which is a tad too slow for continued/daily use, but if I decide it's worth it I might invest in a more powerful machine soon-ish!
On 33B/34B models I get 35 tokens/sec, way faster than I can read as it streams in. At huge contexts (like 30K-74K), prompt processing takes forever and token generation is slower, but it's still faster than I can read.
Miqu 70B is slow (less than 10 tok/sec, I think) because I have to split it between GPU and CPU with llama.cpp (see the sketch below). I only use it for short-context questions where I need a bit more intelligence.
And for reference, this is an SFF desktop! It's no MacBook, but it's still small enough (10L and flat) for me to fly with as a carry-on.
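For anyone wondering what "splitting" looks like in practice, here's a minimal sketch using the llama-cpp-python bindings; the model filename is a placeholder and the layer count is something you'd tune to your own VRAM:

    from llama_cpp import Llama

    # Placeholder GGUF path; n_gpu_layers controls how many transformer
    # layers are offloaded to VRAM, with the remainder running on the CPU
    # (which is where the <10 tok/sec comes from).
    llm = Llama(
        model_path="./miqu-1-70b.q4_k_m.gguf",  # placeholder filename
        n_gpu_layers=40,  # tune to fit your VRAM; -1 offloads everything
        n_ctx=4096,
    )
    out = llm("Q: What is a mixture-of-experts model?\nA:", max_tokens=128)
    print(out["choices"][0]["text"])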
If Mixtral isn't outperforming ChatGPT-3, you're configuring it wrong. It gives somewhat terse answers by default, but you can easily prompt it to spit out the wordy answers ChatGPT has been aligned to prefer (see the example below).
Mixtral, aka the 8x7B "sparse mixture of experts" one, is not the same as, e.g., Mistral-7B, which is still very, very good, just not quite hitting the mark on some things.
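Something like this usually does it for me; the system prompt wording is just my own, and the model path is a placeholder (llama-cpp-python again):

    from llama_cpp import Llama

    llm = Llama(
        model_path="./mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf",  # placeholder
        n_ctx=8192,
        n_gpu_layers=-1,
    )
    resp = llm.create_chat_completion(
        messages=[
            # The system prompt does the heavy lifting here; without it,
            # Mixtral tends to answer in a sentence or two.
            {"role": "system", "content": "You are a helpful assistant. "
             "Answer thoroughly, with step-by-step explanations and examples."},
            {"role": "user", "content": "Explain how B-trees work."},
        ],
        max_tokens=1024,
    )
    print(resp["choices"][0]["message"]["content"])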
I still couldn't run Mixtral 8x7B on an M1 MacBook Pro with 32GB RAM, so maybe I am indeed doing it wrong? Or are there better quantized versions available now, or..?
But GGUF Mixtral should fit in 32GB... just not with the full 32K context. Long context is very memory intensive in llama.cpp, at least until they fully implement flash attention and a quantized cache.
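Concretely, the trick is capping the context window at load time. A rough sketch with llama-cpp-python (the filename is a placeholder; a Q3_K_M quant of Mixtral is roughly 20GB on disk, and the KV cache grows with n_ctx on top of that):

    from llama_cpp import Llama

    llm = Llama(
        model_path="./mixtral-8x7b-instruct-v0.1.Q3_K_M.gguf",  # placeholder
        n_ctx=4096,       # the full 32768 context is what blows past 32GB
        n_gpu_layers=-1,  # on Apple Silicon, a Metal build offloads to the GPU
    )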
> I've yet to see _any_ OSS model I can run on my machine that's as good as ChatGPT-3 (not 3.5, not 4, but the original one from when everyone went crazy).
It depends on your machine, I guess, but IMO there are definitely OSS models out there that rival the original ChatGPT offering for certain use cases (Dolphin Mixtral comes to mind). Pairing a model with RAG capability is going to make a huge difference in the quality of the answers, as well.
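For clarity, RAG here just means "retrieve relevant text, then stuff it into the prompt." A toy sketch, with crude keyword overlap standing in for a real embedding model:

    from collections import Counter

    docs = [
        "llama.cpp runs GGUF quantized models on consumer hardware.",
        "Mixtral 8x7B is a sparse mixture-of-experts model.",
        "The llama.cpp KV cache grows with the context length.",
    ]

    def score(query: str, doc: str) -> int:
        # Crude keyword overlap; a real setup would use embeddings
        # and cosine similarity instead.
        q, d = Counter(query.lower().split()), Counter(doc.lower().split())
        return sum((q & d).values())

    def retrieve(query: str, k: int = 1) -> list[str]:
        return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

    query = "Why does long context use so much memory?"
    context = "\n".join(retrieve(query))
    prompt = f"Context: {context}\nQuestion: {query}\nAnswer:"
    # 'prompt' would then go to the local model, e.g. llm(prompt, max_tokens=256)
    print(prompt)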