
I hope someone will soon post a quantized version that I can run on my MacBook Pro.


Ollama has released the quantized version.

https://ollama.ai/library/codellama:70b https://x.com/ollama/status/1752034686615048367?s=20

Just need to run `ollama run codellama:70b` - pretty fast on a MacBook.
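Once the model is pulled, Ollama also serves a local HTTP API (default port 11434), so you can script against it instead of using the interactive prompt. A minimal sketch in Python, standard library only - the prompt string here is just a placeholder:

    import json
    import urllib.request

    # Ask the locally running Ollama server for a single completion.
    # Assumes `ollama run codellama:70b` (or `ollama pull codellama:70b`)
    # has already downloaded the model.
    body = json.dumps({
        "model": "codellama:70b",
        "prompt": "Write a Python function that reverses a string.",
        "stream": False,  # return one JSON object instead of a token stream
    }).encode()

    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])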


Really? What kind of MacBook Pro do you need to run it fast? Will an M1 with 16GB RAM work, or do we need something super beefy like an M2 fully decked out with 96GB RAM to make it run?


An M1 with 16GB RAM will barely run codellama:13b.


I'm trying to understand how this works... does it actually run the model on the MacBook Pro? Sorry, I am totally new to this.


Yes, it runs a quantized [1] version of the model locally. Quantization uses low-precision data types to represent the weights and activations (e.g., 8-bit integers instead of 32-bit floats). The specific model published by Ollama uses 4-bit quantization [2], which is why it is able to run on a MacBook Pro. (A toy sketch of the idea follows the links below.)

If you want to try it out, this blog post[3] shows how to do it step by step - pretty straightforward.

[1] https://huggingface.co/docs/optimum/concept_guides/quantizat...

[2] https://ollama.ai/library/codellama:70b

[3] https://annjose.com/post/run-code-llama-70B-locally/
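To make "low-precision representation" concrete, here is a toy sketch of symmetric 4-bit round-to-nearest quantization in Python with numpy. The real llama.cpp/Ollama formats (Q4_0, Q4_K, etc.) quantize weights in small blocks with per-block scales, so this only illustrates the idea, not the actual scheme:

    import numpy as np

    # Toy symmetric 4-bit quantization of a small weight matrix.
    w = np.random.randn(4, 8).astype(np.float32)      # "full precision" weights

    scale = np.abs(w).max() / 7                       # signed 4-bit range is -8..7
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)   # quantized integers
    w_hat = q.astype(np.float32) * scale              # dequantized approximation

    print("max abs error:", float(np.abs(w - w_hat).max()))
    print("fp32 bytes:", w.nbytes, "-> ~4-bit bytes:", q.size // 2 + 4)  # packed nibbles + one scale

Each weight shrinks from 4 bytes to roughly half a byte plus a shared scale, at the cost of a small rounding error per weight - which is why a 70B model can fit on a laptop at all.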


Thanks, I got all but the 70b model to work. It slows to a crawl on a Mac with 36 GB RAM.


Cool. I am running this on an M2 Max with 64GB. Here is how it looks in my terminal [1]. Btw, the very first run after downloading the model is slightly slow, but subsequent runs are fine.

[1] https://asciinema.org/a/fFbOEfeTxRShBGbqslwQMfJS4 (Note: this recording is at real-time speed, not sped up.)


I am returning the M3 Max 36 GB and picking this model instead. Saves me a grand and it seems to be much more capable.


What do you mean you are returning it? It has been used already.


    ollama run codellama:70b
    pulling manifest
    pulling 1436d66b6   1.1 GB / 38 GB   24 MB/s   25m21s


Do you know how much VRAM is required?


If you are asking about Apple silicon Macs, they have integrated GPUs and no dedicated graphics memory - the CPU and GPU share unified memory (UMA).

To run the 4-bit quantized model with 70B parameters, you will need around 35 GB of RAM just to load it into memory. So I would say a Mac with at least 48 GB of memory - that is an M3 Max. (Rough math below.)
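The 35 GB figure is just parameter-count arithmetic, weights only - the KV cache, context, and runtime overhead come on top of it, which is why the Ollama download above shows ~38 GB:

    # Rough memory estimate for a 4-bit quantized 70B model (weights only).
    params = 70e9              # 70 billion parameters
    bits_per_weight = 4        # 4-bit quantization

    bytes_total = params * bits_per_weight / 8
    print(f"{bytes_total / 1e9:.0f} GB  (~{bytes_total / 2**30:.1f} GiB)")
    # -> 35 GB (~32.6 GiB), before KV cache and other runtime overhead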



