Really? What kind of MacBook Pro do you need to run it fast? Will an M1 with 16GB RAM work, or do we need something super beefy like an M2 fully decked out with 96GB RAM to make it run?
Yes, it runs a quantized [1] version of the model locally. Quantization uses low-precision data types to represent the weights and activations (e.g. 8-bit integers instead of 32-bit floats), which shrinks the memory footprint. The specific model published by Ollama uses 4-bit quantization [2], and that's why it is able to run on a MacBook Pro.
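For intuition, here's a toy numpy sketch of what 8-bit quantization does. Real schemes (including the 4-bit one Ollama ships) quantize per block with fancier scaling, so treat this purely as an illustration:

    import numpy as np

    # Toy example: symmetric 8-bit quantization of fp32 weights.
    weights = np.random.randn(4, 4).astype(np.float32)

    # One scale factor for the whole tensor (real schemes use per-block scales).
    scale = np.abs(weights).max() / 127.0
    q_weights = np.round(weights / scale).astype(np.int8)   # 1 byte per weight instead of 4

    # At inference time the runtime dequantizes (or computes directly on the ints).
    dequantized = q_weights.astype(np.float32) * scale
    print("max reconstruction error:", np.abs(weights - dequantized).max())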
If you want to try it out, this blog post [3] shows how to do it step by step - pretty straightforward. A rough sketch of scripting it is below.
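If you'd rather drive it from a script instead of the interactive terminal, Ollama also exposes a local HTTP API. A minimal sketch, assuming the default port and that you've already pulled a model (the model tag and prompt here are just examples):

    import requests

    # Assumes Ollama is installed and running, and the model has been pulled,
    # e.g. with `ollama pull llama2:70b` (the tag is just an example).
    resp = requests.post(
        "http://localhost:11434/api/generate",   # Ollama's default local endpoint
        json={
            "model": "llama2:70b",
            "prompt": "Explain 4-bit quantization in one sentence.",
            "stream": False,
        },
        timeout=600,
    )
    print(resp.json()["response"])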
Cool. I am running this on an M2 Max with 64GB. Here is how it looks in my terminal [1]. Btw, the very first run after downloading the model is slightly slow (the weights have to be loaded from disk into memory), but subsequent runs are fine.
If you are asking about Apple silicon based Macs, they have integrated GPUs and no dedicated graphics memory; the CPU and GPU share a unified memory pool (UMA), so the model just needs to fit in system RAM.
For running a 4-bit quantized model with 70B parameters, you will need around 35GB of RAM just to load the weights into memory (70B parameters x 0.5 bytes each ≈ 35GB). So I would say a Mac with at least 48GB of memory - that means an M3 Max.
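Back-of-the-envelope version of that math (the 20% overhead factor is just my rough guess for KV cache, activations and runtime, not a measured number):

    # Memory estimate for a 4-bit quantized 70B model.
    params = 70e9              # parameter count
    bits_per_param = 4         # 4-bit quantization
    overhead = 1.2             # rough guess: KV cache / activations / runtime

    weights_gb = params * bits_per_param / 8 / 1e9
    print(f"weights alone : ~{weights_gb:.0f} GB")              # ~35 GB
    print(f"with overhead : ~{weights_gb * overhead:.0f} GB")   # ~42 GB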