
Co-author here. Happy to answer any questions or take any feedback on this project.


This discussion was fantastic - great flow, no filler words.

I loved the passion and fun they brought to talking about the company, the business, marketplace incentives, Ruby on Rails, and mechanical keyboards! It was fun watching them have fun.


Ollama has released the quantized version.

https://ollama.ai/library/codellama:70b

https://x.com/ollama/status/1752034686615048367?s=20

You just need to run `ollama run codellama:70b` - it's pretty fast on a MacBook.
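
If you'd rather call it from code than from the terminal, here is a minimal Python sketch that sends one prompt to the local Ollama server's /api/generate endpoint (assuming the default address localhost:11434, that the model has already been pulled, and an example prompt of my choosing):

  import json
  import urllib.request

  # Illustrative prompt; any coding question works.
  payload = {
      "model": "codellama:70b",
      "prompt": "Write a Python function that reverses a string.",
      "stream": False,  # ask for one JSON object instead of a token stream
  }

  req = urllib.request.Request(
      "http://localhost:11434/api/generate",   # default local Ollama endpoint
      data=json.dumps(payload).encode("utf-8"),
      headers={"Content-Type": "application/json"},
  )

  with urllib.request.urlopen(req) as resp:
      print(json.loads(resp.read())["response"])  # the generated text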


Really? What kind of MacBook Pro do you need to run it fast? Will an M1 with 16GB RAM work, or do we need something super beefy like an M2 fully decked out with 96GB RAM to make it run?


An M1 with 16GB RAM will barely run codellama:13b.


I'm trying to understand how this works... does it actually run the model on the MacBook Pro? Sorry, I am totally new to this.


Yes, it runs a quantized [1] version of the model locally. Quantization represents the weights and activations with low-precision data types (e.g., 8-bit integers instead of 32-bit floats). The specific model published by Ollama uses 4-bit quantization [2], which is why it is able to run on a MacBook Pro (there is a toy sketch of the idea after the references below).

If you want to try it out, this blog post [3] shows how to do it step by step - pretty straightforward.

[1] https://huggingface.co/docs/optimum/concept_guides/quantizat...

[2] https://ollama.ai/library/codellama:70b

[3] https://annjose.com/post/run-code-llama-70B-locally/
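
To make the quantization idea concrete, here is a toy Python sketch of symmetric 4-bit quantization with a single per-tensor scale - purely illustrative, not the actual scheme the Ollama/GGUF models use:

  import numpy as np

  def quantize_4bit(w: np.ndarray):
      # One scale per tensor; signed 4-bit integers cover -8..7.
      scale = np.abs(w).max() / 7.0
      q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
      return q, scale

  def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
      # Recover approximate float weights at inference time.
      return q.astype(np.float32) * scale

  w = np.random.randn(8).astype(np.float32)
  q, s = quantize_4bit(w)
  print(w)
  print(dequantize(q, s))  # close to w; real runtimes pack two 4-bit values
                           # per byte, roughly 1/8 the size of float32 weights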


Thanks, I got all but the 70B model to work. It slows to a crawl on the Mac with 36 GB of RAM.


Cool. I am running this on an M2 Max with 64GB. Here is how it looks in my terminal [1]. Btw, the very first run after downloading the model is slightly slow, but subsequent runs are fine.

[1] https://asciinema.org/a/fFbOEfeTxRShBGbqslwQMfJS4 (Note: this recording is at real-time speed, not sped up.)


I am returning the M3 Max 36 GB and picking this model instead. Saves me a grand and it seems to be much more powerful.


What do you mean you are returning it? It has been used already.


  ollama run codellama:70b
  pulling manifest
  pulling 1436d66b6   1.1 GB / 38 GB   24 MB/s   25m21s


Do you know how much VRAM is required?


If you are asking about Apple Silicon Macs, they have integrated GPUs and do not have dedicated graphics memory; the GPU shares unified memory (UMA) with the CPU.

To run the 4-bit quantized model with 70B parameters, you will need around 35 GB of RAM just to load it into memory. So I would say a Mac with at least 48 GB of memory - that is the M3 Max.
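
That 35 GB is just parameter count times bits per weight; a quick back-of-the-envelope check in Python (weights only, ignoring the KV cache and other runtime overhead):

  def model_size_gb(params_billion: float, bits_per_weight: int) -> float:
      # bytes = params * bits / 8; divide by 1e9 for (decimal) gigabytes
      return params_billion * 1e9 * bits_per_weight / 8 / 1e9

  for bits in (16, 8, 4):
      print(f"70B at {bits:>2}-bit: ~{model_size_gb(70, bits):.0f} GB")
  # 70B at 16-bit: ~140 GB
  # 70B at  8-bit: ~70 GB
  # 70B at  4-bit: ~35 GB  -> hence "a Mac with at least 48 GB of memory"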


The TEALS program looks very promising - thank you for suggesting it here. I just filled out the application for a teaching assistant role (not confident enough to sign up for teaching right away). Looking forward to hearing from them.


And the site is down: "Due to the large wave of new users joining over the past several days, we have encountered technical issues which have left many experiencing service interruptions. We thank you for your patience and encouragement as we work to make Vero available to everyone."


I use VSCode for React Native projects and absolutely love its built-in debugging capabilities. With its clean and vivid UI, it makes my code look beautiful.

