
Co-author here. Happy to answer any questions or take any feedback on this project.


This discussion was fantastic - great flow, no filler words.

I loved the passion and fun they brought to talking about the company, the business, marketplace incentives, Ruby on Rails, and mechanical keyboards! It was fun watching them have fun.


Ollama has released the quantized version.

https://ollama.ai/library/codellama:70b

https://x.com/ollama/status/1752034686615048367?s=20

You just need to run `ollama run codellama:70b` - it's pretty fast on a MacBook.
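
If you'd rather call it from code than from the terminal, here is a minimal Python sketch that sends one prompt to the local Ollama server's /api/generate endpoint (assuming the default address localhost:11434, that the model has already been pulled, and an example prompt of my choosing):

  import json
  import urllib.request

  # Illustrative prompt; any coding question works.
  payload = {
      "model": "codellama:70b",
      "prompt": "Write a Python function that reverses a string.",
      "stream": False,  # ask for one JSON object instead of a token stream
  }

  req = urllib.request.Request(
      "http://localhost:11434/api/generate",   # default local Ollama endpoint
      data=json.dumps(payload).encode("utf-8"),
      headers={"Content-Type": "application/json"},
  )

  with urllib.request.urlopen(req) as resp:
      print(json.loads(resp.read())["response"])  # the generated text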


Really? What kind of MacBook Pro do you need to run it fast? Will an M1 with 16GB RAM work, or do we need something super beefy like an M2 fully decked out with 96GB RAM to make it run?


An M1 with 16GB RAM will barely run codellama:13b.


I'm trying to understand how this works... does it actually run the model on the MacBook Pro? Sorry, I am totally new to this.


Yes, it runs a quantized [1] version of the model locally. Quantization represents the weights and activations with low-precision data types (e.g., 8-bit integers instead of 32-bit floats). The specific model published by Ollama uses 4-bit quantization [2], which is why it is able to run on a MacBook Pro (there is a toy sketch of the idea after the references below).

If you want to try it out, this blog post [3] shows how to do it step by step - pretty straightforward.

[1] https://huggingface.co/docs/optimum/concept_guides/quantizat...

[2] https://ollama.ai/library/codellama:70b

[3] https://annjose.com/post/run-code-llama-70B-locally/
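
To make the quantization idea concrete, here is a toy Python sketch of symmetric 4-bit quantization with a single per-tensor scale - purely illustrative, not the actual scheme the Ollama/GGUF models use:

  import numpy as np

  def quantize_4bit(w: np.ndarray):
      # One scale per tensor; signed 4-bit integers cover -8..7.
      scale = np.abs(w).max() / 7.0
      q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
      return q, scale

  def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
      # Recover approximate float weights at inference time.
      return q.astype(np.float32) * scale

  w = np.random.randn(8).astype(np.float32)
  q, s = quantize_4bit(w)
  print(w)
  print(dequantize(q, s))  # close to w; real runtimes pack two 4-bit values
                           # per byte, roughly 1/8 the size of float32 weights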


Thanks, I got all but the 70B model to work. It slows to a crawl on the Mac with 36 GB of RAM.


Cool. I am running this on an M2 Max with 64GB. Here is how it looks in my terminal [1]. Btw, the very first run after downloading the model is slightly slow, but subsequent runs are fine.

[1] https://asciinema.org/a/fFbOEfeTxRShBGbqslwQMfJS4 (Note: this recording is at real-time speed, not sped up.)


I am returning the M3 Max 36 GB and picking this model instead. Saves me a grand and it seems to be much more powerful.


What do you mean you are returning it? It has been used already.


  ollama run codellama:70b
  pulling manifest
  pulling 1436d66b6   1.1 GB / 38 GB   24 MB/s   25m21s


Do you know how much VRAM is required?


If you are asking about Apple Silicon Macs, they have integrated GPUs and do not have dedicated graphics memory; the GPU shares unified memory (UMA) with the CPU.

To run the 4-bit quantized model with 70B parameters, you will need around 35 GB of RAM just to load it into memory. So I would say a Mac with at least 48 GB of memory - that is the M3 Max.
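
That 35 GB is just parameter count times bits per weight; a quick back-of-the-envelope check in Python (weights only, ignoring the KV cache and other runtime overhead):

  def model_size_gb(params_billion: float, bits_per_weight: int) -> float:
      # bytes = params * bits / 8; divide by 1e9 for (decimal) gigabytes
      return params_billion * 1e9 * bits_per_weight / 8 / 1e9

  for bits in (16, 8, 4):
      print(f"70B at {bits:>2}-bit: ~{model_size_gb(70, bits):.0f} GB")
  # 70B at 16-bit: ~140 GB
  # 70B at  8-bit: ~70 GB
  # 70B at  4-bit: ~35 GB  -> hence "a Mac with at least 48 GB of memory"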


The TEALS program looks very promising - thank you for suggesting it here. I just filled out the application for a teaching assistant role (not confident enough to sign up for teaching right away). Looking forward to hearing from them.


And the site is down: "Due to the large wave of new users joining over the past several days, we have encountered technical issues which have left many experiencing service interruptions. We thank you for your patience and encouragement as we work to make Vero available to everyone."


I use VSCode for React Native projects and absolutely love its built-in debugging capabilities. With its clean and vivid UI, it makes my code look beautiful.

