Exactly. You want to come close to maxing out your RAM for model+context. I've run Gemma on a 64GB M1 and it was pretty okay, although that was before the Quantization-Aware Training version was released last week, so it might be even better now.
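A rough way to sanity-check whether a model+context fits in RAM is to add the quantized weight size to the KV cache size. A sketch below, where every figure (parameter count, layer/head shapes, context length) is an illustrative assumption for a Gemma-27B-class model, not an official spec:

```python
# Back-of-the-envelope RAM estimate for a quantized local model.
# All concrete numbers here are illustrative assumptions, not specs.

def model_weights_gb(params_b: float, bits_per_weight: float) -> float:
    """GB needed for the quantized weights alone."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_len: int, bytes_per_elem: int = 2) -> float:
    """GB for the KV cache: two tensors (K and V) per layer, fp16 elements."""
    return 2 * layers * kv_heads * head_dim * context_len * bytes_per_elem / 1e9

# Hypothetical 27B model at 4-bit quantization with a 32k-token context.
weights = model_weights_gb(27, 4.0)
kv = kv_cache_gb(layers=46, kv_heads=16, head_dim=128, context_len=32_768)
print(f"weights ~ {weights:.1f} GB, KV cache ~ {kv:.1f} GB, "
      f"total ~ {weights + kv:.1f} GB")
```

With those assumed numbers the total lands around 26 GB, which is why a 64GB machine has headroom; the same math shows why context length, not just parameter count, decides whether you fit.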

