zellyn | 8 months ago | on: Local LLM inference – impressive but too hard to w...
Exactly. You want to come close to maxing out your RAM for model+context. I've run Gemma on a 64GB M1 and it was pretty okay, although that was before the Quantization-Aware Training version was released last week, so it might be even better now.
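
To put rough numbers on "model+context", here's a back-of-envelope sketch in Python. The layer/head counts and quantization level are illustrative placeholders (not taken from any specific Gemma config); the point is just that weight memory plus KV cache is what has to fit in RAM:

    # Rough memory estimate for a quantized local model plus its KV cache.
    # All model parameters below are hypothetical, for illustration only.

    def model_bytes(params_billions: float, bits_per_weight: float) -> float:
        """Approximate weight memory for a quantized model."""
        return params_billions * 1e9 * bits_per_weight / 8

    def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                       context_len: int, bytes_per_elem: int = 2) -> float:
        """Approximate KV cache: 2 (K and V) * layers * heads * dim * tokens."""
        return 2 * layers * kv_heads * head_dim * context_len * bytes_per_elem

    # Example: a ~27B-parameter model at 4-bit quantization with an 8k context.
    weights = model_bytes(27, 4)
    kv = kv_cache_bytes(layers=46, kv_heads=16, head_dim=128, context_len=8192)
    print(f"weights ~{weights / 2**30:.1f} GiB, KV cache ~{kv / 2**30:.1f} GiB")

With those made-up numbers that's roughly 12.6 GiB of weights plus about 2.9 GiB of KV cache before any OS or runtime overhead, which is why the context length you ask for matters almost as much as the model size.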