I have an 8GB card and am considering two more 8GB cards, or should I get a single 16GB? The 8GB card was donated, and we need some pipelining... I also have 10-15 2GB Quadro cards, which are apparently useless.


I mean... It depends?

Are you just trying to host a llama server?

Matching the VRAM doesn't necessarily matter; get the most you can afford on a single card. Splitting across more than 2 cards doesn't work well at the moment.
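
For example, with the llama-cpp-python bindings (assuming a CUDA build; the model file and the 50/50 split here are just placeholders), splitting a model across two 8GB cards looks roughly like:

    # sketch, assuming llama-cpp-python compiled with CUDA support
    from llama_cpp import Llama

    llm = Llama(
        model_path="./model-13b.Q4_K_M.gguf",  # hypothetical model file
        n_gpu_layers=-1,          # offload every layer to GPU
        tensor_split=[0.5, 0.5],  # proportion of the model per card
    )
    print(llm("Q: What is VRAM? A:", max_tokens=32)["choices"][0]["text"])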

Getting a non-Nvidia card is a problem for certain backends (like exLLaMA), but it should be fine for llama.cpp in the near future.

AFAIK most backends are not pipelined; the load jumps sequentially from one GPU to the next.
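
As a toy illustration (plain Python, no real GPU calls, all names made up): with layer splitting, a single request walks the layers one GPU at a time, so only one card is ever busy:

    # each "GPU" holds a shard of layers; for one sequence there is
    # no pipelining, so while gpu0 runs its shard, gpu1 sits idle
    layers_per_gpu = {0: ["layer0", "layer1"], 1: ["layer2", "layer3"]}

    def forward(x):
        for gpu, layers in layers_per_gpu.items():
            for layer in layers:  # only this GPU is doing work now
                x = f"{layer}({x})@gpu{gpu}"
        return x

    print(forward("token"))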



