Are you just trying to host a llama server?
Matching the VRAM doesn't necessarily matter; just get the most VRAM you can afford on a single card. Splitting across more than 2 cards doesn't work well at the moment.
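If you do end up splitting across two cards with llama.cpp, something like this is roughly what it looks like (a minimal sketch via llama-cpp-python; the model path and the 60/40 split ratio are placeholders you'd tune to your cards' VRAM):

```python
# Rough sketch using llama-cpp-python built with CUDA support.
# The GGUF path and split ratios below are placeholders, not recommendations.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-13b.Q4_K_M.gguf",  # placeholder model file
    n_gpu_layers=-1,          # offload all layers to GPU
    tensor_split=[0.6, 0.4],  # rough share of the model on GPU 0 vs GPU 1
    n_ctx=4096,
)

out = llm("Q: What is the capital of France? A:", max_tokens=16)
print(out["choices"][0]["text"])
```

llama-cpp-python also ships an OpenAI-compatible server (`python -m llama_cpp.server --model ...`) if all you want is an endpoint to point clients at.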
Getting a non-Nvidia card is a problem for certain backends (like ExLlama), but it should be fine for llama.cpp in the near future.
AFAIK most backends are not pipelined; the load jumps sequentially from one GPU to the next.
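To illustrate what that means in practice, here's a toy sketch (not actual llama.cpp internals, and it assumes you have two CUDA devices): with a plain layer split, the forward pass visits the layers in order, so while the activations sit on cuda:1, cuda:0 is idle, and two cards don't buy you 2x throughput for a single request stream.

```python
# Toy illustration of a non-pipelined layer split (NOT real backend code):
# half the layers live on cuda:0, half on cuda:1, and a single forward
# pass walks them in order, so only one GPU is busy at any moment.
import torch
import torch.nn as nn

layers = [nn.Linear(4096, 4096) for _ in range(8)]
for i, layer in enumerate(layers):
    layer.to("cuda:0" if i < 4 else "cuda:1")

x = torch.randn(1, 4096, device="cuda:0")
for i, layer in enumerate(layers):
    x = x.to("cuda:0" if i < 4 else "cuda:1")  # hop to the layer's GPU
    x = layer(x)                               # the other GPU idles here
```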