Does CUDA even matter that much for LLMs, especially inference? I don't think software would be the limiting factor for this hypothetical GPU. After all, it would be competing with Apple's M chips, not with the 4090 or Nvidia's enterprise GPUs.
It's the only thing that matters. Folks act like AMD support is there because suddenly you can run the most basic LLM workload. Try doing anything actually interesting with AMD (e.g., anything cool in the mechanistic interpretability or representation/attention engineering world) and suddenly everything is broken, nothing works, and you have to spend millions of dollars' worth of AI engineering time trying to salvage a working solution.
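To make that concrete, here's a minimal sketch (my own illustration, not anyone's actual pipeline) of the kind of interpretability workload I mean: hooking a transformer's attention blocks to capture activations. Model choice ("gpt2") and the transformers/PyTorch APIs are just for illustration. Note that ROCm builds of PyTorch also report themselves through torch.cuda, so the device check passing tells you nothing about whether the fused kernels and custom extensions this kind of tooling leans on will actually work on AMD hardware.

    # Illustrative sketch: capture per-layer attention activations with
    # forward hooks -- a typical mechanistic-interpretability primitive.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # ROCm builds of PyTorch expose themselves via torch.cuda too, so this
    # check alone doesn't guarantee downstream kernels work on AMD.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").to(device).eval()

    captured = {}

    def make_hook(name):
        def hook(module, inputs, output):
            # GPT-2's attention module returns a tuple; element 0 is the
            # attention block's output hidden states.
            captured[name] = output[0].detach().cpu()
        return hook

    # Register a hook on every attention block.
    handles = [
        block.attn.register_forward_hook(make_hook(f"layer_{i}"))
        for i, block in enumerate(model.transformer.h)
    ]

    with torch.no_grad():
        inputs = tokenizer("The quick brown fox", return_tensors="pt").to(device)
        model(**inputs)

    for h in handles:
        h.remove()

    print({k: v.shape for k, v in captured.items()})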