I think he's mostly referring to inference rather than training, and I entirely agree: a 4x version of this card for workstations would do really well. Even a basic interconnect between the cards, a la NVLink, would really drive this home.
The training can come after, with some inference and runtime optimizations on the software stack.