I think he's mostly referring to inference rather than training, and I entirely agree: a 4x version of this card for workstations would do really well. Even a basic interconnect between the cards, a la NVLink, would really drive this home.
The training can come after, with some inference and runtime optimizations on the software stack.