It’s still more complicated, because you have to amortize costs over several years and also keep in mind that big players will design their own hardware to avoid the A100’s cost and energy use. Disks and other devices will also consume power. And your datacenter cooling is now more expensive, etc.
Yes, it’s complicated, for sure. But if we amortize over 3 years and triple the power costs, it’s ~500k/yr for hardware and ~100k/yr for power.
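To make that arithmetic easy to check, here is a rough sketch in Python. The upfront hardware figure (1.5M) and the untripled power bill (33k/yr) are assumptions, back-calculated from the ~500k/yr and ~100k/yr numbers above rather than measured values, and `yearly_costs` is just a hypothetical helper name.

```python
# Back-of-the-envelope check of the yearly cost split.
# ASSUMPTION: both inputs below are back-calculated from the quoted
# ~500k/yr (hardware) and ~100k/yr (power); they are not measured values.
AMORTIZATION_YEARS = 3       # "amortize over 3 years"
POWER_OVERHEAD_FACTOR = 3    # "triple the power costs" (cooling, disks, etc.)


def yearly_costs(hardware_upfront, base_power_per_year,
                 years=AMORTIZATION_YEARS,
                 power_factor=POWER_OVERHEAD_FACTOR):
    """Return (hardware cost per year, power cost per year)."""
    hardware_per_year = hardware_upfront / years
    power_per_year = base_power_per_year * power_factor
    return hardware_per_year, power_per_year


hardware, power = yearly_costs(hardware_upfront=1_500_000,
                               base_power_per_year=33_000)
power_share = power / (hardware + power)

print(f"hardware: {hardware:,.0f}/yr")
print(f"power:    {power:,.0f}/yr")
print(f"power share of yearly total: {power_share:.1%}")
```

Even with the tripled power bill, power comes out at roughly a sixth of the yearly total under these assumptions. Changing the two inputs shows how far off they would have to be before power dominates.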
In terms of TPUs or other custom accelerators, sure, they exist. However, most players definitely aren’t building their own hardware.
ETA: I’m not saying power is irrelevant; it clearly matters. But calling it the dominant financial constraint is wrong, at least below Google/Amazon/Apple scale. Never mind the cost of the people running these training runs!