I think engineers learn this quickly in high-scale/performance production enviro...

I think engineers learn this quickly in high-scale/performance production environments. Even without hardware backgrounds. SLAs/costs create constraints you need to optimize against after promising the business line these magical models can enable that cool new feature for a million users.

Traditional AI/ML models (including smaller transformers) can definitely be optimized for mass scale/performance on cpu-optimized infrastructure.