As the article mentions, producing an optimal layout is an optimisation problem whose related decision problem is NP-complete. Even laying out standard cells has to be done with heuristics, and going from cells down to individual transistors blows up the size of the problem and only makes that worse.
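To get a rough feel for that blow-up, here's a toy sketch that counts the brute-force search space for assigning n distinct blocks to n slots (n! arrangements). The 20-cell design and the 4-transistors-per-cell ratio are made-up illustrative numbers (a CMOS 2-input NAND does use 4 transistors), and real placers don't enumerate permutations; they use heuristics precisely because this count explodes.

```python
import math

def placements(n: int) -> int:
    """Number of ways to assign n distinct blocks to n distinct slots (n!)."""
    return math.factorial(n)

def digits(x: int) -> int:
    """Count decimal digits, to show the order of magnitude."""
    return len(str(x))

# Hypothetical design: 20 cells, assuming ~4 transistors per cell.
cells = 20
transistors = 4 * cells

print("cells:", digits(placements(cells)), "digits")
print("transistors:", digits(placements(transistors)), "digits")
```

Quadrupling the number of blocks doesn't quadruple the search space; it multiplies the number of digits by roughly six in this example.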
The logic is built out of standard gates and logic blocks like flip-flops anyway, so the overhead of using standard cells that implement those building blocks likely isn't too great.
This is more apocryphal than lore, but the understanding I've picked up from EE friends is that standard cells are used because they're proven to work in a given fab process. You don't want your layout software coming up with a trillion different gate prototypes in the midst of laying out your logic circuit!