So the main claim of this paper is: "...good models and training procedures are those that (1) optimize quickly in the ideal world and (2) do not optimize too quickly in the real world."
It will be interesting to see whether those results replicate on different data.
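A replication on other data could start from a toy comparison like the one below. This is only a minimal sketch, not the paper's setup. It assumes that "ideal world" means every SGD step sees a fresh sample from the distribution, and that "real world" means SGD keeps reusing a small fixed training set. The synthetic data, the logistic model, and every name here (`sample`, `sgd_step`, `run`, the hyperparameters) are hypothetical choices made for illustration.

```python
import math
import random

DIM = 5  # hypothetical input dimension


def sample(rng):
    """Draw one (x, y) pair from a made-up synthetic distribution."""
    x = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
    y = 1.0 if sum(x) > 0 else 0.0
    if rng.random() < 0.1:  # assumed 10% label noise
        y = 1.0 - y
    return x, y


def predict(w, x):
    """Logistic model; the logit is clamped to avoid overflow in exp."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def sgd_step(w, x, y, lr):
    """One SGD step on the logistic loss for a single example."""
    p = predict(w, x)
    return [wi - lr * (p - y) * xi for wi, xi in zip(w, x)]


def error(w, data):
    """Fraction of examples in data that are misclassified."""
    wrong = sum((predict(w, x) > 0.5) != (y == 1.0) for x, y in data)
    return wrong / len(data)


def run(n_train=50, steps=2000, eval_every=200, lr=0.1, seed=0):
    rng = random.Random(seed)
    train = [sample(rng) for _ in range(n_train)]
    test = [sample(rng) for _ in range(2000)]
    w_real = [0.0] * DIM
    w_ideal = [0.0] * DIM
    history = []
    for t in range(1, steps + 1):
        # Real world: resample from the same finite training set.
        x, y = train[rng.randrange(n_train)]
        w_real = sgd_step(w_real, x, y, lr)
        # Ideal world: a fresh draw from the distribution at every step.
        x, y = sample(rng)
        w_ideal = sgd_step(w_ideal, x, y, lr)
        if t % eval_every == 0:
            history.append((
                t,
                error(w_real, train),  # real-world train error
                error(w_real, test),   # real-world test error
                error(w_ideal, test),  # ideal-world test error
            ))
    return history
```

Calling `run()` returns one row per evaluation point. Read against the quoted claim, the two columns to watch are how fast the ideal-world test error falls and how fast the real-world train error falls. Swapping `sample` for a loader over another dataset would be the natural way to try this on different data.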