I never could. What helps me is to consider the C-language heritage of Python. There, the beginner also runs into the slightly confusing some_var[y][x] = some_value, which is caused by the computer's memory model underneath. Consequently, I'm always looking for the hierarchy of fastest-changing indices. This explains (and justifies) a lot of the design decisions made for numpy as well.
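To make that concrete, here is a minimal sketch with a toy array of my own (plain numpy, nothing else assumed) showing the row-major layout:

    import numpy as np

    # Row-major (C-order) layout: the last index varies fastest in memory,
    # which is why a[y][x] reads "row y, column x" and why the first axis
    # is the slowest-changing one.
    a = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
    print(a.ravel(order="C"))        # [0 1 2 3 4 5] -- rows are contiguous in memory
    print(a.strides)                 # e.g. (24, 8) for int64: one step along axis 1
                                     # moves 8 bytes, one step along axis 0 moves a whole row
    print(a[1][2], a[1, 2])          # 5 5 -- C-style and numpy-style indexing agree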
It's not simple because there are two axes of interest: the axis along which you sum, and the axes that are preserved (i.e., the ones that determine the shape of the returned array).
Both of them matter, and once you have been confused about which of the two to supply to axis=..., there is no easy way back out of that confusion. With einsum there is no confusion.
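To illustrate with a throwaway 3x4 array (B is just a name I picked here):

    import numpy as np

    B = np.arange(12).reshape(3, 4)   # 3 rows, 4 columns

    # axis=... names the axis that is summed away, not the one that survives:
    print(B.sum(axis=0).shape)        # (4,) -- rows collapsed, one sum per column
    print(B.sum(axis=1).shape)        # (3,) -- columns collapsed, one sum per row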
If you have a high-dimensional array, then under your latter convention, where you specify the "remaining" axes, you'd need to provide a lengthy list. That'd be user-hostile.
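A sketch of that contrast with a made-up 5-D array (the keep= keyword below is hypothetical, not real numpy):

    import numpy as np

    A = np.random.rand(2, 3, 4, 5, 6)
    print(A.sum(axis=2).shape)   # (2, 3, 5, 6) -- you name the one axis you drop
    # A hypothetical "name the axes you keep" convention would instead need
    # something like A.sum(keep=(0, 1, 3, 4)), and that list grows with ndim.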
On the other hand, einsum is crystal clear and not prone to confusion:
> np.einsum("ij -> j", B) # sum along rows to create one column-like array
> np.einsum("ij -> i", B) # sum along columns to create one row-like array
Edit: More Einstein sum fun at https://stackoverflow.com/questions/26089893/understanding-n...