I'm sure that Python program could have been rewritten to complete in an acceptable amount of time. Yes, the C++ program will be faster, but a good Python dev could probably have fixed that scientist's code with numpy, proper BLAS libraries, and maybe a quick dash of Cython.
Yes, for smaller stuff, numpy really does add a lot of overhead. It would make life much simpler if you could copy things into a numpy array more quickly, but oh well. In any case, you can find pretty tight pure Python code for most things. For instance, I needed to get a relatively small linear regression calculation down from around 500 microseconds using numpy or scipy (I don't remember which) to double-digit microseconds. I googled for a bit, and after adapting some pure Python code that uses regular lists, I got it into double digits. Then, after converting the function rather easily into Cython (just the function, not the entire program), it's in single-digit microseconds now.
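For anyone curious what that kind of pure Python version can look like, here's a rough sketch. This is not the code I actually used, just a plain least-squares fit over regular lists; the name `linear_regression` and the single-variable (slope plus intercept) setup are assumptions for illustration.

```python
def linear_regression(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Illustrative sketch only: the function name and the simple
    one-variable form are assumptions, not the original code.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Accumulate the centered sums in a single pass over plain lists,
    # so no array conversion is needed.
    sxx = 0.0
    sxy = 0.0
    for x, y in zip(xs, ys):
        dx = x - mean_x
        sxx += dx * dx
        sxy += dx * (y - mean_y)

    if sxx == 0.0:
        raise ValueError("all x values are identical; slope is undefined")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
```

For small inputs, something like this skips the cost of building arrays entirely, and a loop of this shape is usually easy to move into Cython with typed variables. How much you gain will depend on your data sizes, so time it yourself with `timeit` before committing.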