I do a lot of log crunching with pypy, and datetime.strptime is a really big part of the cost. I wish someone would contribute faster datetime routines to the CPython and pypy standard libraries.
Other parts that are always hot include split() and string concatenation. Java compilers can substitute StringBuffers when they see naive string concatenation, but in Python there's no easy way to build a string in a complex loop and you end up putting string fragments into a list and then finally join()ing them. Madness!
One trick that helped when I was doing log crunching (where time parsing was a good 10%) was to cache the parse.
All log lines began with a date+time like "2015-12-10 14:42:54.432", and there were maybe 100 lines per second. You can therefore take just the first 19 characters, parse those to a unix time in milliseconds, then separately parse the milliseconds to an int and add them. All you need is one cache entry (since logs are mostly in order), and then a plain string comparison (i.e. no hashmap lookup) checks the cache - instantly 100x fewer time-parsing calls.
The best way to speed up a function is to not call it!
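A minimal sketch of that single-entry cache, assuming the fixed "YYYY-MM-DD HH:MM:SS.mmm" prefix described above (the function and variable names here are illustrative, not from any real code):

    import calendar
    import time

    # One-entry cache: the last 19-character prefix seen and its epoch seconds.
    _last_prefix = None
    _last_epoch = 0

    def parse_log_timestamp(line):
        """Parse a 'YYYY-MM-DD HH:MM:SS.mmm ...' line into epoch milliseconds."""
        global _last_prefix, _last_epoch
        prefix = line[:19]                # '2015-12-10 14:42:54'
        if prefix != _last_prefix:        # plain string comparison, no hashmap
            parts = time.strptime(prefix, '%Y-%m-%d %H:%M:%S')
            _last_epoch = calendar.timegm(parts)   # treat the timestamp as UTC
            _last_prefix = prefix
        millis = int(line[20:23])         # '.432' -> 432
        return _last_epoch * 1000 + millis

With mostly in-order logs, strptime ends up running roughly once per distinct second rather than once per line.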
Well, it's half memoization, and half knowing that you can split the work into smaller parts, which allows you to take advantage of the memoization.
Just adding memoization to date and time parsing gains you very little when there's little duplication in the inputs, and without breaking the data apart it could easily have made performance worse.
In [63]: timeit dateutil.parser.parse('2013-05-11T21:23:58.970460+07:00')
10000 loops, best of 3: 89.5 µs per loop
In [64]: timeit arrow.get('2013-05-11T21:23:58.970460+07:00')
10000 loops, best of 3: 62.1 µs per loop
In [65]: timeit numpy.datetime64('2013-05-11T21:23:58.970460+07:00')
1000000 loops, best of 3: 714 ns per loop
In [66]: timeit iso8601.parse_date('2013-05-11T21:23:58.970460+07:00')
10000 loops, best of 3: 23.9 µs per loop
> Other parts that are always hot include split() and string concatenation. Java compilers can substitute StringBuffers when they see naive string concatenation, but in Python there's no easy way to build a string in a complex loop and you end up putting string fragments into a list and then finally join()ing them. Madness!
The Python solution you describe is the same as in Java. If you have `String a = b + c + d;` then the compiler may optimize this using a StringBuffer, as you say[1]. In Python it's also pretty cheap to do `a = b + c + d` to concatenate strings (or `''.join([b, c, d])`; but you should run a little microbenchmark to see which works best). But if it's in a "complex loop", as you describe, then Java will certainly not do this. You have to build a buffer using StringBuilder and then call toString(), which is basically the same process except it's spelled `builder.toString()` instead of `''.join(builder)`.
Unless of course you have some interesting insights into the JVM internals about string concatenation optimizations.
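For what it's worth, the two Python idioms being compared look roughly like this (a sketch; which one wins depends on the interpreter, as discussed further down the thread):

    def build_naive(parts):
        # Repeated concatenation; CPython can often do this in place,
        # but other interpreters may copy the whole string each time.
        s = ''
        for p in parts:
            s += p
        return s

    def build_join(parts):
        # Accumulate fragments and join once at the end -- morally the same
        # as StringBuilder.append(...) followed by toString() in Java.
        buf = []
        for p in parts:
            buf.append(p)
        return ''.join(buf)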
Of course you have a different machine, but the OP was getting 2.5 µs per parse in .NET versus your 89.5 µs in Python. I wouldn't have expected such a difference. No wonder it's a hot path.
Well, that's dateutil (installed from pip), not datetime (stdlib). As part of log ingestion I would, of course, convert to UTC and drop the timezone distinctions, since Python slows down a lot when it has to worry about timezones. Working in a single set of units with no DST issues is much nicer/quicker.
Anyway, if you're installing packages from pip, you may as well just install iso8601 and get the best performance - possibly beating .NET (who knows? As you said, I have a different machine than the OP).
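A rough Python 3 sketch of that normalisation step, using the iso8601 package from the timings above (the astimezone/replace part is plain stdlib datetime, nothing iso8601-specific; treat it as illustrative):

    from datetime import timezone
    import iso8601

    def to_naive_utc(text):
        """Parse an ISO 8601 string and return a tz-naive UTC datetime."""
        dt = iso8601.parse_date(text)      # timezone-aware datetime
        return dt.astimezone(timezone.utc).replace(tzinfo=None)

    print(to_naive_utc('2013-05-11T21:23:58.970460+07:00'))
    # 2013-05-11 14:23:58.970460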
dateutil spends most of its time inferring the format; it's not really designed as a performance component, it's designed as an "if it looks like a date, we'll give you a datetime" style component.
The Java story is slightly more complex. javac emitted code to make a StringBuilder and call append() on it multiple times, and then the JIT spotted this construction and optimised it.
This is, as you can guess, somewhat fragile, especially when some of the parts may be constants. So JEP 280 changed javac to emit an invokedynamic instruction carrying information about the constant and dynamic parts of the string, so the optimisation strategy can be chosen at run time and can change over time without requiring everyone to recompile their Java code.
If you are using CPython 2.4 or 2.5 or later, just write the clearest code. In earlier versions joining a list was faster, but the in-place add operation has since been optimized, turning the advice about ''.join() being faster mostly into mythlore.
So whilst pypy is otherwise much faster than CPython, its lack of this particular optimisation is why CPython can actually be faster at parsing my logs.
I know about this because I've been bitten by it :)
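If you want to check it on your own interpreters, a quick-and-dirty microbenchmark along these lines is enough (a sketch; absolute numbers vary a lot with interpreter, string sizes and iteration counts):

    import timeit

    setup = "parts = ['x'] * 10000"

    concat = """
    s = ''
    for p in parts:
        s += p
    """

    join = """
    buf = []
    for p in parts:
        buf.append(p)
    s = ''.join(buf)
    """

    # Run the same script under CPython and pypy and compare.
    print('+=    ', min(timeit.repeat(concat, setup, number=100)))
    print('join  ', min(timeit.repeat(join, setup, number=100)))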