Does anyone know if -O3 (or -O2?) -march=native should be enough to get reasonab...

nkurz · on Nov 30, 2014

I think yes. A couple of weeks ago I tested compilation for a project on Haswell and Sandy Bridge using various recent versions of Clang, GCC, and ICC to determine whether it was safe to use just "-O3 -march=native" in combination with "#include <x86intrin.h>" instead of more specific versions.

While my testing was far from rigorous, my conclusion was that this is now sufficient and acceptable to get appropriate platform-specific SIMD optimization.
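To make the "-march=native" plus "#include <x86intrin.h>" combination concrete, here is a minimal sketch, not code from the project I tested. The function name `sum_floats` is made up for illustration. The idea is that the umbrella header exposes whichever intrinsics the target flags enable, and the compiler's predefined macros (`__AVX__`, `__SSE__`) tell you which path is available.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  /* umbrella header: exposes the intrinsics enabled by -march */
#endif

/* Sum n floats, using the widest vector unit the build flags enabled.
   sum_floats is a hypothetical helper used only for illustration. */
float sum_floats(const float *data, size_t n)
{
    size_t i = 0;
    float total = 0.0f;

#if defined(__AVX__)
    /* 8 floats per iteration; __AVX__ is defined when the target has AVX */
    __m256 acc = _mm256_setzero_ps();
    for (; i + 8 <= n; i += 8)
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(data + i));
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    for (int k = 0; k < 8; k++)
        total += lanes[k];
#elif defined(__SSE__)
    /* 4 floats per iteration; SSE is part of the x86-64 baseline */
    __m128 acc = _mm_setzero_ps();
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(data + i));
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    for (int k = 0; k < 4; k++)
        total += lanes[k];
#endif

    /* Scalar tail, and the whole loop on targets without SSE/AVX */
    for (; i < n; i++)
        total += data[i];
    return total;
}
```

Built with something like `gcc -O3 -march=native -c sum.c`, the AVX branch should be selected on Sandy Bridge or Haswell, while a plain x86-64 build falls back to SSE. Note that the vector paths add in a different order than a straight scalar loop, so results can differ in the last bits for non-trivial inputs.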

You'll find some recent arguments for "-Ofast" instead of "-O3" or "-O2", but compiler versions that don't support it are still recent enough to be common. Others occasionally argue that "-Os" is a better modern default, but I haven't found this to be true. Although more debatable, I'd suggest "-g -Wall -Wextra" also be part of the default compiler options for most projects.

e12e · on Nov 30, 2014

Thank you both, and especially for the mention of x86intrin.h ... I wasn't aware of that.

naner · on Nov 30, 2014

It's probably fine. Back in the day (early 00s) I used Gentoo and I think pretty much everyone used -march (because why not?) with no real negative side effects. It may be a little buggier with uncommon or new architectures.

I did have the occasional weird breakage with -Os, though (once it broke make...). Anyways, -O2, -O3, and -Os should be pretty reliable as I imagine they are the most used optimization flags these days.

You generally don't have to tweak options manually unless a small increase in performance is very important to your application or you are running specialized programs that will benefit greatly from a specific optimization. Remember -- most Linux distributions use pretty basic cflags and are reasonably performant.

Come to think of it, the Gentoo project has probably been useful in rooting out weird cflag bugs in GCC. :)

EDIT: Also check this out:

http://wiki.gentoo.org/wiki/Safe_CFLAGS

bluecalm · on Nov 30, 2014

I remember that -flto sometimes adds a few percent to overall speed. If you are doing a lot of floating point math you can check various modes. The first thing to try is always -Ofast, which turns on the -ffast-math flag. From the gcc page:

    -ffast-math
    Sets -fno-math-errno, -funsafe-math-optimizations, -ffinite-math-only, -fno-rounding-math, -fno-signaling-nans and -fcx-limited-range.

    This option causes the preprocessor macro __FAST_MATH__ to be defined.

    This option is not turned on by any -O option besides -Ofast since it can result in incorrect output for programs that depend on an exact implementation of IEEE or ISO rules/specifications for math functions. It may, however, yield faster code for programs that do not require the guarantees of these specifications.
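As a small illustration of the `__FAST_MATH__` macro mentioned in the quote, the sketch below reports at run time whether the file was compiled with -ffast-math (or -Ofast). The function names are hypothetical; only the macro itself comes from the gcc documentation quoted above.

```c
#include <stdio.h>

/* Returns 1 if this translation unit was compiled with -ffast-math
   (directly or via -Ofast), 0 otherwise. Hypothetical helper name. */
int built_with_fast_math(void)
{
#ifdef __FAST_MATH__
    return 1;
#else
    return 0;
#endif
}

/* Prints the math mode, e.g. from a --version or diagnostics path. */
void report_math_mode(void)
{
    printf("fast-math: %s\n", built_with_fast_math() ? "on" : "off");
}
```

The same `#ifdef` can guard an `#error` in a source file that is known to depend on strict IEEE behaviour, so that an accidental -Ofast build fails loudly instead of silently changing results.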

I also remember that gcc started producing significantly faster code for my projects at around version 4.7 or 4.8 (can't remember which one).

yoklov · on Nov 30, 2014

`-ffast-math` is a double-edged sword. It can frequently optimize away stuff like NaN checks and the like, breaking code that uses NaN as a missing value. This, specifically, has bitten me before, but there are certainly other areas where it can bite you.

OTOH it gives the compiler license to do a great deal of algebraic simplification, including expression reordering. This will probably bring what the compiler can actually do more in line with what you think it should be able to do.

Basically, you need to test with it on if you're going to use `-ffast-math`. You might also have good luck with turning on a subset of the flags. For example, IIRC in the project with the NaN checks, using all of them except `-ffinite-math-only` fixed the problem in this case.
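To show the NaN-as-missing-value problem concretely, here is a rough sketch (not the code from that project; the function names are invented). It assumes `double` is an IEEE-754 binary64 value. The first check relies on `isnan`, which the compiler is allowed to fold to "never true" under `-ffinite-math-only`. The second looks at the bit pattern with integer operations, which in practice is not affected by the floating-point flags.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Floating-point check: fine by default, but under -ffinite-math-only
   the compiler may assume x is never NaN and reduce this to 0. */
int is_missing_fp(double x)
{
    return isnan(x) != 0;
}

/* Bit-pattern check (assumes IEEE-754 binary64): a NaN has all exponent
   bits set and a nonzero fraction. Uses only integer operations. */
int is_missing_bits(double x)
{
    const uint64_t exp_mask  = 0x7FF0000000000000ULL;
    const uint64_t frac_mask = 0x000FFFFFFFFFFFFFULL;
    uint64_t bits;

    memcpy(&bits, &x, sizeof bits);  /* well-defined way to read the bits */
    return (bits & exp_mask) == exp_mask && (bits & frac_mask) != 0;
}
```

Comparing the two functions on the same input under your real build flags is a cheap test for whether `-ffast-math` has changed the behaviour of your NaN handling.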

Some of them are obvious to turn on though. `-fno-math-errno` should be the default for most programs, if you ask me. I've never seen anybody check `errno` to see if their call to, e.g. `sqrt`, was invalid, and I hope I never do.

emn13 · on Nov 30, 2014

If -ffast-math won't work, a few more to consider:

-fassociative-math (generally safe unless you're dealing with math written specifically to take advantage of the details of floating point arithmetic)

-fno-signed-zeros - again, though it's possible some code depends on this, it's rather unlikely.

-fno-trapping-math - do you actually use traps?
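For anyone wondering what "depends on the details of floating point arithmetic" looks like, here is a minimal sketch with made-up function names. The first pair shows that addition is not associative, which is what -fassociative-math lets the compiler ignore. The last function distinguishes -0.0 from +0.0, which is the kind of code -fno-signed-zeros can break.

```c
#include <math.h>

/* Same three operands, different grouping. Under strict IEEE rules the
   compiler must keep the grouping as written; -fassociative-math lets
   it regroup freely. */
double sum_left(double a, double b, double c)
{
    return (a + b) + c;
}

double sum_right(double a, double b, double c)
{
    return a + (b + c);
}

/* True only for negative zero. With -fno-signed-zeros the compiler may
   treat +0.0 and -0.0 as interchangeable, so code like this becomes
   unreliable. */
int is_negative_zero(double x)
{
    return x == 0.0 && signbit(x) != 0;
}
```

With default flags, `sum_left(1e20, -1e20, 1.0)` gives 1.0 while `sum_right(1e20, -1e20, 1.0)` gives 0.0, because the 1.0 is lost when it is added to -1e20 first. If your code never relies on differences like that, these flags are probably safe for you.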