Practically Accurate Floating-Point Math

conistonwater · on Oct 27, 2014

If you are interested in writing accurate numerical algorithms, as further reading I highly recommend reading "Accuracy and Stability of Numerical Algorithms" by Nick Higham, which is a very good comprehensive book. Like this article says, floating-point arithmetic is often thought of as mysterious, but it really isn't: it just obeys its own precisely specified rules. I think it's very good to dispel the mystery, so I really like this article.

Another way to do the same type of error analysis of numerical algorithms is to use a parameter sensitivity library. In the end, numerical stability can be thought of as the sensitivity of the final answer to separate independent changes in each intermediate value obtained in the computation of the answer. This is sometimes overlooked, I think, and it's quite easy to implement as well.

I have one quibble. They say "Use (x-y)(x+y) instead of x^2-y^2". This is true, but (in my opinion) misleading: the function (x,y) -> x^2-y^2 is itself ill-conditioned when x is near y, and using a more numerically stable algorithm to evaluate an ill-conditioned function will not improve the function's condition. If x and y are close to each other, the inaccuracy introduced by using x^2-y^2 is close to the inaccuracy due to using approximate values for x and y. So it you have x^2-y^2, I think the first thing you should worry about is whether your computation is ill-conditioned, and only then worry whether it's numerically stable. I say this because I sometimes see people try to evaluate x^2-1, notice it is inaccurate for x near 1, and try to replace x^2-1 with (x-1)*(x+1), which is still inaccurate.

dmlorenzetti · on Oct 27, 2014

The first thing you should worry about is whether your computation is ill-conditioned, and only then worry whether it's numerically stable.

OK, but how does one apply this advice in practice?

The radiative exchange between two surfaces goes as x^4-y^4 where x and y are the surface temperatures. I can worry about whether that's ill-conditioned all I want, but at the end of the day I'm further ahead evaluating (x-y)(x+y)(x^2+y^2) than the original expression.

Similarly, if I need to find x^2-1, what recourse do I have? Maybe there's some reduction that makes the computation more stable. But normally I would expect to discover such a reduction by turning x^2-1 into (x-1)(x+1), and then seeing if one of those factors cancels out somewhere. So it still seems like a good first step.

conistonwater · on Oct 28, 2014

I don't know about your radiative exchange example, but it is often possible to reparametrize the problem. For example, sometimes you can replace the parameters (x,y) with (s,t)=(x-y, y), which would get you further: x^2-y^2 = s (s+2t), which is stable at s=0, and if x>y>0, then it's stable everywhere. This can shift the region of instability to somewhere far from typical parameter values. Otherwise, yes, this isn't solvable.

stephencanon · on Oct 27, 2014

The correct way to evaluate x^2 - 1 is by using fma(x,x,-1). Now that Intel and AMD have finally made FMA available in hardware on commodity parts (better late than never!), it's realistic to start using this much more freely.

conistonwater · on Oct 27, 2014

That won't do much for x near 1, because the function x^2-1 is itself ill-conditioned. In other words, it is relevant that the floating-point value of x is itself only an approximation to some true value of x. So computing x^2-1 exactly for a given floating-point value of x does not give a good approximation to the true value of x^2-1. This is a mathematical property of the function x^2-1, and cannot be fixed with any algorithm. This is basically why I consider the example x^2-1 => (x-y)(x+y) misleading.

stephencanon · on Oct 27, 2014

There are multiple ways to analyze computations; condition number is one of them. It is relevant when inputs are assumed to be approximations to some hidden "correct" value. However, this assumption is not always warranted when dealing with floating-point numbers; sometimes, we would instead like to analyze errors under the assumption that the inputs are exact values. When this is the case, fma(x,x,-1) produces a correctly rounded result, whereas (x-1)(x+1) does not, and the ill-conditioning of the function x^2 - 1 does not enter into the analysis anywhere.

If you are used to working with complex computations (most of what is usually referred to as numerical analysis, for example), you likely do not find yourself in the latter situation very often. However, for those of us who work on low-level details of floating-point on a regular basis, the latter analysis is often critical.

conistonwater · on Oct 27, 2014

Okay, I agree, that is completely reasonable. I do usually work with fairly complex algorithms, so I care about individual operations less than about the final result.

acqq · on Oct 27, 2014

The expression won't turn to be good conditioned only by using fma. You actually missed the point of conistonwater's post.

stephencanon · on Oct 27, 2014

Condition number only matters if you assume that the inputs are perturbed by some error. Under the assumption that inputs are exact (which is appropriate to make for some uses), it does not enter into the analysis.

dalke · on Oct 27, 2014

"Use (x-y)(x+y)" is part of the "folk wisdom." It's between "We even know some folk wisdom intended to avoid the dangers." and "or is it increasing magnitude?", where the last bit shows that perhaps these aren't the best solutions.

conistonwater · on Oct 27, 2014

No, later in the article they recommend using it, saying that it decreases floating-point errors.

dalke · on Oct 27, 2014

Ahh! Yes, in the box on p94 titled "About That Folk Wisdom". Thanks for the correction!

SFjulie1 · on Oct 28, 2014

Sorry to tell you, but 99% of so called expert developers don't understand this. For them a real number is a a float (exactly). And I have been fired for saying e-commerce should be done with fixed point numbers. (we do + - * / at most) Headaches really comes with multi currency web sites since conversion brings incommesurability. They also told me that since we are using linear equation most of the time (are we?), errors should dissipates like the breeze (they also kind of say badlands are so statiscally rare that we sould not care).

Anyway, in real job as a dev, when I made a moving average it was the most impressive math I done, and when I was resorting to ifft (convolution with a hammer function) I was told it should not be used because it was unmaintainable because no one understood.

I have been so ... hammered down by computer engineer so sure of themselves that I learnt to shut my mouth to keep my job.

Who really read this?

I love it, but I never did any serious brainy stuff as a dev.

ntoronto · on Oct 28, 2014

Ugh. It's shameful when people use wrong arguments about mathematics to oppress. I really wish you the best.

This article first appeared in "Computing in Science and Engineering" magazine, so its audience is mainly scientists in various fields who use floating point in their research. Because of that breadth, I tried to make it as accessible as possible, while teaching enough floating-point principles to make debugging make sense.

I honestly hope that everyone who uses floating point reads this.

SFjulie1 · on Oct 29, 2014

Computer science on the field is ugly.

I went in university to learn physics, so I have my geometry and numerical analysis basic understanding.

I honestly thinks people who learnt it has a cursus for graduating did not care, and that they actually don't understand their difficulties even though they learnt about it. They don't see it is a float.

One of the other misconception I have to fight with is what is time.

CS engineer believe in a perfect synchronized time and timestamp that are monotonicly growing the same everywhere with a 10-9 precision and rely on that for distributed systems. My guts (and general relativity too I guess) are telling me they are wrong.

But since I am stuck reimplementing a bubble sort for phone numbers I don't quite have the time to work on it. I have to live with my tingling intuitions telling me I am either an overpessimistic ass being a pain for my colleague that is becoming obsolete by lack of studying, or the whole industry seems getting more magic oriented than science oriented.

Anyway, like a lot of people I am getting inadapted out there.

At least, I have spare time to work on game of life and complex systems... sometimes, and I have a lovely caring wife, so I am happy either way :)

ThatGeoGuy · on Oct 27, 2014

Probably an obvious error, but on page 81 (second page), they say the following:

    Another correct test case is as follows:
    > (rat (float -1 23 -6))
    - (26) / (34)

In truth, this should be - 23 / 64, which is a very different number. Perhaps there's comical value in an article about accurate math having inaccurate typography?

ntoronto · on Oct 27, 2014

That is indeed a typo. Thanks for catching it.

It's amazing how much you miss when you think you already know what's written.

dvt · on Oct 27, 2014

Even though refreshing and enlightening, the article doesn't cover another (major) reason why floating-point math is generally avoided (especially in high-performance applications): computational slowdown when dealing with subnormals[1][2].

I feel that there is a lot of overlap between cases where you may want to minimize error while at the same time still be performant (simulations, ray tracing, rendering, etc.). So you're left with a situation where you end up minimizing error, but still being slow.

[1] http://research.ihost.com/osihpa/osihpa-govind.pdf

[2] https://charm.cs.illinois.edu/newPapers/06-13/paper.pdf

stephencanon · on Oct 27, 2014

A couple points:

Support for subnormals can easily be disabled (usually be enabling "Flush to Zero" behavior) to allow for better performance on essentially all recent hardware.

Over the last few years, subnormal stalls have shrunk rapidly, and gone away entirely in some cases. Intel started this process with the Sandybridge architecture. These stalls are also greatly reduced in some arm64 implementations.

ntoronto · on Oct 28, 2014

We had this discussion on the Racket user's mailing list a couple of years ago. One user with an AMD processor experienced significant stalls; others with Intel processors didn't.

I don't know the status of AMD's floating point at the moment, but I hope they're doing something about that.

malkia · on Oct 27, 2014

AFAIK gpu's and ps3 cell chip did not support subnormals for that reason, also as the other poster said, they can be disabled. Alternatively you can add white-noise to your data to avoid it (Say after an IIR filter on sound data).

ntoronto · on Oct 27, 2014

This article (like all other IEEE Computing Now articles) is free to download for a limited time.

alricb · on Oct 27, 2014

Is it me, or have all the ells (the letter L) been replaced by ones (the digit 1) in the PDF?

Edit: It's me. It's just a font with sloping serifs, which is what confused me.

ntoronto · on Oct 27, 2014

It's not necessarily just you. I thought it was wrong when I first opened it. There may be something funny going on with the font - I think the "1" and "l" are switched.

Edit: Never mind, it copies/pastes correctly. The font is just different from what I'm used to.

alricb · on Oct 27, 2014

TIL: Prestige Elite [1] might not be the best font out there for programming.

1: http://en.wikipedia.org/wiki/Prestige_Elite

wukix · on Oct 27, 2014

Since we're on the subject of floating-point numbers in Lisp/Scheme: https://wukix.com/lisp-decimals

This uses the Lisp ratio type in place of floats, to avoid the usual float issues.

ntoronto, any thoughts?

malkia · on Oct 27, 2014

ratio's are good to have in any language, or as a library, but surely it's slower, and more memory is wasted (also hard to argue about how much exactly), and it still has to settle at some approximations for irrational numbers like PI which is used quite a lot when floating point numbers are (or at least in my experience).

ntoronto · on Oct 27, 2014

All correct, for exact rationals. Racket's exact rational arithmetic tends to take about 1000x the amount of time floating-point arithmetic takes to compute similar functions, and creates a lot of garbage on the heap. It gets worse, though: with long-running computations, exact rationals' numerators and denominators tend to grow without bound, unless you explicitly round them to a fixed precision. If you take the latter path, you might as well use bigfloats, which wrap MPFR's multi-precision floats, and are faster.

Just looking at the Wu-Decimal page, though, I can't tell whether they're internally exact rationals or base-10 floats. If they're base-10 floats, they have all the same issues base-2 floats have. If they're internally exact rationals, I wonder what they do for division, which the set D isn't closed under.

wukix · on Oct 28, 2014

Wu-Decimal uses exact rationals (they aren't base 10 floats). Division works according to normal Lisp semantics, since the CL ratio type is used for arithmetic. Let's say you divide something by 3 and now you have infinitely repeating digits: then it is no longer in set D, and Wu-Decimal no longer considers it to be of decimal type. Instead, it is treated as a fraction, again per standard CL semantics. The "Printing" example tries to clarify this (notice that 1/2 prints as '0.5' but 1/3 remains '1/3').

ntoronto · on Oct 28, 2014

I've monkeyed with a lot of numeric representations, and every one of them involves tough theoretical and practical trade-offs. Returning an exact rational as the result of division sounds reasonable.

My favorite representation so far pairs a signed float in [2^-512,2^512) with a signed integer to extend the exponent range. The idea is to avoid overflow and underflow, particularly when multiplying thousands of probabilities.

(On average, adding a few thousand log probabilities or densities and then exponentiating the sum yields a number with about 9 digits precision. Worst case for adding log probabilities and exponentiating is about 8 digits precision, and for adding log densities it's 0 digits. Multiplying the same number of probabilities or densities retains about 13 digits in the worst case.)

wukix · on Oct 28, 2014

If they're base-10 floats, they have all the same issues base-2 floats have

Nitpick: although base 10 floats have similar issues related to being of limited precision, they are superior to base 2 floats in the aspect of representing decimal numbers without the binary approximation: e.g. 1/10 is nonrepeating as 1.0 x 10^-1 but infinitely repeating as binary, so for IEEE 754 binary32 you get 1.10011001100110011001101 (1.60000002384185791015625) x 2^-4.

alricb · on Oct 28, 2014

Yes, but they're about as bad for representing base 12 numbers, so it evens out.