


This is actually fewer bits than they're using in the article, but that's not clear because they refer to their numbers by how many "[decimal] digits" they have, not how many bits. The hardware doubles they're starting with are 64 bits wide, representing about 15.95 decimal digits, but then they move into 128- and 256-bit representations. 80-bit math, such as that available on the old x87 floating-point co-processors, is, I believe, about 19 decimal digits.


I found that very unintuitive. Is it typical in numerical analysis to talk about the number of decimal digits represented by a binary floating point number? I don't see what is gained by changing base.


It's a bad idea, or okay as an informal thing! It's just using the result that log10(2^53) ≈ 15.95 to say that 53 bits carry 15.95 decimal digits of 'information.'
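To make that conversion concrete, here's a minimal C sketch (my own, not from the article; the bit counts are the significand widths of float, double, x87 extended, and binary128, so treat it as illustrative):

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        /* Equivalent decimal digits for a given significand width:
           digits = bits * log10(2). Compile with -lm. */
        int bits[] = {24, 53, 64, 113};
        for (int i = 0; i < 4; i++)
            printf("%3d bits ~ %5.2f decimal digits\n",
                   bits[i], bits[i] * log10(2.0));
        return 0;
    }

which gives 15.95 for the 53-bit double significand and about 19.27 for the 64-bit x87 extended one.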


Or 128 bits. FORTRAN has a REAL*16 type (quadruple precision floating point number). I encountered this at university in a physics department that studied non-linear systems, also known as chaos theory. Rounding errors can quickly become a problem in these systems.


> FORTRAN has a REAL*16 type

Fortran (note the post-'77 spelling) doesn't. It has KINDs. REAL*x is a non-standard extension, and there's definitely no requirement to implement 128-bit floating point. I don't know of hardware other than POWER with 128-bit FP in hardware (both IEEE and IBM format, with a somewhat painful recent change of the ppc64le ABI default in GCC etc.).


GCC also has libquadmath, which provides 128-bit floats in software. More than twice as slow, but you get extra precision just by changing a type and linking the library.

Works great in XaoS, which is where I like my extra precision, but a deep fractal zoom can consume as much precision as you can muster. Imagine a 64kbit-precision Mandelbrot zoom!


The main use of libquadmath seems to be to support gfortran. GCC/glibc also support _Float128 and <func>f128, which are nominally part of C, but only GCC implements them as far as I know.
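For what it's worth, here's a minimal sketch of using that from C on a GCC toolchain (the Q literal suffix and quadmath_snprintf are GCC/libquadmath specifics; compile with -lquadmath, and treat this as illustrative rather than canonical):

    #include <stdio.h>
    #include <quadmath.h>

    int main(void)
    {
        /* 128-bit (binary128) division, done in software by libquadmath */
        __float128 x = 1000.0Q / 3.0Q;
        char buf[64];

        /* printf can't format __float128 directly; use quadmath_snprintf */
        quadmath_snprintf(buf, sizeof buf, "%.33Qg", x);
        printf("%s\n", buf);
        return 0;
    }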


I believe x87 still offers 80 bits in your PC today.


Nobody uses the x87 instructions any more; they've been superseded by MMX/SSE/SSE2, which can't do 80-bit operations. And it's not clear whether 80 bits would be enough of an improvement over 64 to fix the problem.


A slight correction: no one uses MMX anymore.

It's all SSE and AVX (wider and more comprehensive instruction sets) nowadays.


I knew that, but I wanted to give the whole progression.


MMX isn't a successor to x87, though; it only happens to share its registers, because that allowed them to work without changing the OS. It does integer math only. SSE was the first Intel x86 vector instruction set to do single-precision floats (AMD had 3DNow!, but it was little used), but didn't really include much integer support for the newer, wider vectors (that came back only with SSE2).


Current x86 processors can still do 80-bit x87 arithmetic. For example:

    #include <stdio.h>

    int main(void)
    {
        long double ld = 1000;   /* 80-bit extended precision on x86 */
        double d = 1000;         /* ordinary 64-bit double */

        ld = ld / 3;
        d = d / 3;

        /* subtracting 333 exposes the low-order digits */
        printf("%.16g %.16g\n", (double)(ld - 333), d - 333);
        return 0;
    }
produces the following output:

    0.3333333333333334 0.3333333333333144
The 80-bit long double type gives higher accuracy.

This is on an AMD Threadripper PRO 3945WX using gcc 9.4.0.


> Nobody uses the x87 instructions any more

Check the code current GCC generates for inlined fmod (e.g. with -Ofast). Whether that's appropriate is the question.


D (dlang.org) has a "real" type alongside float and double, which represents the widest float available on the target (so 80 bits if x87 is available). It does see some use.

(But don't use it without a valid reason, or your program is going to be slow because there's no SIMD for it.)


> a "real" type alongside float and double which represents the widest float available on the target

Initial question: what would that be on a system without hardware floating point?

Luckily, https://dlang.org/spec/type.html says (not 100% unambiguously) that it isn't necessarily "the widest float available on the target", but either that or, if that doesn't have both the range and precision of double, then double.

So, if you decide to use it, you’re guaranteed not to give up range or precision, may gain some, but also may give up performance.


I'm confused: why does it matter if nobody else uses them, if you need 80 bits of precision? Do more people use SANE?


Part of the problem is that 80-bit operations were never designed to be a visible data type; you were expected to round to a 64-bit type at the end of a series of operations. There was never a good way of indicating to your compiler where and when those transitions would happen.


Not so strange, as they were never intended to be used by compilers to begin with (you'd write assembler code by hand). The basic idea with having 80-bit internal precision was that you could load your 64-bit values from RAM, compute on them with so much extra precision that you could basically ignore precision issues within your expressions, and then store the result back correctly rounded.
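As an illustration of that usage model (my own sketch, not from the thread; it assumes an x86 target where long double is the 80-bit extended type), keep the intermediates in extended precision and round once at the end:

    #include <stdio.h>

    /* Sketch: 64-bit doubles in memory, 80-bit intermediates, one final
       rounding back to double (assumes long double is the x87 80-bit type). */
    static double dot3(const double a[3], const double b[3])
    {
        long double acc = 0.0L;               /* extended-precision accumulator */
        for (int i = 0; i < 3; i++)
            acc += (long double)a[i] * b[i];
        return (double)acc;                   /* single rounding on store */
    }

    int main(void)
    {
        double a[3] = {1e16, 1.0, -1e16};
        double b[3] = {1.0, 1.0, 1.0};
        /* with a plain double accumulator this prints 0; with the 80-bit one, 1 */
        printf("%g\n", dot3(a, b));
        return 0;
    }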



