NumPy QuadDType: Quadruple Precision for Everyone

52 points, posted 9 hours ago
by rbanffy

44 Comments

nestorD

6 hours ago

I did my PhD on measuring floating-point error in large numerical simulations. In my experience, most of the time, if 64-bit floats are not enough, then trying to increase precision further is unlikely to help much (there are exceptions).

In general, a well-placed compensated (or exact) sum/dot-product algorithm will solve most of your problems at low cost. Most people are familiar with Kahan summation (and related tricks like sorting the numbers), but they do not know that it often helps less than 128-bit floats would, and that the state of the art for these algorithms has moved quite a bit since. Nowadays, you can ensure that your sum/BLAS operation produces a result within one ulp of the exact result!
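For anyone who has not seen it, a minimal sketch of the compensated-summation idea in plain Python (Kahan's original version; the newer algorithms in the references below build on the same trick, and math.fsum in the standard library already gives a correctly rounded result):

```python
import math
import random

def kahan_sum(values):
    """Compensated summation: carry the rounding error of each addition forward."""
    total = 0.0
    compensation = 0.0                    # running estimate of the lost low-order bits
    for v in values:
        y = v - compensation
        t = total + y                     # low-order bits of y are lost here...
        compensation = (t - total) - y    # ...and recovered here
        total = t
    return total

xs = [random.uniform(0.0, 1.0) for _ in range(10**6)]
print(sum(xs), kahan_sum(xs), math.fsum(xs))   # fsum is the exactly rounded reference
```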

I highly recommend perusing the Handbook of Floating-Point Arithmetic (the Accurate Algorithm online page also seems fine [0]) to get an idea of what can be done. Implementations of these ideas can be found in most languages used for numerical computation (a quick Google search gave me options for C++/Python [1], Rust [2], etc.).

[0]: https://accurate-algorithms.readthedocs.io/en/latest/index.h...

[1]: https://github.com/yafshar/xsum

[2]: https://github.com/bsteinb/accurate

sebzim4500

5 hours ago

That's also my experience.

It's interesting that 64 bits are often necessary (as opposed to 32) but 128 bits so rarely are.

nestorD

2 hours ago

Yes! I suspect that it boils down to 32 bits not being that much (you can enumerate them to test a function), while 64 bits is a lot. By that point, if you are having problems, then you are likely hitting a wall that will not be crossed by throwing additional bits at the problem.

PaulHoule

5 hours ago

That’s what I was taught about numerics doing my PhD in the 1990s. I was taught that you shouldn’t even use doubles if you could use singles.

Conscat

8 hours ago

Does anyone have examples of calculations that need quadruple precision? My last workplace seemed fine using double precision for simulating satellites in orbit, and I have the impression that double precision makes black hole and quantum physics simulations work correctly, so I have no intuition for this.

gh02t

8 hours ago

I see it in statistics. It's not uncommon in a lot of statistical algorithms to need to take the product (or sum) of a lot of very small numbers and then divide by that product when computing likelihoods of a bunch of independent measurements. There are some algebra tricks you can use to get around loss of precision like working in the log domain or some tedious scaling tricks, but working in quad precision can help when those aren't enough.
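As a hedged illustration of the log-domain trick (the numbers here are made up): multiplying many small likelihoods underflows long before extra precision would save you, while summing their logs stays comfortable.

```python
import numpy as np

p = np.full(2000, 1e-4)          # 2000 per-observation likelihoods

naive = np.prod(p)               # 1e-8000 underflows to 0.0 in float64
log_lik = np.sum(np.log(p))      # 2000 * log(1e-4), perfectly representable

print(naive)                     # 0.0
print(log_lik)                   # about -18420.7
```

When you later need to add such quantities (mixture models, marginal likelihoods), scipy.special.logsumexp does the addition without leaving the log domain.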

The article also gives an example from quantum physics. Some chaotic or highly numerically unstable problems just need quad precision to not blow up; primarily problems or algorithms that tend to catastrophically amplify small errors (the trouble can come from the problem itself or from the algorithm used to solve it). Matrix inversion is another example, to the point that most algorithms try to do almost anything to avoid explicitly inverting a matrix. But sometimes you just have to.

Dylan16807

7 hours ago

To put some specifics on "a lot of very small numbers", it's around a million numbers with part-per-billion precision, or a billion numbers with part-per-million precision, where you approach the limits of naive summation in double precision.

gh02t

6 hours ago

For sums, yes, but products and division by small numbers are the real killer and those come up a lot in stats. Hence why you try to avoid them by working with log-probabilities (where products become sums), but sometimes you can't. Quad precision is a bit of a bandaid that just pushes the issue off when you don't have a better algorithm, but it works sometimes.

slashdave

7 hours ago

Matrix inversion precision can be fixed iteratively (at a CPU cost, of course). Nearly singular matrices should probably be factored in any case.
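A rough sketch of the iterative fix being alluded to, in the form of classical iterative refinement for a linear solve (the function name is mine; SciPy is assumed):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def solve_refined(A, b, iters=3):
    lu, piv = lu_factor(A)                  # factor once, reuse for every correction
    x = lu_solve((lu, piv), b)
    for _ in range(iters):
        r = b - A @ x                       # residual (ideally computed in higher precision)
        x = x + lu_solve((lu, piv), r)      # solve for the correction and apply it
    return x
```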

gh02t

7 hours ago

Probably, that falls into "doing almost anything to avoid inverting a matrix explicitly", but direct inversion is the classic example here.

kolbe

7 hours ago

For numerical PDE solvers, you never do the actual inversion, but the discretization of the system (denoted A) can have significant sensitivity to numerical precision. Look up Condition Number and Preconditioning. In fact, resolving the problems with numerical precision for some A takes as much time as actually solving the system, and it's worth it.
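A common rule of thumb: you lose roughly log10(cond(A)) decimal digits in the solution, which is easy to check before worrying about precision (the example matrix is mine):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0 + 1e-10]])
print(np.linalg.cond(A))   # ~4e10: expect to lose ~10 of float64's ~16 digits
```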

gh02t

6 hours ago

Oh I'm very aware, I'm lumping those all into "tricks used to avoid/work around the numerical issues associated with matrix inversion" for simplicity, because explicit computation of a matrix inverse is one of the classic examples of a numerically unstable task. Hence why a large subfield of applied math can be summarized as "coming up with ways to avoid matrix inversion." PDE solvers like you mention are one of the main applications for that.

Tricks like clever factorization (though a lot of factorization algorithms have severe numerical issues of their own, e.g. some of the ways to compute QR or SVD), preconditioners, sparse and/or iterative algorithms like GMRES, randomized algorithms (my favorite), etc. are all workarounds you wouldn't need if there were a fast and stable way to exactly invert any arbitrary non-singular matrix. Well, you would have less need for them; there are other benefits to those methods, but improving numerical stability by avoiding inverses is one of the main ones.

gmiller123456

7 hours ago

One situation that already uses two doubles, which you may already be familiar with, is the International Astronomical Union's Standards of Fundamental Astronomy library. It uses two doubles for the date in order to have pico/nanosecond precision over thousands of years. Of course the algorithms aren't precise to that level over thousands of years, but you still need the ability to specify the date to that level during the time periods for which they are that accurate.
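A small illustration of why the split helps (the date values here are arbitrary, not taken from SOFA): the spacing of representable doubles at the magnitude of a full Julian date is tens of microseconds, while the spacing for the fraction-of-day part on its own is at the picosecond level.

```python
import numpy as np

jd = 2460310.6123456789                      # full Julian date in a single double
print(np.spacing(jd) * 86400.0)              # ~4e-5 s: only ~40 microsecond resolution

jd_day, jd_frac = 2460310.5, 0.1123456789    # the two-double split: day + fraction
print(np.spacing(jd_frac) * 86400.0)         # ~1.2e-12 s: picosecond-level resolution
```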

mitthrowaway2

5 hours ago

Interesting. I would have thought that fixed-point would make more sense for a time quantity. If you use a 64-bit integer to represent picoseconds, it gives you a time range of about 18 million years.

Dylan16807

a minute ago

You calculated something wrong. 2^64 picoseconds is only half a year.
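A quick back-of-the-envelope check of the ranges involved:

```python
print(2**64 * 1e-12 / 86400 / 365.25)   # ~0.58 years if a uint64 counts picoseconds
print(2**64 * 1e-9  / 86400 / 365.25)   # ~585 years if it counts nanoseconds instead
```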

Once you decide you need to go over 64 bits, it largely becomes a matter of convenience, and I can easily see two doubles being more convenient than either a quad or a 128-bit integer.

mitthrowaway2

5 hours ago

It's a very quick and easy / lazy way to measure the floating-point error in your existing double-precision algorithms; run the computation at quadruple precision and check for differences in the output. If there's any significant difference then you take a closer look.
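A sketch of that workflow; np.longdouble stands in here for a true quad dtype (on x86-64 Linux it gives 80-bit extended precision, usually enough to expose float64 drift, but on Windows and Apple Silicon it is just a double and this check tells you nothing):

```python
import numpy as np

def precision_drift(f, x):
    """Run f at float64 and at extended precision, return the relative difference."""
    lo = f(np.asarray(x, dtype=np.float64))
    hi = f(np.asarray(x, dtype=np.longdouble))   # stand-in for a quad dtype
    return float(abs((hi - lo) / hi))

# Example: variance computed the cancellation-prone way, E[x^2] - E[x]^2
dodgy_var = lambda a: np.mean(a * a) - np.mean(a) ** 2
x = 1e8 + np.random.rand(10**5)        # large offset makes the cancellation severe
print(precision_drift(dodgy_var, x))   # a large value here is the red flag to investigate
```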

physicsguy

7 hours ago

You can easily get catastrophic cancellation in physical simulations, people are just not very aware of it.

If, for example, you see bare summation rather than some sort of sorting by magnitude, then expect bad results as the numbers get bigger.
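A tiny demonstration of that failure mode, assuming the common case of a big running total swallowing small contributions:

```python
import numpy as np

total = np.float64(1e16)              # ulp at this magnitude is 2.0
small = np.full(10_000, 1.0)

acc = total
for s in small:                       # bare summation onto a huge accumulator
    acc += s
print(acc - total)                    # 0.0: every +1.0 fell below the ulp and vanished

print((total + small.sum()) - total)  # 10000.0: summing the small terms first keeps them
```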

bee_rider

7 hours ago

It is a bit weird though, because people also get very, very clever about designing their physical simulations so that they don’t have to care too much about their working precision.

physicsguy

5 hours ago

Yeah, GPUs actually helped with this a lot when they first came out because suddenly to get the great performance you had to drop to single precision, at least until 2018 or so (can’t remember exactly now, but it used to be the case that the older cards were 2x as slow at DP)

bee_rider

5 hours ago

I don’t really do GPU stuff, but I expect double to be at best half as fast as single, even on good hardware. I mean, it is twice as many bits. (Maybe you think in bits/second, though, which is totally fair.)

I looked up my consumer card on a whim (playing with sprinkling some GPGPU into my code—cupy is actually really easy to use and ergonomic!), and apparently doubles are 1/32 as fast as singles. Yikes!

gnulinux

7 hours ago

I use it for generating music through physical models. If the model requires a very numerically sensitive algorithm like a differential equation solver, it's nice to have the extra precision, and usually performance doesn't matter too much in this sort of thing (because ultimately, I just click "render" and wait anyway). To be very honest, I don't even know if I need 128 bits; maybe a 32-bit float would do... I never ran a comparative analysis. It just gives me more room for error, so that I can focus on the music/sound and not too much on the numerical analysis/programming. Working on hobby music+programming projects is the ultimate freedom.

Dylan16807

7 hours ago

I'm glad you're having fun but if you never compared it to double and single then you're not giving a valid example. Especially if you would have picked even higher if it was available.

sfpotter

7 hours ago

Basically guaranteed that you don't need quad for this.

a1369209993

4 hours ago

Of course, you also need to use solid gold audio cables, or you're just throwing away the extra precision via poor signal fidelity.

colechristensen

7 hours ago

Any long-running calculation where the new state depends on the old state and there isn't much of a correction factor. Anything with "sensitive dependence on initial conditions", any dynamical system.

For example: simulating the three-body problem (three celestial bodies of equal-ish size orbiting each other). Very small differences get amplified, and any difference at all in how the math is implemented on different systems will result in divergent solutions.
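A toy stand-in for this (the logistic map rather than an actual three-body integrator, so treat it purely as an illustration of sensitive dependence): trajectories computed in float32 and float64 from the same starting point stop agreeing after a few dozen steps.

```python
import numpy as np

def orbit(x0, steps, dtype):
    x = dtype(x0)
    for _ in range(steps):
        x = dtype(4.0) * x * (dtype(1.0) - x)   # logistic map, chaotic at r = 4
    return x

for n in (10, 30, 60):
    print(n, orbit(0.3, n, np.float32), orbit(0.3, n, np.float64))
```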

awoimbee

8 hours ago

> The long double type varies dramatically across platforms:
> [...]
> What sets numpy_quaddtype apart is its dual-backend approach:
> Long Double: This backend uses the native long double type, which can offer up to 80-bit precision on some systems, allowing backwards compatibility with np.longdouble.

Why introduce a new alias that might be 128 bits but also 80? IMO the world should focus on well-defined types (f8, f16, f32, f64, f80, f128), then maybe add aliases.
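For reference, you can check what the native long double actually gives you on your platform with plain NumPy:

```python
import numpy as np

info = np.finfo(np.longdouble)
print(info.bits, "storage bits,", info.nmant + 1, "mantissa bits")
# x86-64 Linux: 128 storage bits but only 64 mantissa bits (80-bit extended format)
# Windows and Apple Silicon macOS: long double is just a 64-bit double
# A true IEEE binary128 has 113 mantissa bits
```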

bee_rider

6 hours ago

Maybe it depends on the application?

If you have written a linear system solver, you might prefer to express yourself in terms of single/double precision. The user is responsible for knowing if their matrix can be represented in single precision (whatever that means, it is their matrix and their hardware after all), how well conditioned it is, all that stuff. You might rather care that you are working in single precision, and that there exists a double precision (with, it is assumed, hardware support, because GPUs can’t hurt us if we pretend they don’t exist) to do iterative refinement in if you need to.

0cf8612b2e1e

7 hours ago

Really seems like propagating the current platform inconsistencies into the future. Stick with 128 always, performance be damned. Slow code is much preferable to code that is subtly broken because you switched the host OS.

gnulinux

7 hours ago

Especially if you need 128-bit float precision. It's very well known and understood that quad float is much slower on most platforms, and extremely slow on some. If you're using quad float, it's because you absolutely need all 128 bits, so why even reduce it to 80 bits for "performance"? Seems like an irrelevant design choice. The programmer can choose between f128 and f80 if performance is intractable on the target platform.

sfpotter

6 hours ago

The only real place for quad precision is when you're working with numbers that truly require you to be able to span a massive difference in magnitudes without round-off error accumulating.

For some context, an electron "has structure" roughly on a length scale of about 10^-15 meters (details here unimportant). If you're representing a quantity in O(1) meters, then doing a handful of floating-point operations in double will likely accumulate error on the order of O(1) electrons. Incredible accuracy!

I think the key here is to be mindful of units and make sure you're combining like with like and scaling things appropriately as you go. If you're doing things the wrong way, quad will punish you eventually, too.

An underappreciated tool for working more confidently with floats is interval arithmetic. Worth checking out if you actually (truly) need to do things robustly (most people don't).
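If you want to play with interval arithmetic without much setup, mpmath ships an interval context (a small illustration; mpmath is assumed to be installed):

```python
from mpmath import iv

iv.dps = 15                 # work at roughly double precision
x = iv.mpf(1) / 3           # an interval guaranteed to contain 1/3
y = x * 3 - 1               # mathematically exactly 0
print(y)                    # a tiny interval straddling 0
print(y.a <= 0 <= y.b)      # True: the exact result is certified to lie inside
```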

andrepd

6 hours ago

That's one of the (many) problems with IEEE floats: they "waste" as many bits on mantissa precision in the range of 10^0, 10^1, 10^2, etc., as they do around 10^300 or 10^-300!

Proposals like Posits aim to improve on this :)

sfpotter

6 hours ago

IEEE floats are an incredible feat of engineering. The people who designed them were geniuses. We should be appreciative of the fact that we have a consistent and portable way to do numerics. I'm glad we've got IEEE 754 and don't have to suffer through the pain of numerous competing standards.

nestorD

6 hours ago

In practice, Posits gain you very few additional bits, and those are mostly felt in low-bit (16 and fewer) regimes. That's a big reason they are mostly seeing adoption in memory-constrained settings.

A great thing in the Posit proposal is that they also consider a distinct accumulator type for things like sums and dot products. In my experience (I did a PhD on measuring floating-point error in large numerical simulations), those operations are the ones most likely to sneak up on you and need fixing. But most people don't realize that those accumulator types are actually easy to use with IEEE floats; they are entirely orthogonal to using Posits.

jabl

8 hours ago

As an aside, the GNU Fortran compiler (gfortran) also provides a quad precision floating point type, similar to this one. If the target supports native quad precision it uses that, otherwise it falls back to software emulation.

Math functions are provided via libquadmath. Newer glibc also provides quad-precision functions with slightly different names, IIRC using an f128 suffix rather than the "q" suffix libquadmath uses.

Archit3ch

6 hours ago

> Math functions are provided via libquadmath

Notably lacking support for ARM.

slashdave

7 hours ago

GNU in general (C++, for example).

JanisErdmanis

7 hours ago

Unlikely to be a user here, just curious. If one wanted to use numpy_quaddtype with SciPy or other libraries that depend on NumPy, would that be possible without adding this library as a dependency to them?

jofer

6 hours ago

I am very happy to see that the article _immediately_ explains how this is different from long-existing np.float128/np.longdouble.

I was scratching my head for a bit there... It's also good to see that custom dtypes have come a long way. This is a great application of a custom dtype, and that wouldn't have been possible a few years back (well, okay, a "few" to me is probably like 8 years, but still).

OutOfHere

8 hours ago

Is there something here for people to use?

Can the comparison also be done using the max precision unsigned integers instead of just float64?

slashdave

7 hours ago

In my experience, it can be helpful for quickly fixing precision issues. Sometimes you just want to get on with your calculations and not go back and reformulate your numerical weak points. It's less necessary with good practice.

Archit3ch

6 hours ago

Should be fun for audio circuit simulations!

kolbe

7 hours ago

Until my hardware implements f128, even if I had a solid use for it, I cannot sacrifice the speed relative to f64.