ComplexSystems
5 months ago
This article is saying that it can be numerically unstable in certain situations, not that it's theoretically incorrect.
adgjlsfhk1
5 months ago
This is less about numerical instability and more about the fact that iterative algorithms with error control only control the error of their own result. When you run AD on them you are ADing the approximation, and a derivative of an approximation can be arbitrarily different from an approximation of the derivative.
ogogmad
5 months ago
That makes more sense. The title is flat out wrong IMO.
adgjlsfhk1
5 months ago
I think it is correct. Lots of people view AD as a black box that you can throw algorithms at and get derivatives out, and this shows that that isn't true.
wakawaka28
5 months ago
If you wrote code that failed to compile, you wouldn't impulsively call your compiler incorrect. This title sounds like it puts the blame in the wrong place. You can get error accumulation from even a basic calculation in a loop. We could try to solve these problems, but it's not the algorithm's fault if you don't know what you're doing.
ChrisRackauckas
5 months ago
This has nothing to do with floating-point error accumulation or numerical stability in the floating-point sense. You can do this with arbitrarily sized floating-point values and you will still get the same non-convergence result.
wakawaka28
5 months ago
Yes, it does. I admit that I have not tried to get deep into this, but it says right in the summary that this is due to error propagation and numerical instability. You can have such errors in ordinary hand-written code. The size of the float does not matter (much) for some examples, which wipe out a bunch of significant figures from the result. I'm not going to sit here and pretend I know exactly what details are going on here, but I studied numerical analysis and AD quite a bit back in the day. Even in the 2010s people knew that there was a chance of the resulting expressions having inherent instability. Even simple things like the quadratic formula have more stable and less stable forms. How much worse might it be for auto-generated expressions (or expression sequences, or equivalent; AD can be done a few ways)? AD guarantees analytically correct logic (in infinite precision, for example) if you use it right, but error analysis is not even attempted by most libraries.
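(To make that last point concrete, here is a toy Python sketch of my own, not from the video: the textbook quadratic formula next to a cancellation-free rearrangement. `roots_naive` and `roots_stable` are made-up names.)

```python
# Toy example (mine, not from the video): two algebraically equivalent forms of
# the quadratic formula with very different floating-point behavior.
import math

def roots_naive(a, b, c):
    d = math.sqrt(b * b - 4 * a * c)
    return (-b + d) / (2 * a), (-b - d) / (2 * a)

def roots_stable(a, b, c):
    d = math.sqrt(b * b - 4 * a * c)
    # Avoid subtracting nearly equal numbers: form the larger-magnitude root
    # first, then recover the other one from the product of the roots (c / a).
    q = -0.5 * (b + math.copysign(d, b))
    return q / a, c / q

a, b, c = 1.0, 1e8, 1.0          # true roots are about -1e-8 and -1e8
print(roots_naive(a, b, c))      # the small root loses most of its digits
print(roots_stable(a, b, c))     # the small root comes out accurate
```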
adgjlsfhk1
5 months ago
> AD guarantees analytically correct logic (in infinite precision, for example) if you use it right
The entire point of the video is that this isn't true. It is true for static algorithms, but for algorithms that iterate to convergence, the AD will ensure that the primal has converged, but will not ensure the dual has converged.
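As a rough illustration of what I mean, here is a minimal sketch with a hand-rolled `Dual` type (hypothetical names, not any particular AD library): forward-mode AD pushed through an iterative solver whose stopping test only ever looks at the primal.

```python
# Minimal sketch: forward-mode dual numbers pushed through a fixed-point solver.
# The convergence test looks only at the primal value, so the derivative carried
# in the dual part converges on its own schedule and its error is never checked.
import math
from dataclasses import dataclass

@dataclass
class Dual:
    val: float   # primal value
    der: float   # derivative with respect to the parameter p

    def __mul__(self, other):
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

def cos_d(x):
    return Dual(math.cos(x.val), -math.sin(x.val) * x.der)

def fixed_point(p, tol=1e-8, max_iter=10_000):
    """Solve x = cos(p * x) by naive iteration; convergence test on the primal only."""
    x = Dual(0.5, 0.0)
    for _ in range(max_iter):
        x_new = cos_d(p * x)
        if abs(x_new.val - x.val) < tol:   # never asks whether x_new.der has settled
            return x_new
        x = x_new
    return x

sol = fixed_point(Dual(1.0, 1.0))   # seed dp/dp = 1
print(sol.val, sol.der)             # primal is within tol; dual error is uncontrolled
```

In this toy the dual happens to settle too, but nothing in the loop checks or guarantees it, and that is the pattern the video runs into with adaptive ODE solvers.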
wakawaka28
5 months ago
I don't think you understood what I wrote there. The rules for algebraically computing a derivative are simple and deterministic and essentially captured within AD algorithms. They MUST therefore be correct in an analytical sense, given infinite precision. The video starts off by saying that they are dealing with considerations about how computers actually work, which implies finite precision. Like I said, concerns about stability of the methods are not new. Your original function might not be differentiable at a given point, for example. You have to know about that stuff rather than blindly applying "automatic" techniques. There is a lot of literature about how to use AD and what can go wrong. Here is a paper I found in a basic search that surveys known pitfalls: https://wires.onlinelibrary.wiley.com/doi/full/10.1002/widm....
The entire point of the video is mired in an hour or so of details about how they had trouble using it for solving ODEs. I am familiar with forward and reverse mode, but for me to appreciate it I would have to get up to speed with their exact problem and terminology. Anyway, my point is that AD requires you to know what you are doing. This video seems like a valuable contribution to the state of the art, but I think you have to recognize that the potential for problems has been known to numerical analysis experts for decades, so this is not as groundbreaking as it appears. The title should read "Automatic differentiation can be tricky to use" to establish that it is in fact a skill issue. The mitigation of these corner cases is valuable, to make the tools more versatile or foolproof. But the algorithms are not incorrect just because you didn't get them to solve your problem.
wakawaka28
5 months ago
Not to spam you, but this is probably a function that would not work for AD: https://math.stackexchange.com/questions/2383397/differentia...
That is, it is a series that converges, but trying to take the derivative as a sum of individual terms results in divergence. I learned a lot of this type of stuff ages ago, but in 2025 I just searched for an example lol... I am long overdue for a review of numerical analysis and real analysis.
ChatGPT also says something about an example related to some Fourier series, maybe related to this: https://en.m.wikipedia.org/wiki/Convergence_of_Fourier_serie... You can ask it all about this stuff. It seems pretty decent, although I have not gone too far into it.
adgjlsfhk1
5 months ago
Don't worry, this is interesting! AD should work on this example (at all points where the derivative converges); see this Desmos graph for a very informal proof that the series converges: https://www.desmos.com/calculator/djf8qtilok.
The place where I think we're talking past each other is this: in infinite precision, AD perfectly differentiates your algorithm. But even for an algorithm that uses arbitrary (or even infinite) precision math and controls the error of a differentiable problem to high accuracy, AD can still do weird things.
wakawaka28
4 months ago
Try that with `g(x, i) = sin(ix) / i`. I think that is one that ChatGPT said wouldn't work, as in you can't get the derivative of `f(x)` term-by-term. I guess another issue that could happen is that the original sequence converges, and the derivative sequence converges, but they converge at different rates. So code that calculates the function to sufficient precision would not automatically get the derivative to any particular error threshold.
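(Quick throwaway check, my own, to illustrate the mismatch: at a fixed point the partial sums of `sin(ix)/i` settle, while the term-by-term derivative series `cos(ix)` never does.)

```python
# Throwaway check: partial sums of sum(sin(i*x)/i) vs. the term-by-term
# derivative series sum(cos(i*x)) at a fixed point x in (0, 2*pi).
import math

x = 1.0
f, df = 0.0, 0.0
for i in range(1, 10_001):
    f += math.sin(i * x) / i
    df += math.cos(i * x)
    if i in (10, 100, 1_000, 10_000):
        # f drifts toward (pi - x) / 2; df just oscillates and never settles
        print(i, f, df)
```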
adgjlsfhk1
4 months ago
> g(x, i) = sin(ix) / i
That's an example where the term-by-term derivative series does not converge.
> I guess another issue that could happen is that the original sequence converges, and the derivative sequence converges, but they converge at different rates.
This is a lot closer to what's happening in the video. For a potentially simpler example than an ODE solver: if you had a series evaluator that, given a series, evaluated it at a point, AD would need a similar fix to make sure the convergence test also includes the convergence of the derivative.
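A sketch of what such a fix could look like (hypothetical code, derivative terms written out by hand, geometric series as a stand-in): stop only when both the value term and the derivative term are below tolerance.

```python
# Hypothetical series evaluator for sum(x**n) (= 1/(1-x) for |x| < 1), carrying
# each term's derivative by hand. A primal-only stopping test truncates the
# derivative series earlier than intended; the fixed test checks both parts.

def geometric(x, tol=1e-10, check_derivative=False):
    val, der = 0.0, 0.0
    n, term, dterm = 0, 1.0, 0.0            # term = x**n, dterm = n * x**(n - 1)
    while abs(term) >= tol or (check_derivative and abs(dterm) >= tol):
        val += term
        der += dterm
        n += 1
        dterm = n * x ** (n - 1)
        term = x ** n
    return val, der

x = 0.999
print(geometric(x))                          # derivative carries a larger truncation error
print(geometric(x, check_derivative=True))   # both value and derivative are resolved
print(1 / (1 - x), 1 / (1 - x) ** 2)         # exact values for comparison
```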
omnicognate
5 months ago
Yeah, perhaps the actual title would be better: "The Numerical Analysis of Differentiable Simulation". (Rather than the subtitle, which is itself a poor rewording of the actual subtitle in the video.)
goosedragons
5 months ago
It can be both. A mistake in AD primitives can lead to theoretically incorrect derivatives. With the system I use, I have run into a few scenarios where edge cases aren't totally covered, leading to the wrong result.
I have run into numerical instability too.
froobius
5 months ago
> A mistake in AD primitives can lead to theoretically incorrect derivatives
Ok but that's true of any program. A mistake in the implementation of the program can lead to mistakes in the result of the program...
goosedragons
5 months ago
That's true! But it's also true that any program dealing with floats can run into numerical instability if care isn't taken to avoid it, no?
It's also not necessarily immediately obvious that the derivatives ARE wrong if the implementation is wrong.
srean
5 months ago
> It's also not necessarily immediately obvious that the derivatives ARE wrong if the implementation is wrong.
It's neither a full proof nor foolproof, but an absolute must is a check that the loss function is decreasing. It quickly detects a common error: the sign coming out wrong in my gradient call. It's part of the good practice one learns in grad school.
froobius
5 months ago
You can pretty concretely and easily check that the AD primitives are correct by comparing them to numerical differentiation.
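For example (a generic sketch with placeholder names, not tied to any particular AD library), compare the claimed derivative against a central finite difference at a few sampled points:

```python
# Generic sanity check: compare a derivative (from AD or written by hand) against
# a central finite difference at a handful of random points.
import math, random

def f(x):
    return math.sin(x) * math.exp(-x * x)

def f_prime_ad(x):
    # stand-in for whatever the AD system would return for df/dx
    return math.cos(x) * math.exp(-x * x) - 2 * x * math.sin(x) * math.exp(-x * x)

def check(fn, dfn, n_points=10, h=1e-6, rtol=1e-4):
    for _ in range(n_points):
        x = random.uniform(-2.0, 2.0)
        fd = (fn(x + h) - fn(x - h)) / (2 * h)       # central difference, O(h^2) error
        ad = dfn(x)
        assert abs(ad - fd) <= rtol * max(1.0, abs(fd)), (x, ad, fd)

check(f, f_prime_ad)
print("derivative matches finite differences at sampled points")
```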
godelski
5 months ago
I haven't watched the video, but the text says they're getting like 60+% error on simple linear ODEs, which is pretty problematic.
You're right, but the scale of the problem seems to be the issue.