crazygringo
7 months ago
This fails to acknowledge that synthesized noise can lack the detail and information in the original noise.
When you watch a high-quality encode that includes the actual noise, there is a startling increase in resolution from seeing a still to seeing the video. The noise is effectively dancing over a signal, and at 24 fps the signal is still perfectly clear behind it.
Whereas if you lossily encode a still in a way that discards the noise and then adds back artificial noise to match the original "aesthetically", the original detail is unrecoverable if this is done frame by frame. Watching at 24 fps produces a fundamentally blurrier viewing experience. And it's not subtle -- on old noisy movies the difference in detail can be 2x.
Now, if H.265 or AV1 actually built its "noise-removed" frames by taking into account several preceding and following frames while accounting for movement, it could in theory recover the full-detail signal across time and encode that, and there wouldn't be any loss in detail. But I don't think it does? I'd love to know if I'm mistaken.
But basically, the point is: comparing noise removal and synthesis can't be done using still images. You have to see an actual video comparison side-by-side to determine if detail is being thrown away or preserved. Noise isn't just noise -- noise is detail too.
kderbe
7 months ago
Grain is independent frame-to-frame. It doesn't move with the objects in the scene (unless the video's already been encoded strangely). So long as the synthesized noise doesn't have an obvious temporal pattern, comparing stills should be fine.
Regarding aesthetics, I don't think AV1 synthesized grain takes into account the size of the grains in the source video, so chunky grain from an old film source, with its big silver halide crystals, will appear as fine grain in the synthesis, which looks wrong (this might be mitigated by a good film denoiser). It also doesn't model film's separate color components properly, but supposedly that doesn't matter because Netflix's video sources are often chroma subsampled to begin with: https://norkin.org/pdf/DCC_2018_AV1_film_grain.pdf
Disclaimer: I just read about this stuff casually so I could be wrong.
zoky
7 months ago
> Grain is independent frame-to-frame. It doesn't move with the objects in the scene (unless the video's already been encoded strangely)
That might seem like a reasonable assumption, but in practice it’s not really the case. Due to nonlinear response curves, adding noise to a bright part of an image has far less effect than adding it to a darker part. If the image is completely blown out, the grain may not be discernible at all. So practically speaking, grain does travel with objects in a scene.
This means detail is indeed encoded in grain to an extent. If you algorithmically denoise an image and then subtract the result from the original to get only the grain, you can easily see “ghost” patterns in the grain that reflect the original image. This represents lost image data that cannot be recovered by adding synthetic grain.
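A quick numpy sketch of the "ghost" effect (the scene, the brightness-dependent grain model, and the box-blur "denoiser" are all made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scene: a bright disc on a dark background.
h = w = 64
yy, xx = np.mgrid[:h, :w]
r2 = (yy - 32) ** 2 + (xx - 32) ** 2
scene = np.where(r2 < 200, 0.9, 0.1)

# Film-like grain whose strength depends on brightness (a crude
# stand-in for a nonlinear response curve: bright areas show less).
grain = rng.normal(0.0, 0.05, (h, w)) * (1.0 - scene)
noisy = scene + grain

# Simple 3x3 box-blur "denoiser" (numpy only, wraps at the edges).
def box_blur(img):
    acc = np.zeros_like(img)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            acc += np.roll(np.roll(img, dy, axis=0), dx, axis=1)
    return acc / 9.0

# Subtract the denoised image from the original to isolate the grain.
residual = noisy - box_blur(noisy)

# Away from the disc edge, the residual's local energy mirrors the
# scene: dark regions carry far more grain energy than bright ones,
# so the extracted "grain" shows a ghost of the picture.
bright_var = residual[r2 < 100].var()
dark_var = residual[r2 > 400].var()
print(dark_var > 5 * bright_var)
```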
wyager
7 months ago
It sounds like the "scaling function" mentioned in the article may be intended to account for the nonlinear interaction of the noise.
creato
7 months ago
> If you algorithmically denoise an image and then subtract the result from the original to get only the grain, you can easily see “ghost” patterns in the grain that reflect the original image. This represents lost image data that cannot be recovered by adding synthetic grain.
The synthesized grain is dependent on the brightness. If you were to just replace the frames with the synthesized grain described in the OP post instead of adding it, you would see something very similar.
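A toy numpy version of that claim, with a made-up scaling function standing in for the codec's real one: noise synthesized as a function of brightness shows the same "ghost" on its own.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical luminance plane: left half dark, right half bright.
luma = np.zeros((64, 64))
luma[:, :32] = 0.2
luma[:, 32:] = 0.8

# A toy scaling function sigma(Y): grain strength falls off with
# brightness (the real codec uses a fitted piecewise-linear LUT).
def sigma(y):
    return 0.08 * (1.0 - y)

# Synthesize grain: white noise scaled per-pixel by sigma(luma).
synthetic = rng.normal(0.0, 1.0, luma.shape) * sigma(luma)

# The synthetic grain's local energy already traces the image: the
# "ghost" is a property of any brightness-dependent grain model,
# not by itself evidence of lost detail.
left_std = synthetic[:, :32].std()
right_std = synthetic[:, 32:].std()
print(round(left_std / right_std, 1))
```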
majormajor
7 months ago
> So long as the synthesized noise doesn't have an obvious temporal pattern, comparing stills should be fine.
The problem is that the initial noise-removal and compression passes still removed detail (that is more visible in motion than in stills) that you aren't adding back.
If you do noise-removal well you don't have to lose detail over time.
But it's much harder to do streaming-level video compression on a noisy source without losing that detail.
The grain they're adding somewhat distracts from the compression blurriness but doesn't bring back the detail.
Thorrez
7 months ago
>The grain they're adding somewhat distracts from the compression blurriness but doesn't bring back the detail.
Instead of wasting bits trying to compress noise, they can remove noise first, then compress, then add noise back. So now there aren't wasted bits compressing noise, and those bits can be used to compress detail instead of noise. So if you compare FGS compression vs non-FGS compression at the same bitrate, the FGS compression did add some detail back.
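A toy numpy sketch of the bits-wasted-on-noise argument (the quantizer, the Shannon-entropy "bitrate" proxy, and the grain model are illustrative stand-ins, not the actual codec):

```python
import numpy as np

rng = np.random.default_rng(2)

def entropy_bits(values):
    # Shannon entropy of the quantized symbols, in bits per sample:
    # a crude stand-in for what an entropy coder would need.
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Smooth "detail" signal plus grain.
x = np.linspace(0, 1, 4096)
signal = np.sin(8 * np.pi * x)
noisy = signal + rng.normal(0, 0.2, x.size)

# Path A: quantize the grainy signal directly.
q = 0.05
direct = np.round(noisy / q)

# Path B (FGS-style): denoise, quantize, then synthesize grain on
# the decoder side from a couple of parameters (here just a stddev).
kernel = np.ones(9) / 9.0
denoised = np.convolve(noisy, kernel, mode="same")
coded = np.round(denoised / q)
regrained = coded * q + rng.normal(0, 0.2, x.size)  # viewer-facing output

# The grainy path needs measurably more bits per sample for the
# same quantizer step, which is the budget FGS frees up for detail.
print(entropy_bits(direct), entropy_bits(coded))
```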
strogonoff
7 months ago
I imagined that at some point someone would come up with the idea “let’s remove more noise to compress things better and then add it back on the client”. Turns out, it is Netflix (I mean, who else wins so much from saving bandwidth).
Personally I rejected the idea after thinking about it for a couple of minutes, and I’m not yet sure I was wrong.
The challenge with noise is that it cannot be perfectly, automatically distinguished from what could be finer detail and texture, even in a still photo, not to mention high-resolution footage. If removing noise were as simple as that, digital photography would be completely different. And once you have removed noise, you can’t just add the missing detail back later; if you could, you would not have removed it in the first place (alas, no algorithm is good enough, and even the human eye can be faulty).
Thorrez
7 months ago
I'm not saying that the final result is as good as the original.
I'm saying that the final result is better than standard compression at the same bitrate.
strogonoff
7 months ago
That might be true; however, if this takes hold I would be surprised if they chose to keep producing and shipping the tasty-grain, high-fidelity footage.
Considering that NR is generally among the very first steps in the development pipeline (as that’s where it is most effective), and the rest of the dynamic range wrangling and colour grading comes on top of it, they might consider it a “waste” to 1) process everything twice (once with this new extreme NR, once with minimal NR that leaves the original grain), 2) keep around both copies, and especially (the costliest step) 3) ship that delicious analog noise over the Internet to people who want quality.
I mean, how far do we go? It’ll take even less bandwidth to just ship prompts to a client that generates the entire thing on the fly. Imagine the compression ratios…
Thorrez
7 months ago
That argument could be made to reject any form of lossy compression.
Lossy compression enables many use cases that would otherwise be impossible. Is it annoying that streaming companies drive the bitrate overly low? Yes. However, we shouldn't blame the existence of lossy compression algorithms for that. Without lossy compression, streaming wouldn't be feasible in the first place.
crazygringo
7 months ago
> Grain is independent frame-to-frame. It doesn't move with the objects in the scene (unless the video's already been encoded strangely). So long as the synthesized noise doesn't have an obvious temporal pattern, comparing stills should be fine.
Sorry if I wasn't clear -- I was referring to the underlying objects moving. The codec is trying to capture those details, the same way our eye does.
But regardless of that, you absolutely cannot compare stills. Stills do not allow you to compare against the detail that is only visible over a number of frames.
godelski
7 months ago
People often assume noise is normal and IID, but it usually isn't. It's a fine approximation, but not the same thing, which is what the parent is discussing.
Here's an example that might help you intuit why this is true.
Let's suppose you have a digital camera and walk towards a radiation source and then away. Each radioactive particle that hits the CCD causes it to oversaturate, creating visible noise in the image. The noise it introduces is random (Poisson), but your movement isn't.
Now think about how noise is introduced. There are a lot of ways, actually, but I'm sure this thought exercise will reveal to you how some of them cause noise across frames to be dependent. As a first thought, consider film sitting on a shelf, degrading.
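A quick numpy sketch of the thought experiment (the inverse-square rate, the walk, and all numbers are made up): each frame's noise is genuinely Poisson, yet the frames are far from identically distributed.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical walk: distance to the source closes in, then opens up.
t = np.arange(200)
distance = 10.0 - 8.0 * np.sin(np.pi * t / 199)  # 10m -> 2m -> 10m

# Hits on the sensor per frame: Poisson, with a rate following the
# inverse-square law. Each frame's noise is random, but the rate is
# driven entirely by the deterministic movement.
rate = 400.0 / distance**2
hits = rng.poisson(rate)

# Per-frame hit counts track the movement closely: the noise process
# is Poisson, but it is not IID across frames.
corr = np.corrcoef(hits, rate)[0, 1]
print(round(corr, 2))
```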
notpushkin
7 months ago
I think this is geared towards film grain noise, which is independent from movement?
godelski
7 months ago
It's the same thing. Yes, it's not related to the movement of the camera, but I thought the radiation example would be an easier way to build your intuition than silver particles being deposited onto film. Film is made in batches, right?
The point is that just because things are random doesn't mean there aren't biases.
To be more precise, it helps to understand what randomness actually is: a measurement of uncertainty, of the unknown. This is true even for quantum processes that are truly random; it means we can't know. But just because we can't know doesn't mean it's completely unknown, right? We have different types of distributions, and different parameters within those distributions. That's the intuition we're trying to build here.
alright2565
7 months ago
I think you've missed the point here: the noise in the originals acts as dithering, and increases the resolution of the original video. This is similar to the noise introduced intentionally in astronomy[1] and in signal processing[2].
Smoothing the noise out doesn't make use of that additional resolution, unless the smoothing happens over the time axis as well.
Perfectly replicating the noise doesn't help in this situation.
[1]: https://telescope.live/blog/improve-image-quality-dithering
[2]: https://electronics.stackexchange.com/questions/69748/using-...
kderbe
7 months ago
Your first link doesn't seem to be about introducing noise, but removing it by averaging the value of multiple captures. The second is to mask quantizer-correlated noise in audio, which I'd compare to spatial masking of banding artifacts in video.
Noise is reduced to make the frame more compressible. This reduces the resolution of the original only because it inevitably removes some of the signal that can't be differentiated from noise. But even after noise reduction, successive frames of a still scene retain some frame-to-frame variance, unless the noise removal is too aggressive. When you play back that sequence of noise-reduced frames you still get a temporal dithering effect.
magicalhippo
7 months ago
Here's[1] a more concrete source, which summarizes dithering in analog to digital converters as follows:
With no dither, each analog input voltage is assigned one and only one code. Thus, there is no difference in the output for voltages located on the same "step" of the ADC's "staircase" transfer curve. With dither, each analog input voltage is assigned a probability distribution for being in one of several digital codes. Now, different voltages within the same "step" of the original ADC transfer function are assigned different probability distributions. Thus, one can see how the resolution of an ADC can be improved to below an LSB.
In actual film, I presume the random inconsistencies of the individual silver halide grains are the noise source, and when watching such a film, I presume the eyes do the averaging through persistence of vision[2].
In either case, a key point is that you can't bring back any details by adding noise after the fact.
[1]: https://www.ti.com/lit/an/snoa232/snoa232.pdf section 3.0 - Dither
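A quick numpy sketch of the sub-LSB recovery the app note describes, with floor() plus 1 LSB of uniform dither standing in for the ADC (all numbers made up):

```python
import numpy as np

rng = np.random.default_rng(4)

# A constant "analog" voltage sitting 0.3 LSB above code 42.
true_value = 42.3
n_frames = 100_000

# No dither: every conversion lands on the same code, and averaging
# any number of them can never reveal the fractional part.
no_dither = np.floor(np.full(n_frames, true_value))

# With ~1 LSB of uniform dither added before quantization, the code
# toggles between 42 and 43 with probabilities 0.7 / 0.3, so the
# average of many conversions recovers the sub-LSB value.
dither = rng.uniform(0.0, 1.0, n_frames)
dithered = np.floor(true_value + dither)

print(no_dither.mean(), dithered.mean())
```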
adgjlsfhk1
7 months ago
One thing worth noting is that this extra detail from dithering can be recovered when denoising by storing the image to higher precision. This is a lot of the reason 10 bit AV1 is so popular. It turns out that by adding extra bits of image, you end up with an image that is easier to compress accurately since the encoder has lower error from quantization.
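A toy numpy illustration of the precision side of this (a plain uniform quantizer on a gradient, not an actual codec pipeline): going from 8 to 10 bits cuts the worst-case quantization error by roughly 4x, which is headroom a denoised signal can actually use.

```python
import numpy as np

# A smooth luminance gradient, the classic banding stress test.
x = np.linspace(0.0, 1.0, 10_000)

def quantize(v, bits):
    # Uniform mid-tread quantizer over [0, 1].
    levels = 2**bits - 1
    return np.round(v * levels) / levels

err8 = np.abs(quantize(x, 8) - x).max()
err10 = np.abs(quantize(x, 10) - x).max()
print(err8 / err10)
```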
TD-Linux
7 months ago
The AR coefficients described in the paper are what allow basic modeling of the scale of the noise.
> In this case, L = 0 corresponds to the case of modeling Gaussian noise whereas higher values of L may correspond to film grain with larger size of grains.
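A 1-D numpy sketch of the idea (a plain AR(1) filter, only an analogue of the 2-D AR model in the spec): larger coefficients give the noise a longer correlation length, i.e. "chunkier" grain.

```python
import numpy as np

rng = np.random.default_rng(5)

n = 100_000
white = rng.normal(0, 1, n)

# 1-D analogue of the AR grain model: each sample is a weighted sum
# of previously generated neighbours plus fresh Gaussian noise.
a = 0.7
grain = np.empty(n)
grain[0] = white[0]
for i in range(1, n):
    grain[i] = a * grain[i - 1] + white[i]

def lag1_corr(v):
    # Correlation between each sample and its neighbour.
    return float(np.corrcoef(v[:-1], v[1:])[0, 1])

# White noise has ~zero lag-1 correlation; the AR noise sits near
# the coefficient value, so adjacent samples clump into "grains".
print(round(lag1_corr(white), 2), round(lag1_corr(grain), 2))
```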
dperfect
7 months ago
This is a really good point.
To illustrate the temporal aspect: consider a traditional film projector. Between every frame, we actually see complete darkness for a short time. We could call that darkness "noise", and if we were to linger on that moment, we'd see nothing of the original signal. But since our visual systems tend to temporally average things out to a degree, we barely even notice that flicker (https://en.wikipedia.org/wiki/Flicker_fusion_threshold). I suspect noise and grain are perceived in a similar way, where they become less pronounced compared to the stable parts of the signal/image.
Astrophotographers stack noisy images to obtain images with higher SNR. I think our brains do a bit of that too, and it doesn't mean we're hallucinating detail that isn't there; the recorded noise, over time, averages toward the mean, and that mean is a clearer representation of the actual signal (though not entirely, due to systematic/non-random noise, but that's often less significant).
Denoising algorithms that operate on individual frames don't have that context, so they will lose detail (or will try to compensate by guessing). AV1 doesn't specify a specific algorithm to use, so I suppose in theory, a smart algorithm could use the temporal context to preserve some additional detail.
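A quick numpy sketch of the stacking point (flat patch, made-up numbers): averaging N frames shrinks the random noise by about sqrt(N) while leaving the mean signal untouched.

```python
import numpy as np

rng = np.random.default_rng(6)

# 64 noisy "exposures" of the same flat patch with true value 1.0.
n_frames, patch = 64, 10_000
frames = 1.0 + rng.normal(0, 0.5, (n_frames, patch))

# Noise level in a single exposure vs. in the stacked average.
single_noise = frames[0].std()
stacked_noise = frames.mean(axis=0).std()

# Averaging 64 frames cuts the noise by ~sqrt(64) = 8x; no detail
# is hallucinated, the signal mean is simply what remains.
print(round(single_noise / stacked_noise, 1))
```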
arghwhat
7 months ago
The noise does not contain a signal, does not dance over it, and is not detail. It is purely random fluctuations that are added to a signal.
If you have a few static frames and average them, you improve SNR by retaining the unchanged signal and having the purely random noise cancel itself out. Retaining noise itself is not useful.
I suspect the effect you're seeing is either an aesthetic preference for the original grain behavior, or a comparison of low-bandwidth content with heavy compression artifacts like smoothing/low-pass filtering (not storing fine detail saves significant bandwidth) against high-bandwidth versions that maintain full detail, which is entirely unrelated to the grain overlaid on top.
account42
7 months ago
> If you have a few static frames and average them, you improve SNR by retaining the unchanged signal and having the purely random noise cancel itself out.
That's exactly the point of GP though. Even though each individual frame might be almost indistinguishable from random noise you can still extract patterns over time. This is also the case if you don't average the frames in software but let the viewer's brain do it. If you just remove all "noise" from each frame and then add random noise back those patterns will be lost.
In practice you won't have static frames but also movement so recovering the signal from the noise becomes a lot more complicated.
arghwhat
7 months ago
Anything with a pattern is by definition not noise, and the comment was that noise had signal. If you remove all noise, no signal or pattern is lost by definition.
However, the issue is that lossy compression removes various types of minute detail, smoothing surfaces to reduce the amount of data that has to be stored, be it noise "grain" or skin pores, according to compression settings. Storing the original noise as it was would basically make any compression impossible.
Scaevolus
7 months ago
Denoising generally removes signal too. Removing noise and reconstituting similar noise to maintain the apparent qualities of an input can help compression, but you are also cutting out true details (typically fine detail).
The effect GP is pointing out is how denoisers damage detail, which is true. This detail can persist over multiple frames, which is why many denoisers include a temporal comparison component to mitigate the damage, but you still lose detail.
arghwhat
7 months ago
Definitely, but that's a result of compressing (with any algorithm) past a bitrate that would allow detail to be retained, with high-frequency detail (including noise) being the first to go. It's not a result of noise having signal, as was stated, and more importantly it's unrelated to film grain synthesis, which is about adding something new on the output side.
FieryTransition
7 months ago
I love this concept/principle. One similar example I often bring up when I talk about machine learning is comparing how a human would analyse night footage from a camera with how an ML algorithm can pick up things no human would think about, even artifacts from the sensors, which can be used as features. Noise is rarely ever just noise.
hungmung
7 months ago
Some of the new 4K discs use DNR, and the denoising process occasionally seems to remove the pores on people's faces, leaving actors looking like their faces are made of wax.
BoingBoomTschak
7 months ago
Don't want to sound too snarky, but aren't you just saying that good denoisers must be temporal in addition to spatial? Something like an improved V-BM3D (cf. https://arxiv.org/abs/2001.01802); even traditional V-BM3D works fine on non-Gaussian noise.
Together with external noise generation (something like https://old.reddit.com/r/AV1/comments/r86nsb/custom_photonno...) that'd support external denoising (pass in the reference and the denoised video to get the grain spec), the whole FGS thing would be much more interesting.
cma
7 months ago
You could do an ai or compressed sensing upscale first, with multiple frames helping, have that be the base video sent, and then regranularize that.