crazygringo
6 hours ago
This fails to acknowledge that synthesized noise can lack the detail and information in the original noise.
When you watch a high-quality encode that keeps the actual noise, there is a startling increase in resolution going from a still frame to the moving video. The noise is effectively dancing over a signal, and at 24 fps that signal is still perfectly clear behind it.
Whereas if you lossily encode in a way that discards the noise and then adds back artificial noise to match the original "aesthetically", the original detail is non-recoverable when this is done frame by frame. Watching at 24 fps produces a fundamentally blurrier viewing experience. And it's not subtle -- on old noisy movies the difference in detail can be 2x.
Now, if H.265 or AV1 is actually building its "noise-removed" frames by taking into account several preceding and following frames while accounting for movement, it could in theory recover the full-detail signal across time and encode that, and there wouldn't be any loss in detail. But I don't think it does? I'd love to know if I'm mistaken.
But basically, the point is: comparing noise removal and synthesis can't be done using still images. You have to see an actual video comparison side-by-side to determine if detail is being thrown away or preserved. Noise isn't just noise -- noise is detail too.
kderbe
6 hours ago
Grain is independent frame-to-frame. It doesn't move with the objects in the scene (unless the video's already been encoded strangely). So long as the synthesized noise doesn't have an obvious temporal pattern, comparing stills should be fine.
Regarding aesthetics, I don't think AV1 synthesized grain takes into account the size of the grains in the source video, so chunky grain from an old film source, with its big silver halide crystals, will appear as fine grain in the synthesis, which looks wrong (this might be mitigated by a good film denoiser). It also doesn't model film's separate color components properly, but supposedly that doesn't matter because Netflix's video sources are often chroma subsampled to begin with: https://norkin.org/pdf/DCC_2018_AV1_film_grain.pdf
Disclaimer: I just read about this stuff casually so I could be wrong.
zoky
3 hours ago
> Grain is independent frame-to-frame. It doesn't move with the objects in the scene (unless the video's already been encoded strangely)
That might seem like a reasonable assumption, but in practice it’s not really the case. Due to nonlinear response curves, adding noise to a bright part of an image has far less visible effect than adding it to a darker part. If the image is completely blown out, the grain may not be discernible at all. So practically speaking, grain does travel with objects in a scene.
This means detail is indeed encoded in grain to an extent. If you algorithmically denoise an image and then subtract the result from the original to get only the grain, you can easily see “ghost” patterns in the grain that reflect the original image. This represents lost image data that cannot be recovered by adding synthetic grain.
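A rough sketch of that check in Python (the Gaussian blur is only a stand-in for a real denoiser, and the per-block variance map is just one crude way to make the "ghost" measurable):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def grain_residual(frame, sigma=1.5):
        # "Denoise" with a blur, then subtract from the original so that
        # only the extracted grain layer remains. frame: grayscale float array.
        return frame - gaussian_filter(frame, sigma=sigma)

    def residual_structure(residual, block=16):
        # Per-block standard deviation of the grain. If grain were truly
        # independent of the image, this map would be roughly flat; in
        # practice it echoes the bright/dark structure of the scene.
        h = residual.shape[0] - residual.shape[0] % block
        w = residual.shape[1] - residual.shape[1] % block
        blocks = residual[:h, :w].reshape(h // block, block, w // block, block)
        return blocks.std(axis=(1, 3))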
creato
3 hours ago
> If you algorithmically denoise an image and then subtract the result from the original to get only the grain, you can easily see “ghost” patterns in the grain that reflect the original image. This represents lost image data that cannot be recovered by adding synthetic grain.
The synthesized grain is dependent on the brightness. If you were to just replace the frames with the synthesized grain described in the OP instead of adding it, you would see something very similar.
wyager
an hour ago
It sounds like the "scaling function" mentioned in the article may be intended to account for the nonlinear interaction of the noise.
godelski
2 hours ago
People often assume noise is normal and IID, but it usually isn't. It's a fine approximation but isn't the same thing, which is what the parent is discussing.
Here's an example that might help you intuit why this is true.
Let's suppose you have a digital camera and walk towards a radiation source and then away. Each radioactive particle that hits the CCD causes it to oversaturate, creating visible noise in the image. The noise it introduces is random (Poisson) but your movement isn't.
Now think about how noise is introduced. There are a lot of ways, actually, but I'm sure this thought exercise will reveal how some of them cause noise across frames to be dependent. Maybe as a first thought, think about film sitting on a shelf degrading.
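A toy version of that thought experiment in Python (all numbers made up; the point is only that the per-frame noise statistics track the non-random movement):

    import numpy as np

    rng = np.random.default_rng(0)
    t = np.linspace(0, 1, 240)                 # 10 s of frames at 24 fps
    distance = 5.0 - 4.0 * np.sin(np.pi * t)   # walk toward the source, then away
    rate = 50.0 / distance**2                  # inverse-square hit rate per pixel

    # Poisson hits per pixel per frame: each frame's noise is random,
    # but its intensity is driven by where you happen to be standing.
    hits = rng.poisson(rate[:, None, None], size=(240, 32, 32))
    per_frame_mean = hits.mean(axis=(1, 2))

    # The per-frame noise level follows the walk instead of being flat:
    print(np.corrcoef(per_frame_mean, rate)[0, 1])   # close to 1.0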
notpushkin
an hour ago
I think this is geared towards film grain noise, which is independent of movement?
TD-Linux
4 hours ago
The AR coefficients described in the paper are what allow basic modeling of the scale of the noise.
> In this case, L = 0 corresponds to the case of modeling Gaussian noise whereas higher values of L may correspond to film grain with larger size of grains.
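For intuition, here's a simplified sketch of that in Python (a causal 2D autoregressive filter over white Gaussian noise; the lag and coefficients are purely illustrative, not AV1's actual parameters):

    import numpy as np

    def synthesize_grain(height, width, lag=2, coeff=0.06, sigma=1.0, seed=0):
        # lag = 0 leaves plain Gaussian noise; a larger lag correlates nearby
        # pixels, which reads as bigger "grains".
        rng = np.random.default_rng(seed)
        grain = rng.normal(0.0, sigma, size=(height + lag, width + 2 * lag))
        for y in range(lag, height + lag):
            for x in range(lag, width + lag):
                acc = 0.0
                # Causal neighbourhood: rows above, plus pixels to the left
                # on the current row, all with one uniform toy coefficient.
                for dy in range(-lag, 1):
                    for dx in range(-lag, lag + 1):
                        if dy == 0 and dx >= 0:
                            break
                        acc += coeff * grain[y + dy, x + dx]
                grain[y, x] += acc
        return grain[lag:, lag:width + lag]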
crazygringo
3 hours ago
> Grain is independent frame-to-frame. It doesn't move with the objects in the scene (unless the video's already been encoded strangely). So long as the synthesized noise doesn't have an obvious temporal pattern, comparing stills should be fine.
Sorry if I wasn't clear -- I was referring to the underlying objects moving. The codec is trying to capture those details, the same way our eye does.
But regardless of that, you absolutely cannot compare stills. Stills do not allow you to compare against the detail that is only visible over a number of frames.
alright2565
5 hours ago
I think you've missed the point here: the noise in the originals acts as dithering, and increases the resolution of the original video. This is similar to the noise introduced intentionally in astronomy[1] and in signal processing[2].
Smoothing the noise out doesn't make use of that additional resolution, unless the smoothing happens over the time axis as well.
Perfectly replicating the noise doesn't help in this situation.
[1]: https://telescope.live/blog/improve-image-quality-dithering
[2]: https://electronics.stackexchange.com/questions/69748/using-...
kderbe
4 hours ago
Your first link doesn't seem to be about introducing noise, but removing it by averaging the value of multiple captures. The second is to mask quantizer-correlated noise in audio, which I'd compare to spatial masking of banding artifacts in video.
Noise is reduced to make the frame more compressible. This reduces the resolution of the original only because it inevitably removes some of the signal that can't be differentiated from noise. But even after noise reduction, successive frames of a still scene retain some frame-to-frame variance, unless the noise removal is too aggressive. When you play back that sequence of noise-reduced frames you still get a temporal dithering effect.
magicalhippo
3 hours ago
Here's[1] a more concrete source, which summarizes dithering in analog-to-digital converters as follows:
> With no dither, each analog input voltage is assigned one and only one code. Thus, there is no difference in the output for voltages located on the same "step" of the ADC's "staircase" transfer curve. With dither, each analog input voltage is assigned a probability distribution for being in one of several digital codes. Now, different voltages within the same "step" of the original ADC transfer function are assigned different probability distributions. Thus, one can see how the resolution of an ADC can be improved to below an LSB.
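A quick toy version of that in Python, assuming an ideal quantizer working in LSB units:

    import numpy as np

    rng = np.random.default_rng(0)
    true_value = 42.3          # sits between codes 42 and 43
    n_readings = 10_000

    undithered = np.round(np.full(n_readings, true_value))
    dithered = np.round(true_value + rng.uniform(-0.5, 0.5, n_readings))

    print(undithered.mean())   # 42.0 -- every reading lands on the same code
    print(dithered.mean())     # ~42.3 -- dither plus averaging recovers it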
In actual film, I presume the random inconsistencies of the individual silver halide grains are the noise source, and when watching such a film, I presume the eyes are doing the averaging through persistence of vision[2].
In either case, a key point is that you can't bring back any details by adding noise after the fact.
[1]: https://www.ti.com/lit/an/snoa232/snoa232.pdf section 3.0 - Dither
adgjlsfhk1
an hour ago
One thing worth noting is that this extra detail from dithering can be recovered when denoising by storing the image at higher precision. This is a lot of the reason 10-bit AV1 is so popular. It turns out that by adding extra bits of precision, you end up with an image that is easier to compress accurately, since the encoder has lower error from quantization.
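A crude sketch of that in Python (values in 8-bit code units; a flat patch plus grain acting as dither, temporally averaged, then stored at 8-bit vs 10-bit step sizes):

    import numpy as np

    rng = np.random.default_rng(0)
    true_level = 100.37        # true brightness, between 8-bit codes
    # 240 noisy 8-bit frames of a flat patch; the grain acts as dither
    frames = np.round(true_level + rng.normal(0, 1.5, size=(240, 32, 32)))

    denoised = frames.mean(axis=0)               # temporal average, fractional
    stored_8bit = np.round(denoised)             # steps of 1.0 code
    stored_10bit = np.round(denoised * 4) / 4    # steps of 0.25 codes

    print(np.abs(stored_8bit - true_level).mean())    # roughly 0.4: fraction lost
    print(np.abs(stored_10bit - true_level).mean())   # roughly 0.13: mostly kept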
arghwhat
5 hours ago
The noise does not contain a signal, does not dance over it, and is not detail. It is purely random fluctuations that are added to a signal.
If you have a few static frames and average them, you improve SNR by retaining the unchanged signal and having the purely random noise cancel itself out. Retaining noise itself is not useful.
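To put rough numbers on that (a toy Python sketch, not a real denoiser):

    import numpy as np

    rng = np.random.default_rng(0)
    signal = rng.uniform(0, 255, size=(64, 64))   # stand-in for a static scene
    noise_sigma = 10.0

    for n_frames in (1, 4, 16, 64):
        frames = signal + rng.normal(0, noise_sigma, size=(n_frames, 64, 64))
        residual = frames.mean(axis=0) - signal
        print(n_frames, residual.std())   # ~10, ~5, ~2.5, ~1.25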
I suspect the effect you might be seeing is either just an aesthetic preference for the original grain behavior, or that you are comparing low-bandwidth content with heavy compression artifacts like smoothing/low-pass filtering (not storing fine detail saves significant bandwidth) against high-bandwidth versions that retain full detail, which is entirely unrelated to the grain overlaid on top.
cma
an hour ago
You could do an AI or compressed-sensing upscale first, with multiple frames helping, have that be the base video that gets sent, and then regranularize that.