> See, that’s just flat-out lying. What’s this mythical circumstance where playing audio A at the same volume as audio B on one device will magically make A louder than B on another? Especially when dealing with server-side ad insertion, as the article discusses, where the service has full control of the input files and the output stream?
Consider a streaming movie in surround sound with an inserted ad that is in stereo. I'm playing it on a 5.1 home theater system and you're playing it on a stereo phone, so your device is mixing the surround sound down to stereo.
When your device does that, it attenuates the program so that if several channels in the 5.1 stream are loud at the same time, the stereo down mix won't get too loud and clip. When the commercial cuts in, your device recognizes that it is ordinary stereo and doesn't need to be down mixed, so it goes straight through without the attenuation the down mixer applies to the program.
Whatever level the commercial is really at relative to the program, it is going to sound louder than that on your system because of that attenuation difference.
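Here is a minimal sketch of that kind of device, under loudly stated assumptions: the 5.1 channel order L, R, C, LFE, Ls, Rs, the common -3 dB center and surround coefficients, and a down mixer that normalizes to the worst-case peak sum. Real devices differ in coefficients and in whether and how they normalize, and `play_frame` is a hypothetical stand-in, not any particular device's behavior.

```python
import math

# Assumed 5.1 channel order: L, R, C, LFE, Ls, Rs (one common convention).
# Assumed coefficients: -3 dB (1/sqrt(2)) for center and surrounds, a common
# default; real devices may use other values.
CENTER_MIX = 1 / math.sqrt(2)
SURROUND_MIX = 1 / math.sqrt(2)


def downmix_frame(frame, normalize=True):
    """Fold one 5.1 sample frame into stereo; the LFE is dropped.

    With normalize=True the result is scaled so that the worst case
    (front, center and surround all at full scale, same polarity)
    lands exactly at full scale instead of clipping.
    """
    l, r, c, _lfe, ls, rs = frame
    lo = l + CENTER_MIX * c + SURROUND_MIX * ls
    ro = r + CENTER_MIX * c + SURROUND_MIX * rs
    if normalize:
        gain = 1 / (1 + CENTER_MIX + SURROUND_MIX)
        lo, ro = lo * gain, ro * gain
    return lo, ro


def play_frame(frame):
    """Hypothetical stereo device: down mix 5.1, pass stereo straight through."""
    if len(frame) == 6:
        return downmix_frame(frame)
    if len(frame) == 2:
        return tuple(frame)
    raise ValueError("expected a 2- or 6-channel frame")


def program_attenuation_db():
    """How far the normalized down mix pulls the program down, in dB."""
    return 20 * math.log10(1 + CENTER_MIX + SURROUND_MIX)


# The same 0.5 signal on the front left, once inside a 5.1 program frame
# and once inside a stereo ad frame.
program_out = play_frame([0.5, 0.0, 0.0, 0.0, 0.0, 0.0])
ad_out = play_frame([0.5, 0.0])
```

With these assumed coefficients the program comes out roughly 7.7 dB lower than a stereo ad carrying the identical front-left signal, purely because one path goes through the normalizing down mix and the other doesn't.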
On my device the 5.1 program isn't attenuated, since my system has all the necessary channels. Even so, if the commercial is at the same level as the program, it will actually sound louder on mine. That's because the same total sound level split among five speakers seems perceptually less loud than the same total level coming from two stereo speakers.
The streamer can do loudness normalization between the program and the commercial. It can calculate what the perceived loudness will be at any point in time and adjust the levels so that, on my device, the perceived level of the 5.1 program when it reaches the commercial matches the perceived level of the stereo commercial.
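The server-side matching can be sketched roughly as below. The channel weights and the -0.691 offset are the ones ITU-R BS.1770 uses for summing channels (surrounds weighted 1.41, LFE excluded), but this sketch skips the K-weighting filter and the gating, so treat it as an illustration of the channel summation and gain calculation, not a compliant meter. The test signals are made up.

```python
import math

# Channel weights from ITU-R BS.1770: 1.0 for L, R and C, 1.41 for the
# surrounds, and the LFE is excluded. Assumed 5.1 order: L, R, C, LFE, Ls, Rs.
WEIGHTS_5_1 = [1.0, 1.0, 1.0, 0.0, 1.41, 1.41]
WEIGHTS_STEREO = [1.0, 1.0]


def mean_square(samples):
    return sum(s * s for s in samples) / len(samples)


def rough_loudness(channels, weights):
    """BS.1770-style channel summation, heavily simplified.

    Skips the K-weighting pre-filter and the gating, so this is NOT a
    compliant loudness meter; it only shows how the channels are combined.
    """
    total = sum(w * mean_square(ch) for ch, w in zip(channels, weights))
    return -0.691 + 10 * math.log10(total)


def ad_gain_db(program_channels, ad_channels):
    """dB gain that makes the stereo ad measure the same as the 5.1 program."""
    program = rough_loudness(program_channels, WEIGHTS_5_1)
    ad = rough_loudness(ad_channels, WEIGHTS_STEREO)
    return program - ad


def apply_gain(channels, gain_db):
    g = 10 ** (gain_db / 20)
    return [[g * s for s in ch] for ch in channels]


# Hypothetical test signals: a quiet 5.1 program and a hot stereo ad.
program = [[0.1, -0.1]] * 3 + [[0.9, 0.9]] + [[0.1, -0.1]] * 2
ad = [[0.5, -0.5], [0.5, -0.5]]
gain = ad_gain_db(program, ad)
matched_ad = apply_gain(ad, gain)
```

Because the stereo ad reaches a 5.1 system untouched, a gain computed this way lines up there. On a device that down mixes, the program picks up extra attenuation that the server never measured.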
But on devices that down mix to stereo there is still the attenuation the down mixer applies, and that differs from device to device. That limits what can be done server side to get the program and commercial to match.
Some multichannel formats do include metadata telling the device how much to attenuate when down mixing to stereo. If all devices supported that, it should be possible for the server to fully take care of loudness matching. Otherwise you probably need device-side normalization.
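Dolby Digital (AC-3), for instance, carries center and surround mix levels in its bitstream. If a decoder honors such metadata exactly, the server can reproduce the stereo down mix itself and match the ad against that. The sketch below assumes that. The `DownmixMetadata` fields are hypothetical names, not any format's real syntax, and the unweighted power measure stands in for a real loudness meter.

```python
import math
from dataclasses import dataclass


@dataclass
class DownmixMetadata:
    """Hypothetical down mix metadata; field names are illustrative only."""
    center_mix_db: float    # level for C when it is folded into L and R
    surround_mix_db: float  # level for Ls/Rs when folded into L and R
    normalize: bool         # decoder scales down to rule out clipping


def db_to_linear(db):
    return 10 ** (db / 20)


def downmix(channels, meta):
    """Down mix 5.1 (order L, R, C, LFE, Ls, Rs) exactly as the metadata says."""
    l, r, c, _lfe, ls, rs = channels
    cm = db_to_linear(meta.center_mix_db)
    sm = db_to_linear(meta.surround_mix_db)
    g = 1 / (1 + cm + sm) if meta.normalize else 1.0
    lo = [g * (a + cm * b + sm * d) for a, b, d in zip(l, c, ls)]
    ro = [g * (a + cm * b + sm * d) for a, b, d in zip(r, c, rs)]
    return lo, ro


def stereo_level_db(left, right):
    """Unweighted, ungated power: a stand-in for a real loudness meter."""
    return 10 * math.log10(sum(x * x for x in left + right) / len(left))


def stereo_ad_gain_db(program_5_1, ad_left, ad_right, meta):
    """Ad gain so it matches the program as a compliant stereo decoder plays it."""
    program_db = stereo_level_db(*downmix(program_5_1, meta))
    ad_db = stereo_level_db(ad_left, ad_right)
    return program_db - ad_db
```

This only holds for decoders that actually follow the metadata. One that substitutes its own coefficients throws the prediction off by the difference, which is where device-side normalization comes in.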
Another approach would be to up mix the stereo commercial server side to whatever surround format the program is using. Then they could do server-side loudness normalization between the program and the commercial without it being thrown off by differences in how stereo devices down mix.
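The crudest version of that up mix is a passive one: the stereo pair goes to the front left and right, and the center, LFE and surrounds stay silent. The sketch below assumes that, plus the same hypothetical normalizing -3 dB down mix as a stand-in for a stereo device. It shows that the up mixed ad no longer bypasses the device's attenuation.

```python
import math

# Assumed 5.1 order: L, R, C, LFE, Ls, Rs, and an assumed -3 dB normalizing
# down mix standing in for whatever a given stereo device really does.
CENTER_MIX = 1 / math.sqrt(2)
SURROUND_MIX = 1 / math.sqrt(2)
DEVICE_GAIN = 1 / (1 + CENTER_MIX + SURROUND_MIX)


def upmix_stereo_to_5_1(left, right):
    """Passive up mix: stereo goes to front L/R, everything else is silent."""
    n = len(left)
    return [list(left), list(right)] + [[0.0] * n for _ in range(4)]


def device_downmix(channels):
    """Normalizing 5.1-to-stereo down mix, as a stand-in for a real device."""
    l, r, c, _lfe, ls, rs = channels
    lo = [DEVICE_GAIN * (a + CENTER_MIX * b + SURROUND_MIX * d)
          for a, b, d in zip(l, c, ls)]
    ro = [DEVICE_GAIN * (a + CENTER_MIX * b + SURROUND_MIX * d)
          for a, b, d in zip(r, c, rs)]
    return lo, ro


ad_left, ad_right = [0.5, -0.25], [0.25, 0.5]
ad_5_1 = upmix_stereo_to_5_1(ad_left, ad_right)
out_left, out_right = device_downmix(ad_5_1)
```

Once up mixed, the ad picks up the same `DEVICE_GAIN` as the program on a down mixing device, so the pass-through versus attenuated asymmetry goes away. A real service would likely want something better than a passive up mix, since this one leaves the center and surrounds empty.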
I'm not sure why that isn't generally done. LLMs suggest several reasons, but I have no idea whether they are sound. I'll leave exploring that to someone else.