divbzero
8 hours ago
This seems like a lot of added complexity for limited gain. Are there cases where gzip and br at their highest compression levels aren’t good enough?
ks2048
7 hours ago
Some examples here: https://github.com/WICG/compression-dictionary-transport/blo...
They show a significant gain from using a dictionary over compressing without one.
It seems like instead of sites reducing bloat, they will just shift the bloat to your hard drive. Some of the examples mention a 1 MB dictionary, which doesn't seem big, but it could add up if everyone does this.
sltkr
an hour ago
That demonstrates how useless this is. It only shaves off kilobytes on extremely bloated sites that waste megabytes of data.
For example, take the CNN example:
> The JavaScript was 98% smaller using the previous version as a dictionary for the new version than if the new version was downloaded with brotli alone. Specifically, the 278kb JavaScript was 90kb with brotli alone and 2kb when using brotli and the previous version as a dictionary.
Oh wow! 98% savings! That's amazing! Except in absolute terms the difference between 90 KB and 2 KB is only 88 KB. Meanwhile, cnn.com pulls in 63.7 MB of data just on the first page load. So in reality, that 88 KB saved was less than 0.14% of the total data, which is negligible.
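For anyone who wants to check the arithmetic, here is a quick sketch using only the figures quoted above. It assumes decimal units (1 MB = 1000 KB); the variable names are just for illustration.

```python
# Figures quoted in the comment above; 1 MB = 1000 KB is an assumption.
brotli_only_kb = 90        # new JS, brotli without a dictionary
with_dictionary_kb = 2     # new JS, brotli with the previous version as dictionary
page_total_mb = 63.7       # total first-page-load transfer claimed for cnn.com

saved_kb = brotli_only_kb - with_dictionary_kb
relative_saving = saved_kb / brotli_only_kb          # saving on this one file
share_of_page = saved_kb / (page_total_mb * 1000)    # saving relative to the whole page

print(f"saved: {saved_kb} KB")
print(f"relative to brotli alone: {relative_saving:.1%}")
print(f"share of total page weight: {share_of_page:.2%}")
```

Both numbers are right at the same time: the saving is about 98% of that one file and roughly 0.14% of the whole page load.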
pmarreck
8 hours ago
Every piece of information or file that is compressed sends a dictionary along with it. In the case of, say, many HTML or CSS files, this dictionary data is likely nearly completely redundant.
There's almost no added complexity since zstd already handles separate compression dictionaries quite well.
pornel
7 hours ago
The standard compressed formats don't literally contain a dictionary. The decompressed data becomes its own dictionary while it's being decompressed. This makes the first occurrence of any pattern less efficiently compressed (but usually it's still compressed thanks to entropy coding), and then it becomes cheap to repeat.
Brotli has a default dictionary with bits of HTML and scripts. This is built into the decompressor, and not sent with the files.
The decompression dictionaries aren't magic. They're basically a prefix for decompressed files, so that a first occurrence of some pattern can be referenced from the dictionary instead of built from scratch. This helps only with the first occurrences of data near the start of the file, and for all the later repetitions the dictionary becomes irrelevant.
The dictionary needs to be downloaded too, and you're not going to have dictionaries all the way down, so you pay the cost of decompressing the data without a dictionary whether it's a dictionary + dictionary-using-file, or just the full file itself.
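A minimal sketch of the "dictionary as a prefix" idea, using Python's zlib and its `zdict` parameter as a stand-in for brotli or zstd. The boilerplate string and the page content are made up for illustration, and the helper names `deflate` and `inflate` are hypothetical.

```python
import zlib

def deflate(data: bytes, dictionary: bytes = b"") -> bytes:
    """Compress with DEFLATE, optionally priming the window with a dictionary."""
    if dictionary:
        c = zlib.compressobj(level=9, zdict=dictionary)
    else:
        c = zlib.compressobj(level=9)
    return c.compress(data) + c.flush()

def inflate(blob: bytes, dictionary: bytes = b"") -> bytes:
    """Decompress; the same dictionary must be supplied if one was used."""
    d = zlib.decompressobj(zdict=dictionary) if dictionary else zlib.decompressobj()
    return d.decompress(blob) + d.flush()

# Made-up boilerplate that both sides are assumed to already have.
dictionary = (
    b'<!DOCTYPE html><html lang="en"><head><meta charset="utf-8">'
    b'<meta name="viewport" content="width=device-width, initial-scale=1">'
    b'<link rel="stylesheet" href="/static/site.css"></head><body>'
)
page = dictionary + b"<h1>Hello</h1><p>Only this part is new.</p></body></html>"

without_dict = deflate(page)
with_dict = deflate(page, dictionary)

# Round trip only works when the decompressor has the same dictionary.
assert inflate(with_dict, dictionary) == page
print(len(page), len(without_dict), len(with_dict))
```

The first occurrence of the boilerplate is encoded as a back-reference into the dictionary instead of as literals, which is where the saving comes from. Anything that repeats later in the file would have been cheap either way.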
bsmth
7 hours ago
If you're shipping a JS bundle, for instance, that has small, frequent updates, this should be a good use case. There's a test site here that accompanies the explainer which looks interesting for estimates: https://use-as-dictionary.com/generate/
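A rough sketch of that update scenario, assuming a synthetic "bundle" and using zlib's `zdict` in place of brotli or zstd. zlib's window is only 32 KB, so the fake bundle is kept small; `make_bundle` and the generated content are hypothetical.

```python
import random
import zlib

def make_bundle(seed: int, lines: int = 300) -> bytes:
    """Build a fake JS bundle with poorly compressible string constants."""
    rng = random.Random(seed)
    out = []
    for i in range(lines):
        token = "%016x" % rng.getrandbits(64)
        out.append(f'function f{i}(){{return "{token}";}}\n')
    return "".join(out).encode()

def deflate(data: bytes, dictionary: bytes = b"") -> bytes:
    """Compress with DEFLATE, optionally using a preset dictionary."""
    if dictionary:
        c = zlib.compressobj(level=9, zdict=dictionary)
    else:
        c = zlib.compressobj(level=9)
    return c.compress(data) + c.flush()

old_bundle = make_bundle(seed=1)
# A small update: one function renamed, everything else unchanged.
new_bundle = old_bundle.replace(b"function f42()", b"function f42_patched()")

alone = deflate(new_bundle)               # what a first-time visitor downloads
delta = deflate(new_bundle, old_bundle)   # what a returning visitor downloads

# The returning visitor decompresses using the old bundle they already cached.
d = zlib.decompressobj(zdict=old_bundle)
assert d.decompress(delta) + d.flush() == new_bundle
print(len(new_bundle), len(alone), len(delta))
```

The returning visitor's download shrinks to mostly back-references into the old bundle, while a first-time visitor still pays the full compressed size.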
wat10000
4 hours ago
In some applications, there’s no “good enough”: even small gains help and can be significant when multiplied across a large system. It’s like the software version of American Airlines saving $40,000/year by removing one olive from their salads.