Discord Reduced WebSocket Traffic by 40%

94 points, posted 2 hours ago
by hampus

55 Comments

transcriptase

39 minutes ago

When can we expect discord not to take 20-30 seconds to launch on a $5000 PC? What exactly is it doing, recompiling the client from source using a single core each time it opens?

dumbo-octopus

34 minutes ago

It needs to download the 20 distinct update patches that they added in the past 4 hours, in series, all of which together change the actual product in precisely no way whatsoever.

HeXetic

28 minutes ago

> change the actual product in precisely no way whatsoever

How dare you belittle the new Super Ultra Nitro Deluxe Gold Platinum emoji, stickers, and playable sound effects.

troupo

17 minutes ago

Don't forget the 15 000 or so notifications that you can't realistically turn off

smolder

12 minutes ago

I mute every "server" permanently, disable all "mentions" (@here, @role, etc) and it stays quiet except for DMs. If there's something I want to see I'll go look in a channel.

maxfurman

33 minutes ago

It's an Electron app, so, yes, kinda. It has to load and parse all of its own JS at boot time (barring any lazy loading within the app).

candiddevmike

20 minutes ago

VSCode is quite a bit snappier for me though; I don't think Electron is necessarily the problem.

nicoburns

13 minutes ago

It opens in about 2 seconds if you use the web client (https://discord.com/app)

Waterluvian

8 minutes ago

Didn’t even realize there was a desktop app worth getting. The website seems to work fine.

Jowsey

24 minutes ago

The onus is really on Discord, but you can use https://openasar.dev to partially fix the problem for yourself - it's an open source drop-in replacement for the client updater/bootstrapper.

paxys

2 hours ago

Reading through the post, they seem to have been hyper-focused on compression ratios and reducing the payload size/network bandwidth as much as possible, but I don't see a single mention of CPU time or evidence of any actual measurable improvement for the end user. I have been involved with a few such efforts at my own company, and the conclusion always was that the added compression/decompression overhead on both sides resulted in worse performance. Especially considering we are talking about packets at the scale of bytes, or a few kilobytes at most.

dumbo-octopus

an hour ago

They explicitly mention compression time. It's actually lower in the new approach.

> the compression time per byte of data is significantly lower for zstandard streaming than zlib, with zlib taking around 100 microseconds per byte and zstandard taking 45 microseconds

dvh

43 minutes ago

Those are atrocious numbers, that's only 23kB/s for the faster variant. It should have been GB/s not kB.

mananaysiempre

19 minutes ago

For what it’s worth, the benchmark on the Zstandard homepage[1] shows none of the setups tested breaking 1GB/s on compression, and only the fastest and sloppiest ones breaking 1GB/s on decompression. If you can live with its API limitations, libdeflate is known[2] to squeeze past 1GB/s decompressing normal Deflate compression levels. In any case, asking for multiple GB/s is probably unfair.

Still, looking at those benchmarks, 10MB/s sounds like the absolute minimum reasonable speed, and they’re reporting nearly three orders of magnitude below that. A modern compressor does not run at mediocre dialup speeds; something in there is absolutely murdering the performance.

And I’m willing to believe it’s just the constant-time overhead. The article mentions “a few hundred bytes” per message payload in a stream of messages, and the actual data of the benchmarks implies 1.6KB uncompressed. Even though they don’t reinitialize the compressor on each message, that is still a very very modest amount of data.

So it might be that general-purpose compressors are simply a bad tool here from a performance standpoint. I’m not aware of a good tool for this kind of application, though.

[1] https://facebook.github.io/zstd/#benchmarks

[2] https://github.com/zlib-ng/zlib-ng/issues/1486
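
If anyone wants to poke at the per-message overhead themselves, here's a rough benchmark sketch (assuming the python-zstandard bindings; the payloads are made up and numbers will vary by machine):

    import time, zlib
    import zstandard as zstd  # pip install zstandard

    # A made-up stream of small gateway-style messages.
    messages = [b'{"op":0,"t":"MESSAGE_CREATE","d":{"content":"hi %03d"}}' % i
                for i in range(10000)]

    def bench(label, compress_one):
        t0 = time.perf_counter()
        total = sum(len(compress_one(m)) for m in messages)
        us = (time.perf_counter() - t0) / len(messages) * 1e6
        print(f"{label}: {us:.1f} us/message, {total} bytes out")

    # One long-lived zlib stream, sync-flushed after every message.
    zl = zlib.compressobj()
    bench("zlib", lambda m: zl.compress(m) + zl.flush(zlib.Z_SYNC_FLUSH))

    # One long-lived zstd stream, block-flushed after every message.
    zs = zstd.ZstdCompressor().compressobj()
    bench("zstd", lambda m: zs.compress(m) + zs.flush(zstd.COMPRESSOBJ_FLUSH_BLOCK))

The flush after every message is the interesting part: at these payload sizes, the per-flush fixed cost dominates, which would explain throughput numbers far below the headline benchmarks.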

refulgentis

28 minutes ago

First, let's establish a cheery mood: Happy Friday!!!!

Second, I noticed we're extrapolating here from a tossed-out measurement in "microseconds per byte" of extremely small payloads, which probably includes the fixed-cost overhead of doing anything at all.

All leading up to: Is "atrocious" the right word choice here? :)

More directly: do you really think Discord rolled out a compression algorithm that does 23 KB/s for payloads in the megabytes?

Even more directly, avoiding being passive and just adopting your tone: this is atrocious analysis that glibly chooses to create obviously wrong numbers, then criticizes them as if they are real.

Starlevel004

19 minutes ago

> More directly: do you really think Discord rolled out a compression algorithm that does 23 KB/s for payloads in the megabytes?

yes, actually

refulgentis

7 minutes ago

that's "yngmi" bait; you're suggesting it takes 2 minutes per intro payload. (2 MB / 20 KB/s ≈ 100 seconds = 1m40s)

ihumanable

2 hours ago

> zstandard streaming significantly outperforms zlib both in time to compress and compression ratio.

Time to compress is a measure of how long the CPU spends compressing. So this is in the blog post.

koito17

an hour ago

I think the person is concerned with client-side compute, not just server-side compute. The article does not mention whether zstd has additional decompression overhead compared to zlib.

Client-side compute may sound like a contrived issue, but Discord runs on a wide variety of devices. Many of these devices are not necessarily the latest flagship smartphones, or a computer with a recent CPU.

I am going to guess that zstd decompression is roughly as expensive as zlib's, since (de)compression time was a motivating factor in the development of zstd. That's also the reason to prefer zstd over xz, despite the latter providing better compression efficiency.
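
A quick way to test that guess (again a sketch, assuming the python-zstandard bindings; the payload is synthetic):

    import time, zlib
    import zstandard as zstd  # pip install zstandard

    # Synthetic JSON-ish blob (~2 MB) standing in for a large gateway payload.
    raw = b'{"id": 123, "status": "online", "roles": [1, 2, 3]}' * 40000

    zlib_blob = zlib.compress(raw)
    zstd_blob = zstd.ZstdCompressor().compress(raw)
    dctx = zstd.ZstdDecompressor()

    def bench(label, fn, n=50):
        t0 = time.perf_counter()
        for _ in range(n):
            fn()
        print(f"{label}: {(time.perf_counter() - t0) / n * 1e3:.2f} ms per decompress")

    bench("zlib", lambda: zlib.decompress(zlib_blob))
    bench("zstd", lambda: dctx.decompress(zstd_blob))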

colechristensen

an hour ago

zstd has faster decompression

though I always thought lz4 was the sweet spot for anything requiring speed: somewhat less compression ratio in exchange for very fast compression and decompression

jhgg

an hour ago

I think one thing this blog post did not mention was the historical context of moving from uncompressed to compressed traffic (using zlib), something I worked on in 2017. IIRC, the bandwidth savings were massive (75%). It did use more server-side CPU, and negligible client-side CPU, so we went for it anyway, as bandwidth is a very precious thing to optimize for, especially with cloud bandwidth costs.

Either way, the incremental improvements here are great, and it's important to consider optimization both at the transport level (compression, encoding) and at the protocol level (the messages actually sent over the wire).

Also, one thing not mentioned: client-side decompression on desktop moved from a JS implementation of zlib (pako) to a native implementation that's exposed to the client via napi bindings.

TulliusCicero

2 hours ago

> I don't see a single mention of CPU time

> Looking once again at MESSAGE_CREATE, the compression time per byte of data is significantly lower for zstandard streaming than zlib, with zlib taking around 100 microseconds per byte and zstandard taking 45 microseconds.

zarzavat

11 minutes ago

Performance is probably the wrong lens. Mobile data is often expensive in terms of money, whereas compression is cheap in terms of CPU time. More compression is almost always the right answer for users of mobile apps.

xnx

2 hours ago

Do the packets transmit through Discord's servers? Reducing their bill may be more important to them than user performance.

bri3d

2 hours ago

They're going from 2+MB (for some reason) to 300KB - even if decompression is "slow," that's going to be a win for their bandwidth costs and for perceived speed for _most_ users.

I was surprised to see little server-side CPU benchmarking too, though. While I'd expect overall client timing for (transfer + decompress) to be improved dramatically unless the user was on a ridiculously fast network connection, I can't imagine server load not being affected in a meaningful way.

usernamear

2 hours ago

There already was compression before, through zlib. The finding, as shown in the post, was that Zstandard was also a lot more efficient than zlib from a CPU-time standpoint.

jhgg

an hour ago

The 2 MB case is pathological: an account on MANY servers with no local cache state. (The READY payload is designed to send only data that's changed since you last connected, by having the client send hashes of the data it already knows.)
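
For the curious, the shape of that hash-check idea is roughly the following. (A toy sketch in Python, not Discord's actual wire format.)

    import hashlib, json

    def digest(obj):
        # Stable hash of a JSON-serializable object.
        return hashlib.sha1(json.dumps(obj, sort_keys=True).encode()).hexdigest()

    # Client: on reconnect, send one hash per guild it has cached.
    cached = {"guild_a": {"name": "A", "channels": [1, 2]},
              "guild_b": {"name": "B", "channels": [3]}}
    known = {gid: digest(state) for gid, state in cached.items()}

    # Server: only include guilds whose current state hashes differently.
    current = {"guild_a": {"name": "A", "channels": [1, 2]},   # unchanged
               "guild_b": {"name": "B", "channels": [3, 4]}}   # changed
    ready = {gid: state for gid, state in current.items()
             if digest(state) != known.get(gid)}
    assert list(ready) == ["guild_b"]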

paxys

2 hours ago

Bandwidth costs for text messages, maybe, but how much data is that really compared to images, audio, video or even just the app's JS bundle?

toast0

14 minutes ago

The bandwidth probably doesn't really matter, but a 2 MB must-have blob vs a 300 kB must-have blob at the start of a connection is a big difference.

The start of a TCP connection is limited by round-trip times more than by bandwidth. Especially for mobile, optimizing to reduce the number of round trips required is pretty handy.
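
Back-of-the-envelope, assuming a standard initial congestion window of 10 segments (~14.6 kB) and idealized slow start (no loss, unlimited receive window):

    def rtts_to_deliver(payload_bytes, init_cwnd=10 * 1460):
        # Congestion window doubles every round trip during slow start.
        sent, cwnd, rtts = 0, init_cwnd, 0
        while sent < payload_bytes:
            sent += cwnd
            cwnd *= 2
            rtts += 1
        return rtts

    print(rtts_to_deliver(2 * 1024 * 1024))  # 2 MB blob   -> 8 round trips
    print(rtts_to_deliver(300 * 1024))       # 300 kB blob -> 5 round trips

Three extra round trips on a 150 ms mobile link is nearly half a second before the client can even start.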

bri3d

2 hours ago

Presumably that's all CDNed and therefore a lot cheaper to serve.

bguebert

an hour ago

Probably not for live shared audio/video

usernamear

2 hours ago

Some of those payloads are much larger than a few kilobytes (READY, MESSAGE_CREATE etc.) There is a section and data on "time to compress". No time to decompress though.

hiddencost

2 hours ago

It's almost certainly about hosting costs, not user facing value.

BoorishBears

2 hours ago

They might have been getting murdered by egress fees in which case they'd be willing to make that sacrifice

bri3d

2 hours ago

Interesting way to approach this (dictionary based compression over JSON and Erlang ETF) vs. moving to a schema-based system like Cap'n Proto or Protobufs where the repeated keys and enumeration values would be encoded in the schema explicitly.

Also would be interested in benchmarks between Zstandard vs. LZ4 for this use case - for a very different use case (streaming overlay/HUD data for drones), I ended up using LZ4 with dictionaries produced by the Zstd dictionary tool. LZ4 produced similar compression at substantially higher speed, at least on the old ARM-with-NEON processor I was targeting.
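
For anyone curious what the dictionary route looks like, a minimal sketch using the python-zstandard bindings (the sample corpus here is synthetic):

    import zstandard as zstd  # pip install zstandard

    # Train a shared dictionary on a corpus of past payload samples.
    samples = [b'{"user_id": %d, "status": "online", "activity": null}' % i
               for i in range(1000)]
    shared_dict = zstd.train_dictionary(4096, samples)

    # Both sides hold the dictionary, so the repeated keys never travel.
    cctx = zstd.ZstdCompressor(dict_data=shared_dict)
    dctx = zstd.ZstdDecompressor(dict_data=shared_dict)

    msg = b'{"user_id": 42, "status": "online", "activity": null}'
    blob = cctx.compress(msg)
    assert dctx.decompress(blob) == msg
    print(len(msg), "->", len(blob))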

I guess it's not totally wild but it's a bit surprising that common bootstrapping responses (READY) were 2+MB, as well.

echelon

2 hours ago

They use JSON over the wire and not a binary protocol? That's madness and reminds me of XML / Jabber.

Protos or a custom wire protocol would be far better suited to the task.

SirGiggles

an hour ago

Wouldn’t the ETF (Erlang Term Format) suffice in this case?

IIRC it’s used in the desktop client and some community libraries (specifically JDA) have support for it.

There were some quirks regarding ETF usage with Discord’s Gateway but I can’t recall at the moment.

jerf

a minute ago

Erlang terms are, to a first approximation, the same as JSON.

To a second approximation it gets more complicated. Atoms can save you some repetition of what would otherwise be strings, but it's fairly redundant to what compression would get you anyhow, and erlang term representations of things like dictionaries can be quite rough:

    3> dict:from_list([{a, 1}, {b, 2}]).
    {dict,2,16,16,8,80,48,
          {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
          {{[],
            [[a|1]],
            [[b|2]],
            [],[],[],[],[],[],[],[],[],[],[],[],[]}}}
If you're converting from naive Erlang terms to some communication protocol encoding, you're already paying for a total conversion, and you might as well choose from the full set of options at that point.

treyd

an hour ago

I don't understand why so many protocols that expect to handle large amounts of data don't default to a binary schema. JSON is fine on the edges, but the wire format between nodes is not the edge.

Spivak

an hour ago

Hard disagree given the constraints. Every bot is also consuming the Discord API, and forcing 3rd-party devs, many of whom aren't particularly advanced coders, to suddenly deal with a binary wire format would be painful, especially if they needed to constantly update a proto file. Their API is also part WebSocket, part HTTP, with many methods doing double duty.

atiedebee

an hour ago

A RIFF-like format would not be that hard to parse sections of: you get to ignore the parts you don't recognise and decode the parts you do (see the sketch below).

Moving to a binary format would be better for 99.9% of users and would be a slight inconvenience to the few people creating bots. Discord could easily publish a library for reading the format if needed.
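
A toy sketch of that chunk-skipping idea in Python (the format here is invented for illustration):

    import struct

    def chunks(buf):
        # (4-byte tag, u32 little-endian length, payload), back to back.
        off = 0
        while off < len(buf):
            tag, size = struct.unpack_from("<4sI", buf, off)
            yield tag, buf[off + 8:off + 8 + size]
            off += 8 + size

    payload = (b"USER" + struct.pack("<I", 4) + struct.pack("<I", 42)
               + b"XFUT" + struct.pack("<I", 3) + b"???")  # unknown chunk

    for tag, body in chunks(payload):
        if tag == b"USER":
            print("user id:", struct.unpack("<I", body)[0])
        # unknown tags (XFUT here) are skipped, so old readers keep working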

xx_ns

an hour ago

To be fair, this is exactly what the Accept and Content-Type standard HTTP headers are for. Clients can tell the API "OK, send me application/json data instead of binary data" or vice versa. You can have the majority of your traffic (client traffic) using the binary format, and still support JSON for bot API usage. This is standardized for both WebSockets and HTTP.
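
Roughly this shape (a Python sketch; the binary encoding is a stand-in for illustration, not anything Discord actually serves):

    import json, struct

    def to_demo_binary(payload):
        # Stand-in binary encoding: length-prefixed key/value pairs.
        out = b""
        for k, v in payload.items():
            kb, vb = k.encode(), json.dumps(v).encode()
            out += struct.pack("<HH", len(kb), len(vb)) + kb + vb
        return out

    def render(payload, accept):
        # Honor the client's Accept header; bots ask for JSON, clients don't.
        if "application/json" in accept:
            return json.dumps(payload).encode(), "application/json"
        return to_demo_binary(payload), "application/octet-stream"

    body, ctype = render({"op": 0, "t": "READY"}, "application/json")
    print(ctype, body)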

random_

32 minutes ago

Moreover, I imagine a lot of these bots are built on top of an SDK instead of working directly with API calls, so it would just be a matter of changing the SDK internals.

fearthetelomere

6 minutes ago

> Diving into the actual contents of one of those PASSIVE_UPDATE_V1 dispatches, we would send all of the channels, members, or members in voice, even if only a single element changed.

> the metrics that guided us during the [zstd experiment] revealed a surprising behavior

This feels so backwards. I'm glad that they addressed this low-hanging fruit, but I wonder why they didn't do this metrics analysis from the start, instead of during the zstd experiment.

I also wonder why they didn't just send deltas from the get-go. If PASSIVE_UPDATE_V1 was initially implemented "as a means to scale Discord servers to hundreds of thousands of users", why was this obvious optimization missed?

jhgg

5 minutes ago

It was a bug

acer4666

an hour ago

Anytime I have a discord tab open it noticeably grinds my computer to a halt

creesch

37 minutes ago

How many servers have you joined and how many of those are large and active? Also relevant, do you need to be in all of them?

Most of the time I have seen people complain about this, it is because they have joined a ton of hyperactive servers.

You could argue it shouldn't be an issue and more dynamically load things like messages on servers. But then you'd have people complaining that switching servers takes so long.

Dalewyn

19 minutes ago

> How many servers have you joined and how many of those are large and active?

Yes.

> Also relevant, do you need to be in all of them?

Yes.

You must be new here, because if you aren't connected to dozens of servers and idling in hundreds of channels (you only speak in maybe two or three of them) you aren't IRCing right.

What? I'm a confused old clod because we're talking about Discord in the year of our lord 2024? Same thing, it's a massive textual chat network based on a server-channel hub-spoke architecture at its core.

What is actually worth our time asking is why we could do all that and more with no problems in the 80s and 90s using hardware a thousandth or less as powerful as what we have today.

creesch

9 minutes ago

Discord is different from IRC in both scale and payload. It's beside the point anyway, even if Discord is still an unoptimized piece of shit.

You clearly have issues with that unpolished turd, so I figured I'd offer my insights into what often causes them.

If you still insist on having your computer come to a crawl because IRC did it better, then that's entirely up to you.

sfn42

38 minutes ago

Sounds like a problem with your computer. I can have discord, 50 browser tabs, two different games, a JetBrains IDE and various other stuff open at the same time without any trouble at all.

And my computer isn't particularly crazy. Maybe like $1500.

keb_

35 minutes ago

"Works on my machine"

sfn42

33 minutes ago

Lol, in fact it works on all my machines. And lots of other people's machines.