QUIC is not quick enough over fast internet

353 points | posted 8 hours ago
by Shank

150 Comments

raggi

6 hours ago

There are a number of concrete problems:

- syscall interfaces are a mess, the primitive APIs are too slow for regular sized packets (~1500 bytes), the overhead is too high. GSO helps but it’s a horrible API, and it’s been buggy even lately due to complexity and poor code standards.

- the syscall costs got even higher with spectre mitigation - and this story likely isn't over. We need a replacement for the BSD sockets / POSIX APIs; they're terrible this decade. Yes, uring is fancy, but there's a tutorial-level API middle ground possible that should be safe and have 10x less overhead without resorting to uring-level complexity.

- system udp buffers are far too small by default - they're much, much smaller than their tcp siblings, essentially no one but experts has been touching them, and experts just retune stuff.

- udp stack optimizations are possible (such as route lookup reuse without connect(2)); gso demonstrates this, though as noted above gso is highly fallible, quite expensive itself, and its design is wholly unnecessarily intricate for what we need, particularly as we want to do this safely from unprivileged userspace.

- several optimizations currently available only work at low/mid-scale, such as connect binding to (potentially) avoid route lookups / GSO only being applicable on a socket without high peer-competition (competing peers result in short offload chains due to single-peer constraints, eroding the overhead wins).

Despite all this, you can implement GSO and get substantial performance improvements; we (Tailscale) have on Linux. At some point platforms will need to increase their default buffer sizes for lower-end systems, high load/concurrency, BDP and so on, but buffers and congestion control are a highly complex and sometimes quite sensitive topic. Nonetheless, once many applications are doing this (the presumed future state), that need will become real.
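
For reference, the Linux UDP GSO path discussed above is exposed to userspace through the UDP_SEGMENT socket option (Linux >= 4.18): one send hands the kernel a large buffer that gets cut into wire-sized datagrams further down the stack. A minimal sketch; the constants are from <linux/udp.h> (guarded in case libc headers lack them), the helper name and segment size are purely illustrative, and error handling is omitted:

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    #ifndef SOL_UDP
    #define SOL_UDP 17            /* kernel value; some libc headers omit it */
    #endif
    #ifndef UDP_SEGMENT
    #define UDP_SEGMENT 103       /* from <linux/udp.h>, Linux >= 4.18 */
    #endif

    /* Send one large buffer as many ~MTU-sized UDP datagrams with a single
     * syscall; the kernel (or a capable NIC) does the segmentation. */
    ssize_t send_gso(int fd, const struct sockaddr_in *dst,
                     const void *buf, size_t len)
    {
        int seg = 1200;           /* payload bytes per segment, example value */
        setsockopt(fd, SOL_UDP, UDP_SEGMENT, &seg, sizeof(seg));
        return sendto(fd, buf, len, 0,
                      (const struct sockaddr *)dst, sizeof(*dst));
    }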

JoshTriplett

6 hours ago

> Yes, uring is fancy, but there’s a tutorial level API middle ground possible that should be safe and 10x less overhead without resorting to uring level complexity.

I don't think io_uring is as complex as its reputation suggests. I don't think we need a substantially simpler low-level API; I think we need more high-level APIs built on top of io_uring. (That will also help with portability: we need APIs that can be most efficiently implemented atop io_uring but that work on non-Linux systems.)

raggi

6 hours ago

> I don't think io_uring is as complex as its reputation suggests.

uring is extremely problematic to integrate into many common application / language runtimes and it has been demonstrably difficult to integrate into linux safely and correctly as well, with a continual stream of bugs, security and policy control issues.

in principle a shared memory queue is a reasonable basis for improving the IO cost between applications and IO stacks such as the network or filesystem stacks, but this isn't easy to do well, cf. uring bugs and binder bugs.

arghwhat

3 hours ago

Two things:

One, uring is not extremely problematic to integrate, as it can be chained into a conventional event loop if you want to, or can even be fit into a conventionally blocking design to get localized syscall benefits. That is, you do not need to convert to a fully uring event loop design, even if that would be superior - and it can usually be kept entirely within a (slightly modified) event loop abstraction. The reason it has not yet been implemented is just priority - most stuff isn't bottlenecked on IOPS.

Two, yes, you could have a middle ground. I assume the syscall overhead you call out is the need to send UDP packets one at a time through sendmsg/sendto, rather than doing one big write for several packets' worth of data on TCP. An API that allowed you to provide a chain of messages, like sendmsg takes an iovec for data, is possible. But it's also possible to do this already as a tiny blocking wrapper around io_uring, saving you new syscalls.
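
A minimal sketch of that last idea (a blocking batch-send wrapper built on io_uring), using liburing; the function name is made up and error handling is omitted:

    #include <liburing.h>
    #include <sys/socket.h>

    /* Queue one sendmsg per datagram, submit the whole batch with a single
     * syscall, and block until every completion has arrived. */
    int send_batch_uring(struct io_uring *ring, int fd,
                         struct msghdr *msgs, unsigned n)
    {
        for (unsigned i = 0; i < n; i++) {
            struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
            io_uring_prep_sendmsg(sqe, fd, &msgs[i], 0);
        }
        io_uring_submit_and_wait(ring, n);   /* one syscall for n datagrams */

        struct io_uring_cqe *cqe;
        for (unsigned i = 0; i < n; i++) {   /* reap the completions */
            io_uring_wait_cqe(ring, &cqe);
            io_uring_cqe_seen(ring, cqe);
        }
        return 0;
    }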

londons_explore

3 minutes ago

I think you need to look at a common use case and consider how many syscalls you'd like it to take and how many CPU cycles would be reasonable.

Let's take downloading a 1MB jpeg image over QUIC and rendering it on the screen.

I would hope that can be done in about 100k CPU cycles and 20 syscalls, considering that all the jpeg decoding and rendering is going to be hardware accelerated. The decryption is also hardware accelerated.

Unfortunately, no network API allows that right now. The CPU needs to do a substantial amount of processing for every individual packet, in both userspace and kernel space, for receiving the packet and sending the ACK, and there is no 'bulk decrypt' non-blocking API.

Even the data path is troublesome - there should be a way for the data to go straight from the network card to the GPU, with the CPU not even touching it, but we're far from that.

Veserv

2 hours ago

The system call to send multiple UDP packets in a single call has existed since Linux 3.0 over a decade ago[1]: sendmmsg().

[1] https://man7.org/linux/man-pages/man2/sendmmsg.2.html
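
For illustration, batching with it looks roughly like this (connected UDP socket assumed, one iovec per already-built datagram, helper name made up, error handling omitted):

    #define _GNU_SOURCE
    #include <sys/socket.h>
    #include <sys/uio.h>
    #include <string.h>

    /* Send up to 64 datagrams with a single sendmmsg() syscall. */
    int send_batch(int fd, struct iovec *pkts, unsigned n)
    {
        struct mmsghdr msgs[64];

        if (n > 64) n = 64;
        memset(msgs, 0, n * sizeof(msgs[0]));
        for (unsigned i = 0; i < n; i++) {
            msgs[i].msg_hdr.msg_iov    = &pkts[i];
            msgs[i].msg_hdr.msg_iovlen = 1;
        }
        return sendmmsg(fd, msgs, n, 0);   /* returns datagrams actually sent */
    }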

arghwhat

2 hours ago

Ah nice, in that case OP's point about syscall overhead is entirely moot. :)

That should really be in the `SEE ALSO` of `man 3 sendmsg`...

JoshTriplett

5 hours ago

> with a continual stream of bugs, security and policy control issues

This has not been true for a long time. There was an early design mistake that made it quite prone to these, but that mistake has been fixed. Unfortunately, the reputational damage will stick around for a while.

raggi

5 hours ago

13 CVEs so far this year afaik

bonzini

5 hours ago

CVE numbers from the Linux CNA are bollocks.

JoshTriplett

5 hours ago

This conversation would be a good one to point them at, to show that their policy is not just harmless point-proving, but in fact does cause harm.

For context, to the best of my knowledge the current approach of the Linux CNA is, in keeping with long-standing Linux security policy of "every single fix might be a security fix", to assign CVEs regardless of whether something has any security impact or not.

kuschku

3 hours ago

CVE assignment != security issue

CVE numbers are just a way to ensure everyone is talking about the same bug. Not every security issue has a CVE, not every CVE is a security issue.

Often, a regular bug turns out years later to have been a security issue, or a security issue turns out to have no security impact at all.

If you want a central authority to tell you what to think, just use CVSS instead of the binary "does it have a CVE" metric.

di4na

5 hours ago

I would not call it harm. The use of uring in higher level languages is definitely prone to errors, bugs and security problems

JoshTriplett

5 hours ago

See the context I added to that comment; this is not about security issues, it's about the Linux CNA's absurd approach to CVE assignment for things that aren't CVEs.

raggi

5 hours ago

this is a bit of a distraction. Sure, the leaks and some of the deadlocks are fairly uninteresting, but the TOCTOU bugs, overflows, uid races/confusion and so on are real issues that shouldn't be dismissed as if they don't exist.

jeffparsons

5 hours ago

I find this surprising, given that my initial response to reading the iouring design was:

1. This is pretty clean and straightforward.

2. This is obviously what we need to decouple a bunch of things without the previous downsides.

What has made it so hard to integrate it into common language runtimes? Do you have examples of where there's been an irreconcilable "impedance mismatch"?

raggi

5 hours ago

https://github.com/tailscale/tailscale/pull/2370 was a practical drive toward this; we will not proceed on this path.

Much more approachable: boats has written about the challenges of integrating it in Rust: https://without.boats/tags/io-uring/

in the most general form: you need a fairly "loose" memory model to integrate the "best" (performance wise) parts, and the "best" (ease of use/forward looking safety) way to integrate requires C library linkage. This is troublesome in most GC languages, and many managed runtimes. There's also the issue that uring being non-portable means that the things it suggests you must do (such as say pinning a buffer pool and making APIs like read not immediate caller allocates) requires a substantially separate API for this platform than for others, or at least substantial reworks over all the existing POSIX modeled APIs - thus back to what I said originally, we need a replacement for POSIX & BSD here, broadly applied.

Diggsey

an hour ago

Historically there have been too many constraints on the Linux syscall interface:

- Performance

- Stability

- Convenience

- Security

This differs from eg. Windows because on Windows the stable interface to the OS is in user-space, not tied to the syscall boundary. This has resulted in unfortunate compromises in the design of various pieces of OS functionality.

Thankfully things like futex and io-uring have dropped the "convenience" constraint from the syscall itself and moved it into user-space. Convenience is still important, but it doesn't need to be a constraint at the lowest level, and shouldn't compromise the other ideals.

modeless

3 hours ago

Seems to me that the real problem is the 1500 byte MTU that hasn't increased in practice in over 40 years.

j16sdiz

an hour ago

The real problem is some so-called "sysadmins" dropping all ICMP, breaking path MTU discovery.

p_l

an hour ago

For all practical purposes, the internet MTU is lower than ethernet default MTU.

Sometimes for peace of mind I end up clamping it to the v6 minimum (1280), just in case.

asmor

3 hours ago

That's on the list, right after we all migrate to IPv6.

SomaticPirate

6 hours ago

What is GSO?

jesperwe

6 hours ago

Generic Segmentation Offload

"GSO gains performance by enabling upper layer applications to process a smaller number of large packets (e.g. MTU size of 64KB), instead of processing higher numbers of small packets (e.g. MTU size of 1500B), thus reducing per-packet overhead."

underdeserver

2 hours ago

This is more the result.

Generally today an Ethernet frame, which is the basic atomic unit of information over the wire, is limited to 1500 bytes (the MTU, or Maximum Transmission Unit).

If you want to send more - the IP layer allows for 64k bytes per IP packet - you need to split the IP packet into multiple (64k / 1500 plus some header overhead) frames. This is called segmentation.

Before GSO the kernel would do that splitting itself, which takes buffering and CPU time to assemble the frame headers. GSO pushes the work down the stack, often all the way to the Ethernet hardware, which does essentially the same thing, only hardware accelerated and without taking up a CPU core.
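
The back-of-the-envelope arithmetic for that "64k / 1500 plus some header overhead" parenthetical, ignoring whether the split happens at the IP or transport layer:

    #include <stdio.h>

    /* Roughly 45 Ethernet frames per maximal 64 KiB packet: each frame
     * re-carries a 20-byte IPv4 header, leaving ~1480 payload bytes of
     * the original packet per 1500-byte MTU. */
    int main(void)
    {
        int max_ip   = 65535;          /* largest IPv4 packet, headers included */
        int mtu      = 1500;
        int ip_hdr   = 20;
        int per_frag = mtu - ip_hdr;   /* 1480 */
        int frames   = (max_ip - ip_hdr + per_frag - 1) / per_frag;
        printf("%d frames\n", frames); /* prints 45 */
        return 0;
    }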

chaboud

6 hours ago

Likely Generic Segmentation Offload (if memory serves), which is a generalization of TCP segmentation offload.

Basically (hyper simple), the kernel can lump stuff together when working with the network interface, which cuts down on ultra slow hardware interactions.

raggi

6 hours ago

it was originally for the hardware, but it's also valuable on the software side as the cost of syscalls is far too high for packet sized transactions

thorncorona

6 hours ago

presumably generic segmentation offloading

USiBqidmOOkAqRb

2 hours ago

Shipping? Government services online? Piedmont airport? Alcoholics anonymous? Obviously not.

Please introduce your initialisms if it's not guaranteed that the first result in a search will be correct.

cookiengineer

6 hours ago

Say what you want but I bet we'll see lots of eBPF modules being loaded in the future for the very reason you're describing. An ebpf quic module? Why not!

And that scares me, because there's not a single tool that has this on its radar for malware detection/prevention.

raggi

6 hours ago

we can consider ebpf "a solution" when there's even a remote chance you'll be able to do it from an unentitled ios app. somewhat hyperbole, but the point is, this problem is a problem for userspace client applications, and bpf isn't a particularly "good" solution for servers either: it's a high cost of authorship for a problem that is easily solvable with a better API to the network stack.

quotemstr

4 hours ago

> Yes, uring is fancy, but there’s a tutorial level API middle ground possible that should be safe and 10x less overhead without resorting to uring level complexity.

And the kernel has no business providing this middle-layer API. Why should it? Let people grab whatever they need from the ecosystem. Networking should be like Vulkan: it should have a high-performance, flexible API at the systems level, with "easy to use" as a non-goal, and higher-level facilities built on top.

astrange

27 minutes ago

The kernel provides networking because it doesn't trust userspace to do it. If you provided a low level networking API you'd have to verify everything a client sends is not malicious or pretending to be from another process. And for the same reason, it'd only work for transmission, not receiving.

That and nobody was able to get performant microkernels working at the time, so we ended up with everything in the monokernel.

If you do trust the client processes then it could be better to just have them read/write IP packets though.

JoshTriplett

8 hours ago

In the early days of QUIC, many people pointed out that the UDP stack has had far far less optimization put into it than the TCP stack. Sure enough, some of the issues identified here arise because the UDP stack isn't doing things that it could do but that nobody has been motivated to make it do, such as UDP generic receive offload. Papers like this are very likely to lead to optimizations both obvious and subtle.

Animats

7 hours ago

What is UDP offload going to do? UDP barely does anything but queue and copy.

Linux scheduling from packet-received to thread-has-control is not real-time, and if the CPUs are busy, it may be rather slow. That's probably part of the bottleneck.

The embarrassing thing is that QUIC, even in Google's own benchmarks, only improved performance by about 10%. The added complexity probably isn't worth the trouble. However, it gave Google control of more of the stack, which may have been the real motivation.

amluto

7 hours ago

Last I looked (several months ago), Linux's UDP stack did not seem well tuned in its memory management accounting.

For background, the mental model of what receiving network data looks like in userspace is almost completely backwards compared to how general-purpose kernel network receive actually works. User code thinks it allocates a buffer (per-socket or perhaps a fancier io_uring scheme), then receives packets into that buffer, then processes them.

The kernel is the other way around. The kernel allocates buffers and feeds pointers to those buffers to the NIC. The NIC receives packets and DMAs them into the buffers, then tells the kernel. But the NIC and the kernel have absolutely no concept of which socket those buffers belong to until after they are DMAed into the buffers. So the kernel cannot possibly map received packets to the actual recipient's memory. So instead, after identifying who owns a received packet, the kernel retroactively charges the recipient for the memory. This happens on a per-packet basis, it involves per-socket and cgroup accounting, and there is no support for having a socket "pre-allocate" this memory in advance of receiving a packet. So the accounting is gnarly, involves atomic operations, and seems quite unlikely to win any speed awards. On a very cursory inspection, the TCP code seemed better tuned, and it possibly also won by generally handling more bytes per operation.

Keep in mind that the kernel can't copy data to application memory synchronously -- the application memory might be paged out when a packet shows up. So instead the whole charging dance above happens immediately when a packet is received, and the data is copied later on.

For quite a long time, I've thought it would be nifty if there was a NIC that kept received data in its own RAM and then allowed it to be efficiently DMAed to application memory when the application was ready for it. In essence, a lot of the accounting and memory management logic could move out of the kernel into the NIC. I'm not aware of anyone doing this.

JoshTriplett

6 hours ago

> For quite a long time, I've thought it would be nifty if there was a NIC that kept received data in its own RAM and then allowed it to be efficiently DMAed to application memory when the application was ready for it.

I wonder if we could do a more advanced version of receive-packet steering that sufficiently identifies packets as definitely for a given process and DMAs them directly to that process's pre-provided buffers for later notification? In particular, can we offload enough information to a smart NIC that it can identify where something should be DMAed to?

mgaunard

2 hours ago

Most advanced NICs support flow steering, which makes the NIC write to different buffers depending on the target port.

In practice though, you only have a limited amount of these buffers, and it causes complications if multiple processes need to consume the same multicast.

amluto

5 hours ago

I don’t think the result would be compatible with the socket or io_uring API, but maybe io_uring could be extended a bit. Basically the kernel would opportunistically program a “flow director” or similar rule to send packets to special rx queue, and that queue would point to (pinned) application memory. Getting this to be compatible with iptables/nftables would be a mess or maybe entirely impossible.

I’ve never seen the accelerated steering stuff work well in practice, sadly. The code is messy, the diagnostics are basically nonexistent, and it’s not clear to me that many drivers support it well.

fragmede

7 hours ago

RDMA is common for high performance applications but it doesn't work over the Internet.

Danieru

6 hours ago

It's a good thing the NIC is connected over pcie then.

shaklee3

6 hours ago

You can do GPUdirect over the Internet without RDMA though.

jpgvm

2 hours ago

GPUDirect relies on the PeerDirect extensions for RDMA and is thus an extension to the RDMA verbs, not a separate and independent thing that works without RDMA.

derefr

6 hours ago

Presuming that this is a server that has One (public) Job, couldn't you:

1. dedicate a NIC to the application;

2. and have the userland app open a packet socket against the NIC, to drink from its firehose through MMIO against the kernel's own NIC DMA buffer;

...all without involving the kernel TCP/IP (or in this case, UDP/IP) stack, and any of the accounting logic squirreled away in there?

(You can also throw in a BPF filter here, to drop everything except UDP packets with the expected specified ip:port — but if you're already doing more packet validation at the app level, you may as well just take the whole firehose of packets and validate them for being targeted at the app at the same time that they're validated for their L7 structure.)
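
A rough sketch of step 2 as described, binding an AF_PACKET socket to one dedicated interface (the mmap'd PACKET_RX_RING and the BPF filter are left out; CAP_NET_RAW assumed, helper name made up, error handling omitted):

    #include <sys/socket.h>
    #include <arpa/inet.h>
    #include <linux/if_packet.h>
    #include <linux/if_ether.h>
    #include <net/if.h>
    #include <string.h>

    /* Open the "firehose": every frame seen on ifname is delivered here,
     * and the app filters for its own UDP port in userspace. */
    int open_firehose(const char *ifname)
    {
        int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

        struct sockaddr_ll sll;
        memset(&sll, 0, sizeof(sll));
        sll.sll_family   = AF_PACKET;
        sll.sll_protocol = htons(ETH_P_ALL);
        sll.sll_ifindex  = if_nametoindex(ifname);
        bind(fd, (struct sockaddr *)&sll, sizeof(sll));

        /* a real implementation would mmap a PACKET_RX_RING here rather
         * than calling recv() once per frame */
        return fd;
    }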

amluto

5 hours ago

I think DPDK does something like this. The NIC is programmed to aim the packets in question at a specific hardware receive queue, and that queue is entirely owned by a userspace program.

A lot of high end NICs support moderately complex receive queue selection rules.

SSLy

2 hours ago

> 1. dedicate a NIC to the application;

you need to respond to ICMPs, which have a different proto/header number than UDP or TCP.

raggi

6 hours ago

UDP offload gets you implicitly today:

- 64 packets per syscall, which is enough data to amortize the syscall overhead - a single packet is not.

- UDP offload optionally lets you defer checksum computation, often offloading it to hardware.

- UDP offload lets you skip/reuse route lookups for subsequent packets in a bundle.

What UDP offload is no good for though, is large scale servers - the current APIs only work when the incoming packet chains neatly organize into batches per peer socket. If you have many thousands of active sockets you’ll stop having full bundles and the overhead starts sneaking back in. As I said in another thread, we really need a replacement for the BSD APIs here, they just don’t scale for modern hardware constraints and software needs - much too expensive per packet.

infogulch

7 hours ago

In my head the main benefit of QUIC was always multipath, aka the ability to switch interfaces on demand without losing the connection. There's MPTCP but who knows how viable it is.

modeless

2 hours ago

Is that actually implemented and working in practice? My connection still hangs whenever my wifi goes out of range...

rocqua

5 hours ago

Mptcp sees use in the Telco space, so they probably know.

JoshTriplett

7 hours ago

Among other things, GRO (receive offloading) means you can get more data off of the network card in fewer operations.

Linux has receive packet steering, which can help with getting packets from the network card to the right CPU and the right userspace thread without moving from one CPU's cache to another.
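
As a concrete example, the UDP side of GRO (Linux >= 5.0) lets one recvmsg() return several back-to-back datagrams from the same peer coalesced into a single buffer, with a cmsg carrying the segment size so the application can split them apart again. A sketch; constants are guarded in case libc headers lack them, the helper name is made up, and error handling is omitted:

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/uio.h>
    #include <netinet/in.h>

    #ifndef SOL_UDP
    #define SOL_UDP 17
    #endif
    #ifndef UDP_GRO
    #define UDP_GRO 104            /* from <linux/udp.h>, Linux >= 5.0 */
    #endif

    /* Enable once at setup: setsockopt(fd, SOL_UDP, UDP_GRO, &(int){1}, sizeof(int));
     * Then each read may return multiple coalesced datagrams; *seg gets the
     * original segment size (0 if the buffer holds a single datagram). */
    ssize_t recv_gro(int fd, void *buf, size_t len, int *seg)
    {
        char cbuf[CMSG_SPACE(sizeof(int))];
        struct iovec iov = { .iov_base = buf, .iov_len = len };
        struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                              .msg_control = cbuf, .msg_controllen = sizeof(cbuf) };

        ssize_t n = recvmsg(fd, &msg, 0);
        *seg = 0;
        for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c))
            if (c->cmsg_level == SOL_UDP && c->cmsg_type == UDP_GRO)
                *seg = *(int *)CMSG_DATA(c);
        return n;
    }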

apitman

7 hours ago

Ditching head of line blocking is potentially a big win, but I really wish it wouldn't have come with so much complexity.

10000truths

6 hours ago

Bulk throughput isn't on par with TLS mainly because NICs with dedicated hardware for QUIC offload aren't commercially available (yet). Latency is undoubtedly better - the 1-RTT QUIC handshake substantially reduces time-to-first-byte compared to TLS.

Vecr

7 hours ago

I think one of the original drivers was the ability to quickly tweak parameters, after Linux rejected what I think was userspace adjustment of window sizing to be more aggressive than the default.

The Linux maintainers didn't want to be responsible for congestion collapse, but UDP lets you spray packets from userspace, so Google went with that.

RachelF

7 hours ago

Also bear in mind that many of today's network cards have processors in them that handle much of the TCP/IP overhead.

kccqzy

6 hours ago

That's mostly still for the data center. Which end-user network cards that I can buy can do TCP offloading?

phil21

6 hours ago

Unless I’m missing something here, pretty much any Intel nic released in the past decade should support tcp offload. I imagine the same is true for Broadcom and other vendors as well, but I don’t have something handy to check.

JoshTriplett

6 hours ago

Some wifi cards offload a surprising amount in order to do wake-on-wireless, but that's not for performance.

nextaccountic

3 hours ago

Do you mean that under the same workload, TCP will perform better?

sbstp

7 hours ago

Even HTTP/2 seems to have been rushed[1]. Chrome has removed support for server push. Maybe more thought should be put into these protocols instead of just rebranding whatever Google is trying to impose on us.

[1] https://varnish-cache.org/docs/trunk/phk/h2againagainagain.h...

KaiserPro

3 hours ago

HTTP2 was a prototype that was designed by people who either assumed that mobile internet would get better much quicker than it did, or who didn't understand what packet loss did to throughput.

I suspect part of the problem is that some of the rush is that people at major companies will get a promotion if they do "high impact" work out in the open.

HTTP/2 "solves head of line blocking" which is doesn't. It exchanged an HTTP SSL blocking issues with TCP on the real internet issue. This was predicted at the time.

The other issue is that instead of keeping it a simple protocol, the temptation to add complexity to aid a specific use case gets too strong. (It's human nature, I don't blame them.)

surajrmal

7 hours ago

It's okay to make mistakes, that's how you learn and improve. Being conservative has drawbacks of its own. I'd argue we need more parties involved earlier in the process rather than just more time.

zdragnar

6 hours ago

It's a weird balancing act. On the other hand, waiting for everyone to agree on everything means that the spec will take a decade or two for everyone to come together, and then all the additional time for everyone to actively support it.

AJAX is a decent example. Microsoft's Outlook Web Access team implemented XMLHTTP as an ActiveX thing for IE 5, and soon the rest of the vendors adopted it as a standard thing in the form of XMLHttpRequest objects.

In fact, I suspect the list of things that exist in browsers because one vendor thought it was a good idea and everyone hopped on board is far, far longer than those designed by committee. Often times, the initially released version is not exactly the same that everyone standardized on, but they all get to build on the real-world consequences of it.

I happen to like the TC39 process https://tc39.es/process-document/ which requires two live implementations with use in the wild for something to get into the final stage and become an official part of the specification. It is obviously harder for something like a network stack than a JavaScript engine to get real world use and feedback, but it has helped to keep a lot of the crazier vendor specific features at bay.

est

5 hours ago

I don't blame Google; all major version changes are very brave, and I praise them for it. The problem is the lack of non-Google protocols for competition.

crashingintoyou

2 hours ago

Don't have access to the published version but draft at https://arxiv.org/pdf/2310.09423 mentions ping RTT at 0.23ms.

As someone frequently at 150ms+ latency for a lot of websites (and semi-frequently 300ms+ for non-geo-distributed websites), in practice with the latency QUIC is easily the best for throughput, HTTP/1.1 with a decent number of parallel connections is a not-that-distant second, and in a remote third is HTTP/2 due to head-of-line-blocking issues if/when a packet goes missing.

M2Ys4U

28 minutes ago

>The results show that QUIC and HTTP/2 exhibit similar performance when the network bandwidth is relatively low (below ∼600 Mbps)

>Next, we investigate more realistic scenarios by conducting the same file download experiments on major browsers: Chrome, Edge, Firefox, and Opera. We observe that the performance gap is even larger than that in the cURL and quic_client experiments: on Chrome, QUIC begins to fall behind when the bandwidth exceeds ∼500 Mbps.

Okay, well, this isn't going to be a problem over the general Internet, it's more of a problem in local networks.

For people that have high-speed connections, how often are you getting >500Mbps from a single source?

dathinab

41 minutes ago

it says it isn't fast _enough_

but as far as I can tell it's fast _enough_, just not as fast as it could be

mainly they seem to test situations related to bandwidth/latency which aren't very realistic for the majority of users (because most users don't have super fast high bandwidth internet)

this doesn't mean QUIC can't be faster or that we shouldn't look into reducing overhead, just that it's likely not as big a deal as it might initially look

botanical

7 hours ago

> we identify the root cause to be high receiver-side processing overhead

I find this to be the issue when it comes to Google, and I bet it was known beforehand: pushing processing to the user. For example, the AV1 video codec was deployed when no consumer had HW decoding capabilities. It saved them on space at the expense of increased CPU usage for the end-user.

I don't know what the motive was there; it would still show that they are carbon-neutral while billions are busy processing the data.

danpalmer

6 hours ago

> the AV1 video codec was deployed when no consumer had HW decoding capabilities

This was a bug. An improved software decoder was deployed for Android and for buggy reasons the YouTube app used it instead of a hardware accelerated implementation. It was fixed.

Having worked on a similar space (compression formats for app downloads) I can assure you that all factors are accounted for with decisions like this, we were profiling device thermals for different compression formats. Setting aside bugs, the teams behind things like this are taking wide-reaching views of the ecosystem when making these decisions, and at scale, client concerns almost always outweigh server concerns.

watermelon0

5 hours ago

YouTube had the same issue with VP9 on laptops, where you had to use an extension to force H264, to avoid quickly draining the battery.

toastal

4 hours ago

If only they would give us JXL on Android

anfilt

6 hours ago

Well, I will say that if you're running servers hit billions of times per day, offloading processing to the client when it's safe to do so starts to make sense financially. Google does not have to pay for your CPU or storage usage etc...

Also I will say if said overhead is not too much it's not that bad of a thing.

kccqzy

6 hours ago

This is indeed an issue but it's widespread and everyone does it, including Google. Things like servers no longer generating actual dynamic HTML, replaced with servers simply serving pure data like JSON and expecting the client to render it into the DOM. It's not just Google that doesn't care, but the majority of web developers also don't care.

SquareWheel

6 hours ago

There's clearly advantages to writing a web app as an SPA, otherwise web devs wouldn't do it. The idea that web devs "don't care" (about what exactly?) really doesn't make any sense.

Moving interactions to JSON in many cases is just a better experience. If you click a Like button on Facebook, which is the better outcome: To see a little animation where the button updates, or for the page to reload with a flash of white, throw away the comment you were part-way through writing, and then scroll you back to the top of the page?

There's a reason XMLHttpRequest took the world by storm. More than that, jQuery is still used on more than 80% of websites due in large part to its legacy of making this process easier and cross-browser.

tock

4 hours ago

I don't think Facebook is the best example given the sheer number of loading skeletons I see on their page.

Banou

an hour ago

I think one of the reasons Google chose UDP is that it's already a popular protocol, on which you can build reliable packets, while also having the base UDP unreliability on the side.

From my perspective, which is a web developer's, having QUIC allowed the web standards to easily piggyback on top of it for the WebTransport API, which is way better than the current HTTP stack and WebRTC, which is a complete mess. Basically giving a TCP and UDP implementation for the web.

Knowing this, I feel like it makes more sense to me why Google chose this way of doing things, which some people seem to be criticizing.

apitman

6 hours ago

Currently chewing my way laboriously through RFC 9000. Definitely concerned by how complex it is. The high level ideas of QUIC seem fairly straightforward, but the spec feels full of edge cases you must account for. Maybe there's no other way, but it makes me uncomfortable.

I don't mind too much as long as they never try to take HTTP/1.1 from me.

ironmagma

6 hours ago

Considering they can’t really even make IPv6 happen, that seems like a likely scenario.

BartjeD

5 hours ago

https://www.google.com/intl/en/ipv6/statistics.html

I think it's just your little corner of the woods that isn't adopting it. Over here the trend is very clearly to move away from IPv4, except for legacy reasons.

alt227

42 minutes ago

The majority of this traffic is mobile devices. Most use ipv6 by default.

Uptake on desktops/laptops/servers is still extremely low and will be for a long time to come.

apitman

4 hours ago

The important milestone is when it's safe to turn IPv4 off. And that's not going to happen as long as any country hasn't fully adopted it, and I don't think that's ever going to happen. For better or worse NAT handles outgoing connections and SNI routing handles incoming connections for most use cases. Self-hosting is the most broken but IMO that's better handled with tunneling anyway so you don't expose your home IP.

jeroenhd

3 hours ago

IPv4 doesn't need to be off. Hacks and workarounds like DS-Lite can stay with us forever, just like hacks and workarounds like NAT and ALGs will.

consp

an hour ago

DS-Lite (aka CGNAT): now we don't need to give the customers a proper IP address anymore. It should be banned, as it limits IPv6 adoption; it's getting more and more use for the "customers' own good" and is annoying as hell to work around.

AlienRobot

2 hours ago

>I think it's just your little corner of the woods that isn't adopting it.

The graph says adoption is under 50%.

Even the U.S. is at only 50%. Some countries are under 1%.

jakeogh

3 hours ago

I think keeping HTTP/1.1 is almost as important as not dropping IPv4 (there are good reasons for not being able to tag everything; it's harder to block a country than a user). For similar reasons we should keep old protocols.

On a positive note, AFAICT 90%(??) of QUIC implementations ignored the proposed spin bit: https://news.ycombinator.com/item?id=20990754

jacob019

8 hours ago

Maybe moving the connection protocol into userspace isn't such a great plan.

mrweasel

4 hours ago

Maybe moving the entire application to the browser/cloud wasn't the best idea for a large number of use cases?

Video streaming, sure, but we're already able to stream 4K video over a 25Mbit line. With modern internet connections being 200Mbit to 1Gbit, I don't see that we need the bandwidth in private homes. Maybe for video conferencing in large companies, but that also doesn't need to be 4K.

The underlying internet protocols are old, so there's no harm in assessing if they've outlived their usefulness. However, we should also consider whether web applications and "always connected" are truly the best solution for our day-to-day application needs.

kuschku

2 hours ago

> With modern internet connections being 200Mbit to 1Gbit, I don't see that we need the bandwidth in private homes

Private connections tend to be asymmetrical. In some cases, e.g. old DOCSIS versions, that used to be due to technical necessity.

Private connections tend to be unstable, the bandwidth fluctuates quite a bit. Depending on country, the actually guaranteed bandwidth is somewhere between half of what's on the sticker, to nothing at all.

Private connections are usually used by families, with multiple people using it at the same time. In recent years, you might have 3+ family members in a video call at the same time.

So if you're paying for a 1000/50 line (as is common with DOCSIS deployments), what you're actually getting is usually a 400/20 line that sometimes achieves more. And those 20Mbps upload are now split between multiple people.

At the same time, you're absolutely right – Gigabit is enough for most people. Download speeds are enough for quite a while. We should instead be increasing upload speeds and deploying FTTH and IPv6 everywhere to reduce the latency.

simiones

an hour ago

The problem is that the biggest win by far with QUIC is merging encryption and session negotiation into a single packet, and the kernel teams have been adamant about not wanting to maintain encryption libraries in kernel. So, QUIC or any other protocol like it being in kernel is basically a non-starter.

foota

8 hours ago

I don't have access to the article, but they're saying the issue is due to client side ack processing. I suspect they're testing at bandwidths far beyond what's normal for consumer applications.

dartharva

8 hours ago

It's available on arxiv and nope, they are testing mostly for regular 4G/5G speeds.

https://arxiv.org/pdf/2310.09423

DannyBee

7 hours ago

4g tops out at 1gbps only when one person is on the network. 5g tops out at ~10gbps (some 20gbps i guess) only when one person is on the network.

They are testing at 1gbps.

This is not regular 4g speed for sure, and it's a rare 5g speed. Regular 5g speed is (in the US) 40-50mbps, so 20x slower than they are testing.

vrighter

4 hours ago

Gigabit fiber internet is quite cheap and increasingly available (I'm not from the US). I don't just use the internet over a 4/5g connection. This definitely affects more people than you think.

izend

7 hours ago

What about 1gbps fiber at home, it is becoming common in Canada. I have 1gbps up/down.

dartharva

7 hours ago

Still won't be beyond normal consumer applications' capacity, right?

KaiserPro

3 hours ago

Http1.1 has been around for 28 years. At the time, gigabit ethernet was _expensive_. 9600baud on mobile was rare.

and yet http1.1 runs on gigabit networks pretty well.

kccqzy

6 hours ago

The flexibility and ease of changing a userspace protocol IMO far outweighs anything else. If the performance problem described in this article (which I don't have access to) is in userspace QUIC code, it can be fixed and deployed very quickly. If similar performance issue were to be found in TCP, expect to wait multiple years.

vrighter

4 hours ago

Well, the problem is probably that it is in userspace in the first place.

01HNNWZ0MV43FF

7 hours ago

Does QUIC mandate that, or is that just the stepping stone until the chicken-and-egg problem is solved and we get kernel support?

kmeisthax

6 hours ago

No, but it depends on how QUIC works, how Ethernet hardware works, and how much you actually want to offload to the NIC. For example, QUIC has TLS encryption built-in, so anything that's encrypted can't be offloaded. And I don't think most people want to hand all their TLS keys to their NIC[0].

At the very least you probably would have to assign QUIC its own transport, rather than using UDP as "we have raw sockets at home". Problem is, only TCP and UDP reliably traverse the Internet[1]. Everything in the middle is sniffing traffic, messing with options, etc. In fact, Google rejected an alternate transport protocol called SCTP (which does all the stream multiplexing over a single connection that QUIC does) specifically because, among other things, SCTP's a transport protocol and middleboxes choke on it.

[0] I am aware that "SSL accelerators" used to do exactly this, but in modern times we have perfectly good crypto accelerators right in our CPU cores.

[1] ICMP sometimes traverses the internet, it's how ping works, but a lot of firewalls blackhole ICMP. Or at least they did before IPv6 made it practically mandatory to forward ICMP packets.

justinphelps

an hour ago

SCTP had already solved the problem that QUIC proposes to solve. Google, of all companies, has the influence to properly implement and accommodate other L4 protocols. QUIC seems like doubling down on a hack, and it breaks the elegance of the OSI model.

_flux

5 hours ago

I don't think passing just the session keys to NIC would sound so perilous, though.

vlovich123

6 hours ago

Others in the thread summarized the paper as saying the issue is ack handling. That has nothing to do with whether the stack is in kernel space or user space. Indeed, there's some concern about this inevitable scenario because the kernel is so slow moving: updates take much longer to propagate to the applications needing them (absent some middle ground), whereas user-space stacks can update as the endpoint applications need them to.

wmf

7 hours ago

On mobile the plan is to never use kernel support so that apps can have the latest QUIC on old kernels.

mholt

8 hours ago

I don't have access to the paper but based on the abstract and a quick scan of the presentation, I can confirm that I have seen results like this in Caddy, which enables HTTP/3 out of the box.

HTTP/3 implementations vary widely at the moment, and will likely take another decade to optimize to homogeneity. But even then, QUIC requires a lot of state management that TCP doesn't have to worry about (even in the kernel). There's a ton of processing involved with every UDP packet, and small MTUs, still ingrained into many middleboxes and even end-user machines these days, don't make it any better.

So, yeah, as I felt about QUIC ... oh, about 6 years ago or so... HTTP/2 is actually really quite good enough for most use cases. The far reaches of the world and those without fast connections will benefit, but the majority of global transmissions will likely be best served with HTTP/2.

Intuitively, I consider each HTTP major version an increased order of magnitude in complexity. From 1 to 2 the main complexities are binary (that's debatable, since it's technically simpler from an encoding standpoint), compression, and streams; then with HTTP/3 there's _so, so much_ it does to make it work. It _can_ be faster -- that's proven -- but only when networks are slow.

TCP congestion control is its own worst enemy, but when networks aren't congested (and with the right algorithm)... guess what. It's fast! And the in-order packet transmission (head-of-line blocking) makes endpoint code so much simpler and faster. It's no wonder TCP is faster these days when networks are fast.

I think servers should offer HTTP/3 but clients should be choosy when to use it, for the sake of their own experience/performance.

geocar

3 hours ago

I turned off HTTP2 and HTTP3 a few months ago.

I see a few million daily page views: Memory usage has been down, latency has been down, network accounting (bandwidth) is about the same. Revenue (ads) is up.

> It _can_ be faster -- that's proven -- but only when networks are slow.

It can be faster in a situation that doesn't exist.

It sounds charitable to say something like "when networks are slow" -- but because everyone has had a slow network experience, they are going to think that QUIC would help them out, but real world slow network problems don't look like the ones that QUIC solves.

In the real world, QUIC wastes memory and money and increases latency on the average case. Maybe some Google engineers can come up with a clever heuristic involving TCP options or the RTT information to "switch on QUIC selectively" but honestly I wish they wouldn't bother, simply because I don't want to waste my time benchmarking another half-baked google fart.

withinboredom

an hour ago

The thing is, very few people who use "your website" are on slow, congested networks. The number of people who visit google on a slow, congested network (airport wifi, phones at conferences, etc) is way greater than that. This is a protocol to solve a google problem, not a general problem or even a general solution.

geocar

an hour ago

Since I buy ads on Google to my site I would argue it’s representative of Google’s traffic.

But nice theory.

altairprime

7 hours ago

The performance gap is shown to be due to hardware offloading, not due to congestion control, in the arxiv paper above.

vlovich123

7 hours ago

And because Quic is encrypted at a fundamental level, offload likely means needing to share keys with the network card which is a trust concern.

10000truths

5 hours ago

This is already how TLS offload is implemented for NICs that support it. The handshake isn't offloaded, only the data path. So essentially, the application performs the handshake, then it calls setsockopt to convert the TCP socket to a kTLS socket, then it passes the shared key, IV, etc. to the kTLS socket, and the OS's network stack passes those parameters to the NIC. From there, the NIC only handles the bulk encryption/decryption and record encapsulation/decapsulation. This approach keeps the drivers' offload implementations simple, while still allowing the application/OS to manage the session state.
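
For the curious, the handoff described here looks roughly like this with Linux kTLS. The constants are guarded in case older libc headers lack them, the key material names are placeholders, and error handling is omitted:

    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <linux/tls.h>
    #include <string.h>

    #ifndef TCP_ULP
    #define TCP_ULP 31
    #endif
    #ifndef SOL_TLS
    #define SOL_TLS 282
    #endif

    /* After the userspace TLS handshake, push the negotiated TX keys down so
     * the kernel (and, with a capable NIC, the hardware) handles the record
     * layer; plain write()s then become TLS records. */
    int enable_ktls_tx(int fd, const unsigned char key[16],
                       const unsigned char iv[8],
                       const unsigned char salt[4],
                       const unsigned char seq[8])
    {
        if (setsockopt(fd, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls")) < 0)
            return -1;

        struct tls12_crypto_info_aes_gcm_128 ci;
        memset(&ci, 0, sizeof(ci));
        ci.info.version     = TLS_1_2_VERSION;
        ci.info.cipher_type = TLS_CIPHER_AES_GCM_128;
        memcpy(ci.key,     key,  TLS_CIPHER_AES_GCM_128_KEY_SIZE);
        memcpy(ci.iv,      iv,   TLS_CIPHER_AES_GCM_128_IV_SIZE);
        memcpy(ci.salt,    salt, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
        memcpy(ci.rec_seq, seq,  TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);

        return setsockopt(fd, SOL_TLS, TLS_TX, &ci, sizeof(ci));
    }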

truetraveller

7 hours ago

I'd say Http1.1 is good enough for most people, especially with persistent connections. Http2 is an exponential leap in complexity, and burdensome/error-prone for clients to implement.

apitman

7 hours ago

The day they come for HTTP/1.1 is the day I die on a hill.

01HNNWZ0MV43FF

7 hours ago

Yeah I imagine 1 + 3 being popular. 1.1 is so simple to implement and WebTransport / QUIC is basically a teeny VPN connection.

larsonnn

an hour ago

Site is blocking Apple's Private Relay :(

sylware

an hour ago

To go faster, you need to simplify a lot.

bell-cot

an hour ago

To force a lucrative cycle of hardware upgrades, you need software to do the opposite.

True story: Back in the early aughties, Intel was hosting regular seminars for dealers and integrators selling either Intel-made PC's, or white box ones. I attended one of those, and the Intel rep openly claimed that Intel had challenged Microsoft to produce software which could bring a GHz CPU to its knees.

latentpot

6 hours ago

QUIC is the standard problem across n number of clients who choose Zscaler and similar content inspection tools. You can block it at the policy level but you also need to have it disabled at the browser level. Which sometimes magically turns on again and leads to a flurry of tickets for 'slow internet', 'Google search not working' etcetera.

watermelon0

5 hours ago

Wouldn't the issue in this case be with Zscaler, and not with QUIC?

v1ne

an hour ago

Hmm, interesting. We also have policies imposed by the Regulator™ that lead to us inspecting all web traffic. All web traffic goes through a proxy that's configured in the web browser. No proxy, no internet.

Out of curiosity: What's your use case to use ZScaler for this inspection instead?

chgs

4 hours ago

The problem here is choosing software like zscaler

mcosta

2 hours ago

Zscaler is not chosen, it is imposed by the corporation

jiggawatts

4 hours ago

I wonder if the trick might be to repurpose technology from server hardware: partition the physical NIC into virtual PCI-e devices with distinct addresses, and map to user-space processes instead of virtual machines.

So in essence, each browser tab or even each listening UDP socket could have a distinct IPv6 address dedicated to it, with packets delivered into a ring buffer in user-mode. This is so similar to what goes on with hypervisors now that existing hardware designs might even be able to handle it already.

Just an idle thought...

jeroenhd

2 hours ago

I've often pondered if it was possible to assign every application/tab/domain/origin a different IPv6 address to exchange data with, to make tracking people just a tad harder, but also to simplify per-process firewall rules. With the bare minimum, a /64, you could easily host billions of addresses per device without running out.

I think there may be a limit to how many IP addresses NICs (and maybe drivers) can track at once, though.

What I don't really get is why QUIC had to be invented when multi-stream protocols like SCTP already exist. SCTP brings the reliability of TCP with the multi-stream system that makes QUIC good for websites. Piping TLS over it is a bit of a pain (you don't want a separate handshake per stream), but surely there could be techniques to make it less painful (leveraging 0-RTT? Using session resumptions with tickets from the first connected stream?).
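
For reference, the multi-streaming being described looks roughly like this with the lksctp-tools API (a one-to-one style socket created with IPPROTO_SCTP is assumed; stream ids, counts and payloads are illustrative, error handling omitted, link with -lsctp):

    #include <netinet/in.h>
    #include <netinet/sctp.h>
    #include <string.h>

    /* One association, several independent ordered streams, so loss on one
     * stream does not block delivery on the others. */
    int sctp_two_streams(int fd, const struct sockaddr_in *peer)
    {
        struct sctp_initmsg init = { .sinit_num_ostreams  = 8,
                                     .sinit_max_instreams = 8 };
        setsockopt(fd, IPPROTO_SCTP, SCTP_INITMSG, &init, sizeof(init));
        connect(fd, (const struct sockaddr *)peer, sizeof(*peer));

        /* same association, two logical streams (ids 0 and 1) */
        sctp_sendmsg(fd, "headers", 7, NULL, 0, 0, 0, /*stream=*/0, 0, 0);
        sctp_sendmsg(fd, "body",    4, NULL, 0, 0, 0, /*stream=*/1, 0, 0);
        return 0;
    }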

simiones

an hour ago

First and foremost, you can't use SCTP on the Internet, so the whole idea is dead on arrival. The Internet only really works for TCP and UDP over IP - anything else, you have a loooooong tail of networks which will drop the traffic.

Secondly, the whole point of QUIC is to merge the TLS and transport handshakes into a single packet, to reduce RTT. This would mean you need to modify SCTP anyway to allow for this use case, so even what small support exists for SCTP at large would need to be upgraded.

Thirdly, there is no reason to think that SCTP is better handled than UDP at the kernel's IP stack level. All of the problems of memory optimizations are likely to be much worse for SCTP than for UDP, as it's used far, far less.

astrange

14 minutes ago

Is there a service like test-ipv6 to see if SCTP works? Obviously harder to run since you can't do it in a browser.

KaiserPro

3 hours ago

Or just have multiple TCP streams. Super simple, low cost, uses all the optimisations we have already.

when the latency/packet drop is low, prune the connections and you get monster speed.

When the latency/loss is high, grow the number of concurrent connections to overcome slow start.

Doesn't give you QUIC like multipath though.

m_eiman

14 minutes ago

There’s Multipath TCP.

Sparkyte

7 hours ago

Maybe I'm the only person who thinks that trying to make existing internet protocols faster is wasted energy. But who am I to say anything.

cheema33

4 hours ago

> Maybe I'm the only person who thinks that trying to make existing internet protocols faster is wasted energy. But who am I to say anything.

If you have a valid argument to support your claim, why not present it?

Sparkyte

4 hours ago

They are already expected standards, so when you create optimizations you're building on functions that need to be supported additionally on top of them. This leads to incompatibility and often worse performance, as is being experienced here with QUIC.

You can read more about such things in The Evolution of the Internet Congestion Control: https://groups.csail.mit.edu/ana/Publications/The_Evolution_...

A good solution is to create a newer protocol when the limits of an existing protocol are exceeded. No one thought of needing HTTPS long ago, and now we have 443 for HTTP security. If we need something to be faster and it has already hit an arbitrary limit kept for the sake of backward compatibility, it would be better to introduce a new protocol.

I dislike the idea that we're turning into another Reddit where we are pointing fingers at people for updoots. If you dislike my opinion, please present one of your own so it can be challenged.

paulgb

3 hours ago

> A good solution is to create a newer protocol when the limits of an existing protcol are exceeded.

It’s not clear to me how this is different to what’s happening. Is your objection that they did it on top of UDP instead of inventing a new transport layer?

Sparkyte

3 hours ago

No, actually what I meant was that QUIC, being a protocol on UDP, was intended to take advantage of the speed of UDP to do things faster than some TCP protocols did. While the merit is there, the optimizations done on TCP itself have drastically improved the performance of TCP-based protocols. UDP is still exceptional, but it is like using a crowbar to open a bottle. Not exactly the tool intended for the purpose.

Creating a new protocol starting from scratch would be better effort spent. A QUICv2 is on the way. https://datatracker.ietf.org/doc/rfc9369/

I don't think it addresses the problems with QUICv1 in terms of lightweight performance and bandwidth which the post claims QUIC lacks.

simiones

an hour ago

Creating a new transport protocol for use on the whole Internet is a massive undertaking, not only in purely technical terms, but much more difficult, in political terms. Getting all of the world's sysadmins to allow your new protocol is a massive massive undertaking.

And if you have the new protocol available today, with excellent implementations for Linux, Windows, BSD, MacOS, Apple iOS, and for F5, Cisco, etc routers done, it will still take an absolute minimum of 5-10 years until it starts becoming available on the wider Internet, and that is if people are desperate to adopt it. And the vast majority of the world will not use it for the next 20 years.

The time for updating hardware to allow and use new protocols is going to be a massive hurdle to anything like this. And the advantage to doing so over just using UDP would have to be monumental to justify such an effort.

The reality is that there will simply not be a new transport protocol used on the wide internet in our lifetimes. Trying to get one to happen is a pipe dream. Any attempts at replacing TCP will just use UDP.

Veserv

2 hours ago

QUICv2 is not really a new standard. It explicitly exists merely to intentionally rearrange some fields to prevent standard hardcoding/ossification and exercise the version negotiation logic of implementations. It says so right in the abstract:

“Its purpose is to combat various ossification vectors and exercise the version negotiation framework.”

likis

3 hours ago

You posted your opinion without any kind of accompanying argument, and it was also quite unclear what you meant. Whining about being a target and being downvoted is not really going to help your case.

I initially understood your first post as: "Let's not try to make the internet faster"

With this reply, you are clarifying your initial post that was very unclear. Now I understand it as:

"Let's not try to make existing protocols faster, let's make new protocols instead"

Sparkyte

3 hours ago

More that if a protocol has met its limit and you are at a dead end, it is better to build a new one from the ground up. Making the internet faster is great, but you eventually hit a wall. You need to be creative and come up with better solutions.

In fact our modern network infrastructure still runs on designs intended for limited network performance. Our networks are fiber and 5g, which are roughly 170,000 times faster and wider than at the initial inception of the internet.

Time for a QUICv2

https://datatracker.ietf.org/doc/rfc9369/

But I don't think it addresses the disparity between it and lightweight protocols as networks get faster.

foul

2 hours ago

It's wasted energy when they aren't used at their full capacity.

I think that GoogleHTTP has real-world uses for bad connectivity or in datacenters where they can fine-tune their data throughput (and buy crazy good NICs), but it seems that to use it for replacing TCP (which seems to be confirmed as very good when receiver and sender aren't controlled by the same party) the world needs a hardware overhaul or something.