No such thing as exactly-once delivery

78 points, posted 8 hours ago
by todsacerdoti

57 Comments

Animats

4 hours ago

There's no such thing as finite-time arbitration for multiprocessor requests to memory, but it works anyway. This is similar. You can't guarantee time-bounded exactly once delivery, but after enough interchanges the probability of being wrong approaches zero.
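
To make that intuition concrete, here is a toy calculation (assuming, purely for illustration, that each confirmation round fails independently with probability p, which real networks only approximate):

```python
def residual_uncertainty(p: float, rounds: int) -> float:
    """Probability that every one of `rounds` independent
    confirmation attempts failed, leaving us still unsure."""
    return p ** rounds

# With a 1% per-round failure rate, ten rounds of confirmation
# leave roughly a 1-in-10^20 chance of still being wrong.
print(residual_uncertainty(0.01, 10))
```

The probability never reaches zero in bounded time, but it decays exponentially with each exchange, which is the whole point.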

This shows up explicitly in blockchain systems, where confidence improves as the number of confirmation cycles increases.

YZF

an hour ago

This is the first time I'm hearing about the topic of arbitrating multiprocessor access to memory. Isn't there a trivial counterexample where the bandwidth is divided by the number of processors and each processor gets a timeslot?

Processors, despite being a distributed system (well, really every ASIC is), don't typically suffer from your classical distributed-systems issues because they have no failures, or rather they have only certain catastrophic failure modes that are low probability and well controlled for. Your typical distributed system, in contrast, has constant failures.

rcxdude

an hour ago

It's more to do with the underlying electronics: digital is an abstraction, everything is analog underneath, and logic gates and registers are actually just very non-linear analog circuits. What this means is that it's always in principle possible for an asynchronous edge (i.e. one that is not aligned to the other edges in the system) to be arbitrarily non-determinate: never actually being decided as a 1 or a 0 as it propagates through the digital circuit. This isn't an entirely theoretical issue, either: it can and will cause very strange behaviour in a digital system because a marginal or metastable value can be interpreted differently by different parts of the circuit that should be seeing the same value. It's possible in practice to push the chance of this happening extremely low, but it comes at the cost of latency: you need to spend more time to allow the circuit to actually settle into one state or the other.

Within a fully synchronous system, it's possible to avoid this, though. But most multi-processor systems are not synchronous.
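
The latency-for-certainty trade-off can be illustrated with the textbook metastability MTBF formula, MTBF = e^(t/τ) / (T0 · f_clk · f_data). The constants below are made-up but plausible, not taken from any particular device:

```python
import math

def mtbf_seconds(t_resolve: float, tau: float, t0: float,
                 f_clk: float, f_data: float) -> float:
    """Mean time between metastability failures. Giving the circuit
    longer to resolve (t_resolve) helps exponentially."""
    return math.exp(t_resolve / tau) / (t0 * f_clk * f_data)

# Hypothetical flop: tau = 50 ps, T0 = 100 ps capture window,
# 100 MHz clock, 1 MHz asynchronous input.
short = mtbf_seconds(1e-9, 50e-12, 1e-10, 100e6, 1e6)   # settle for 1 ns
longer = mtbf_seconds(3e-9, 50e-12, 1e-10, 100e6, 1e6)  # settle for 3 ns
```

Two extra nanoseconds of settling time multiply the MTBF by e^40, which is why synchronizer chains can make the failure rate negligible, but never zero.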

londons_explore

37 minutes ago

> But most multi-processor systems are not synchronous.

Is this actually true? A modern AMD or Intel CPU doesn't have one quartz crystal per core, so therefore all per-core clocks have to be derived from one clock source, meaning the relationship between all the clock frequencies and phases is deterministic, even if not specced in the datasheet.

YZF

20 minutes ago

There would be a single clock typically but OP is right in the sense that as the signal propagates through the system you can get nondeterministic effects: https://en.wikipedia.org/wiki/Metastability_(electronics)

In practice the probability of that is just driven low enough that it's not an issue.

Animats

an hour ago

> This is the first time I'm hearing about the topic of arbitrating multiprocessor access to memory.

See [1]. This was a very real problem in early multiprocessor computer design.

[1] https://en.wikipedia.org/wiki/Arbiter_%28electronics%29

fanf2

17 minutes ago

Are there many (or even any) multi-drop buses in modern computers? I get the impression that these days almost everything is point-to-point links. But I know less about what’s happening on silicon. I have seen talk of crossbar interconnects on microcontrollers, which (aiui) are more active, unlike passive buses?

dang

3 hours ago

Recent and related:

Yes, you can have exactly-once delivery - https://news.ycombinator.com/item?id=41598006 - Sept 2024 (133 comments)

Also. Others?

You cannot have exactly-once delivery (2015) - https://news.ycombinator.com/item?id=34986691 - March 2023 (217 comments)

Show HN: Exactly-once delivery counter using Watermill messaging library - https://news.ycombinator.com/item?id=26888303 - April 2021 (9 comments)

High-performance, exactly-once, failure-oblivious distributed programming (2018) - https://news.ycombinator.com/item?id=20562733 - July 2019 (26 comments)

Exactly-once Semantics: How Kafka Does it - https://news.ycombinator.com/item?id=14670801 - June 2017 (39 comments)

Delivering Billions of Messages Exactly Once - https://news.ycombinator.com/item?id=14664405 - June 2017 (133 comments)

You Cannot Have Exactly-Once Delivery - https://news.ycombinator.com/item?id=9266725 - March 2015 (55 comments)

Exactly-Once Messaging in Kafka - https://news.ycombinator.com/item?id=8880612 - Jan 2015 (8 comments)

Exactly-Once Delivery May Not Be What You Want - https://news.ycombinator.com/item?id=8614264 - Nov 2014 (10 comments)

notfed

3 hours ago

Tangent, but: this reminds me of many philosophical debates, e.g. "do we have free will?".

What I've found is that usually what's happening is that the "debaters" assume different definitions of the thing they're debating about, which means they're really debating about two entirely different things, with the same name, and that's the real reason why they've come to different conclusions.

klabb3

7 minutes ago

Not sure why you’re downvoted. I think this is completely true. To add more examples, I’ve seen a lot of debates about async IO and “non-blocking” where people talk over each other because they clearly mean different things. I tend to avoid ambiguous terms, but it’s an uphill battle.

rzzzt

2 hours ago

One recommendation I've read is that in such cases you should treat the single label you and your opponent have attached to two different ideas as forbidden. Describe the idea in more detail to reveal the difference (and don't just swap the words of the original phrase for synonyms).

wmf

2 hours ago

That's fine for discussions between experts but when a hundred noobs enter the discussion it's just unpaid tutoring. These threads are bringing out a lot of "I've discovered something all the distributed systems experts missed" energy.

aidenn0

an hour ago

For me "do we have free will" debate is useless because if the answer is "no" then that implies the result of the debate is predetermined and all involved lack any agency to change the outcome.

Therefore any adult arguing the "no free will" side is either

1. Disingenuous

2. Purposefully playing Candyland as an adult.

Either of which is so damaging to their ethos that I lack any desire to continue the debate.

jchw

26 minutes ago

> that implies the result of the debate is predetermined and all involved lack any agency to change the outcome.

In a superdeterministic world, the presence of an illusion of free will is baked into everything humans do anyways. If the world is superdeterministic, but you can't actually predict what's about to happen with any additional certainty, then it doesn't really change anything anyway. So, there's no reason to change the way you argue based on the answer to the question in my opinion.

Of course, now I'm arguing about whether arguing about free will makes any sense, which is perhaps even more silly, but alas.

yarg

an hour ago

That doesn't follow.

There's nothing disingenuous or moronic about believing that physical processes might be fundamentally deterministic.

aidenn0

34 minutes ago

> There's nothing disingenuous or moronic about believing that physical processes might be fundamentally deterministic.

My argument is that a game of Candyland is fundamentally deterministic in the same way.

yarg

10 minutes ago

Fair enough, but the parameter space is incomparable (and in the case of space-time irreplicable and chaotic, making events unrepeatable).

And, unlike the game, it is impossible to hold the oracle's view (so even if freewill is illusory, it's an unbreakable illusion).

griffgeorge

3 hours ago

> That's why we don't say Sequin offers exactly-once delivery anywhere. We say it offers at-least-once delivery and exactly-once processing.

The author should note that the homepage at https://sequinstream.com/ does in fact claim that Sequin supports "Exactly-once delivery" under the "Key features" heading.

notfed

3 hours ago

Well that doesn't seem consistent.

bhaney

3 hours ago

It will be eventually

kibwen

8 minutes ago

Unless it's available and partition-tolerant, that is.

hinkley

28 minutes ago

Engineers often don't get to touch the website.

tombert

4 hours ago

Reminds me of this: https://en.wikipedia.org/wiki/Fallacies_of_distributed_compu...

Part of the reason I find distributed and concurrent computing so fascinating is that we lose so many assumptions about how computers are supposed to work: ordering guarantees, data actually arriving at its destination, fairness, etc.

While it is of course possible for data to get lost or mangled in a single-threaded, non-distributed application, it's rare enough that you often don't need to plan much around it. Once you add concurrency, and especially if you make it distributed, you suddenly have to plan around the eventuality of messages not arriving, or arriving in the wrong order, or arriving multiple times.

Personally, I've gotten into the habit of PlusCal and TLA+ for everything, and modeling in the assumption that stuff might never arrive or arrive multiple times, which can work as a very good sanity check (though it's of course not a silver bullet).
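
There's no room for actual PlusCal in a comment, but the same modeling habit can be sketched in plain Python: enumerate every way the network can mistreat a message and assert an invariant over all outcomes.

```python
# Toy model check: a single message may be dropped, delivered once,
# or delivered twice. With an idempotent handler, every outcome
# satisfies "processed at most once".
def receive(state: frozenset, msg: str) -> frozenset:
    return state | {msg}  # idempotent: duplicates are harmless

scenarios = [[], ["m"], ["m", "m"]]  # lost, once, duplicated
final_states = []
for deliveries in scenarios:
    state = frozenset()
    for msg in deliveries:
        state = receive(state, msg)
    final_states.append(state)

# Invariant: the message is never processed twice.
ok = all(s in (frozenset(), frozenset({"m"})) for s in final_states)
```

A real TLA+ spec would also model crashes between steps, which is where the interesting counterexamples live.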

magicalhippo

an hour ago

This feels a bit like those relativity puzzles, for example the barn-pole paradox[1].

That is, delivered exactly-once seems to be a relative statement where one party can state they did deliver exactly-once while another party can disagree, and they're both correct relative to their local frames of reference.

So, like how GR has no global conservation of energy, just local, could one say that distributed systems have no global conservation of messages, only local?

Or am I just late night rambling again...

[1]: https://en.wikipedia.org/wiki/Ladder_paradox

benlivengood

an hour ago

Messages don't have to be delivered more than once to achieve exactly-once processing; the receiver can offer a query interface for delivery status. If every attempt at delivery checks for a unique message ID and only sends the message if it has not been received then you get exactly-once delivery of the message, if not the metadata.

Optimistically assuming the message has not been delivered yet will sometimes reduce average latency (most databases), but for very expensive messages it is better to check first (rsync).
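
A sketch of the check-first protocol described above, with in-memory stand-ins for what would be network calls in a real system (names are illustrative):

```python
class Receiver:
    """Offers a query interface for delivery status, keyed by message ID."""
    def __init__(self) -> None:
        self.seen: dict[str, str] = {}

    def has(self, msg_id: str) -> bool:
        return msg_id in self.seen

    def deliver(self, msg_id: str, payload: str) -> None:
        self.seen[msg_id] = payload

def send_once(receiver: Receiver, msg_id: str, payload: str) -> bool:
    """Transmit only if the receiver reports it hasn't seen this ID.
    Returns True if this call performed the delivery."""
    if receiver.has(msg_id):
        return False
    receiver.deliver(msg_id, payload)
    return True
```

The sketch deliberately ignores the race where the sender crashes between the check and the send, which is exactly the gap the thread is arguing about.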

dondraper36

4 hours ago

It's admirable when vendors don't try to abuse theory to make their product look better.

There was some blog post last week that supposedly proved the impossibility of exactly once delivery wrong, but that was quite expectedly another attempt to rename the standard terms.

convolvatron

4 hours ago

if you employ persistence to keep sequence numbers, and/or retransmissions and redundancy, you can drive the likelihood of distributed consensus over time arbitrarily close to probabilities where it doesn't matter anyways (like the earth suddenly being coincident with the sun).

so what are we arguing about?

sethammons

3 hours ago

there is a group of people that say that there is no such thing as exactly-once-delivery and they are correct. There is also a group of people who cannot hear nor read nor process the word "delivery" in that term and instead insist that exactly-once-processing is easily achievable via [deduplication, idempotent actions, counters, ...]. They are correct as long as they acknowledge "processing" vs "delivery" -- and there are several who refuse to acknowledge that and don't believe the distinction matters and they are wrong.
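
The distinction fits in a few lines (a sketch, with an in-memory set standing in for whatever durable store a real consumer would use):

```python
# At-least-once delivery on the wire, exactly-once processing at the
# consumer: duplicates are dropped by ID before the side effect runs.
processed_ids: set[str] = set()
results: list[str] = []

def handle(msg_id: str, payload: str) -> None:
    if msg_id in processed_ids:  # redelivery: already processed
        return
    processed_ids.add(msg_id)
    results.append(payload)      # the "processing" side effect

# Message "a" is delivered twice; it is processed once.
for msg_id, payload in [("a", "hello"), ("a", "hello"), ("b", "world")]:
    handle(msg_id, payload)
```

Note the sketch quietly assumes the set and the side effect update atomically; a crash between those two lines reintroduces the problem, which is why "processing" rather than "delivery" is the honest word.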

andrewflnr

3 hours ago

Not that I disagree with getting terms exactly right, but if the "processing" that happens exactly once in your messaging system is firing a "delivery" event to a higher level of the stack (over a local channel), that's observationally indistinguishable from exactly-once delivery to that higher level, isn't it? What am I missing?

wmf

3 hours ago

Nope, because that event could also be lost/duplicated.

Izkata

an hour ago

In the systems I'm used to, that "passing up" is a function call within the same codebase. There isn't a second delivery system for a message to be lost/duplicated, though sometimes they obscure it by making it look more like such a system.

ViewTrick1002

3 hours ago

The problems always stem from the side effects. You can achieve exactly-once-like properties in a contained system.

But if the processing includes, for example, sending an email, then you need to extend the same guarantees to all downstream actions, which is where exactly-once as seen by an outside observer gets hard.

rwiggins

3 hours ago

Out of curiosity, how would you describe TCP in these terms? Does the TCP stack's handling of sequence numbers constitute processing (on the client and server both I assume)? Which part(s) of a TCP connection could be described as delivery?

wmf

3 hours ago

From TCP's perspective it has delivered the data when read() returns successfully. This is also the point at which TCP frees the packet buffers. From the app's perspective, TCP is an at-most-once system because data can be lost in corner cases caused by failures.

(Of course plenty of people on the Internet use different definitions to arrive at the opposite conclusion but they are all wrong and I am right.)

YZF

an hour ago

I agree this is what you'd consider delivery. Also agree everyone else is wrong and I am right ;)

The similar question in TCP is what happens when the sender writes to their socket and then loses the connection. At this point the sender doesn't know whether the receiver has the data or not. Both are possible. For the sender to recover it needs to re-connect and re-send the data. Thus the data is potentially delivered more than once and the receiver can use various strategies to deal with that.

But sure, within the boundary of a single TCP connection data (by definition) is never delivered twice.

wellpast

3 hours ago

is there a real difference between delivery and processing?

In a streaming app framework (like Flink or Beam, or Kafka Streams), I can get an experience of exactly once "processing" by reverting to checkpoints on failure and re-processing.

But what's the difference between doing this within the internal stores of a streaming app or coordinating a similar checkpointing/recovery/"exactly-once" mechanism with an external "delivery destination"?

ithkuil

3 hours ago

Yes there is a difference and it's quite crucial to understanding how two groups of people can claim seemingly opposite things and both be correct.

It's important to note that by processing here we don't just mean any "computing" but the act of "committing the side effects of the computation"

wellpast

an hour ago

A typical delivery target is a data store or cache. Writing to such a target can be construed as a “side effect”. But if you accept eventual consistency, achieving exactly-once semantics is possible.

Eg a streaming framework will let you tally an exactly once counter. You can flush the value of that tally to a data store. External observers will see an eventually consistent exactly-once-delivery result.

mgsouth

an hour ago

Vendor: "Exactly once is possible. We do it!"

No, as a vendor you're either deliberately lying or completely incompetent. Let's insert the words they don't want to say.

"We do exactly once good enough for your application!"

How could you possibly know that?

"We do exactly once good enough for all practical purposes!"

So are my purposes practical? Am I a true scotsman?

"We do exactly once good enough for applications we're good enough for!"

Finally, a truthful claim. I'm not holding my breath.

Edit: More directly address parent:

It's not possible to even guarantee at-least-once in a finite amount of time, let alone exactly-once. For example, a high-frequency trading system wants to deliver exactly one copy of that 10-million-share order. And do it in under a microsecond. If you've got a 1.5 microsecond round trip, it's not possible to even get confirmation an order was received in that time limit, much less attempt another delivery. Your options are a) send once and hope. b) send lots of copies, with some kind of unique identifier so the receiver can do process-at-most-once, and hope you don't wind up sending to two different receiving servers.

SpicyLemonZest

3 hours ago

What you can't do, and what naive consumers of distributed systems often expect, is drive the probability arbitrarily low solely from the producer side. The consumer has to be involved, even if they consider themselves to be a single-node process and don't want to bother with transactionality or idempotency keys.

socketcluster

an hour ago

The "exactly once delivery is impossible" narrative got so strong that people feel that they have to redefine the word 'delivery' to make it fit the premise.

If a message can be guaranteed to be received one or more times but results in a single entry being inserted into a database (e.g. in an idempotent way using UUIDs to enforce uniqueness), then clearly, a consumer reading the records from that database would only receive the message exactly once. If you consider the intended recipient to be the process, not the host, then the definition of 'delivery' allows you to claim that exactly-once delivery is possible.

Of course this is a pragmatic, not mathematical argument. In theory you can't mathematically guarantee even at least once delivery because you can't guarantee that there's not going to be a Carrington event which will bring down the internet and drive humanity to extinction; thus preventing the consumer server from being rebooted.
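
The UUID-uniqueness approach described above, in miniature (sqlite and the table schema here are illustrative stand-ins, not any particular product's design):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE messages (id TEXT PRIMARY KEY, body TEXT)")

def record(msg_id: str, body: str) -> None:
    # A redelivered message hits the PRIMARY KEY and is silently
    # ignored, so the table ends up with exactly one row per ID.
    db.execute("INSERT OR IGNORE INTO messages VALUES (?, ?)",
               (msg_id, body))
    db.commit()

record("uuid-1", "hello")
record("uuid-1", "hello")  # duplicate delivery: no second row
count = db.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
```

A consumer reading this table sees each message exactly once, regardless of how many times it crossed the wire.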

BeeOnRope

17 minutes ago

In a sense it's just kicking the can down the road: how do you ensure the consumer reads the message only once? It may fail after it has read the message but before it has updated some state to reflect that.

So it's not just a pointless argument about the semantics of the term "delivery": the fact that no communication channel can have exactly once delivery means that these systems are much more difficult to implement. For example, you can't just chain together 2 "exactly once" delivery systems into some longer one with a stateless middle node: instead you need some concept of idempotency that spans both systems or the middle node itself needs to de-duplicate (statefully) the first link and forward messages on to the second link.

YZF

an hour ago

What about other messages though? The question isn't usually limited to one message in isolation, it's related to the state of the system as a whole. So sure, trivially, setting one bit where that bit is never cleared can only be done once, so any process setting that bit is also trivially (and not very usefully?) "exactly once delivery".

mgsouth

an hour ago

On the contrary, that is an extremely useful process, and is the basis for vast swathes of multiprocessing algorithms (such as "has this thing finished initializing", or "has it been freed"). However, it is "at most once", not "exactly once". The bit may never be set.

YZF

24 minutes ago

Fair enough. A single bit that can only get set can indeed be useful.

thadt

3 hours ago

> With that definition, SQS, Kafka, and Sequin are all systems that guarantee exactly-once processing. The term processing captures both the delivery of the message and the successful acknowledgment of the message.

This terminology confuses my face. Maybe we need a better name for "exactly-once processing"? When I say "processing" I'm thinking of the actions that my program is going to take on said message - the very processing that they're talking about can fail. Can we not just say 'the message is held until acknowledgement'?

theptip

3 hours ago

The simple way to visualize this is that all the processing can happen within a serializable transaction. If anything fails, the transaction rolls back. Marking the message as delivered is part of the same transaction.
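
A minimal sketch of that transactional pattern, with sqlite standing in for the consumer's store (schema and names are hypothetical):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE inbox (id TEXT PRIMARY KEY, body TEXT,
                        done INTEGER DEFAULT 0);
    CREATE TABLE tally (n INTEGER);
    INSERT INTO tally VALUES (0);
    INSERT INTO inbox (id, body) VALUES ('m1', 'hello');
""")

def process(msg_id: str) -> None:
    # One transaction: the side effect and the delivery mark commit
    # together, or roll back together.
    with db:
        row = db.execute("SELECT done FROM inbox WHERE id = ?",
                         (msg_id,)).fetchone()
        if row is None or row[0]:
            return  # unknown or already processed: do nothing
        db.execute("UPDATE tally SET n = n + 1")           # the side effect
        db.execute("UPDATE inbox SET done = 1 WHERE id = ?", (msg_id,))

process("m1")
process("m1")  # redelivery: the transaction sees done=1 and skips
n = db.execute("SELECT n FROM tally").fetchone()[0]
```

This only works when the side effect lives inside the same transactional store as the delivery mark; an email send, as the thread notes, does not.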

thadt

2 hours ago

Which means that the 'processing' in my application has to have the semantics of a serializable transaction. And it might not - as in the very example given in the post, where a processing application sends an email.

Hence my complaint that making statements like "our system offers exactly once processing" can be completely correct - from the point of view of the messaging system. But it can be misleading to the casual reader in understanding how it fits into the broader solution, because the messaging system cannot guarantee that my application processes it exactly once. I'm saying the terminology doesn't feel right.

dataflow

an hour ago

If there is no such thing as exactly-once delivery because the recipient might not feel like deduplicating, then there can be no such thing as at-least-once delivery either, because the recipient might not feel like being endlessly available for delivery either. Or the mailman might not feel like (re-)attempting deliveries endlessly either. Or the sender might not feel like re-sending things endlessly either.

Is this insightful? Surprising? What's the point of the debate?

0xbadcafebee

3 hours ago

I mean, you can do exactly-once delivery, it's just a more complex state machine with a specific design requirement. Here it is using mail:

  # Receive new mail message, incoming id '1234'.
  # No other writer knows about this message.
  $ echo "hello world" > incoming/1.1234.txt
  
  # If files stay in incoming/ too long, it means they probably failed writing,
  # and are thus stale and can be removed.
  
  # When the message is done being written, the writer moves it to a new state.
  $ mv incoming/1.1234.txt new/1.1234.txt
  
  # The writer can then report to the sender that the message was accepted.
  # In mail, this is seen as optional, as the whole point is to deliver the message,
  # not to make sure you inform the sender that you will deliver the message.
  
  # Here's the exactly-once bit:
  #   If you need assurance of exactly-once delivery, split out the delivery portion 
  #   from the confirmation portion.
  #   
  #   Have one actor talk to the sender, and another actor queue the message in another 
  #   temporary queue, like 'accepted/'. Once the message is queued there, you tell the 
  #   client it is accepted, wait for successful termination of the connection, and then
  #   move the message to the 'new/' queue.
  # 
  #   This way, if there is an error between accepting the message and confirming with
  #   the sender, you can detect that and leave the message in 'accepted/' where you can
  #   later decide what to do with it. (But it won't get picked up for processing)
  
  # The message is now ready to be picked up for processing.
  # To process the message, a processor picks up the new message and moves it to a new state.
  # 
  # The processor adds an identifier '5678', so the processor can identify which file it's working on.
  # It can have extra logic to re-attempt processing on this ID if it fails.
  $ mv new/1.1234.txt process/1.1234.5678.txt
  
  # If the processor dies or takes too long, you can identify that (stale file, processor tracking its work, etc)
  # and this can be moved back to 'new/' for another attempt. Moving it back to 'new/' is still atomic so
  # there is no danger of another processor continuing to work on it.
  
  # Once the processor is done processing, it will move to the complete state.
  $ mv process/1.1234.5678.txt done/1.txt
  
  # If the processing file no longer exists, it means it was either already done,
  # or previously died and was moved back to 'new/'.
This process all depends on a database [filesystem] using synchronous atomic operations [mv]. If your distributed system can't handle that, yeah, you're gonna have a hard time.

bcrl

an hour ago

You have made a false assumption: directory operations across directories are not atomic. The typical way around that is to hard-link the file into the target directory, sync the target directory, and then delete the old name. But now, if the system is interrupted with the intermediate state on disk (the file present in both directories), you no longer know whether the file is new or has already been processed. Which is exactly the problem messaging systems have: there is always a point in the state machines across multiple systems where a crash at the right moment can lead to replay of a message. I worked on persistent messaging for a few years, and it is impossible to resolve this in real-world systems. The best you can do is narrow the window by reducing the latency of the network and of commits to persistent storage.

mgsouth

2 hours ago

So that everyone's on the same page, perhaps you could clarify what "delivery" means to you? For example:

* The human to whom the message is addressed reads it on their screen.

* The email is inserted into that person's inbox. It's possible something will happen to the message after that point, before the addressee actually reads it, but that isn't what you're talking about.

* The email has been accepted by a server operating as an endpoint for the addressee (their company's Exchange server), _and_ acknowledgement has been received by the sending server.

* The above, but no guarantees about the sending server getting the acknowledgement.

etc.

[Edit: also, what "guarantee" means to you. 100%, 99.99999% is good enough, and so on.]