Event Sourcing seems like massive overkill for the stated problem. The core requirement is simple: "show account balance at any point in time" for regulatory compliance.
What specific audit requirements existed beyond point-in-time balance queries? The author dismisses alternatives as "less business-focused" but doesn't justify why temporal tables or structured audit logs couldn't satisfy the actual compliance need.
The performance issues were predictable: 2-5 seconds for balance calculations, requiring complex snapshot strategies to get down to 50-200ms. This entire complexity could have been avoided with a traditional audit trail approach.
The business context analogy to accounting ledgers is telling - but accounting systems don't replay every transaction to calculate current balances. They use running totals with audit trails, which is exactly what temporal tables provide.
Event Sourcing is elegant from a technical perspective, but here it's solving a problem that simpler, proven approaches handle just fine. The regulatory requirement was for historical balance visibility, not event replay capabilities.
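To make the "running totals with audit trails" alternative concrete, here is a minimal in-memory sketch, not the article's design. It assumes integer timestamps and amounts, and the names `Ledger`, `post` and `balance_at` are made up for illustration. Each row stores the balance after it was applied, so a point-in-time query is a lookup rather than a replay.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Ledger:
    """Append-only audit trail where every row also stores the running balance."""
    timestamps: list = field(default_factory=list)  # ascending, one per row
    rows: list = field(default_factory=list)        # (timestamp, amount, balance_after)

    def post(self, timestamp: int, amount: int) -> None:
        # Rows are only ever appended, and only in time order.
        if self.timestamps and timestamp < self.timestamps[-1]:
            raise ValueError("rows must be appended in time order")
        previous = self.rows[-1][2] if self.rows else 0
        self.timestamps.append(timestamp)
        self.rows.append((timestamp, amount, previous + amount))

    def balance_at(self, timestamp: int) -> int:
        # Binary search for the last row at or before the requested time.
        i = bisect_right(self.timestamps, timestamp)
        return self.rows[i - 1][2] if i else 0


ledger = Ledger()
ledger.post(10, 100)   # deposit
ledger.post(20, -30)   # withdrawal
ledger.post(30, 50)    # deposit
print(ledger.balance_at(25))
```

In a real database the same idea would be a `balance_after` column plus an index on the timestamp; the trade-off is that a wrong row has to be fixed with a compensating row rather than by re-deriving everything.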
I think they needed to be clearer about what the actual requirement was.
If the requirement is, "Show the balance _as it was_ at that point in time", this system doesn't fulfil it. They even say so in the article: if something is wrong, throw away the state and re-run the events. That's necessarily different behaviour. To meet this requirement, you actually have to audit every enquiry and record what you thought the result was, including the various errors/miscalculations.
If the requirement is, "Show the balance as it should have been at that point in time", then it's fine.
> They use running totals with audit trails, which is exactly what temporal tables provide.
In the author's case, they separate writes and reads into different DBs. The read-optimized DB has aggregated balances stored, not events. This is not materially different, and the trade-offs regarding staleness of data will be mostly the same.
This made me do a double-take. Surely you would never do this, right? It seems to be directly counter to the idea of being able to audit changes:
“Event replay: if we want to adjust a past event, for example because it was incorrect, we can just do that and rebuild the app state.”
No, that definitely happens.
There are two kinds of adjustments: an adjustment transaction (one-off), or re-interpreting what happened (systemic). The event sourcing pattern is useful in both situations.
Sometimes you need to replay events to have a correct report because your interpretation at the time was incorrect or it needs to change for whatever reason (external).
Auditing isn't about not changing anything, but being able to trace back and explain how you arrived at the result. You can have as many "versions" as you want of the final state, though.
Yeah, that's a big NO. Events are immutable. If an event is wrong, you post an event with an amendment. Then yes, rebuild the app state.
Not speaking about their case, but I think in some cases a "versioned mutable data store" with an event log that lists updates/inserts makes more sense than an "immutable event log" one like Kafka.
Consider the update_order_item_quantity event in a classic event-sourced system. It's not possible to guarantee that two waiters dispatching two such events at the same time, when the current quantity is 1, would not cause the quantity to become negative/invalid.
If the data store allowed for mutability and produced an event log it's easy:
Instead of dispatching update_order_item_quantity, you would update the order document, specifying the version you last read. In the previous example the second request would fail since it specified a stale version_id. And you get the auditability benefits of a classic event sourcing system as well, because you have versions and an event log.
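A rough sketch of that version check, using an in-memory stand-in rather than any real database API. `VersionedStore`, `put` and `StaleVersionError` are hypothetical names, and version 0 is assumed to mean "document does not exist yet".

```python
class StaleVersionError(Exception):
    """Raised when a writer passes a version that is no longer current."""


class VersionedStore:
    def __init__(self):
        self._docs = {}       # doc_id -> (version, data)
        self.event_log = []   # append-only: (doc_id, new_version, data)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, data, expected_version):
        current_version = self._docs.get(doc_id, (0, None))[0]
        if expected_version != current_version:
            # Someone else wrote in between; the caller must re-read and retry.
            raise StaleVersionError(
                f"expected version {expected_version}, current is {current_version}"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, dict(data))
        self.event_log.append((doc_id, new_version, dict(data)))
        return new_version


store = VersionedStore()
store.put("order-1", {"quantity": 1}, expected_version=0)

# Both waiters read version 1 with quantity 1, then both try to decrement.
seen_version, _ = store.get("order-1")
store.put("order-1", {"quantity": 0}, expected_version=seen_version)  # first write wins
try:
    store.put("order-1", {"quantity": 0}, expected_version=seen_version)
    second_write_rejected = False
except StaleVersionError:
    second_write_rejected = True
```

The rejected waiter re-reads the document, sees quantity 0, and can decide what to do, while the log still records every accepted version.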
This kind of architecture is trivial to implement with CouchDB and easier to maintain than Kafka. Pity it's impossible to find managed hosting for CouchDB outside of IBM.
Any modern DB with a WAL (write ahead log) is an immutable event system, where the events are the DB primitives (insert, update, delete...).
When you construct your own event system you are constructing a DB with your own primitives (deposit, withdraw, transfer, apply monthly interest...).
You have to figure out your transaction semantics. For example, how to reject invalid events.
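As an illustration of "your own primitives" plus rejecting invalid events, here is a small sketch under the assumption that validation happens against current state before anything is appended. The `Account` class, its method names and the `InsufficientFunds` error are invented for this example.

```python
class InsufficientFunds(Exception):
    """Raised when a withdrawal would overdraw the account."""


class Account:
    def __init__(self):
        self.events = []   # append-only list of (kind, amount)
        self.balance = 0   # state derived from the events

    def _append(self, event):
        # Only validated events reach this point; apply and record together.
        kind, amount = event
        self.balance += amount if kind == "deposit" else -amount
        self.events.append(event)

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._append(("deposit", amount))

    def withdraw(self, amount):
        if amount <= 0:
            raise ValueError("withdrawal must be positive")
        if amount > self.balance:
            # The invalid command is rejected and never becomes an event.
            raise InsufficientFunds(f"balance {self.balance} < {amount}")
        self._append(("withdraw", amount))


account = Account()
account.deposit(100)
account.withdraw(30)
try:
    account.withdraw(100)
    overdraw_rejected = False
except InsufficientFunds:
    overdraw_rejected = True
```

With concurrent writers the check and the append also have to happen atomically, which is exactly the transaction-semantics work a database normally does for you.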
> Any modern DB with a WAL (write ahead log) is an immutable event system, where the events are the DB primitives (insert, update, delete...).
Agreed, I just wish that, apart from the WAL, they also had versioning as a first-class feature, and that their update API required clients to pass the version they have "last seen" to prevent inconsistencies.
On most SQL databases, you can put CHECK constraints on columns so that the database rejects events. But this is controversial, as people don't like putting logic on the DB.
CosmosDB has etags on every document
DBs only work because the events are artificial and nobody cares about what's written in them.
And DBs are not really CQRS because the events are artificial and don't have business data that people are interested in keeping.
The big caveat here is GDPR and other privacy laws. In some cases you need the ability to scrub the event store completely of PII for legal reasons, even if only to null the relevant fields in the events.
Without preemptive defensive coding in your aggregates (whatever you call them) this can quickly blow up in your face.
What I have read about it is: encrypt PII with a per-client key and do not post the key to the event system. When an erasure request comes in, delete the corresponding key. The data can then no longer be decrypted for that client.
That's what I said too, and the answer was "No, just because it cannot be decrypted today does not mean it cannot be decrypted in the future. The data must be deleted"
That is some serious architecture to put in place before you can even start using event sourcing.
For finance, recordkeeping requirements take precedence over privacy requirements. Audit trail data must be on WORM storage and must not be scrubbable.
The argument I've always heard for this was issues with code, not the event. If for a period of time you have a bug in your code, with event sourcing, you can fix the bug and replay all the events to correct current projections of state.
What if your correction renders subsequent events nonsensical?
Instead of modifying the original (and incorrect) event, you can add a manual correction event with the info of who did it and why, and replay the events. This is how we dealt with such corrections with event sourcing.
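Roughly what that looks like, as a sketch rather than anyone's production code. The `Event` fields and the `replay` function are assumptions made for the example; the correction is a signed amount carrying who made it and why.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    type: str          # "deposit", "withdraw" or "correction"
    amount: int        # corrections are signed, the others are positive
    author: str = ""   # who posted a manual correction
    reason: str = ""   # why it was posted


def replay(events):
    """Fold the full log into a balance; the log itself is never edited."""
    balance = 0
    for event in events:
        if event.type == "deposit":
            balance += event.amount
        elif event.type == "withdraw":
            balance -= event.amount
        elif event.type == "correction":
            balance += event.amount
        else:
            raise ValueError(f"unknown event type: {event.type}")
    return balance


log = [
    Event("deposit", 100),
    Event("deposit", 500),   # keyed in wrong, should have been 50
    Event("withdraw", 20),
]
before = replay(log)

# Append a correction instead of rewriting the second event.
log.append(Event("correction", -450, author="ops-user", reason="deposit was 50, not 500"))
after = replay(log)
```

The wrong deposit stays in the log, so an auditor can see both the mistake and the fix.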
But you don't need to replay in that case. You just fire the correction event and the rest is taken care of.
It's poorly phrased but I'm not sure they meant "mutate the past". The keyword is "adjust" which could mean "append a correction".
But then you wouldn't need a replay. So the author really means mutate the past.
Semi related question: does anyone have experience introducing proper financial data handling (ledgers or other alternatives) in a fintech _after the fact_ ?
As in, fixing things during a scaleup phase when business has been working for a while and the original improvised systems are breaking, but you can’t stop business to repair.
Currently undergoing a similar project and would really appreciate any resource thrown my way, both purely technical and/or for interfacing with accounting people with no hybrid roles to bridge the domain gap.
They already had PostgreSQL (with its strong ACID guarantees) in place, yet the design introduces eventual consistency via MongoDB for reads—without a compelling justification. A DBA could have optimized those PostgreSQL queries to single-digit milliseconds, avoiding the added sync overhead entirely. Instead, it feels like unnecessary complexity was layered onto a proven double-entry ledger approach.
Not sure why there is so much hate on this thread. I found the post well written, insightful, and pragmatic.
Having built systems that process billions of events and displayed results, triggered notifications, etc in real time (not RTOS level, I'm talking 1 or 2 seconds of latency) you absolutely need to separate reads and writes. And if you can trust db replication to be fast and reliable, you can indeed skip distributed locks and stay on the right side of the CAP theorem.
Event sourcing is how every write ahead log works. Which powers basically every db.
Is the concern on this thread that they preoptimized? I thought they walked through their decision making process pretty clearly.
I suspect there is a bit of knee-jerk because so often this pattern is misapplied. I actually quite like the example in the article although I'm basically allergic to CQRS in general.
I think your point about write-ahead logging etc is a good one. If you need a decent transactional system, you're probably using a system with some kind of WAL. If you're event sourcing and putting events into something which already implements a WAL, you need to give your head a wobble - why is the same thing being implemented twice? There can be great reasons, but I've seen (a few times) people using a perfectly fine transactional DB of some kind to implement an event store, effectively throwing away all the guarantees of the system underneath.
For sure. Event logs in transactional DBs are weird. I was surprised that they weren't using something like Kafka for this.
> their MVP was not auditable and thus not compliant with financial regulations and also not scalable (high usage and fault tolerance).
There it is. My automatic response to any questions about event sourcing is “if you have to ask, you don’t need it.” This is one of those situations where the explosion in complexity somewhat makes sense: when you need legally enforced auditability.
Event sourcing is a really cool architecture that makes theoretical sense but the yak shaving needed to implement it is at least an order of magnitude more than any other design.
I'd call it a trade-off. I've done event sourcing for systems that didn't really need it, and the ability to fix things retroactively by just replaying the log through the corrected pipeline of processors was marvelous. It lends itself to functional programming with effects instead of banging on global state wherever and whenever, and that pays dividends. But there's no free lunch, and pipelines could get complex by needing to track "epochs" and keeping bug-compatible behavior for old messages that would be patched downstream. Since the system just built persisted state at the end, I could checkpoint it at a new state and retire the older epochs. Doubtful that would pass a rigorous audit, but if that's not one of your requirements, you really can do event sourcing halfway and still reap a lot of the benefits.
Yes. And that's usually anything that touches money or is limited by it.
But you don't need to decide to use it. The people describing the requirements will tell you, insist on it, and threaten you if you don't do it.
Yea I think that's a fair take.
If you peer underneath the covers of a lot of financial stuff, it's effectively double-entry accounting, which is a giant ledger (or ledgers) of events.
This! I was recently hired to fix a company that tried to build their own niche ERP over the last 10 years, and they had totally drowned in the Confluent kool-aid. There are very, very few projects where event sourcing is the best solution.
The message on his homepage doesn't make sense, right? It should say "IT industry"?
> I am a Software Architect, Ex-Founder & AI enthusiast with over 8 years in the IT.
They appear to be ESL and if in Germany they use "IT" the same way we do in France, I can understand why they skipped the "industry" part.
> INSERT INTO events (account_id, type, amount, timestamp)
> VALUES (123, 'deposit', 100, NOW())
How would it work if they had to support intra-system transfers? So one user's balance should be debited and another should get a deposit? That's not possible to do atomically with event sourcing, right?
Any problem with event sourcing can be solved with more events (said semi-sarcastically).
In this case it’s XTransactionStarted, XTransactionDepositConfirmed, and XTransactionCreditConfirmed or something along those lines. External interactions tend to follow that kind of pattern where it tracks success/failure in the domain events.
The command side of CQRS tends to be the services that guarantee ordered events either via the backing database or with master-slave topology.
Do you mean inter-system? Intra-system would mean on the same db, so a simple transaction would do.
For inter-system consistency, you’d probably need a reconciliation mechanism or some kind of 2 phase commit
I mean a bank account sending money to another bank account. That would be 2 events (one withdrawal and one deposit)?
But if I'm a downstream consumer reading the event log and computing the state from it, and for some reason I receive only the first event, wouldn't the computed state be invalid and not represent the real state of the accounts?
Nobody relies on atomic transactions to model money transfers in the real world. You can read up on "clearing" and "settlement" processes to get an idea.
It could be one event with two database rows inserted, CQRS doesn’t have to map 1:1 to db entries. But in general, these kinds of systems are more complex than 1 or 2 writes, so relying purely on transactions isn’t very common
Create an event type for transfer with amount, source account and destination account.
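Something like the following, as a sketch only. The `Transfer` type and `project_balances` are hypothetical names, and amounts are plain integers. Because both sides of the transfer travel in one event, a consumer can never see the debit without the credit.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    source: str
    destination: str
    amount: int


def project_balances(events, opening=None):
    """Apply each transfer to both accounts in a single step."""
    balances = defaultdict(int, opening or {})
    for event in events:
        balances[event.source] -= event.amount
        balances[event.destination] += event.amount
    return dict(balances)


opening = {"alice": 100, "bob": 0}
events = [Transfer("alice", "bob", 30), Transfer("bob", "alice", 5)]
balances = project_balances(events, opening)
print(balances)
```

The total across accounts is unchanged by construction, since every event subtracts and adds the same amount.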
Right, but then your downstream consumers have to be global and can't be partitioned based on account_id or some other partition key?
Consumers will have to read all events that concern them, yes. Maybe this type of event must end up in multiple partitions and will require some architectural shenanigans.
The answer to Event Sourcing and CQRS is no.
The answer is rarely so clear-cut. This is a natural extension of the bank system of having both account balance and account transactions. How would you feel if the bank only knew your account balance but not your transactions? It has its uses. It's just a bit overused in places where it's unnecessary.
Fair. I was being flippant.
Traumatic flashbacks to 2017. Glad we moved on from this nonsense. Still dealing with the wreckage, though.
Event sourcing is a terrible idea that may be useful for some incredibly niche scenario.
Data sync is one scenario where it can be useful.
What did you find to be the most problematic?
It kinda sounds like all you needed was a ledger; otherwise I don't get why you would use CQRS.
For me the most important ideas are an immutable ledger and isolating the primary OLTP database from the secondary support services. Reporting, user management, notifications, etc. can be satisfied with a stale copy of the transactional data.
I feel for these guys. The software downturn is steepening. A huge glut of talent with unsustainable comps. And years of toil doing stuff that, in the absence of being paid money for it, nobody would do and no one would care.