Event Sourcing, CQRS and Micro Services: Real FinTech Example

85 points, posted 4 months ago

(lukasniessen.medium.com)

69 Comments

buster

4 months ago

Event Sourcing seems like massive overkill for the stated problem. The core requirement is simple: "show account balance at any point in time" for regulatory compliance.

What specific audit requirements existed beyond point-in-time balance queries? The author dismisses alternatives as "less business-focused" but doesn't justify why temporal tables or structured audit logs couldn't satisfy the actual compliance need.

The performance issues were predictable: 2-5 seconds for balance calculations, requiring complex snapshot strategies to get down to 50-200ms. This entire complexity could have been avoided with a traditional audit trail approach.

The business context analogy to accounting ledgers is telling - but accounting systems don't replay every transaction to calculate current balances. They use running totals with audit trails, which is exactly what temporal tables provide.

Event Sourcing is elegant from a technical perspective, but here it's solving a problem that simpler, proven approaches handle just fine. The regulatory requirement was for historical balance visibility, not event replay capabilities.
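
A minimal sketch of the running-total-plus-audit-trail idea, using Python's built-in sqlite3. The table and column names (`accounts`, `balance_audit`, `balance_after`, `ts`) are made up for illustration and are not from the article, and `ts` is a simple logical timestamp rather than a real clock value. The point is only that a point-in-time balance becomes a single indexed lookup instead of a replay.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (
        account_id INTEGER PRIMARY KEY,
        balance    INTEGER NOT NULL
    );
    CREATE TABLE balance_audit (
        account_id    INTEGER NOT NULL,
        delta         INTEGER NOT NULL,
        balance_after INTEGER NOT NULL,
        ts            INTEGER NOT NULL   -- hypothetical logical timestamp
    );
    INSERT INTO accounts VALUES (123, 0);
""")

def apply_delta(account_id, delta, ts):
    """Update the running total and append the audit row in one transaction."""
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE account_id = ?",
            (delta, account_id),
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE account_id = ?", (account_id,)
        ).fetchone()
        conn.execute(
            "INSERT INTO balance_audit VALUES (?, ?, ?, ?)",
            (account_id, delta, balance, ts),
        )

def balance_at(account_id, ts):
    """Point-in-time balance: the last audit row at or before ts, no replay."""
    row = conn.execute(
        "SELECT balance_after FROM balance_audit "
        "WHERE account_id = ? AND ts <= ? "
        "ORDER BY ts DESC, rowid DESC LIMIT 1",
        (account_id, ts),
    ).fetchone()
    return row[0] if row else 0

apply_delta(123, 100, ts=1)
apply_delta(123, -30, ts=5)
apply_delta(123, 50, ts=9)
```

Here `balance_at(123, 6)` reads one row from the audit table; whether that satisfies a given regulator is a separate question the sketch does not answer.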

ealexhudson

4 months ago

I think they needed to be clearer about what the actual requirement was.

If the requirement is, "Show the balance _as it was_ at that point in time", this system doesn't fulfil it. They even say so in the article: if something is wrong, throw away the state and re-run the events. That's necessarily different behaviour. To do this requirement, you actually have to audit every enquiry and say what you thought the result was, including the various errors/miscalculations.

If the requirement is, "Show the balance as it should have been at that point in time", then it's fine.

ricardobeat

4 months ago

> They use running totals with audit trails, which is exactly what temporal tables provide.

In the author's case, they separate writes and reads into different DBs. The read-optimized DB has aggregated balances stored, not events. This is not materially different, and the trade-offs regarding staleness of data will be mostly the same.

alecco

4 months ago

Not only overkill, but error-prone. I had to suffer through working on a massive financial system based on serialized Python objects. And it was expensive as hell.

nateroling

4 months ago

This made me do a double-take. Surely you would never do this, right? It seems to be directly counter to the idea of being able to audit changes:

“Event replay: if we want to adjust a past event, for example because it was incorrect, we can just do that and rebuild the app state.”

manoDev

4 months ago

No, that definitely happens.

There are two kinds of adjustments: an adjustment transaction (one-off), or re-interpreting what happened (systemic). The event sourcing pattern is useful in both situations.

Sometimes you need to replay events to have a correct report because your interpretation at the time was incorrect or it needs to change for whatever reason (external).

Auditing isn't about not changing anything, but being able to trace back and explain how you arrived at the result. You can have as many "versions" as you want of the final state, though.

refset

4 months ago

Aka 'bitemporal' - https://tidyfirst.substack.com/p/eventual-business-consisten...

r1cka

4 months ago

The argument I've always heard for this was issues with code, not the event. If for a period of time you have a bug in your code, with event sourcing, you can fix the bug and replay all the events to correct current projections of state.
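
A small sketch of that argument, assuming a hypothetical projection bug where withdrawals were added instead of subtracted. The `Event` class and both projection functions are illustrative names, not taken from the article. The stored events never change; only the code that interprets them does.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    account_id: int
    type: str      # "deposit" or "withdraw"
    amount: int    # minor units, e.g. cents

# The event log is the source of truth and is never edited.
EVENTS = [
    Event(123, "deposit", 100),
    Event(123, "withdraw", 30),
    Event(123, "deposit", 5),
]

def project_buggy(events):
    """Hypothetical buggy projection: treats every event as a deposit."""
    balances = {}
    for e in events:
        balances[e.account_id] = balances.get(e.account_id, 0) + e.amount
    return balances

def project_fixed(events):
    """Corrected projection: withdrawals reduce the balance."""
    balances = {}
    for e in events:
        sign = 1 if e.type == "deposit" else -1
        balances[e.account_id] = balances.get(e.account_id, 0) + sign * e.amount
    return balances

stale = project_buggy(EVENTS)    # what the read model showed during the bug
rebuilt = project_fixed(EVENTS)  # the same events replayed through fixed code
```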

mirekrusin

4 months ago

What if your correction renders subsequent events nonsensical?

mrkeen

4 months ago

There is a very real chance of this happening, and two choices.

One - bake whatever happens into your system permanently, like 99% of all apps, and disallow corrections.

Two - keep the events around so that you can check and re-check your corrections before you check in new code or data.

zsoltkacsandi

4 months ago

Instead of modifying the original (and incorrect) event, you can add a manual correction event with the info of who did it and why, and replay the events. This is how we dealt with such corrections with event sourcing.
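
Roughly, such a correction could look like the sketch below. The field names (`corrected_by`, `reason`) and the helper functions are assumptions for illustration, not the schema from the article. The wrong event stays in the log, so the balance "as it was recorded" remains reproducible.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Event:
    seq: int
    account_id: int
    type: str                 # "deposit", "withdraw" or "correction"
    amount: int               # signed for corrections
    corrected_by: Optional[str] = None
    reason: Optional[str] = None

log = [
    Event(1, 123, "deposit", 100),
    Event(2, 123, "deposit", 500),   # keyed incorrectly: should have been 50
]

def append_correction(log, account_id, delta, who, why):
    """Append a correction event; earlier events are never modified."""
    event = Event(len(log) + 1, account_id, "correction", delta, who, why)
    log.append(event)
    return event

def balance(log, account_id, up_to_seq=None):
    """Fold the log into a balance, optionally only up to a sequence number."""
    total = 0
    for e in log:
        if up_to_seq is not None and e.seq > up_to_seq:
            break
        if e.account_id != account_id:
            continue
        total += -e.amount if e.type == "withdraw" else e.amount
    return total

append_correction(log, 123, -450, who="ops-alice",
                  why="deposit #2 keyed as 500, should be 50")
```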

kabes

4 months ago

But you don't need to replay in that case. You just fire the correction event and the rest is taken care of.

mrkeen

4 months ago

It would be outside of the normal exceptional cases, yes.

Like buggy data that crashes the system.

If you have the old events there, you can "measure twice, cut once", in the sense that you can keep re-running your old events and compare them to the new events under unit-test conditions, and be absolutely sure that your history re-writing won't break anything else.

It's not for just doing a refund or something.

javcasas

4 months ago

Yeah, that's a big NO. Events are immutable. If an event is wrong, you post an event with an amendment. Then yes, rebuild the app state.

saxenaabhi

4 months ago

Not speaking about their case, but I think in some cases a "versioned mutable data store" with an event log that lists updates/inserts makes more sense than an "immutable event log" one like kafka.

Consider the update_order_item_quantity event in a classic event-sourced system. It's not possible to guarantee that two waiters dispatching two such events at the same time, when the current quantity is 1, would not cause the quantity to become negative/invalid.

If the data store allowed for mutability and produced an event log it's easy:

Instead of dispatching update_order_item_quantity, you would update the order document, specifying the current version. In the previous example, the second request would fail since it specified a stale version_id. And you get the auditability benefits of a classic event sourcing system as well, because you have versions and an event log.
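
A sketch of that version check, using an in-memory stand-in rather than any real database client. `VersionedStore`, `put`, and `expected_version` are hypothetical names; CouchDB's own mechanism is the document `_rev` field, which this only imitates in spirit.

```python
class StaleVersionError(Exception):
    """Raised when a writer supplies a version that is no longer current."""

class VersionedStore:
    """In-memory stand-in for a versioned document store (hypothetical API)."""

    def __init__(self):
        self._docs = {}       # doc_id -> (version, document)
        self.event_log = []   # append-only record of accepted updates

    def get(self, doc_id):
        version, document = self._docs[doc_id]
        return version, dict(document)

    def put(self, doc_id, document, expected_version):
        current_version, _ = self._docs.get(doc_id, (0, None))
        if expected_version != current_version:
            raise StaleVersionError(
                f"expected {expected_version}, store has {current_version}"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, dict(document))
        self.event_log.append((doc_id, new_version, dict(document)))
        return new_version

store = VersionedStore()
store.put("order-1", {"item": "soup", "quantity": 1}, expected_version=0)

# Two waiters read the same version and both try to decrement the quantity.
v_a, doc_a = store.get("order-1")
v_b, doc_b = store.get("order-1")
doc_a["quantity"] -= 1
doc_b["quantity"] -= 1

store.put("order-1", doc_a, expected_version=v_a)      # accepted
try:
    store.put("order-1", doc_b, expected_version=v_b)  # stale version
    second_write_rejected = False
except StaleVersionError:
    second_write_rejected = True
```

The second waiter has to re-read the document and decide again, which is where the invalid quantity gets caught.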

This kind of architecture is trivial to implement with CouchDB and easier to maintain than kafka. Pity it's impossible to find managed hosting for CouchDB outside of IBM.

javcasas

4 months ago

Any modern DB with a WAL (write ahead log) is an immutable event system, where the events are the DB primitives (insert, update, delete...).

When you construct your own event system you are constructing a DB with your own primitives (deposit, withdraw, transfer, apply monthly interest...).

You have to figure out your transaction semantics. For example, how to reject invalid events.
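
As a rough sketch of what "your own primitives" plus rejection might look like: the `Ledger` class and its rules below are assumptions for illustration, and a real system would also need concurrency control and durable storage. An event is validated against the current state first and only appended if it passes.

```python
class Rejected(Exception):
    """Raised when an event fails validation and is not appended."""

class Ledger:
    def __init__(self):
        self.events = []     # accepted events only, append-only
        self.balances = {}   # projection kept in step with the log

    def submit(self, kind, account_id, amount):
        if kind not in ("deposit", "withdraw"):
            raise Rejected(f"unknown event type: {kind}")
        if amount <= 0:
            raise Rejected("amount must be positive")
        current = self.balances.get(account_id, 0)
        if kind == "withdraw" and amount > current:
            raise Rejected("insufficient funds")
        # Validation passed: record the event, then update the projection.
        self.events.append((kind, account_id, amount))
        delta = amount if kind == "deposit" else -amount
        self.balances[account_id] = current + delta

ledger = Ledger()
ledger.submit("deposit", 123, 100)
ledger.submit("withdraw", 123, 30)
try:
    ledger.submit("withdraw", 123, 500)   # would overdraw the account
    overdraft_rejected = False
except Rejected:
    overdraft_rejected = True
```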

saxenaabhi

4 months ago

> Any modern DB with a WAL (write ahead log) is an immutable event system, where the events are the DB primitives (insert, update, delete...).

Agreed, I just wish apart from WAL they also had versioning as first class and their update api required clients to pass the version they have "last seen" to prevent inconsistencies.

javcasas

4 months ago

On most SQL databases, you can put CHECK constraints on columns so that the database rejects events. But this is controversial, as people don't like putting logic on the DB.
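
For example, with SQLite through Python's sqlite3 module (the `accounts` table and `withdraw` helper are made up for this sketch), a CHECK constraint makes the database itself refuse an overdraft:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        account_id INTEGER PRIMARY KEY,
        balance    INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
conn.execute("INSERT INTO accounts VALUES (123, 100)")
conn.commit()

def withdraw(account_id, amount):
    """Return True if applied, False if the CHECK constraint rejected it."""
    try:
        with conn:  # rolls back automatically if the statement fails
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE account_id = ?",
                (amount, account_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False

first = withdraw(123, 80)    # leaves 20
second = withdraw(123, 80)   # would leave -60, so the database refuses
(final_balance,) = conn.execute(
    "SELECT balance FROM accounts WHERE account_id = 123"
).fetchone()
```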

serbrech

4 months ago

CosmosDB has etags on every document

marcosdumay

4 months ago

DBs only work because the events are artificial and nobody cares about what's written in them.

And DBs are not really CQRS because the events are artificial and don't have business data that people are interested in keeping.

throwup238

4 months ago

The big caveat here is GDPR and other privacy laws. In some cases you need the ability to scrub the event store completely of PII for legal reasons, even if only to null the relevant fields in the events.

Without preemptive defensive coding in your aggregates (whatever you call them) this can quickly blow up in your face.

javcasas

4 months ago

What I have read about it is: encrypt PII with a client-specific key, do not post the key to the event system. When an erasure request comes in, delete the corresponding key. Now the data cannot be decrypted anymore for that client.
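
A toy sketch of that idea (sometimes called crypto-shredding). Everything here is an assumption for illustration: the function names, the in-memory `key_store`, and especially the cipher, which is a hash-based XOR stream written only so the example runs with the standard library. A real system would use a vetted authenticated cipher from a proper cryptography library.

```python
import hashlib
import secrets

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream from SHA-256 in counter mode. NOT for production use."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def toy_encrypt(key: bytes, plaintext: str) -> bytes:
    data = plaintext.encode("utf-8")
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(data))
    return nonce + bytes(a ^ b for a, b in zip(data, stream))

def toy_decrypt(key: bytes, blob: bytes) -> str:
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream)).decode("utf-8")

key_store = {}     # client_id -> key; kept OUTSIDE the event store
event_store = []   # append-only; events are never edited or deleted

def record_event(client_id, event_type, pii):
    key = key_store.setdefault(client_id, secrets.token_bytes(32))
    event_store.append(
        {"client_id": client_id, "type": event_type, "pii": toy_encrypt(key, pii)}
    )

def read_pii(event):
    """Return the decrypted PII, or None if the client's key was erased."""
    key = key_store.get(event["client_id"])
    return None if key is None else toy_decrypt(key, event["pii"])

def erase_client(client_id):
    """Handle an erasure request by deleting only the key."""
    key_store.pop(client_id, None)

record_event("c1", "account_opened", "Jane Doe, 1 Main St")
record_event("c2", "account_opened", "John Roe, 2 Side St")
before = read_pii(event_store[0])
erase_client("c1")
after = read_pii(event_store[0])
```

The ciphertext for `c1` is still in the event store afterwards, so whether this counts as deletion is a legal question the code cannot settle.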

Netcob

4 months ago

That's what I said too, and the answer was "No, just because it cannot be decrypted today does not mean it cannot be decrypted in the future. The data must be deleted"

mattmanser

4 months ago

That is some serious architecture to put in place before you can even start using event sourcing.

ndriscoll

4 months ago

For finance, recordkeeping requirements take precedence over privacy requirements. Audit trail data must be on WORM storage and must not be scrubbable.

speed_spread

4 months ago

It's poorly phrased but I'm not sure they meant "mutate the past". The keyword is "adjust" which could mean "append a correction".

kabes

4 months ago

But then you wouldn't need a replay. So the author really means mutate the past.

ff4

4 months ago

They already had PostgreSQL (with its strong ACID guarantees) in place, yet the design introduces eventual consistency via MongoDB for reads—without a compelling justification. A DBA could have optimized those PostgreSQL queries to single-digit milliseconds, avoiding the added sync overhead entirely. Instead, it feels like unnecessary complexity was layered onto a proven double-entry ledger approach.

mrkeen

4 months ago

* Most single-db deployments give up on ACID for performance reasons (see READ_COMMITTED)

* Even if you have ACID, it's not sufficient for distributed systems. Its guarantees will keep one node consistent with itself. No transactionality between the customer's app and your db.

Maybe you're one bank with all the customers, but as soon as you want to talk to other banks, are you really going to share the one ACID instance? Who's the DBA?

Currently the state of fintech is 90% of devs being in denial that they're in a distributed system.

> proven double-entry ledger approach.

Yes. In that language, the ledger is the list of events. If today's devs were around 300 years ago they'd be calling for 'balances' instead of 'ledgers' because they're simpler.

antonvs

4 months ago

> proven double-entry ledger approach.

A double-entry ledger is a combination of a process and a view that was mistaken for a data model centuries ago, and that mistake became embedded.

Fundamentally, you’re dealing with a sequence of events. The double-entry ledger is a particular result of processing those events - a view. There are many other useful views.

This is well understood in academic accounting. See e.g. https://en.wikipedia.org/wiki/Resources,_Events,_Agents for an alternative system that doesn’t make the same mistake.

coryvirok

4 months ago

Not sure why there is so much hate on this thread. I found the post well written, insightful, and pragmatic.

Having built systems that process billions of events and displayed results, triggered notifications, etc in real time (not RTOS level, I'm talking 1 or 2 seconds of latency) you absolutely need to separate reads and writes. And if you can trust db replication to be fast and reliable, you can indeed skip distributed locks and stay on the right side of the CAP theorem.

Event sourcing is how every write ahead log works. Which powers basically every db.

Is the concern on this thread that they preoptimized? I thought they walked through their decision making process pretty clearly.

ealexhudson

4 months ago

I suspect there is a bit of knee-jerk because so often this pattern is misapplied. I actually quite like the example in the article although I'm basically allergic to CQRS in general.

I think your point about write-ahead logging etc is a good one. If you need a decent transactional system, you're probably using a system with some kind of WAL. If you're event sourcing and putting events into something which already implements a WAL, you need to give your head a wobble - why is the same thing being implemented twice? There can be great reasons, but I've seen (a few times) people using a perfectly fine transactional DB of some kind to implement an event store, effectively throwing away all the guarantees of the system underneath.

coryvirok

4 months ago

For sure. Event logs in a transactional DB are weird. I was surprised that they weren't using something like kafka for this.

mrkeen

4 months ago

> Not sure why there is so much hate on this thread.

1) "Kafka is resume-driven-development" is a meme.

2) Devs are in denial about being in a distributed system, and think that single-threaded thinking (in proximity to a DB that calls itself ACID) leads to correct results in a distributed setting.

kace91

4 months ago

Semi related question: does anyone have experience introducing proper financial data handling (ledgers or other alternatives) in a fintech _after the fact_ ?

As in, fixing things during a scaleup phase when business has been working for a while and the original improvised systems are breaking, but you can’t stop business to repair.

Currently undergoing a similar project and would really appreciate any resource thrown my way, both purely technical and/or for interfacing with accounting people with no hybrid roles to bridge the domain gap.

serguzest

4 months ago

We are in a similar situation, and here's what I did:

1) Learned the basic concepts of double-entry bookkeeping.
2) Told ChatGPT about my business domain and requested an example Chart of Accounts (CoA) tailored to it.

Feel free to reach out to me, I’d love to exchange ideas.

saxenaabhi

4 months ago

Isn't it simple? Create events from past relational data and add them to the log?

kace91

4 months ago

Well, yes and no. We’ve got engineers on one side who store mutable relational data (orders, purchases, subscriptions, what have you) and on the other side accountants thinking in terms of accounts (we have such and such in account 705xx).

Mapping the two domains is the main issue, along with how much the new system should reflect the accounting movement of money vs the current engineering model, or something completely different in between.

mrngm

4 months ago

I'm not sure why orders/purchases/subscriptions would be seen as mutable. At some point in time, customer X decided to exchange money amount M for a subscription Y. This means the company is obliged to deliver subscription Y, because the customer engaged in a contract with the company, moved amount M to the company, and expecting to (regularly) receive Y. At renewal, the customer, again, needs to pay amount M in order to continue receiving Y.

The only mutable thing here would be the end date of said subscription, at which point the company no longer requires amount M from the customer, and the customer no longer receives Y.

Then on the accounting side, every time subscription Y renews, said customer in account 750xx needs to have its balance lowered by amount M, only to get increased again when they pay.

The only way to bridge this gap is to have the engineers know what accounting needs, and let them build the right infrastructure. In this [2018] [video] https://www.youtube.com/watch?v=KH0l8QqhzYk I recently watched, the speaker Rahul Pilani explains how Netflix organised their billing systems, and how all parts fit together. I'm not saying you should copy their infrastructure, but it doesn't hurt to look at a higher level how the business operates and what their accounting requirements are.

kace91

4 months ago

>I'm not sure why orders/purchases/subscriptions would be seen as mutable. At some point in time, customer X decided to exchange money amount M for a subscription Y.

Think, for example, about orders where a customer bought three items and later cancelled one: the order value mutates as it is updated, and at most we have a copy of the previous order state before the price was updated (for some cases not even that).

If you think that's not a good model for financial processes, well, so do I; it's the legacy we're supposed to manage. Moving from that type of non-ideal system to something more solid is what I'm researching.

mrngm

4 months ago

I suppose that's a flexible way of looking at orders, but it sounds more like a shopping basket. What if the customer ordered three items, you started shipping those, then the customer cancels one item. That sounds like the wrong order (hah) of things.

mrkeen

4 months ago

Your engineers are wrong. You're not going to be able to bridge that gap.

Here's a scenario: you've partnered with a credit card provider. They charge some money each month per card, and you pass that onto your customers who use the cards.

One day the partner sends you a 'card-cancelled' message. Have you built your system to accept that message unconditionally? Or did your engineers put in defensive code ("fail fast", assertions, status checks, db constraints) so that your system can reject that message?

Because that's how we've built our system at work. Our engineers are proud of something called "data integrity" that our almost-ACID (READ_COMMITTED) DB supposedly has. We won't move to events (listening to what actually happened) because we'd be giving up on pretending that our DB guarantees somehow correspond to what's going on in the real world.

throwup238

4 months ago

> their MVP was not auditable and thus not compliant with financial regulations and also not scalable (high usage and fault tolerance).

There it is. My automatic response to any questions about event sourcing is “if you have to ask, you don’t need it.” This is one of those situations where the explosion in complexity somewhat makes sense: when you need legally enforced auditability.

Event sourcing is a really cool architecture that makes theoretical sense but the yak shaving needed to implement it is at least an order of magnitude more than any other design.

chuckadams

4 months ago

I'd call it a trade-off. I've done event sourcing for systems that didn't really need it, and the ability to fix things retroactively by just replaying the log through the corrected pipeline of processors was marvelous. It lends itself to functional programming with effects instead of banging on global state wherever and whenever, and that pays dividends. But there's no free lunch, and pipelines could get complex by needing to track "epochs" and keeping bug-compatible behavior for old messages that would be patched downstream. Since the system just built persisted state at the end, I could checkpoint it at a new state and retire the older epochs. Doubtful that would pass a rigorous audit, but if that's not one of your requirements, you really can do event sourcing halfway and still reap a lot of the benefits.

marcosdumay

4 months ago

Yes. And that's usually anything that touches money or is limited by it.

But you don't need to decide to use it. The people describing the requirements will tell you, insist on it, and threaten you if you don't do it.

dmoy

4 months ago

Yea I think that's a fair take.

If you peer underneath the covers of a lot of financial stuff, it's effectively double-entry accounting, which is a giant ledger (or ledgers) of events.

kabes

4 months ago

This! I was recently hired to fix a company that tried to build their own niche ERP over the last 10 years, and they had totally drowned in the Confluent Kool-Aid. There are very, very few projects where event sourcing is the best solution.

geoffbp

4 months ago

The message on his homepage doesn’t make sense right - it should say IT industry?

> I am a Software Architect, Ex-Founder & AI enthusiast with over 8 years in the IT.

wiether

4 months ago

They appear to be ESL and if in Germany they use "IT" the same way we do in France, I can understand why they skipped the "industry" part.

risyachka

4 months ago

It kinda sounds like all you needed was a ledger; otherwise I didn't get why you would use CQRS.

rawgabbit

4 months ago

For me the most important ideas are an immutable ledger and isolating the primary OLTP database from the secondary support services. Reporting, user management, notifications, etc. can be satisfied with a stale copy of the transactional data.

saxenaabhi

4 months ago

> INSERT INTO events (account_id, type, amount, timestamp) VALUES (123, 'deposit', 100, NOW())

How would it work if they had to support intra system transfers? So one user balance should be withdrawn and another should get a deposit? That's not possible to do atomically with event sourcing right?

throwup238

4 months ago

Any problem with event sourcing can be solved with more events (said semi-sarcastically).

In this case it’s XTransactionStarted, XTransactionDepositConfirmed, and XTransactionCreditConfirmed or something along those lines. External interactions tend to follow that kind of pattern where it tracks success/failure in the domain events.
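
A minimal sketch of how a consumer might fold those events into a transfer status. The event names are the ones suggested above; the status strings and the `transfer_status` function are made up for illustration. A consumer that has only seen the first event reports the transfer as pending rather than treating it as done.

```python
REQUIRED_LEGS = {"XTransactionDepositConfirmed", "XTransactionCreditConfirmed"}

def transfer_status(event_names):
    """Fold the events of ONE transfer, in order, into its current status."""
    started = False
    confirmed = set()
    for name in event_names:
        if name == "XTransactionStarted":
            started = True
        elif name in REQUIRED_LEGS:
            if not started:
                # A confirmation with no preceding start is out of protocol.
                return "invalid"
            confirmed.add(name)
    if not started:
        return "not_started"
    return "completed" if confirmed == REQUIRED_LEGS else "pending"
```

Only a `completed` transfer would be reflected in both account balances; anything `pending` is visible as in-flight and can be followed up on or reconciled.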

The command side of CQRS tends to be the services that guarantee ordered events either via the backing database or with master-slave topology.

mrkeen

4 months ago

You're getting hung up on the "double" entries. That's mostly for reducing accidental and deliberate (fraud) calculation errors if you have humans crunching the numbers. The two entries can be in the same row if you like.

As for the inter-system transfers, it's not possible in the general case. You might not have a network cable between the two systems. And if you do, you run into CAP, two-generals, etc.

ACID is out of the question because the two parties don't share a DB. And if they did, some tech lead would turn down the acidity level for performance reasons.

The best you can do is intelligently interpret the information that has been given to you. That's all ledgers are: a list of things that some system knows. 'UPDATE CustomerBalance...' (CRUD) is not a fact, but 'Customer paid...' is a fact, and that's all event-sourcing is.

herval

4 months ago

Do you mean inter-system? Intra-system would mean on the same db, so a simple transaction would do.

For inter-system consistency, you’d probably need a reconciliation mechanism or some kind of 2 phase commit

saxenaabhi

4 months ago

I mean a bank account sending money to another bank account. That would be 2 events (one withdraw and one deposit)?

But if I'm a downstream consumer consuming the event log and computing the state from that, and for some reason I receive only the first event, the computed state would be invalid and not represent the real state of the accounts?

manoDev

4 months ago

Nobody relies on atomic transactions to model money transfers in the real world. You can read up on "clearing" and "settlement" processes to get an idea.

herval

4 months ago

It could be one event with two database rows inserted, CQRS doesn’t have to map 1:1 to db entries. But in general, these kinds of systems are more complex than 1 or 2 writes, so relying purely on transactions isn’t very common
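
For the same-database case, a sketch with sqlite3 (the `entries` table and `record_transfer` helper are assumptions, not the article's schema): one logical transfer writes two rows inside a single transaction, so either both rows exist or neither does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE entries (
        transfer_id TEXT    NOT NULL,
        account_id  INTEGER NOT NULL,
        amount      INTEGER NOT NULL   -- negative = debit, positive = credit
    )
""")

def record_transfer(transfer_id, source, destination, amount):
    """Write both legs of one transfer atomically."""
    with conn:  # commit if both inserts succeed, roll back if either fails
        conn.execute(
            "INSERT INTO entries VALUES (?, ?, ?)", (transfer_id, source, -amount)
        )
        conn.execute(
            "INSERT INTO entries VALUES (?, ?, ?)", (transfer_id, destination, amount)
        )

record_transfer("t1", 1, 2, 40)

try:
    # The second row violates NOT NULL, so the first row is rolled back too.
    record_transfer("t2", 1, None, 10)
except sqlite3.IntegrityError:
    pass

rows = conn.execute(
    "SELECT transfer_id, account_id, amount FROM entries ORDER BY rowid"
).fetchall()
```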

javcasas

4 months ago

Create an event type for transfer with amount, source account and destination account.
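
Something like the sketch below, where `Transfer`, `Deposit`, and `project` are illustrative names. Because both sides live in one event, a consumer can never see the withdrawal without the matching deposit.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Deposit:
    account_id: int
    amount: int

@dataclass(frozen=True)
class Transfer:
    source: int
    destination: int
    amount: int

def project(events):
    """Fold events into balances; a Transfer updates both accounts at once."""
    balances = {}
    for e in events:
        if isinstance(e, Deposit):
            balances[e.account_id] = balances.get(e.account_id, 0) + e.amount
        elif isinstance(e, Transfer):
            balances[e.source] = balances.get(e.source, 0) - e.amount
            balances[e.destination] = balances.get(e.destination, 0) + e.amount
    return balances

events = [Deposit(1, 100), Transfer(source=1, destination=2, amount=40)]
balances = project(events)
```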

saxenaabhi

4 months ago

Right, but then your downstream consumers have to be global and can't be partitioned based on account_id or some other partition key?

javcasas

4 months ago

Consumers will have to read all events that concern them, yes. Maybe this type of event must end up in multiple partitions and will require some architectural shenanigans.
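
One way to picture it, as a sketch only: the modulo partitioner, the dict-shaped events, and `NUM_PARTITIONS` are all assumptions, and real brokers have their own partitioning rules. A transfer is routed to the partition of every account it touches.

```python
NUM_PARTITIONS = 4

def partition_for(account_id: int) -> int:
    """Hypothetical partitioner: accounts are spread by simple modulo."""
    return account_id % NUM_PARTITIONS

def route(event: dict) -> list:
    """Return the sorted list of partitions that must receive this event."""
    if event["type"] == "transfer":
        accounts = [event["source"], event["destination"]]
    else:
        accounts = [event["account_id"]]
    return sorted({partition_for(a) for a in accounts})

deposit_route = route({"type": "deposit", "account_id": 5, "amount": 10})
cross_route = route({"type": "transfer", "source": 1, "destination": 6, "amount": 40})
same_route = route({"type": "transfer", "source": 1, "destination": 5, "amount": 40})
```

Writing one event to two partitions is not atomic by itself, which is where the architectural shenanigans come in.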

doctorpangloss

4 months ago

I feel for these guys. The software downturn is steepening. A huge glut of talent with unsustainable comps. And years of toil doing stuff that, in the absence of being paid money for it, nobody would do and no one would care.

aaronrobinson

4 months ago

The answer to Event Sourcing and CQRS is no.

kccqzy

4 months ago

The answer is rarely so clear-cut. This is a natural extension of the bank system of having both account balance and account transactions. How would you feel if the bank only knew your account balance but not your transactions? It has its uses. It's just a bit overused in places where it's unnecessary.

aaronrobinson

4 months ago

Fair. I was being flippant.

sfjailbird

4 months ago

Traumatic flashbacks to 2017. Glad we moved on from this nonsense. Still dealing with the wreckage, though.

Event sourcing is a terrible idea that may be useful for some incredibly niche scenario.

zigzag312

4 months ago

Data sync is one scenario where it can be useful.

rawgabbit

4 months ago

What did you find to be the most problematic?