Kafka Is Costing You Years of Engineering Time

3 points, posted 15 hours ago
by galeaspablo

8 Comments

deniscoady

9 hours ago

Disclaimer: I work for Redpanda and formerly Cloudera.

I've worked with Apache Kafka at massive scale (50+ Gbps). It's a proper nightmare. When it breaks, it breaks fast and violently.

But the problem is that Apache Kafka (and more modern Kafka-compatible alternatives like Redpanda, obligatory mention) meets a need for a durable streaming log that other systems cannot. The access patterns, requirements, use cases, ecosystem, etc. are different from those of traditional databases and require a proper streaming solution.

Streaming from a traditional database is kind of a solved problem. Why not just use a managed Kafka provider with change data capture (CDC) capability if you don't want to deal with Kafka yourself? At least then you get to use all of the tools in the vibrant Kafka ecosystem.

galeaspablo

9 hours ago

Hey Denis, I haven’t run into you before. But hi, this is Luis, Ambar’s founder. Nice to meet a fellow data streamer.

When I started writing Ambar I thought streaming from a database was a solved problem. But in operational use cases, where developers need to be able to assume ordering and delivery guarantees, it isn’t. The first version of Ambar was just Debezium under the hood, but guess what, it failed and failed hard, much like you described Kafka failing. Hence we built Ambar :)

FYI we’ve considered using Redpanda under the hood instead of Kafka, but haven’t dared make the jump yet.

deniscoady

8 hours ago

Ah okay, so is Ambar more of a way to finally replace Debezium then?

taylodl

14 hours ago

Conspicuously missing from this article is any mention of an alternative. Kafka, bad. Alternative, what alternative?

erik_seaberg

13 hours ago

https://docs.ambar.cloud/ says the author's org is polling PostgreSQL (or MySQL) tables and producing records for JSON/HTTP consumers. As for me, I think a Kafka broker quorum is hard to beat as a fast and durable distributed ring buffer.

galeaspablo

9 hours ago

We use Kafka under the hood. We stream instead of polling. We used to work on the Kafka team at AWS :)

Our thesis is that a big blocker is the PhD in Kafka that the whole team needs. For example, if you want to set up an API similar to Ambar with tools like Kafka Connect, the connectors have failure modes that will bite you once a year and bite you hard. E.g., losing your changelog in MySQL, and having to start from scratch or risk losing ordering guarantees.

galeaspablo

9 hours ago

Disclosure: this post is from my colleague.

A: You have managed vendors that simplify Kafka, such as MSK, Confluent, and Redpanda. There is also other software, like Pulsar.

But we believe the solution to the time sink described in the article lies one level of abstraction higher. In the case of data analytics, there are tools/companies such as Decodable/Streamkap/Airbyte that simplify your life as an engineer.

In the case of operational streaming, we (Ambar) are making a bet on the tried and tested outbox/inbox pattern as a replacement for producing directly into Kafka et al. and having to manage all of its quirks and complexities. That’s the alternative we offer, but of course there are other folks in this space.

Admittedly, we didn’t dive deep into alternatives in the post. But we did explain at the end that we’ll cover them in another post. I’ll add a link at the bottom later pointing to some alternatives. :)

Thanks for reading!

jauntywundrkind

11 hours ago

RisingWave, Redpanda, Apache Pulsar, Druid, and others might do in various cases.