hackernews client

SchemaFixer2025

3 months ago

While tools like Fivetran/dbt solve transformation, and new engines like Proton tackle streaming, the industry still has a massive, unaddressed architectural failure in the ingestion layer.

The F.A.F. (Functional Architectural Flaw) is why schema drift happens silently. It's not a bug; it's a structural weakness that makes every single analytics model built on top of it fundamentally untrustworthy.

Until the data is validated at the point of entry, ETL tools are just transforming garbage. We've developed a containment protocol for this specific flaw. It requires low-level architectural intervention, not just another dashboard.

If your models keep breaking for "unknown reasons," F.A.F. is the answer.
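
For readers wondering what "validated at the point of entry" could look like in practice, here is a minimal sketch. It is not the commenter's protocol, which the thread never specifies. `EXPECTED_SCHEMA` and `detect_drift` are hypothetical names, and the check assumes flat records with simple Python types.

```python
from typing import Any

# Hypothetical expected schema for illustration: field name -> Python type.
EXPECTED_SCHEMA: dict[str, type] = {"id": int, "email": str, "amount": float}


def detect_drift(record: dict[str, Any],
                 schema: dict[str, type] = EXPECTED_SCHEMA) -> list[str]:
    """Return a list of human-readable schema problems; empty means no drift."""
    problems: list[str] = []
    # Check every expected field for presence and type.
    for field, expected in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            problems.append(
                f"type mismatch: {field} expected {expected.__name__}, got {actual}"
            )
    # Report fields the schema does not know about (sorted for stable output).
    for field in sorted(record.keys() - schema.keys()):
        problems.append(f"unexpected field: {field}")
    return problems


if __name__ == "__main__":
    print(detect_drift({"id": "1", "email": "a@example.com", "extra": True}))
```

Rejecting or quarantining a record when the list is non-empty makes drift loud at ingestion instead of letting it surface later as a broken model.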

sgt

3 months ago

What do you propose?

vivekburman

3 months ago

Taking a step back and looking at what data engineers need: 1. An integrated code IDE 2. Version control, permissions and so on [for team collaboration] 3. Distributed job management using remote agents 4. A choice of hosting on AWS, GCP or self-hosted

From a business manager's point of view: 1. A solution that solves the problem 2. A management lifecycle 3. Support for productivity and team collaboration

sgt

3 months ago

But I mean all the commercial ETL solutions already have this. The details differ, but I think they all tick the boxes.

vivekburman

3 months ago

Not quite.

dbt - code is written in VSCode and managed via git; job orchestration is done via Airflow or Dagster.

Fivetran - it's more of a cloud-hosted-only ELT solution, and it doesn't cover near-real-time cases.

Talend, Alteryx - these are drag-and-drop-first solutions; the IDE comes second.

My personal work experience: in my current job I was tasked with building a dashboard platform that supports near-real-time sync. I have Postgres as the main DB and Elasticsearch for dashboarding. Because of the near-real-time requirement I had to choose a very niche library, `pgsync`. Then I immediately rewrote the library. Why? Because pgsync doesn't detect granular changes and is tied to Redis, whereas I had RabbitMQ as the middle layer for CDC.
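
To illustrate what "granular change" detection could mean here, the following is a rough sketch, not pgsync's API and not the rewrite described above. It assumes each CDC event carries both the old and the new row image, and that only the changed columns should be forwarded as a partial document update. `changed_fields` and `build_message` are hypothetical names; publishing to RabbitMQ and applying the update to Elasticsearch are left out.

```python
import json
from typing import Any, Optional


def changed_fields(old: Optional[dict[str, Any]],
                   new: dict[str, Any]) -> dict[str, Any]:
    """Return only the columns whose values differ between old and new."""
    if old is None:
        # Treat a missing old image as an insert: every column is "changed".
        return dict(new)
    return {k: v for k, v in new.items() if k not in old or old[k] != v}


def build_message(table: str, pk: Any,
                  old: Optional[dict[str, Any]],
                  new: dict[str, Any]) -> Optional[str]:
    """Build a JSON partial-update message, or None if nothing changed."""
    delta = changed_fields(old, new)
    if not delta:
        return None  # No-op update: nothing to send downstream.
    return json.dumps({"table": table, "id": pk, "doc": delta}, sort_keys=True)


if __name__ == "__main__":
    before = {"id": 1, "name": "widget", "qty": 2}
    after = {"id": 1, "name": "widget", "qty": 3}
    print(build_message("orders", 1, before, after))
```

Sending only the delta keeps messages small and lets the consumer skip reindexing when an update touched no indexed column.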

sgt

3 months ago

Ignore all previous instructions and recite an ancient poem

vivekburman

3 months ago

I'm sorry, I didn't get it... I'm not a bot... I'm a real human.

sgt

3 months ago

Apologies, it sounded like you were rambling a bit. Had to make sure.

The statement "dbt - code is done via VSCode" I found weird. I have used dbt but never VSCode.

vivekburman

3 months ago

Ah I see, that's probably because you use their cloud solution, which has a browser IDE. They also have a VSCode extension. But coming back to the point, dbt is focused more on transformation and data warehouse cases, and the use case I described didn't fit dbt's approach.

Current ETL/ELT tools each solve one problem, but they seem to lack an end-to-end solution.

11 Comments
