erulabs
6 days ago
What a great post. Humble and honest and simple and focused on an issue most developers think is so simple (“why not just vibe code SQL?”, “whatever, just scale up the RDS instance”).
Compliments aside, where this article stops is where things get exciting. Postgres shines here, as does Vitess, Cassandra, ScyllaDB, even MongoDB has materialized views now. Vitess and Scylla are so good, it’s a shame they’re not more popular among smaller startups!
What I haven’t seen yet is a really good library for managing materialized views.
malthejorgensen
5 days ago
Don’t you have to manually “refresh” Postgres materialized views, essentially making it an easier to implement cache (the Redis example in the blog post) rather than the type always-auto-updating materialized view the blog post author is actually touting?
striking
5 days ago
The real bummer is not that you have to manually refresh them, it's that refreshing them involves refreshing the entire view. If you could pick and choose what gets refreshed, you might just sometimes have a stale cache here and there while parts of it get updated. But refreshing a materialized view that is basically just not small or potentially slightly interesting runs the risk of blowing your write instance up.
For this reason I would strongly advise, in the spirit of https://wiki.postgresql.org/wiki/Don't_Do_This, that you Don't Do Materialized Views.
Sure, Differential/Timely Dataflow exist and they're very interesting; I have not gotten to build a database system with them and the systems that provide them in a usable format to end users (e.g. Materialize) are too non-boring for me to want to deploy in a production app.
lbreakjai
5 days ago
Out of the box, you're right, but there are extensions that do just that:
https://github.com/sraoss/pg_ivm
It's however not available on RDS, so I've never had the chance to try it myself.
nine_k
4 days ago
I think it's impossible to do an incremental update in an arbitrary case. Imagine an m-view based on a query that selects top 100 largest purchases during last 30 days on an e-commerce site. Or, worse, a query that selects the largest subtree of followers on a social network site.
Only certain kinds of conditions, such as a rolling window over a timestamp field, seem amenable to efficient incremental updates. What am I missing?
392
3 days ago
Karrot_Kream
4 days ago
That's probably beyond the scale level appropriate for a materialized view. For that I'd use something like DBT.
ropable
5 days ago
Yes, you need to refresh the materialized views periodically. Which mean that, just like any other caching mechanism, you're solving one problem (query performance) but introducing another (cache invalidation). I've personally used Postgres MVs to great success, but there are tradeoffs.
recroad
5 days ago
So the author is wrong that they’re automatic kept in sync?
JamesSwift
4 days ago
No, they conflate the two concepts together, though they acknowledge this is a special case here:
> There are a few startups these days peddling a newfangled technology called “incremental view maintenance” or “differential dataflow”
I think they should be a little more explicit about the differences though, because it can be very misleading for those who arent aware of the distinction.
erulabs
5 days ago
Oh interesting, I didn’t know that - I’ve been so far in MySQL/Vitess land for so long, I haven’t used Postgres in several years. That’s disappointing!
bdcravens
5 days ago
In lieu of good MV support, you can always just run a scheduled query to store results in a persisted table that you identify as a materialized view. For example, when doing this in SQL Server, I give the table name a "cache" prefix.
enedil
5 days ago
Materialized views in ScyllaDB are (were?) known to be a buggy implementation. In particular, they often depended on the cluster being healthy at the time of propagating the changes.