Looking Forward to Postgres 19: It's About Time

55 pointsposted 2 hours ago
by xngbuilds

17 Comments

IgorPartola

43 minutes ago

This is something that is incredibly useful. I built a system like this a while back that also adds versioning to each time period. The use case is this: let’s say you are tracking your state’s sales tax rate. You do not control this and data entry is manual so it is error prone. The rate is updated typically annually but sometimes more frequently.

Let’s say for 2026 you have it at 7.25% and you entered that into the system ahead of time (say December 2025). Today, June 12 you learn that it should have been 7.35%. It would be incorrect to say that the rate changed today: it was 7.35% since January 1. But you also don’t want to lose the fact that all your invoices have been generated using the wrong rate because if you go to recalculate them you will get a different answer.

In this case what you do is create version 2 of the rate in your database with the same time period but the correct rate. This would allow your other database objects to reference either version 1 or 2 and to even recalculate all the objects that reference version 1 to now reference version 2 such that you can get line item corrections and figure out what to do about them.

It is cumbersome to use but for the specific use case of modeling real world laws that are not available as machine-readable info it is the best option I came up with.

larsnystrom

an hour ago

Why are they storing a time period (start and end date) in the first example? Why not just store the date when the price comes into effect? That would make both overlaps and time travel impossible without using any constraints.

advisedwang

8 minutes ago

JOINs and other operations become really difficult if you can't evaluate whether a row applies or not based on that row alone.

throwaway7783

an hour ago

Works when there is always an active price. Having an explicit end date allows certain rows to be inactive automatically after validity period. Think of seasonal categories/products etc which dont exist after a specific period

promano

41 minutes ago

And even if you don't have seasonal products, you still need an end date to mark when you stop selling a product; otherwise you have to do something hacky like defining "<price> = NULL means we stopped selling the product after <valid_from>" and inserting an extra record.

I think the end date should be nullable though, but valid_to is NOT NULL in the starting example... later in the article, when showing the "new way" using date ranges, it inserts a row with an open-ended range, which is more what I'd expect.

larsnystrom

an hour ago

You could also make the price column nullable and just insert a row with price null and the date from which there should be no price.

quotemstr

an hour ago

It's a trade-off. If you store both endpoints you can continue to think of rows as order-invariant tuples. If you store only one endpoint, you have to impose a meaningful order on the rows in order for them to make sense.

larsnystrom

an hour ago

Sure, from a theoretical perspective, but in practice there’s got to be some sort of order at some point even when storing timespans.

munk-a

an hour ago

It can be super ugly to try and hand-manage date time range manipulation in a system with moving parts. It is, of course, possible, but it's a headache to try and pull it off when there is complex business logic focused on those datetimes and interactions between ranges. The period constraint is an excellent tool for trivially guaranteeing range coverage (in a case where, for instance, a customer is known to be active from a to b it helps ensure that there aren't any gaps created during the juggling of different sub-ranges) while the new DELETE FOR PERIOD OF syntax makes manipulating swathes of history (I don't care what was happening here this interval should now be X) much more trivial than before.

bhaak

an hour ago

I remember reading about this feature for Oracle in the 2000s and was always interested to use it in a production environment.

It never came to pass when we used Oracle, maybe now with Postgres I will finally have a chance at it.

cherryteastain

27 minutes ago

Very useful feature for trading systems where the exchange might tell you your order got filled but then send a trade correction or bust message

evdubs

an hour ago

> Recently, a new type of question has entered the database arena: what did this data look like last Tuesday?

This question has been answerable in Dolt for years now.

quotemstr

an hour ago

This kind of DB isn't really answering this question. There's a lot more subtlety to time-span analysis than snapshotting. In particular, aligning two series is non-obvious.

Say you have one time series with CPU-core task switches:

T=1 task=A T=3, task=B, T=5 task=A, ...

... and another of CPU frequency changes ...

T=2 freq_hz=800, T=5 freq_hz=1200, T=6 freq_hz=900

How, in SQL, do you express the question "How many CPU cycles did each task use?"? Try to do it with more complex examples. You'll tear your hair out.

Having worked on this sort of data analysis quite a lot, I'm strongly of the opinion that SQL needs syntax, not just table-valued functions, for expressing questions about timelines.

evdubs

an hour ago

The cool thing about Dolt is that you [eventually] get the features of the databases (MySQL, PostgreSQL, SQLite, MongoDB) they emulate, so you can have your PG 19 temporality features as well as branching and merging.

quotemstr

an hour ago

Yep. I'm just pointing out that the problems Dolt solves are different from the problems a timeline-aware SQL algebra solves.

jacques_chester

26 minutes ago

Exciting. Honestly I expect this will do more to advance bitemporal design than decades of jawboning has.

And really, ranges are an amazing substrate for this. I've had to do this by hand in a ... less featuresome ... SQL-speaking DB and it was clunky and performed fairly unimpressively.

quotemstr

an hour ago

Great! I've been wanting native time-based tables for ages. Years and years ago, I even wrote my own DB engine to get them!

See https://dancol.org/dctv/index.xml

I still think there's a lot of value in first-class syntax for time queries. Splitting ranges like Postgres 19 does is a good first step, but there's also a lot of power in broadcasting over these ranges, combining them in various ways, and storing multiple, independent ranges in a single table.

Ignore the bit about active development: these days, it'd take more sense to add the operators I describe to Postgres and DuckDB than to make a numpy-based engine just to host the analysis. This work predates DuckDB, and it's reassuring that DuckDB (and now Postgres) are thinking along similar lines.

I'm also glad that in the intervening years "data lake"-style analysis has become more prominent. My ideal data processing pipeline consists of sourcing from raw data and pipelining views all the way to human-meaningful outputs. Materialization, if it occurs, is just an optimization.