i_have_to_speak
4 days ago
This really has nothing to do with CTEs.
The SQL standard does not specify how many times a subquery must or can be run. In this case, Postgres is free to run the uncorrelated subquery many times. Unfortunately, the subquery result is not idempotent.
The real solution is not to rely on the optimizer for the correctness of your query. The query should be logically correct, irrespective of the query plan, right? The correct solution would be to ensure that the subquery runs only once, using a materialized cte or temp table. Do not base your "solution" on the discovery that for this particular version of Postgres, the planner runs this type of subquery once for the '=' operator and multiple times for 'IN'.
Diggsey
3 days ago
Based on my understanding, your comment is the most accurate in this thread.
However, I think it's telling that almost every single comment here has a different understanding of how PostgreSQL works, and nobody can link to any documentation that conclusively addresses this case. I think that indicates that this is a failing if not of implementation, then of documentation.
For example: another comment says that `=` forces PostgreSQL to evaluate the subquery exactly once. I couldn't find any justification for that statement. Is it guaranteed? I don't know.
Another comment says "if only the subquery was ordered then the result would be consistent", but AFAIK, this is not true in the presence of "SKIP LOCKED".
I think the problem stems from the fact that the SQL standard is written from an academic point of view: the ordering of execution of subqueries is not defined because it is irrelevant within the relational algebra, so optimizations which change the execution order are just that: optimizations, with no observable side effects.
Real database systems do not actually have the nice algebraic properties of the relational algebra... The same subquery may give different results over multiple executions, etc. Given this, it's important that the database system give sufficient and sufficiently clear guarantees such that they can still be used correctly, and so that the optimizations implemented by the system can be justified.