kiwicopple
9 months ago
Another amazing release, congrats to all the contributors. There are simply too many things to call out - just a few highlights:
Massive improvements to vacuum operations:
"PostgreSQL 17 introduces a new internal memory structure for vacuum that consumes up to 20x less memory."
Much needed features for backups: "pg_basebackup, the backup utility included in PostgreSQL, now supports incremental backups and adds the pg_combinebackup utility to reconstruct a full backup"
I'm a huge fan of FDWs and think they are an untapped gem in Postgres, so I love seeing these improvements: "The PostgreSQL foreign data wrapper (postgres_fdw), used to execute queries on remote PostgreSQL instances, can now push EXISTS and IN subqueries to the remote server for more efficient processing."
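For anyone who hasn't set up postgres_fdw before, here's a minimal sketch of the kind of query that pushdown applies to; the server, user mapping, and table names are made up for illustration:
```
-- Hypothetical remote server and foreign tables, purely for illustration
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER remote_db
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'remote.example.com', dbname 'sales', port '5432');

CREATE USER MAPPING FOR CURRENT_USER
  SERVER remote_db
  OPTIONS (user 'app', password 'secret');

CREATE FOREIGN TABLE remote_orders (id bigint, customer_id bigint, total numeric)
  SERVER remote_db OPTIONS (schema_name 'public', table_name 'orders');

CREATE FOREIGN TABLE remote_customers (id bigint, region text)
  SERVER remote_db OPTIONS (schema_name 'public', table_name 'customers');

-- On 17, an EXISTS subquery like this can be shipped to the remote server as
-- part of the foreign scan; EXPLAIN (VERBOSE) shows the remote SQL that is sent.
EXPLAIN (VERBOSE)
SELECT o.id, o.total
FROM remote_orders o
WHERE EXISTS (
  SELECT 1 FROM remote_customers c
  WHERE c.id = o.customer_id AND c.region = 'EU'
);
```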
peiskos
9 months ago
A bit off topic, but can someone suggest how I can learn more about using databases (Postgres specifically) in real-world applications? I am familiar with SQL and common ORMs, but I feel the internet is full of beginner-level tutorials that lack depth.
Superfud
9 months ago
For PostgreSQL, the manual is extremely well written, and is warmly recommended reading. That should give you a robust foundation.
brunoqc
9 months ago
I batch import XMLs, CSVs and MSSQL data into PostgreSQL.
I'm pretty sure I could read them when needed with an FDW. Is it a good idea?
I think it can be slow but maybe I could use materialized views or something.
mind-blight
9 months ago
I've been using DuckDB to import data into Postgres (especially CSVs and JSON) and it has been really effective.
DuckDB can run SQL across the different data formats and insert or update directly into Postgres. I run DuckDB with Python and Prefect for batch jobs, but you can use whatever language or scheduler you prefer.
I can't recommend this setup enough. The only weird thing I've run into is that a really complex join across multiple Postgres tables and Parquet files hit a bug reading a Postgres column type. I simplified the query (which was a good idea anyway) and it hums away.
brunoqc
9 months ago
Thanks. My current pattern is to parse the files with Rust, COPY from stdin into a Postgres temp table, update the rows that have changed, and delete the rows that no longer exist.
I'm hoping it's less wasteful than truncating and re-importing the whole table every time there's a single change.
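Roughly, the SQL side of that pattern looks like this; the accounts(id, name, updated_at) table, its unique key on id, and the column list are simplified placeholders:
```
-- Stage the parsed rows (the Rust importer drives the COPY)
CREATE TEMP TABLE accounts_staging (LIKE accounts INCLUDING DEFAULTS);
COPY accounts_staging FROM STDIN WITH (FORMAT csv);

-- Insert new rows; update only rows whose values actually changed
INSERT INTO accounts AS a (id, name, updated_at)
SELECT id, name, updated_at FROM accounts_staging
ON CONFLICT (id) DO UPDATE
  SET name = EXCLUDED.name,
      updated_at = EXCLUDED.updated_at
  WHERE (a.name, a.updated_at) IS DISTINCT FROM (EXCLUDED.name, EXCLUDED.updated_at);

-- Drop rows that no longer exist in the source files
DELETE FROM accounts a
WHERE NOT EXISTS (SELECT 1 FROM accounts_staging s WHERE s.id = a.id);
```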
mind-blight
9 months ago
It probably is, but you can do that workflow with DuckDB too. It's just really flexible since it can all be done in SQL. Check out the DuckDB Postgres extension (https://duckdb.org/docs/extensions/postgres.html)
Importing from a Salesforce Bulk Export CSV (for example) into a Postgres table is a few lines of SQL:
```
INSERT INTO pgdb.accounts (id, created_at, updated_at)
SELECT "Id", "CreatedDate"::TIMESTAMP, "UpdatedDate"::TIMESTAMP
FROM read_csv('/path/to/file.csv')
WHERE "Id" NOT IN (SELECT id FROM pgdb.accounts)
```
The path can be in a cloud storage provider as well, which is really nice. You can also do updates that join against the Postgres db or other CSV files (or a MySQL database). The data transforms (casting to timestamp, uuid, etc.) have been super handy, along with all the other SQL niceties you get.
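For the update side, here's a sketch of what that looks like from DuckDB against the attached Postgres database (the column names are assumptions, mirroring the insert above):
```
-- Refresh rows that already exist, casting the CSV's strings as we go
UPDATE pgdb.accounts
SET updated_at = c."UpdatedDate"::TIMESTAMP
FROM read_csv('/path/to/file.csv') AS c
WHERE accounts.id = c."Id";
```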
kiwicopple
9 months ago
“it depends”. Some considerations for MSSQL:
- If the foreign server is close (low latency), that’s great
- If your query is complex, it helps if the Postgres planner can “push down” work to MSSQL. That will usually happen if you aren’t doing joins to local data
I personally like to set up the foreign tables, then materialize the data into a local Postgres table using pg_cron. It’s like a basic ETL pipeline built entirely into Postgres.
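A minimal sketch of that setup, assuming postgres_fdw and pg_cron are installed and using made-up server/table names:
```
CREATE EXTENSION IF NOT EXISTS postgres_fdw;
CREATE EXTENSION IF NOT EXISTS pg_cron;

CREATE SERVER reporting_src
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'reporting.internal', dbname 'reports', port '5432');

CREATE USER MAPPING FOR CURRENT_USER
  SERVER reporting_src
  OPTIONS (user 'readonly', password 'secret');

CREATE FOREIGN TABLE remote_sales (id bigint, amount numeric, sold_at timestamptz)
  SERVER reporting_src OPTIONS (schema_name 'public', table_name 'sales');

-- Local copy of the remote data
CREATE MATERIALIZED VIEW local_sales AS
  SELECT * FROM remote_sales;

-- Refresh it every hour via pg_cron
SELECT cron.schedule('refresh-local-sales', '0 * * * *',
                     $$REFRESH MATERIALIZED VIEW local_sales$$);
```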
victorbjorklund
9 months ago
Oh. That is smart using it as a very simple ETL pipeline.
baq
9 months ago
Check out ClickHouse. You might like it.
ellisv
9 months ago
> I'm a huge fan of FDWs
Do you have any recommendations on how to manage credentials for `CREATE USER MAPPING` within the context of cloud-hosted dbs?
darth_avocado
9 months ago
If your company doesn't have an internal tool for storing credentials, you can always store them in the cloud provider's secrets management tool, e.g. Secrets Manager or a SecureString in Parameter Store on AWS. Your CI/CD pipeline can pull the secrets from there.
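As a rough sketch of how the apply step can look (names are placeholders; the secret comes in as a psql variable rather than living in the SQL file):
```
-- Run by the pipeline as e.g.: psql -v fdw_password="$SECRET" -f create_mapping.sql
-- where $SECRET was fetched from Secrets Manager / Parameter Store beforehand.
CREATE USER MAPPING FOR app_role
  SERVER remote_db
  OPTIONS (user 'fdw_reader', password :'fdw_password');
```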
kiwicopple
9 months ago
In Supabase we have a “Vault” utility for this (for example: https://fdw.dev/catalog/clickhouse/#connecting-to-clickhouse). Sorry, I can’t make recommendations for other platforms because I don’t want to suggest anything that could be considered unsafe; hopefully others can chime in.
frankramos
9 months ago
Just upgraded Supabase to a Pro account to try FDW, but there don’t seem to be solid wrappers for MySQL/Vitess from PlanetScale. This would help a ton of people looking to migrate. Does anyone have suggestions?