(Ozgun from Ubicloud)
I agree with the blog post's technical contents, but I feel we came across too strongly in the title. For Ubicloud as a managed Postgres provider, we use strict memory overcommit. Our experience operating Postgres at scale taught us that it's better to enable this than to go with the defaults.
However, I can see many other scenarios where using strict memory overcommit would have unanticipated side effects. That's why Linux doesn't use strict memory overcommit as its default.
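For anyone following along, here is a minimal sketch of what the three modes mean, assuming a Linux host where the setting is exposed at /proc/sys/vm/overcommit_memory. The function names are made up for illustration and the descriptions paraphrase the kernel documentation.

```python
from pathlib import Path

# Meanings of vm.overcommit_memory, paraphrased from the kernel documentation.
OVERCOMMIT_MODES = {
    0: "heuristic (default): refuse only obviously excessive allocations",
    1: "always: never refuse an allocation",
    2: "strict: refuse allocations that would exceed CommitLimit",
}


def describe_overcommit_mode(raw: str) -> str:
    """Map the text read from the sysctl file to a human-readable description."""
    mode = int(raw.strip())
    return OVERCOMMIT_MODES.get(mode, f"unknown mode {mode}")


def current_overcommit_mode(path: str = "/proc/sys/vm/overcommit_memory") -> str:
    """Read the live setting. Linux only; the path is the standard procfs location."""
    return describe_overcommit_mode(Path(path).read_text())
```

On a Linux box, `print(current_overcommit_mode())` shows which mode is active.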
(Furkan, submitter) Hmm, I haven’t thought about that. I updated the title to better reflect Ubicloud Postgres' position.
This has bitten me multiple times. The problem I have is that at work we deploy the application (written in Go) and PostgreSQL on the same machine. The backend app allocates a lot of virtual memory, and initially we had overcommit set to 0 (heuristic). This caused crashes on big queries in PostgreSQL, so we set it to 2 (strict). The whole system became a bit unstable because the backend would still allocate a lot of virtual memory, and at some point we ran into errors when allocating.
For now, we have overcommit_ratio set to a value that is stable from experience, but there really seems to be no silver bullet. Go is very happy to allocate a lot of virtual memory, but so are most managed languages. The best solution would probably be to host the backend and the database on separate servers.
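To make the overcommit_ratio knob concrete: in mode 2 the kernel documents the limit as CommitLimit = (total RAM - huge pages) * overcommit_ratio / 100 + swap. A rough sketch of that arithmetic follows. The function name and the example sizes are illustrative and not taken from anyone's actual setup.

```python
def commit_limit_kb(mem_total_kb: int, swap_total_kb: int,
                    overcommit_ratio: int, hugetlb_kb: int = 0) -> int:
    """Approximate CommitLimit in kB for vm.overcommit_memory=2.

    Follows the formula in the kernel documentation:
    (RAM - huge pages) * overcommit_ratio / 100 + swap.
    """
    return (mem_total_kb - hugetlb_kb) * overcommit_ratio // 100 + swap_total_kb


# Hypothetical 64 GiB machine with no swap.
GIB_KB = 1024 * 1024
default_limit = commit_limit_kb(64 * GIB_KB, 0, overcommit_ratio=50)
raised_limit = commit_limit_kb(64 * GIB_KB, 0, overcommit_ratio=80)
```

With the default ratio of 50 and no swap, only half of RAM can be committed, which is why a process that reserves a lot of virtual memory hits allocation errors long before physical memory runs out.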
Yes, it would. Basically every serious database tries to allocate everything and more - back in the day we'd just allocate VMs on the machine even with the overhead because knowing it cannot leave its constraints and would work within them was worth the cost.
There are many reasons to use a dedicated host (or VM) for a DB server, but if only the accessible memory needs to be limited, a container is the simpler, more efficient tool. That said, I would expect to be able to configure how much memory a DB process is allowed to allocate. I distinctly remember that PostgreSQL allows this. But of course both can be configured simultaneously, a belt-and-suspenders approach if you will.
Whether failed transactions are actually so much more desirable than an OOM-killed process isn't quite obvious, but they might be easier to troubleshoot.
They allude to this in the article, but I would emphasize caution when using mode 2, especially if one has already adjusted overcommit ratios, as one can prevent forks. Test this in a QA/Perf environment first, including a restart of all applications. Load test and run full QA tests before deploying to production. Even then, when deploying to production, I would just change the setting dynamically via app deployment scripts until confidence is high instead of putting it in the sysctl config files.
I've gone through this exercise in the past on much older kernels, which they cover as well. Personally, I ran into fewer issues by leaving overcommit at 0, dropping the overcommit ratio to 0, and setting oom_score_adj for programs as low as -1000 if I wanted the OOM killer to leave them alone, and of course using the Red Hat formulas for setting vm.min_free_kbytes, vm.admin_reserve_kbytes, and vm.user_reserve_kbytes. And of course be vigilant in disallowing app owners from using every last bit of memory.
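A note on oom_score_adj for anyone scripting this: the kernel accepts values from -1000 to 1000, where -1000 exempts a process from the OOM killer and 1000 makes it the preferred victim. A minimal sketch of setting it from a deployment script, assuming Linux procfs. The helper names are hypothetical, and lowering the value normally needs root or CAP_SYS_RESOURCE.

```python
import os
from pathlib import Path
from typing import Optional

OOM_SCORE_ADJ_MIN = -1000  # never selected by the OOM killer
OOM_SCORE_ADJ_MAX = 1000   # preferred victim of the OOM killer


def validate_oom_score_adj(value: int) -> int:
    """Reject values outside the range the kernel accepts."""
    if not OOM_SCORE_ADJ_MIN <= value <= OOM_SCORE_ADJ_MAX:
        raise ValueError(f"oom_score_adj must be in [-1000, 1000], got {value}")
    return value


def oom_score_adj_path(pid: Optional[int] = None, proc_root: str = "/proc") -> Path:
    """Location of the per-process knob; defaults to the calling process."""
    return Path(proc_root) / str(os.getpid() if pid is None else pid) / "oom_score_adj"


def set_oom_score_adj(value: int, pid: Optional[int] = None,
                      proc_root: str = "/proc") -> None:
    """Write the adjustment for a process. Lowering it usually requires privileges."""
    oom_score_adj_path(pid, proc_root).write_text(f"{validate_oom_score_adj(value)}\n")
```

For example, `set_oom_score_adj(-1000, pid=postgres_pid)` would protect a process whose PID you already know, where `postgres_pid` is a placeholder.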
I read this article about 3 weeks ago when this bit me. Really great write-up, some tricky details.
I think this is also a good lesson on why it's best to isolate mission-critical services like databases on their own compute nodes.
I have disabled overcommit both on Windows and on Linux. I hate having random programs being killed.
Unfortunately, many programs commit twice as much memory as they actually use. Often I see ~32GB committed and ~16GB resident.
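On Linux, a quick way to see how much is committed relative to the limit is to compare Committed_AS with CommitLimit in /proc/meminfo. A small sketch, assuming those two fields are present. The sample text below is invented for illustration and is not a measurement from any real machine.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])  # values are in kB for most fields
    return fields


def commit_usage(meminfo_text: str):
    """Return (Committed_AS kB, CommitLimit kB, fraction of the limit in use)."""
    info = parse_meminfo(meminfo_text)
    committed = info["Committed_AS"]
    limit = info["CommitLimit"]
    return committed, limit, committed / limit


# Invented sample, shaped like the real file.
SAMPLE_MEMINFO = """\
MemTotal:       65536000 kB
CommitLimit:    32768000 kB
Committed_AS:   16384000 kB
"""
```

On a live system you would pass `open("/proc/meminfo").read()` instead of the sample. A fraction close to 1.0 under strict mode means new allocations are about to start failing.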
Does this result in programs more frequently erroring/crashing because they can't allocate? I don't know how well many of the programs I frequently use on my desktop (Firefox, GNOME desktop, JVM + IntelliJ, Slack, etc.) handle allocation failures. I'm not sure they would do much better than crash, but I know the default OOM killer settings work well for me. About once a year a real runaway process (usually a throwaway program I'm working on) gets OOM-killed, and that's fine with me.