speedgoose
9 hours ago
Looking at the htop screenshot, I notice the lack of swap. You may want to enable earlyoom, so your whole server doesn't go down when a service goes bananas. The Linux Kernel OOM killer is often a bit too late to trigger.
You can also enable zram to compress RAM, so you can over-provision like the pros. A lot of long-running software leaks memory that compresses pretty well.
Here is how I do it on my Hetzner bare-metal servers using Ansible: https://gist.github.com/fungiboletus/794a265cc186e79cd5eb2fe... It also works on VMs.
TheDong
3 hours ago
Even better than earlyoom is systemd-oomd[0] or oomd[1].
systemd-oomd and oomd use the kernel's PSI[2] information which makes them more efficient and responsive, while earlyoom is just polling.
earlyoom keeps getting suggested, even though we have PSI now, just because people are used to using it and recommending it from back before the kernel had cgroups v2.
[0]: https://www.freedesktop.org/software/systemd/man/latest/syst...
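For readers who have not looked at PSI before, here is a minimal sketch of reading the pressure data the comment refers to. It assumes the `some`/`full` line layout that the kernel exposes in `/proc/pressure/memory`; the function name `parse_psi` and the sample numbers are made up for illustration.

```python
def parse_psi(text):
    """Parse the contents of a /proc/pressure/<resource> file into nested dicts."""
    result = {}
    for line in text.strip().splitlines():
        # Each line looks like: "some avg10=0.00 avg60=0.00 avg300=0.00 total=0"
        kind, *fields = line.split()
        values = {}
        for field in fields:
            key, _, raw = field.partition("=")
            # "total" is a stall time counter in microseconds; the averages are percentages.
            values[key] = int(raw) if key == "total" else float(raw)
        result[kind] = values
    return result


# Hypothetical sample content, not taken from a real machine.
sample = (
    "some avg10=1.50 avg60=0.75 avg300=0.10 total=123456\n"
    "full avg10=0.25 avg60=0.05 avg300=0.00 total=7890\n"
)
psi = parse_psi(sample)
print(psi["some"]["avg10"], psi["full"]["total"])
```

On a real system you would pass in the text read from `/proc/pressure/memory`, which only exists when the kernel was built with PSI support.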
CGamesPlay
an hour ago
"earlyoom is just polling"?
> systemd-oomd periodically polls PSI statistics for the system and those cgroups to decide when to take action.
It's unclear if the docs for systemd-oomd are incorrect or misleading; I do see from the kernel.org link that the recommended usage pattern is to use the `poll` system call, which in this context would mean "not polling", if I understand correctly.
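To illustrate the distinction being drawn here, a rough sketch: the `poll` system call puts the caller to sleep until the kernel reports an event, instead of re-reading a file on a timer. A pipe stands in for the PSI file below, which is an assumption made purely so the example runs anywhere; for PSI the kernel documentation describes writing a trigger to the pressure file and waiting for `POLLPRI` instead of `POLLIN`. The helper name `wait_readable` is hypothetical.

```python
import os
import select


def wait_readable(fd, timeout_ms):
    """Sleep in the kernel until fd is readable or the timeout expires."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    # poll() returns a list of (fd, event) pairs; an empty list means timeout.
    return bool(poller.poll(timeout_ms))


read_end, write_end = os.pipe()
nothing_yet = wait_readable(read_end, 0)    # nothing written yet, returns at once
os.write(write_end, b"x")
now_ready = wait_readable(read_end, 1000)   # wakes as soon as data is available
os.close(read_end)
os.close(write_end)
print(nothing_yet, now_ready)
```

The process does no work while it waits, which is the sense in which using `poll` is "not polling".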
100721
an hour ago
Unrelated to the topic, it seems awfully unintuitive to name a function ‘poll’ if the result is ‘not polling.’ I’m guessing there’s some history and maybe backwards-compatible rewrites?
Bender
7 hours ago
Another option would be to have more memory than required (over-engineer) and to adjust the OOM score per app, adding early-kill weight to non-critical apps and negative weight to important apps. oom_score_adj is already set to -1000 by OpenSSH, for example.
NSDJUST=$(pgrep -x nsd); echo -en '-378' > /proc/"${NSDJUST}"/oom_score_adj
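The same adjustment can be sketched in Python with a range check added. The -1000 to 1000 bounds are the documented range for `oom_score_adj`; the function names are hypothetical, and actually lowering the value for a process normally needs root or `CAP_SYS_RESOURCE`.

```python
OOM_SCORE_ADJ_MIN = -1000  # never pick this process
OOM_SCORE_ADJ_MAX = 1000   # pick this process first


def oom_score_adj_path(pid):
    """Path of the per-process OOM adjustment file."""
    return f"/proc/{pid}/oom_score_adj"


def validate_oom_score_adj(value):
    """Reject values the kernel would refuse."""
    if not OOM_SCORE_ADJ_MIN <= value <= OOM_SCORE_ADJ_MAX:
        raise ValueError(f"oom_score_adj must be in [-1000, 1000], got {value}")
    return value


def set_oom_score_adj(pid, value):
    """Write the adjustment; lowering it usually requires elevated privileges."""
    with open(oom_score_adj_path(pid), "w") as f:
        f.write(str(validate_oom_score_adj(value)))


print(oom_score_adj_path(1234), validate_oom_score_adj(-378))
```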
Another useful thing to do is effectively disable over-commit on all staging and production servers (set the ratio to 0 instead of setting overcommit_memory to 2 to fully disable it, as these do different things; overcommit_memory 0 still uses a formula):
vm.overcommit_memory = 0
vm.overcommit_ratio = 0
Also set min_free and reserved memory based on installed memory, using a formula from Red Hat that I do not have handy. min_free can vary from 512KB to 16GB depending on installed memory.
vm.admin_reserve_kbytes = 262144
vm.user_reserve_kbytes = 262144
vm.min_free_kbytes = 1024000
At least that worked for me on about 50,000 physical servers for over a decade that were not permitted to have swap, with installed memory varying from 144GB to 4TB of RAM. OOM would only occur when the people configuring and pushing code would massively over-commit and not account for memory required by the kernel, or would not follow the best practices defined by Java, and that's a much longer story.
Another option is to limit memory per application in cgroups, but that requires more explaining than I am putting in an HN comment.
Another useful thing is to never OOM kill in the first place on servers that are only doing things in memory and need not commit anything to disk. So don't do this on a disked database. This is for ephemeral nodes that should self heal. Wait 60 seconds so drac/ilo can capture crash message and then earth shattering kaboom...
# cattle vs kittens, mooooo...
kernel.panic = 60
vm.panic_on_oom = 2
For a funny side note, those options can also be used as a holy hand grenade to intentionally and unsafely reboot NFS diskless farms when failing over to entirely different NFS server clusters: set panic to 15 minutes, trigger an OOM panic by setting min_free to 16TB at the command line via Ansible (not in sysctl.conf), swap clusters, ARP storm and reconverge.
liqilin1567
2 hours ago
Thanks for sharing, I think these are very useful suggestions.
RobRivera
19 minutes ago
To learn tricks like this what resource do you recommend I read? System administrators handbook? (Still on my TOREAD queue)
levkk
8 hours ago
Yeah, no way. As soon as you hit swap, _most_ apps are going to have a bad, bad time. This is well known, so much so that all EC2 instances in AWS disable it by default. Sure, they want to sell you more RAM, but it's also just true that swap doesn't work for today's expectations.
Maybe back in the 90s, it was okay to wait 2-3 seconds for a button click, but today we just assume the thing is dead and reboot.
bayindirh
8 hours ago
This is a wrong belief because a) SSDs make swap almost invisible, so you can have that escape ramp if something goes wrong b) SWAP space is not solely an escape ramp which RAM overflows into anymore.
In the age of microservices and cattle servers, reboot/reinstall might be cheap, but in the long run it is not. A long running server, albeit being cattle, is always a better solution because esp. with some excess RAM, the server "warms up" with all hot data cached and will be a low latency unit in your fleet, given you pay the required attention to your software development and service configuration.
Secondly, Kernel swaps out unused pages to SWAP, relieving pressure from RAM. So, SWAP is often used even if you fill 1% of your RAM. This allows for more hot data to be cached, allowing better resource utilization and performance in the long run.
So, eff it, we ball is never a good system administration strategy. Even if everything is ephemeral and can be rebooted in three seconds.
Sure, some things like Kubernetes force "no SWAP, period" policies because they kill pods when pressure exceeds some value, but for more traditional setups, it's still valuable.
db48x
2 hours ago
This is not really true of most SSDs. When Linux is really thrashing the swap it’ll be essentially unusable unless the disk is _really_ fast. Fast enough SSDs are available though. Note that when it’s really thrashing the swap the workload is 100% random 4KB reads and writes in equal quantities. Many SSDs have high read speeds and high write speeds but have much worse performance under mixed workloads.
I once used an Intel Optane drive as swap for a job that needed hundreds of gigabytes of ram (in a computer that maxed out at 64 gigs). The latency was so low that even while the task was running the machine was almost perfectly usable; in fact I could almost watch videos without dropping frames at the same time.
kryptiskt
6 hours ago
My work Ubuntu laptop has 40GB of RAM and a very fast NVMe SSD. If it gets under memory pressure, it slows to a crawl and is for all practical purposes frozen while swapping wildly for 15-20 minutes.
So no, my experience with swap isn't that it's invisible with SSD.
interroboink
4 hours ago
I don't know your exact situation, but be sure you're not mixing up "thrashing" with "using swap". Obviously, thrashing implies swap usage, but not the other way around.
db48x
2 hours ago
If it’s frozen, or if the mouse suddenly takes seconds to respond to every movement, then it’s not just using some swap. It’s thrashing for sure.
webstrand
3 hours ago
I've experimented with no-swap and find the same thing happens. I think the issue is that linux can also evict executable pages (since it can just reload them from disk).
I've had good experience with linux's multi-generation LRU feature, specifically the /sys/kernel/mm/lru_gen/min_ttl_ms feature that triggers OOM-killer when the "working set of the last N ms doesn't fit in memory".
omgwtfbyobbq
an hour ago
It's seldom invisible, but in my experience how visible it is depends on the size/modularity/performance/etc of what's being swapped and the underlying hardware.
On my 8gb M1 Mac, I can have a ton of tabs open and it'll swap with minimal slowdown. On the other hand, running a 4k external display and a small (4gb) llm is at best horrible and will sometimes require a hard reset.
I've seen similar with different combinations of software/hardware.
baq
an hour ago
Linux being absolute dogshit if it’s under any sort of memory pressure is the reason, not swap or no swap. Modern systems would be much better off tweaking dirty bytes/ratios, but fundamentally the kernel needs to be dragged into the XXI century sometime.
eru
6 hours ago
How long is long running? You should be getting the warm caches after at most a few hours.
> Secondly, Kernel swaps out unused pages to SWAP, relieving pressure from RAM. So, SWAP is often used even if you fill 1% of your RAM. This allows for more hot data to be cached, allowing better resource utilization and performance in the long run.
Yes, and you can observe that even in your desktop at home (if you are running something like Linux).
> So, eff it, we ball is never a good system administration strategy. Even if everything is ephemeral and can be rebooted in three seconds.
I wouldn't be so quick. Google ran their servers without swap for ages. (I don't know if they still do it.) They decided that taking the slight inefficiency in memory usage, because they have to keep the 'leaked' pages around in actual RAM, is worth it to get predictability in performance.
For what it's worth, I add generous swap to all my personal machines, mostly so that the kernel can offload cold / leaked pages and keep more disk content cached in RAM. (As a secondary reason: I also like to have a generous amount of /tmp space that's backed by swap, if necessary.)
With swap files, instead of swap partitions, it's fairly easy to shrink and grow your swap space, depending on what your needs for free space on your disk are.
gchamonlive
7 hours ago
> SSDs make swap almost invisible
They don't. SSDs have come a long way, but so have memory dies and buses, and with that the way programs work has also changed, as more often than not they are able to fit their stacks and heaps in memory.
I have had a problem with shellcheck that for some reason eats up all my RAM when I open, I believe, .zshrc, and trust me, it's not invisible. The system crawls to a halt.
bayindirh
7 hours ago
It depends on the SSD, I may say.
If we're talking about SATA SSDs which top at 600MBps, then yes, an aggressive application can make itself known. However, if you have a modern NVMe, esp. a 4x4 one like Samsung 9x0 series or if you're using a Mac, I bet you'll notice the problem much later, if ever. Remember the SSD trashing problem on M1 Macs? People never noticed that system used SWAP that heavily and trashed the SSD on board.
Then, if you're using a server with a couple of SAS or NVMe SSDs, you'll not notice the problem again, esp. if these are backed by RAID (even md counts).
gchamonlive
7 hours ago
Now that you say, I have a new Lenovo yoga with those SoC ram with crazy parallel channel config (16gb spread across 8 dies of 2gb). It's noticeably faster than my Acer nitro with dual channel 16gb ddr5. I'll check that, but I'd say it's not what the average home user (and even server I'd risk saying) would have.
xienze
7 hours ago
> it's not invisible. The system crawls to a halt.
I’m gonna guess you’re not old enough to remember computers with memory measured in MB and IDE hard disks? Swapping was absolutely brutal back then. I agree with the other poster, swap hitting an SSD is barely noticeable in comparison.
justsomehnguy
7 hours ago
What do you prefer:
( ) a 1% chance the system would crawl to a halt but would work
( ) a 1% chance the kernel would die and nothing would work
gchamonlive
7 hours ago
I think I've not made myself as clear as I could. Swap is important for efficient system performance way before you hit OOM on main memory. It's not, however, going to save system responsiveness in case of OOM. This is what I mean.
eru
6 hours ago
The trade-off depends on how your system is set up.
Eg Google used to (and perhaps still does?) run their servers without swap, because they had built fault tolerance in their fleet anyway, so were happier to deal with the occasional crash than with the occasional slowdown.
For your desktop at home, you'd probably rather deal with a slowdown that gives you a chance to close a few programs than just crash your system. After all, if you are standing physically in front of your computer, you can always just manually hit the reset button if the slowdown is too agonising.
macintux
5 hours ago
That’s very common to distributed systems: much better to have a failed node than a slow node. Slow nodes are often contagious.
andai
7 hours ago
Can someone explain this to me? Doesn't swap just delay the fundamental issue? Or is there a qualitative difference?
eru
6 hours ago
Swap delays the 'fundamental issue', if you have a leak that keeps growing.
If your problem doesn't keep growing, and you just have more data that programs want to keep in memory than you have RAM, but the actual working set of what's accessed frequently still fits in RAM, then swap perfectly solves this.
Think lots of programs open in the background, or lots of open tabs in your browser, but you only ever rapidly switch between at most a handful at a time. Or you are starting a memory hungry game and you don't want to be bothered with closing all the existing memory hungry programs that idle in the background while you play.
danielheath
4 hours ago
I run a chat server on a small instance; when someone uploads a large image to the chat, the 'thumbnail the image' process would cause the OOM-killer to take out random other processes.
Adding a couple of gb of swap means the image resizing is _slow_, but completes without causing issues.
charcircuit
2 hours ago
The problem is freezing the system for hours or more to delay the issue is not worth it. I'd rather a program get killed immediately than having my system locked up for hours before a program gets killed.
justsomehnguy
6 hours ago
https://news.ycombinator.com/item?id=45007821
> Doesn't swap just delay the fundamental issue?
The fundamental issue here is that the Linux fanboys literally think that killing a working process, and most of the time the very process that matters[0], is a good solution for not solving the fundamental problem of memory allocation in the Linux kernel.
Availability of swap allows you to avoid malloc failure in the rare case your processes request more memory than is physically (or 'physically', heh) present in the system. But in the mind of so-called Linux administrators, if even one byte of swap were used, the system would immediately crawl to a stop and never recover. Why it always has to be the worst and most idiotic scenario instead of a sane 'needed 100MB more, got it - while some shit in memory which wasn't accessed since boot was swapped out - did the things it needed to do and freed that 100MB' is never explained by them.
[0] imagine a dedicated machine for *SQL server - which process would have the most memory usage on that system?
ssl-3
5 hours ago
Indeed.
Also: When those processes that haven't been active since boot (and which may never be active again) are swapped out, more system RAM can become available for disk caching to help performance of things that are actively being used.
And that's... that's actually putting RAM to good use, instead of letting it sit idle. That's good.
(As many are always quick to point out: Swap can't fix a perpetual memory leak. But I don't think I've ever seen anyone claim that it could.)
qotgalaxy
4 hours ago
What if I care more about the performance of things that aren't being used right now than the things that are? I'm sick of switching to my DAW and having to listen to my drive thrash when I try to play a (say) sampler I had loaded.
ssl-3
an hour ago
Just set swappiness to [say] 5, 2, 1, or even 0, and move on with your project with a system that is more reluctant to go into swap.
And maybe plan on getting more RAM.
(It's your system. You're allowed to tune it to fit your usage.)
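A small sketch of the tuning suggested above. It assumes the knob lives at `/proc/sys/vm/swappiness` and that the accepted range is 0 to 200, which is what recent kernel documentation states (older kernels cap it at 100); the helper names are hypothetical.

```python
from pathlib import Path

SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")


def swappiness_sysctl_line(value, upper=200):
    """Build a sysctl.conf line, rejecting out-of-range values.

    upper=200 is an assumption based on recent kernels; use 100 for older ones.
    """
    if not 0 <= value <= upper:
        raise ValueError(f"swappiness must be in [0, {upper}], got {value}")
    return f"vm.swappiness = {value}"


def current_swappiness(path=SWAPPINESS_PATH):
    """Read the live value (Linux only)."""
    return int(path.read_text().strip())


print(swappiness_sysctl_line(5))
```

The resulting line can go into a file under `/etc/sysctl.d/` so the setting survives a reboot.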
db48x
an hour ago
Sounds like you just need more memory.
hhh
3 hours ago
Kubernetes supports swap now.
I still don’t use it though.
adastra22
7 hours ago
What pressure? If your ram is underutilized, what pressure are you talking about?
If the slowest drive on the machine is the SSD, how does caching to swap help?
bayindirh
7 hours ago
A long running Linux system uses 100% of its RAM. Every byte unused for applications will be used as a disk cache, given you read more data than your total RAM amount.
This cache is evictable, but it'll be there eventually.
In the older days, Linux did not touch unused pages in RAM if your RAM was not under pressure, but now it swaps out pages that have been unused for a long time. This allows more cache space in RAM.
> how does caching to swap help?
I think I failed to convey what I tried to say. Let me retry:
Kernel doesn't cache to SSD. It swaps out unused (not accessed) but unevictable pages to SWAP, assuming that these pages will stay stale for a very long time, allowing more RAM to be used as cache.
When I look at my desktop system, in 12 days the kernel moved 2592MB of my RAM to SWAP despite having ~20GB of free space. ~15GB of this free space is used as disk cache.
So, to have 2.5GB more disk cache, Kernel moved 2592 MB of non-accessed pages to SWAP.
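Figures like these can be pulled from `/proc/meminfo`. The sketch below is one way to do it; the sample text is invented to roughly match the 2592MB swap figure above (it is not the poster's actual output), and counting `Cached` plus `Buffers` as "disk cache" is a simplifying assumption.

```python
def parse_meminfo(text):
    """Return /proc/meminfo fields as a dict of integer values (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[name.strip()] = int(parts[0])
    return info


def swap_used_mb(info):
    """Swap in use, in MiB."""
    return (info["SwapTotal"] - info["SwapFree"]) // 1024


def page_cache_mb(info):
    """Approximate disk cache, in MiB (Cached + Buffers)."""
    return (info["Cached"] + info["Buffers"]) // 1024


# Invented sample values for illustration only.
sample = """\
MemTotal:       32646144 kB
MemFree:         1067008 kB
Buffers:          204800 kB
Cached:         15360000 kB
SwapTotal:       4194304 kB
SwapFree:        1540096 kB
"""
info = parse_meminfo(sample)
print(swap_used_mb(info), page_cache_mb(info))
```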
adastra22
6 hours ago
Yes, and if I am writing an API service, for example, I don’t want to suddenly add latency because I hit pages that have been swapped out. I want guarantees about my API call latency variance, at least when the server isn’t overloaded.
I DON’T WANT THE KERNEL PRIORITIZING CACHE OVER NRU PAGES.
The easiest way to do this is to disable swap.
baq
an hour ago
If you’re writing services in anything higher level than C you’re leaking something somewhere that you probably have no idea exists and the runtime won’t ever touch again.
eru
6 hours ago
You better not write your API in Python, or any language/library that uses amortised algorithms in the standard (like Rust and C++ do). And let's not mention garbage collection.
gnosek
2 hours ago
Or you can set the vm.swappiness sysctl to 0.
sethherr
6 hours ago
I’m asking because I genuinely don’t know - what are “pages” here?
adastra22
6 hours ago
That’s a fair question. A page is the smallest allocatable unit of RAM, from the OS/kernel perspective. The size is set by the CPU, traditionally 4kB, but these days 8kB-4MB are also common.
When you call malloc(), the allocator behind it requests big chunks of memory from the OS, in units of pages. It then divides them up into smaller, variable-length chunks to serve each malloc() request.
You may have heard of “heap” memory vs “stack” memory. The stack of course is the execution/call stack, and heap is called that because the “heap allocator” is the algorithm originally used for keeping track of unused chunks of these pages.
(This is beginner CS stuff so sorry if it came off as patronizing—I assume you’re either not a coder or self-taught, which is fine.)
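As a quick illustration of "in units of pages", this sketch rounds a byte count up to whole pages. `mmap.PAGESIZE` is the page size Python reports for the current machine; the function name `pages_needed` is made up for the example.

```python
import mmap


def pages_needed(nbytes, page_size=mmap.PAGESIZE):
    """Number of whole pages required to hold nbytes (ceiling division)."""
    return -(-nbytes // page_size)


print(mmap.PAGESIZE)             # commonly 4096, but it depends on the CPU and OS
print(pages_needed(1, 4096))     # even a single byte occupies a full page
print(pages_needed(4097, 4096))  # one byte over a page boundary needs a second page
```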
wallstop
7 hours ago
Edit:
wallstop@fridge:~$ free -m
               total        used        free      shared  buff/cache   available
Mem:           15838        9627        3939          26        2637        6210
Swap:           4095           0        4095
wallstop@fridge:~$ uptime
00:43:54 up 37 days, 23:24, 1 user, load average: 0.00, 0.00, 0.00
bayindirh
7 hours ago
The command you want to use is "free -m".
This is from another system I have close:
               total        used        free      shared  buff/cache   available
Mem:           31881        1423        1042          10       29884       30457
Swap:            976           2         974
2MB of SWAP used, 1423 MB RAM used, 29GB cache, 1042 MB Free. Total RAM 32 GB.
eru
6 hours ago
If you are interested in human consumption, there's "free --human", which decides on useful units by itself. The "--human" switch is also available for "du --human" or "df --human" or "ls -l --human". It's often abbreviated as "-h", but not always, since that also often stands for "--help".
wallstop
5 hours ago
Thanks! My other problem was formatting. Just wanted to share that I see 0 swap usage and nowhere near 100% memory usage as a counterpoint.
adgjlsfhk1
7 hours ago
The OS uses almost all the ram in your system (it just doesn't tell you because then users complain that their OS is too ram heavy). The primary thing it uses it for is caching as much of your storage system as possible. (e.g. all of the filesystem metadata and most of the files anyone on the system has touched recently). As such, if you have RAM that hasn't been touched recently, the OS can page it out and make the rest of the system faster.
adastra22
6 hours ago
At the cost of tanking performance for the less frequently used code path. Sometimes it is more important to optimize in ways that minimize worst case performance rather than a marginal improvement to typical work loads. This is often the case for distributed systems, e.g. SaaS backends.
vasco
7 hours ago
In EC2 using any kind of swapping is just wrong, the comment you replied to already made all the points that can be made though.
bayindirh
7 hours ago
From my understanding, the comment I'm replying to uses EC2 example to portray that swapping is wrong in any and all circumstances, and I just replied with my experience with my system administrator hat.
I'm not an AWS guy. I can see and touch the servers I manage, and in my experience, SWAP works, and works well.
matt-p
7 hours ago
Just for context, EC2 typically uses network storage that, for obvious reasons, often has fairly rubbish latency and performance characteristics. Swap works fine if you have local storage, though obviously it burns through your SSD/NVMe drive faster and can have other side effects on its performance (usually not particularly noticeable).
commandersaki
7 hours ago
> This is a wrong belief
This is not about belief, but lived experience. Setting up swap to me is a choice between an unresponsive system (with swap) or a responsive system with a few OOM kills or a downed system.
bayindirh
7 hours ago
> This is not about belief, but lived experience.
I mean, I manage some servers, and this is my experience.
> Setting up swap to me is a choice between a unresponsive system (with swap) or a responsive system with a few oom kills or downed system.
Sorry, but are you sure that you budgeted your system requirements correctly? A Linux system shall neither fill SWAP nor trigger OOM regularly.
eru
6 hours ago
Swap also works really well for desktop workloads. (I guess that's why Apple uses it so heavily on their Macbooks etc.)
With a good amount of swap, you don't have to worry about closing programs. As long as your 'working set' stays smaller than your RAM, your computer stays fast and responsive, regardless of what's open and idling in the background.
commandersaki
4 hours ago
It doesn’t happen often, and I have a multi user system with unpredictable workloads. It’s also not about swap filling up, but giving the pretense the system is operable in a memory exhausted state which means oom killer doesn’t run, but the system is unresponsive and never recovers.
Without swap oom killer runs and things become responsive.
Dylan16807
3 hours ago
"as soon as you hit swap" is a bad way of looking at things. Looking around at some servers I run, most of them have .5-2GB of swap used despite a bunch of gigabytes of free memory. That data is never or almost never going to be touched, and keeping it in memory would be a waste. On a smaller server that can be a significant waste.
Swap is good to have. The value is limited but real.
Also not having swap doesn't prevent thrashing, it just means that as memory gets completely full you start dropping and re-reading executable code over and over. The solution is the same in both cases, kill programs before performance falls off a cliff. But swap gives you more room before you reach the cliff.
KaiserPro
8 hours ago
Yeahna, that's just memory exhaustion.
Swap helps you use RAM more efficiently, as you keep the hot stuff in RAM and let the rest fester on disk.
Sure, if you overwhelm it, then you're gonna have a bad day, but that's the same without swap.
Seriously, swap is good, don't believe the noise.
gchamonlive
7 hours ago
It's good, and AWS shouldn't disable it by default, but it won't save the system from OOM.
matt-p
7 hours ago
I bet there's a big "burns through our SSDs faster" spreadsheet column or similar that caused it to be disabled.
gchamonlive
6 hours ago
Maybe. Or maybe it's an arbitrary decision.
Many won't enable swap. For some, swap wouldn't help anyway, but for others it could help soak up spikes. Some of the latter will upgrade to a larger instance without even evaluating whether swap could help, generating more money for AWS.
Either way it's far-fetched to derive intention from the fact.
adastra22
7 hours ago
I don’t understand. If you provision the system with enough RAM, then you can keep every page in RAM, hot or not.
akvadrako
7 hours ago
Only if you have more RAM than disk space, which is wasteful for many applications.
adastra22
6 hours ago
Running out of memory kills performance. It is better to kill the VM and restart it so that any active VM remains low latency.
That is my interpretation of what people are saying upthread, at least. To which posters such as yourself are saying “you still need swap.” Why?
eru
6 hours ago
RAM costs money, disk space costs less money.
It's a bit wasteful to provision your computers so that all the cold data lives in expensive RAM.
fluoridation
6 hours ago
>It's a bit wasteful to provision your computers so that all the cold data lives in expensive RAM.
But that's a job applications are already doing. They put data that's being actively worked on in RAM and leave all the rest in storage. Why would you need swap once you can already fit the entire working set in RAM?
vlovich123
5 hours ago
Because then you have more active working memory as infrequently used pages are moved to compressed swap and can be used for more page cache or just normal resident memory.
Swap ram by itself would be stupid but no one doing this isn’t also turning on compression.
eru
5 hours ago
Sure, some applications are written to manually do a job that your kernel can already do for you.
In that case, and if you are only running these applications, the need for swap is much less.
fluoridation
5 hours ago
You mean to tell me most applications you've ever used read the entire file system, loading every file into memory, and rely on the OS to move the unused stuff to swap?
eru
an hour ago
No? What makes you think so?
fluoridation
an hour ago
Then what do you mean, some applications organize hot and cold data in RAM and storage respectively? Just about every application does it.
adastra22
6 hours ago
When building distributed systems, service degradation means you’ll have to provision more systems. Cheaper to provision fewer systems with more RAM.
eru
5 hours ago
It depends on what you are doing, and how your system behaves.
If you size your RAM and swap right, you get no service degradation, but still get away with using less RAM.
But when I was at Google (about a decade ago), they followed exactly the philosophy you were outlining and disabled swap.
gchamonlive
8 hours ago
How programs use ram also changed from the 90s. Back then they were written targeting machines that they knew would have a hard time fitting all their data in memory, so hitting swap wouldn't hurt perceived performance too drastically since many operations were already optimized to balance data load between memory and disk.
Nowadays when a program hits swap it's not going to fall back to a different memory usage profile that prioritises disk access. It's going to use swap as if it were actual RAM, so you get to see the program choking the entire system.
winrid
8 hours ago
Exactly. Nowadays, most web services are run in a GC'ed runtime. That VM will walk pointers all over the place and reach into swap all the time.
cogman10
7 hours ago
Depends entirely on the runtime.
If your GC is a moving collector, then absolutely this is something to watch out for.
There are, however, a number of runtimes that will leave memory in place. They are effectively just calling `malloc` for the objects and `free` when the GC algorithm detects an object is dead.
Go, the CLR, Ruby, Python, Swift, and I think node(?) all fit in this category. The JVM has a moving collector.
masklinn
21 minutes ago
Python’s not a mover but the cycle breaker will walk through every object in the VM.
Also since the refcounts are inline, adding a reference to a cold object will update that object. IIRC Swift has the latter issue as well (unless the heap object’s RC was moved to the side table).
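The inline refcount point can be seen from Python itself. This is a small sketch assuming CPython, where `sys.getrefcount` reports the count stored in the object header; the `Payload` class is just a placeholder object.

```python
import sys


class Payload:
    """Placeholder for some long-lived, rarely touched object."""


obj = Payload()
before = sys.getrefcount(obj)
holder = [obj]                  # taking a new reference writes to obj's own header
after = sys.getrefcount(obj)
delta = after - before
print(delta)
```

Because the counter lives inside the object, merely referencing a cold object dirties the page it sits on, which forces that page back into RAM if it had been swapped out.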
manwe150
3 hours ago
MemBalancer is a relatively new analysis paper that argues having swap allows maximum performance by allowing small excesses, which avoids needing to over-provision RAM instead. The kind of GC does not matter, since data spends very little time in that state and, on the flip side, most of the time the application has access to twice as much memory to use.
eru
6 hours ago
A moving GC should be better at this, because it can compact your memory.
cogman10
5 hours ago
A moving collector has to move to somewhere and, generally by its nature, it's constantly moving data all across the heap. That's what makes it end up touching a lot more memory while also requiring more memory. On minor collections it'll move memory between 2 different locations and on major collections it'll end up moving the entire old gen.
It's that "touching" of all the pages controlled by the GC that ultimately wrecks swap performance. But also the fact that moving collectors like to hold onto memory, as downsizing is pretty hard to do efficiently.
Non-moving collectors are generally ultimately using C allocators which are fairly good at avoiding fragmentation. Not perfect and not as fast as a moving collector, but also fast enough for most use cases.
Java's G1 collector would be the worst example of this. It's constantly moving blocks of memory all over the place.
eru
an hour ago
> It's that "touching" of all the pages controlled by the GC that ultimately wrecks swap performance. But also the fact that moving collectors like to hold onto memory, as downsizing is pretty hard to do efficiently.
The memory that's now not in use, but still held onto, can be swapped out.
zozbot234
7 hours ago
Every garbage collector has to constantly sift through the entire reference graph of the running program to figure out what objects have become garbage. Generational GC's can trace through the oldest generations less often, but that's about it.
Tracing garbage collectors solve a single problem really really well - managing a complex, possibly cyclical reference graph, which is in fact inherent to some problems where GC is thus irreplaceable - and are just about terrible wrt. any other system-level or performance-related factor of evaluation.
cogman10
6 hours ago
> Every garbage collector has to constantly sift through the entire reference graph of the running program to figure out what objects have become garbage.
There's a lot of "it depends" here.
For example, an RC garbage collector (like Swift and Python?) doesn't ever trace through the graph.
The reason I brought up moving collectors is by their nature, they take up a lot more heap space, at least 2x what they need. The advantage of the non-moving collectors is they are much more prompt at returning memory to the OS. The JVM in particular has issues here because it has pretty chunky objects.
Dylan16807
3 hours ago
> The reason I brought up moving collectors is by their nature, they take up a lot more heap space, at least 2x what they need.
If the implementer cares about memory use it won't. There are ways to compact objects that are a lot less memory-intensive than copying the whole graph from A to B and then deleting A.
eru
6 hours ago
Modern garbage collectors have come a long way.
Even not so modern ones: have you heard of generational garbage collection?
But even in eg Python they introduced 'immortal objects' which the GC knows not to bother with.
zoeysmithe
7 hours ago
This is really interesting and I've never really heard about this. What is going on with the kernel team then? Are they just going to keep swap as-is for backwards compatibility while everyone else just disables it? Or is this advice just for high performance clusters?
kccqzy
7 hours ago
No. I use swap for my home machines. Most people should leave swap enabled. In fact I recommend the setup outlined in the kernel docs for tmpfs: https://docs.kernel.org/filesystems/tmpfs.html which is to have a big swap and use tmpfs for /tmp and /var/tmp.
gchamonlive
7 hours ago
As someone else said, swap is important not only in the case where the system exhausts main memory; it's also used to make efficient use of system memory before that (caching, offloading page blocks that aren't frequently used to swap, etc.)
slyall
4 hours ago
My 2 cents is that in a lot of cases swap is being used for unimportant stuff, leaving more RAM for your app. Do a "ps aux" and look at all the RAM used by weird stuff. Good news is those things will be swapped out.
Example on my personal VPS
$ free -m
               total        used        free      shared  buff/cache   available
Mem:            3923        1225         328         217        2369        2185
Swap:           1535        1335         200
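To put a number on how much of the swap is in use, here is a minimal sketch that reads `free -m` style output on stdin and prints swap usage as a whole percentage. The function name `swap_used_pct` is made up for illustration, and it assumes the usual `Swap: total used free` column order.

```shell
# Hypothetical helper: print swap usage as an integer percentage (truncated).
# Reads `free -m` style output on stdin; prints 0 if no swap is configured.
swap_used_pct() {
    awk '$1 == "Swap:" {
        if ($2 > 0)
            printf "%d\n", ($3 * 100) / $2
        else
            print 0
    }'
}

# Usage: free -m | swap_used_pct
```

For the numbers above (1335 of 1535 MB used) this prints 86.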
LaurensBER
8 hours ago
The beauty of ZRAM is that on any modern-ish CPU it's surprisingly fast. We're talking 2-3 ms instead of 2-3 seconds ;)
I regularly use it on my Snapdragon 870 tablet (not exactly a top of the line CPU) to prevent OOM crashes (it's running an ancient kernel and the Android OOM killer basically crashes the whole thing) when running a load of tabs in Brave and a Linux environment (through Tmux) at the same time.
ZRAM won't save you if you do actually need to store and actively use more than the physical memory but if 60% of your physical memory is not actively used (think background tabs or servers that are running but not taking requests) it absolutely does wonders!
On most (web) app servers I happily leave it enabled to handle temporary spikes, memory leaks or applications that load a whole bunch of resources that they never ever use.
I'm also running it on my Kubernetes cluster. It allows me to set reasonably strict memory limits while still having the certainty that Pods can handle (short) spikes above my limit.
akerl_
6 hours ago
Is it possible you misread the comment you're replying to? They aren't recommending adding swap, they're recommending adjusting the memory tunables to make the OOM killer a bit more aggressive so that it starts killing things before the whole server goes to hell.
the8472
6 hours ago
YMMV. Garbage-collected/pointer-chasing languages suffer more from swapping because they touch more of the heap all the time. AWS suffers more from swap because EBS is ridiculously slow and even their instance-attached NVMe is capped compared to physical NVMe sticks.
henryfjordan
8 hours ago
Does HDD vs SSD matter at all these days? I can think of certain caching use-cases where swapping to an SSD might make sense, if the access patterns were "bursty" to certain keys in the cache
winrid
8 hours ago
It's still extremely slow and can cause very unpredictable performance. I have swap set up with swappiness=1 on some boxes, but I wouldn't generally recommend it.
elwebmaster
4 hours ago
what an ignorant and clueless comment. Guess what? Today's disks are NVMe drives which are orders of magnitude faster than the 5400rpm HDDs of the 90s. Today's swap is 90s RAM.
zymhan
7 hours ago
Where on earth did you get this misconception?
commandersaki
7 hours ago
Lived experience? With swap, the system stays up but is unresponsive; without it, the system is either responsive thanks to the OOM kill or completely down.
GuinansEyebrows
7 hours ago
in either case, what do you do? if you can't reach a box and it's otherwise safe to do so, you just reboot it. so is it just a matter of which situation occurs more often?
commandersaki
3 hours ago
The thing is you can survive memory exhaustion if the oom killer can do its job, which it can't many times when there's swap. I guess the topmost response to this thread talks about an earlyoom tool that might alleviate this, but I've never used it, and I don't find swap helpful anyway so there's no need for me to go down this route.
01HNNWZ0MV43FF
8 hours ago
It's not just 3 seconds for a button click, every time I've run out of RAM on a Linux system, everything locks up and it thrashes. It feels like 100x slowdown. I've had better experiences when my CPU was underclocked to 20% speed. I enable swap and install earlyoom. Let processes die, as long as I can move the mouse and operate a terminal.
zozbot234
7 hours ago
> It feels like 100x slowdown.
Yup, this is a thing. It happens because file-backed program text and read-only data eventually get evicted from RAM (to make room for process memory) so every access to code and/or data beyond the current 4K page can potentially involve a swap-in from disk. It would be nice if we had ways of setting up the system so that pages of code or data that are truly critical for real-time responsiveness (including parts of the UI) could not get evicted from RAM at all (except perhaps to make room for the OOM reaper itself to do its job) - but this is quite hard to do in practice.
shrubble
7 hours ago
It's always a good idea to have a tiny amount of swap just in case. Like 1GB.
akerl_
6 hours ago
Why?
CGamesPlay
an hour ago
Because some portion of the RAM used by your daemons isn't actually being accessed, and using that RAM to store file cache is actually a better use than storing idle memory. The old rule about "as much swap as main memory" definitely doesn't hold any more, but a few GB to store unneeded wired memory to dedicate more room to file cache is still useful.
As a small example from a default Ubuntu installation, "unattended-upgrades" is holding 22MB of RSS, and will not impact system performance at all if it spends next week swapped out. Bigger examples can be found in monolithic services where you don't use some of the features but still have to wire them into RAM. You can page those inactive sections of the individual process into swap, and never notice.
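If you want to check how much of a given daemon is actually sitting in swap, the kernel reports it as `VmSwap` in `/proc/<pid>/status`. A minimal sketch, with a made-up function name `proc_swap_kb`, that takes the path to such a status file:

```shell
# Hypothetical helper: print how many kB of a process currently sit in swap.
# Takes the path to a /proc/<pid>/status file; prints 0 if there is no
# VmSwap line (kernel threads, for example, have none).
proc_swap_kb() {
    awk '$1 == "VmSwap:" { kb = $2 } END { print kb + 0 }' "$1"
}

# Usage (the current shell, as an example): proc_swap_kb "/proc/$$/status"
```

Comparing that number with the process's RSS gives a rough idea of how much idle memory has been paged out in favor of file cache.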
angch
2 hours ago
Like a highway brake failure ramp, it gives you room to handle failures more gently, so services don't just get killed outright. If you monitor your swap usage, any usage of swap gives you early warning that your services already require more memory.
Gives you some time to upgrade, or tune services before it goes ka-boom.
akerl_
2 hours ago
If your memory usage is creeping up, the way you'll find out that you need more memory is by monitoring memory usage via the same mechanisms you'd hypothetically use to monitor your swap usage.
If your memory usage spikes suddenly, a nominal amount of swap isn't stopping anything from getting killed; you're at best buying yourself a few seconds, so unless you spend your time just staring at the server, it'll be dead anyways.
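One monitoring mechanism that reacts to pressure rather than raw usage is the kernel's PSI interface mentioned upthread. A minimal sketch, assuming a kernel with PSI enabled and the usual `some avg10=... avg60=... avg300=... total=...` line format; the function name `psi_some_avg10` is made up:

```shell
# Hypothetical helper: print the 10-second "some" memory pressure average,
# i.e. the percentage of time at least one task was stalled on memory.
# Reads /proc/pressure/memory style input on stdin.
psi_some_avg10() {
    awk '$1 == "some" { sub(/^avg10=/, "", $2); print $2 }'
}

# Usage: psi_some_avg10 < /proc/pressure/memory
```

What threshold counts as "too much" depends on the workload, so treat any alerting cutoff as something to tune rather than a fixed number.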
statictype
7 hours ago
Thanks for this. We resorted to setting ram thresholds in systemd.
Is earlyoom a better solution than that to prevent an erratic process from making an instance unresponsive?
cmurf
an hour ago
Some workloads may do better with zswap. Cache is compressed, and pages are evicted to disk-based swap on an LRU basis.
The case of swap thrashing sounds like a misbehaving program, which can maybe be tamed by oomd.
System responsiveness though needs a complete resource control regime in place, that preserves minimum resources for certain critical processes. This is done with cgroupsv2. By establishing minimum resources, the kernel will limit resources for other processes. Sure, they will suffer. That’s the idea.
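As a rough sketch of what "establishing minimum resources" can look like with systemd on cgroups v2: the `MemoryMin=` setting maps to the cgroup's `memory.min`. The helper name `memory_min_dropin`, the 64M value and the `ssh.service` unit name below are all assumptions for illustration (the unit is called `sshd.service` on some distros).

```shell
# Hypothetical helper: print a systemd drop-in that asks the kernel to
# protect the given amount of memory for a unit from reclaim.
memory_min_dropin() {
    printf '[Service]\nMemoryMin=%s\n' "$1"
}

# Usage (as root; unit name and size are examples only):
#   mkdir -p /etc/systemd/system/ssh.service.d
#   memory_min_dropin 64M > /etc/systemd/system/ssh.service.d/memory-min.conf
#   systemctl daemon-reload && systemctl restart ssh.service
```

Note that the protection is hierarchical, so the parent slices generally need a matching allocation for it to take full effect.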
nurettin
an hour ago
Of course swap should be enabled. But oom killer has always allowed access to an otherwise unreachable system. The pause is there so you can impress your junior padawan who rushed to you in a hurry.
cactusplant7374
9 hours ago
What's the performance hit from compressing ram?
YouAreWRONGtoo
9 hours ago
It's sometimes not a hit, because CPUs have caches and memory bandwidth is the limiting factor.
aidenn0
8 hours ago
Depends on the algorithm (and how much CPU is in use); if you have a spare CPU, the faster algorithms can more-or-less keep up with your memory bandwidth, making the overhead negligible.
And of course the overhead is zero when you don't page-out to swap.
speedgoose
9 hours ago
I haven’t scientifically measured, but you don’t compress the whole ram. It is more about reserving a part of the ram to have very fast swap.
For an algorithm using the whole memory, that’s a terrible idea.
LargoLasskhyfv
an hour ago
>...but you don’t compress the whole ram.
I do: https://postimg.cc/G8Gcp3zb (casualmeasurement.png)
sokoloff
8 hours ago
> It is more about reserving a part of the ram to have very fast swap.
I understand all of those words, but none of the meaning. Why would I reserve RAM in order to put fast swap on it?
vlovich123
8 hours ago
Swap to disk involves a relatively small pipe (usually 10x smaller than RAM). So instead of paying the cost to page out to disk immediately, you create compressed pages and store that in a dedicated RAM region for compressed swap.
This has a number of benefits: in practice more “active” space is freed up as unused pages are compressed and often compressible. Often times that can be freed application memory that is reserved within application space but in the free space of the allocator, especially if that allocator zeroes those pages in the background, but even active application memory (eg if you have a browser a lot of the memory is probably duplicated many times across processes). So for a usually invisible cost you free up more system RAM. Additionally, the overhead of the swap is typically not much more than a memcpy even compressed which means that you get dedup and if you compressed erroneously (data still needed) paging it back in is relatively cheap.
It also plays really well with disk swap since the least frequently used pages of that compressed swap can be flushed to disk leaving more space in the compressed RAM region for additional pages. And since you’re flushing and retrieving compressed pages from disk you’re reducing writes on an SSD (longevity) and reducing read/write volume (less overhead than naive direct swap to disk).
Basically if you think of it as tiered memory, you’ve got registers, l1 cache, l2 cache, l3 cache, normal RAM, compressed swap RAM, disk swap - it’s an extra interim tier that makes the system more efficient.
waynesonfire
8 hours ago
> zram, formerly called compcache, is a Linux kernel module for creating a compressed block device in RAM, i.e. a RAM disk with on-the-fly disk compression. The block device created with zram can then be used for swap or as a general-purpose RAM disk
To clarify OP's representation of the tool, it compresses swap space, not resident RAM. Outside of niche use-cases, compressing swap has overall little utility.
coppsilgold
5 hours ago
Incorrect, with zram you swap ram to compressed ram.
It has the benefit of absorbing memory leaks (which for whatever reason compress really well) and compressing stale memory pages.
Under actual memory pressure performance will degrade. But in many circumstances where your powerful CPU is not fully utilized you can 2x or even 3x your effective RAM (you can opt for zstd compression). zram also enables you to make the trade-off of picking a more powerful CPU for the express purpose of multiplying your RAM if the workload is compatible with the idea.
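To see what ratio you are actually getting, zram exposes its counters in `/sys/block/<dev>/mm_stat`. A minimal sketch, assuming the first two fields are `orig_data_size` and `compr_data_size` in bytes; the function name `zram_ratio` is made up and `zram0` is just the usual first device:

```shell
# Hypothetical helper: print the zram compression ratio (original / compressed).
# Reads one mm_stat line on stdin; field 1 is orig_data_size and field 2 is
# compr_data_size, both in bytes. Prints 0.00 if nothing is stored yet.
zram_ratio() {
    awk '{
        if ($2 > 0)
            printf "%.2f\n", $1 / $2
        else
            print "0.00"
    }'
}

# Usage: zram_ratio < /sys/block/zram0/mm_stat
```

Dividing by the third field (`mem_used_total`) instead gives a slightly more pessimistic figure, since it includes allocator overhead.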
PS: On laptops/workstations, zram will not interfere with an SSD swap partition if you need it for hibernation. Though it will almost never be used for anything else if you configure your zram to be 2x your system memory.
dboreham
6 hours ago
Haven't used swap since 2010.
awesome_dude
7 hours ago
How do you get swap on a VPS?
justsomehnguy
6 hours ago
Search "linux enable swap in a file"
To enable a swap file in Linux, first create the swap file using a command like sudo dd if=/dev/zero of=/swapfile bs=1G count=1 for a 1GB file. Then, set it up with sudo mkswap /swapfile and activate it using sudo swapon /swapfile. To make it permanent, add /swapfile swap swap defaults 0 0 to your /etc/fstab file.
collinmanderson
6 hours ago
Yes. I think you might also need to chmod 600 /swapfile. I do this on all my VPSes; it especially helps for small ones with only 1GB of RAM:
fallocate -l 1G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
Works really well with no problems that I've seen. Really helps give a bit more of a buffer before applications get killed. Like others have said, with SSD the performance hit isn't too bad.
awesome_dude
4 hours ago
IME SWAP has been explicitly disabled by the VPS providers.
Partly it's a money thing (they want to sell you RAM), partly it's so that the shared disk isn't getting thrashed by multiple VPS
awesome_dude
6 hours ago
Strongly suggest you try doing that on a VPS, then report back
ahepp
6 hours ago
What do you think is going to happen? I tested it out on an ec2 instance just now and it seems to have worked as one would expect.
awesome_dude
5 hours ago
EC2 != VPS
cmpxchg8b
26 minutes ago
They both offer virtualized guests under a hypervisor host. EC2 does have more offload specialization hardware but for the most part they are functionally equivalent, unless I'm missing something...
justsomehnguy
6 hours ago
https://news.ycombinator.com/item?id=45007821
And that was like... two years ago? 1GB of RAM and actually ~700MB usable before I found the proper magik incantations to really disable kdump.
Also have used 1GB machines for literally years.
Strongly suggest you shouldn't strongly suggest.
awesome_dude
5 hours ago
Uh.. your link... doesn't show how a VPS can have SWAP enabled
You do understand what's being discussed... right?
justsomehnguy
5 hours ago
Literally up the chain: https://news.ycombinator.com/item?id=45663111
Or you have a very peculiar understanding of what 'VPS' means.