ot
5 days ago
Utilization is not a lie; it is a measurement of a well-defined quantity. But people make assumptions to extrapolate capacity models from it, and that is where reality diverges from expectations.
Hyperthreading (SMT) and Turbo (clock scaling) are only some of the variables causing non-linearity; there are a number of other resources that are shared across cores and "run out" as load increases, like memory bandwidth, interconnect capacity, and processor caches. Some bottlenecks can even come from the software, like spinlocks, which have a non-linear impact on utilization.
Furthermore, most CPU utilization metrics average over very long windows, from several seconds to a minute, but what really matters for the performance of a latency-sensitive server happens on the timescale of tens to hundreds of milliseconds, and a multi-second average will not distinguish bursty behavior from smooth behavior. The latter likely has much more capacity to scale up.
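A rough sketch of the difference (assuming Python with the psutil package; the sampling windows are arbitrary):

    import psutil

    # Sample utilization in 100 ms windows for ~10 s, then compare the bursty
    # fine-grained view with the single long average a dashboard would report.
    samples = [psutil.cpu_percent(interval=0.1) for _ in range(100)]

    average = sum(samples) / len(samples)
    print(f"~10 s average: {average:.1f}%  busiest 100 ms window: {max(samples):.1f}%")

A smooth 40% load and one that alternates saturated 100 ms bursts with idle gaps can show the same long average, but the second has far less headroom.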
Unfortunately, the suggested approach is not that accurate either, because it hinges on two inherently unstable concepts:
> Benchmark how much work your server can do before having errors or unacceptable latency.
The measurement of this is extremely noisy, as you want to detect the point where the server starts becoming unstable. Even if you look at a very simple queueing theory model, the derivatives close to saturation explode, so any nondeterministic noise is extremely amplified.
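To see why, take the simplest possible model (a toy M/M/1 sketch in Python; the 1 ms service time is made up, not from the article). The slope of latency with respect to utilization grows like 1/(1 - rho)^2, so tiny load differences near saturation swing the measured "knee" wildly:

    # Toy M/M/1 model: mean latency W = S / (1 - rho), where S is the service
    # time and rho is utilization. The derivative dW/drho = S / (1 - rho)^2 is
    # what "explodes" near saturation.
    service_time = 0.001  # hypothetical 1 ms per request

    def mean_latency(rho):
        return service_time / (1.0 - rho)

    def latency_slope(rho):
        return service_time / (1.0 - rho) ** 2

    for rho in (0.50, 0.80, 0.90, 0.95, 0.99):
        print(f"rho={rho:.2f}  latency={mean_latency(rho) * 1e3:7.2f} ms  "
              f"slope={latency_slope(rho) * 1e3:9.1f} ms per unit of rho")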
> Report how much work your server is currently doing.
There is rarely a stable definition of "work". Is it RPS? Request cost can vary even throughout the day. Is it instructions? Same, the typical IPC can vary.
Ultimately, the confidence intervals you get from the load testing approach might be as large as what you can get from building an empirical model from utilization measurement, as long as you measure your utilization correctly.
eklitzke
5 days ago
I agree. If you actually know what you're doing you can use perf and/or ftrace to get highly detailed processor metrics over short periods of time, and you can see the effects of things like CPU stalls from cache misses, CPU stalls from memory accesses, scheduler effects, and many other things. But most of these metrics are not very actionable anyway (the vast majority of people are not going to know what to do with their IPC, cache hit, or branch prediction numbers).
What most people care about is some combination of latency and utilization. As a very rough rule of thumb, for many workloads you can get up to about 80% CPU utilization before you start seeing serious impacts on workload latency. Beyond that you can increase utilization but you start seeing your workload latency suffer from all of the effects you mentioned.
To know how much latency is impacted by utilization you need to measure your specific workload. Also, how much you care about latency depends on what you're doing. In many cases people care much more about throughput than latency, so if that's the top metric then optimize for that. If you care about application latency as well as throughput then you need to measure both of those and decide what tradeoffs are acceptable.
tracker1
5 days ago
> There is rarely a stable definition of "work". Is it RPS? Request cost can vary even throughout the day. Is it instructions? Same, the typical IPC can vary.
I think this is probably one of the most important points... similarly, is this public-facing work dealing with any kind of user request, or is it simply crunching numbers/data to build an AI model from a stable backlog/queue?
My take has always been, with modern multi-core, hyper-threaded CPUs that are burstable, to consider ~60% a "loaded" server. Work should be split if it stays that way for any significant portion of the day. I'm mostly dealing with user-facing services, so bursts and higher-traffic portions of the day are dramatically different from lower-utilization portions of the day.
A decade ago, this led to a lot of work for cloud provisioning on demand for the heavier load times. Today it's a bit more complicated when you have servers with 100+ cores as an option for under $30k (guesstimate based on $10k CPU price). Today, I'd lean toward over-provisioning dedicated server hardware and supplementing with cloud services (and/or self-cloud-like on K8s) as pragmatically as reasonable... depending on the services of course. I'm not currently in a position where I have this level of input, though.
Just looking at how StackOverflow scaled in the early days, as an example, that approach is even more possible/prudent today, to a much larger extent... You can go a very long way with a half/full rack and a 10Gb uplink in a colo data center or two.
In any case, for me... >= 65% CPU load for >= 30m/day means it's at 100% effective utilization, and needs expansion relatively soon. Just my own take.
everforward
5 days ago
> In any case, for me... >= 65% CPU load for >= 30m/day means it's at 100% effective utilization, and needs expansion relatively soon.
I think this still depends on the workload, because IO-heavy apps hyperthread well and can push up to 100%. I think most of the apps I've worked on end up being IO-bound because "waiting on SQL results" or the more generic "waiting on downstream results" is 90% of their runtime. They might spend more time reading those responses off the wire than they do actually processing anything.
There are definitely workloads that isn't true of, though, and your metrics read about right to me.
jimmySixDOF
5 days ago
IEEE Hot Interconnects just wrapped up, and they discussed latency performance tuning for Ultra Ethernet, where traffic looks smooth on a 2- or 5-second view but at 100ms you see the obvious frame-burst effects. If you don't match your profiling to the workload, a false negative compounds your original problem: you think you already tested this, so you look elsewhere.
SAI_Peregrinus
5 days ago
That's all true, and the % part is still a lie. As you note, CPU utilization isn't linear, and percentages are linear measures. CPU utilization isn't a lie; % CPU utilization is.
ot
5 days ago
It is a linear percentage of the amount of time the CPU is not idle. It is not linear in the amount of useful work, but that's not what "utilization" means.
The lie is the assumption that CPU time is linear in useful work, but that has nothing to do with the definition of utilization, it's just something that people sometimes naively believe.
> CPU utilization isn't a lie, % CPU utilization is
What do you mean by this? Utilization is, by definition, a ratio. % just determines that the scale is in [0, 100].
perching_aix
8 hours ago
Admittedly, I'm not there on the industry frontlines reading (or writing) whitepapers on CPU design, so my knowledge on CPU internals is fairly limited. Here's the premise I'm working with:
- operations are implemented in different sub-units of each core
- operations are pipelined, to help saturate these sub-units, so multiple ops executing on different sub-units can be in-flight at the same time
- operations are reordered and their execution is predicted, to help saturate the pipelines
Given all of this, reporting the overall saturation of each core sounds like quite the challenge. It'd mean collecting data on how busy each sub-unit is versus how busy it could be, then weighing that against how saturated the pipelines leading there are. Maybe one sub-unit is being fed to its brink, but another could still be fed work and just isn't: maybe the program cannot do so, or isn't willing to, it doesn't matter.
And none of this would show up in the scheduler, I believe. From the scheduler data you get the assignment saturation, and whatever the CPU ended up executing is whatever it did. Did it only do integer math? Did it only do matrix math? Busy-spin? Something else? Maybe most sub-units remained completely dark. It's not a utilization ratio then, though, but an assignment ratio: how much time each logical core spent assigned work, versus how much it didn't.
Provided I'm not off-base, I really don't find this to be a matter of "naivety" on people's part, then. It's an honestly incorrect use of language. Regardless of the reason, e.g. if the kernel cannot actually determine the kind of utilization I describe, or if it doesn't make sense on a fundamental level somehow to try to, this still doesn't justify torturing the language by calling this utilization. It could just be referred to as what it is: assignment. This is like the difference between reserved and committed memory. Or like the difference between me working from 9-5, and me being in meetings from 9-1 and working from 1-5.
SAI_Peregrinus
5 days ago
Utilization can never reach 100%, since not all of the components of the CPU (or even one core) can actually be in use at once. Quite a few are shared between operations and thus mutually exclusive.
1718627440
4 days ago
It's of course time. It's 1 minus the fraction of time during which all components of the CPU were idle.
SirMaster
5 days ago
What about 2 workloads that both register 100% CPU usage, but one workload draws significantly more power and heats the CPU up way more? Seems like that workload is utilizing more of the CPU, more of the transistors or something.
inetknght
5 days ago
Indeed, and there's a thing called "race to sleep". That is, you want to light up as much of the core as possible as fast as possible so you can get the CPU back to idle as soon as possible to save on battery power, because having the CPU active for more time (but not using as many circuits as it "could") draws a lot more power.
MBCook
5 days ago
At the same time, it takes a certain amount of time for a CPU to switch power levels, and I remember it being surprisingly slow on some (older?) processors.
So in Linux (and I assume elsewhere) there were attempts to figure out if the cost in time/power to move up to a higher power state would be worth the faster processing, or if staying lower power but slower would end up using less power because it was a short task.
I think the last chips I remember seeing numbers for were some of the older Apple M-series chips, and they were lightning fast to switch power levels. That would certainly make it easier to figure out if it was worth going up to a higher power state, if I’m remembering correctly.
magicalhippo
5 days ago
I deliberately set my governor to the conservative one, as I hated fans spinning up for a second and then down again repeatedly. I'd much rather sacrifice a bit of battery and speed for quiet.
SirMaster
4 days ago
Can't you just cap the fan speed? Or does it actually get too hot at a lower fan speed to where it would throttle or crash?
magicalhippo
4 days ago
I wanted the full power when doing long compiles and such. Just not the fan jojo acion when neowsing the web or writing.
Also swapping the governor was trivial and reliable. Modifying fan profiles has always been a bit of a struggle for me, with huge differences in hardware support, persistence etc.
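(On Linux that's usually just a sysfs write; a rough sketch, assuming cpufreq support and root:)

    import glob

    # Switch every core's cpufreq governor to "conservative" (needs root).
    for path in glob.glob("/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor"):
        with open(path, "w") as f:
            f.write("conservative")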
porridgeraisin
4 days ago
> jojo acion when neowsing
jumping into action when browsing
magicalhippo
4 days ago
Thanks. I hate the autocorrect as it's so often wrong, but hitting the right "keys" with no tactile feedback is such a pain. I miss T9...
MBCook
5 days ago
Smart. That would drive me nuts too.
saagarjha
5 days ago
Yes, this is pretty normal; your processor will downclock to accommodate. For HPC where the workloads are pretty clearly defined it’s possible to even measure how close you’re coming to the thermal envelope and adjust the workload.
throwaway31131
5 days ago
Percent utilization for most operating systems is the amount of time the idle task is not scheduled. So for both workloads the idle task was never scheduled, hence 100% "utilization".
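For example, on Linux it's essentially a delta over the counters in /proc/stat (a minimal sketch; the 1-second window is arbitrary):

    import time

    def cpu_times():
        # First line of /proc/stat: "cpu user nice system idle iowait irq softirq ..."
        values = list(map(int, open("/proc/stat").readline().split()[1:]))
        idle = values[3] + values[4]  # idle + iowait
        return idle, sum(values)

    idle0, total0 = cpu_times()
    time.sleep(1.0)
    idle1, total1 = cpu_times()

    utilization = 1.0 - (idle1 - idle0) / (total1 - total0)
    print(f"CPU 'utilization' over the last second: {utilization * 100:.1f}%")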
BrendanLong
5 days ago
Some esoteric methods of measuring CPU utilization are to calculate either the current power usage over the max available power, or the current temperature over the max operating temperature. Unfortunately these are typically even more non-linear than the standard metrics (but they can be useful sometimes).
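A rough sketch of the power-based variant (assuming Linux with Intel RAPL exposed through powercap, and a made-up 65 W package limit):

    import time

    RAPL = "/sys/class/powercap/intel-rapl:0/energy_uj"  # package energy counter, microjoules
    MAX_WATTS = 65.0  # hypothetical package power limit

    e0 = int(open(RAPL).read())
    time.sleep(1.0)
    e1 = int(open(RAPL).read())

    watts = (e1 - e0) / 1e6  # joules consumed over ~1 second
    print(f"~{watts:.1f} W of {MAX_WATTS:.0f} W -> {100 * watts / MAX_WATTS:.0f}% 'utilization'")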
gblargg
5 days ago
Like measuring RMS of an AC voltage by running it through a heating element: https://wikipedia.org/wiki/True_RMS_converter#Thermal_conver...
PunchyHamster
5 days ago
Except it doesn't really tell you much, because having some parts of the CPU underutilized doesn't mean adding load will utilize them. For example, if the current load underutilizes the floating-point units and you have nothing else that uses them.
inetknght
5 days ago
> Like if load underutilizes floating point units
This is why I sigh really hard when people talk about some measurement of FLOPS, as if it's the only thing that matters.
It matters. Perhaps it matters a lot for specific workloads. But most general workloads are integer-based.
colejohnson66
5 days ago
But *sparkle emoji* AI *sparkle emoji*
kqr
5 days ago
Also there's dark silicon to consider – the CPU simply cannot, for thermal reasons, run power to all parts of itself at the same time.