
magicalhippo

5 months ago

I'm a bit curious why this couldn't have been covered earlier by simply having the OS disable some cores on a 4-module 8-thread part. After all, the video does point out that if only one thread uses a module, it has full access to all the resources of that module.

Also, the benchmark is clock-for-clock, so while the older Phenom II looks like it's ahead, the Bulldozer should be able to go faster still.

All that said, I really enjoyed this retrospective look.

zokier

5 months ago

There were some benchmarks at the time with disabled cores, for example: https://www.hardware.fr/articles/842-9/efficacite-cmt.html
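The one-thread-per-module idea raised above can be approximated from user space with CPU affinity. The following is a minimal sketch, not how the linked benchmarks were run: it assumes the two cores of a module are numbered adjacently (0-1, 2-3, and so on), which should be checked against the real topology (for example with `lscpu`), and the helper names are hypothetical.

```python
import os


def one_cpu_per_module(n_logical_cpus, cpus_per_module=2):
    """Return the first logical CPU of each module.

    ASSUMPTION: the cores of a module are numbered adjacently
    (0-1, 2-3, ...). Verify the real layout before relying on this.
    """
    return list(range(0, n_logical_cpus, cpus_per_module))


def pin_to_one_cpu_per_module(n_logical_cpus=8, cpus_per_module=2):
    """Restrict the current process to one CPU per module (Linux only)."""
    wanted = set(one_cpu_per_module(n_logical_cpus, cpus_per_module))
    if hasattr(os, "sched_setaffinity"):
        # Only request CPUs this process is already allowed to run on.
        allowed = os.sched_getaffinity(0) & wanted
        if allowed:
            os.sched_setaffinity(0, allowed)
    return wanted
```

Calling `pin_to_one_cpu_per_module()` before starting worker threads would keep them off the second core of each module, so no two threads share a module's front end or FPU.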

Yizahi

5 months ago

They were real though. How many ALUs were there on, say, the FX-8350? 8 ALUs. How many FPUs were there? 8 FPUs, each 128 bits wide. What alternative definition of a core does this not satisfy? The CPU was underperforming at the time and Intel fans were trying to equate their Hyperthreading with AMD's core organization, but they were always real cores.

dragontamer

5 months ago

8 Integer ALUs, 4 Vector FPUs, 8x L1 d-caches but only 4x L2 d-Caches.

And perhaps most importantly: 4x decoders/4x L1 iCache. IIRC, the entire damn chip was decoder-bound.

--------

Note: AMD Zen has 4x Integer pipelines and 4x FPU pipelines __PER CORE__. Modern high-performance systems CANNOT have a single 2x-pipeline FPU shared between two cores (averaging one pipeline per core). Modern Zen is closer to 4x pipelines per core, maybe more depending on how you count load/store units.
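To make the sharing concrete, here is a small sketch of the per-core arithmetic using only the counts quoted in this comment (8 integer cores, 4 FPUs with 2 pipelines each, 4 decoders, 4 L2 caches). The dictionary layout and function name are illustrative assumptions, not anything from AMD documentation.

```python
# Counts for an "8-core" FX part, as quoted in the comment above.
FX_COUNTS = {
    "int_cores": 8,
    "fpus": 4,
    "fpu_pipes_per_fpu": 2,
    "decoders": 4,
    "l1d": 8,
    "l2": 4,
}


def per_core(counts, resource):
    """Average number of a given resource available per integer core."""
    return counts[resource] / counts["int_cores"]


# Each 2-pipeline FPU is shared by two cores, so one pipeline per core on average.
fx_fpu_pipes_per_core = (
    FX_COUNTS["fpus"] * FX_COUNTS["fpu_pipes_per_fpu"] / FX_COUNTS["int_cores"]
)

# The comment puts Zen at 4 FPU pipelines per core.
ZEN_FPU_PIPES_PER_CORE = 4

print("decoders per core:", per_core(FX_COUNTS, "decoders"))
print("L2 caches per core:", per_core(FX_COUNTS, "l2"))
print("FPU pipelines per core:", fx_fpu_pipes_per_core)
print("Zen / FX FPU pipeline ratio:", ZEN_FPU_PIPES_PER_CORE / fx_fpu_pipes_per_core)
```

By these numbers each FX core gets half a decoder and one FPU pipeline on average, which is the gap the comment is pointing at.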

user

5 months ago

[deleted]

dannyw

5 months ago

Yup. The limited decoders meant your pipeline just wasn’t flowing every cycle, because many of the stages were sitting idle.

dragontamer

5 months ago

Note that Intel's modern e-Core has 3x decoders per core. When code is straight, they alternate (decoder#1 / decoder#2 / decoder#3). When code is branchy, they split up across different jumps aka if/else statements.

Shrinking the decoder on Bulldozer was clearly the wrong move for the FX series / AMD. Today's chips are going for a wide decoder (ex: Apple can do 8x decode per clock tick), a deep opcode cache (AMD Zen has a large opcode cache allowing for 6x-way lookup per clock tick), or Intel's new and interesting multiple-decoder thing.

sidewndr46

5 months ago

How do you know the behavior of the decoding portion of Intel's E-cores? Do you work for them?

AlotOfReading

5 months ago

People use clever code to tease out microarchitectural details and scour through public information to work these things out. Agner Fog is one example. His microarch analysis documents 3x decoders for the Tremont microarch, predecessor to Gracemont (what's currently used for E-cores).

https://www.agner.org/optimize/microarchitecture.pdf

zokier

5 months ago

The architecture of Intel cores is widely discussed and publicized. Here are some details for the E-cores mentioned: https://chipsandcheese.com/p/skymont-intels-e-cores-reach-fo...

> Leapfrogging fetch and decode clusters have been a distinguishing feature of Intel’s E-Core line ever since Tremont. Skymont doubles down by adding another decode cluster, for a total of three clusters capable of decoding a total of nine instructions per cycle.

dragontamer

5 months ago

Intel tells you this in their optimization manuals and white papers.

They want you to write code that takes advantage of their speedups. Agner Fog is a better writer (a sibling comment already linked to Agner Fog's stuff). But I also like referencing the official manuals and whitepapers as a primary source document.

Hard to beat Intel's documents on Intel chips after all.

Zardoz84

5 months ago

I had a few FX CPUs (and I still have them stored): the early cheap 4-core parts and the later-generation 8-core ones (FX 8370E). And I can say that if you run code that scales well across multiple CPUs, it excels at it (I can share an n-problem simulator that I used as a benchmark back in the day). They even aged far better than some Intel CPUs of the time, because they had 8 cores.

The FX cores had their issues. But one of them was that AMD bet too early, and too hard, that the future was a high number of cores.

zokier

5 months ago

Problem was that even for multithreaded workloads the "8 core" FX-8150 did not always win against 4 hyperthreaded Intel cores. That is pretty apparent from e.g. the benchmarks here: https://www.phoronix.com/review/intel_corei7_3770k

You can easily see the multithreaded workloads there because you have the six core 3960X as comparison too.

stn8188

5 months ago

That was a neat video, I wasn't aware of the FX architecture in that detail. I loved my FX series... I had a 6300 that got me through engineering school, and now the same basic desktop serves as my kids' gaming computer (though I was able to upgrade to a cheap 8350). It definitely still holds its own with the older games that I let the kids play!

dannyw

5 months ago

It was good value! There are few good-value CPUs, sadly. I remember using my Ryzen 3300X for years and years; it got me by on a budget.

HankStallone

5 months ago

I'm actually replacing the FX-8350 in my fileserver next week, because I was running ffmpeg on it and it kept crashing about a minute into the job, so I assume it was overheating either the CPU or something on the motherboard.

It's almost 10 years old, so I can't complain. And I think I got a check for $2 or something like that from the class-action suit.

doublepg23

5 months ago

Definitely worth replacing for the performance at this point, but is it possible it just needs a repaste? The thermal paste would’ve definitely dried out over 10 years, and that can cause overheating symptoms.

close04

5 months ago

I've been running one daily for the past ~12-13 years, and the stability is impeccable, but the performance is as you'd imagine. It's more likely that motherboard age and degradation of various components would lead to instability than the CPU itself.

HankStallone

5 months ago

Good point. I was kind of itching to upgrade that box anyway, but maybe I should repaste it and make it a backup server.

puskavi

5 months ago

Wouldn't be surprised if caps on mobo have been cooked by all the heat

FancyFane

5 months ago

The Phenom II will always have a special place in my heart, being the CPU of choice in my first PC build in 2011. It's wild to see it's still being compared to modern CPUs, and winning against the competition in select benchmarks.

ahartmetz

5 months ago

I completely skipped the FX disaster / Intel dominance phase by holding on to a Phenom II X6. At the time, my upgrade policy was "when twice the performance is available for the same price as the old part". That never quite happened with Intel's 4 core parts.

flyinghamster

5 months ago

One of my old builds was a Phenom II X2 550 Black, where I found that I could either overclock it, or unlock two more cores, but not both. I chose the cores, and it ran that way for a long time. That was one of the best bang-for-the-buck deals I ever ran into for a CPU.

Zardoz84

5 months ago

They had real cores. It's only that each pair of cores shared a floating-point unit.

wmf

5 months ago

Nothing could really save the FX series. It had lower performance than Intel with twice the die size.

What if AMD FX had "real" cores? [video]

25 Comments
