Porting OpenVMS to the Itanium Processor Family (2003) [pdf]

30 points | posted 11 hours ago by naves

20 Comments

twoodfin

10 hours ago

The Apache numbers pretty much give the game away: an Itanium clocked 50% higher was losing to a two-year-old Alpha by about 20% on throughput at peak.

VLIW made sense when Intel wanted to win the FP-heavy workstation market. But while it was in development, integer-heavy web workloads became dominant and that was basically the ballgame.

johndoe0815

8 hours ago

The world would be much nicer if we still had new Alpha CPUs. Digital designed it as an architecture to last 25 years, with headroom for a 1000x increase in performance over that span.

Now we have RISC-V reinventing the wheel. Not the worst outcome, but we could have had it so much better...

aardvark179

6 hours ago

Much as some aspects of the Alpha were great, its weak memory model would have resulted in even more concurrency issues than we have now, and way more explicit fences.
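
For concreteness, here's a minimal C11 sketch (names are mine, purely illustrative) of the classic message-passing pattern where Alpha's model bites. Alpha is famously the one architecture that does not even order data-dependent loads, so the acquire below costs a real mb instruction there:

    #include <stdatomic.h>
    #include <stddef.h>

    typedef struct { int payload; } msg_t;

    static msg_t slot;
    static _Atomic(msg_t *) mailbox = NULL;

    void producer(void) {
        slot.payload = 42;
        /* release: on Alpha the compiler must emit a barrier before
         * the pointer store so the payload is visible first */
        atomic_store_explicit(&mailbox, &slot, memory_order_release);
    }

    int consumer(void) {
        msg_t *p;
        /* acquire: a real fence on Alpha, nearly free on x86 --
         * even the dependent load p->payload needs it there */
        while ((p = atomic_load_explicit(&mailbox,
                                         memory_order_acquire)) == NULL)
            ;
        return p->payload; /* safe only because of the acquire */
    }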

ahoka

8 hours ago

It couldn't even handle unaligned access in its original form. Surely an architecture to last for 25 years.

fredoralive

7 hours ago

Not handling unaligned access gracefully is a classic RISC "feature", part of the general simplification of a processor to its basics; I'm not sure it's really an Alpha-specific thing. Plus, they added some instructions to ease the pain in 1996.

The main issue people tend to bring up with Alpha is the very loose memory model, of the "things happen, but different processors may not really agree on the order they happened in" type of thing (plus, isn't it rude to want to know what other cores have in their cache?). Which would be a pain in our modern multicore world.

Of course we don't know how things would've evolved over time. ARM (at least on big cores [1]) shifted towards a forgiving model for unaligned access, so it's possible Alpha would've similarly moved to a more forgiving environment for low-level programmers.

[1] On embedded stuff, you're going to the HardFault handler.

dfox

7 hours ago

BWX does not help with unaligned accesses; it addresses the fact that the original Alpha did not even have instructions for memory accesses smaller than a 32-bit longword. That kind of becomes an issue when you start building systems on PC-like hardware. (Another related "feature" is that EV5 has no equivalent of MTRRs; instead, dealing with the weirdness of VGA framebuffer accesses is part of the architecture specification, by means of a hardcoded uncached memory region.)
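
To illustrate what that meant in practice, here's a rough C rendering (function name mine, little-endian assumed as on Alpha, aliasing rules ignored to mirror the hardware) of the ldq_u/extbl sequence a pre-BWX compiler had to emit for a single byte load:

    #include <stdint.h>

    uint8_t load_byte_pre_bwx(const uint8_t *addr) {
        uintptr_t a = (uintptr_t)addr;
        /* ldq_u: fetch the aligned quadword containing addr,
         * the low three address bits are ignored */
        uint64_t q = *(const uint64_t *)(a & ~(uintptr_t)7);
        /* extbl: shift the wanted byte down to bit 0 and mask */
        return (uint8_t)(q >> ((a & 7) * 8));
    }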

fredoralive

7 hours ago

TBH, I'm not an expert on Alpha, and wow, as an embedded programmer by trade, that's a really wacky way of handling memory access. I guess it made more sense in the minicomputer world where you control the whole stack, but as a more general-purpose architecture it's, well, not the greatest, is it?

dfox

5 hours ago

There are a lot of good things to be said about Alpha, and it is probably the most RISC of all the 90's RISC ISAs, but the actual hardware is full of weirdness, and the real CPUs were deeply pipelined designs (think Intel NetBurst) that prioritized high clock rates and huge straight-line throughput above all else, which is also why they ran really hot and could not really be scaled down for embedded use. Taking the good ideas from that while discarding the "speed demon" design philosophy is part of why AMD became relevant again in the 00's, with amd64 cementing that position. (The AMD K7 is very much an Alpha-related design, to the extent that chipsets are interchangeable between K7 and EV6. The interesting part is that these CPUs do not have an FSB in the "bus" sense; there is a point-to-point link between CPU and chipset.)

fredoralive

5 hours ago

AMD using an Alpha bus for early Athlons feels like a weird lost opportunity. Cheap x86-aimed motherboards that could also run Alpha chips, with Windows 2000 + FX!32 for compatibility, might've had a chance to shine, albeit a slight one. Sadly, by then Compaq had already boarded the Itanic…

formerly_proven

7 hours ago

Alpha had a super loosey-goosey memory model because, IIRC, the cache size they wanted couldn't be built with the performance they needed on the process they had, so they made it from two wholly independent cache banks, both serving the same core through a shared queue.

formerly_proven

7 hours ago

DEC designed StrongARM pretty much immediately after Alpha shipped, because Alpha chips ran hot as frick and DEC engineers didn't see a path to a low-power Alpha.

jcranmer

7 hours ago

The Itanium architecture is a weird, weird architecture. It's not weird in the sense of Alpha's weirdness (e.g., the super weak memory model), which can be fairly easily compensated for; it's weird in several ways that make me just stare at the manual and go "how are you supposed to write a compiler for this?" It's something that requires a sufficiently smart compiler to get good performance, while at the same time being designed so as to make writing that sufficiently smart compiler effectively impossible.

It wouldn't surprise me if Itanium actually had pretty compelling SPECint numbers. But a lot of those compelling numbers would have come from massive overtuning of the compiler to the benchmark specifically. Something that's going to be especially painful for I/O-heavy workloads is that the gargantuan register files make any context switch painfully slow.
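
To give one concrete flavor of the problem (a toy sketch, names mine): Itanium leans on the compiler to if-convert branches into instructions guarded by predicate registers, and the compiler alone has to prove when executing both arms is profitable. In C terms the transformation looks like this:

    #include <stdint.h>

    /* what the programmer writes */
    int64_t branchy(int64_t x) {
        if (x < 0)
            return -x;
        return x * 2;
    }

    /* the if-converted shape an IA-64 compiler wants: on real
     * hardware each arm's instructions carry complementary
     * predicate registers set by a single compare, so the core
     * never sees a branch */
    int64_t if_converted(int64_t x) {
        int64_t neg = -x;      /* arm 1, executed unconditionally */
        int64_t dbl = x * 2;   /* arm 2, executed unconditionally */
        return (x < 0) ? neg : dbl;  /* select, no branch */
    }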

aleph_minus_one

5 hours ago

There is a co-evolution between compilers, programming languages, and CPUs (or, more generally, ASICs). I consider it very plausible that one could develop a programming language that makes it sufficiently easy for a programmer to write performant code for an Itanium, but such a language would look quite different from C or C++.

aleph_minus_one

5 hours ago

> The Apache numbers pretty much give the game away: an Itanium clocked 50% higher was losing to a two-year-old Alpha by about 20% on throughput at peak.

This is not just a benchmark of the CPUs, but also of the compilers involved. It is well known that it was very hard to write a compiler that could generate code harnessing the optimization potential of Itanium's instruction set.

formerly_proven

7 hours ago

Itanium was primarily developed by Intel; Itanium 2 primarily by the HP team that was also responsible for the competitive PA-RISC chips (or so they say). In any case, Itanium 2 still outperformed much later AMD Opterons and Intel Xeons running at twice the clock on numerical workloads. That's pretty impressive.

twoodfin

7 hours ago

That’s my point: If the demand for high-end compute at the turn of the millennium had looked the same as the demand for high-end compute in 1992, Itanium probably would have conquered the world.

But Tim Berners-Lee had a NeXT and some good ideas…