jjcm
9 hours ago
I'm definitely excited to see 64 bit as a default part of the spec. A lot of web apps have been heavily restricted by this, in particular any online video editors. We see a bunch of restrictions due to the 32-bit cap today here at Figma. One thing I'm curious about, though, is whether mobile devices will keep their addressable per-tab memory cap the same. It's often OS-defined rather than tied to the 32-bit space.
prewett
3 hours ago
I guess I’m just a crusty ol’ greybeard C++ developer, but it seems like a video editor is out of place in a document browser. There’s a perfectly good native operating system that nobody uses any more.
If we think we need a more thoroughly virtualized machine than traditional operating system processes give us (which I think is obvious), then we should be honest and build a virtualization abstraction that is actually what we want, rather than converting a document reader into a video editor…
wyldfire
3 hours ago
> ... document browser ... document reader ...
I'm going to assume you're being sincere. But even the crustiest among us can recognize that the modern purpose for web browsers is not (merely) documents. Chances are, many folks on HN in the last month have booked tickets for a flight or bought a home or a car or watched a cat video using the "document browser".
> If we think we need a more thoroughly virtualized machine than traditional operating system processes give us (which I think is obvious)...
Like ... the WASM virtual machine? What if the WASM virtual machine were the culmination of learning from previous not-quite-good-enough VMs?
WASM -- despite its name -- is not truly bound to the "document" browser.
xmichael909
3 hours ago
Personally not a fan of Windows 95 in the browser, but the browser stopped being a "document reader" a decade ago. It's the only universal, sandboxed runtime, and everything is moving in that direction: safe code. WASM isn't a worse VM; it's a different trade-off: portable, fast-starting, capability-scoped compute without shipping an OS. Raw devices still have their place (servers). If you need safe distribution plus performance that's "good enough", WASM in the browser is going to be the future of the client.
RussianCow
3 hours ago
The browser removes the friction of needing to install specialized software locally, which is HUGE when you want people to actually use your software. Figma would have been dead in the water if it wasn't stupidly simple to share a design via a URL to anyone with a computer and an internet connection.
imoverclocked
3 hours ago
There are projects to run WASM on bare metal.
I do agree that we tend to run a lot in a web-browser or browser environment though. It seems like a pattern that started as a hack but grew into its own thing through convenience.
It would be interesting to sit down with a small group and figure out exactly what is good/bad about it and design a new thing around the desired pattern that doesn't involve a browser-in-the-loop.
syndeo
18 minutes ago
> run WASM on bare metal
Heh, reminds me of those boxes Sun used to make that only ran Java. (I don’t know how far down Java actually went; perhaps it was Solaris for the lower layers now that I think about it…)
chrsig
3 hours ago
Think of it like emacs. Browsers are perfectly good operating systems just needing a better way to view the web.
mwcz
2 hours ago
That's too true to be funny!
FpUser
2 hours ago
I'm a 64yo fart. Started programming in machine code. Native is my bread and butter, yet I have no problem using the browser as a deployment platform for some types of applications. Almost everything has its own use.
renehsz
7 hours ago
Unfortunately, Memory64 comes with a significant performance penalty because the wasm runtime has to check bounds (which wasn't necessary on 32-bit as the runtime would simply allocate the full 4GB of address space every time).
But if you really need more than 4GB of memory, then sure, go ahead and use it.
jsheard
7 hours ago
The comedy option would be to use the new multi-memory feature to juggle a bunch of 32bit memories instead of a 64bit one, at the cost of your sanity.
baq
7 hours ago
didn't we call it 'segmented memory' back in DOS days...?
munificent
7 hours ago
We call it "pointer compression" now. :)
mananaysiempre
6 hours ago
Seriously though, I’ve been wondering for a while whether I could build a GCC for x86-64 that would have 32-bit (low 4G) pointers (and no REX prefixes) by default and full 64-bit ones with __far or something. (In this episode of Everything Old Is New Again: the Very Large Memory API[1] from Windows NT for Alpha.)
[1] https://devblogs.microsoft.com/oldnewthing/20070801-00/?p=25...
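The "pointer compression" idea is simple enough to sketch in a few lines. This is illustrative Python pseudocode, not any real runtime's scheme; the base address and helper names are made up:

```python
# Sketch of pointer compression: store 32-bit offsets from a heap base
# instead of full 64-bit pointers. Pointers shrink to half size as long
# as the whole heap fits in a 4 GiB window.
HEAP_BASE = 0x7F00_0000_0000  # hypothetical 64-bit base of the heap
MAX_OFFSET = 2**32            # compressed pointers span 4 GiB

def compress(ptr: int) -> int:
    off = ptr - HEAP_BASE
    assert 0 <= off < MAX_OFFSET, "object outside the compressible heap"
    return off  # fits in 32 bits

def decompress(off: int) -> int:
    return HEAP_BASE + off

p = HEAP_BASE + 0x1234
assert decompress(compress(p)) == p
```

The `__far` idea in the comment above is the escape hatch for the rare pointer that doesn't fit in that window.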
o11c
5 hours ago
A moderate fraction of the work is already done using:
https://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html
Unfortunately the obvious `__attribute__((mode(...)))` errors out if anything but the standard pointer-size mode (usually SI or DI) is passed.
Or you may be able to do it based on x32, since your far pointers are likely rare enough that you can do them manually. Especially in C++. I'm pretty sure you can just call "foreign" syscalls if you do it carefully.
dajtxx
2 hours ago
6502 zero page instruction vibes.
magicalhippo
6 hours ago
It was glorious I tell you.
Especially how you could increase the segment value by one or the offset by 16 and you would address the same memory location. Think of the possibilities!
And if you wanted more than 1MB you could just switch memory banks[1] to get access to a different part of memory. Later there was a newfangled alternative[2] where you called some interrupt to swap things around but it wasn't as cool. Though it did allow access to more memory so there was that.
Then virtual mode came along and it's all been downhill from there.
[1]: https://en.wikipedia.org/wiki/Expanded_memory
[2]: https://hackaday.com/2025/05/15/remembering-more-memory-xms-...
mananaysiempre
3 hours ago
> Think of the possibilities!
Schulman’s Unauthorized Windows 95 describes a particularly unhinged one: in the hypervisor of Windows/386 (and subsequently 386 Enhanced Mode in Windows 3.0 and 3.1, as well as the only available mode in 3.11, 95, 98, and Me), a driver could dynamically register upcalls for real-mode guests (within reason), all without either exerting control over the guest’s memory map or forcing the guest to do anything except a simple CALL to access it. The secret was that all the far addresses returned by the registration API referred to the exact same byte in memory, a protected-mode-only instruction whose attempted execution would trap into the hypervisor, and the trap handler would determine which upcall was meant by which of the redundant encodings was used.
And if that’s not unhinged enough for you: the boot code tried to locate the chosen instruction inside the firmware ROM, because that will have to be mapped into the guest memory map anyway. It did have a fallback if that did not work out, but it usually succeeded. This time, the secret (the knowledge of which will not make you happier, this is your final warning) is that the instruction chosen was ARPL, and the encoding of ARPL r/m16, AX starts with 63 hex, also known as the ASCII code of the lowercase letter C. The absolute madmen put the upcall entry point inside the BIOS copyright string.
(Incidentally, the ARPL instruction, “adjust requested privilege level”, is very specific to the 286’s weird don’t-call-it-capability-based segmented architecture... But it has a certain cunning to it, like CPU-enforced __user tagging of unprivileged addresses at runtime.)
marcosdumay
4 hours ago
And it turned out we have the transistors to avoid it, but it's a really good optimization for CPUs nowadays.
At least most people design non-overlapping segments. And I'm not sure wasm would gain anything from it, being a virtual machine rather than a real one.
malkia
5 hours ago
wait.... UNREAL MODE!
evmar
6 hours ago
It looks like memories have to be declared up front, and the memcpy instruction takes the memories to copy between as numeric literals. So I guess you can't use it to allocate dynamic buffers. But maybe you could decide memory 0 = heap and memory 1 = pixel data or something like that?
afiori
6 hours ago
Honestly you could allocate a new memory for every page :-)
TrueDuality
6 hours ago
The irony for me is that it's already slow because of the lack of native 64-bit math. I don't care about the memory space available nearly as much.
sehugg
5 hours ago
Eh? I'm pretty sure it's had 64-bit math for awhile -- i64.add, etc.
jesse__
5 hours ago
They might have meant the lack of true 64-bit pointers...? IIRC the Chrome wasm runtime used tagged pointers. That comes with an access cost of having to mask off the top bits. I always assumed that was the reason for the 32-bit specification in v1.
Findecanor
5 hours ago
Actually, runtimes often allocate 8GB of address space because WASM has a [base32 + index32] address mode where the effective address could overflow into the 33rd bit.
On x86-64, the start of the linear memory is typically put into one of the two remaining segment registers: GS or FS. Then the code can simply use an address mode such as "GS:[RAX + RCX]" without any additional instructions for addition or bounds-checking.
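The 33rd-bit overflow is easy to see with concrete numbers (a sketch of the arithmetic, not any engine's actual code):

```python
# A 32-bit WASM load computes effective address = base32 + index32.
# Both operands are unsigned 32-bit values, so the sum can reach just
# under 2^33 - which is why engines reserve ~8 GiB of virtual address
# space rather than 4 GiB, so the overflowed access still lands in the
# trapping guard region.
U32_MAX = 2**32 - 1

max_effective = U32_MAX + U32_MAX   # worst case: base + index both maxed
assert max_effective == 2**33 - 2   # needs 33 bits to represent
assert max_effective >= 2**32       # overflows past a 4 GiB reservation
assert max_effective < 2 * 2**32    # but an 8 GiB reservation covers it
```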
zarzavat
4 hours ago
I still don't understand why it's slower to mask to 33 or 34 bit rather than 32. It's all running on 64-bit in the end isn't it? What's so special about 32?
nagisa
4 hours ago
That's because with 32-bit addresses the runtime did not need to do any masking at all. It could allocate a 4GiB area of virtual memory, set up page permissions as appropriate and all memory accesses would be hardware checked without any additional work. Well that, and a special SIGSEGV/SIGBUS handler to generate a trap to the embedder.
With 64-bit addresses, and the requirements for how invalid memory accesses should work, this is no longer possible. AND-masking does not really allow for producing the necessary traps for invalid accesses. So every one now needs some conditional before to validate that this access is in-bounds. The addresses cannot be trivially offset either as they can wrap-around (and/or accidentally hit some other mapping.)
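A sketch of the per-access check a Memory64 engine has to emit (illustrative Python; real engines emit this as machine code, and names like `checked_load` are made up):

```python
# With 64-bit addresses the engine can no longer rely on hardware page
# faults alone: the guest can hand it any 64-bit value, so every access
# is guarded by an explicit comparison against the live memory size.
class TrapError(Exception):
    """Stands in for the trap delivered to the embedder."""

def checked_load(memory: bytearray, addr: int, width: int) -> bytes:
    # addr is an untrusted 64-bit value; bounds-check before touching it.
    if addr + width > len(memory):
        raise TrapError("out-of-bounds memory access")
    return bytes(memory[addr:addr + width])

mem = bytearray(16)
assert checked_load(mem, 0, 8) == b"\x00" * 8
try:
    checked_load(mem, 12, 8)  # 12 + 8 > 16: must trap
    assert False, "expected a trap"
except TrapError:
    pass
```

This comparison on every load/store is the penalty renehsz mentions upthread.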
kannanvijayan
3 hours ago
I don't feel this is going to be as big of a problem as one might think in practice.
The biggest contributor to pointer arithmetic is offset reads into pointers: what gets generated for struct field accesses.
The other class of cases are when you're actually doing more general pointer arithmetic - usually scanning across a buffer. These are cases that typically get loop unrolled to some degree by the compiler to improve pipeline efficiency on the CPU.
In the first case, you can avoid the masking entirely by using an unmapped barrier region after the mapped region. So you can guarantee that if pointer `P` is valid, then `P + d` for small d is either valid, or falls into the barrier region.
In the second case, the barrier region approach lets you lift the mask check to the top of the unrolled segment. There's still a cost, but it's spread out over multiple iterations of a loop.
As a last step: if you can prove that you're stepping monotonically through some address space using small increments, then you can guarantee that even if theoretically the "end" of the iteration might step into invalid space, that the incremental stepping is guaranteed to hit the unmapped barrier region before that occurs.
It's a bit more engineering effort on the compiler side.. and you will see some small delta of perf loss, but it would really be only in the extreme cases of hot paths where it should come into play in a meaningful way.
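The guard-region argument can be sketched like this (Python pseudocode; the guard size and helper name are illustrative, not from any real engine):

```python
# With an unmapped guard region of GUARD bytes after linear memory, any
# access at p + d with d < GUARD needs no explicit check once p itself
# has been validated: it either succeeds or faults into the guard region
# and traps in hardware. Only large or unknown offsets still need an
# emitted comparison.
GUARD = 64 * 1024  # hypothetical guard-region size

def needs_explicit_check(offset_from_checked_base: int) -> bool:
    # Struct-field reads and small strides fall under the guard region.
    return offset_from_checked_base >= GUARD

assert not needs_explicit_check(8)        # typical field access: free
assert not needs_explicit_check(4096)     # small loop stride: free
assert needs_explicit_check(1 << 20)      # large jump: must be checked
```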
azakai
4 hours ago
The special part is the "signal handler trick" that is easy to use for 32-bit pointers. You reserve 4GB of memory - all that 32 bits can address - and mark everything above used memory as trapping. Then you can just do normal reads and writes, and the CPU hardware checks out of bounds.
With 64-bit pointers, you can't really reserve all the possible space a pointer might refer to. So you end up doing manual bounds checks.
kannanvijayan
3 hours ago
Hi Alon! It's been a while.
Can't bounds checks be avoided in the vast majority of cases?
See my reply to nagisa above (https://news.ycombinator.com/item?id=45283102). It feels like by using trailing unmapped barrier/guard regions, one should be able to elide almost all bounds checks that occur in the program with a bit of compiler cleverness, and convert them into trap handlers instead.
azakai
2 hours ago
Hi!
Yeah, certainly compiler smarts can remove many bounds checks (in particular for small deltas, as you mention), hoist them, and so forth. Maybe even most of them in theory?
Still, there are common patterns like pointer-chasing in linked list traversal where you just keep getting an unknown i64 pointer, that you just need to bounds check...
phire
3 hours ago
Because CPUs still have instructions that automatically truncate the result of all math operations to 32 bits (and sometimes 8-bit and 16-bit too, though not universally).
To operate on any other size, you need to insert extra instructions to mask addresses to the desired size before they are used.
tjoff
7 hours ago
Webapps limited by 4GiB memory?
Sounds about right. Guess 512 GiB memory is the minimum to read email nowadays.
jjcm
6 hours ago
I know you're in this for the satire, but it's less about the webapps needing the memory and more about the content - that's why I mentioned video editing webapps.
For video editing, 4GiB of completely uncompressed 1080p video in memory is only 86 frames, or about 3-4 seconds of video. You can certainly optimize this, and it's rare to handle fully uncompressed video, but there are situations where you do need to buffer this into memory. It's why most modern video editing machines are sold with 64-128GB of memory.
In the case of Figma, we have files with over a million layers. If each layer takes 4kb of memory, we're suddenly at the limit even if the webapp is infinitely optimal.
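As a quick sanity check on the layer arithmetic (the 4 KB-per-layer figure is the comment's own illustrative number):

```python
# One million layers at 4 KiB each already sits near the 32-bit cap,
# before the app itself uses any memory at all.
layers = 1_000_000
bytes_per_layer = 4 * 1024

total = layers * bytes_per_layer
assert total == 4_096_000_000
assert total > 2**31           # well past a signed-32-bit limit
assert total / 2**32 > 0.95    # ~95% of the full 4 GiB address space
```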
gertop
5 hours ago
> 4GiB of completely uncompressed 1080p video in memory is only 86 frames
How is that data stored?
Because (2^32)÷(1920×1080×4) = 518 which is still low but not 86 so I'm curious what I'm missing?
jjcm
2 hours ago
> How is that data stored?
So glad you asked. It's stored poorly because I'm bad at maths and I'm mixing up bits and bytes.
That's what I get for posting on HN while in a meeting.
brokencube
5 hours ago
I would guess 3 colour channels at 16bit (i.e. 2 bytes)
(2^32)÷(1920×1080×4×3×2) = 86
05
5 hours ago
Apparently with 24 bytes per pixel instead of bits :) Although to be fair, there's HDR+ and DV, so probably 4(RGBA/YUVA) floats per pixel, which is pretty close..
miniBill
6 hours ago
In fairness, this is talking about Figma, not an email client
poly2it
6 hours ago
It doesn't actually allocate 4 GiB. Memory can be mapped without being physically occupied.
IshKebab
6 hours ago
No, web apps can actually use 4GB of memory (now 16GB apparently).
hsbauauvhabzb
6 hours ago
Finally, a web browser capable of loading Slack
mikestaas
3 hours ago
I'm excited by the GC, ref types and JS string API! Been a while J, how are you going?
wslh
6 hours ago
Looking at the present, I assume we need to think about running local LLMs in the browser. Just a few days ago I submitted an article about that [1].