smarks
a year ago
(from Feb 2021)
Some of the JDK's unmodifiable collections, such as those from `Set.of()` and `Map.of()`, also randomize iteration order. At the time they were added, Go and Python also randomized iteration order. However, more recently, Python moved away from randomization. Early in the Python 3.x releases, dict iteration order was randomized. In 3.6, iteration order was insertion order, but only as an implementation detail. In 3.7, this was elevated to a guarantee.
https://docs.python.org/3/library/stdtypes.html#dict
(search for "dictionary order")
There are occasional complaints about Java's unmodifiable collections' randomized order, as it can occasionally cause intermittent test failures or even failures in production. However, the advantage of the randomized order is that we've been able to reorganize the internal implementations of these data structures -- twice -- without worrying about compatibility problems caused by changing the ordering. Historically this has been a problem with other structures such as `HashMap`. Even though HashMap's order isn't guaranteed, it's predictable and stable, and changing its ordering has definitely flushed out bugs.
o11c
a year ago
IMO the real problem is that sometimes you really do want a deterministic order (not caring which one), but the container doesn't offer any API that provides it.
And since hash-first languages provide a broken API, there's no way to provide it for arbitrary types. Compare-first languages (like C++) generally provide it, but paying the price of tree-based structures (assuming a B-tree can't negate it) isn't actually necessary.
Rather than either of those choices, I argue that languages really should be "key-first" - rather than having classes implement hashing or comparison themselves, they should return a tuple of the data that will be fed into said hash or comparison. Besides being a nicer API, this also prevents many common bugs.
lesam
a year ago
C++ gives both ‘map’ and ‘unordered_map’, and in my experience idiomatic C++ uses unordered_map unless you actually need a tree.
tialaramex
a year ago
> IMO the real problem is that sometimes you really do want a deterministic order (not caring which one), but the container doesn't offer any API that provides it.
This would be a lot more convincing if you had an actual concrete example where this was true, rather than just insisting that "sometimes" it's true.
alexhornby
a year ago
Few examples I’ve come across: hashing, float summation, reproducible serialisation
medstrom
a year ago
I get two of those, can you tell me more about float summation?
tialaramex
a year ago
Because the exponent varies, it matters what order we add floats together, if we add them in one order we get a very different answer to another. If we expected a consistent result that's very surprising.
Suppose you have a billion, and also you have a million ones, all as 32-bit floating point numbers. If we begin with the billion, and add each one, the additions do nothing, because in the representation used for the billion a single one is too small to matter, so our answer is just one billion again each time -- but if we begin with the ones, adding the billion last, we've reached a million when we add the billion, that's not too small to matter and we get 1.001 billion.
skratky
a year ago
Use the Kahan summation algorithm.
tialaramex
a year ago
Kahan solves my examples but in general cannot be relied on to deliver any specific benefit - it's never worse but it may not be meaningfully better either.
Pairwise algorithms can promise better results but aren't applicable unless we're OK with the idea of separately storing the numbers somewhere while we run the algorithm or we provide this hash table with a random access mechanism it may otherwise have no use for.
tialaramex
a year ago
Huh. I guess somebody could build a specialised container for this purpose, which deliberately has some arbitrary but unchanging ordering. Certainly that looks practical in Rust (indeed it may already exist, I haven't looked).
hinkley
a year ago
LRU caching.
hinkley
a year ago
Java and JavaScript both have a way to get keys back in insertion order.
It’s pretty simple to implement an LRU cache when you have time ordered iterators.