keeda
2 months ago
Heheh a bit tangential, but a long time ago, I had a similar thought: how much performance could we gain if we just compared hash values (typically integers) and avoided comparing actual keys -- and the pointer-chasing that entails -- as far as possible?
The problem is that for a regular hash table, eventually keys must be compared because two keys could have the same hash value. So maybe we relegate key comparisons only in cases when we encounter a collision.
The only case where this can work is when the set of keys that could ever be looked up is static. Otherwise we could always get a lookup for a new, non-existent key that creates a new collision and return the wrong value. Still, there are cases where this could be useful, e.g. looking up enum values, or a frozendict implementation. Something like minimal perfect hashing, but simpler.
So I came up with this silly project and benchmarked it in Java, C# and a hacky CPython "extension": https://github.com/kunalkandekar/Picadillo
A very micro optimization, but turned out there was a 5% - 30% speedup on a PC of that era.
bluuewhale
2 months ago
This is a really fun project!
In SwissTable, the main difference is that you only get a probabilistic signal of presence (kind of like a Bloom filter), but the core idea feels very similar.