
Besides the standard tricks to speed up the running time, like threads and mmap, the biggest speedup comes from using a custom hash table tuned for the data. I implemented 5 different versions of hash tables, including the Swiss hash table and more recently researched ones. None of them beats a simple linear-probing approach, because there are so few keys: only a couple hundred of them. Tuning a custom hash function to minimize key collisions helps more than anything.

There are also lots of SIMD operations, for which Zig has great support in the form of vector programming.

The only novel thing I did was to line up the temperature data from the back instead of from the front for the SIMD operations. That is, for temperatures 12.3, 4.5, 67.8, and 0.1, I line them up by the last digit after the decimal point before packing them into a SIMD register. That way I know their exact positions in the register.



I had similar results exploring hash tables for a nearly identical task here[0].

It seems like Swiss tables are tuned for workloads consisting mostly of gets of keys that are not present. Otherwise the two-level lookup is an extra cost with little benefit.

IIRC Joad Nacer says he is using a hash table for this (again, essentially identical) task on highload.fun[1]. This was sort of surprising to the other competitors, because the top couple of solutions below that one use some variety of bloom filter instead.

0: https://easyperf.net/blog/2022/05/28/Performance-analysis-an...

1: https://highload.fun/tasks/2/leaderboard


Yes. Swiss tables are good for non-duplicate lookups. For highly duplicate data, the extra memory fetch for the metadata byte really kills performance.

For this contest, there are a billion lookups over only a couple hundred distinct keys. That means for most lookups the full path of locating the key is executed: hashing, metadata comparison, hashed-value comparison, and full key comparison. That's actually quite expensive, and removing any part of that execution path really helps.


When it comes to custom hash tables, I always remember this video [0] by Strager about implementing the perfect hash table and hash function. Fascinating insights and perf hints.

[0]: https://youtu.be/DMQ_HcNSOAI


Many videos on hash tables are great. I found this one particularly good.

[https://www.youtube.com/watch?v=M2fKMP47slQ] C++Now 2018: You Can Do Better than std::unordered_map: New Improvements to Hash Table Performance


Gosh, and thanks for sharing. Hope this never becomes mainstream during coding interviews. Imagine: please list the top 10 hash table implementations and their time and space complexity.


Ha. I didn't know the details of any of these tables beforehand; it was just a matter of doing a bit of research. The main takeaway is that there's no perfect hash table. It all depends on the data and the operations. Contests like this are the perfect excuse to dive deep into these kinds of topics.



