(1) The trick for resizing is that we only rehash all k elements to recalculate buckets once every k inserts or so, and each time we do, we double the hash table size. So imagine you have just come out of a resize with k elements, having performed N(k) hashes so far. When you hit 2k elements you have to resize again, and by then you have hashed the k new elements once on insert and rehashed all 2k in the resize, so N(2k) = N(k) + k + 2k total hashes. This recurrence is solved by N(k) = 3k + C for an arbitrary constant C. Divide by the number of elements inserted and you can see that for very large dictionaries each element is hashed only 2 to 3 times on average: about 3 times if you measure right after a resize (3k hashes over k elements), about 2 times if you measure just before the next one (roughly 4k hashes over nearly 2k elements).
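
To check the arithmetic, here is a minimal sketch in Python that just counts hash computations. It assumes the table starts with a single slot and doubles the moment it becomes full. Real implementations resize at a lower load factor, which changes the constant but not the shape of the result. The function name and the counting model are illustrative only.

```python
def count_hashes(n_inserts, initial_capacity=1):
    """Count hash computations for n_inserts into a table that doubles when full.

    Assumed model: one hash per insert, plus one hash per stored element
    every time the table is resized.
    """
    capacity = initial_capacity
    size = 0
    hashes = 0
    for _ in range(n_inserts):
        size += 1
        hashes += 1            # hash the new element to pick its bucket
        if size == capacity:   # table is full: double it and rehash everything
            capacity *= 2
            hashes += size
    return hashes


if __name__ == "__main__":
    for n in (7, 8, 1023, 1024):
        total = count_hashes(n)
        print(n, total, total / n)
```

Under these assumptions N(k) = 3k - 1 at every power of two, so C = -1. The average is just under 3 hashes per element right after a resize (n = 1024) and exactly 2 just before one (n = 1023).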

(2) Strictly speaking you don't need this overhead, and you can use trees to keep it sparse as well, although as far as I know the per-operation overhead at low n slows it down too much in practice to be worth the space efficiency at high n. That is, you could declare a binary tree structure containing only those buckets and pathways of the 2^32 which you actually need, but finding something in the structure requires checking and following ~32 pointers, and storing something can require creating up to ~32 nodes.
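
A rough sketch of that tree idea, again in Python. The class name, the use of the built-in hash() masked to 32 bits, and the dict used as a leaf bucket are all assumptions made for illustration, not a description of any real dictionary implementation.

```python
class SparseBitTrie:
    """Binary trie over the bits of a 32-bit hash.

    Only the paths that are actually used get allocated.
    """

    BITS = 32

    def __init__(self):
        self.root = [None, None]   # children for bit 0 and bit 1
        self.nodes_created = 0     # counts nodes allocated below the root

    def _hash32(self, key):
        # Assumption: truncate Python's built-in hash to 32 bits.
        return hash(key) & 0xFFFFFFFF

    def put(self, key, value):
        node = self.root
        h = self._hash32(key)
        for i in range(self.BITS - 1, -1, -1):   # walk from the top bit down
            bit = (h >> i) & 1
            if node[bit] is None:
                # Inner levels are two-slot lists; the last level is a bucket.
                node[bit] = [None, None] if i > 0 else {}
                self.nodes_created += 1
            node = node[bit]
        node[key] = value   # the bucket dict absorbs 32-bit hash collisions

    def get(self, key, default=None):
        node = self.root
        h = self._hash32(key)
        for i in range(self.BITS - 1, -1, -1):   # follows up to 32 pointers
            node = node[(h >> i) & 1]
            if node is None:
                return default
        return node.get(key, default)
```

The first put into an empty trie allocates 32 nodes (31 inner nodes plus the bucket), and every lookup that hits walks 32 pointers. That is the low-n overhead in question. A key whose hash shares a long prefix with an existing one allocates far fewer nodes.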