C++ STL uses trees because the generic requirement for them to work is a < operator. Designing generics around hashing is much more difficult, and most languages struggle with that.
Ironically, due to caches, sorting and then using algorithms that rely on order tend to be superior than most hashing implementations, even though they are theoretically worse due to log(n) factor. So in a way, C++ algorithms are actually more modern.
> Ironically, due to caches, sorting and then using algorithms that rely on order tend to be superior than most hashing implementations
This doesn't match my experience at all. C++ trees are not cache-friendly; they're pointer-chasing (and there's no arena implementation in the STL). Second, any sort of ordering structure (be it through a tree or through sorting + binary search) is notoriously prone to branch mispredictions. They look great in a microbenchmark if you search for the same element all the time and the trees are small, but in a practical application, it can be incredibly painful. Pretty much by design, every choice is 50–50 and unpredictable.
std::unordered_map from most STLs isn't a fantastic hash table, but in my experience it absolutely destroys std::map (and is also comfortably ahead of something like std::lower_bound on a sorted array). All of this is assuming large-ish N, of course; for small N (e.g. below 30, usually), the best by far is a simple unsorted array and just going through it linearly.
Agreed, they should use a B-tree to get cache locality and easy generics, but there is legacy code there.
I was referring to the performance of algorithms. For example `std::unique`, `std::lower_bounds`, etc. Many of these use sorted lists, whereas most other languages' standard libraries utilize hashing for these.
> is also comfortably ahead of something like std::lower_bound on a sorted array
I would be interested to learn more about when that's the case. But, it's also not very flexible. You can put an `int` in it, great. Can you put `std::pair<int, int>` in it? Does it work as well?
B-trees don't work well for CPU caches; 64b lines are typically too small to gain much. (OK, the M1 has 128b lines, but still.) And you still get the branch mispredicts, and a significant increase in code complexity, so hashing is still significantly better. (I've seen C++ projects where the main structure was a B+-tree because the lower levels could be residing on disk, but where they maintained a separate hash table in memory to skip the top first few B-tree levels!)
> I would be interested to learn more about when that's the case. But, it's also not very flexible. You can put an `int` in it, great. Can you put `std::pair<int, int>` in it? Does it work as well?
If by “it” you're talking about std::unordered_map (as key); yes, you can, but you'd have to supply your own hash function, because std::pair<A, B> does not have a std::hash specialization. (It's a bit sad, but the problem is highly specific to std::pair/std::tuple.) Likewise, you can use any arbitrary type you create yourself, as long as it has operator== and you've written a hash function for it.
Ironically, due to caches, sorting and then using algorithms that rely on order tend to be superior than most hashing implementations, even though they are theoretically worse due to log(n) factor. So in a way, C++ algorithms are actually more modern.