Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

A while back I had to perform an entity resolution task on several million entities.

We arrived at a point in the algorithm where we had a long list of connected components by ids that needed to be reduced into a mutually independent set of components, e.g. ({112, 531, 25}, {25, 238, 39, 901}, {43, 111}, ...)

After much head banging about working out way to do this that wouldn't lead to an out-of-memory error, we found the Disjoint Set Forest Data Structure and our prayers were answered.

I was very grateful for it!



Exactly the same for me. We had a defect management system, and a defect can be connected to others as dupes. We'd have whole chains of dupes. We used union find to identify them.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: