A while back I had to perform an entity resolution task on several million entit... | Hacker News

Hacker Newsnew | past | comments | ask | show | jobs | submit

		fumblebee on July 22, 2022 \| parent \| context \| favorite \| on: Ask HN: What are some cool but obscure data struct... A while back I had to perform an entity resolution task on several million entities. We arrived at a point in the algorithm where we had a long list of connected components by ids that needed to be reduced into a mutually independent set of components, e.g. ({112, 531, 25}, {25, 238, 39, 901}, {43, 111}, ...) After much head banging about working out way to do this that wouldn't lead to an out-of-memory error, we found the Disjoint Set Forest Data Structure and our prayers were answered. I was very grateful for it!

BeetleB on July 22, 2022 [–]

Exactly the same for me. We had a defect management system, and a defect can be connected to others as dupes. We'd have whole chains of dupes. We used union find to identify them.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact