
Well, this post is in the context of an e2e encrypted DB, but it's subject to the same constraints: https://medium.com/@ZeroDB_/scalable-full-text-search-over-e...

If you understand btrees you understand the hardest part already :)

Basically, you need to design a search index that examines the fewest DB pages to find the result. The Lucene scoring method stores a mapping of term -> document[] sorted in relevance order. The main idea is that you only need to examine the first n documents for each term in the query to find the most relevant results. Picking n is sort of tricky, but if your index is stored this way it's possible to fulfill a large percentage of queries efficiently without downloading the whole index.
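To make that concrete, here's a minimal sketch (my own toy, not the gist linked below) of an impact-ordered index: each term maps to its documents pre-sorted by score, so a query only has to read the first n postings per term. Term frequency stands in for a real relevance score like BM25.

```python
from collections import defaultdict

def build_index(docs):
    # docs: {doc_id: text}. Score is raw term frequency, a stand-in
    # for a real relevance score.
    index = defaultdict(lambda: defaultdict(int))
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] += 1
    # Store each posting list sorted by descending score: best docs first,
    # so a query can stop after reading the top of each list.
    return {
        term: sorted(postings.items(), key=lambda kv: -kv[1])
        for term, postings in index.items()
    }

def search(index, query, n=2):
    # Examine only the top-n postings per query term and sum the scores.
    scores = defaultdict(int)
    for term in query.lower().split():
        for doc_id, score in index.get(term, [])[:n]:
            scores[doc_id] += score
    return sorted(scores.items(), key=lambda kv: -kv[1])

docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick quick brown dogs",
}
index = build_index(docs)
print(search(index, "quick brown"))  # -> [(3, 3), (1, 2)]
```

Because each posting list is already in score order, the query touches at most n entries per term instead of the whole list, which is exactly what keeps the number of DB pages read small.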

Here's a little Python implementation of what I mean by a "user space implementation". Note that it's a toy but it performs pretty well on some demo sklearn data sets: https://gist.github.com/petehunt/724eeb77189332db101ad7b0db8...
