
Great. This link helps. Google's site basically says that, in addition to specifically excluding their crawler via robots.txt, I also have to MANUALLY submit a request to them. As I said in a reply below, this is nonsensical and Bing is being more reasonable, but I'll grit my teeth and do it since Google has more power here.

It definitely seems like Google is exploiting a loophole in the spirit of robots.txt. Robots.txt is an ancient standard, and I don't think anyone anticipated at the time that search engines would become confident enough about pages' relevance to list them even if they had not indexed/crawled them.
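
For anyone following along, here is a rough sketch of what robots.txt actually controls, using Python's standard urllib.robotparser. The robots.txt contents, the example.com URL, and the "SomeOtherBot" name are made up for illustration. Note that the check only answers "may this agent fetch this URL?"; nothing in it says whether the URL may be listed in results.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that shuts one crawler out of the whole site
# while leaving it open to everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical URL; any path on the site behaves the same under these rules.
url = "https://example.com/private/page.html"

# robots.txt only governs fetching. A well-behaved crawler asks this
# question before requesting the page.
googlebot_may_fetch = parser.can_fetch("Googlebot", url)
other_may_fetch = parser.can_fetch("SomeOtherBot", url)

print("Googlebot may fetch:", googlebot_may_fetch)
print("SomeOtherBot may fetch:", other_may_fetch)
```

A crawler that honors the rule never downloads the page, but it can still learn the URL from links elsewhere, which is how an uncrawled page can end up listed.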



As I recall, the spirit of robots.txt was not about appropriateness of search results so much as "this URL space can generate an unbounded graph, please don't DoS my server by trying to exhaustively traverse it."



