
https://developers.google.com/search/docs/crawling-indexing/...

   "Important: For the noindex rule to be effective, the page or resource must not be blocked by a robots.txt file, and it has to be otherwise accessible to the crawler. If the page is blocked by a robots.txt file or the crawler can't access the page, the crawler will never see the noindex rule, and the page can still appear in search results, for example if other pages link to it."
It's counterintuitive, but if you want a page to never appear in Google Search, you need to flag it as noindex and not block it via robots.txt. The crawler has to be able to fetch the page in order to see the noindex rule at all.

> 1. What does it look like for a page to be indexed when googlebot is not allowed to crawl it? What is shown in search results (since googlebot has not seen its content)?

It'll usually list the URL with a description like "No information is available for this page". This can happen, for example, when the page has a lot of backlinks, is blocked via robots.txt, and is missing the noindex flag.
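
If it helps, here's a rough Python sketch of the three cases described above. It's only a toy model of the behavior in the quoted docs, not Google's actual logic. The names `classify` and `has_noindex` are made up for illustration, the `headers` dict is assumed to use the exact key `X-Robots-Tag`, and the stdlib `urllib.robotparser` does plain prefix matching, so it won't handle wildcard patterns the way Googlebot does. It also ignores user-agent-prefixed header values.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class _RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> / <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {k.lower(): (v or "") for k, v in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(html, headers):
    """True if the page carries noindex in a meta tag or the X-Robots-Tag header."""
    parser = _RobotsMetaParser()
    parser.feed(html)
    header_value = headers.get("X-Robots-Tag", "").lower()
    header_directives = [d.strip() for d in header_value.split(",")]
    return "noindex" in parser.directives or "noindex" in header_directives


def classify(robots_txt, url, html, headers, user_agent="Googlebot"):
    """Return 'blocked', 'noindex', or 'indexable' for a single URL."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(user_agent, url):
        # The crawler never fetches the page, so any noindex on it goes unseen.
        # The bare URL can still show up in results if other pages link to it.
        return "blocked"
    if has_noindex(html, headers):
        # The page was fetched and the rule was seen: it stays out of results.
        return "noindex"
    return "indexable"
```

The point is the order of the checks: the robots.txt test comes first, so a page that is both disallowed and marked noindex comes out as "blocked", which is exactly the case where the URL can still get listed.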


