Hacker News

Does anyone have suggestions on what a proper robots.txt would be?

How about:

  User-agent: *
  Allow: /
  Sitemap: https://example.com/sitemap.xml


The recommendation is to use an empty "Disallow:" rule rather than a catch-all "Allow:" rule: the original robots exclusion standard only defined "Disallow", so some older crawlers may not understand "Allow".

Otherwise that is the canonical minimal example.


Like this?

  User-agent: *
  Disallow: 
  Sitemap: https://example.com/sitemap.xml


Precisely.


Why not just this:

  Sitemap: https://example.com/sitemap.xml

Won't crawlers crawl by default?


That's a valid robots.txt, but "proper" depends entirely on what you want to achieve. If you aren't looking to treat different bots differently, and you want all of your site to be indexed, then that is exactly what you want.
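You can check both claims with Python's standard-library robots.txt parser. This is a small sketch (the bot name and page URLs are made up for illustration) confirming that the empty-Disallow form, and even a sitemap-only file, allow fetching everything:

```python
# Verify robots.txt semantics with Python's stdlib parser.
import urllib.robotparser

minimal = """\
User-agent: *
Disallow:
Sitemap: https://example.com/sitemap.xml
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(minimal.splitlines())
# An empty "Disallow:" rule matches nothing, so every URL is allowed.
print(rp.can_fetch("AnyBot", "https://example.com/some/page"))  # True
# site_maps() (Python 3.8+) returns the declared sitemap URLs.
print(rp.site_maps())  # ['https://example.com/sitemap.xml']

# A sitemap-only robots.txt: with no rules at all, access defaults to allowed.
sitemap_only = urllib.robotparser.RobotFileParser()
sitemap_only.parse(["Sitemap: https://example.com/sitemap.xml"])
print(sitemap_only.can_fetch("AnyBot", "https://example.com/anything"))  # True
```

So crawlers that honor the standard do treat "no matching rule" as "allowed"; the empty Disallow just makes that intent explicit.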



