What is robots.txt?
Robots.txt is a plain text file stored at the root of your domain (e.g., yoursite.com/robots.txt). It tells search engine crawlers, such as Googlebot, which pages or sections of your site they may or may not crawl. Keep in mind that it is a set of instructions that compliant crawlers follow voluntarily, not an access control mechanism.
Basic robots.txt Syntax
- User-agent: * — Applies the rules that follow to all bots.
- User-agent: Googlebot — Applies the rules only to Google's crawler.
- Allow: / — Allows crawling of the entire site.
- Disallow: /admin/ — Blocks the /admin/ directory from being crawled.
- Sitemap: https://yoursite.com/sitemap.xml — Points bots to your sitemap.
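Putting these directives together, a typical robots.txt might look like the following sketch (the paths and sitemap URL are placeholders; substitute your own):

```
User-agent: *
Disallow: /admin/
Allow: /

Sitemap: https://yoursite.com/sitemap.xml
```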
What Pages Should You Block in robots.txt?
- Admin areas (/admin/, /wp-admin/)
- Login and registration pages
- Internal search result pages
- Staging or test environments
- Duplicate content pages
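A robots.txt that blocks the areas listed above could look like this. All of the paths here are examples; match them to the URLs your site actually uses:

```
User-agent: *
Disallow: /admin/
Disallow: /wp-admin/
Disallow: /login/
Disallow: /register/
Disallow: /search/
Disallow: /staging/
```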
How to Generate a robots.txt File
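You can write the file by hand, use your CMS's built-in generator, or script it. As one approach, here is a minimal Python sketch that assembles a robots.txt from a list of disallowed paths; the function name, paths, and sitemap URL are illustrative, not part of any standard tool:

```python
# Sketch: generate a simple robots.txt from a list of rules.
# The paths and sitemap URL below are placeholders for your own site.

def build_robots_txt(disallow_paths, sitemap_url=None, user_agent="*"):
    """Return robots.txt content blocking the given paths for user_agent."""
    lines = [f"User-agent: {user_agent}"]
    for path in disallow_paths:
        lines.append(f"Disallow: {path}")
    if sitemap_url:
        # Sitemap is a standalone directive, separated from the group.
        lines.append("")
        lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"

content = build_robots_txt(
    ["/admin/", "/wp-admin/", "/search/"],
    sitemap_url="https://yoursite.com/sitemap.xml",
)
print(content)
```

Write the result to a file named robots.txt and upload it to your site's root directory.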
FAQ
Does robots.txt affect my Google ranking?
Indirectly, yes. If you accidentally block important pages, Google can't crawl their content, so they are very unlikely to rank (a blocked URL can still be indexed from external links, but only without its content). Always double-check your robots.txt using Google Search Console's URL Inspection tool.
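Besides Search Console, you can sanity-check rules locally before deploying with Python's standard library urllib.robotparser. A small sketch, using example rules and a placeholder domain:

```python
from urllib.robotparser import RobotFileParser

# Sketch: check locally which URLs a given robots.txt would block.
# The rules below are examples; paste the lines of your own file instead.
rules = [
    "User-agent: *",
    "Disallow: /admin/",
]

rp = RobotFileParser()
rp.parse(rules)  # parse() accepts an iterable of robots.txt lines

# can_fetch(user_agent, url) returns whether that agent may crawl the URL.
print(rp.can_fetch("*", "https://yoursite.com/admin/settings"))  # False
print(rp.can_fetch("*", "https://yoursite.com/blog/post"))       # True
```

This makes it easy to test a change against a list of your most important URLs before it goes live.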
What if I don't have a robots.txt?
Search engines will crawl your entire site by default; a missing robots.txt is treated the same as one that allows everything. It's still good practice to have one, even if it just contains a permissive group such as "User-agent: *" followed by "Allow: /".