Robots.txt Generator
Create and customize robots.txt files to control search engine crawling
About Robots.txt
A robots.txt file tells search engine crawlers which pages or files the crawler can or can't request from your site. It is used mainly to avoid overloading your site with requests; on its own it is not a reliable way to keep pages out of search results.
Practical guide: robots.txt limits and strengths
What this is
Robots.txt is a plain-text policy file at the site root that tells compliant crawlers which URL paths they should not fetch. It governs crawling, not secrets: blocked URLs can still appear in search results if linked elsewhere, and it does not block users who know the link. Use it to reduce crawl noise, protect fragile endpoints, and publish sitemap locations—not as your only privacy control.
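For orientation, a minimal policy might look like the sketch below; the host, paths, and sitemap URL are placeholders, and anything not explicitly disallowed stays crawlable by default.

    # Served from https://example.com/robots.txt (root of the host, hypothetical site)
    User-agent: *
    Disallow: /internal-reports/

    Sitemap: https://example.com/sitemap.xml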
How to use it
Build rules per User-agent group, then test them against the specific URLs you care about, for example with Search Console's robots.txt report or a standalone parser. Keep directives simple and avoid conflicting Allow/Disallow patterns that are hard to reason about. Include absolute Sitemap: lines for each sitemap or sitemap index you maintain. After edits, deploy the file to /robots.txt at the root of the host; copies in subfolders are ignored by the spec.
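For spot checks outside Search Console, Python's built-in robots.txt parser can answer per-URL crawl questions. The sketch below uses hypothetical rules and URLs; note that stdlib matching can differ from Google's precedence rules in edge cases, so treat it as a quick sanity check rather than a definitive verdict.

    from urllib.robotparser import RobotFileParser

    # Hypothetical rules; you could instead call set_url(".../robots.txt") and read()
    # to evaluate the live file.
    rules = [
        "User-agent: *",
        "Disallow: /private/",
        "Disallow: /search/",
    ]

    rp = RobotFileParser()
    rp.parse(rules)

    # can_fetch() answers "may this user-agent crawl this URL?" and nothing more;
    # a False result does not remove the URL from any index.
    print(rp.can_fetch("MyCrawler", "https://example.com/private/report.html"))  # False
    print(rp.can_fetch("MyCrawler", "https://example.com/products/"))            # True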
How to read the results
Most engines respect robots.txt for crawl scheduling, but interpretation can differ for edge-case patterns. A disallow stops fetches; it does not equal noindex. If you need a URL kept out of the index, leave it crawlable and use a meta robots tag or an X-Robots-Tag HTTP header instead, since a crawler that is blocked from fetching the page can never see those directives. Crawl-delay is not universally honored; treat it as a hint at best.
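To make the distinction concrete, the two noindex mechanisms mentioned above take the following forms; both only work if the crawler is allowed to fetch the URL.

    In the HTML head of the page:
        <meta name="robots" content="noindex">

    As an HTTP response header (also covers non-HTML files such as PDFs):
        X-Robots-Tag: noindex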
Common mistakes
Blocking CSS or JavaScript needed for rendering can harm how Google understands the page. Assuming disallow removes a URL from the index leads to surprises when external links exist. Wildcards and case sensitivity trip up teams migrating platforms. Duplicating sitemap lines dozens of times adds noise, not power. Pair robots.txt with canonicals, redirects, authentication, and noindex where each is appropriate.
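The rendering and pattern pitfalls can be seen in a short hypothetical snippet; the directory names below are placeholders.

    User-agent: *
    # Blocking a whole directory can hide CSS/JS that rendering depends on,
    # so carve those resources back out explicitly.
    Disallow: /static/
    Allow: /static/css/
    Allow: /static/js/

    # Paths are case-sensitive: this blocks /Archive/ but not /archive/.
    Disallow: /Archive/

    # Wildcards: * matches any characters, $ anchors the end of the URL.
    Disallow: /*?sessionid=
    Disallow: /*.xls$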
Common Directives
- User-agent: Specifies which crawler the rules apply to
- Disallow: Tells crawlers not to access specific pages/directories
- Allow: Explicitly allows crawling of specific pages/directories
- Sitemap: Points to your XML sitemap location
- Crawl-delay: Suggests a delay between crawler requests
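Put together, these directives group by crawler, as in the hypothetical file below; Crawl-delay is included for completeness, but it is a hint at best and Google ignores it entirely.

    User-agent: *
    Disallow: /drafts/
    Allow: /drafts/published/

    User-agent: ExampleBot
    Crawl-delay: 10
    Disallow: /bulk-export/

    Sitemap: https://example.com/sitemap.xml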
Frequently Asked Questions
What kinds of pages are commonly disallowed?
- Admin areas and login pages
- Search result pages
- Private or temporary files
- Duplicate content pages
- Development or staging environments
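Translated into directives, those categories might look like the hypothetical snippet below; as noted in the guide above, a disallow is not access control, so admin areas, private files, and staging hosts still need authentication or noindex where appropriate.

    User-agent: *
    Disallow: /admin/
    Disallow: /login/
    Disallow: /search          # internal search result pages
    Disallow: /tmp/
    Disallow: /*?print=1       # printer-friendly duplicates

For a development or staging host, a separate robots.txt containing a blanket Disallow: / is common, but it is only a courtesy signal to crawlers; put that host behind authentication as well.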