🤖 Robots.txt Validator & Tester
Validate your robots.txt file syntax, test specific URLs against crawling rules, and get instant feedback on potential issues. Prevent indexing mistakes that could harm your SEO before they happen.
How This Tool Works
Input Your robots.txt
Paste your robots.txt content or provide a URL to fetch it automatically.
Parse & Validate
The tool parses all directives and validates syntax against official specifications.
Identify Issues
Detects errors, warnings, and potential problems that could affect crawling.
Test URLs
Test specific URLs against your rules to see if they'll be crawled or blocked.
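If you prefer to script the same kind of check, Python's standard library ships a robots.txt parser. The sketch below parses pasted robots.txt content (illustrative rules and URLs, not this tool's code) and tests a couple of URLs. Note that urllib.robotparser implements the original prefix-matching rules and does not expand the * and $ wildcards the way Googlebot does, so wildcard-heavy files may need a dedicated parser.
from urllib.robotparser import RobotFileParser

# Example rules; swap in your own robots.txt content.
robots_txt = """
User-agent: *
Disallow: /admin/
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())   # or parser.set_url("https://example.com/robots.txt"); parser.read()

# can_fetch(user_agent, url) answers: may this bot crawl this URL?
print(parser.can_fetch("Googlebot", "https://example.com/admin/settings"))  # False - blocked by /admin/
print(parser.can_fetch("Googlebot", "https://example.com/products/"))       # True  - no rule matches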
What is robots.txt?
The robots.txt file is a text file placed in the root directory of your website that tells search engine crawlers which pages or sections of your site they can or cannot access. It's part of the Robots Exclusion Protocol (REP), a standard used by websites to communicate with web crawlers and robots.
🎯 Purpose
- Control crawler access to your site
- Prevent server overload from crawlers
- Keep private pages out of search results
- Manage crawl budget efficiently
📍 Location
- Must be at root: https://example.com/robots.txt
- Not in subdirectories
- Case-sensitive filename
- Plain text format (.txt)
⚠️ Important Note
- Not a security mechanism
- Publicly accessible file
- Bots can ignore it (most respect it)
- Doesn't guarantee de-indexing
🔑 Key Directives
- User-agent: Specify which bot
- Disallow: Block access to path
- Allow: Override disallow rules
- Sitemap: Location of sitemap
robots.txt Syntax Guide
Basic Structure
# Comment line
User-agent: *
Disallow: /admin/
Allow: /public/
Sitemap: https://example.com/sitemap.xml
Each group starts with User-agent: followed by one or more Disallow: or Allow: directives.
User-Agent Directive
# All bots
User-agent: *
# Specific bot
User-agent: Googlebot
# Multiple bots (separate groups)
User-agent: Googlebot
Disallow: /private/
User-agent: Bingbot
Disallow: /private/
Specifies which crawler the rules apply to. Use * for all crawlers.
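One nuance worth knowing: a crawler obeys only the most specific group that names it, so a bot with its own group ignores the * group entirely. A quick way to see this (a sketch using Python's standard urllib.robotparser with made-up rules) is:
from urllib.robotparser import RobotFileParser

rules = """
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Googlebot has its own group, so the blanket "Disallow: /" does not apply to it.
print(parser.can_fetch("Googlebot", "https://example.com/page.html"))  # True
# Bingbot has no group of its own, so it falls back to the * group.
print(parser.can_fetch("Bingbot", "https://example.com/page.html"))    # False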
Disallow Directive
# Block entire section
Disallow: /admin/
# Block specific file
Disallow: /secret.html
# Block all
Disallow: /
# Allow all (empty disallow)
Disallow:
Tells crawlers not to access the specified paths. The trailing slash matters: Disallow: /admin blocks every path that starts with /admin (including /administrator), while Disallow: /admin/ blocks only the /admin/ directory.
Allow Directive
# Block folder but allow specific file
User-agent: *
Disallow: /private/
Allow: /private/public-page.html
Overrides Disallow: rules. When several rules match a URL, the most specific (longest) rule takes precedence, and Allow wins a tie.
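A minimal hand-rolled sketch of that precedence (plain prefix rules only, illustrative names, not Google's implementation): the matching rule with the longest pattern wins, and Allow beats Disallow on equal length.
# Rules as (kind, pattern) pairs, mirroring the example above.
RULES = [
    ("disallow", "/private/"),
    ("allow", "/private/public-page.html"),
]

def is_allowed(path, rules=RULES):
    best = ("allow", "")   # no matching rule means the URL is allowed
    for kind, pattern in rules:
        if path.startswith(pattern) and len(pattern) >= len(best[1]):
            # Longer match wins; on equal length, prefer allow (least restrictive).
            if len(pattern) > len(best[1]) or kind == "allow":
                best = (kind, pattern)
    return best[0] == "allow"

print(is_allowed("/private/public-page.html"))  # True  - the longer Allow rule wins
print(is_allowed("/private/secret.html"))       # False - only the Disallow rule matches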
Wildcards
# Block all PDF files (* matches any sequence, $ marks the end of the URL)
Disallow: /*.pdf$
# Block URLs containing a session parameter
Disallow: /*?*session=
# Block all URLs with query parameters
Disallow: /*?
* matches any sequence of characters. $ matches the end of the URL.
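Not every parser supports these wildcards (Python's urllib.robotparser, for example, treats * and $ literally), so a simple way to reason about a pattern is to translate it into a regular expression and match it against the URL path. This is a rough sketch, not an official grammar:
import re

def pattern_to_regex(pattern):
    regex = ""
    for ch in pattern:
        if ch == "*":
            regex += ".*"        # * matches any sequence of characters
        elif ch == "$":
            regex += "$"         # $ anchors the match to the end of the URL
        else:
            regex += re.escape(ch)
    return re.compile(regex)

pdf_rule = pattern_to_regex("/*.pdf$")
print(bool(pdf_rule.match("/files/report.pdf")))         # True  - ends in .pdf
print(bool(pdf_rule.match("/files/report.pdf?page=2")))  # False - $ stops the match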
Sitemap Directive
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-news.xml
# Must be absolute URL
# Can have multiple sitemaps
Points crawlers to your XML sitemap(s). Must be full absolute URLs.
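If you want to pull the Sitemap entries out of a robots.txt file programmatically, Python's standard parser exposes them directly (available since Python 3.8); the URLs below are illustrative.
from urllib.robotparser import RobotFileParser

robots_txt = """
User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-news.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
# site_maps() returns the declared sitemap URLs, or None if the file has none.
print(parser.site_maps())
# ['https://example.com/sitemap.xml', 'https://example.com/sitemap-news.xml']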
Common robots.txt Mistakes
Blocking Important Pages
❌ Wrong:
Disallow: / (blocks entire site)
✅ Correct:
Disallow: /admin/ (only block admin area)
Accidentally blocking your entire site is the most common and costly mistake.
Blocking CSS/JS Files
❌ Wrong:
Disallow: *.css
Disallow: *.js
✅ Correct:
Don't block CSS/JS needed for rendering
Google needs CSS and JavaScript to properly render and index your pages.
Wrong File Location
❌ Wrong:
example.com/admin/robots.txt
example.com/ROBOTS.TXT
✅ Correct:
example.com/robots.txt
Must be in root directory with exact lowercase filename.
Syntax Errors
❌ Wrong:
Useragent: * (missing hyphen)
Disallow /admin (missing colon)
✅ Correct:
User-agent: *
Disallow: /admin/
Proper syntax is critical. Even small errors can break rules.
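A validator catches these problems with simple line-level checks. The sketch below shows the general idea; the directive list and messages are illustrative, not the exact rules this tool applies.
# Directives recognised by this simplified check.
KNOWN_DIRECTIVES = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}

def check_line(line, number):
    line = line.split("#", 1)[0].strip()   # drop comments and surrounding whitespace
    if not line:
        return None                        # blank or comment-only lines are fine
    if ":" not in line:
        return f"line {number}: missing ':' separator"
    directive = line.split(":", 1)[0].strip().lower()
    if directive not in KNOWN_DIRECTIVES:
        return f"line {number}: unknown directive '{directive}'"
    return None

sample = ["Useragent: *", "Disallow /admin", "Disallow: /admin/"]
for i, raw in enumerate(sample, start=1):
    issue = check_line(raw, i)
    if issue:
        print(issue)
# line 1: unknown directive 'useragent'
# line 2: missing ':' separator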
Using robots.txt for Security
❌ Wrong Assumption:
robots.txt will keep pages private
✅ Reality:
Use proper authentication/passwords
robots.txt is publicly readable and not all bots respect it. Never rely on it for security.
Relative Sitemap URLs
❌ Wrong:
Sitemap: /sitemap.xml
✅ Correct:
Sitemap: https://example.com/sitemap.xml
Sitemap URLs must be absolute with full protocol and domain.
robots.txt Best Practices
DO: Keep It Simple
Start with basic rules and only add complexity when needed. Simple is better than complicated.
User-agent: *
Disallow: /admin/
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
DO: Test Before Deploying
Always test your robots.txt file with this tool or Google Search Console before making it live.
DO: Include Your Sitemap
Always add a Sitemap directive to help search engines discover all your pages.
Sitemap: https://example.com/sitemap.xml
DO: Use Comments
Add comments (lines starting with #) to explain complex rules for future reference.
# Block all parameter URLs to prevent duplicate content
Disallow: /*?
DON'T: Block Important Resources
Never block CSS, JavaScript, or images needed to render your pages properly.
DON'T: Use robots.txt for De-indexing
If pages are already indexed, use noindex meta tags or HTTP headers instead of robots.txt.
DON'T: Mix User-Agent Groups
Keep rules for each user-agent in separate groups. Don't mix directives between groups.
DON'T: List Sensitive URLs
Don't put sensitive URLs in robots.txt - the file is public and can be viewed by anyone.
Real-World robots.txt Examples
Small Business Website
User-agent: *
Disallow: /admin/
Disallow: /cart/
Disallow: /checkout/
Allow: /
Sitemap: https://business.com/sitemap.xml
Simple and effective for most small business sites.
E-commerce Site
User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /*?*sort=
Disallow: /*?*filter=
Allow: /
Sitemap: https://shop.com/sitemap.xml
Sitemap: https://shop.com/sitemap-products.xml
Blocks checkout pages and parameter URLs to prevent duplicate content.
News Website
User-agent: *
Disallow: /admin/
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
User-agent: Googlebot-News
Disallow: /archives/
Allow: /
Sitemap: https://news.com/sitemap.xml
Sitemap: https://news.com/news-sitemap.xml
Separate rules for the Google News crawler, plus multiple sitemaps.
Blog with WordPress
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Allow: /wp-admin/admin-ajax.php
Disallow: /trackback/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /feed/
Disallow: /comments/
Disallow: */xmlrpc.php
Sitemap: https://blog.com/sitemap.xml
A comprehensive WordPress setup that blocks admin, feed, and comment URLs. Be careful with the theme, plugin, and cache rules: if those folders serve CSS or JavaScript your pages need for rendering, blocking them can hurt indexing (see the CSS/JS warning above).
Frequently Asked Questions
Will robots.txt remove pages from Google?
No. robots.txt only prevents crawlers from accessing pages, it doesn't remove already indexed pages. To remove indexed pages, use noindex meta tags or submit removal requests via Google Search Console. In fact, blocking an indexed page with robots.txt can prevent Google from seeing the noindex tag.
Do all search engines respect robots.txt?
Most legitimate search engines (Google, Bing, Yahoo, etc.) respect robots.txt. However, malicious bots, scrapers, and some email harvesters may ignore it. robots.txt is not a security mechanism - use proper authentication for truly private content.
Can I have multiple robots.txt files?
No. Each domain can have only ONE robots.txt file, and it must be located at the root directory (example.com/robots.txt). Subdirectories cannot have their own robots.txt files. For subdomains (blog.example.com), you can have a separate robots.txt.
What's the difference between Disallow and noindex?
Disallow (robots.txt): Prevents crawlers from accessing the page, but doesn't guarantee de-indexing.
Noindex (meta tag): Tells search engines not to index the page, but requires the page to be crawlable.
For already-indexed pages you want removed, use noindex, NOT robots.txt disallow.
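For reference, noindex can be expressed either as a meta tag in the page's head or as an HTTP response header (useful for non-HTML files such as PDFs):
<meta name="robots" content="noindex">
X-Robots-Tag: noindex
Either way, the page must stay crawlable so search engines can actually see the directive.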
Should I block my staging/development site?
Yes! Use robots.txt to block all crawlers on staging sites to prevent duplicate content issues:
User-agent: *
Disallow: /
Also add password protection (HTTP authentication) for real protection; note that once the whole site is disallowed, crawlers can't reach the pages to see noindex meta tags anyway.
How often should I update robots.txt?
Update robots.txt whenever you change your site structure, add new sections to block, or launch new features. After any change, test it with tools like this one or Google Search Console's robots.txt tester, then monitor crawl reports for any issues.
Can robots.txt affect my SEO rankings?
Yes, both positively and negatively. A properly configured robots.txt helps manage crawl budget and prevents duplicate content. However, blocking important pages or resources (like CSS/JS) can seriously harm your SEO. Always test changes carefully.
What if I don't have a robots.txt file?
If no robots.txt file exists, search engines will crawl your entire site. This is fine for most small sites. However, it's recommended to create one to explicitly allow crawling and point to your sitemap, even if you're not blocking anything.