How to Create robots.txt: The No‑Fluff 2026 Guide
Illustration: robots.txt directives control search engine access — SMARTCHAINE dashboard ready.
You don’t need developer wizardry to create a robots.txt file. But one misplaced slash? That can block your entire site from Google. In this guide, I’ll walk you through exactly how to build, test, and deploy robots.txt for modern SEO — no fluff, just real-world tactics that work for AI Overviews and classic search alike.
What robots.txt actually does (and what it can’t fix)
The robots.txt file lives at the root of your domain (yoursite.com/robots.txt). It politely asks compliant crawlers like Googlebot to crawl or skip specific pages or folders. But here’s the nuance: it’s a request, not a firewall. If other sites link to a blocked page, Google might still index its URL (without content). And bad bots ignore the file entirely.
How to create robots.txt — practical walkthrough
Creating the file takes 3 minutes. Let’s do it the right way:
- Open a plain text editor (Notepad, VS Code, or even TextEdit in plain mode).
- Start with the default directive: User-agent: * (applies to all respectful bots).
- Add Disallow: to block paths. Example: Disallow: /private/ blocks the /private/ folder.
- Optionally add Allow: to override a disallow for sub-paths (more on that later).
- Include your sitemap: Sitemap: https://smartchaine.cloud/sitemap.xml (helps Google discover pages).
- Save the file as robots.txt (all lowercase, and check that your editor has not added a hidden second extension, as in robots.txt.txt).
- Upload to your site’s root directory using FTP, cPanel, or your hosting file manager.
- Verify by visiting https://yourdomain.com/robots.txt in your browser.
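If you would rather script the walkthrough than type the file by hand, here is a minimal Python sketch. The helper name build_robots_txt and its parameters are my own invention for illustration, and the script only writes the file locally; uploading it to your root directory is still up to you.

```python
from pathlib import Path


def build_robots_txt(disallow, allow=(), sitemap=None, user_agent="*"):
    """Return robots.txt content for a single user-agent group.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in disallow]
    lines += [f"Allow: {path}" for path in allow]
    if sitemap:
        # Sitemap lines sit outside any group, so separate with a blank line.
        lines += ["", f"Sitemap: {sitemap}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    content = build_robots_txt(
        disallow=["/private/"],
        sitemap="https://smartchaine.cloud/sitemap.xml",
    )
    # Plain UTF-8 text, lowercase file name, single .txt extension.
    Path("robots.txt").write_text(content, encoding="utf-8")
```

Running the script drops a robots.txt in the current folder, ready to upload and verify in the browser.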
Robots.txt syntax: your cheat sheet
Get these rules right, and you’re 90% there. The rest is testing.
- User-agent: Googlebot → targets a specific crawler. Use * for all.
- Disallow: /admin/ → blocks access to the /admin/ folder.
- Allow: /public/ → allows a subfolder even if its parent is disallowed.
- Sitemap: https://... → absolute URL to your XML sitemap.
- # comment → adds notes for your team.
User-agent: *
Disallow: /internal/
Disallow: /search-results?
Allow: /internal/landing-page/
Sitemap: https://smartchaine.cloud/sitemap_index.xml
🔍 This blocks the whole “/internal/” folder except a specific landing page, plus blocks all search result URLs with parameters — smart for crawl budget.
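Why does the Allow line win for the landing page? Google resolves conflicts by picking the most specific (longest) matching rule, and Allow wins a tie. Here is a rough Python sketch of that logic for the example above. It assumes plain prefix rules with no wildcards, and the function name is_allowed is hypothetical.

```python
# The rules from the example above, as (kind, path prefix) pairs.
RULES = [
    ("disallow", "/internal/"),
    ("disallow", "/search-results?"),
    ("allow", "/internal/landing-page/"),
]


def is_allowed(path, rules=RULES):
    """Longest matching rule wins; Allow wins a tie; no match means allowed.

    Simplified sketch: prefix matching only, no * or $ support.
    """
    best_len, allowed = -1, True
    for kind, prefix in rules:
        if not path.startswith(prefix):
            continue
        is_allow = kind == "allow"
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len, allowed = len(prefix), is_allow
    return allowed


for path in ["/internal/notes", "/internal/landing-page/", "/search-results?q=seo", "/blog/"]:
    print(path, "->", "allowed" if is_allowed(path) else "blocked")
```

Note that rule order in the file does not matter under this model; only the length of the matching path does.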
4 painful robots.txt mistakes (and how to avoid them)
I’ve audited over 200 websites. Here’s what routinely breaks:
- Disallow: / → blocks your entire site. One slash, disaster. Use “Disallow: ” (empty) to allow everything.
- Case sensitivity: “Disallow: /Images/” won’t block “/images/”. Paths in rules are matched case-sensitively.
- Missing sitemap declaration — slows down discovery.
- Blocking CSS/JS files — Google can’t render your page properly, harming indexing.
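Three of these mistakes can be caught with a quick script before you upload anything. The sketch below is a rough heuristic and not a full parser; the function name lint_robots_txt and the warning wording are made up for this example, and the CSS/JS check only catches rules whose path ends in .css or .js.

```python
def lint_robots_txt(text):
    """Return a list of warnings for a few common robots.txt mistakes."""
    warnings = []
    agents, in_rules, has_sitemap = [], False, False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value)
        elif field == "sitemap":
            has_sitemap = True
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow":
                if value == "/" and "*" in agents:
                    warnings.append("Disallow: / under User-agent: * blocks the entire site")
                if value.lower().rstrip("$").endswith((".css", ".js")):
                    warnings.append(f"Disallow: {value} may block rendering resources")
    if not has_sitemap:
        warnings.append("No Sitemap: line found")
    return warnings


print(lint_robots_txt("User-agent: *\nDisallow: /\n"))
```

Case-sensitivity slips are harder to detect automatically, because the script cannot know which spelling your real URLs use.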
Don’t guess: test your robots.txt with real tools
Before you break production, run these checks:
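- Google Search Console → robots.txt report (under Settings). It replaced the legacy “robots.txt Tester”, which Google retired in late 2023.
- curl command: curl -I https://yourdomain.com/robots.txt → ensure HTTP 200 response.
- URL Inspection live test in Search Console → reveals blocked resources indirectly. (Google retired the standalone Mobile‑Friendly Test in late 2023.)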
Advanced tactics: wildcards, parameter blocking, and managing crawl budget
Google supports two limited wildcards: * matches any sequence of characters, and $ anchors the end of the URL. For instance, Disallow: /*?sort= blocks any URL whose query string starts with “sort=”. Great for e‑commerce filters. And if you have thousands of low-value pages, robots.txt is your first line of defense for crawl budget efficiency.
Disallow: /tag/
Disallow: /author/
✅ Keeps Google focused on money content, not thin tag pages.
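To sanity-check a wildcard rule before shipping it, you can translate the pattern into a regular expression and try it against sample URLs. This is a minimal sketch of the * and $ semantics as I understand them, not Google’s actual matcher; pattern_to_regex and matches are hypothetical names.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' becomes '.*'; a trailing '$' anchors the end of the URL;
    everything else is matched literally from the start of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def matches(pattern, path):
    """True if the rule pattern applies to the given URL path (with query)."""
    return pattern_to_regex(pattern).match(path) is not None


for path in ["/shoes?sort=price", "/shoes?color=red&sort=price", "/tag/seo/"]:
    print(path, matches("/*?sort=", path), matches("/tag/", path))
```

One thing the sketch makes visible: /*?sort= does not match /shoes?color=red&sort=price, because the literal “?sort=” never appears there. If sort can show up later in the query string, a second rule such as /*&sort= would be needed.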
robots.txt: your most pressing questions
Will robots.txt remove pages from Google index?
No, it only blocks crawling. If a page is already indexed and you block it via robots.txt, Google may keep the URL in results but without a snippet. Use “noindex” meta tags for actual removal, and leave the page crawlable while you do: Google has to fetch the page to see the tag.
Can I use robots.txt to block AI bots like GPTBot?
Yes. Add a User-agent: GPTBot group to block OpenAI’s crawler. Many SEOs now manage multiple AI bots separately.
User-agent: GPTBot
Disallow: /
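You can confirm that a per-bot group behaves as intended with Python’s standard-library urllib.robotparser, without touching your live site. The robots.txt content and the example.com URLs below are placeholders I made up for the check.

```python
from urllib.robotparser import RobotFileParser

# Placeholder file: block GPTBot everywhere, keep /private/ off-limits for all.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def can_crawl(agent, url):
    """True if the parsed rules let this user agent fetch the URL."""
    return parser.can_fetch(agent, url)


for agent in ["GPTBot", "Googlebot"]:
    print(agent, can_crawl(agent, "https://example.com/blog/"))
```

One caveat: Python’s parser applies the first matching rule in file order, so results can differ from Google’s longest-match behavior when Allow and Disallow rules overlap. For simple per-bot blocks like this one, the two agree.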
How often does Google re-read robots.txt?
Google generally caches robots.txt for up to 24 hours, but it can be longer. Use the robots.txt report in GSC to request a recrawl after critical changes.
Stop guessing. Start crawling smarter.
SMARTCHAINE’s SEO audit suite automatically detects robots.txt errors, missing sitemaps, and crawl waste — plus gives you one‑click fixes.