How to Create robots.txt: The No‑Fluff 2026 Guide

You don’t need developer wizardry to create a robots.txt file. But one misplaced slash? That can block your entire site from Google. In this guide, I’ll walk you through exactly how to build, test, and deploy robots.txt for modern SEO — no fluff, just real-world tactics that work for AI Overviews and classic search alike.

What robots.txt actually does (and what it can’t fix)

The robots.txt file lives at the root of your domain (yoursite.com/robots.txt). It tells compliant crawlers like Googlebot which pages or folders they may crawl and which to skip. But here's the nuance: it's a directive, not a firewall. If other sites link to a blocked page, Google might still index its URL (without the content). And bad bots ignore the file entirely.

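Quick summary: robots.txt = crawl control, not indexation control. Use it to manage server load, keep crawlers out of staging areas, or block low-value pages (like internal search results). For sensitive data, use authentication; to keep a page out of search results, use a noindex meta tag or an X-Robots-Tag HTTP header.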
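Since robots.txt can't keep a page out of the index, the usual tool is a noindex signal sent with the page itself. Here is a minimal sketch, assuming your site runs as a Python WSGI app: a middleware that adds an `X-Robots-Tag: noindex` response header under a few path prefixes. The `noindex_middleware` helper, the demo `hello_app`, and the `/drafts/` and `/thank-you/` prefixes are hypothetical examples, not part of any framework.

```python
from typing import Callable, Iterable

# Hypothetical path prefixes whose pages should stay out of search results.
NOINDEX_PREFIXES = ("/drafts/", "/thank-you/")


def noindex_middleware(app: Callable, prefixes: tuple = NOINDEX_PREFIXES) -> Callable:
    """Wrap a WSGI app so responses under `prefixes` carry X-Robots-Tag: noindex."""

    def wrapped(environ: dict, start_response: Callable) -> Iterable[bytes]:
        path = environ.get("PATH_INFO", "/")

        def start(status, headers, exc_info=None):
            if path.startswith(prefixes):
                # Append the header to a copy instead of mutating the app's list.
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, start)

    return wrapped


def hello_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Tiny demo app standing in for your real site."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"hello"]


app = noindex_middleware(hello_app)
```

To try it locally, serve `app` with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` and check the response headers of a `/drafts/` URL with `curl -i`.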

How to create robots.txt — practical walkthrough

Creating the file takes 3 minutes. Let’s do it the right way:

🛠️ Step‑by‑step (works on any site)
  1. Open a plain text editor (Notepad, VS Code, or even TextEdit in plain mode).
  2. Start with the catch-all group: User-agent: * (applies to every compliant crawler that doesn't have a more specific group of its own).
  3. Add Disallow: to block paths. Example: Disallow: /private/ blocks the /private/ folder.
  4. Optionally add Allow: to override a disallow for sub-paths (more on that later).
  5. Include your sitemap as a full URL: Sitemap: https://smartchaine.cloud/sitemap.xml. This helps Google discover your pages.
  6. Save the file as robots.txt in UTF-8: all lowercase, and check that your editor didn't silently save it as robots.txt.txt.
  7. Upload to your site’s root directory using FTP, cPanel, or your hosting file manager.
  8. Verify by visiting https://yourdomain.com/robots.txt in your browser.
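If you'd rather script these steps (handy when several environments need the same file), here is a minimal Python sketch. It assembles the directives, spot-checks them with the standard library's `urllib.robotparser`, and writes a UTF-8 file named exactly `robots.txt`. The `yourdomain.com` URLs, the `/private/` rule, and the helper names are placeholders for illustration.

```python
from pathlib import Path
from urllib.robotparser import RobotFileParser

# Placeholder rules mirroring steps 2-5; swap in your own domain and paths.
DIRECTIVES = [
    "User-agent: *",
    "Disallow: /private/",
    "",
    "Sitemap: https://yourdomain.com/sitemap.xml",
]


def build_robots_txt(lines: list[str]) -> str:
    """Join directive lines with newlines and end the file with a newline."""
    return "\n".join(lines) + "\n"


def check_robots_txt(content: str, site: str = "https://yourdomain.com") -> dict:
    """Parse the text with the stdlib parser and report a few spot checks."""
    parser = RobotFileParser()
    parser.parse(content.splitlines())
    return {
        "private_blocked": not parser.can_fetch("*", f"{site}/private/report.pdf"),
        "blog_allowed": parser.can_fetch("*", f"{site}/blog/"),
        "sitemaps": parser.site_maps(),
    }


def write_robots_txt(content: str, directory: str = ".") -> Path:
    """Save as lowercase robots.txt in UTF-8 (no hidden double extension)."""
    path = Path(directory) / "robots.txt"
    path.write_text(content, encoding="utf-8")
    return path


if __name__ == "__main__":
    content = build_robots_txt(DIRECTIVES)
    print(check_robots_txt(content))
    print("Wrote", write_robots_txt(content))
```

Run it locally, upload the resulting file to your web root, then confirm it loads in the browser as described in step 8.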
💡 Expert note: WordPress? It already serves a virtual robots.txt, and Yoast SEO or Rank Math let you edit it from the dashboard. But I still recommend manual control for custom setups. And never block your CSS/JS files by accident: Google needs them to render your pages properly.
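A quick way to catch the CSS/JS mistake before Google does: run your render-critical asset URLs through the file. This sketch uses the standard library's `urllib.robotparser`; the `blocked_assets` helper, the sample rules (with a deliberately bad `Disallow: /assets/`), and the asset URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Sample file with a classic mistake: the asset folder is disallowed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /assets/
Disallow: /cart/
"""

# Hypothetical render-critical files; list your real theme/bundle URLs here.
ASSET_URLS = [
    "https://yourdomain.com/assets/site.css",
    "https://yourdomain.com/assets/app.js",
    "https://yourdomain.com/static/fonts.css",
]


def blocked_assets(robots_txt: str, urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the URLs that `agent` is not allowed to fetch under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


for url in blocked_assets(ROBOTS_TXT, ASSET_URLS):
    print("Blocked for Googlebot:", url)
```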

Robots.txt syntax: your cheat sheet

Get these rules right, and you’re 90% there. The rest is testing.

📝 Real example (clean and safe):
User-agent: *
Disallow: /internal/
Disallow: /search-results?
Allow: /internal/landing-page/
Sitemap: https://smartchaine.cloud/sitemap_index.xml

🔍 This blocks the whole “/internal/” folder except a specific landing page, plus blocks all search result URLs with parameters — smart for crawl budget.
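Why does the Allow line win in the example above even though it comes after the Disallow? Google applies the most specific matching rule, meaning the one with the longest path, and on a tie the less restrictive Allow wins; file order doesn't matter. Here is a minimal sketch of that precedence logic in Python. It is simplified on purpose: it assumes a single `User-agent: *` group, uses plain prefix matching (no `*` or `$`), and the `Rule`, `parse_rules`, and `is_allowed` names are hypothetical.

```python
from dataclasses import dataclass

ROBOTS_TXT = """\
User-agent: *
Disallow: /internal/
Disallow: /search-results?
Allow: /internal/landing-page/
Sitemap: https://smartchaine.cloud/sitemap_index.xml
"""


@dataclass
class Rule:
    allow: bool
    path: str


def parse_rules(text: str) -> list[Rule]:
    """Collect Allow/Disallow rules, ignoring comments and other fields."""
    rules = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        # An empty Disallow value means "allow everything", so it adds no rule.
        if sep and field in ("allow", "disallow") and value:
            rules.append(Rule(allow=(field == "allow"), path=value))
    return rules


def is_allowed(rules: list[Rule], path_and_query: str) -> bool:
    """Longest matching rule wins; Allow wins ties; no match means allowed."""
    best = None
    for rule in rules:
        if not path_and_query.startswith(rule.path):
            continue
        longer = best is None or len(rule.path) > len(best.path)
        tie_allow = best is not None and len(rule.path) == len(best.path) and rule.allow
        if longer or tie_allow:
            best = rule
    return True if best is None else best.allow


rules = parse_rules(ROBOTS_TXT)
for path in ("/internal/landing-page/", "/internal/reports/", "/search-results?q=shoes", "/blog/"):
    print(path, "allowed" if is_allowed(rules, path) else "blocked")
```

One caveat if you test with Python's built-in `urllib.robotparser`: it applies the first matching rule in file order, so it reports the landing page above as blocked. Don't use it to predict Googlebot's behavior for overlapping Allow/Disallow rules.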

4 painful robots.txt mistakes (and how to avoid them)

I’ve audited over 200 websites. Here’s what routinely breaks:

🧠 My take: Most people don’t need a complex robots.txt. Keep it lean. Block only what wastes crawl budget (e.g., faceted navigation, user carts, staging copies).

Don’t guess: test your robots.txt with real tools

Before you break production, run these checks:

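⚡ Pro tip: Google generally caches robots.txt for up to 24 hours. After an urgent change, request a recrawl of the file from the robots.txt report in Google Search Console, then use the URL Inspection tool to request recrawls of key pages like your homepage.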

Advanced tactics: wildcards, parameter blocking, and managing crawl budget

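Google supports two pattern characters: * (matches any sequence of characters) and $ (marks the end of the URL). For instance, Disallow: /*?sort= blocks any URL whose query string starts with sort=, which is great for e‑commerce filters. It won't catch /shoes?color=red&sort=price, so add Disallow: /*&sort= as well if the parameter can appear later in the query. And if you have thousands of low-value pages, robots.txt is your first line of defense for crawl budget efficiency.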
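To see exactly what a pattern catches before it goes live, you can translate it into a regular expression. A minimal sketch, assuming Google's documented semantics (`*` matches any run of characters, a trailing `$` anchors the end of the URL, and every rule matches from the start of the path); `rule_to_regex` and `rule_matches` are hypothetical helpers, not Google's parser.

```python
import re


def rule_to_regex(rule_path: str) -> re.Pattern:
    """Compile a robots.txt path pattern; re.match() anchors it at the path start."""
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape each literal chunk, then rejoin with '.*' wherever the rule had '*'.
    body = ".*".join(re.escape(chunk) for chunk in rule_path.split("*"))
    # A trailing '$' in robots.txt means "URL ends here"; \Z is the strict regex equivalent.
    return re.compile(body + (r"\Z" if anchored else ""))


def rule_matches(rule_path: str, path_and_query: str) -> bool:
    """True if the rule applies to the given path (including any query string)."""
    return rule_to_regex(rule_path).match(path_and_query) is not None


checks = [
    ("/*?sort=", "/shoes?sort=price"),
    ("/*?sort=", "/shoes?color=red&sort=price"),
    ("/*&sort=", "/shoes?color=red&sort=price"),
    ("/*.pdf$", "/files/guide.pdf"),
    ("/*.pdf$", "/files/guide.pdf?v=2"),
]
for rule, url in checks:
    print(f"{rule:<10} {url:<30} {rule_matches(rule, url)}")
```

The second and fifth checks print False: the first because ?sort= never appears when sort isn't the first parameter, the last because $ refuses anything after .pdf, including a query string.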

🧩 Use case: large blog with tag archives
Disallow: /tag/
Disallow: /author/
✅ Keeps Google focused on money content, not thin tag and author archives.

robots.txt: your most pressing questions

Will robots.txt remove pages from Google index?

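No. It only blocks crawling. If a page is already indexed and you block it via robots.txt, Google may keep the URL in results, just without a snippet. To actually remove it, use a noindex meta tag (or X-Robots-Tag header) and leave the page crawlable, because Google can't see a noindex on a page it isn't allowed to fetch.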
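Because a robots.txt block hides a page's noindex from Google, it pays to check that no URL is both disallowed and noindexed. This sketch flags such conflicts with the standard library's `urllib.robotparser`; the `NOINDEX_PAGES` inventory and the `noindex_conflicts` helper are hypothetical (in practice you'd build the inventory from a site crawl).

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /old-campaign/
"""

# Hypothetical crawl export: URL -> does the page carry a noindex directive?
NOINDEX_PAGES = {
    "https://yourdomain.com/old-campaign/offer": True,
    "https://yourdomain.com/thank-you/": True,
    "https://yourdomain.com/blog/guide": False,
}


def noindex_conflicts(robots_txt: str, pages: dict[str, bool], agent: str = "Googlebot") -> list[str]:
    """URLs that carry noindex but that robots.txt stops `agent` from crawling."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url, noindex in pages.items() if noindex and not parser.can_fetch(agent, url)]


for url in noindex_conflicts(ROBOTS_TXT, NOINDEX_PAGES):
    print("Unblock so Google can see the noindex:", url)
```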

Can I use robots.txt to block AI bots like GPTBot?

Yes. Add this group to block OpenAI's crawler:
User-agent: GPTBot
Disallow: /
Many SEOs now manage multiple AI bots separately.
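Here's a minimal sketch that builds a separate group for each AI crawler you want to block, then confirms with `urllib.robotparser` that GPTBot is shut out while Googlebot still gets through. The `AI_BOTS` list and the `build_robots_txt` helper are hypothetical; check each vendor's documentation for its exact user-agent token before adding more bots.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical list; add other AI crawler tokens after confirming them in each vendor's docs.
AI_BOTS = ["GPTBot"]


def build_robots_txt(blocked_agents: list[str], default_disallow: list[str]) -> str:
    """One 'Disallow: /' group per blocked agent, then the catch-all group."""
    groups = [f"User-agent: {agent}\nDisallow: /\n" for agent in blocked_agents]
    default = "User-agent: *\n" + "".join(f"Disallow: {path}\n" for path in default_disallow)
    # A blank line between groups keeps the file easy to read.
    return "\n".join(groups + [default])


robots_txt = build_robots_txt(AI_BOTS, ["/private/"])
print(robots_txt)

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
for agent in ("GPTBot", "Googlebot"):
    allowed = parser.can_fetch(agent, "https://yourdomain.com/blog/post")
    print(agent, "allowed" if allowed else "blocked")
```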

How often does Google re-read robots.txt?

Google generally caches robots.txt for up to 24 hours, though it can take longer. After critical changes, request a recrawl of the file from the robots.txt report in Google Search Console.

Stop guessing. Start crawling smarter.

SMARTCHAINE’s SEO audit suite automatically detects robots.txt errors, missing sitemaps, and crawl waste — plus gives you one‑click fixes.

Alex Rivera
Senior Technical SEO Strategist @ SMARTCHAINE
10+ years crawling the web, former Google Search quality contributor. Alex helps SaaS and enterprise teams turn crawl insights into organic growth.
🐦 @alex_seo  | 💼 linkedin/in/alexrivera
