The robots.txt file is a plain-text file at the root of a domain (e.g., example.com/robots.txt) that gives crawlers instructions about which URLs they may and may not access. It follows the Robots Exclusion Protocol, standardized as RFC 9309.
robots.txt controls crawl access, not indexation. Disallowing a URL prevents Google from crawling it, but if the URL is linked from other pages, Google may still index it based on those external signals; it simply will not have read the page content. To keep a page out of the index entirely, use a noindex meta tag or X-Robots-Tag header instead, and note that the page must remain crawlable for Google to see that directive.
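To see how a crawler actually interprets these rules, here is a minimal sketch using Python's standard-library urllib.robotparser; the rules and URLs are illustrative, not taken from any real site:

```python
# Sketch: how a compliant crawler evaluates robots.txt rules,
# using Python's built-in urllib.robotparser.
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block everything under /admin/ for all bots.
rules = """
User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A disallowed path is blocked from crawling...
print(parser.can_fetch("*", "https://example.com/admin/login"))  # False
# ...while other URLs remain fetchable. Remember: a blocked URL can
# still be indexed if other pages link to it.
print(parser.can_fetch("*", "https://example.com/products"))     # True
```

This is the same decision a well-behaved bot makes before each request: match the URL path against the rules for its user-agent, fall back to the `*` group, and skip the fetch on a Disallow match.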
Common legitimate uses of robots.txt include blocking /admin/, /staging/, /api/ endpoints, duplicate parameter-based URLs, and internal search results. A poorly configured robots.txt that blocks CSS or JavaScript files can prevent Google from rendering your pages correctly.
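A robots.txt covering the common patterns above might look like the following sketch. All paths and parameter names are illustrative; adapt them to your own URL structure:

```
User-agent: *
Disallow: /admin/
Disallow: /staging/
Disallow: /api/
Disallow: /search
Disallow: /*?sort=
Disallow: /*?sessionid=

# Never block the assets Google needs to render pages
Allow: /*.css
Allow: /*.js

Sitemap: https://example.com/sitemap.xml
```

Note that the `*` and `$` wildcards are supported by Google but are not part of the original standard, so test rules against the specific crawlers you care about.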
Example
TheProjectSEO's robots.txt intentionally allows AI training bots (no Disallow for GPTBot, ClaudeBot, etc.) as part of an AEO strategy. Most sites block these by default.
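In practice, "allowing" these bots simply means not adding Disallow groups for them, so they fall under the permissive default. The sketch below illustrates the idea; the bot names are real AI crawler user-agents, but the file itself is a hypothetical example, not a copy of TheProjectSEO's actual robots.txt:

```
# AI bots (GPTBot, ClaudeBot, etc.) have no dedicated group,
# so they inherit the permissive default below.
User-agent: *
Disallow: /admin/

# A site taking the opposite, more common stance would add:
# User-agent: GPTBot
# Disallow: /
#
# User-agent: ClaudeBot
# Disallow: /
```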
Apply this in practice
Definitions are step one.
Our team configures robots.txt correctly for clients across 15 active engagements. If you want a technical SEO audit that covers this and 100+ other checkpoints, talk to us.