What is Robots.txt?
Robots.txt is a plain-text file placed in the root directory of a website that tells search engine crawlers which pages or sections they may or may not crawl. It follows the Robots Exclusion Protocol, and compliant crawlers typically request it before fetching any other URL on the site. Robots.txt controls crawling, not indexing: a blocked page can still be indexed if other crawled pages link to it.
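As a rough illustration of how a compliant crawler might interpret these rules, the sketch below parses a small robots.txt with Python's standard `urllib.robotparser` module. The file contents, the `ExampleBot` user agent, the `example.com` URLs, and the `build_parser` helper are all hypothetical, chosen only for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: block the admin area and
# internal search results for every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /search
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Return a parser loaded with the given robots.txt text."""
    parser = RobotFileParser()
    # parse() accepts the file as a list of lines, so no network
    # request is needed for this illustration.
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Ask whether a given user agent may crawl a given URL.
blog_allowed = parser.can_fetch("ExampleBot", "https://example.com/blog/post")
admin_allowed = parser.can_fetch("ExampleBot", "https://example.com/admin/login")

print(blog_allowed)
print(admin_allowed)
```

In this sketch the blog URL should be reported as crawlable and the admin URL as blocked. The check only answers whether a URL may be crawled; it says nothing about whether the URL ends up indexed.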
Why it matters
A misconfigured robots.txt can accidentally block important pages from being crawled, which can lead to deindexing, or it can waste crawl budget by letting bots crawl low-value pages.