With the robots.txt file, site owners
have a simple way to control which parts of a website crawlers may access.
To help site owners further express how search engines and web
crawlers may use their pages, the web standards community came
up with robots meta tags in 1996, just a few months after meta tags
were proposed for HTML (and, anecdotally, also before Google
was founded). Later, the X-Robots-Tag HTTP response header was added; it carries
the same directives in an HTTP header, so it also works for non-HTML resources such as PDFs.
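
Both mechanisms accept the same directives. As a minimal illustration, the snippet below shows a page opting out of indexing with a robots meta tag in its HTML:

```html
<!-- In the page's <head>: ask all crawlers not to index this page -->
<meta name="robots" content="noindex">
```

The same instruction can be expressed as an HTTP response header, which is the only option for resources that have no `<head>` to put a meta tag in; here, a hypothetical PDF response:

```http
HTTP/1.1 200 OK
Content-Type: application/pdf
X-Robots-Tag: noindex
```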
These instructions are sent together with a URL's response, so crawlers can only take them into account
if robots.txt doesn't disallow them from crawling that URL in the first place: a page that is blocked in
robots.txt is never fetched, so any meta tags or headers on it go unseen. Together with robots.txt, these
page-level controls form the Robots Exclusion Protocol (REP).
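
That interaction is a common pitfall. A minimal sketch, assuming a hypothetical /private/ path: with the rule below in robots.txt, crawlers never fetch pages under /private/, so a noindex meta tag or X-Robots-Tag on those pages has no effect.

```
# robots.txt — blocking crawling also blocks the crawler
# from ever seeing page-level noindex instructions
User-agent: *
Disallow: /private/
```

For a noindex to be honored, the URL must remain crawlable in robots.txt.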