The search engine giant will officially stop obeying the robots.txt noindex directive on 1 September 2019. Publishers who still rely on noindex in robots.txt will need to use an alternative to keep Google from indexing their pages. According to the company, robots.txt rules such as nofollow, crawl-delay, and noindex were never documented by Google. Googlebot supported noindex unofficially in the past, but that will no longer be the case.
Google announced the withdrawal of support for the directive on 2 July 2019 in a post on the Google Webmasters Blog. The company confirmed that robots.txt noindex was never officially supported and that its crawlers will stop honoring it from 1 September 2019. Google also shared the post on Twitter, bidding goodbye to the unsupported robots.txt rules.
Alternative Ways From Google to Control Indexing
Google published five alternative methods to control indexing on its official blog:
- Noindex in robots meta tags: While noindex in robots.txt is no longer supported, the noindex directive is still honored in HTML meta tags and in HTTP response headers, and it is the most effective way to keep URLs out of the index when crawling is allowed.
- 404 and 410 HTTP status codes: Both status codes tell search engines that the page at the given URL does not exist. Google drops such URLs from its index once they have been crawled and processed.
- Password protection: Hiding a page behind a login will generally remove it from Google’s index, unless markup is used to indicate subscription or paywalled content.
- Disallow in robots.txt: Search engines can only index pages they know about, so blocking a page from being crawled usually means its content won’t be indexed. A search engine may still index the URL if other pages link to it, but Google aims to make such pages less visible when its crawlers cannot see the content.
- Search Console Remove URL tool: This tool temporarily removes a URL from Google’s search results.
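As a rough illustration of the first two methods, the sketch below shows how a site could send a noindex signal (as both a robots meta tag and an `X-Robots-Tag` response header) and a 410 status. It is a minimal example built on Python's standard `wsgiref` module, not a recommended production setup; the paths `/internal-report` and `/old-campaign` and the helper `call` are hypothetical names chosen only for this example.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical paths, used only for illustration.
NOINDEX_PATHS = {"/internal-report"}
GONE_PATHS = {"/old-campaign"}

NOINDEX_META = '<meta name="robots" content="noindex">'


def app(environ, start_response):
    """Tiny WSGI app demonstrating noindex and 410 responses."""
    path = environ.get("PATH_INFO", "/")

    if path in GONE_PATHS:
        # 410 tells crawlers the page no longer exists.
        start_response("410 Gone", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"Gone"]

    headers = [("Content-Type", "text/html; charset=utf-8")]
    head = "<title>Example</title>"
    if path in NOINDEX_PATHS:
        # Either signal is enough on its own; both are shown for illustration.
        headers.append(("X-Robots-Tag", "noindex"))
        head += NOINDEX_META

    start_response("200 OK", headers)
    body = f"<!doctype html><html><head>{head}</head><body>Hello</body></html>"
    return [body.encode("utf-8")]


def call(path):
    """Invoke the app in-process and return (status, headers dict, body text)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response)).decode("utf-8")
    return captured["status"], captured["headers"], body
```

Calling `call("/internal-report")` returns a 200 response carrying the header and the meta tag, while `call("/old-campaign")` returns a 410. Keep in mind that a crawler has to be able to fetch the page to see a noindex signal, so such a page should not also be disallowed in robots.txt.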
Make sure you are not relying on robots.txt noindex for any of your pages. If you are, we recommend switching to one of the methods above immediately to keep those pages out of the index.
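One way to run that check is to scan your robots.txt for the unsupported rules named earlier. The following is a minimal sketch, assuming you already have the file's contents as a string; the function name `find_unsupported_rules` and the `SAMPLE` file are made up for this example, and the parsing is deliberately simple rather than a full robots.txt parser.

```python
import re

# Rules named in the article as unsupported in robots.txt.
UNSUPPORTED = ("noindex", "nofollow", "crawl-delay")

# Matches "Field: value" lines; field names are case-insensitive.
_RULE = re.compile(r"^\s*([A-Za-z-]+)\s*:\s*(.*?)\s*$")


def find_unsupported_rules(robots_txt):
    """Return (line_number, rule, value) for each unsupported rule found."""
    hits = []
    for number, line in enumerate(robots_txt.splitlines(), start=1):
        line = line.split("#", 1)[0]  # drop trailing comments
        match = _RULE.match(line)
        if match and match.group(1).lower() in UNSUPPORTED:
            hits.append((number, match.group(1).lower(), match.group(2)))
    return hits


# Hypothetical robots.txt content for demonstration.
SAMPLE = """\
User-agent: *
Disallow: /private/
Noindex: /drafts/
Crawl-delay: 10
"""

if __name__ == "__main__":
    for number, rule, value in find_unsupported_rules(SAMPLE):
        print(f"line {number}: {rule} -> {value}")
```

Any line the function reports is a rule Google does not support in robots.txt, so the affected URLs need one of the alternative methods instead.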