The Siteimprove SEO module notifies users about pages that are excluded by noindex/nofollow. This article explains the difference between the noindex and nofollow meta tags, when to use them, and how they affect web indexing and Search Engine Result Pages (SERPs).
Both noindex and nofollow are part of the Robots Exclusion Protocol (REP), the set of standards for controlling how search engine crawlers access and index the pages on your site. Let's look at some examples of noindex and nofollow and how they control the way Google and other search engines crawl and index your website.
What is noindex and when to use it?
Usually, when Googlebot finds a page, it reads all the links on that page, then fetches and indexes the pages those links point to. This is the basic process by which Googlebot "crawls" the web. It is useful because it allows Google to include all the pages on your site, as long as they are linked together. But what if you do not want some pages on your site to appear in Google's index? This is where the noindex meta tag comes in.
When you add a "noindex" meta tag to a webpage, it tells search engines not to add the page to their search index, even though they can still crawl it.
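The difference between crawling and indexing can be illustrated with a deliberately simplified model. The sketch below is not how Googlebot actually works: the in-memory `SITE` dictionary and the `crawl` function are hypothetical. It assumes a crawler that fetches every reachable page by following links, but only adds pages without a noindex directive to its index.

```python
from collections import deque

# Hypothetical in-memory "site": each URL maps to its outgoing links and
# whether the page carries a robots noindex directive.
SITE = {
    "/": {"links": ["/breaking", "/articles"], "noindex": False},
    "/breaking": {"links": ["/articles/story"], "noindex": True},
    "/articles": {"links": ["/articles/story"], "noindex": False},
    "/articles/story": {"links": [], "noindex": False},
}


def crawl(site, start):
    """Breadth-first crawl: fetch every reachable page, index only pages without noindex."""
    crawled, indexed = [], []
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        page = site[url]
        crawled.append(url)          # the crawler still fetches the page...
        if not page["noindex"]:
            indexed.append(url)      # ...but noindex keeps it out of the index
        for link in page["links"]:
            if link in site and link not in seen:
                seen.add(link)
                queue.append(link)
    return crawled, indexed


crawled, indexed = crawl(SITE, "/")
print("crawled:", crawled)
print("indexed:", indexed)
```

In this toy model, "/breaking" appears in the crawled list but not in the indexed list, which is the behavior the noindex tag is meant to produce.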
For example, articles in CNN’s Breaking News section may only appear there for a few hours before being updated and moved to the Articles section. In this case, CNN would want the full articles indexed, not the Breaking News versions that contain only a short part of the full article.
So CNN could add a noindex tag to articles while they are in the Breaking News section, and remove the tag once an article is no longer breaking news.
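Noindex applies to a whole page, not to individual links ("noindex" is not a valid rel value for links). To mark a page as noindex, add a robots meta tag to the <head> section of the page's HTML:
<meta name="robots" content="noindex">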
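If you want to check a page's HTML for this tag yourself, a minimal sketch using Python's standard `html.parser` module could look like the following. The `RobotsMetaParser` class and `is_noindex` function are hypothetical helpers, not part of any Siteimprove or Google tool. The sketch only inspects `<meta>` tags named "robots" or "googlebot" in the HTML; it does not look at the X-Robots-Tag HTTP header, which can also carry a noindex directive.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # html.parser lowercases tag and attribute names; values may be None.
        attrs = {name: (value or "") for name, value in attrs}
        if attrs.get("name", "").lower() in ("robots", "googlebot"):
            # content is a comma-separated list, e.g. "noindex, nofollow".
            for directive in attrs.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def is_noindex(html):
    """Return True if any robots meta tag in the HTML contains "noindex"."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


page = '<html><head><meta name="robots" content="noindex"></head><body>...</body></html>'
print(is_noindex(page))  # True
```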
What is nofollow and when to use it?
Nofollow is a value of the HTML rel attribute that asks search engines not to follow a link and, therefore, not to pass ranking value to the linked page. Some SEO experts interpret this as a way of telling search engines that you do not trust, or cannot vouch for, the content of the linked page. In short: if you want a search engine to index a page but not follow any of the links on it, add a nofollow robots meta tag to the page (<meta name="robots" content="nofollow">). To apply nofollow to a single link only, add rel="nofollow" to that link.
To turn regular links into nofollow links, add "nofollow" to the HTML code*:
<a href="http://www.example.com" rel="nofollow">Link text</a>
*You can add the code by hand, but many content management systems (CMSs) insert it automatically when needed. Talk to your webmaster for advice.
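To audit which links on a page carry the attribute, a minimal sketch with Python's standard `html.parser` module might look like this. The `LinkParser` class and `classify_links` function are hypothetical names used only for illustration. Because rel is a space-separated list of tokens (for example rel="nofollow noopener"), the sketch splits the value and checks for a "nofollow" token rather than comparing the whole string.

```python
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Records (href, is_nofollow) for every <a href="..."> in the document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = {name: (value or "") for name, value in attrs}
        if "href" not in attrs:
            return  # anchors without href are not links
        # rel holds space-separated tokens; match "nofollow" case-insensitively.
        rel_tokens = attrs.get("rel", "").lower().split()
        self.links.append((attrs["href"], "nofollow" in rel_tokens))


def classify_links(html):
    """Return a list of (href, is_nofollow) tuples in document order."""
    parser = LinkParser()
    parser.feed(html)
    return parser.links


html = (
    '<a href="/articles">Articles</a>'
    '<a href="/login" rel="nofollow">Log in</a>'
)
for href, nofollow in classify_links(html):
    print(href, "nofollow" if nofollow else "follow")
```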
When users search Google for news-related phrases, CNN wants its article sections to appear at the top of the SERPs, because articles are CNN’s most valuable asset.
It would not make sense for the Login section to rank at the top.
To signal to Google that articles matter more than logging in, CNN could add the nofollow attribute to its login link.
Note: The Siteimprove crawler does not consider "noindex" or "nofollow" when determining what content to crawl. We crawl based on crawl settings.