How do I know which pages have been added or removed with the latest crawl? (Check History)
By Guðrún Gústafsdóttir
You have recently noticed an increase or decrease in the number of pages or links on your site. What is happening? This article explains the Check History function within the Quality Assurance module, which can be used to detect a decrease or increase in page count and/or link count.
The Check History functionality within Quality Assurance provides information for each report that was sent at the end of a full crawl (typically every 5 days). The Check History overview includes the report date and time, the link count, the page count, and the number of broken links, misspellings, and potential misspellings. More detailed information can be accessed by clicking a number in the Page count or Link count column.
Note: More detailed information on page count and link count can only be accessed for the five most recent crawls. For older crawl reports, the numbers are no longer clickable.
Below are definitions found within the Page count and Link count columns:
Known pages: Pages that have been seen during the previous report crawl and that have also been seen in this specific report crawl.
New pages: New pages seen during this specific report crawl that were not seen in the previous report crawl.
Removed pages: Pages that were seen on the previous report crawl but not seen on this specific report crawl.
Known links: Links that Siteimprove crawlers have detected during the previous report crawl and that have also been seen in this specific report crawl.
New links: Links detected during this specific report crawl that were not detected in the previous report crawl.
Removed links: Links that were detected in the previous report crawl but not in this specific report crawl.
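The definitions above amount to comparing the set of URLs from the previous report crawl with the set from the current one. The following is a minimal sketch of that comparison, not Siteimprove's implementation. The function name diff_crawls and the sample URLs are hypothetical and chosen only for illustration.

```python
from typing import Dict, Set


def diff_crawls(previous: Set[str], current: Set[str]) -> Dict[str, Set[str]]:
    """Classify URLs as known, new, or removed between two crawls."""
    return {
        "known": previous & current,    # seen in both crawls
        "new": current - previous,      # seen only in the current crawl
        "removed": previous - current,  # seen only in the previous crawl
    }


# Hypothetical crawl results (illustrative URLs only).
previous_crawl = {"/", "/about", "/news", "/contact"}
current_crawl = {"/", "/about", "/news", "/events"}

result = diff_crawls(previous_crawl, current_crawl)
for category in ("known", "new", "removed"):
    print(category, sorted(result[category]))
```

The same comparison applies to links: replace the page URLs with the link targets found in each crawl.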
Check History can be useful to detect a decrease or increase in page count and/or link count. Often you will see that Siteimprove's crawler has picked up a new part of the website (for example, a subdomain, calendars, events list, suppliers list, planning applications, etc.) which can explain a large increase in page/link count. The Check History functionality also helps determine if any sections or pages of your website should be included or excluded from the crawl.
Common Reasons For Page Count Fluctuation
The Page No Longer Exists or New Pages Were Created
A change in page count can simply be the result of pages being added to or removed from your site. Review the Check History page to determine which pages were added or removed.
The Entrance (or Landing) Page To A Section Of Your Site Was Removed or Is Inaccessible
You may have sections of your site with a large amount of content that is only accessible through a few entrance links. If our crawler doesn't find the entrance link to such a section, it will never find the pages contained within that section. Review the page that contains the entrance link to the section and determine whether the link still exists or is broken. It is not possible for our crawler to find orphan pages. An orphan page is a page that is not linked to by any other page on the site.
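To illustrate why a single missing entrance link can remove a whole section from the page count, here is a simplified sketch of link-following as a breadth-first traversal. It is an assumption-based model, not a description of how Siteimprove's crawler is built. The site map, the function name reachable_pages, and all paths are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def reachable_pages(links: Dict[str, List[str]], start: str) -> Set[str]:
    """Return every page that can be reached by following links from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site: the events section is only linked from the home page.
site = {
    "/": ["/about", "/events"],
    "/about": [],
    "/events": ["/events/2023", "/events/2024"],
    "/events/2023": [],
    "/events/2024": [],
}
before = reachable_pages(site, "/")

# Simulate the entrance link to /events being removed from the home page.
site_without_entrance = {**site, "/": ["/about"]}
after = reachable_pages(site_without_entrance, "/")

missing = sorted(before - after)
print(missing)
```

In this model the events pages still exist, but they have become orphan pages, so a traversal that starts at the home page no longer counts them.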
Recent Addition or Removal of Aliases and Exclusions
Aliases and Exclusions are how we tell our crawler what content should be included as a page, and what should be considered "external" content. See the following articles for more explanation:
Recent Changes To Your Site's robots.txt Disallows
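By default, our crawler respects the robots.txt file placed in your site's root directory (https://www.example.com/robots.txt). The robots.txt file may contain disallow instructions telling our crawler not to crawl certain sections of your site. If those instructions have changed recently, pages in the affected sections can appear in or disappear from the page count.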
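To check whether a disallow rule could explain missing pages, you can test URLs against your robots.txt rules yourself. The sketch below uses Python's standard urllib.robotparser module with an inline example file. The rules, the URLs, and the user agent name "ExampleCrawler" are assumptions for illustration; the actual user agent of the crawler is not stated in this article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; substitute the rules from your own site.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())


def is_crawl_allowed(url: str, user_agent: str = "ExampleCrawler") -> bool:
    """Return True if the parsed robots.txt rules allow user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


blocked = is_crawl_allowed("https://www.example.com/private/report.html")
allowed = is_crawl_allowed("https://www.example.com/news/index.html")
print(blocked, allowed)
```

A URL that returns False here falls under a disallow rule, so a crawler that respects robots.txt would skip it.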