
How do I know which pages have been added or removed with the latest crawl?

By Guðrún Unnur Gústafsdóttir

You have recently noticed an increase or decrease in the number of pages or links on your site. What is happening? This article explains the Check History function within the Quality Assurance service, which you can use to detect a decrease or increase in page count and/or link count.

[Image: check_history.png]

The Check History functionality within Quality Assurance provides information for each report that was sent at the end of a full crawl (typically every 5 days). The Check History overview shows the report date and time, link count, number of broken links, misspellings and potential misspellings, and page count. To see more detail, click a number in the Page count or Link count column.

Note: More detailed information on page count and link count can only be accessed for the 5 most recent crawls. The numbers for older crawl reports are not clickable.

Below are definitions of the terms found within the Page count and Link count columns:

Page Count

Known pages: Pages that were seen during the previous report crawl and were also seen in this specific report crawl.

New pages: Pages seen during this specific report crawl that were not seen in the previous report crawl.

Removed pages: Pages that were seen in the previous report crawl but not in this specific report crawl.


Link Count

Known links: Links that Siteimprove crawlers detected during the previous report crawl and that were also seen in this specific report crawl.

New links: Links detected during this specific report crawl that were not detected in the previous report crawl.

Removed links: Links that were detected in the previous report crawl but not in this specific report crawl.
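
The known, new, and removed categories above amount to set comparisons between two consecutive crawls. The following is a minimal Python sketch of that idea, assuming each crawl is available as a plain collection of URLs. The function name compare_crawls and the sample URLs are hypothetical and are not part of any Siteimprove API.

```python
def compare_crawls(previous, current):
    """Split the URLs of two consecutive crawls into known, new, and removed sets."""
    previous, current = set(previous), set(current)
    return {
        "known": previous & current,    # seen in both crawls
        "new": current - previous,      # seen only in the current crawl
        "removed": previous - current,  # seen only in the previous crawl
    }


# Hypothetical sample data for illustration only.
previous_crawl = ["/", "/about", "/events"]
current_crawl = ["/", "/about", "/suppliers"]

result = compare_crawls(previous_crawl, current_crawl)
for category in ("known", "new", "removed"):
    print(category, sorted(result[category]))
```

The same comparison applies to links as well as pages, since both are just collections of URLs.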

Check History is useful for detecting a decrease or increase in page count and/or link count. Often you will see that Siteimprove's crawler has picked up a new part of the website (for example, a subdomain, calendars, an events list, a suppliers list, or planning applications), which can explain a large increase in page or link count. The Check History functionality also helps you determine whether any sections or pages of your website should be included in or excluded from the crawl.

COMMON REASONS FOR PAGE FLUCTUATION

THE PAGE NO LONGER EXISTS OR NEW PAGES WERE CREATED

The most obvious explanation for a change in page count is that pages were added to or removed from your site. Review the Check History page to determine which pages were added or removed.

RECENT ALIASES OR EXCLUSIONS

Aliases and Exclusions are how we tell our crawler what content should be included as a page and what should be considered "external" content. A recent change to either setting can therefore raise or lower the page count. See the following articles for more explanation:

RECENT CHANGES TO ROBOTS.TXT DISALLOWS

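By default, our crawler respects the robots.txt file placed in your root directory (http://www.example.com/robots.txt). The robots.txt file may contain disallow instructions telling our crawler not to follow sections of your site. If disallow rules were recently added or removed, the pages in those sections will drop out of or reappear in the crawl.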
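
To check which URLs a given set of disallow rules would block, you can test them locally. The sketch below uses Python's standard urllib.robotparser module on a robots.txt body supplied as a string, so no network access is needed. The rules, the URLs, and the user agent name "ExampleCrawler" are assumptions made for illustration. They do not describe the actual Siteimprove crawler, which may interpret rules differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /events/
"""


def blocked_urls(robots_txt, urls, user_agent="ExampleCrawler"):
    """Return the URLs that the given robots.txt rules disallow for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


urls = [
    "http://www.example.com/about",
    "http://www.example.com/events/2024",
]
print(blocked_urls(ROBOTS_TXT, urls))
```

Running this against the robots.txt from before and after a change shows which sections the change hides from or opens up to a crawler that honors the file.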

THE ENTRANCE PAGE TO A SECTION OF YOUR SITE WAS REMOVED OR INACCESSIBLE

You may have sections of your site with a large amount of content that is only accessible through a few entrance links. If our crawler doesn't find the entrance link to such a section, it will never find the pages contained within that section. Review the page that contains the entrance link to the section and determine whether it still exists or is broken.
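
The effect of a missing entrance link can be illustrated with a small reachability check. This is a simplified sketch, assuming the site is modeled as a dictionary that maps each page to the pages it links to. The function name reachable_pages and the sample site are hypothetical, and a real crawler does considerably more than this.

```python
from collections import deque


def reachable_pages(links, start):
    """Return every page reachable from start by following links (breadth-first)."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site: the home page holds the only link into the events section.
site = {
    "/": ["/about", "/events"],
    "/events": ["/events/1", "/events/2"],
}

# The same site after the entrance link to /events is removed from the home page.
site_without_entrance = {
    "/": ["/about"],
    "/events": ["/events/1", "/events/2"],
}

lost = reachable_pages(site, "/") - reachable_pages(site_without_entrance, "/")
print(sorted(lost))
```

Although the events pages still exist in the second model, none of them can be reached from the home page, so a crawl that starts there would report the whole section as removed.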
