Data collection at Siteimprove
Siteimprove uses several methods to retrieve page data from customer websites. Most commonly, we run full crawls that recur every five days. These crawls typically start from either the homepage or the XML sitemap.
We also identify pages to crawl through Single Page Checks, where single URLs are entered manually by the customer or automatically through one of our integrations (Ads, Marketing Automation Integrations, CMS plugins).
The Single Page Check tables in Siteimprove Quality Assurance and Siteimprove Accessibility show the pages that have been inserted as single URLs to be crawled. After they have been crawled, they will be checked for errors and issues.
Historically, pages inserted into the Single Page Check table remained there until they were removed manually. When such pages were not removed, Siteimprove in some cases continued to check URLs for pages that were no longer published on the customer's website and reported potential errors for them.
What will change?
To improve the data quality of Single Page Checks, we are implementing a clean-up process. If a checked page returns an unsuccessful HTTP status (a status code in the 400s or 500s), the page will no longer be included in the page listings (Quality Assurance > Inventory > Pages) and will no longer contribute to the content results in the Siteimprove platform. If other pages on the website link to that page, those links will be reported as broken.
These page URLs will remain in the Single Page Check tables for a minimum of 30 days.
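For readers who want to anticipate which Single Page Check URLs are affected, the rule above can be sketched as a simple filter. This is only an illustration and not Siteimprove's implementation: the names CheckedPage, is_unsuccessful and pages_for_inventory, as well as the example URLs, are hypothetical. The 30-day retention in the Single Page Check tables is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CheckedPage:
    """Hypothetical record for a URL submitted as a Single Page Check."""
    url: str
    http_status: int


def is_unsuccessful(status: int) -> bool:
    """True for status codes in the 400s or 500s."""
    return 400 <= status <= 599


def pages_for_inventory(pages: List[CheckedPage]) -> List[CheckedPage]:
    """Keep only the pages that would still appear in the page listings."""
    return [p for p in pages if not is_unsuccessful(p.http_status)]


# Example data (made up for illustration).
pages = [
    CheckedPage("https://example.com/campaign", 200),
    CheckedPage("https://example.com/old-landing-page", 404),
    CheckedPage("https://example.com/broken-form", 500),
]

listed = pages_for_inventory(pages)
excluded = [p for p in pages if is_unsuccessful(p.http_status)]

print("Listed:", [p.url for p in listed])
print("Excluded:", [p.url for p in excluded])
```

In this sketch only the first URL stays in the listings. The other two would drop out of the page count, and any links pointing to them from other pages would be reported as broken.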
How does the clean-up impact Siteimprove customers?
The clean-up of Single Page Checks can lead to a lower number of pages shown in Siteimprove, because Single Page Check pages that return an unsuccessful HTTP status will no longer be included in the page listings. Page counts can differ from previous counts, and customers might see a difference in errors and issues, which in turn can lead to changes in DCI Scores.
When will the change happen?
We will start the clean-up process on November 13, 2019. Customers might see changes in page counts and DCI Scores following their next crawl on or after November 13, 2019.
Please contact Siteimprove Technical Support if you have any questions about this change.