How does Siteimprove determine what is classified as internal and external in our inventory section?

Modified on: Wed, 27 Sep, 2023 at 3:17 AM

This article explains how Siteimprove's tools determine internal and external inventory, including pages, documents, media files, links, and other digital assets.

Internal

Internal content on your site in QA, Accessibility, SEO, and Policy is defined primarily by your "index URL".
It can also be affected by including or excluding content in the crawl settings.

For example, for a site with the index URL https://siteimprove.com and no additional Site Content Settings configured, the following links are considered internal:

https://siteimprove.com/work/quality 
https://siteimprove.com/maps.pdf
https://siteimprove.com/files/offices.pdf

Links that do not match the domain of the index URL, including links on its subdomains, are not considered internal to the site.

For example, the following links are not internal to the site with the index URL https://siteimprove.com by default.

https://www.siteimprove.com/maps.pdf 
https://download.siteimprove.com/file.pdf
https://www.wikipedia.com/tell-me-something.pdf
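
The default behavior described above can be sketched in a few lines of Python. This is only an illustration, not Siteimprove's actual implementation: the function name `is_internal_by_default` is hypothetical, and the sketch assumes the comparison is an exact hostname match against the index URL.

```python
from urllib.parse import urlsplit


def is_internal_by_default(link: str, index_url: str) -> bool:
    """Hypothetical rule: a link is internal only when its hostname
    is exactly the hostname of the index URL (subdomains do not match)."""
    return urlsplit(link).hostname == urlsplit(index_url).hostname


INDEX_URL = "https://siteimprove.com"

links = [
    "https://siteimprove.com/work/quality",
    "https://siteimprove.com/maps.pdf",
    "https://www.siteimprove.com/maps.pdf",
    "https://download.siteimprove.com/file.pdf",
    "https://www.wikipedia.com/tell-me-something.pdf",
]

# Map each link to True (internal) or False (not internal by default).
default_classification = {
    link: is_internal_by_default(link, INDEX_URL) for link in links
}
```

Under this assumption, only the first two links are classified as internal; the `www.` and `download.` subdomains are treated like any other foreign host.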

In this case, if you did want links like the following

https://www.siteimprove.com/maps.pdf
https://download.siteimprove.com/file.pdf

to be considered internal, you'd need to either:

  • Add separate sites to your account so those subdomains are crawled OR 
  • Set up an inclusion or exclusion in the configuration for your original site.

An inclusion or exclusion dictates whether links seen during a crawl are regarded as internal or external.

If you set up inclusions for

www.siteimprove.com

and

download.siteimprove.com

then the site will regard all links matching these URL elements as internal.
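
As a rough sketch of how such inclusions could change the outcome, the Python below layers a list of URL elements on top of the default hostname check. It is an illustration under stated assumptions: the function `is_internal` and its `inclusions` parameter are hypothetical, exclusions are not modeled, and "matching" is assumed to mean that the URL element appears somewhere in the link.

```python
from urllib.parse import urlsplit


def is_internal(link: str, index_url: str, inclusions: tuple = ()) -> bool:
    """Hypothetical rule: internal if the hostname equals the index URL's
    hostname, or if the link contains any configured inclusion element."""
    if urlsplit(link).hostname == urlsplit(index_url).hostname:
        return True
    # Assumption: an inclusion matches when it is a substring of the link.
    return any(element in link for element in inclusions)


INDEX_URL = "https://siteimprove.com"
INCLUSIONS = ("www.siteimprove.com", "download.siteimprove.com")

www_link = is_internal(
    "https://www.siteimprove.com/maps.pdf", INDEX_URL, INCLUSIONS
)
download_link = is_internal(
    "https://download.siteimprove.com/file.pdf", INDEX_URL, INCLUSIONS
)
wikipedia_link = is_internal(
    "https://www.wikipedia.com/tell-me-something.pdf", INDEX_URL, INCLUSIONS
)
```

With these two inclusions, the `www.` and `download.` links become internal, while the Wikipedia link stays external because it contains neither URL element.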

If a link is regarded as internal, Siteimprove will follow that link, crawl the content, and subsequently download, render, and store the content. This content will then be used for further analysis in the QA, Accessibility, SEO, and Policy products.

Note: We recommend that an inclusion or exclusion is set up to be as specific as possible to avoid a situation like the following.

If you configure Siteimprove to regard .pdf as internal by using that URL element as an inclusion, then any link containing .pdf is considered internal, regardless of whether it is on your domain or on someone else's. All of the PDFs listed below (including the Wikipedia PDF, which you most likely do not want to check) would be considered internal to your site.

https://www.siteimprove.com/maps.pdf
https://download.siteimprove.com/file.pdf
https://www.wikipedia.com/tell-me-something.pdf
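
One way to reason about how specific a URL element is: check which known links it would match outside your own domain before you configure it. The helper below is a hypothetical sketch, not a Siteimprove feature; `off_domain_matches` is an invented name, and the sketch assumes that matching is a plain substring test.

```python
from urllib.parse import urlsplit


def off_domain_matches(url_element: str, links: list, domain: str) -> list:
    """Return the links that contain url_element but whose host is
    neither the given domain nor one of its subdomains."""
    matches = []
    for link in links:
        host = urlsplit(link).hostname or ""
        on_domain = host == domain or host.endswith("." + domain)
        if url_element in link and not on_domain:
            matches.append(link)
    return matches


links = [
    "https://www.siteimprove.com/maps.pdf",
    "https://download.siteimprove.com/file.pdf",
    "https://www.wikipedia.com/tell-me-something.pdf",
]

# A broad element such as ".pdf" also matches third-party content.
broad = off_domain_matches(".pdf", links, "siteimprove.com")

# A host-specific element matches nothing outside the domain.
specific = off_domain_matches("download.siteimprove.com", links, "siteimprove.com")
```

Here `broad` contains the Wikipedia PDF, which signals that `.pdf` is too general, while `specific` is empty.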

External

A link is considered external when its URL does not contain your domain name and no setting has been configured to include it.

For example, on the website siteimprove.com, the following links are considered external:

http://planning.com/work/quality
http://www.planning.com/maps.pdf
http://download.planning.com/file.pdf
