This article explains Siteimprove's exclusion and alias options. If you already understand exclusions and aliases and only need to know how to add them, see the following article: Adding and removing content from the crawl (exclusions/aliases)
Exclusions and aliases tell the Siteimprove crawler whether a link should be excluded, treated as an internal page, or treated as an external page. This categorization determines how thoroughly we evaluate the link and its contents.
Note: When setting up exclusions and aliases, only a partial match on the link is needed.
For example, an exclusion with a match of "/calendar/" means that every link or page whose URL contains "/calendar/" will be excluded.
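The partial-match behavior can be illustrated with a minimal Python sketch. It is not Siteimprove's implementation: the function name `is_excluded` and the sample URLs are hypothetical, and it assumes that a "partial match" is a plain, case-sensitive substring test.

```python
def is_excluded(url: str, exclusion_matches: list[str]) -> bool:
    """Return True if any exclusion string appears anywhere in the URL.

    Assumption: a partial match is a plain, case-sensitive substring test.
    """
    return any(match in url for match in exclusion_matches)


# Hypothetical URLs, for illustration only.
urls = [
    "http://www.example.com/calendar/",
    "http://www.example.com/calendar/2024/events.html",
    "http://www.example.com/news/calendar/index.html",
    "http://www.example.com/news/index.html",
]

exclusions = ["/calendar/"]
excluded = [u for u in urls if is_excluded(u, exclusions)]
kept = [u for u in urls if not is_excluded(u, exclusions)]
```

In this sketch the first three URLs end up in `excluded` because "/calendar/" appears somewhere in each of them, while the last URL stays in `kept`.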
Excluded links are completely ignored by the crawler. It's as if the link never existed.
Excluded links will NOT:
- Be checked for HTTP response code
- Be checked for broken links, misspellings or accessibility issues
- Show in the site inventory
Reasons to exclude a link:
- The link is being flagged as a false-positive broken link, and ignoring the link manually would be too cumbersome
- The resources used to check the link(s) outweigh the benefits
Aliasing is how we determine whether a link should be considered internal or external.
Internal page: The link to the page and the content within the page will be checked for broken links, misspellings, and accessibility issues.
External page: Only the link to the page will be checked. We will not evaluate the content on these pages.
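The difference in treatment between excluded, internal, and external links can be summarized as a small lookup table. This is only a sketch of the behavior described in this article. The names `Category`, `CHECKS`, and `checks_for`, as well as the check labels, are hypothetical and are not part of any Siteimprove API.

```python
from enum import Enum


class Category(Enum):
    EXCLUDED = "excluded"
    INTERNAL = "internal"
    EXTERNAL = "external"


# Which checks apply to a link in each category (labels are illustrative).
CHECKS = {
    # Excluded links are ignored completely.
    Category.EXCLUDED: frozenset(),
    # Internal pages: the link itself plus the content of the page.
    Category.INTERNAL: frozenset(
        {"http_status", "broken_links", "misspellings", "accessibility"}
    ),
    # External pages: only the link itself.
    Category.EXTERNAL: frozenset({"http_status"}),
}


def checks_for(category: Category) -> frozenset:
    """Return the set of checks performed for a link in this category."""
    return CHECKS[category]
```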
Default Alias Setup
By default, only the content within the folder where the crawl starts is considered internal. In the example URL below, the crawl starts at the index.html file in the /news/ folder, so only content within the /news/ folder would be considered internal.
Example URL: http://www.example.com/news/index.html
Any page whose URL contains http://www.example.com/news/ would be considered an internal page.
Any page whose URL does not contain http://www.example.com/news/ would be considered an external page.
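The default rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the crawler's actual logic: the helper names `default_internal_prefix` and `classify_default` and the sample URLs in the usage lines are hypothetical, and the match is assumed to be a plain substring test on the full URL.

```python
from urllib.parse import urlsplit, urlunsplit


def default_internal_prefix(index_url: str) -> str:
    """Return the folder of the index URL, including the trailing slash."""
    parts = urlsplit(index_url)
    # Drop the file name (everything after the last "/") from the path.
    folder = parts.path.rsplit("/", 1)[0] + "/"
    return urlunsplit((parts.scheme, parts.netloc, folder, "", ""))


def classify_default(url: str, index_url: str) -> str:
    """Classify a URL as internal or external using only the default rule."""
    prefix = default_internal_prefix(index_url)
    return "internal" if prefix in url else "external"


index_url = "http://www.example.com/news/index.html"
prefix = default_internal_prefix(index_url)

# Hypothetical pages, for illustration only.
in_folder = classify_default("http://www.example.com/news/2024/story.html", index_url)
out_of_folder = classify_default("http://www.example.com/about/index.html", index_url)
```

Here `prefix` is the /news/ folder of the example URL, so the first hypothetical page is classified as internal and the second as external.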
Internal links will be checked for all Quality Assurance (QA) and Accessibility issues.
Reason to set up an internal alias:
- You need to include a page to be checked for QA and Accessibility issues, but it does not match the folder of the IndexURL
External links will only be checked for the HTTP response code.
Reason to set up an external alias:
- A URL matches the folder of the IndexURL, but we do not need to check it for QA and Accessibility issues. We only need to ping it for the HTTP status code.
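To show how exclusions, aliases, and the default rule could fit together, here is a minimal Python sketch. The order in which the rules are applied (exclusions first, then external aliases, then internal aliases, then the default folder rule) is an assumption made for this example; this article does not state how conflicts between rules are resolved. The function `classify` and all match strings and URLs are hypothetical.

```python
def classify(
    url: str,
    index_folder: str,
    exclusions: tuple[str, ...] = (),
    internal_aliases: tuple[str, ...] = (),
    external_aliases: tuple[str, ...] = (),
) -> str:
    """Classify a URL as 'excluded', 'internal', or 'external'.

    Assumption: every rule is a partial (substring) match, and the rules
    are applied in the order below. The real precedence may differ.
    """
    if any(match in url for match in exclusions):
        return "excluded"
    if any(match in url for match in external_aliases):
        return "external"
    if any(match in url for match in internal_aliases):
        return "internal"
    # Default rule: only the folder of the index URL is internal.
    return "internal" if index_folder in url else "external"


# Hypothetical configuration, for illustration only.
config = dict(
    index_folder="http://www.example.com/news/",
    exclusions=("/calendar/",),
    internal_aliases=("http://www.example.com/press/",),
    external_aliases=("/news/archive/",),
)

results = {
    url: classify(url, **config)
    for url in [
        "http://www.example.com/news/story.html",
        "http://www.example.com/press/release.html",
        "http://www.example.com/news/archive/2010.html",
        "http://www.example.com/news/calendar/may.html",
        "http://www.example.com/about/index.html",
    ]
}
```

In this configuration the /press/ page becomes internal even though it sits outside the /news/ folder, and the /news/archive/ page becomes external even though it sits inside it, which mirrors the two alias use cases described above.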