Aliases and exclusions let you specify which domains, folders, or pages are included in or excluded from your website crawl.
You can configure aliases and exclusions under Settings > Content > Crawl Settings.
- What is an exclusion?
- How do I add an exclusion on my site?
- What is an alias?
- How do I add an alias on my website?
What is an exclusion?
An exclusion tells the crawler which pages not to crawl, based on a URL match. For example, an exclusion of "/archive/" tells the crawler to skip any page whose URL contains "/archive/".
Matching pages are not checked for broken links, misspellings, accessibility issues, or SEO issues, and they are not included in the Site Inventory.
Reasons to add an exclusion include:
- The pages are not a priority when fixing issues on your website (e.g. an archive).
- The pages duplicate a section of your website that is already being checked.
Note: When setting up exclusions, only a partial match on the link is needed. A match of "/archive/" applies to all links and pages containing "/archive/".
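The partial-match rule can be illustrated with a short sketch. It assumes the match is a plain substring test on the full URL; the `is_excluded` function and the example URLs are hypothetical and do not represent the product's internal code.

```python
def is_excluded(url: str, exclusions: list[str]) -> bool:
    """Return True if any exclusion pattern occurs anywhere in the URL.

    Hypothetical model of a partial (substring) match; not the product's code.
    """
    return any(pattern in url for pattern in exclusions)


exclusions = ["/archive/"]  # example exclusion from the text above

for url in [
    "https://www.example.com/archive/2019/report.html",  # excluded
    "https://www.example.com/news/archive/old.html",     # excluded: the match can occur anywhere
    "https://www.example.com/archived-posts/",           # crawled: "/archived-" does not contain "/archive/"
    "https://www.example.com/about/",                    # crawled
]:
    print("excluded" if is_excluded(url, exclusions) else "crawled", url)
```

In this model, including the slashes keeps the exclusion tight. A bare pattern such as "archive" would also catch "/archived-posts/".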
How do I add an exclusion on my site?
- Select Settings > Content > Crawl Settings.
- Select the site for which you would like to add the exclusions.
- Click Exclude.
- Enter the URL match for the exclusion and click "Create exclusion".
- These setting changes take effect after your next website crawl.
What is an alias?
An alias helps our crawler better determine what content is considered "internal" or "external" to your website using a URL match.
For example, an alias can be used to specify whether pages on a subdomain are included in your website crawl results as internal content (checked) or factored out as external content (not checked).
Reasons to add an alias include:
- You have recently taken on responsibility for a new subdomain (e.g. https://news.example.com) of your website domain (e.g. https://www.example.com) and want it checked as part of the original site.
- You want to stop a section (e.g. /calendar/) from being crawled, but still want links from your main site to that section to be flagged if they are broken.
Internal content
An internal page is considered part of your site and is checked for broken links, misspellings, accessibility issues, etc. Content is treated as internal unless you select the "Crawl as external content" option when adding an alias.
Note: The crawler can only index the aliased domain if a link to it exists on the website. If no such link exists, contact Technical Support, who can add an 'extra index URL' to achieve the same purpose. For example, if you want https://myothersite.demosite.com to be considered part of your site https://demosite.com, then in addition to adding an alias, you need a link to https://myothersite.demosite.com on at least one page of https://demosite.com.
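The link requirement can be illustrated with a generic breadth-first crawler. This models how link-following crawlers work in general and is not Siteimprove's implementation. A crawler that starts at the home page only learns about pages that an already-crawled page links to. The link graph, the `discover` function, and the `internal` check below are all hypothetical.

```python
from collections import deque
from typing import Callable


def discover(start: str, links: dict[str, list[str]],
             is_internal: Callable[[str], bool]) -> set[str]:
    """Breadth-first discovery: a page is found only if some crawled page links to it."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            # Follow only links that count as internal (main site or an internal alias).
            if target not in seen and is_internal(target):
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical link graph: each page maps to the pages it links to.
links = {
    "https://demosite.com/": ["https://demosite.com/about"],
    "https://demosite.com/about": [],
    "https://myothersite.demosite.com/": ["https://myothersite.demosite.com/news"],
}


def internal(url: str) -> bool:
    # Hypothetical: the main site plus an internal alias for myothersite.demosite.com.
    return url.startswith(("https://demosite.com/", "https://myothersite.demosite.com/"))


# The aliased site is never reached, because nothing on demosite.com links to it.
print(sorted(discover("https://demosite.com/", links, internal)))

# After adding one link from the main site, both myothersite pages are discovered.
links["https://demosite.com/about"].append("https://myothersite.demosite.com/")
print(sorted(discover("https://demosite.com/", links, internal)))
```

The alias only decides how a discovered page is treated. The page still has to be reachable by a link, or added as an extra index URL.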
External content
External content is not considered part of your site and will not be checked for broken links, misspellings, accessibility issues, etc. Content is treated as external if you select the "Crawl as external content" option when adding an alias.
Add an alias for external content if you want to make sure links to that content are not broken, but do not need to check the content of the pages themselves.
For example, if https://www.demosite.com/calendar/ is added as an alias, with "Crawl as external content" selected, any link URL containing 'https://www.demosite.com/calendar/' will be checked to make sure it is available (i.e. not broken). However, the content and links on the page(s) associated with that URL will not be evaluated.
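The difference between internal and external aliases can be modelled as a small classification step. This is a sketch under stated assumptions. The `Alias` class, the `classify` function, and the rule that an external alias takes precedence over the main site are illustrative. The precedence rule is inferred from the /calendar/ example above and is not taken from the product. Matching uses the same partial (substring) rule described for aliases.

```python
from dataclasses import dataclass


@dataclass
class Alias:
    match: str      # partial URL match, e.g. "https://www.demosite.com/calendar/"
    external: bool  # True if "Crawl as external content" was selected


def classify(url: str, site: str, aliases: list[Alias]) -> str:
    """Hypothetical model of how a link URL is treated during a crawl."""
    # External aliases are checked first so they can carve a section out of the main site.
    if any(a.external and a.match in url for a in aliases):
        return "external"  # link checked for availability; page content not evaluated
    if url.startswith(site) or any(not a.external and a.match in url for a in aliases):
        return "internal"  # page crawled; content checked for broken links, misspellings, etc.
    return "external"      # assumption: URLs matching neither are outside the site


site = "https://www.demosite.com/"
aliases = [
    Alias("https://www.demosite.com/calendar/", external=True),
    Alias("news.demosite.com", external=False),
]

for url in [
    "https://www.demosite.com/calendar/2024/may",  # external alias
    "https://www.demosite.com/about",              # main site
    "https://news.demosite.com/today",             # internal alias
    "https://www.othersite.org/",                  # unrelated site
]:
    print(classify(url, site, aliases), url)
```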
How do I add an alias on my website?
Note: When setting up an alias, only a partial match on the link is needed. A match of "/calendar/" applies to all links and pages containing "/calendar/".
- Select Settings > Content > Crawl Settings.
- Select the site for which you would like to add the alias.
- Enter the domain or URL match for the alias you are adding.
- Select "Crawl as external content" if you are creating an external alias. Do not select this option for an internal alias.
- Click "Create alias".
- These setting changes take effect after your next website crawl.
Note: If you are setting up a domain alias, only a domain name is required. Entering "example.com" automatically includes all subdomains, e.g. www.example.com, news.example.com, and any other subdomains you may have. Conversely, if you enter a subdomain such as news.example.com as the alias, only that subdomain is included.
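The domain-versus-subdomain behaviour can be sketched as a hostname comparison. This sketch has two limitations. First, the `host_matches` function is hypothetical, and it compares on whole domain labels, so "notexample.com" does not match "example.com". The text above does not say whether the product works this way. Second, a deeper host such as eu.news.example.com would also match the alias news.example.com in this sketch, and the text above does not cover that case.

```python
from urllib.parse import urlsplit


def host_matches(url: str, domain_alias: str) -> bool:
    """True if the URL's host is the alias domain itself or ends with "." + alias."""
    host = urlsplit(url).hostname or ""  # .hostname is already lower-cased
    alias = domain_alias.lower()
    return host == alias or host.endswith("." + alias)


for alias in ["example.com", "news.example.com"]:
    for url in ["https://www.example.com/", "https://news.example.com/", "https://example.com/"]:
        print(alias, url, host_matches(url, alias))
```

With the alias "example.com", all three URLs match. With "news.example.com", only https://news.example.com/ matches.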
If you have any questions regarding this, please contact Siteimprove Technical Support and we'll be happy to help.