Adding and removing content from a crawl
By Guðrún Gústafsdóttir
Adding and removing content from a crawl is handled using aliases and exclusions. By adding aliases and exclusions, you control how the Siteimprove crawler evaluates pages that match your entries. This article explains what aliases and exclusions are and how to add them to your site.
Note: When setting up exclusions and aliases, only a partial match on the link is needed. An entry of "/calendar/" applies to all links containing "/calendar/".
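To make the partial-match rule concrete, here is a minimal Python sketch. It is not Siteimprove's implementation: the function name matches_entry and the sample URLs are hypothetical, and the sketch assumes a plain, case-sensitive substring comparison, which may differ from how the crawler actually compares entries.

```python
def matches_entry(url: str, entry: str) -> bool:
    """Assumed rule: an entry applies when its text appears anywhere in the URL."""
    return entry in url


# Hypothetical sample links, for illustration only.
urls = [
    "https://www.example.com/calendar/2024/",
    "https://www.example.com/events/calendar/may",
    "https://www.example.com/calendar",
    "https://www.example.com/news/",
]

# Only the first two links contain "/calendar/"; the third has no trailing slash.
matched = [u for u in urls if matches_entry(u, "/calendar/")]
```

Under this assumption, a short entry can match more links than intended, so it is worth choosing an entry that is as specific as the links you want to affect.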
Exclusions tell the Siteimprove crawler to ignore a URL completely, as if it never existed. URLs that match an exclusion are not evaluated or included in your Siteimprove inventory in any way.
Excluded URLs will NOT:
- Be checked for HTTP response code
- Be checked for broken links, misspellings or accessibility issues
- Show in the site inventory
Reasons to exclude a URL:
- The URL (link) is being flagged as a false-positive broken link, and ignoring the link manually would be too cumbersome
- The resources used to check the URL(s) outweigh the benefits
- The URL (link) is not a priority when fixing issues on your website
Aliases tell the Siteimprove crawler what is "internal" to your site and what is "external". They are used to identify domains and/or subdomains that differ from your site's main domain name. By default, these domains are then considered part of the main site and are included in all checks that are performed on the site. When setting up an alias, you can specify whether links that match it should be considered internal or external.
An internal page/link is something that you want to check for broken links, misspellings and accessibility issues. It is something you are responsible for and is considered a part of your site.
An external page/link is something that you want to make sure exists (is not a broken link), but whose content you do not necessarily want to check.
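The difference between internal, external and excluded links can be summarized in a small lookup. This is only an illustrative sketch: the names CHECKS_BY_TREATMENT and checks_for are hypothetical, and the check labels paraphrase this article rather than any Siteimprove API.

```python
# Hypothetical summary of how each treatment is described in this article.
CHECKS_BY_TREATMENT = {
    # Internal: the page is considered part of your site and its content is checked.
    "internal": ("link exists", "broken links", "misspellings", "accessibility"),
    # External: only verified to exist; the page content itself is not checked.
    "external": ("link exists",),
    # Excluded: ignored entirely, as if the URL never existed.
    "excluded": (),
}


def checks_for(treatment: str) -> tuple:
    """Return the checks that apply to a link with the given treatment."""
    return CHECKS_BY_TREATMENT[treatment]
```

For example, checks_for("external") returns only the existence check, while checks_for("excluded") returns nothing at all.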
How to add an Alias or Exclusion to your site
- Select Settings from the feature selector
- Select Content from the side-bar menu
- Select Crawl Settings from the drop-down menu
- Select the site you want to add an exclusion/alias on
- Click Exclude or click Alias
- If you are setting up an Exclusion, type in the URL exclusion match and click Create exclusion
- If you are setting up an Alias, only a domain name is required. Typing in example.com automatically includes all subdomains, e.g. www.example.com, news.example.com, and any other subdomains that you may have. Conversely, if you type in a subdomain such as news.example.com as the alias, only that subdomain is included. In both cases, an aliased domain or subdomain is only crawled if a link exists between your main site and that domain or subdomain. Indicate whether links/pages that match the alias should be considered internal or external. If you do not select the Crawl as external content box, the content is treated as internal.
- Click Create alias to finish
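The alias behaviour described in the steps above can be sketched as follows. This is a simplified model under stated assumptions, not Siteimprove's actual logic: the Alias class, alias_covers and classify are hypothetical names, and matching is modelled as a comparison on the host name (the alias domain itself or any subdomain of it). It does not model the requirement that a link must exist between your main site and the aliased domain.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlsplit


@dataclass
class Alias:
    domain: str                      # e.g. "example.com" or "news.example.com"
    crawl_as_external: bool = False  # mirrors the "Crawl as external content" box


def alias_covers(alias: Alias, url: str) -> bool:
    """Assumed rule: the alias covers its own domain and every subdomain of it."""
    host = (urlsplit(url).hostname or "").lower()
    domain = alias.domain.lower()
    return host == domain or host.endswith("." + domain)


def classify(url: str, aliases: List[Alias]) -> Optional[str]:
    """Return "internal" or "external" for the first covering alias, else None."""
    for alias in aliases:
        if alias_covers(alias, url):
            return "external" if alias.crawl_as_external else "internal"
    return None  # no alias applies to this URL


# An alias on the bare domain covers its subdomains; unchecked box means internal.
whole_domain = [Alias("example.com")]
# An alias on a single subdomain covers only that subdomain.
one_subdomain = [Alias("news.example.com")]
```

With these hypothetical aliases, classify("https://news.example.com/story", whole_domain) yields "internal", while classify("https://www.example.com/", one_subdomain) yields None because www.example.com is not covered by the news.example.com alias.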
If you have any trouble or questions about adding or removing content from a crawl, please submit a Support ticket.