How to add and remove content from a crawl (Aliases and Exclusions)

By Guðrún Gústafsdóttir

Adding and removing additional domains, folders or individual pages is handled using aliases and exclusions. By adding aliases and exclusions, you control how the Siteimprove crawler evaluates pages that match your entries. This article explains what aliases and exclusions are and how to add them to your site.

Note: When setting up exclusions and aliases only a partial match on the link is needed. A match of "/calendar/" will apply to all links containing "/calendar/".
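The partial-match rule in the note above can be illustrated with a short sketch. It assumes that a match is a plain, case-sensitive substring test on the full URL; the function name `matches_entry` and the sample URLs are hypothetical and are not part of Siteimprove.

```python
def matches_entry(url: str, entry: str) -> bool:
    """Return True if the entry text appears anywhere in the URL.

    Assumption: matching is a plain, case-sensitive substring test.
    """
    return entry in url


# Hypothetical URLs used only for illustration.
urls = [
    "https://www.example.com/calendar/2024/may",
    "https://www.example.com/news/calendar/",
    "https://www.example.com/about",
]

for url in urls:
    # The first two URLs contain "/calendar/", the third does not.
    print(url, matches_entry(url, "/calendar/"))
```

Because the test is a partial match, a short entry can cover more links than intended, so it may help to make the entry as specific as the situation allows.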

Exclusions

Exclusions are commands that tell the Siteimprove crawler to completely ignore a URL, as if it never existed. URLs that match an exclusion will not be evaluated or included in your Siteimprove inventory in any way.

Excluded URLs will NOT:

  • Be checked for HTTP response code
  • Be checked for broken links, misspellings or accessibility issues
  • Show in the site inventory
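As a rough sketch of the effect listed above, the snippet below drops every URL that matches an exclusion before any checks would run. It is not Siteimprove's implementation: the function `build_inventory`, the substring matching, and the sample data are assumptions made for illustration.

```python
def build_inventory(discovered_urls, exclusions):
    """Keep only the URLs that match none of the exclusion entries.

    Assumption: an exclusion matches when its text appears anywhere in the URL.
    """
    return [
        url
        for url in discovered_urls
        if not any(entry in url for entry in exclusions)
    ]


# Hypothetical data used only for illustration.
discovered = [
    "https://www.example.com/",
    "https://www.example.com/calendar/2024/may",
    "https://www.example.com/contact",
]
exclusions = ["/calendar/"]

inventory = build_inventory(discovered, exclusions)
print(inventory)
```

In this sketch the excluded URL never reaches the inventory, so no response-code, broken-link, spelling or accessibility check would be applied to it.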

Reasons to exclude a URL:

  • The URL (link) is being flagged as a false positive broken link and ignoring the link manually would be too cumbersome
  • The resources used to check the URL(s) outweigh the benefits
  • The URL (link) is not a priority when fixing issues on your website

Aliases

Aliases are commands that tell the Siteimprove crawler what is "internal" to your site and what is "external". Aliases are used to identify domains and/or subdomains that are different from your site's main domain name. Unless an alias is marked as external, these domains are then considered part of the main site and are included in all checks that are performed on the site. When setting up an alias, you specify whether links that match it should be considered internal or external.

Internal Alias:

An internal page/link is something that you want to check for broken links, misspellings and accessibility issues. It is something you are responsible for and is considered a part of your site.

External Alias:

An external page/link is something that you want to make sure exists (is not a broken link), but whose content you do not necessarily want to check.
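The difference between the two alias types can be summarized in a small sketch. The check names and the `checks_for` function are hypothetical labels chosen for this example; they only mirror the descriptions above and do not correspond to any Siteimprove API.

```python
from enum import Enum


class LinkType(Enum):
    INTERNAL = "internal"
    EXTERNAL = "external"


def checks_for(link_type: LinkType) -> list[str]:
    """Return the (hypothetically named) checks applied to a link type."""
    if link_type is LinkType.INTERNAL:
        # Internal content is part of your site, so its content is checked too.
        return ["link_exists", "broken_links", "misspellings", "accessibility"]
    # External content is only verified to exist; its content is not checked.
    return ["link_exists"]


for link_type in LinkType:
    print(link_type.value, checks_for(link_type))
```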

How to add an Alias or Exclusion on your site(s)

  1. Select Settings from the main menu

  2. Select Content from the side-bar menu

  3. Select Crawl Settings from the sub-menu

  4. Click on the site that you would like to add Exclusions and/or Aliases on

  5. Click Exclude or click Alias, depending on what you want to set up

  6. If you are setting up an Exclusion, type in the URL exclusion match and click Create exclusion

  7. If you are setting up an Alias, only a domain name is required. Typing in example.com automatically includes all subdomains, e.g. www.example.com, news.example.com, and any other subdomains that you may have. Conversely, if you type in a subdomain such as news.example.com as the alias, only that subdomain is included. In both cases, an alias is only crawled if a link exists between your main site and the domains/subdomains you identified. Indicate whether links/pages that match the alias should be considered internal or external. If you do not select the Crawl as external content box, matching content is treated as internal.

  8. Click Create alias to finish 
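The subdomain behavior described in step 7 can be sketched as follows. This is one possible interpretation, not Siteimprove's actual logic: it assumes an alias is compared against the hostname of a link, and the `Alias` class, the `crawl_as_external` field (named after the checkbox) and the sample URLs are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Alias:
    domain: str
    crawl_as_external: bool = False  # unchecked box: treated as internal


def alias_matches(alias: Alias, url: str) -> bool:
    """True if the URL's host is the alias domain or one of its subdomains."""
    host = urlsplit(url).hostname or ""
    return host == alias.domain or host.endswith("." + alias.domain)


def classify(url: str, aliases: list[Alias]) -> Optional[str]:
    """Return "internal" or "external" for the first matching alias, else None."""
    for alias in aliases:
        if alias_matches(alias, url):
            return "external" if alias.crawl_as_external else "internal"
    return None


# Hypothetical aliases: one subdomain marked external, one whole domain internal.
aliases = [
    Alias("news.example.com", crawl_as_external=True),
    Alias("example.org"),
]

for url in (
    "https://news.example.com/story",
    "https://www.example.org/about",
    "https://blog.example.com/post",
):
    print(url, classify(url, aliases))
```

In this sketch the alias example.org covers www.example.org, while the alias news.example.com does not cover blog.example.com, which mirrors the domain versus subdomain distinction in step 7.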

