How to exclude crawler traffic from Analytics
By Sean Needham
In some cases, traffic from crawlers (also referred to as robots, bots, or spiders) may be registered by your website analytics solution. This article explains how to exclude this traffic:
- Exclude traffic from Siteimprove Analytics using Data Exclusion Settings
- Exclude traffic from Siteimprove Analytics using filters
- Exclude traffic from Google Analytics
From January 2020, analytics traffic generated by the Siteimprove Crawler will be automatically removed from Siteimprove Analytics.
Data Exclusion Settings in Siteimprove
The easiest way to exclude unwanted crawler traffic from Siteimprove Analytics is to use Data Exclusion Settings. Data Exclusion Settings are available to users with an Account Owner or Administrator role.
Note: Known crawler and bot data is excluded from the time you enable Data Exclusion Settings (i.e. not retroactively). You will not be able to retrieve the excluded data afterward. However, you can remove the exclusion if you want to start collecting the data again.
- Go to Analytics > Analytics Settings > Tracking > Data Exclusion Settings
Here you have the option to exclude traffic from known crawlers and bots. The excluded bots and crawlers are defined using the IAB Spiders & Bots list, which is considered an industry standard.
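To illustrate the general idea behind list-based bot exclusion, here is a minimal sketch in Python. It is not how Siteimprove implements the feature. The patterns, the `user_agent` field, and the function names are hypothetical, and the real IAB Spiders & Bots list is not reproduced here.

```python
import re

# Hypothetical patterns for illustration only. The real IAB Spiders & Bots
# list is maintained by the IAB and is not reproduced here.
KNOWN_BOT_PATTERNS = ["bot", "crawler", "spider"]

# One case-insensitive expression that matches any of the patterns.
_BOT_RE = re.compile(
    "|".join(re.escape(p) for p in KNOWN_BOT_PATTERNS), re.IGNORECASE
)


def is_known_bot(user_agent: str) -> bool:
    """Return True if the user agent contains any of the example patterns."""
    return bool(_BOT_RE.search(user_agent))


def exclude_bots(visits: list[dict]) -> list[dict]:
    """Keep only visits whose user agent does not match a bot pattern.

    Each visit is assumed to be a dict with a 'user_agent' key.
    """
    return [v for v in visits if not is_known_bot(v.get("user_agent", ""))]
```

As with the setting itself, a check like this only affects visits processed after it is in place. Data recorded earlier is not changed.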
You also have the option to exclude traffic based on specific IP groups using the Exclude specific IP groups dropdown. You can find out more about creating IP groups in the article, “How to create an IP group for filtering in Analytics”.
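Conceptually, an IP group is a set of addresses or ranges that a visitor's IP address is checked against. The sketch below shows that check with Python's standard `ipaddress` module. The group contents are placeholder addresses from the reserved documentation ranges, and the names `EXAMPLE_IP_GROUP` and `in_ip_group` are assumptions for illustration, not part of Siteimprove.

```python
import ipaddress

# Hypothetical IP group built from reserved documentation ranges.
# Entries may be a single address or a range in CIDR notation.
EXAMPLE_IP_GROUP = ["192.0.2.0/24", "198.51.100.17"]


def in_ip_group(ip: str, group: list[str]) -> bool:
    """Return True if the address falls inside any entry of the group."""
    addr = ipaddress.ip_address(ip)
    return any(
        addr in ipaddress.ip_network(entry, strict=False) for entry in group
    )
```

For example, `in_ip_group("192.0.2.55", EXAMPLE_IP_GROUP)` is true because the address lies inside the first range, while an address outside both entries would not be excluded.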
You can exclude bots and crawlers, and/or specific IP groups, from all sites (default) or from specifically selected sites on your account. Select the "Specific Sites" tab to configure exclusions for specific sites.
Exclude traffic from Siteimprove Analytics using filters
In Siteimprove Analytics, unwanted traffic can be identified and filtered using either an Organizational or IP filter. The following explains how to set up a filter in each case.
Filter Siteimprove traffic using an Organizational filter
- Go to Settings > Analytics > Filters and select “New filter”
- Give the filter a name and choose whether others can use it
- Click “Add filter element” and then “Organization”
- Select to filter by “Visitors not from this organization”
- Enter a match for organizations where the name contains “Siteimprove”
- Click on “Add match condition”
- You can use the “Test Now” button to test the filter
- Click on “Create filter” to save
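The effect of the filter configured in the steps above can be sketched as follows. This is only an illustration of the "Visitors not from this organization" logic with a "contains" match. The `organization` field, the function name, and the case-insensitive comparison are assumptions, not documented Siteimprove behavior.

```python
def visitors_not_from(visits: list[dict], name_fragment: str) -> list[dict]:
    """Keep visits whose organization name does not contain the fragment.

    Each visit is assumed to be a dict with an 'organization' key.
    The comparison is case-insensitive (an assumption for this sketch).
    """
    fragment = name_fragment.casefold()
    return [
        v for v in visits
        if fragment not in v.get("organization", "").casefold()
    ]


# Example data with made-up organization names.
sample_visits = [
    {"organization": "Siteimprove A/S", "page": "/home"},
    {"organization": "Example University", "page": "/home"},
]
filtered = visitors_not_from(sample_visits, "Siteimprove")
```

Here `filtered` contains only the visit from "Example University", which mirrors what you should expect to see when testing the filter.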
Filter Siteimprove traffic using an IP filter
Some users may prefer to filter traffic using an IP filter. To do so, follow the instructions in this article: “How to set up an IP Filter”.
Enabling the filter
Once created, you can enable the filter using the filter dropdown on analytics pages within the platform.
Filters can also be incorporated within reports by selecting the filter when scheduling the report.
Note: You can also use filters to remove spam from your statistics. For further information on identifying spam crawler traffic, see the article, “How to spot spam in Analytics and what to do about it”.
Exclude Siteimprove traffic from Google Analytics
To exclude Siteimprove traffic from Google Analytics, you need to create an account filter in Google Analytics that excludes Siteimprove's public-facing IP addresses.
For further information see the following:
- What IP addresses and User agents are used by Siteimprove?
- Google Analytics article describing how to create an IP address filter.
A Google Analytics filter is only applied from the day it is created. It does not affect data collected before that date.
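If you need to exclude several IP addresses and your Google Analytics filter accepts a regular expression as its pattern, a small helper can build that expression for you. The following is a minimal sketch under that assumption. The function name is hypothetical, and the addresses are placeholders from reserved documentation ranges. Use the actual addresses from the Siteimprove article listed above.

```python
import re


def build_ip_exclusion_pattern(ips: list[str]) -> str:
    """Build one anchored regular expression that matches any listed IP.

    Dots are escaped so that '.' matches a literal dot rather than any
    character, and the anchors prevent partial matches such as
    192.0.2.10 matching 192.0.2.100.
    """
    escaped = [re.escape(ip) for ip in ips]
    return "^(" + "|".join(escaped) + ")$"


# Placeholder addresses, not real Siteimprove IPs.
pattern = build_ip_exclusion_pattern(["192.0.2.10", "198.51.100.17"])
```

You can check the generated pattern locally with `re.match(pattern, "192.0.2.10")` before pasting it into the filter.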
If you have any questions, please contact Siteimprove Technical Support with your request.