The Siteimprove Suite is primarily used on publicly available websites, but under specific conditions it can also be used on internal and non-public websites such as intranets, pre-production, and staging sites.
Siteimprove uses web-based crawlers to index and check your websites for errors.
Requirements to check sites behind a login
To use the Siteimprove Content Suite on your internal websites, you must agree to certain non-standard provisions in your agreement with Siteimprove that set out specific terms for using the services on a non-public website.
In addition, you must meet the following requirements:
Access to the Website via the Internet
The website must be reachable over the internet, i.e. not only on your internal network.
This requires one of the following:
- A subdomain pointing to the website's public IP address
- A hostname paired with a public IP address
- A public IP address that leads directly to the website
Login Credentials for Password Protected Websites
Login credentials are required for password-protected websites. Please make sure that the user account created for Siteimprove does NOT expire and is NOT subject to a password-renewal policy. The user should NOT have permission to modify or delete content on the website/intranet.
Supported Authentication Methods
We support the following authentication methods:
- Basic Authentication - Please supply Username, Password, Domain, and Realm
- Windows Authentication - Please supply the username, password, and login domain. Some types of Windows Authentication are not compatible with our Perl-based crawlers.
- Token-based Login - We will send a GET request with an agreed-upon token that authenticates our crawlers for the session. Please supply the authentication URL.
- Form-Based POST Request - We support a variety of POST login methods, with pre-fetching of dynamic server-side generated variables. Please supply the username, password, and login form URL. This includes:
  - Sites that require Single sign-on (SSO) authentication
  - Sites where the negotiation to establish a session is dynamic
Our login code mimics the sequence of actions a user performs to establish an authenticated session: the authentication proxy steps through the HTTP interactions and records the cookies set during the negotiation, then adds the relevant cookies to each subsequent request to the site.
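The cookie-recording step can be sketched as follows. This is only a simplified, self-contained illustration of the principle (capture `Set-Cookie` headers during the login negotiation, replay them as a `Cookie` header afterwards); the real proxy is internal to Siteimprove, and the class name and cookie values here are invented:

```python
class CookieRecorder:
    """Record cookies set during a login negotiation and attach them
    to subsequent requests, mimicking an authentication proxy."""

    def __init__(self) -> None:
        self.jar: dict[str, str] = {}

    def record(self, set_cookie_headers: list[str]) -> None:
        # Each Set-Cookie header looks like "name=value; Path=/; HttpOnly";
        # keep only the name=value pair and ignore the attributes.
        for header in set_cookie_headers:
            name, _, value = header.split(";", 1)[0].partition("=")
            self.jar[name.strip()] = value.strip()

    def cookie_header(self) -> str:
        # Build the Cookie header added to every subsequent request.
        return "; ".join(f"{k}={v}" for k, v in self.jar.items())


# Step through a hypothetical two-response login negotiation:
recorder = CookieRecorder()
recorder.record(["csrftoken=abc; Path=/"])               # response to GET of login page
recorder.record(["sessionid=xyz123; Path=/; HttpOnly"])  # response to POST of credentials
print(recorder.cookie_header())
# → csrftoken=abc; sessionid=xyz123
```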
There is no way of knowing whether a site behind a login can be crawled until we have tested the process. We are currently aware of the following constraints:
- We cannot crawl sites that require a Citrix/VPN based login.
- We cannot crawl sites that use 2-factor authentication.
- Sites that are protected by a non-standard (i.e. custom-made) security method require extra time to configure and test. In some cases, it is not possible to crawl these sites.
- In some cases you may need to allow-list (white-list) our crawler's IP addresses. The default IP address used by our crawler is 184.108.40.206.
Each login scenario is different and due to the complexities and security restrictions involved, configuring a login can take days, or weeks if it needs to be escalated to our development team.
In some cases, it may not be possible to do a full Accessibility check behind a login.
If you would like to crawl a site that is not publicly available, is behind a firewall, or requires a login/authentication, please submit a Support ticket to request assistance adding the site to your subscription.