The Siteimprove Data Privacy module helps you identify personal identification numbers (e.g. Social Security numbers, CPR numbers) that have been detected on your website.
Go to the main menu, Data Privacy > Personal Data > Personal Data Types > Personal Identification Numbers.
The table lists each personal identification (PID) number, its file location, the number of occurrences, and the time since the number was first detected.
Note: If a number is found in a document that is embedded in another document, the original document found by the crawler is listed in the table.
Considerations regarding PID number checks
It is important to be aware of the following considerations when checking for PID numbers.
The following file formats are currently checked for PID numbers.
- Microsoft Office Formats (doc, docx, xls, xlsx, ppt, pptx)
Note: When a PDF file is created in a browser using "Print into PDF" instead of "Save as PDF", the data cannot be analyzed. Even though the PDF file appears as a simple document, the data is not represented as text, and therefore we cannot extract it for further analysis.
There are file size limits that depend on your crawler configuration. By default, files larger than 12 MB are not indexed.
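As a rough illustration of the format and size considerations above, the following Python sketch decides whether a file would be eligible for a PID check. The function name `is_checked_for_pids` and its parameters are hypothetical. The extension set contains only the Office formats listed above, so any other formats the product checks are not modeled. This is not Siteimprove's implementation.

```python
from pathlib import PurePosixPath

# Office extensions taken from the list above (other checked formats are not modeled here).
CHECKED_EXTENSIONS = {"doc", "docx", "xls", "xlsx", "ppt", "pptx"}

# Default size limit from the text; treating 1 MB as 1024 * 1024 bytes is an assumption.
DEFAULT_MAX_BYTES = 12 * 1024 * 1024


def is_checked_for_pids(path: str, size_bytes: int,
                        max_bytes: int = DEFAULT_MAX_BYTES) -> bool:
    """Return True if the file has a listed extension and is within the size limit."""
    extension = PurePosixPath(path).suffix.lstrip(".").lower()
    return extension in CHECKED_EXTENSIONS and size_bytes <= max_bytes


if __name__ == "__main__":
    print(is_checked_for_pids("/files/report.docx", 2_000_000))
    print(is_checked_for_pids("/files/large.xlsx", 13 * 1024 * 1024))
```

Because the actual limit depends on the crawler configuration, `max_bytes` is a parameter rather than a fixed value.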
If a site or part of a site is excluded from a crawl, then those exclusions will also apply to checks for PID numbers. This also applies to site configurations that remove certain parts of HTML code from the pages of a site.
By default, the PID check is configured to look for PID numbers associated with the website's location. The location is determined by the following logic:
- The suffix of the URL is analyzed. If it ends, for example, with “.de”, the site will be set up to check German PIDs.
- If the suffix is generic, the site will be checked for PIDs of the country the account is associated with.
- If you’d like a site checked for personal identification numbers relating to another country, then please contact Siteimprove Technical Support with your request.
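The default country selection described above can be sketched in Python as follows. The function `pid_country`, the `COUNTRY_TLDS` mapping, and the country codes are hypothetical and deliberately incomplete. The sketch only illustrates the two-step logic (country-specific suffix first, account country as fallback) and is not Siteimprove's implementation.

```python
from urllib.parse import urlsplit

# Hypothetical, incomplete mapping from country-code URL suffixes to countries.
COUNTRY_TLDS = {"de": "DE", "dk": "DK", "uk": "GB", "se": "SE"}


def pid_country(site_url: str, account_country: str) -> str:
    """Pick the country whose PID formats a site is checked for by default."""
    host = urlsplit(site_url).hostname or ""
    suffix = host.rsplit(".", 1)[-1].lower()
    # A generic suffix (for example "com") is not in the mapping,
    # so the country associated with the account is used instead.
    return COUNTRY_TLDS.get(suffix, account_country)


if __name__ == "__main__":
    print(pid_country("https://www.example.de/kontakt", "US"))
    print(pid_country("https://www.example.com/contact", "DK"))
```

A request to check for another country's PID numbers would override this default. That override is handled by Siteimprove Technical Support and is not modeled here.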
Finding PID numbers in embedded files (e.g. a document inside a document) is supported. The original document will be listed as the location of the PID number.
Tip: Locating PID numbers in embedded files can be tricky. An indication that a PID number is in an embedded file is when you cannot find the number in the original file listed. The embedded files are usually represented by a file icon within the document.
Violations between crawls
We cannot identify a violation if a document containing a PID number is added to the website and then removed again between two consecutive crawls.
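To illustrate why such a document goes unnoticed, the following Python sketch models each crawl as a snapshot of what is online at a single point in time. The data layout, the day numbers, and the function names are hypothetical and are used only for illustration.

```python
def online_at(doc: dict, day: int) -> bool:
    """A document is visible to a crawl only while it is published."""
    return doc["published"] <= day < doc["removed"]


def detected_documents(docs: list[dict], crawl_days: list[int]) -> set[str]:
    """Return the URLs seen by at least one crawl."""
    return {
        doc["url"]
        for doc in docs
        if any(online_at(doc, day) for day in crawl_days)
    }


if __name__ == "__main__":
    docs = [
        # Online during both crawls, so it can be detected.
        {"url": "https://example.com/a.docx", "published": 1, "removed": 10},
        # Published and removed between the two crawls, so no crawl sees it.
        {"url": "https://example.com/b.docx", "published": 6, "removed": 7},
    ]
    print(detected_documents(docs, crawl_days=[5, 8]))
```

In this sketch, `b.docx` is never returned because it was only online between the crawls on day 5 and day 8.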