The Third Party Diary
Dutch summary available.
Lonneke van der Velden, August - September 2012
Introduction
In August 2012 the Government of the Netherlands (de Rijksoverheid) was criticized for not complying with the cookie law, which obliges website owners to ask Internet users for consent before accessing their devices to collect or store data (
Telecommunicatiewet, Artikel 11.7a). The websites rijksoverheid.nl and government.nl were placing cookies in users' browsers. On the 9th of August the government announced that it would disable all cookies on these two websites and assess whether other websites needed to be adjusted as well.
[1]
This study provides an indication of the presence of third party elements (3pes) on websites run by the Government of the Netherlands. These 3pes are services that, as classified by the web detective Ghostery, could potentially deliver advertisements, provide analytics for website publishers, track user behaviour, provide some kind of page function, or disclose data practices involved in delivering an ad. Whether these websites need to ask the user for explicit consent depends on the function of the specific 3pe. This project, however, does not focus on the legal aspects of the debate, but on the landscape of third party elements in relation to this specific set of websites.
[1]
http://www.rijksoverheid.nl/cookies
Research Question
Which third party elements are present on websites run by the Government of the Netherlands (Rijksoverheid)?
[1]
[1] The domain names listed in the website register of the government of the Netherlands are not all legally owned by the government. Yet the government presents this list as its responsibility. For more information see the Rijksoverheid's
website register.
Methodology
The government's website-register ('
Webregister Rijksoverheid') gives information about the status of 1110 websites that belong to the government of the Netherlands, for instance their latest updates and the ministerial department they belong to. In this research project I used the 'Tracker Tracker' tool to screen this list of URLs for third party elements. The (beta) tool repurposes the tracker-detection plugin Ghostery. Ghostery detects tags, web bugs, pixels and beacons per website and alerts the user to their presence through a small visualisation in the browser. (For more information about the tool see the
Tracking the Trackers Workshop page.)
The term tracker can be confusing: the term tracking cookie refers to cookies that allow for cross-website analysis, but Ghostery also detects software that provides analytics for single sites only. I therefore stick to the term third party element, or 3pe. Ghostery uses the following classification:
- Advertiser (AD): a 3pe that delivers advertisements
- Analytics (AN): a 3pe that provides research/analytics for website publishers
- Tracker (T): a 3pe that exists only to track user behavior
- Widget (W): 3pes that provide some kind of page function (comment forms, "Like" buttons, ...)
- Privacy (P): 3pes that disclose data practices involved in delivering an ad
(Source: Ghostery's
http://www.knowyourelements.com)
Methodological steps:
- The full list of URLs in the website register was inserted into the Tracker Tracker tool (in batches of 100).
- The triangulation tool was used to compare the tool's results with the original list, to distinguish websites with and without third party elements.
- Duplicate entries were deleted from the tool's results to determine the total list of domain names containing third party elements and the total number of third party elements. (The Tracker Tracker output can list one 3pe more than once per domain name when third party elements are detected through different patterns.)
- Because of the many error notifications, the process was repeated, with www. added or removed, for both the domain names that gave an error notification and the ones that seemed to be 3pe-free.
- Some of the errors were checked manually for typos.
- A random manual check was done for false positives.
- The study is repeated monthly. In September, however, the active domain names in the website register were extracted before running the Tracker Tracker tool.
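The deduplication and triangulation steps above can be sketched in a few lines of Python. This is a minimal illustration, not the Tracker Tracker or triangulation tool itself: the row format (domain, 3pe, pattern), the function names and the sample domain names are all assumptions made for the example.

```python
from urllib.parse import urlparse


def normalise(domain: str) -> str:
    """Lower-case a domain name and strip scheme, path and a leading 'www.'."""
    domain = domain.strip().lower()
    if "://" in domain:
        domain = urlparse(domain).netloc
    domain = domain.split("/")[0]
    return domain[4:] if domain.startswith("www.") else domain


def dedupe_detections(rows):
    """Collapse repeated (domain, 3pe) pairs that differ only in the matched pattern."""
    return {(normalise(domain), element) for domain, element, _pattern in rows}


def triangulate(register, detections):
    """Split the register into domain names with and without detected 3pes."""
    with_3pes = {domain for domain, _element in detections}
    all_domains = {normalise(d) for d in register}
    return sorted(all_domains & with_3pes), sorted(all_domains - with_3pes)


# Hypothetical sample data, not taken from the study's output files.
register = ["www.example-a.nl", "example-b.nl", "example-c.nl"]
rows = [
    ("example-a.nl", "Google Analytics", "pattern-1"),
    ("example-a.nl", "Google Analytics", "pattern-2"),  # same 3pe, other pattern
    ("www.example-b.nl", "Webtrends", "pattern-1"),
]

detections = dedupe_detections(rows)
positive, negative = triangulate(register, detections)
print(len(detections), positive, negative)
```

Working with sets keeps the two counts separate: the number of unique (domain name, 3pe) pairs and the number of unique domain names.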
Summary of the Findings
August
Period of research: 9-11 August 2012.
Of the 1110 domain names in the website register, 684 domain names (approximately 62%) have third party elements implemented, 840 in total. There are 39 services involved. On 426 domain names no third party elements were detected. Note that this does not mean that there are no web tracking technologies at all: Ghostery doesn't detect everything. 76 domain names did not connect to a server.
Below you'll find a preview of 3pes per type. Note that many embassies use Google Analytics as well.
Follow this link for the full
list of 3pes sorted according to type.
Below is a visualisation of the relative presence of the 3pes. The size indicates the amount of 3pes; the color indicates the type.
Note: a few 3pes might have been excluded from this visualization due to a mistake when triangulating the lists. At least one domain name that seemed 3pe-free turned out to have 3pes after all. In addition, the Tracker Tracker tool seems to exclude some websites from the output without giving an error notification.
Research project 2. Period of research: 21 August - 3 September 2012.
Of the 1110 domain names in the website register, 696 domain names contain 3pes (856 in total). There are 38 services involved. On 414 domain names no third party elements were detected. This does not mean that there are no web tracking technologies at all: Ghostery does not detect everything. 45 domain names did not connect to a server.
Again, Google Analytics is the biggest player. A comparison with the results of the first project showed that 11 domain names had removed the 3pes from their websites. (See the more extended explanation/logbook below.)
Follow these links for an
overview of the 3pes per domain name,
unique domain names that contain 3pes,
urls where no 3pes were detected, and a
comparison with the first project.
Two findings turned out to be false positives (2x Omniture).
Visualisation of the relative presence of the 3pes:
Below is a visualisation of the 3pes, this time scaled by corporate participation. The findings in this project support earlier findings (Hoofnagle et al. 2012) that the landscape of web tracking technologies is concentrated in a relatively small number of companies.
September
The Website Register was updated on the 17th of September. The total list now contains 1088 websites. Of this list, 913 are active. The rest are not active or redirect to the page of a hosting company. Third party elements were detected on 658 of the 913 domain names. That is about 72% of the active domain names (and 60% of the whole Website Register). In total 803 third party elements were detected, of 36 different kinds.
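The two shares in the September summary follow directly from the counts given there. A minimal sketch of the calculation, using a hypothetical helper name `share`:

```python
def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal."""
    return round(100 * part / whole, 1)


register_total = 1088  # websites in the updated website register
active = 913           # active domain names
with_3pes = 658        # active domain names on which 3pes were detected

print(share(with_3pes, active))          # share of the active domain names
print(share(with_3pes, register_total))  # share of the whole website register
```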
Below is the Gephi visualisation, which shows how certain nodes are surrounded by clusters of websites, for instance the Webtrends cluster at the bottom right. Such a cluster means that several websites use a Webtrends tracker.
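A network like this can be built from a simple edge list in which each detection links a domain name to a 3pe. The sketch below writes such a list as CSV with `Source` and `Target` columns, a layout that Gephi's spreadsheet import accepts for edge tables. The sample detections and the function name are illustrative assumptions, not the files used in this study.

```python
import csv
import io


def edge_list_csv(detections) -> str:
    """Return a CSV edge list with one domain-to-3pe edge per unique detection."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    for domain, element in sorted(set(detections)):
        writer.writerow([domain, element])
    return buffer.getvalue()


# Hypothetical detections: two websites share the Webtrends node.
detections = [
    ("example-a.nl", "Webtrends"),
    ("example-b.nl", "Webtrends"),
    ("example-a.nl", "Google Analytics"),
]

csv_text = edge_list_csv(detections)
print(csv_text)
```

In the resulting graph a 3pe used by many websites becomes a node with many edges, which is what shows up as a cluster.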
Relevant files:
The Tracker Tracker tool was developed in a collaborative project during the Digital Methods Winter School 2012, "
Interfaces for the Cloud", and for this particular research project many people have helped me and provided me with comments. Special thanks to Erik Borra and Frederik Zuiderveen Borgesius for helping me develop the project, Sabine Niederer for getting me started with the visualisations and Matthijs Koot for feedback on the list of URLs. In September, Matthijs Koot conducted a similar research project for Madison Gurkha and reported similar findings, as outlined in his paper '
A Survey of Privacy & Security Decreasing Third-Party Content on Dutch Websites'.
After September, the results were kept in an online diary at
http://thirdpartydiary.net.
Logbook
Research project 1
Period of research: 9-11 August 2012.
Initially, results indicated that only 652 domain names contained third party elements; on these sites 801 unique third party elements were present. This was the total after deleting the duplicate entries that reflected the different 'patterns' (including them would give 1529 3pe detections). On 458 domain names no third party elements were detected.
As there were error notifications for 224 domain names, 'www' was added to this set and it was analyzed again with the Tracker Tracker tool. The initial list of 458 '3pe-free' domain names was triangulated with the output of the Tracker Tracker tool. In this set, 31 domain names did contain third party elements (36 in total, out of 70 third party element detections).
This time there were error notifications as well (78). After a manual check, these URLs turned out not to connect to a server. Of these 78 domain names, at least 2 were not proper URLs, because of typos in the original website register. For instance, one of these domain names, www.dutchembassyukorg, should be www.dutchembassyuk.org. A manual check of this website with Ghostery detected three third party elements.
Many other domain names that don't connect to a server were registered domain names (by whom was not always disclosed) and a few domain names turned out to be unregistered. For example, getijdetabellen.nl and getijvoorspellingen.nl were free to register.
[1] Because of time constraints, a further manual check of failed server connections was not pursued.
To sum up, at least 684 (652 + 31 + 1) domain names contained third party elements. On these 684 domain names 840 (801 + 36 + 3) third party elements were detected. On 426 domain names no third party elements were detected; however, not all URLs were checked for typos. In total there were 39 services involved.
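The totals can be re-derived from the three passes described in this logbook: the initial run (652 domain names, 801 elements), the second pass with 'www' added (31 domain names, 36 elements) and the manually checked domain name (1 domain name, 3 elements). A small sketch of that tally; the dictionary labels are only descriptive names chosen for this example:

```python
# Counts as reported in the logbook for research project 1.
domains = {"initial run": 652, "second pass with www": 31, "manual check": 1}
elements = {"initial run": 801, "second pass with www": 36, "manual check": 3}
register_size = 1110

total_domains = sum(domains.values())
total_elements = sum(elements.values())
without_3pes = register_size - total_domains

print(total_domains, total_elements, without_3pes)
```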
At a later stage, a mistake was noticed in the triangulation of the lists: a small batch of websites containing 3pes was included in the list of 3pe-free websites. On that list, there is at least one more domain name that contained third party elements. Therefore, the visualization is not complete.
[1] The government's website register mentions that these domain names are in business case, that they are not yet tested, that their most recent status update was July 2012 and the date of expected implementation January 2013. A third very similar domain name in the website register, getijtafels.nl, redirects to visserslatijn.nl, which is an association for sport fishing.
Research project 2
The same method was repeated, but this time I started by adding www. to the whole list of domain names. The run was then repeated, this time without www., for the 93 websites for which the Tracker Tracker tool could not resolve the host. Of this set, 45 still could not be resolved. This time the output listed only 449 domain names containing 3pes, and it left out websites that did show 3pes in the browser, which indicated that a mistake had been made, either by the tool or by me when reading the error logs.
To solve this, the duplicate domain names in the positive output were removed in order to compare the list of domain names containing 3pes with the original website register. The remaining domain names (the presumably 3pe-free ones) were checked again without www.
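Switching between the www. and non-www. variant of a domain name for a second pass can be sketched as follows. This is a hypothetical helper written for illustration; it is not part of the Tracker Tracker tool, and the sample domain names are made up.

```python
def toggle_www(domain: str) -> str:
    """Return the other variant of a domain name: add 'www.' or strip it."""
    domain = domain.strip().lower()
    return domain[4:] if domain.startswith("www.") else "www." + domain


def retry_list(failed):
    """Build the list for a second pass from domain names that failed or seemed 3pe-free."""
    return [toggle_www(domain) for domain in failed]


# Hypothetical domain names that could not be resolved in the first pass.
second_pass = retry_list(["www.example-a.nl", "example-b.nl"])
print(second_pass)
```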
The final results indicate that 696 domain names contain 3pes (856 3pes in total). 38 services were detected. This time, too, the output of the Tracker Tracker tool turned out to be incomplete. For instance, according to the tool depolitiezoekt.nl uses Google Analytics, but when checking with Ghostery in the browser, Typekit by Adobe is present as well.
Comparison results project 1 and 2
A comparison of this list with the previous one of 9-11 August shows that one service has disappeared: Kissinsights. In fact, werkenvoordeoverheid.nl currently asks Internet users for their consent. Interestingly enough, Google Analytics is still shown by Ghostery in the browser. After giving consent, Kissinsights is back according to the Ghostery plugin. A cookie called jsCookieCheck is installed which expires in 2040. There are also a few Google cookies (for instance the _utma and the _utmz cookie) and two persistent Kissinsights cookies (ki_u and ki_t).
The website werkenvoornederland.nl also asks for consent. The website wbdo.nl is redirecting to werkenbijdeoverheid.nl, therefore it does not ask your consent if you have already given this to werkenbijdeoverheid.nl.
What remains unclear is why the Tracker Tracker tool no longer detects Google Analytics on werkenbijdeoverheid.nl while it still shows up in Ghostery's visualization in the browser. Clearly the settings have been changed since the 9th of August. Now, not giving your consent means not getting an _utma cookie. Visiting a few websites that do have Google Analytics implemented according to the Tracker Tracker output results in cookies in the browser, for instance databank.nl and 144redeendier.nl. The latter website explains that visitor information is stored anonymously on Google's servers in the US.
Further comparison shows that the following domain names have removed the 3pes from their websites:
- daarmaakjejesterkvoor.nl
- jaofnee.nl
- jongeren.minfin.nl
- nederlandveilig.nl
- pianoo.nl
- prinsjesdag2011.nl
- tuchtcollege-gezondheidszorg.nl
- wbdo.nl
- werkenbijdeoverheid.nl
- werkenvoornederland.nl
- zorgregister.nl
The following domain names show 3pes in the second study that were not detected in the first study. It is unclear whether these 3pes were only recently installed or whether the Tracker Tracker tool accidentally excluded them from its earlier output.
- ahn.nl
- cfv.nl
- depolitiezoekt.nl
- dsta.nl
- fondsziekenhuisopleidingen.nl
- hetomvoorjou.nl
- hollandinvietnam.org
- inspectieloket.nl
- inspraakpunt.nl
- ioov.nl
- nbbe.nl
- nuclearforensics.eu
- oefeningbonfire.nl
- om.nl
- onderwijsinspectie.nl
- samenwerkenvoordejeugd.nl
- stagefondszorg.nl
- toetsingscommissiewwb.nl
- tracemer.nl
- uwkindenalcohol.nl
- vaccinatieregeling.nl
- vrominspectie.nl
- knmi.nl
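A comparison between two runs of this kind comes down to set differences over the lists of domain names with 3pes. The following sketch assumes each run is available as a plain list of domain names; the function name and the sample data are hypothetical.

```python
def compare_runs(first, second):
    """Compare two lists of domain names on which 3pes were detected."""
    first, second = set(first), set(second)
    return {
        "removed": sorted(first - second),   # 3pes detected only in the first run
        "new": sorted(second - first),       # 3pes detected only in the second run
        "unchanged": sorted(first & second), # 3pes detected in both runs
    }


# Hypothetical runs, not the actual August output.
run_1 = ["example-a.nl", "example-b.nl", "example-c.nl"]
run_2 = ["example-b.nl", "example-c.nl", "example-d.nl"]

result = compare_runs(run_1, run_2)
print(result)
```

A domain name in the "new" group is not necessarily a recent installation: as noted above, it may also have been missing from the earlier tool output.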
Further Research
Use Gephi to analyse clusters of third party elements and domain names, in order to try to map the Rijksoverheid tracking ecology. See for instance the '
Track the Tracker' page and the work by
Anne Helmond and Carolin Gerlitz.
Redo the research project with the Tracker Tracker tool's advanced settings and include sub-pages as well.
Redo the research project after the 24th of September, because
OPTA has contacted several websites about their cookie policy and urged them to respond before the 24th of September.
Helmond, Anne and Carolin Gerlitz (2011). Hit, Link, Like and Share. Organizing the social and the fabric of the web in a Like economy. Paper presented at the Digital Methods Winter School 2011 Conference at the University of Amsterdam, January 24-25, 2011.
[blog post] & [pdf]
Hoofnagle et al.
Behavioral Advertising: The Offer You Cannot Refuse. 6 Harvard Law & Policy Review 273 (2012).
Zuiderveen Borgesius, F.J.
De nieuwe cookieregels: alwetende bedrijven en onwetende internetgebruikers? In: Privacy & Informatie (P&I), jaargang 14, nr 1 (2011).
More information about the tool in the
Track the Trackers Workshop. See also the
Workshop slides:
Anne Helmond's project on
trackers used by political parties.
Helmond, A. & Gerlitz, C. - Reworking the fabric of the web: The Like economy from
network cultures on
Vimeo.
The
website register of the Government of the Netherlands:
The US '
Cookie Fine'