Newswork on Wikipedia
Team Members
Ida Raffaghello, Laura Kastalio, Laura Schäfer, & Sagar Hugar.
Introduction
Wikipedia describes itself as a “free online encyclopedia, created and edited by volunteers around the world”. It ranks among the top 5 global websites (Alexa), and its articles are meant to be as universal as possible. With over 46 million articles in over 300 languages, accessed by 1.4 billion users each month (Barnett), Wikipedia has been able to convince users of its encyclopedic value. According to Wikipedia contributors (2018), articles are meant to always represent a neutral point of view; however, sourcing anchors and sourcing cultures may play an important role in what information is included in articles (Rogers, 2019).
In this research report we look at different Wikipedia language versions to discover how newswork on Wikipedia evolves over time and how the presentation of news differs between language versions. Our aim is to discover whether and when events stop being news items and become encyclopaedic. We focus on the European migrant crisis, as it was an event that involved many European countries. In 2015, over 1.2 million first-time asylum seekers were registered in EU member states (Eurostat). This was twice as many as in 2014, and one third of them applied for asylum in Germany.
It is important to note the difference between news items and the information included in an encyclopaedia. News is new information, mostly about recent events, that has not previously been published. Encyclopaedia entries are referenced works that provide summaries of knowledge. Because Wikipedia is constantly being edited by anonymous contributors with recent updates, the website could also be considered a news source while an event is still ongoing. We expect that Wikipedia articles will develop from national or local viewpoints towards a more universal point of view. It is also important to consider whose perspectives these articles may neglect, as many of them could represent a national or local point of view, which often presents only one side of the story, rather than a universal one. Furthermore, we expect that Wikipedia pages will be news-related (using news as a source) at the beginning of an event, but will become more encyclopaedic and factual over time.
Research Question(s)
European Migrant Crisis: How does newswork on Wikipedia evolve over time? To what extent does the evolution of news differ across Wikipedia language versions?
Methodology
To pursue the research question(s), we designed a tripartite methodology examining the reference lists, images, and linguistic terms in selected language versions of the Wikipedia article on the European migrant crisis. Based on the language expertise of the team members, six languages were chosen: English, German, Polish, Swedish, Danish, and Turkish. These languages were chosen because the countries they represent have been at the centre of the crisis and have received a large number of migrants. Analysing these language versions of the same article therefore offers a strong case for examining how and why Wikipedians edit their articles, and whether there are similarities or differences in trends across languages.
Method 1: Reference List Analysis
At the bottom of every Wikipedia article is a section that lists every source cited on the page. To support the credibility of the information shared, Wikipedia is built on the principle that content should be linked to a source. These sources are mainly news articles and, at times, academic blogs and the websites of people, events, or organisations. To test our hypothesis that Wikipedia articles universalize over time by increasingly referring to global news outlets, we broke the European migrant crisis down into 10 time-points from 2015 to 2020. 2015 is the year the crisis drew global attention and also the year in which most of the selected language versions of the article were created. The 10 time-points correspond to 10 major sub-events of the migrant crisis that, we assumed, might have prompted a large number of edits and discussions on the page. We then examined the article's revision history: using Wikipedia’s ‘edit history’ affordance, we went back to the time of each selected event to see how the page looked at that point. All the ‘references’ on the page (across languages) were collected, from the first time-point to the last. From the resulting list of the articles and blogs these language versions have cited over the years, all references were divided into two categories, ‘national’ and ‘universal’. ‘National’ references were those pointing to the page's own country's top-level domain (for example, .de for the German version); all other references were counted as ‘universal’. The same was done for all selected languages, and the results were visualized with RAWGraphs, the open-source data visualization tool developed by DensityDesign Lab.
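The national/universal split described above can be sketched in Python. This is a minimal illustration, not the tool the team used: the language-to-domain mapping in `NATIONAL_TLD`, the helper names `classify_reference` and `national_share`, and the choice to treat every English reference as universal are assumptions, since the report does not say which domain counted as ‘national’ for the English page.

```python
from urllib.parse import urlparse

# Hypothetical mapping from language edition to the country-code top-level
# domain counted as "national". The report does not say which domain was used
# for English, so None is assumed here (every English reference -> "universal").
NATIONAL_TLD = {
    "de": ".de",  # German
    "pl": ".pl",  # Polish
    "sv": ".se",  # Swedish
    "da": ".dk",  # Danish
    "tr": ".tr",  # Turkish (also matches e.g. .com.tr)
    "en": None,   # English
}


def classify_reference(url: str, lang: str) -> str:
    """Label one reference URL as 'national' or 'universal' for a language edition."""
    host = (urlparse(url).hostname or "").lower()
    tld = NATIONAL_TLD.get(lang)
    if tld is not None and host.endswith(tld):
        return "national"
    return "universal"


def national_share(urls: list[str], lang: str) -> float:
    """Proportion of a time-point's references that are 'national' (0.0 if none)."""
    if not urls:
        return 0.0
    labels = [classify_reference(url, lang) for url in urls]
    return labels.count("national") / len(labels)
```

Computing `national_share` for each of the 10 time-points per language would give the kind of proportions plotted in Figure 1. Matching on the host name rather than the whole URL avoids false hits from paths that happen to contain a country code.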
Method 2: Image Analysis
Every language version of the Wikipedia page on the migrant crisis features several images. A brief look shows that many images, graphs, and maps are shared across languages. The first approach was to create an ‘image network plot’ showing which images are shared between languages and which are unique to a particular language; this approach used the current versions of the Wikipedia pages. The second approach looked at the same images and pages across the aforementioned 10 time-points to find out when and why images appear or disappear on a specific language version, and whether the images over time universalize or take a national route.
The ‘Wikipedia Cross-lingual Image Analysis’ tool was used to obtain a side-by-side overview of all images for a cross-lingual comparison. After collecting the images for the selected language versions with this tool, we used Gephi to create an image network plot in which the images occurring most commonly across languages are placed closer to the centre, while images specific to only one or a few languages lie towards the periphery. This visualization helped us map the European migrant crisis through its visual data.
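One plausible way to prepare such a network for Gephi is to treat languages and images as two kinds of nodes and connect each language to every image it displays; a force-directed layout then tends to pull widely shared images towards the centre. The sketch below assumes the image lists have already been collected per language (the file names in the example are hypothetical) and writes a CSV edge list with `Source` and `Target` columns, a format Gephi's spreadsheet import can read. It also counts in how many versions each image appears, which could serve as node size.

```python
import csv
import io
from collections import Counter


def image_counts(images_by_lang: dict[str, set[str]]) -> Counter:
    """Count in how many language versions each image appears."""
    counts: Counter = Counter()
    for images in images_by_lang.values():
        counts.update(images)  # each image counted once per language version
    return counts


def unique_images(images_by_lang: dict[str, set[str]]) -> dict[str, set[str]]:
    """Images that appear in exactly one language version, grouped by language."""
    counts = image_counts(images_by_lang)
    return {
        lang: {img for img in images if counts[img] == 1}
        for lang, images in images_by_lang.items()
    }


def edge_list_csv(images_by_lang: dict[str, set[str]]) -> str:
    """Bipartite language-image edge list as CSV text with Source/Target headers."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    for lang in sorted(images_by_lang):
        for image in sorted(images_by_lang[lang]):
            writer.writerow([lang, image])
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical example data, not the report's actual image lists.
    sample = {
        "en": {"Migration_map.svg", "Cologne_protest.jpg"},
        "de": {"Migration_map.svg", "Shelter_tents.jpg"},
        "sv": {"Migration_map.svg"},
    }
    print(image_counts(sample).most_common(1))
    print(edge_list_csv(sample))
```

`unique_images` gives the per-language counts of unique images discussed in the findings; an empty set for a language means all of its images are shared.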
For the second approach, Wikipedia’s edit-history affordance was used again. For each of the 10 selected time-points, the past version of the Wikipedia page was visited and the images present at that time-point were noted. After listing all images featured across the 10 time-points for all selected language versions, a timeline GIF was made for each language using Adobe Photoshop. The GIF shows all images from the first time-point to the last; if an image was deleted along the way, it is marked with a pink coloured filter indicating the time-point at which it was deleted.
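The bookkeeping behind the timeline GIF, recording when each image first appears and at which time-point it was removed, can be sketched as follows. The function names `image_timeline` and `deleted_by_end` and the input format (an ordered list of time-point labels paired with the set of image file names on the page at that time-point) are assumptions for illustration, not part of the original workflow.

```python
from typing import Dict, List, Optional, Set, Tuple


def image_timeline(
    snapshots: List[Tuple[str, Set[str]]],
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each image to (first time-point seen, time-point removed or None).

    `snapshots` must be in chronological order. An image that disappears and
    later reappears is treated as present again (its removal is cleared).
    """
    first_seen: Dict[str, str] = {}
    removed_at: Dict[str, Optional[str]] = {}
    previous: Set[str] = set()
    for label, images in snapshots:
        for image in images - previous:  # newly added (or re-added) images
            first_seen.setdefault(image, label)
            removed_at[image] = None
        for image in previous - images:  # images deleted since the last time-point
            removed_at[image] = label
        previous = set(images)
    return {image: (first_seen[image], removed_at[image]) for image in first_seen}


def deleted_by_end(timeline: Dict[str, Tuple[str, Optional[str]]]) -> List[str]:
    """Images absent at the final time-point (the ones given the pink filter)."""
    return sorted(img for img, (_, removed) in timeline.items() if removed is not None)
```

Comparing consecutive snapshots with set differences keeps the logic simple; how the team handled images that were removed and later restored is not stated, so clearing the removal on reappearance is a design choice of this sketch.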
Method 3: Text Analysis
The evolution of the language used to describe the event is another key element of our analysis. We made a preliminary assumption: if the vocabulary used by Wikipedians across languages follows similar trends, this could be seen as an indicator of universalization. To test this, we listed three terms often used to refer to migrants: ‘Migrant’, ‘Refugee’, and ‘Illegal Immigrant’. To trace how the usage of these terms evolved across the 10 selected time-points and the different language versions, we examined Wikipedia’s edit history. Each of the 10 time-points was visited for every language version, and the number of occurrences of the three terms was counted. The total word count of each page at every time-point was also recorded, so that the relative frequency of each term could be calculated. The results were visualized for every language using RAWGraphs, developed by DensityDesign Lab.
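The relative frequency described above is the number of occurrences of a term divided by the page's total word count at that time-point. A minimal Python sketch for the English version follows; the regular expressions (which also match plural forms) and the simple `\w+` word count are assumptions, and the other language versions would need their own translated patterns, which the report does not list.

```python
import re

# English search patterns (assumed); plural forms are included. Word boundaries
# stop "migrant" from matching inside "immigrant".
TERM_PATTERNS = {
    "migrant": re.compile(r"\bmigrants?\b", re.IGNORECASE),
    "refugee": re.compile(r"\brefugees?\b", re.IGNORECASE),
    "illegal immigrant": re.compile(r"\billegal\s+immigrants?\b", re.IGNORECASE),
}
WORD = re.compile(r"\w+")


def relative_frequencies(text: str) -> dict[str, float]:
    """Occurrences of each term divided by the total word count of the page text."""
    total_words = len(WORD.findall(text))
    if total_words == 0:
        return {term: 0.0 for term in TERM_PATTERNS}
    return {
        term: len(pattern.findall(text)) / total_words
        for term, pattern in TERM_PATTERNS.items()
    }
```

Applied to the page text at each time-point, the returned proportions are the kind of values Figure 5 plots. Because `\w+` splits hyphenated words and contractions, totals may differ slightly from those of other word counters.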
Findings
References:
Figure 1: References of Wikipedia language versions, national vs. universal
The reference section within Wikipedia offers relevant insight into the type of knowledge presented in an article. Examining how the distribution of national versus universal references changes over time reveals the cultural relevance of the topic within individual countries. Figure 1 shows a similar trend in the German and Polish versions: in both, there is a shift from an earlier prevalence of universal references to a later, higher proportion of national references. For German, this shift occurs in 2016, and for Polish around a year later, in 2017. One potential reason for these shifts could be a natural rise in national interest in and involvement with the subject. Furthermore, the presence of large national newspapers, and a reliance on them, could be another contributing factor. Language dependency in Poland and Germany may also play a role, especially in comparison with more international countries where English is more widespread.
Swedish and Danish present a shared trend: both shift significantly towards a prevalence of universal references. From a cultural perspective, this similarity may not be surprising, as both are Scandinavian countries with strong international influences. The Turkish Wikipedia page includes only universal references. Although the refugee crisis has had a strong social, political and economic impact on Turkey, no local sources are referenced on the page, potentially pointing to a strong cultural reliance on international news outlets. The English page does not undergo any apparent shifts and instead maintains a relatively constant, predominantly universal distribution of sources. With regard to the English page, however, it is difficult to make cultural assumptions based solely on the reference data, since English does not refer to a specific country. Overall, the distribution of national and universal sources can indicate the national relevance of a topic or event, but can also point to larger linguistic and cultural trends.
Image network:
Figure 2: Image Network of Wikipedia language versions
Images in an article are crucial, as they can tell us what standpoint the article takes. Figure 2 shows how universal image usage is across the selected language versions as of January 2020. Images closer to the centre of the network are shared by more language versions; images further from the centre are shared less often, sometimes appearing in only one language version. The larger an image is in the network, the more frequently it is shown, which is why the images closer to the centre are also larger.
The network shows that the most shared images across the versions are maps of migration patterns, the number of migrants per country, and countries of origin. The photographs in the network most commonly show migrants on boats, shelter tents, fences and politicians. Other frequently occurring images are tables and graphs of the number of migrants arriving in a specific country each month, or the same figures for several countries.
The English version has the most unique images of the chosen versions. One reason for this may be that the English Wikipedia has the largest number of articles of all Wikipedia language editions. Most of these unique images show protests around the world, such as the anti-immigrant protests in Cologne in January 2016, the migrants’ hunger strike in Budapest in September 2015, and ‘Volem acollir’, known as the biggest pro-migrant demonstration in Europe. The German version includes 14 unique images, 5 of which feature shelter tents. In the Polish version, half of the unique images feature fences or borders. The Swedish, Danish, and Turkish versions have no unique images: all of their images are shared with other language versions. One reason for this could be that all three versions rely more on universal than on national references, which makes it more likely that they include the same images.
Timeline/Edit History:
Figure 3: Edit history timeline of images from the English language Wikipedia
As mentioned earlier, part of this research focussed on analysing the European migrant crisis by comparing the visual content of all language versions across 10 time-points. Figure 3 is a screenshot of the timeline GIF made for the English version; the GIF traces each image from the first time it appears on the page up to the last time-point. The screenshot shows the last time-point, 05.01.2020, for the English version. There are a total of 53 images in the screenshot, 15 of which are marked with a pink tinge, denoting that these images were no longer present at the last time-point.
Maps and graphs were among the most popular types of images. As Figure 3 shows, around 16 images in the English version alone depict some form of map or graphed data. Figure 4 shows a scene from 15 June 2015 in which Irish naval personnel rescue Syrian and Iraqi migrants. This image is particularly important because it appears in all language versions at some time-point. Likewise, several other images are considered iconic and tend to appear in all language versions. While this may suggest that the images point towards a universalization of the Wikipedia article, these images do not tend to stay for long. In all language versions except English, many of these iconic images disappear over time and nation-specific images start to appear, which suggests that the images point more towards the national interests of each language version.
Figure 4: Irish Naval Personnel rescuing Syrian and Iraqi migrants
Linguistic Narratives:
After a close reading of the respective language versions of the Wikipedia article on the European migrant crisis, a shift can be observed in the terminology used to portray the event and the actors involved. The terms migrant, refugee and illegal immigrant were used frequently throughout the language versions, raising the question of why and how they were used to describe a particular occurrence. Hence, the use of these terms was analyzed over time by examining each language version’s edit history.
The group jointly chose the events it considered most prominent. Political events and decisions made by the authorities of countries in which the researched languages are spoken were taken into account when determining the time points listed in Table 1.
Time point | Event
T1: 02.09.2015 | Death of Alan Kurdi.
T2: 05.09.2015 | German Chancellor Angela Merkel announced the welcoming of refugees.
T3: 25.10.2015 | Emergency summit in Brussels (heads of 11 EU states and three non-EU states).
T4: 01.01.2016 | New Year's Eve sexual assaults in Germany.
T5: 02.03.2016 | NATO's Supreme Commander in Europe accused Russia and the Assad regime in Syria of working strategically against Europe.
T6: 03.11.2016 | Libya migrant shipwrecks; around 240 migrants were killed.
T7: 24.10.2017 | The anti-immigration, right-wing populist party Alternative for Germany (AfD) entered the German Bundestag and became the largest opposition party.
T8: 08.04.2018 | The Hungarian right-wing populist Fidesz party won a two-thirds supermajority in parliament with an anti-immigration campaign.
T9: 08.07.2018 | Italy's Interior Minister Matteo Salvini declared that foreign ships carrying refugees would be turned away.
T10: 05.01.2020 | The European Union recorded the lowest number of migrant arrivals in five years.
Table 1: Time points of the European migrant crisis
Figure 5 displays the relative word occurrence of the selected terms and how the terminology evolved from each language version’s creation date until January 15th, 2020, when the research was conducted.
Figure 5: Relative word occurrence visualisation
The results show a varied use of terminology. The English language version most notably uses the word migrant to describe the actors involved in the European migrant crisis. A closer look shows that the term illegal immigrant was frequently used when the page was created. Within a few months, a shift in terminology can be observed: the term refugee is used more frequently and the term “illegal immigrant” declines. Furthermore, the Polish and Danish language versions feature the term illegal immigrant, in contrast to the remaining language versions, which predominantly use the term refugee. Each language version exhibits its own development in terminology, which can be seen as an indicator of how distinctly the narrative is discussed in each respective country.
Discussion & Conclusions
Looking at the various language versions of the Wikipedia article on the European migrant crisis over time, some overarching trends become apparent regarding the evolution from newswork to encyclopedic writing. The European migrant crisis is an exemplary case for analysing this type of evolution because it largely took place several years ago, yet its ramifications remain influential across Europe today. Given the factual and definitive nature of encyclopedic articles, the question naturally arises whether the continual updating of Wikipedia in general points to a more news-like way of informing. It is therefore important to consider both linguistic and visual narratives in order to obtain a complete overview of stylistic changes. In the case of the European migrant crisis, our analysis has shed light on a change from a news-like to a more encyclopedic style of informing over time.
Regarding the evolution between universal and local/national viewpoints, an overall tendency across language versions from a universal towards a more local focus over time became apparent. This rising national perspective was especially noticeable in the types of references cited in the articles over time. An evolution towards a more national focus, however, runs counter to Wikipedia’s emphasis on neutrality and instead resembles a more news-like style of writing. The English page is consistently an outlier to the trends apparent in other language versions, pointing back to the idea of the English page acting as the universal ‘base’ page from which other language versions are often derived. This emphasis on universality also clearly distinguishes Wikipedia (especially the English version) from traditional news outlets, which tend to report on ongoing events with precise updates, predominantly from a more locally centered perspective. In a sense, Wikipedia is a revolutionary form of news reporting that lies between news and encyclopedic writing. In contrast to traditional encyclopedias, Wikipedia allows for continuous editing and a consequent shift in narrative style over time.
References
Barnett, David. “Can We Trust Wikipedia? 1.4 Billion People Can’t Be Wrong.” Independent, 18 February 2018. https://www.independent.co.uk/news/long_reads/wikipedia-explained-what-is-it-trustworthy-how-work-wikimedia-2030-a8213446.html
European Commission, Eurostat. Record Number of Over 1.2 Million First Time Asylum Seekers Registered in 2015. Luxembourg: Eurostat Press Office, 2016.
Rogers, Richard. Doing Digital Methods. Sage Publications Ltd, 2019.