Data-Driven User Journalism: The Case of the Afghan War Diary
Team Members
Camilo Cristancho, Catalina Iorga, Matteo Cernison
Introduction
Research Question
Is there an alternative account of the Afghan War Diary 2004 - 2010 documents released by WikiLeaks, a "multi-jurisdictional public service designed to protect whistleblowers, journalists and activists who have sensitive materials to communicate to the public" (1)?
In other words, is the data hosted by WikiLeaks used in ways other than those of the mainstream media, represented here by WikiLeaks' official partners?
Method
Collect all inlinks to Afghan War Diary 2004 - 2010 document pages:
- Identify the common root of all document URLs, namely 'http://wardiary.wikileaks.org/afg/event'
- Query Google with the Google Scraper to obtain the first 1000 results that contain this common root as a textual component.
- Submit the 95 webpages obtained (which can alternatively be considered the top 100) to the Link Ripper in order to extract all outlinks to specific Afghan War Diary 2004 - 2010 document pages.
- Insert the Link Ripper output in the Harvester in order to alphabetize the obtained URLs and remove textual descriptions.
- Manually clean the output by searching again for 'http://wardiary.wikileaks.org/afg/event' in an Excel file, and produce a separate list of Afghan War Diary 2004 - 2010 document URLs.
- Analyze the list, which contains 179 non-unique results, select all document pages that receive at least two links (following the Issue Crawler logic), and create a file with the 'most mentioned' warlogs, 17 to be exact.
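The last two steps (filtering on the common root and keeping documents with at least two links) were done by hand in Excel. As an illustration only, they could be sketched in Python roughly as follows. The function name `most_mentioned`, the threshold parameter `min_links`, and the sample URLs are hypothetical and are not the project's actual data. The sketch counts raw occurrences, as in the list of non-unique results; whether several links from the same webpage should count once or several times is not stated above, so that choice is an assumption.

```python
from collections import Counter

# Common root of all document URLs, as given in the method above.
ROOT = "http://wardiary.wikileaks.org/afg/event"


def most_mentioned(urls, min_links=2):
    """Return (url, count) pairs for document URLs linked at least min_links times."""
    # Keep only links that point at war diary document pages.
    doc_urls = [u.strip() for u in urls if u.strip().startswith(ROOT)]
    counts = Counter(doc_urls)
    selected = [(u, n) for u, n in counts.items() if n >= min_links]
    # Sort by descending link count, then alphabetically for a stable order.
    return sorted(selected, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical sample of harvested outlinks (placeholder paths, not real entries).
sample = [
    ROOT + "/2007/03/EXAMPLE-A.html",
    ROOT + "/2007/03/EXAMPLE-A.html",
    ROOT + "/2009/09/EXAMPLE-B.html",
    "http://www.example.org/some-news-story",
    ROOT + "/2009/09/EXAMPLE-B.html",
    ROOT + "/2009/09/EXAMPLE-B.html",
    ROOT + "/2010/01/EXAMPLE-C.html",
]

result = most_mentioned(sample)
for url, n in result:
    print(n, url)
```

In this made-up sample, EXAMPLE-B and EXAMPLE-A pass the two-link threshold, while EXAMPLE-C and the unrelated news story are dropped.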
Preliminary Findings
As shown by the graphs in the attached presentation, content syndication was based on local interest. For example, UK political blogger James Barlow referred to British-related entries, not necessarily commenting on them but rather listing a collection of links. Thus, the level of engagement with the actual data is very low, given the entries' extremely technical language.
The Afghan War Diary documents were usually not directly referenced; blog entries and news stories relied heavily on the reports and databases put together by WikiLeaks' official media partners, namely Der Spiegel, The Guardian and The New York Times.
Issues and Limitations
The highly technical language of the war diaries (military terms and codes) made them difficult to analyze individually, meaning that the envisioned content-based search did not occur, especially given the limited resources and time span of this particular project.
Conclusion
Based on such a reserved linking practice, the future of data-driven user journalism looks bleak. The Afghan War Diary 2004 - 2010 was a unique opportunity to deal with first-hand military information and to criticize crucial matters like the violation of human rights and unjust killings. If these documents are indeed discussed independently of linking or major media outlets, then this analysis is happening underground, and it needs to surface for a true alternative account to emerge. The only beacon of hope in such a dark landscape, where only 17 documents are linked at least twice, is a blogger, Peak of Elephants, who astutely observes that most civilian shootings happened because of rebounds (2). This is the only user we found who comments on the documents and simultaneously links to them.
Further Research
Content analysis is expected to be useful for following syndication practices and thereby identifying non-hyperlinked networks. In other words, careful examination of how documents are discussed without being linked to could shed new light on the distribution and circulation of these highly controversial pieces of classified information. Special emphasis should be placed on the reusability of content in order to avoid problems such as the undecipherable technicality of the original Afghan War Diary.
Bibliography
(1) http://wikileaks.org/wiki/WikiLeaks:About
(2) http://peakofelephants.posterous.com/post/861912878