Locate GitHub and alternative code sharing platforms in data journalism spheres:
- What code sharing platforms are being used for journalism? Are there any alternatives or competing platforms to GitHub?
- What is the resonance of GitHub in the data journalism space? (Google Scraper)
- What role does GitHub have within this media ecology? (Minor in relation to data journalism)
Document the character of GitHub in online spheres:
- How is GitHub discussed in relation to journalism? (MIT Media Cloud / Twitter)
- What types of journalism practices are associated with it? (Twitter)
- What is being said about GitHub in relation to data journalism? (Twitter)
- What styles of collaboration and participation are discussed in relation to GitHub? (Twitter)
- What values/skills are attached to it? (LinkedIn)
What kinds of claims are made (situating GitHub in the journalism ecology)?
In order to locate and trace the resonance of GitHub and alternative code sharing platforms, we examined various online spheres based on their relation to (data) journalism. We started with a qualitative analysis, searching for links between GitHub and the online journalism sphere both manually and with scrapers. Looking into specialty publications such as Nieman Lab, Source, DataDrivenJournalism, Knight Journalism Lab, School of Data, Columbia Journalism Review and the Tow Center Blog, we found that all of them rely on code and data sharing repositories hosted on GitHub.
Subsequently, we went through a list of (data) journalism events, compiled by querying Google for [data journalism event], [github event journalism] and [code journalism event]. Although we expected some introductory lectures on the use of GitHub in data journalism, none of these events mentioned GitHub. We also examined r/journalism, the largest journalism-related subreddit on Reddit, to see whether GitHub was discussed there in relation to journalism. We pulled the 1000 most recent posts through the IO Magic API and queried for any domain, comment, or post text containing the string [github]. There were no mentions of GitHub in the entire dataset. Queries for BitBucket, SourceForge, BeanStalkApp, Gitlab and CodePlex returned no mentions either.
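The keyword check on the Reddit sample amounts to a case-insensitive substring search over each post's fields. A minimal Python sketch follows, assuming the posts have already been exported as dictionaries; the field names (`domain`, `title`, `selftext`, `comments`) are hypothetical and would need to be adapted to the actual export.

```python
from typing import Dict, Iterable, List, Mapping

# Platform names queried in the r/journalism sample, lower-cased for matching.
PLATFORMS = ["github", "bitbucket", "sourceforge", "beanstalkapp", "gitlab", "codeplex"]

# Hypothetical field names; adjust them to the columns of the real export.
FIELDS = ("domain", "title", "selftext", "comments")


def count_mentions(posts: Iterable[Mapping[str, str]], terms: List[str]) -> Dict[str, int]:
    """Count how many posts mention each term in any searchable field."""
    counts = {term: 0 for term in terms}
    for post in posts:
        # Join all searchable fields into one lower-case string per post.
        text = " ".join(str(post.get(field, "")) for field in FIELDS).lower()
        for term in terms:
            if term in text:
                counts[term] += 1  # each post counts at most once per term
    return counts


sample = [
    {"domain": "self.journalism", "title": "Tips for FOI requests", "selftext": ""},
    {"domain": "github.com", "title": "Our election scraper", "selftext": ""},
]
print(count_mentions(sample, PLATFORMS))
```

Because every post is checked against every term, the same pass covers GitHub and the alternative platforms at once.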
In addition, we consulted the more ubiquitous social network Facebook through the Netvizz tool, searching for Pages on data journalism with the queries [github], [data journalism] and [newsroom code] to see whether GitHub was mentioned; this also yielded no results. Finally, we explored MOOCs and university courses to see whether (data) journalism courses referred to the use of GitHub. The MOOCs only offer technical courses, such as various programming skills, but no GitHub tutorial. The journalism courses at Berkeley, Stanford and Georgetown, however, did refer to GitHub.
After this exploratory round through different social, journalistic and tech-related spheres, we moved on to examine four established spheres: Google, Twitter, the news sphere of the MIT Media Cloud, and LinkedIn. To find alternative platforms that might compete with GitHub, we ran Google searches in the Lippmannian Device, using queries related to code sharing platforms and GitHub alternatives:
We used the DMI Google Scraper to retrieve the first 100 search results for the following queries:
- Code sharing platform - query: "code platform" OR "code sharing platform" OR "coding platform".
- Data sharing platform - query: data sharing platform.
- Data journalism platform - query: data journalism platform.
- Open journalism - query: open journalism.
We then filtered the search results for any text mentioning GitHub and identified six alternative platforms.
Subsequently, we queried the search results for the six alternative platforms together with GitHub to measure their resonance.
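Filtering the scraped results for platform mentions and tallying them per query can be sketched as below. This assumes the scraper's CSV export has been flattened into rows with `query`, `rank` and `text` keys (a hypothetical layout); since the six alternative platforms are not named here, the platform list is left as a parameter.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def resonance_by_query(
    rows: Iterable[Mapping], platforms: List[str], top_n: int = 100
) -> Dict[str, Dict[str, int]]:
    """Return {query: {platform: number of top-N results mentioning it}}."""
    table: Dict[str, Dict[str, int]] = defaultdict(lambda: {p: 0 for p in platforms})
    for row in rows:
        if row["rank"] > top_n:
            continue  # ignore anything beyond the top-N results
        text = row["text"].lower()
        for platform in platforms:
            if platform.lower() in text:
                table[row["query"]][platform] += 1
    return dict(table)


rows = [
    {"query": "data sharing platform", "rank": 1, "text": "GitHub: where the world builds software"},
    {"query": "data sharing platform", "rank": 2, "text": "Another data portal"},
    {"query": "open journalism", "rank": 101, "text": "github.com/example"},
]
print(resonance_by_query(rows, ["GitHub"]))
```

A result that mentions several platforms counts once for each of them, so per-query totals across platforms can exceed the number of results.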
We downloaded the entire TCAT data journalism dataset.
We created an extract in Tableau and Excel to explore themes in conversational (discursive) GitHub-related activity.
We performed co-hashtag analyses on #GitHub and on [GitHub] AND #ddj (the hashtag for data-driven journalism) using Gephi.
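A co-hashtag network links two hashtags whenever they appear in the same tweet, weighting the link by the number of such tweets. The sketch below builds that weighted edge list from raw tweet texts and writes it as a Source/Target/Weight CSV table, a column layout Gephi's spreadsheet importer accepts for edges. It illustrates the technique rather than the tooling used in the study; the hashtag regex is a simplifying assumption.

```python
import csv
import io
import re
from collections import Counter
from itertools import combinations
from typing import Iterable, Optional

HASHTAG = re.compile(r"#(\w+)")


def co_hashtag_edges(tweets: Iterable[str], seed: Optional[str] = None) -> Counter:
    """Count co-occurring hashtag pairs; optionally keep only tweets tagged with `seed`."""
    edges: Counter = Counter()
    for text in tweets:
        tags = sorted({tag.lower() for tag in HASHTAG.findall(text)})
        if seed is not None and seed.lower() not in tags:
            continue
        # Sorted tags give each undirected pair one canonical (a, b) key.
        for a, b in combinations(tags, 2):
            edges[(a, b)] += 1
    return edges


def to_gephi_csv(edges: Counter) -> str:
    """Serialise the edge counts as a Source,Target,Weight CSV table."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Source", "Target", "Weight"])
    for (a, b), weight in edges.most_common():
        writer.writerow([a, b, weight])
    return buffer.getvalue()


tweets = ["Great #ddj #dataviz repo on GitHub", "#DDJ #dataviz #maps tutorial", "no tags here"]
# The [GitHub] AND #ddj subset: keyword match on the text, then #ddj as seed hashtag.
github_ddj = [t for t in tweets if "github" in t.lower()]
print(to_gephi_csv(co_hashtag_edges(github_ddj, seed="ddj")))
```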
We queried LinkedIn groups with the following terms: 'github', 'data journalism', 'newsroom'.
We created a separate list with the LinkedIn groups.
We used the Google Scraper to look for profiles on GitHub and journalism.
We manually analysed the listed skills of 50 profiles.
GitHub appeared in only 3 of the top 100 Google search results for the queries code platform, code sharing platform and coding platform.
GitHub appeared in only 1 of the top 100 Google search results for data sharing platform.
GitHub appeared in only 2 of the top 100 Google search results for data journalism platform.
GitHub appeared in only 2 of the top 100 Google search results for open journalism platform.
As the results show, GitHub sits at the margins of discussion in the online data/open journalism spheres as well as in the code sharing platform spheres. The Google Trends graph shows that GitHub is increasingly searched for by Google users, yet it seems that GitHub has not penetrated the journalism sphere to the extent that it is discussed as an internal practice. We therefore turned to the news media sphere to find out to what extent GitHub has been discussed there.
The MIT Media Cloud results presented in figure 2 reflect the disparity in media coverage and news agenda between the two topics. For GitHub, media discussion has revolved around the features and technical aspects of the platform (repository, coding, fork, bug), whereas the data journalism topics relate to actors in the professional news realm (guardian, fivethirtyeight, journalists), the products of data journalism (visualization, handbook, spreadsheet), and the practice of journalism. The crossover is a normalized (non-sample-weighted) collection of words that appear in both datasets (Query A and Query B). There is no clear relation between the two: GitHub remains technical in appearance, and journalism is not yet discussed as a technically influenced practice. Having grasped the discursive level of the two topics separately within the news realm, we then looked at how GitHub is discussed in relation to data journalism in the public sphere of Twitter.
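How Media Cloud normalises the crossover is not spelled out here, so the following is only one plausible reading: each word's count is divided by the total word count of its own query, and only words present in both queries are kept. The function name and the input counts are illustrative, not the study's data.

```python
from collections import Counter
from typing import Dict, Tuple


def crossover(counts_a: Counter, counts_b: Counter) -> Dict[str, Tuple[float, float]]:
    """Words shared by Query A and Query B with their within-query relative frequencies."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    shared = counts_a.keys() & counts_b.keys()
    # Dividing by each query's own total keeps the larger query from dominating.
    return {w: (counts_a[w] / total_a, counts_b[w] / total_b) for w in sorted(shared)}


query_a = Counter({"repository": 3, "data": 1})  # illustrative GitHub word counts
query_b = Counter({"journalists": 2, "data": 2})  # illustrative data journalism word counts
print(crossover(query_a, query_b))  # prints {'data': (0.25, 0.5)}
```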
Figure 3. Co-hashtag analysis on GitHub and #DDJ.
For the analysis on Twitter we consulted a dataset of over one million tweets related to data journalism. Querying that dataset for #GitHub showed low resonance, with merely [X] tweets containing the hashtag. Searching for the keyword GitHub without the hashtag returned more results, namely [X] tweets. The co-hashtag analysis performed with the query #GitHub showed that #ddj, which stands for data-driven journalism, is quite popular and central to a cluster of related practices. We therefore queried for [github] AND [#ddj] to see exactly how GitHub is discussed in combination with the hashtag for data journalism. Figure 3 shows distinct clusters around related practices: a programming language cluster, an educational cluster, a cartography/mapping cluster, a cluster of related stories and a data visualization cluster. Interestingly, these practices are clearly separated from each other; the only linkage is between the data visualization cluster and mapping or cartography, which is a logical connection.
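The gap between the hashtag and keyword queries can be made explicit by splitting the GitHub tweets into two groups. A minimal sketch, assuming plain tweet texts as input; the word-boundary rule (so that, for instance, #githubpages is not counted as #github) is our own choice rather than part of the study.

```python
import re
from typing import Iterable, List, Tuple

HASHTAG_GITHUB = re.compile(r"#github\b", re.IGNORECASE)
KEYWORD_GITHUB = re.compile(r"github", re.IGNORECASE)


def split_github_tweets(tweets: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (tweets using #github, tweets mentioning GitHub only as a keyword or link)."""
    tagged, keyword_only = [], []
    for text in tweets:
        if HASHTAG_GITHUB.search(text):
            tagged.append(text)
        elif KEYWORD_GITHUB.search(text):
            # Covers plain mentions, github.com links and longer tags such as #githubpages.
            keyword_only.append(text)
    return tagged, keyword_only


tagged, keyword_only = split_github_tweets([
    "New #GitHub release for our maps",
    "Code for the story: github.com/example/repo #ddj",
    "Nothing to see here",
])
print(len(tagged), len(keyword_only))
```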
Figure 4. Twitter timeline on GitHub.
Next to the Gephi output, we created a timeline of the tweets related to [GitHub] from the data journalism collection in Tableau, as shown in figure 4. The timeline shows that, within the three-year dataset, most of the tweets on GitHub were posted in 2013, with a small decrease in 2014. The bigger dots in the graph represent @mentions, i.e. direct conversations between Twitter users about GitHub, in which GitHub is not marked by a hashtag but can appear anywhere in the sentence. Zooming in on the bigger dots in 2013 shows that users mostly discuss troubleshooting problems with their GitHub repositories on Twitter, asking friends for solutions. In 2014, there is a shift towards more promotional messages about GitHub in mentions between users. The prominent line of big green dots refers to the interactive Happy New Year story that Twitter produced through GitHub.
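The counts behind such a timeline can be sketched as a tally per year, separating tweets that @mention another user from the rest. This assumes the export has been reduced to (timestamp, text) pairs with ISO-formatted timestamps, a hypothetical simplification; the study itself built the timeline in Tableau.

```python
import re
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple

# An @ not preceded by a word character, so e-mail addresses are not counted as mentions.
MENTION = re.compile(r"(?<!\w)@\w+")


def yearly_timeline(tweets: Iterable[Tuple[str, str]]) -> Counter:
    """Count tweets per (year, 'mention' | 'other')."""
    counts: Counter = Counter()
    for created_at, text in tweets:
        year = datetime.fromisoformat(created_at).year
        kind = "mention" if MENTION.search(text) else "other"
        counts[(year, kind)] += 1
    return counts


sample = [
    ("2013-05-01 12:00:00", "@alice my GitHub repo won't push, any ideas?"),
    ("2013-06-01 09:30:00", "New visualization on GitHub"),
    ("2014-01-02 18:00:00", "@bob have a look at our GitHub project"),
]
print(sorted(yearly_timeline(sample).items()))
```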
Based on the same three-year tweet collection on data journalism, we also created an overview of the top domains in the [GitHub] tweets, as shown in figure 5. Figure 5 shows a strong pattern in outgoing links on Twitter, moving from github.com in 2012 and 2013 to github.io in 2013 and 2014, and strongly towards githut.info in 2015. There is also a small presence of twitter.com links. After locating the discursive language on GitHub in relation to data journalism on Twitter, we finally wanted to know how GitHub is discussed in relation to (data) journalism within the professional sphere. We therefore moved to LinkedIn to grasp the discourse of professionals who use GitHub as a benchmark in their profiles.
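Aggregating outgoing links by domain and year can be sketched with the standard library's URL parser. The sketch assumes links have already been expanded from t.co short links and paired with the tweet's year; collapsing hosts to their last two labels (so that someone.github.io counts as github.io) is our simplification and would mishandle suffixes such as .co.uk.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable, Tuple
from urllib.parse import urlparse


def domains_by_year(links: Iterable[Tuple[int, str]]) -> DefaultDict[int, Counter]:
    """Count outgoing link domains per year from (year, url) pairs."""
    per_year: DefaultDict[int, Counter] = defaultdict(Counter)
    for year, url in links:
        host = urlparse(url).hostname  # lower-cased, without port; None if absent
        if not host:
            continue
        domain = ".".join(host.split(".")[-2:])  # e.g. someone.github.io -> github.io
        per_year[year][domain] += 1
    return per_year


links = [
    (2013, "https://github.com/example/repo"),
    (2014, "http://someone.github.io/viz/"),
    (2015, "http://githut.info/"),
    (2015, "https://www.githut.info/"),
]
for year, counts in sorted(domains_by_year(links).items()):
    print(year, counts.most_common(3))
```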
Figure 5. The aggregate count of outgoing URLs in the GitHub Twitter dataset.
Figure 6. Computing and coding skills listed in 50 LinkedIn profiles.
When querying LinkedIn for groups related to data journalism, we found 16 groups. Manual, qualitative analysis made clear that GitHub was not discussed within these groups. We therefore moved to personal profiles to see whether GitHub could be defined as a skill or core asset in relation to data journalism. The majority of users linked to GitHub from their profile to show their repositories as a technical portfolio. In addition, GitHub is mentioned as a skill in the majority of the profiles. To better understand how people conceive of GitHub as a skill, we classified the related mentions from the top 50 profiles into categories. Figure 6 shows that GitHub is mostly linked to a variety of programming languages, but also to certain software programs.
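Classifying profile skills into categories is essentially a lookup table plus a tally. The mapping below is hypothetical and for illustration only, since the categories used in the study are not listed; skills missing from the table fall into "other" so they can be reviewed by hand.

```python
from collections import Counter
from typing import Iterable, List

# Hypothetical skill-to-category mapping; extend it while coding the profiles.
CATEGORIES = {
    "python": "programming language",
    "javascript": "programming language",
    "r": "programming language",
    "excel": "software",
    "tableau": "software",
    "git": "version control",
    "github": "version control",
}


def categorise_skills(profiles: Iterable[List[str]]) -> Counter:
    """Tally skill categories across profiles, one count per listed skill."""
    counts: Counter = Counter()
    for skills in profiles:
        for skill in skills:
            counts[CATEGORIES.get(skill.strip().lower(), "other")] += 1
    return counts


profiles = [["Python", "GitHub", "Excel"], ["JavaScript", "Tableau", "Journalism"]]
print(categorise_skills(profiles).most_common())
```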
The Twitter data suggests that GitHub-related conversational activity on Twitter largely revolves around topical data visualization discussion. Drilling down into individual tweets, we found examples of Twitter being used to promote personal portfolios and GitHub projects, to alert other users directly about data journalism events (notification, event), and to link to teaching and course projects (educational). Many of the tweets linking to GitHub had no hashtag, which suggests that much of the discursive activity on Twitter related to GitHub takes place outside the realm of the hashtag.
GitHub, at least as a platform, is not well represented on Google for related topical searches. Google's PageRank may not be ideal for measuring the resonance of GitHub. The internal structure of GitHub may not lend itself to tracing from outside the platform, meaning that few of its pages are easily traceable with common digital methods tools (e.g. the Google Scraper). GitHub pages have few incoming/outgoing hyperlinks, and its users and repositories are connected, updated, merged, and forked inside the infrastructure of the platform.
When scraping Google with the DMI scraper tool, some of the results turned out to be Dutch. We matched the scraper's CSV output (set to any language, all regions, and .com) against web browser queries on google.com and google.nl (with personalisation turned off and signed out of any account): the scraper data exactly matched the Dutch Google (NL) browser search. The scraper results were therefore not international but based on Google NL.
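The check against the browser can be expressed as a position-by-position comparison of two ranked URL lists. A minimal sketch, assuming both lists have been normalised to bare URLs in rank order; the URLs in the example are placeholders, not the study's data.

```python
from typing import Dict, List, Union


def compare_rankings(
    scraper: List[str], browser: List[str], top_n: int = 10
) -> Dict[str, Union[int, bool]]:
    """Compare the top-N of two ranked result lists."""
    a, b = scraper[:top_n], browser[:top_n]
    return {
        "same_position": sum(1 for x, y in zip(a, b) if x == y),  # identical rank
        "overlap": len(set(a) & set(b)),  # shared URLs regardless of rank
        "identical": a == b,
    }


scraper = ["https://a.example", "https://b.example", "https://c.example"]
google_nl = ["https://a.example", "https://b.example", "https://c.example"]
google_com = ["https://a.example", "https://d.example", "https://b.example"]
print(compare_rankings(scraper, google_nl))
print(compare_rankings(scraper, google_com))
```

Reporting both the positional matches and the unordered overlap separates "same results" from "same results in the same order", which is the stronger claim made above.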
Within the professional sphere, LinkedIn seems to be used to show different coding or language skills. GitHub is sometimes listed as a skill, but the profiles analysed make clear that LinkedIn mainly serves as a way to present a portfolio.
Further research
The TCAT dataset could be explored further to see whether temporal patterns exist in the conversation (specifically, tweets directly between users related to GitHub and/or tweets that link to github.com), and if so, what the broader conversation about GitHub entails and how it has shifted over time.