
Protocols devised by the DMI

This page is being replaced gradually by our new research protocols and methods page.

  • Perform an issue crawl to demarcate an issue network, a social network, an establishment network, or an event network. The result is an Issue Crawler network map with detailed network data, which can be analysed with various tools. More information on how to do an issue crawl can be found in the Issue Crawler's scenarios of use. To collect starting points (seed URLs) for a crawl, you can use linkRipper to gather the URLs linked from a page. Insert these URLs into the Issue Crawler's harvester.
  • You can schedule the issue crawl and compare the networks over time. Which sites are rising in importance, and which are declining? Use this tool to receive ranked actor lists over time. Tip: use 'by site'.
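This page does not document how linkRipper works internally. The following minimal Python sketch, which uses only the standard library, shows what gathering seed URLs from a page roughly involves. It collects every link target, resolves relative links against the page's URL, drops #fragments and non-web links such as mailto:, and removes duplicates while keeping the original order. The function name `extract_seed_urls` is hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class _LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_seed_urls(html, base_url):
    """Hypothetical helper: unique absolute http(s) URLs linked from html."""
    parser = _LinkCollector()
    parser.feed(html)
    seeds, seen = [], set()
    for href in parser.hrefs:
        # Resolve relative links and strip #fragments so duplicates collapse.
        url, _fragment = urldefrag(urljoin(base_url, href))
        if url.startswith(("http://", "https://")) and url not in seen:
            seen.add(url)
            seeds.append(url)
    return seeds
```

Calling `extract_seed_urls(page_html, page_url)` returns a plain list of URLs that could then serve as seeds.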
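The ranked actor lists themselves come from the Issue Crawler. The sketch below illustrates only the comparison step. It assumes that each snapshot is a plain list of site names ordered from highest to lowest rank, and it sorts sites into rising, declining, newly entered and dropped. The function `compare_rankings` and this list format are assumptions and do not reflect the tool's actual output format.

```python
def compare_rankings(earlier, later):
    """Classify sites by how their position changed between two ranked lists.

    Both lists are ordered best first (position 1 = highest-ranked actor).
    """
    before = {site: pos for pos, site in enumerate(earlier, start=1)}
    after = {site: pos for pos, site in enumerate(later, start=1)}
    changes = {"rising": [], "declining": [], "entered": [], "dropped": []}
    for site in later:
        if site not in before:
            changes["entered"].append(site)
        elif after[site] < before[site]:  # smaller position number = higher rank
            changes["rising"].append(site)
        elif after[site] > before[site]:
            changes["declining"].append(site)
    # Sites present in the earlier snapshot but missing from the later one.
    changes["dropped"] = [site for site in earlier if site not in after]
    return changes
```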

Issue Networks

  • To query the issue network actors for substance, extract all the URLs from an Issue Crawler XML file and paste them into the Google scraper. Query that set of sites for keywords to see which organizations work in which sub-issue areas or use particular language. This can all be done in one step by pasting the network ID here. You can also show the frequency of hosts per issue in a Tag Cloud.
  • Compare the network rank with Google's rank through the Actor Profiler. This script takes the top 10 network nodes (by indegree) and queries them in Google for a specific issue. The Google PageRank and description are visualized in an SVG, along with the actor's in- and outlinks within the network.
  • See if surfers actually follow the links from one site to the other with our Surfer Issue Pathways Tool. Building upon Alexa's related sites feature, this tool determines which sites are likely to be in the actual surfer paths of other sites related to the same issue.
  • Perform image analysis of an issue network. Which images are associated with the issues, according to the network actors? Use GoogleImages to query a set of sites appearing in an issue network for images. Query that set of sites to see which organizations display which images for a particular sub-issue area or particular language. Use imagesDeep to fetch the images from a single URL.
  • Show on a geographical map where organizations in an issue network are based. The Issue Geographer takes Issue Crawler results, scrapes a whois service, and plots each site's registration address (the latitude/longitude of its city) on a geographical map. See this movie for an explanation of how to use the Issue Crawler and the Issue Geographer.
  • Perform in-depth social network analysis with UCINET. Use the UCINET datafile from the Issue Crawler crawl details page.
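The Actor Profiler's first step selects the top 10 nodes by indegree. This is easy to reproduce once the network is available as hyperlinks between actors. The minimal sketch below assumes the network has already been reduced to (source, target) pairs of site names. Parsing the Issue Crawler XML is not shown, because its schema is not described here. Duplicate links and self-links are ignored. The name `top_by_indegree` is hypothetical.

```python
from collections import Counter


def top_by_indegree(edges, n=10):
    """Return the n actors with the most inlinks as (actor, indegree) pairs."""
    indegree = Counter()
    for source, target in set(edges):  # count each distinct link once
        if source != target:  # ignore self-links
            indegree[target] += 1
    # Highest indegree first; ties broken alphabetically for a stable order.
    return sorted(indegree.items(), key=lambda item: (-item[1], item[0]))[:n]
```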
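A host-frequency tag cloud comes down to counting hosts and mapping those counts to font sizes. The sketch below assumes a plain list of result URLs for one issue and scales linearly between a minimum and a maximum pixel size. The function name and the size range are illustrative assumptions. Other scalings, such as logarithmic, are equally common.

```python
from collections import Counter
from urllib.parse import urlsplit


def tag_cloud_sizes(urls, min_px=10, max_px=32):
    """Map each host in urls to a font size that grows with its frequency."""
    hosts = Counter()
    for url in urls:
        host = urlsplit(url).hostname  # lower-cased by urllib; None if absent
        if host:
            hosts[host] += 1
    if not hosts:
        return {}
    low, high = min(hosts.values()), max(hosts.values())
    span = (high - low) or 1  # avoid dividing by zero when all counts are equal
    return {
        host: round(min_px + (count - low) * (max_px - min_px) / span)
        for host, count in hosts.items()
    }
```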

Blog Analysis

  • Find high-, medium-, and low-authority starting points for an issue with the Technorati scraper and the Charts of the (Relative) Actor Resonance Per Issue. Use these starting points as input for co-link analysis with the Issue Crawler.
    E.g. Bruns' article on using the Issue Crawler and Technorati in Methodologies for Mapping the Political Blogosphere (2007).
  • (Relative) Actor Resonance Per Issue in the Blogosphere. Building upon Technorati, this tool shows the ratio of all issue postings to the postings that associate an organization with the issue.
  • Charts of the (Relative) Actor Resonance Per Issue in the Blogosphere. This tool charts the number of issue postings and the ratio of all issue postings to the postings that associate an organization with the issue.
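The exact formula used by the resonance tools is not given here. The following sketch assumes resonance is the share of issue postings that also mention an organization. In place of Technorati queries, it uses case-insensitive substring matching over a local list of posting texts. For each organization it reports the number of associated postings and that number as a fraction of all issue postings. `relative_resonance` is a hypothetical name.

```python
def relative_resonance(postings, issue, actors):
    """For each actor: (issue postings mentioning it, share of all issue postings).

    Case-insensitive substring matching stands in for a search-engine query.
    """
    issue_posts = [text.lower() for text in postings if issue.lower() in text.lower()]
    total = len(issue_posts)
    resonance = {}
    for actor in actors:
        hits = sum(1 for text in issue_posts if actor.lower() in text)
        resonance[actor] = (hits, hits / total if total else 0.0)
    return resonance
```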

Google News (Image) Analysis (you'll need a special account for this)

  • Query a particular country or language for its news
  • Compare the discourse across countries
  • Compare news images across countries
  • Compare images by media ownership

Censorship Research

  • Redistributed Content Discovery
    • Scrape Google (international) for an issue that is suspected to be censored. Get unique phrases from the Google descriptions, and query the individual phrases in Google again. Perform a geoip and whois lookup of the sites to see who authors the sites and where the sites are hosted. In addition you can check if the sites are known to be blocked, or submit them to be checked, by the Open Net Initiative.
      E.g. A chapter by Richard Rogers in the forthcoming book edited by Jussi Parikka and Tony Sampson, 'The Spam Book: On Viruses, Spam, and Other Anomalies from the Dark Side of Digital Culture'
    • Scrape Google for a particular issue. Split the results into a list of blocked and non-blocked sites by providing a list of known blocked sites.
  • Use proxies to surf sites from within other countries (or view connection stats only).
  • URL discovery through hyperlink sampling with the Issue Crawler
    E.g. "A Censor's Network: Iranian Social, Political and Religious Sites. A Hyperlink Analysis Method for Censored Website Discovery" (December 2006)
  • See the section on Search Engine Behavior for more censorship research in relation to search engines
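Extracting phrases from search-result descriptions can be sketched as follows. The sketch assumes that "unique phrases" means distinct, reasonably long fragments of the snippet text. Descriptions are split at punctuation and at ellipses, since snippets typically contain "..." where text was left out. Short fragments are dropped and duplicates are removed case-insensitively. Each remaining phrase is wrapped in quotes so it can be submitted as an exact-phrase query. The helper name and the four-word minimum are assumptions.

```python
import re

# Split at ellipses (ASCII or Unicode) and at common punctuation.
_BREAKS = re.compile(r'\.\.\.|…|[.,;:!?()"]')


def unique_phrases(descriptions, min_words=4):
    """Return distinct phrases of at least min_words words, quoted for exact search."""
    seen = set()
    queries = []
    for text in descriptions:
        for fragment in _BREAKS.split(text):
            phrase = " ".join(fragment.split())  # trim and collapse whitespace
            key = phrase.lower()
            if len(phrase.split()) >= min_words and key not in seen:
                seen.add(key)
                queries.append(f'"{phrase}"')
    return queries
```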
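The blocked/non-blocked split described above amounts to matching each result's host against the list of known blocked sites. In this sketch, a result counts as blocked if its host or any parent domain of it is on the list, so news.example.com matches a listed example.com. A leading "www." is ignored on both sides. The matching rule and the function name `split_by_blocklist` are assumptions. A blocklist that records specific URLs rather than whole sites would need a different comparison.

```python
from urllib.parse import urlsplit


def _site(url_or_host):
    """Normalise a URL or bare host name to a lower-case host without 'www.'."""
    if "//" not in url_or_host:
        url_or_host = "//" + url_or_host  # let urlsplit treat a bare host as a netloc
    host = urlsplit(url_or_host).hostname or ""
    return host[4:] if host.startswith("www.") else host


def split_by_blocklist(result_urls, blocked_sites):
    """Split result URLs into (blocked, not_blocked) using known blocked sites."""
    blocked = {_site(site) for site in blocked_sites} - {""}
    blocked_results, open_results = [], []
    for url in result_urls:
        labels = _site(url).split(".")
        # The host itself plus every parent domain, e.g. a.b.com -> a.b.com, b.com, com.
        candidates = {".".join(labels[i:]) for i in range(len(labels))}
        if candidates & blocked:
            blocked_results.append(url)
        else:
            open_results.append(url)
    return blocked_results, open_results
```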

Search Engine Behavior

  • Information retrieval is normally not considered dramatic. On the Web, however, information sources are in constant competition with each other to be returned in the top ten for any given query. The competition is particularly fierce for products and services. The quest to reach the top often prompts companies to enlist the black-arts services of search engine optimizers. Use the Issue Dramaturg to see the rise and fall of a site's Google rank for an issue.
  • Use the Page Rank script to get a site's current Google rank for an issue.
  • Query Google for its results on a particular query. See the section on Geographical Analysis to find out where the hosting machines and the owners of the web pages are based.
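Charting the rise and fall of a site's rank, as the Issue Dramaturg does, requires repeated captures of the result list for the same query. The sketch assumes these captures are stored as (date, list of result URLs) pairs, best result first. It extracts a site's rank history, where None marks a date on which the site did not appear in the captured results. How the captures are collected is not shown, and `rank_history` is a hypothetical name.

```python
from urllib.parse import urlsplit


def rank_history(snapshots, site):
    """Return (date, rank) pairs for site, oldest first; rank is None if absent.

    snapshots: iterable of (date, ranked_result_urls), best result first.
    Dates must sort chronologically (e.g. ISO 'YYYY-MM-DD' strings).
    Hosts are compared exactly, so 'www.a.org' and 'a.org' are different sites.
    """
    site = site.lower()
    history = []
    for date, urls in sorted(snapshots, key=lambda snap: snap[0]):
        rank = None
        for position, url in enumerate(urls, start=1):
            if urlsplit(url).hostname == site:
                rank = position
                break
        history.append((date, rank))
    return history
```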

Geographical Analysis / (De-/Re-)Territorialization of the web

  • The Issue Geographer shows on a geographical map where organizations in an issue network are based by querying whois databases and looking up the country in which the IP address is based.
  • Geo-IP. This script looks up the country in which the machine identified by an IP address is based.
  • Whois. This script looks up the country in which the domain owner (registrant) is based.
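Geo-IP lookups generally work by finding, in a table that maps address ranges to countries, the range that contains a given IP address. This sketch performs that lookup with a binary search over a tiny, made-up table built from the RFC 5737 documentation ranges. The country codes assigned to those ranges are fictitious and IPv6 is not handled. A real lookup would use a maintained geo-IP database. The lookup locates the machine, not the domain owner, which is the distinction between the Geo-IP and Whois scripts above.

```python
import bisect
import ipaddress

# Illustrative table only: RFC 5737 documentation ranges, fictitious country codes.
EXAMPLE_RANGES = [
    ("192.0.2.0", "192.0.2.255", "NL"),
    ("198.51.100.0", "198.51.100.255", "US"),
    ("203.0.113.0", "203.0.113.255", "DE"),
]


class GeoIpTable:
    """Maps IPv4 addresses to country codes via non-overlapping address ranges."""

    def __init__(self, ranges):
        # Store ranges as integers, sorted by start address, for binary search.
        self._rows = sorted(
            (int(ipaddress.IPv4Address(start)), int(ipaddress.IPv4Address(end)), country)
            for start, end, country in ranges
        )
        self._starts = [row[0] for row in self._rows]

    def country(self, ip):
        """Return the country code for ip, or None if no range contains it."""
        value = int(ipaddress.IPv4Address(ip))
        # Index of the last range whose start address is <= value.
        i = bisect.bisect_right(self._starts, value) - 1
        if i >= 0 and value <= self._rows[i][1]:
            return self._rows[i][2]
        return None
```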
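The WHOIS protocol itself is simple (RFC 3912). A client opens a TCP connection to port 43 of a WHOIS server, sends the query followed by CRLF, and reads until the server closes the connection. The sketch below does this and then looks for a "Registrant Country:" line in the response. Several assumptions are worth flagging. The right server depends on the top-level domain, and some registries only refer you on to the registrar's server. Field labels differ between registries. Many records now have registrant details redacted, so the parser may legitimately return None. Both function names are hypothetical.

```python
import re
import socket


def whois_query(domain, server, timeout=10.0):
    """Send a raw WHOIS query (RFC 3912) and return the server's text response.

    The caller must pick the WHOIS server appropriate for the domain's TLD.
    """
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(domain.encode("idna") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: response complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def registrant_country(whois_text):
    """Return the 'Registrant Country' value if present, else None (labels vary)."""
    match = re.search(r"^\s*Registrant Country:\s*(\S+)", whois_text,
                      flags=re.IGNORECASE | re.MULTILINE)
    return match.group(1).upper() if match else None
```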

Exclusion Policies

  • Explore robot exclusion policies. Sites may block robots and thus prevent search engines and other crawlers from indexing or scraping their sites for archiving or further analysis. Enter a URL and see which parts of the site are blocked from indexing.
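Python's standard library already includes a robots.txt parser, `urllib.robotparser.RobotFileParser`. A minimal sketch of the "which parts are blocked" check feeds it the text of a site's robots.txt and tests a list of candidate paths for a given user agent. Fetching the file, for example with the parser's `set_url()` and `read()` methods, is left out so that the example runs offline. The helper name `blocked_paths` is hypothetical. The standard-library parser matches rules as path prefixes, so wildcard extensions supported by some search engines may not be interpreted the same way.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt, paths, user_agent="*"):
    """Return the paths from `paths` that robots_txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text that was fetched elsewhere
    return [path for path in paths if not parser.can_fetch(user_agent, path)]
```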


  • Organization Tag Cloud Generator Per Issue. Building upon, this tool shows, in a tag cloud, which URLs or tags are referred to per issue area.
  • Organization Tag Cloud Generator Per Site. Building upon, this tool shows, in a tag cloud, which tags are referred to per site. The number of users who bookmarked the site is also displayed.
  • Tag and save history for a URL. Building upon, this tool discovers how a URL was tagged and which tags were used at what time.
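The tag history for a URL can be derived from its bookmark records. The sketch below assumes each record is (ISO timestamp, user, list of tags). This is a hypothetical format, not any particular bookmarking service's API. The function returns three things: the tags in the order they were first used, how often each tag was applied, and how many distinct users bookmarked the URL.

```python
from collections import defaultdict
from datetime import datetime


def tag_history(bookmarks):
    """bookmarks: (iso_timestamp, user, [tags]) records for one URL (hypothetical format).

    Returns (timeline, counts, n_users): timeline lists (tag, first_used) oldest first.
    """
    first_used = {}
    counts = defaultdict(int)
    users = set()
    for stamp, user, tags in sorted(bookmarks, key=lambda b: datetime.fromisoformat(b[0])):
        users.add(user)
        for tag in tags:
            tag = tag.lower()  # treat 'Iran' and 'iran' as the same tag
            counts[tag] += 1
            first_used.setdefault(tag, stamp)
    # Order tags by the moment they first appeared.
    timeline = sorted(first_used.items(), key=lambda item: datetime.fromisoformat(item[1]))
    return timeline, dict(counts), len(users)
```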

Topic revision: r8 - 15 Jan 2010, ErikBorra
This site is powered by Foswiki. Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.