Googlescraper (Search Engine Scraper)
Batch queries Google. Queries the resonance of a particular term, or a series of terms, in a set of Websites.
Instructions
The Google Scraper has been deprecated and replaced by two new tools:
- The Search Engine Scraper, if you want to scrape and analyze overall search results for a given query or set of queries.
- The Lippmannian Device, if you want to analyze query results on a per-site basis.
Overview
The Googlescraper (also known as the Lippmannian Device) queries Google and makes the results available for further analysis. In the top text box, place the source set, in this case a list of URLs. In the bottom text box, place the keywords. Google is then asked whether each keyword occurs in each URL. Results are displayed as a tag cloud and an HTML table. They are also written to a text file, which you can access at the bottom of the page or through previous results.
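The pairing of every source with every keyword can be sketched as follows. This is a minimal illustration, not the tool's own code: it assumes that each (URL, keyword) pair becomes one Google query restricted to the URL's host with the `site:` operator, and the names `host_of` and `build_queries` are hypothetical. It only builds the query strings and does not contact Google.

```python
from itertools import product
from urllib.parse import urlparse


def host_of(url: str) -> str:
    """Return the lower-cased host of a URL, tolerating input without a scheme."""
    if "://" not in url:
        url = "http://" + url
    return urlparse(url).netloc.lower()


def build_queries(urls: list[str], keywords: list[str]) -> list[tuple[str, str, str]]:
    """Pair every source with every keyword and build one query per pair.

    Assumption: the query form 'site:<host> "<keyword>"' is illustrative;
    the real tool's query syntax may differ.
    """
    queries = []
    for url, keyword in product(urls, keywords):
        host = host_of(url)
        queries.append((host, keyword, f'site:{host} "{keyword}"'))
    return queries


if __name__ == "__main__":
    sources = ["http://www.example.org/", "example.net"]
    terms = ["climate", "Venter"]
    for host, keyword, query in build_queries(sources, terms):
        print(host, keyword, query, sep="\t")
```

With two sources and two keywords this yields four queries, one per cell of the resulting host-by-keyword table.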
Harvester feature: In the top box, you may also place a combination of URLs and text; the URLs will be extracted from the text and queried for the keywords placed in the bottom box. Detailed usage instructions and use cases are available.
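A rough sketch of the harvester step, assuming only `http://` and `https://` URLs are picked out of the pasted text. The regular expression and the `harvest_urls` name are illustrative assumptions; the actual harvester may also recognise other forms, such as bare domain names.

```python
import re

# Hypothetical pattern: an http(s) URL runs until whitespace, a quote or an angle bracket.
URL_PATTERN = re.compile(r"""https?://[^\s"'<>]+""")


def harvest_urls(text: str) -> list[str]:
    """Extract URLs from mixed text, dropping duplicates but keeping first-seen order."""
    seen: set[str] = set()
    urls: list[str] = []
    for match in URL_PATTERN.finditer(text):
        # Trim sentence punctuation that often trails a URL in running text.
        url = match.group(0).rstrip(".,;:)")
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

Pasting page source such as `<a href="http://example.org/page">` works as well, because the quote character ends the match.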
To merge divided scrapes from the same project together, use the Lippmannian Merge tool.
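One way to picture merging divided scrapes, assuming (hypothetically) that each partial scrape is a mapping from (host, keyword) to a result count. The Lippmannian Merge tool's real file format is not described on this page, so `merge_scrapes` and `as_table` are illustrative only.

```python
# Hypothetical result shape: (host, keyword) -> number of results.
Result = dict[tuple[str, str], int]


def merge_scrapes(*scrapes: Result) -> Result:
    """Combine partial scrapes into one result set.

    If the same (host, keyword) pair occurs in more than one partial scrape,
    the later value overwrites the earlier one instead of being summed.
    """
    merged: Result = {}
    for scrape in scrapes:
        merged.update(scrape)
    return merged


def as_table(results: Result) -> list[list[str]]:
    """Lay merged results out as rows: one per host, one column per keyword."""
    hosts = sorted({host for host, _ in results})
    keywords = sorted({keyword for _, keyword in results})
    rows = [["host", *keywords]]
    for host in hosts:
        # Pairs missing from every partial scrape are left as empty cells.
        rows.append([host, *(str(results.get((host, k), "")) for k in keywords)])
    return rows
```

Keeping the later value rather than summing reflects the reasoning that a repeated pair is the same query run twice, not additional evidence.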
Sample project
The Googlescraper can be used for a number of specific research projects, including censorship research and source distance research. The most common use of the tool is researching the presence, as well as the ranking, of particular sources within Google search results. A sample project is this tag cloud, which visually presents the unique hosts among the top 100 URLs returned for the query "synthetic biology." The hosts are sized by the number of occurrences of "Venter" on each site. The method for this project: 1) Search Google for "synthetic biology". 2) Paste the top 100 results into the top box, and enter Venter in the bottom box.
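The two computational steps of the sample project, reducing the top 100 URLs to unique hosts and sizing each host by its keyword count, might look like this. The linear scaling between `min_px` and `max_px` is an assumption for illustration, as are the function names; the tool's own tag-cloud sizing is not documented on this page, and the per-host counts would come from the Google queries themselves.

```python
from urllib.parse import urlparse


def unique_hosts(urls: list[str]) -> list[str]:
    """Reduce a ranked list of result URLs to unique hosts, keeping rank order."""
    hosts: list[str] = []
    for url in urls:
        host = urlparse(url).netloc.lower()
        if host and host not in hosts:
            hosts.append(host)
    return hosts


def font_sizes(counts: dict[str, int], min_px: int = 10, max_px: int = 40) -> dict[str, int]:
    """Map each host's keyword count linearly onto a font-size range in pixels.

    Hypothetical scaling: the smallest count gets min_px, the largest max_px.
    """
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = (high - low) or 1  # all counts equal: avoid dividing by zero
    return {
        host: round(min_px + (n - low) * (max_px - min_px) / span)
        for host, n in counts.items()
    }
```

For example, `font_sizes({"a.example": 0, "b.example": 10, "c.example": 5})` returns `{"a.example": 10, "b.example": 40, "c.example": 25}`.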
Tool Search Engine ScraperMain.KoenMartens 05 Dec 2008