This page is a work-log and should be updated into a proper research page. Slides presenting this work can be found at the Digital Methods Blog, starting at slide 101.
Mathieu Jacomy, Matthieu Renault, Erik Borra
How does related search work? Is it useful for query design and research? If so, how is it useful?
In 2009 Google launched an upgraded related searches feature, and so far there has been little research into this promising area of query design. According to Google, related search was introduced to 'refine' queries. Research on query logs has shown that users generally search using one or two words. With so few words, it is often difficult to obtain much variation in the results. For this reason, the information space for a specific query eventually tends to get clogged, meaning there is no room for new entries among the top results. Our work looks critically at the related search feature as a possible site for digital methods research. In this project we examined the algorithmic framework of related search and then applied it in an issue space analysis case study.
Although Google has not released documentation on related search, it is clear that the results utilize the algorithm originally developed for Orion, the engine Google acquired in 2006. Orion "finds pages where the content is about a topic strongly related to the key word. It then returns a section of the page, and lists other topics related to the key word so the user can pick the most relevant."
What Orion does is an automated type of content or concordance analysis. It creates an index of main words from a set of pages along with their immediate contexts. Google's related search derives from Orion's automated research. If one searches "principles of physics," for example, Orion's algorithms will produce the results "angular momentum", "special relativity", "big bang" and "quantum mechanics". Although it is unclear whether the Google-derived related search is solely keyword/content analysis, it does derive from an attempt to 'interpret' the content. This, we hypothesize, will provide a rich area of research for social science digital methods based query design.
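To make the idea of a concordance-style index concrete, here is a minimal sketch in Python. It is not Orion's or Google's algorithm, neither of which is documented. It only illustrates indexing main words together with their immediate contexts and reading off co-occurring terms. The function names, the stopword list and the window size are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, very small stopword list; a real analysis would use a fuller one.
STOPWORDS = frozenset({"the", "of", "and", "a", "an", "in", "to", "is"})


def build_context_index(pages, window=2):
    """Map each main (non-stopword) word to the word windows around its occurrences."""
    index = defaultdict(list)
    for text in pages:
        tokens = re.findall(r"[a-z0-9']+", text.lower())
        for i, token in enumerate(tokens):
            if token in STOPWORDS:
                continue
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            index[token].append((left, right))
    return index


def related_terms(index, keyword, top=5):
    """Return the words that most often appear in the contexts of `keyword`."""
    counts = Counter()
    for left, right in index.get(keyword, []):
        counts.update(w for w in left + right if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(top)]


pages = [
    "Angular momentum is a principle of physics.",
    "Special relativity changed physics.",
]
index = build_context_index(pages)
candidates = related_terms(index, "physics")
```

In this toy version the "related" terms are simply the neighbours of the keyword. Whatever interpretation of content the real feature performs goes beyond what can be inferred from the outside.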
- Create a new tool that scrapes Google's related searches for a query and then scrapes the related searches of those related searches, up to a user-specified depth.
- Visualize as a directed network with Gephi (http://gephi.org), where related searches link to the queries they relate to.
- Compare manual and automatic related searches scraping to analyze the algorithm behind related search.
- Tested the 'query design' tool utilizing womenonweb.org and the starting term "medical abortion":
  - Scraped the related search Google results for the term "medical abortion" and visualized using Gephi.
  - Captured all key terms as identified by the related search query and crawled those issues on the womenonweb.org site.
  - Identified key terms where womenonweb.org was in the top 100 Google results.
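The steps listed above can be sketched roughly as follows. This is an illustrative outline, not the tool built in the project. The actual scraping is left out, so `fetch_related` and `fetch_result_urls` are hypothetical callables that the caller must supply (for example, wrappers around a scraper). The `Source`/`Target` header is the column naming Gephi recognizes when importing a CSV edge list.

```python
import csv
import io
from collections import deque
from urllib.parse import urlparse


def crawl_related(seed, fetch_related, max_depth=2):
    """Breadth-first crawl of related searches up to max_depth.

    Returns directed edges (related_search, query): each related search
    links to the query it was suggested for.
    """
    seen = {seed}
    edges = []
    queue = deque([(seed, 0)])
    while queue:
        query, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand queries at the depth limit
        for term in fetch_related(query):
            edges.append((term, query))
            if term not in seen:
                seen.add(term)
                queue.append((term, depth + 1))
    return edges


def write_edge_csv(edges, handle):
    """Write edges as a Source,Target CSV edge list for import into Gephi."""
    writer = csv.writer(handle)
    writer.writerow(["Source", "Target"])
    writer.writerows(edges)


def terms_where_site_ranks(terms, fetch_result_urls, domain, top_n=100):
    """Keep the terms for which `domain` appears among the top_n result URLs."""
    hits = []
    for term in terms:
        for url in fetch_result_urls(term)[:top_n]:
            host = urlparse(url).netloc.lower()
            if host == domain or host.endswith("." + domain):
                hits.append(term)
                break
    return hits


# Stand-in data instead of live scraping (purely illustrative).
fake_related = {
    "medical abortion": ["abortion pill", "misoprostol"],
    "abortion pill": ["misoprostol", "medical abortion"],
    "misoprostol": ["cytotec"],
}
edges = crawl_related("medical abortion", lambda q: fake_related.get(q, []), max_depth=2)
buffer = io.StringIO()
write_edge_csv(edges, buffer)
```

Passing the fetch functions in as parameters keeps the crawl logic testable without network access, and lets a manual collection of related searches be swapped in for comparison with the automatic one.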