Introduction
Tagging is used by sites to describe and ultimately organize items by theme or keyword. Individual items such as posts, blogs, images, links and videos can be tagged with one or more named values, which groups the item under one or several descriptions. When searching for a specific tag, the items related to this tag are shown. Additionally, the other tags these items have received may be shown. Several applications provide the user with a list of the tags most used in combination with the current tag; these are the so-called related tags. Finally, many websites provide an overview of the tags on a site, in which the tags used most are larger in size than those used less frequently. This is the famed tag cloud.
The following are notes on tools and methods for exploring popular and related tags on various devices, including del.icio.us, as well as references to related literature.
Research
In addition to helping manage and organize data, tags also generate spaces. Clouds of 'Most Popular Tags' generate a space of topics or issues which are current or important according to the tagosphere. Some popular tags are obviously used frequently throughout the tagosphere or within the tagspace of a specific site or community, but when an issue or event arises it will affect the tags in use, as the items related to the event are tagged accordingly. As important aspects of the contemporary web such as images and videos are not easily searchable by their content, tags provide a good way of searching them and, as a result, provide an issuespace of tags about current events.
Most popular tags surface according to user input and therefore do not directly say something about the tag itself. Because items carry several tags, and because specific websites generate a related tags list, individual tags seem to be located within an issuespace of their own. As related tags are more closely tied to the tag than to the user, looking at these related tags and the space they occupy can provide information about the tag itself.
These two spaces of the tag, the most popular and the related tags, will be examined by creating tools which can extract the tags, clouds and related tags across several devices. The aims of this research and the tools are to:
- Surface and explore the distinctiveness of the tag
- Display what issues are present on the web according to the tagosphere
- Define problems and possibilities of 'tag merging' across devices and spheres
- Provide means to collect tag data and present it
As most of the websites in use on the web today provide a tagcloud with the most popular tags, this is a good starting point for collecting information about current issues and about the possibilities and problems of tagmerging.
After analyzing popular websites, five websites that seem to be at the heart of either tagging or user contribution have been selected for inclusion. They are:
- flickr
- del.icio.us
- yahoo video
- youtube
- technorati
The tool
poptag.rest has been created to extract the following information from these websites:
- tagname, the actual tag
- taglink, the URL the tag links to
- weight, the specific weight of the tag within that device
- channel, the specific device the information is gathered from
An example of the output is located
here.
Using PHP and the DOM, a script has been written to make all the weights generic, with values from 1 to 5, and to display per tag whether it is also located in one of the other devices. This script is now in its final stage. Output of the script in array format can be viewed
here.
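The weight normalization described above can be sketched roughly as follows. This is an illustration in Python rather than the PHP script itself, and it assumes a linear min-max rescaling per channel; the notes only state that weights are made generic from 1 to 5, so the scaling rule, the function names and the example.org links are all assumptions.

```python
from collections import defaultdict


def normalize_weights(records, low=1, high=5):
    """Rescale raw weights to integers in [low, high], separately per channel.

    Assumption: linear min-max scaling; the original script may use another rule.
    """
    by_channel = defaultdict(list)
    for rec in records:
        by_channel[rec["channel"]].append(rec["weight"])

    result = []
    for rec in records:
        weights = by_channel[rec["channel"]]
        w_min, w_max = min(weights), max(weights)
        if w_max == w_min:
            scaled = low  # every tag in this channel has the same raw weight
        else:
            fraction = (rec["weight"] - w_min) / (w_max - w_min)
            scaled = low + round(fraction * (high - low))
        result.append({**rec, "weight": scaled})
    return result


def channels_per_tag(records):
    """Map each lower-cased tag name to the set of channels it occurs in."""
    channels = defaultdict(set)
    for rec in records:
        channels[rec["tagname"].lower()].add(rec["channel"])
    return channels


# Hypothetical sample records using the four fields listed above.
sample = [
    {"tagname": "wedding", "taglink": "http://example.org/f/wedding", "weight": 10, "channel": "flickr"},
    {"tagname": "music", "taglink": "http://example.org/f/music", "weight": 20, "channel": "flickr"},
    {"tagname": "travel", "taglink": "http://example.org/f/travel", "weight": 30, "channel": "flickr"},
    {"tagname": "Music", "taglink": "http://example.org/d/music", "weight": 100, "channel": "del.icio.us"},
    {"tagname": "design", "taglink": "http://example.org/d/design", "weight": 500, "channel": "del.icio.us"},
]

normalized = normalize_weights(sample)
presence = channels_per_tag(sample)
```

Scaling per channel keeps a device with large raw numbers from dominating the others, at the cost of hiding absolute differences between devices.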
The end result will be a tagcloud of most popular tags across five devices. The cloud needs to represent:
- The tags and their weights per device
- A weight according to their occurrence in devices (1-5)
- Separate colors per devices
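One possible data structure behind such a cloud is sketched below. It assumes that the weights have already been made generic (1 to 5) and that tag names are matched case-insensitively; the color palette and the function names are hypothetical.

```python
# Hypothetical palette: one color per device.
DEVICE_COLORS = {
    "flickr": "#ff0084",
    "del.icio.us": "#3274d0",
    "yahoo video": "#7b0099",
    "youtube": "#cc181e",
    "technorati": "#2a9d2a",
}


def merge_cloud(records):
    """Group records per tag, keeping the weight per device and the
    number of devices the tag occurs in (1-5 for five devices)."""
    cloud = {}
    for rec in records:
        entry = cloud.setdefault(rec["tagname"].lower(), {"weights": {}, "occurrence": 0})
        entry["weights"][rec["channel"]] = rec["weight"]
        entry["occurrence"] = len(entry["weights"])
    return cloud


def cloud_rows(cloud):
    """Flatten the cloud into (tag, occurrence, device, weight, color) rows,
    sorted so that tags found in the most devices come first."""
    rows = []
    for tag in sorted(cloud, key=lambda t: (-cloud[t]["occurrence"], t)):
        for device, weight in sorted(cloud[tag]["weights"].items()):
            rows.append((tag, cloud[tag]["occurrence"], device, weight,
                         DEVICE_COLORS.get(device, "#000000")))
    return rows


records = [
    {"tagname": "music", "weight": 5, "channel": "youtube"},
    {"tagname": "Music", "weight": 3, "channel": "flickr"},
    {"tagname": "wedding", "weight": 4, "channel": "flickr"},
]
cloud = merge_cloud(records)
rows = cloud_rows(cloud)
```

Each row carries the three dimensions listed above, so a renderer could map weight to font size, occurrence to emphasis and device to color.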
In relation to using tagclouds as a way of representing data, it seems our current tools are not sufficient to handle multidimensional research such as tagmerging. A new tagcloud generator needs to be created.
To examine the related tags there are two approaches. The first is to use/scrape the device-generated 'related tags' list. The great advantage of this is that not every item carrying a specific tag has to be scraped for its other tags; this is done by the device itself. The WeScrape method can be used to scrape the necessary data. This raises the problem of which rules the device has applied to generate the list. This approach only works for specific websites, as a lot of websites do not provide this list, and is thus not very applicable in cross-spherical analysis.
The second approach is to search for a tag and then scrape each and every result (or a decided subset, such as the first 100 results), and generate the related tags list by hand. This approach provides much more freedom and insight into the gathered information, but requires quite a bit of programming and cannot really be done by a WeScrape method. There is already such a tool, called the delicious related tags tool. Each related tags tool will need to be device specific.
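Generating the related tags list by hand comes down to counting co-occurrences. The sketch below assumes the scraping has already produced one list of tags per result; it is not the code of the delicious related tags tool, and the function name and sample data are hypothetical.

```python
from collections import Counter


def related_tags(items, tag, limit=100):
    """Count which tags co-occur with `tag` in the first `limit` items.

    `items` is a list of tag lists, one per scraped result.
    Returns (tag, count) pairs, most frequent first.
    """
    tag = tag.lower()
    counts = Counter()
    for tags in items[:limit]:
        normalized = {t.lower() for t in tags}
        if tag in normalized:
            counts.update(normalized - {tag})
    return counts.most_common()


# Hypothetical scraped results for a search on "web2.0".
items = [
    ["web2.0", "ajax", "javascript"],
    ["Web2.0", "tagging", "folksonomy"],
    ["ajax", "javascript"],
    ["web2.0", "ajax"],
]
related = related_tags(items, "web2.0")
```

Because the counting rule is explicit, the same rule can be applied to every device, which is what the device-generated lists do not guarantee.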
By gathering the related tags of a tag one can view the space this tag is located in. When, for instance, searching for a name or issue, the issue space surrounding this person or event on the web can be made visible. By trying to tagmerge the related tags for a specific tag, the goal is to look at the tag and its space at a more generic or "global" level. This then raises the question whether or not tags from different devices could ever be merged, as tagging seems to be in many ways device specific. Although in many ways this seems to be the case, the assumption is that the more data is aggregated from different devices, the more this problem will be filtered out, as the occurrences in different devices will affect their priority.
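The assumption that aggregation filters out device-specific tags can be illustrated with a small ranking sketch. It assumes a related-tag count per device is already available and ranks first by the number of devices a tag occurs in, then by total count; this ordering rule, the function name and the sample numbers are all hypothetical.

```python
from collections import defaultdict


def merge_related(per_device):
    """Rank related tags by the number of devices they occur in,
    then by their summed count, then alphabetically."""
    devices = defaultdict(set)
    totals = defaultdict(int)
    for device, counts in per_device.items():
        for tag, count in counts.items():
            key = tag.lower()
            devices[key].add(device)
            totals[key] += count
    return sorted(devices, key=lambda t: (-len(devices[t]), -totals[t], t))


# Hypothetical related-tag counts for one seed tag on three devices.
per_device = {
    "del.icio.us": {"ajax": 40, "toread": 90},
    "flickr": {"ajax": 5, "sunset": 60},
    "technorati": {"Ajax": 12, "sunset": 3},
}
ranking = merge_related(per_device)
```

In this sample "toread" has the highest single count but occurs on one device only, so it ends up last, while "ajax" rises to the top because all three devices share it.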
Some steps have been undertaken to use WeScrape, but using a tool to physically loop through the pages of a website proved too big a load.
An attempt has been made to use WeScrape to retrieve the 'related tag' information from technorati and go two levels deep. By crawling through the technorati related tags, the idea was to generate the tag space from this. Before pursuing this further, however, its relevance and value need to be reconsidered.
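A crawl that goes two levels deep can be sketched as a depth-limited breadth-first traversal. The sketch does not use WeScrape or technorati; `fetch_related` is a hypothetical callable standing in for whatever scraper returns the related tags of a tag, and here it is backed by a small stub dictionary.

```python
from collections import deque


def tag_space(seed, fetch_related, max_depth=2):
    """Collect the tags reachable from `seed` within `max_depth` steps.

    Returns the depth at which each tag was first found and the list of
    (tag, related tag) links that were followed.
    """
    depth_of = {seed: 0}
    edges = []
    queue = deque([seed])
    while queue:
        tag = queue.popleft()
        if depth_of[tag] >= max_depth:
            continue  # do not expand tags on the outermost level
        for neighbour in fetch_related(tag):
            edges.append((tag, neighbour))
            if neighbour not in depth_of:
                depth_of[neighbour] = depth_of[tag] + 1
                queue.append(neighbour)
    return depth_of, edges


# Stub data in place of a real scraper (hypothetical values).
STUB = {
    "web2.0": ["ajax", "tagging"],
    "ajax": ["javascript", "web2.0"],
    "tagging": ["folksonomy"],
    "javascript": ["dom"],
}


def fetch_from_stub(tag):
    return STUB.get(tag, [])


depth_of, edges = tag_space("web2.0", fetch_from_stub)
```

Keeping track of visited tags means each tag is requested only once, which limits the load on the device being crawled.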
Research and Literature
Cramer, Florian. 2007. Semantic Web, lecture delivered at the Quaero Forum, Jan van Eyck Academy, Maastricht, 29 September,
http://www.nettime.org/Lists-Archives/nettime-l-0712/msg00043.html
Shirky, Clay. 2005. Ontology is Overrated: Categories, Links, and Tags.
http://www.shirky.com/writings/ontology_overrated.html
Simons, Jan. 2008. 'Another Take on Tags? What Tags Tell,' in Geert Lovink and Sabine Niederer (eds.) Video Vortex Reader: Responses to YouTube. Amsterdam, Institute of Network Cultures: 239-254.