ToolIssueDiscoveryHowTo

For each document - whether it be a page from an Issue Crawler network or text submitted by the user - the Issue Discovery Tool does the following:

Make a phrase list of noun phrases and capitalized sequences (resulting in a list of proper nouns, acronyms, ...).
Add to the phrase list the significant words or phrases that the Yahoo Term Extraction Web Service extracts from a larger source set of content.
The resulting phrase list is then adjusted as follows:

Lowercase all phrases in the list (for easy comparison).
Remove phrases that have a length less than 3.
Weight each phrase found in the previous steps as follows: count the number of times the phrase appears in the document. If the phrase comes from Yahoo, add 1 to that count (this favors Yahoo's presumed robustness). If the phrase does not come from Yahoo but consists of multiple terms, add 2 to that count (this assumes a preference for multi-term phrases over single terms when they did not come from Yahoo).
Remove phrases that are on the stop word list.
Remove phrases that are also part of a longer phrase in the list.
Sum the weights of each phrase across all documents into one combined list.
Rank the list by summed weight.
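
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool's actual code: the function names (normalize, count_occurrences, weigh_document, rank_phrases) and the STOP_WORDS set are hypothetical, "length less than 3" is read as a character count, "part of a longer phrase" is read as whole-word containment, and the noun-phrase and Yahoo results are passed in as plain lists rather than obtained from any extractor or web service.

```python
import re
from collections import Counter

# Hypothetical stop word list; the tool's real list is not given in the text.
STOP_WORDS = {"the", "and", "for", "with", "that", "this"}


def normalize(text):
    """Lowercase a string and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def count_occurrences(phrase, text):
    """Count whole-word occurrences of a normalized phrase in a text."""
    pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
    return len(re.findall(pattern, normalize(text)))


def weigh_document(text, extracted_phrases, yahoo_phrases):
    """Return {phrase: weight} for one document, following the listed steps."""
    yahoo = {normalize(p) for p in yahoo_phrases}
    candidates = {normalize(p) for p in extracted_phrases} | yahoo
    # Assumption: "length less than 3" means fewer than 3 characters.
    candidates = {p for p in candidates if len(p) >= 3}

    weights = {}
    for phrase in candidates:
        weight = count_occurrences(phrase, text)
        if phrase in yahoo:
            weight += 1  # bonus for phrases supplied by Yahoo
        elif " " in phrase:
            weight += 2  # bonus for multi-term phrases not from Yahoo
        weights[phrase] = weight

    # Remove phrases that are on the stop word list.
    weights = {p: w for p, w in weights.items() if p not in STOP_WORDS}

    # Remove phrases that occur, as whole words, inside a longer listed phrase.
    return {
        p: w
        for p, w in weights.items()
        if not any(p != q and f" {p} " in f" {q} " for q in weights)
    }


def rank_phrases(documents):
    """Sum weights over (text, extracted, yahoo) triples and rank them."""
    totals = Counter()
    for text, extracted, yahoo in documents:
        totals.update(weigh_document(text, extracted, yahoo))
    return totals.most_common()
```

Calling rank_phrases with one (text, extracted phrases, Yahoo phrases) triple per document returns (phrase, summed weight) pairs, highest weight first. The fixed bonuses of 1 and 2 are taken directly from the description above.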

The Issue Discovery Tool is not designed to 'give proper weight' to items. It is a heuristic, a tool for data exploration rather than for empirical measurement.

Topic revision: r3 - 01 Dec 2009, ErikBorra
