This page is a draft.
The 'raw data' format file is a database dump of a particular network, with some uninteresting fields left out. All fields are written comma-separated to a text file. This page describes every heading and field in that file.
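As a minimal sketch of reading such comma-separated data, the snippet below parses one block of rows with Python's standard `csv` module. It assumes that a block starts with a header row naming the fields; the page does not state how the file separates its headings from the data rows, so adjust accordingly. The sample uses the im_site field names, and its values are made up for illustration.

```python
import csv
import io

# Hypothetical excerpt: a header row with the im_site field names, then one
# row per site. All values are invented for illustration.
SAMPLE = """id,url,host,name,category,authority,knowledge,in_network
1,http://example.org/,example.org,"Example, Inc.",org,3,2,1
2,http://example.com/,example.com,Example Shop,com,0,1,0
"""


def read_rows(text):
    """Parse comma-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_rows(SAMPLE)
for row in rows:
    print(row["id"], row["host"], row["name"])
```

Using `csv` rather than splitting on commas keeps quoted values that themselves contain a comma (such as an organisation name) in a single field. Every value comes back as a string, so numeric fields need an explicit `int(...)`.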
im_network
Provides a description of the network
Field | Description |
id | The network id |
schedule_id | Id of the schedule which generated this map |
schedule_index | Chronological position of the network within the series (0 = the initial map, not actually produced by the scheduler) |
crawl_queued | Time at which the request to crawl this network was sent |
crawl_start | Time at which the crawl of this network started |
crawl_end | Time at which the crawl of this network finished |
crawl_timeouts | Number of timeouts during the crawl |
page_downloads | Number of pages downloaded during the crawl |
excluded_pages | Number of pages excluded during the crawl |
num_starting_points | Total number of starting points |
starting_point_privilege | Whether starting points are privileged. Currently expected values are 0 (do not privilege starting points) or 2 (privilege starting points) |
iterations | Number of iterations of the algorithm. Expected values are 1, 2 or 3 |
depth | Depth to which each site is crawled |
co_link_analysis | Type of co-link analysis: 1 = by site; 2 = by page |
exclusion_list | List of sites to exclude, in XML |
title | Title of the network |
minimum_diversity | Minimum number of domain categories the network must contain |
required_authority | Number of inward links a node must receive to be included in the network |
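Several im_network fields hold numeric codes. The sketch below turns them into readable labels, using only the values listed in the table above. The function name `describe_network` is hypothetical, and the row is assumed to be a dict of strings keyed by field name, as `csv.DictReader` would produce.

```python
# Code-to-label mappings taken from the field descriptions on this page.
STARTING_POINT_PRIVILEGE = {
    0: "do not privilege starting points",
    2: "privilege starting points",
}
CO_LINK_ANALYSIS = {1: "by site", 2: "by page"}


def describe_network(row):
    """Summarise the coded fields of one im_network row (hypothetical helper).

    `row` maps field names to string values.
    """
    privilege = int(row["starting_point_privilege"])
    co_link = int(row["co_link_analysis"])
    return {
        "starting_point_privilege": STARTING_POINT_PRIVILEGE.get(
            privilege, f"unexpected value {privilege}"
        ),
        "co_link_analysis": CO_LINK_ANALYSIS.get(
            co_link, f"unexpected value {co_link}"
        ),
        # The page lists 1, 2 or 3 as the expected iteration counts.
        "iterations_expected": int(row["iterations"]) in (1, 2, 3),
        # schedule_index 0 marks the initial map, not produced by the scheduler.
        "is_initial_map": int(row["schedule_index"]) == 0,
    }


# Made-up row for illustration.
example = {
    "schedule_index": "0",
    "starting_point_privilege": "2",
    "iterations": "3",
    "co_link_analysis": "1",
}
print(describe_network(example))
```

Unknown codes are reported rather than rejected, because the page only says which values are "currently expected".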
im_site
Provides a description of the host.
Column | Description |
id | The site id (referred to as site_id in im_page) |
url | URL to link to when the map is rendered, usually the homepage |
host | The host of the URL |
name | Name of the website or organisation |
category | Category of the site, e.g. gov/com/org, international/national |
authority | Number of inward links the site receives from the network |
knowledge | Number of links from this site to other sites in the network |
in_network | 1 = in the network; 0 = an external site (not in the network, but part of the set of nodes which generates the network) |
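The in_network flag separates the sites that make up the network from the external sites that only helped generate it. A small sketch of that split, with the network sites ranked by authority, could look like this. The helper name `split_sites` is hypothetical, and rows are assumed to be dicts of strings keyed by the column names above.

```python
def split_sites(rows):
    """Split im_site rows into network sites and external sites.

    Network sites (in_network = 1) are returned sorted by authority,
    highest first; external sites (in_network = 0) keep their file order.
    """
    in_network = [r for r in rows if int(r["in_network"]) == 1]
    external = [r for r in rows if int(r["in_network"]) == 0]
    in_network.sort(key=lambda r: int(r["authority"]), reverse=True)
    return in_network, external


# Made-up rows for illustration; only the columns used here are filled in.
sites = [
    {"id": "1", "name": "Site A", "authority": "2", "in_network": "1"},
    {"id": "2", "name": "Site B", "authority": "0", "in_network": "0"},
    {"id": "3", "name": "Site C", "authority": "5", "in_network": "1"},
]
network_sites, external_sites = split_sites(sites)
print([s["name"] for s in network_sites])
print([s["name"] for s in external_sites])
```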
im_page
Provides a description of a deep-link (aka page).
Column | Description |
id | The page id (page_id) |
site_id | The id of the site/host this page belongs to; refers to im_site |
url | The full URL |
date_stamp | The date on which this link was retrieved |
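Because im_page.site_id refers to im_site.id, pages can be grouped under the site they belong to. The following is a minimal sketch of that grouping. The helper name `pages_by_site` is hypothetical, and rows are again assumed to be dicts of strings keyed by the column names above.

```python
from collections import defaultdict


def pages_by_site(pages):
    """Group page URLs by the site_id they refer to (im_page -> im_site)."""
    grouped = defaultdict(list)
    for page in pages:
        grouped[page["site_id"]].append(page["url"])
    return dict(grouped)


# Made-up rows for illustration; date_stamp is omitted because its format
# is not specified on this page.
pages = [
    {"id": "10", "site_id": "1", "url": "http://example.org/a"},
    {"id": "11", "site_id": "1", "url": "http://example.org/b"},
    {"id": "12", "site_id": "2", "url": "http://example.com/x"},
]
grouped = pages_by_site(pages)
for site_id, urls in grouped.items():
    print(site_id, urls)
```

The keys are compared as strings here, so they match the id values of im_site rows read the same way without any conversion.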
im_link
Provides a description of the links between pages.