Enter URLs or text into the harvester and choose the depth of the search (example.com/depth1/depth2/depth3).

Enter URLs into the box. After clicking submit, every unique host among the URLs is checked for a robots.txt file (e.g. http://www.bla.com/bla/bla/index.html leads to a check of http://www.bla.com/robots.txt), and each unique URL is checked for a <meta name="robots" content="bla"> tag. The links found on those URLs are then fetched and the process repeats, up to the specified depth.

In pseudocode:

harvest the URLs
i = 0
while i < depth
    for every URL
        get the host
        if host/robots.txt exists        // only checked when the URL's host differs from the previous one
            display the robots.txt
        else
            say it does not exist
        end if
        if a robots meta tag exists      // checked for every URL
            display the meta tag
        else
            say it does not exist
        end if
        get all the links of the URL
    end for
    i++
end while
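
The same loop, written as a minimal Python sketch. It assumes the third-party requests and beautifulsoup4 packages; the function name harvest, the seed list, and the overall structure are illustrative assumptions, not the tool's actual implementation, and error handling is omitted for brevity.

from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

def harvest(seed_urls, depth):
    urls = list(seed_urls)
    robots_cache = {}                       # robots.txt text per host, None if missing

    for _ in range(depth):
        next_urls = []
        for url in urls:
            parts = urlparse(url)
            host = parts.scheme + "://" + parts.netloc

            # robots.txt is fetched once per unique host
            if host not in robots_cache:
                resp = requests.get(host + "/robots.txt", timeout=10)
                robots_cache[host] = resp.text if resp.ok else None
            if robots_cache[host] is not None:
                print(host, "robots.txt found")
            else:
                print(host, "has no robots.txt")

            # the robots meta tag is checked for every URL
            page = requests.get(url, timeout=10)
            soup = BeautifulSoup(page.text, "html.parser")
            meta = soup.find("meta", attrs={"name": "robots"})
            if meta is not None:
                print(url, "meta robots:", meta.get("content"))
            else:
                print(url, "has no robots meta tag")

            # collect the page's links for the next round
            for a in soup.find_all("a", href=True):
                next_urls.append(urljoin(url, a["href"]))

        urls = next_urls

For example, harvest(["http://www.bla.com/bla/bla/index.html"], depth=2) would report the robots.txt and meta tag status for the seed page and then for every page it links to.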

Note: not all frames are supported. For more information about the robots exclusion protocol, see http://en.wikipedia.org/wiki/Robots.txt