Input URLs or text into the harvester and choose the depth of search.

Enter URLs into the box. After you click submit, every unique host among the URLs is checked for a robots.txt file (i.e. host/robots.txt is requested), and each unique URL is checked for a robots meta tag such as <meta name="robots" content="bla">. The links on those pages are then fetched, and the process repeats up to the specified depth.
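A minimal Python sketch of these two checks, assuming plain HTTP fetches via urllib; the function and class names are illustrative, not the tool's actual code:

```python
from html.parser import HTMLParser
from urllib.error import URLError
from urllib.parse import urlsplit
from urllib.request import urlopen

def check_robots_txt(url):
    """Fetch robots.txt for the URL's host; return its text, or None if absent."""
    host = urlsplit(url).netloc
    try:
        with urlopen(f"http://{host}/robots.txt", timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except URLError:
        return None

class MetaRobotsParser(HTMLParser):
    """Collect the content of any <meta name="robots" ...> tags in a page."""
    def __init__(self):
        super().__init__()
        self.robots = []

    def handle_starttag(self, tag, attrs):
        d = dict(attrs)
        if tag == "meta" and d.get("name", "").lower() == "robots":
            self.robots.append(d.get("content", ""))

def check_meta_robots(html):
    """Return the content values of all robots meta tags in the given HTML."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return parser.robots
```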

In pseudocode:

    harvest the urls
    while i < depth
        for every url
            get the host
            if host/robots.txt exists    // only checked if the url's host differs from the previous one
                display robots.txt
            else
                say it didn't exist
            end if
            if robots meta tag exists    // checked for every url
                display meta tag
            else
                say it didn't exist
            end if
            get all the links of the url
        end for
        i = i + 1
    end while
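The loop above could be sketched in Python as follows; `fetch(url)` is a hypothetical helper supplied by the caller that returns a page's HTML, and the robots.txt and meta-tag display steps are left as comments since they are covered by the description above:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

class LinkParser(HTMLParser):
    """Collect href targets of <a> tags, resolved against a base URL."""
    def __init__(self, base):
        super().__init__()
        self.base = base
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base, value))

def harvest(urls, depth, fetch):
    """Depth-limited harvest following the pseudocode.

    robots.txt is only re-checked when a URL's host differs from the
    previous URL's host, as the pseudocode notes.
    """
    previous_host = None
    for _ in range(depth):                 # while i < depth
        next_urls = []
        for url in urls:                   # for every url
            host = urlsplit(url).netloc    # get the host
            if host != previous_host:
                previous_host = host
                # here: fetch and display http://host/robots.txt,
                # or report that it does not exist
            html = fetch(url)
            # here: display the page's robots meta tag if one exists
            parser = LinkParser(url)       # get all the links of the url
            parser.feed(html)
            next_urls.extend(parser.links)
        urls = next_urls                   # next level of the crawl
    return urls
```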

Note: Not all frames are supported. For more information about robot exclusion protocols, please see
Topic revision: r3 - 21 Dec 2008, RichardRogers