Enter URLs or text into the harvester and choose the depth of the search (example.com/depth1/depth2/depth3).
In the box you can enter URLs. After you click Submit, every unique host among the URLs is checked for a robots.txt file (e.g.
http://www.bla.com/bla/bla/index.html is checked for
http://www.bla.com/robots.txt), and every unique URL is checked for a <meta name="robots" content="bla"> tag. The links found in those URLs are then fetched and the process repeats until the specified depth is reached.
In pseudocode:
harvest the URLs
while i < depth
    for every URL
        get the host
        if host/robots.txt exists    // only checked when the URL's host differs from the previous one
            display robots.txt
        else
            say it didn't exist
        end if
        if a robots meta tag exists  // checked for every URL
            display the meta tag
        else
            say it didn't exist
        end if
        get all the links of the URL
    end for
    i++
end while
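For illustration, below is a minimal Python sketch of the same loop, using only the standard library. The names (harvest, fetch, LinkAndMetaParser), the timeout, and the error handling are assumptions made for this example and are not part of the harvester itself.

import urllib.error
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkAndMetaParser(HTMLParser):
    """Collects <a href="..."> links and the first <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.robots_meta = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.robots_meta = attrs.get("content") or ""


def fetch(url):
    """Return the body of a URL as text, or None if it cannot be fetched."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, ValueError):
        return None


def harvest(start_urls, depth):
    urls = list(start_urls)
    seen_hosts = set()
    for _ in range(depth):
        next_urls = []
        for url in urls:
            host = "{0.scheme}://{0.netloc}".format(urlparse(url))
            # robots.txt is only fetched once per unique host
            if host not in seen_hosts:
                seen_hosts.add(host)
                robots = fetch(host + "/robots.txt")
                print(host + "/robots.txt:",
                      "found" if robots is not None else "does not exist")
            page = fetch(url)
            if page is None:
                continue
            parser = LinkAndMetaParser()
            parser.feed(page)
            # the robots meta tag is checked for every URL
            if parser.robots_meta is not None:
                print(url, "-> meta robots:", parser.robots_meta)
            else:
                print(url, "-> no robots meta tag")
            next_urls.extend(urljoin(url, link) for link in parser.links)
        urls = next_urls


harvest(["http://www.example.com/"], depth=2)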
Note: Not all frames are supported. For more information about the Robots Exclusion Protocol, please see
http://en.wikipedia.org/wiki/Robots.txt