Display a site's robot exclusion policy.
Note: Not all frames are supported. For more information about the robots exclusion protocol, please see http://en.wikipedia.org/wiki/Robots.txt

Harvest the URLs
while i < depth
    for every url
        get the host
        if host/robots.txt exists   // only checked if the URL's host differs from the previous URL's host
            display robots.txt
        else
            say it didn't exist
        end if
        if robots meta tag exists   // checked for every URL
            display meta tag
        else
            say it didn't exist
        end if
        get all the links of the url
    end for
    i++
end while
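
The loop above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the tool's actual code: the names `PageParser`, `fetch`, and `report` are hypothetical, the links found at one depth level are assumed to become the URLs for the next level, and results are collected in a list rather than displayed directly. A value of `None` stands for "it didn't exist".

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit
from urllib.request import urlopen


class PageParser(HTMLParser):
    """Collects robots meta tag contents and outgoing links from one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.robots_meta = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.robots_meta.append(attrs.get("content") or "")
        elif tag == "a" and attrs.get("href"):
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))


def fetch(url):
    """Return the body of url as text, or None if it cannot be retrieved."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        return None


def report(start_urls, depth, fetcher=fetch):
    """Walk start_urls to the given depth and collect exclusion policies.

    Returns a list of (subject, kind, value) tuples, where value is None
    when no robots.txt or robots meta tag was found.
    """
    findings = []
    urls = list(start_urls)
    previous_host = None
    i = 0
    while i < depth:
        next_urls = []
        for url in urls:
            parts = urlsplit(url)
            host = parts.netloc
            # robots.txt is only requested when the host changes.
            if host != previous_host:
                robots_txt = fetcher(f"{parts.scheme}://{host}/robots.txt")
                findings.append((host, "robots.txt", robots_txt))
                previous_host = host
            # The robots meta tag is checked for every URL.
            page = fetcher(url)
            parser = PageParser(url)
            if page is not None:
                parser.feed(page)
            findings.append((url, "meta", parser.robots_meta or None))
            next_urls.extend(parser.links)
        urls = next_urls
        i += 1
    return findings
```

Passing the fetching function as the `fetcher` parameter is a design choice made for this sketch so that the traversal can be tried against canned pages without network access. Calling `report` with a list of start URLs and a depth uses the `urllib`-based `fetch` by default.
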
Type | Attachment | Size | Date | Who | Comment |
---|---|---|---|---|---|
png | robotstxt.png | 9 K | 12 Dec 2008 - 14:19 | AnneHelmond | Tool icon |
png | white_house_robots.png | 55 K | 12 Dec 2008 - 14:51 | MichaelStevenson | White House Robots |