
Chapter 3: A Tangled Web > DNS Poisoning (Pharming) - Pg. 95

Tools & Traps: I, Robot

Web crawler access can be controlled with a robots.txt file placed at the root of the site's directory tree, since that is the only location where crawlers look for it. The file passes instructions to Web crawlers, or at least to those that comply with the relevant protocols.

Several pages at www.robotstxt.org offer relevant, if somewhat dated, information, including:
www.robotstxt.org/wc/exclusion-admin.html
http://www.robotstxt.org/wc/exclusion-user.html

See www.w3.org/TR/html4/appendix/notes.html#h-B.4.1.1 for a more formal view.

Google has plenty of information on manipulating crawlers (especially its own, of course), including the use of robots.txt and meta tags. Check out the Webmaster Help Center at www.google.com/support/webmasters/.

Unfortunately, compliance is optional, so the use of robots.txt as a means of denying access to pages containing e-mail addresses, for instance, is unlikely to