How can I keep robots off my server?

Programs that automatically traverse the web can be quite useful, but have the potential to make a serious mess of things. Every so often someone will write a "depth-first" searching robot that brings servers to their knees. See the section on writing robots for details.

Fortunately, most robots on the web follow a simple protocol by which you can keep them off your server if you wish, or keep them out of portions of your server which are robot traps (ie, they contain an infinite number of possible links). Read the document World Wide Web Robots, Wanderers and Spiders (URL is <URL:http://web.nexor.co.uk/mak/doc/robots/robots.html> ) and learn about the emerging standards for exclusion of robots from areas in which they are not wanted. You can also read about existing robots there, including useful cataloging robots you probably do not want to keep off your server.


World Wide Web FAQ