ResearchBuzz!
Search Engine News and More Since 1998


November 18, 2002

Database of Web Robots and 'Bots

Sometimes, in the course of reviewing your Web site's access logs, you'll find weird entries. They're not browsers, and they're not search engine spiders that you recognize -- what are they?
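If you want to pull those weird entries out of a log yourself, here is a minimal Python sketch. It assumes the Apache/NCSA "combined" log format, where the user agent is the last quoted field on each line. The `KNOWN_AGENTS` list, the sample log lines, and the agent name `ExampleHarvester` are all made up for illustration; swap in whatever you already recognize on your own site.

```python
import re
from collections import Counter

# Assumption: "combined" log format, so the user agent is the last quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

# Hypothetical substrings for agents you already recognize; adjust for your site.
KNOWN_AGENTS = ("Mozilla", "Opera", "Googlebot")


def unknown_agents(log_lines):
    """Count user-agent strings that contain none of the KNOWN_AGENTS substrings."""
    counts = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue  # line is not in the expected format; skip it
        agent = match.group(1)
        if not any(known in agent for known in KNOWN_AGENTS):
            counts[agent] += 1
    return counts


# Made-up sample lines, not taken from any real log.
sample = [
    '10.0.0.1 - - [18/Nov/2002:10:00:00 -0500] "GET / HTTP/1.0" 200 512 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
    '10.0.0.2 - - [18/Nov/2002:10:00:05 -0500] "GET /robots.txt HTTP/1.0" 404 0 "-" "ExampleHarvester/0.1"',
    '10.0.0.2 - - [18/Nov/2002:10:00:06 -0500] "GET /a.html HTTP/1.0" 200 900 "-" "ExampleHarvester/0.1"',
]

for agent, hits in unknown_agents(sample).most_common():
    print(hits, agent)
```

Keep in mind that plenty of robots put "Mozilla" in their user-agent string, so a substring filter like this is only a first pass. The leftovers are the names worth looking up.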

The robot and spider database at http://joseluis.pellicer.org/ua/ lets you search for unknown user agents or browse by type, including indexing agents, validator agents, and spam harvester agents. A summary page for each type of agent includes a table of information for each robot: the kind of robot it is, whether it's "naughty" or "nice" (which depends on how fast it grabs pages to index), and whether or not it honors robots.txt or the robots meta tag.

Individual pages for each spider provide much of the same information, with occasional comments about the spiders. Once you've finished browsing, check out the tool that lets you generate configuration files, including a portion of an .htaccess file that you cut and paste into your existing .htaccess file, as well as browser and robot inclusion lists. I learned about several spiders here that I'd been wondering about. Worth a look.
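I don't know exactly what the site's generator emits, but to give a rough idea of what an .htaccess fragment for turning away unwanted robots can look like, here is a small Python sketch that builds one. It uses the standard Apache `SetEnvIfNoCase` directive plus the older `Order`/`Allow`/`Deny` access syntax (newer Apache versions use `Require` instead). The function name `htaccess_block`, the environment variable `bad_bot`, and the agent names are all hypothetical.

```python
import re


def htaccess_block(agents, env_var="bad_bot"):
    """Build an .htaccess fragment that denies requests from the given user agents.

    Assumes the agent names contain no double quotes or spaces.
    """
    lines = []
    for agent in agents:
        # Escape regex metacharacters so the name is matched literally.
        pattern = re.escape(agent)
        lines.append(f'SetEnvIfNoCase User-Agent "{pattern}" {env_var}')
    # Older Apache access-control syntax: allow everyone except flagged agents.
    lines.append("Order Allow,Deny")
    lines.append("Allow from all")
    lines.append(f"Deny from env={env_var}")
    return "\n".join(lines)


# Made-up agent names, for illustration only.
fragment = htaccess_block(["ExampleHarvester", "SampleGrabber"])
print(fragment)
```

As with the real tool, the output is meant to be pasted into an existing .htaccess file rather than used to replace it. Try it on a test directory first, since a typo in .htaccess can take a whole site offline.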

Posted to Internet-Web

