Spider Spotting



Search engines send out what are called spiders, crawlers or robots to visit a site and gather its web pages. These robots leave traces behind in the site's access logs.

How to identify a spider?

Those from the major search engines can sometimes be identified from their host names. These often incorporate part of the search engine's name or the company's name. For example, one of WebCrawler's host names is spidey.webcrawler.com.

A better way of spotting spiders is to look for their agent names, or what some people call browser names. Spiders have their own names, just like browsers.

For example, Netscape identifies itself by saying Mozilla. Alta Vista's spider says Scooter, while HotBot's spider is named Slurp.
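As a rough illustration of matching agent names, the sketch below checks an agent string against a small lookup table. The table holds only the two spider names mentioned above, and the function name and sample agent strings are hypothetical; a real list would need to be kept up to date.

```python
# Hypothetical lookup table: agent-name fragments mapped to the engine
# they suggest. Only the two names mentioned in the text are included.
KNOWN_SPIDERS = {
    "scooter": "AltaVista",
    "slurp": "HotBot",
}


def identify_spider(agent_name):
    """Return the engine suggested by an agent name, or None if unknown."""
    lowered = agent_name.lower()
    for fragment, engine in KNOWN_SPIDERS.items():
        if fragment in lowered:
            return engine
    return None


# Illustrative agent strings, not taken from real logs.
print(identify_spider("Scooter/2.0"))
print(identify_spider("Mozilla/4.0 (compatible)"))
```

A browser reporting Mozilla falls through to None here, which is the point: anything that matches neither a browser nor a known spider deserves a closer look.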

Some resources for getting a list of host and agent names for the major search engines are listed below. However, it's useful to know how to spot any robot, because names can change and new robots can appear. The principles of spotting spiders remain the same.

The Best Clue: robots.txt

The robots.txt file tells robots what they may and may not index within a site. Not all spiders follow the robots.txt convention, but most do. Anything requesting this file is almost certainly a spider, robot or agent.
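To make the convention concrete, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt contents, the /private/ path and the may_fetch helper are made up for illustration; they show how a well-behaved spider would interpret the file.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt contents: every robot is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(agent_name, path):
    """Report whether the rules above let this agent request the path."""
    return parser.can_fetch(agent_name, path)


print(may_fetch("Scooter", "/index.html"))
print(may_fetch("Scooter", "/private/notes.html"))
```

The file is only a request. A spider that ignores the convention can still fetch the disallowed pages, which is why the access logs remain worth checking.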

By reviewing the requests for robots.txt in the access logs, we can usually spot spiders from the major search engines by their host names, which in turn tell us their latest agent names. It is surprising to note how many smaller search engines, personal agents and other robots are also accessing the site.
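One way to carry out this review is sketched below. It assumes the server writes the common "combined" access log layout, with the host first and the agent name as the last quoted field, and that host names are already resolved in the log. The function name and the sample log lines are hypothetical.

```python
import re
from collections import defaultdict

# Assumed "combined" log layout:
# host ident user [time] "METHOD path protocol" status bytes "referrer" "agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?:GET|HEAD) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def robots_txt_requesters(lines):
    """Map each host that requested /robots.txt to the agent names it used."""
    found = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("path").split("?")[0] == "/robots.txt":
            found[match.group("host")].add(match.group("agent"))
    return dict(found)


# Illustrative log lines; the agent name ExampleBot is invented.
sample = [
    'spidey.webcrawler.com - - [10/Oct/2000:13:55:36 -0700] '
    '"GET /robots.txt HTTP/1.0" 200 68 "-" "ExampleBot/1.0"',
    'dialup.example.net - - [10/Oct/2000:13:56:01 -0700] '
    '"GET /index.html HTTP/1.0" 200 2326 "-" "Mozilla/4.0"',
]
for host, agents in robots_txt_requesters(sample).items():
    print(host, sorted(agents))
```

Grouping by host keeps the two clues together: a recognizable host name confirms which search engine the visitor belongs to, and the agent names recorded next to it show what that engine's spider currently calls itself.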