Web Spiders List Planned

Search engine spiders, along with crawlers that serve other purposes, pose significant problems for site owners. In particular, they can inflate page view statistics, which means that advertisers might pay for impressions that no human being ever sees.

If only there were a public list of all the spiders operating, so that they could be filtered out! That's what the Interactive Advertising Bureau is now offering, in conjunction with ABCi. Of course, it makes you wonder what all those advertisers have been doing -- or not doing -- to filter out robotic queries without such a list.
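As a rough illustration of how such a list might be used, the Python sketch below drops log entries whose user-agent string contains a known spider token before counting page views. The token list, the (url, user_agent) record layout, and the function names are all assumptions made for this example. They are not taken from the IAB/ABCi list, and real filtering would likely need more than substring matching.

```python
from collections import Counter

# Illustrative tokens only; a real deployment would load a maintained list.
# "examplebot" is a made-up name used for demonstration.
SPIDER_TOKENS = ("googlebot", "slurp", "examplebot")


def is_spider(user_agent, tokens=SPIDER_TOKENS):
    """Return True if the user-agent string contains any known spider token."""
    ua = user_agent.lower()
    return any(token in ua for token in tokens)


def count_human_page_views(records, tokens=SPIDER_TOKENS):
    """Count page views per URL, skipping requests that look like spiders.

    `records` is an iterable of (url, user_agent) pairs, an assumed layout.
    """
    views = Counter()
    for url, user_agent in records:
        if not is_spider(user_agent, tokens):
            views[url] += 1
    return views


if __name__ == "__main__":
    sample = [
        ("/home", "Mozilla/4.0 (compatible; MSIE 6.0)"),
        ("/home", "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"),
        ("/ads", "Mozilla/4.0 (compatible; MSIE 5.5)"),
    ]
    print(count_human_page_views(sample))
```

A filter like this only catches spiders that identify themselves honestly in the user-agent header, which is one reason a list built from observed behavior is attractive.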

Spiders can also overburden web servers, which is why the robots.txt protocol was created years ago. A list of spiders was subsequently established, but because it relies on self-reporting, it may miss many spiders. In contrast, ABCi has maintained its own list by watching for spider behavior, since a good list is essential to its auditing business.
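For readers unfamiliar with robots.txt, here is a minimal sketch of how a cooperative crawler might consult it, using the robot parser in Python's standard library. The robots.txt content, the "examplebot" and "otherbot" names, and the example.com URLs are invented for illustration. Note that the protocol is voluntary: it only restrains spiders that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt: everyone is kept out of /private/,
# and the hypothetical "examplebot" is kept out of the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: examplebot
Disallow: /
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for agent, url in [
        ("examplebot", "http://www.example.com/index.html"),
        ("otherbot", "http://www.example.com/index.html"),
        ("otherbot", "http://www.example.com/private/report.html"),
    ]:
        # can_fetch() reports whether the rules permit this agent to request the URL.
        print(agent, url, parser.can_fetch(agent, url))
```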

No matter how good the ABCi list is, it still faces challenges. Some spider operators will fail to list themselves, and others, such as those that harvest email addresses from web pages, may actively try to disguise the fact that they are spiders.

ABCi and IAB Spiders and Robots
http://www.abcinteractiveaudits.com/abci_iab_spidersandrobots/

More about the joint spider list can be found here. The list is only semi-public: you need to be an IAB member or an ABCi client to access it.

Spiders and Robots Are Ghouls and Goblins
InternetNews.com, Oct. 22, 2001
http://www.internetnews.com/IAR/article/0,,12_908361,00.html

Overview of the effort to create a new spider list. ABCi, which has maintained its own list since 1995, will be doing the work on behalf of the IAB.

The Big List of Web Robots
SearchDay, Oct. 24, 2001
http://searchenginewatch.com/searchday/01/sd1024-robots.html

More about SpiderSpotting and existing lists of robots on the web. Also includes links to information from the new ABCi effort.

Spiders and Robots and Crawlers, Oh My!
ClickZ, Oct. 25, 2001
http://www.clickz.com/media/media_buy/article.php/909611

Closer look at the problems that spiders pose for advertising measurements and why eliminating them isn't so easy.

The Web Robots Database
http://www.robotstxt.org/wc/active.html

The web's oldest list of spiders and crawlers, based on self-reported data.
