By Avi Rappoport, Guest Writer, September 4, 2002
* Articles and News
After the Dot-Bomb: Getting Web Information Retrieval Right This Time
http://firstmonday.org/issues/issue7_7/bates/
http://www.searchtools.com/info/info-retrieval.html
Marcia Bates, an academic expert on usable information retrieval, suggests that if web entrepreneurs and VCs had known the history of IR and the experience of libraries, they would not have wasted investments on problematic approaches such as "push" technology. She offers seven suggestions to improve web retrieval: use faceted rather than hierarchical classification; don't try for a single "true" classification (and avoid the term "ontology"); use subject and domain information retrieval vocabulary; remember the Bradford distribution; plan for explosive growth; provide tools for "human content processing"; learn from the history of information retrieval.
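To make the first suggestion concrete, here is a minimal Python sketch of faceted classification. It is not taken from the Bates article: the catalog, the facet names (topic, format, audience) and the filter_by_facets helper are all hypothetical. The idea it illustrates is that each item carries several independent attributes, so a searcher can combine them in any order instead of following one fixed path down a hierarchy.

```python
from typing import Dict, List

# Hypothetical catalog, for illustration only. Each item is described by
# independent facets rather than by a single position in a tree.
CATALOG: List[Dict] = [
    {"title": "Search Engine Buyer's Guide",
     "facets": {"topic": "search", "format": "report", "audience": "managers"}},
    {"title": "Robots.txt Tutorial",
     "facets": {"topic": "crawling", "format": "tutorial", "audience": "webmasters"}},
    {"title": "Site Search Tutorial",
     "facets": {"topic": "search", "format": "tutorial", "audience": "webmasters"}},
]


def filter_by_facets(items: List[Dict], **wanted: str) -> List[str]:
    """Return titles of items whose facets match every requested value."""
    return [
        item["title"]
        for item in items
        if all(item["facets"].get(name) == value for name, value in wanted.items())
    ]


if __name__ == "__main__":
    # Facets can be combined freely; no single hierarchy is assumed.
    print(filter_by_facets(CATALOG, topic="search"))
    print(filter_by_facets(CATALOG, format="tutorial", audience="webmasters"))
```

A strict hierarchy would force a choice such as "Tutorials > Search" versus "Search > Tutorials"; with facets, both routes reach the same item.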
Robotcop enforces robots.txt
http://www.robotcop.org/
http://www.searchtools.com/robots/robots-txt.html
The robots.txt file is a cooperative way to request that crawlers and spiders avoid certain parts of a web site. Robotcop, a free server module, watches for spiders that read pages disallowed in robots.txt and blocks all further requests from that IP address. It is particularly useful for blocking email address harvesters while still allowing legitimate search engine spiders. Before implementing Robotcop, be sure to double-check your robots.txt file (use one or more of the robots.txt checkers), and watch your server logs carefully. The August 2002 version (0.6) works with Apache 1.3 on FreeBSD and Linux.
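The blocking behavior described above can be sketched in a few lines of Python. This is not Robotcop's actual code (Robotcop is an Apache module); it only illustrates the idea, using the standard library's urllib.robotparser to interpret the rules. The robots.txt content, the DisallowedPathBlocker class, and the IP addresses and user-agent names are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule checker."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class DisallowedPathBlocker:
    """Block an IP address once it requests a path disallowed by robots.txt.

    Hypothetical helper; a sketch of the idea, not Robotcop's implementation.
    """

    def __init__(self, parser: RobotFileParser) -> None:
        self.parser = parser
        self.blocked_ips = set()

    def allow_request(self, ip: str, user_agent: str, path: str) -> bool:
        # An address that has already violated the rules stays blocked.
        if ip in self.blocked_ips:
            return False
        # First violation: remember the address and refuse the request.
        if not self.parser.can_fetch(user_agent, path):
            self.blocked_ips.add(ip)
            return False
        return True


if __name__ == "__main__":
    blocker = DisallowedPathBlocker(build_parser(ROBOTS_TXT))
    print(blocker.allow_request("192.0.2.10", "PoliteBot", "/index.html"))
    print(blocker.allow_request("192.0.2.99", "Harvester", "/private/staff.html"))
    print(blocker.allow_request("192.0.2.99", "Harvester", "/index.html"))
```

The sketch also shows why the robots.txt file deserves a double-check first: any mistake in a Disallow line would cause well-behaved visitors to be blocked as well.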
* Search Tools Product News
Convera Visual RetrievalWare SDK version 5.0
http://www.searchtools.com/tools/retrievalware.html
http://www.convera.com/Products/products_rw.asp
The new version provides improvements in video clip editing and fuzzy matching; modules for color, shape, and texture indexing; automated shot-boundary detection; and more image and video formats. Additional OS support includes FreeBSD, OpenBSD and Darwin (Mac OS X).
Google Search Appliance Capacity, Prices Rise
http://www.searchtools.com/tools/google-app.html
http://www.google.com/appliance/
The new price for the GB-1001 (1u rackmountable box, indexes 300,000 documents) is $28,000 -- up from $20,000; for the GB-8008 (an 8u server rack with additional load balancing features, and capacity for millions of documents) the price is now $450,000 -- up from $250,000.
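For readers who want the size of the increases, the short Python sketch below works out the percentage change for each model and the new price per indexed document for the GB-1001. The figures are the ones quoted above; the function names are only illustrative. No per-document figure is computed for the GB-8008 because its capacity is given only as "millions of documents".

```python
def percent_increase(old_price: int, new_price: int) -> float:
    """Percentage increase from old_price to new_price."""
    return 100 * (new_price - old_price) / old_price


def price_per_document(price: int, documents: int) -> float:
    """Price in dollars per indexed document."""
    return price / documents


if __name__ == "__main__":
    # Prices and capacity as quoted in the news item above.
    print("GB-1001 increase: %.0f%%" % percent_increase(20000, 28000))
    print("GB-8008 increase: %.0f%%" % percent_increase(250000, 450000))
    print("GB-1001 per document: $%.3f" % price_per_document(28000, 300000))
```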
Inktomi to Concentrate on Search, Buy Quiver
http://www.searchtools.com/tools/inktomi-search.html
http://www.inktomi.com/products/search/
Inktomi has announced that it's going to direct resources to Web and enterprise search, reducing the content networking part of its business. It is also buying Quiver, which has developed a content categorization tool. Quiver has a mixed manual and automated classification workflow system, allowing editorial staff to adjust taxonomies and document categorization for best results.
iPlanet Search Security Flaw
http://www.nextgenss.com/advisories/sun-iws.txt
http://www.sun.com/software/
http://www.searchtools.com/tools/iplanet-search.html
The Sun ONE Web Server search function (formerly iPlanet search engine / Netscape Compass) is vulnerable to a buffer overflow attack, which gives an attacker access to the server and the ability to run code under the administrator account. Sun has released patches for iPlanet 4.1 (SP 10) and 6 (SP 3). The flaw was reported by Next Generation Security Software.
Avi Rappoport, Principal Consultant for Search Tools Consulting, is a leading authority on site, Intranet and topical portal search engines.
NOTE: Article links often change. In case of a bad link, use the publication's search facility, which most have, and search for the headline.