Behind the Scenes at the Daypop Search Engine

Daypop is a unique search engine for a number of reasons. Its primary focus is on weblogs, news sites and other sources of current events and breaking news; it currently scours more than 35,000 of these sources.

Daypop is also unique in that it’s operated and maintained by a single individual, Dan Chan. Dan’s work on Daypop, and his devotion to it, deserve the highest praise. Even if other engines begin devoting more resources to weblog search, Daypop should not be forgotten. Because Daypop is the work of one person, Dan can experiment, try “new things” (search options, lists, rankings), and tweak the engine in a matter of hours, if not minutes.

Dan graciously took time out of his busy schedule to chat with me via email about Daypop. Here’s part one of our interview.

Can you share some of Daypop’s history? For example, when did you start it and why?

During the 2000 presidential election, I was following the news from Hong Kong (where I was staying at the time) and writing in my blog. I wanted to know what other people thought of the election, and I realized there wasn’t any way of searching for current events. All the search engines at the time were on a two-month cycle for each crawl of the web. They were missing out on a huge amount of dynamic content from all the sites that I call the Living Web.

I wanted to read news articles about the election, to expand my reading beyond Salon, but most importantly I wanted to get all the different viewpoints on it. What better way to get opinions than to search blogs?

That was like a Eureka moment for me when I realized there was this huge void — no one was offering a simple Google-like service to search highly dynamic sites. Someone had to fill that void. In April or May of 2001 I started working on it, almost full-time, and I launched at the end of August 2001.

What are Daypop’s current numbers in terms of sources, hits, and so on?

There are about 1,000 news sources from around the world and about 19,000 weblogs. All the feeds from NewsIsFree, as well as the weblog RSS feeds that Blogstreet tracks, are also indexed. I believe NewsIsFree has about 5,000 feeds and Blogstreet tracks about 9,000. Daypop gets about 50,000 page views a day.

You’re one person running a very well-known web tool. Is Daypop your only job? How much time do you put into running it?

I stopped working on Daypop during the week around the beginning of 2002. It’s mostly a weekend thing for me now. I’m actually a video game programmer, and I’m currently working on a project for a home console system. The last game I worked on was Crash Team Racing for the PlayStation.

Is funding an issue? I noticed that you ask for donations.

I’ve put up a lot of money (in addition to all the time) to keep Daypop running. The cost of the line is just part of it; the cost of the equipment is also a major factor. The little advertising Daypop has had plus all the donations don’t even add up to the cost of the equipment. But I consider the server my donation to Daypop, and I’d be happy if, going forward, I could just break even.

Right now the server is overburdened. It handles all the crawling, indexing, inverting, analysis (Top 40, Top News, Word Bursts, News Bursts, Wishlist, Top Weblogs) and searching. Daypop’s index is updated continuously. I’m looking into offloading some of that work to another server, but of course that’s an added cost.

What has Daypop taught you about how people search for news and information?

People still perform general searches (many one-word queries) even when looking for news. “Iraq” has topped the charts for a while now, but a search like that will turn up semi-random articles about Iraq without a specific context. It’s a very general search. Only a small minority of searchers use Daypop to narrow their hunt for news, with searches like “Basra” or “Jessica Lynch”.

When you started Daypop it was one of the few specialized news engines available. Then, Google News came along and many people became aware of news and specialty search. Did your visitor numbers decrease?

Surprisingly, no. I knew it was only a matter of time before Google came out with a news search, and when they did I took solace in the fact that at least Daypop also indexes weblogs (which I’ve found just as useful as news search). I guess most Daypop users think the same way; they’ve stuck with Daypop even through the extended downtime in November 2002 (which I apologize for).

Did Google News force you to change anything? Focus more on blogs and less on news sources?

That’s something I’ve been mulling over recently. Should I drop the news search from Daypop altogether and concentrate on weblogs? But there are still plenty of people who use Daypop for news searches, and some people are even finally utilizing the RSS feed feature to create customized news feeds. And Daypop indexes all Western European languages whereas Google just indexes English sites. So I think there’s still value in keeping news search in Daypop.

Has Google or any other engine ever contacted you about merging Daypop into their service or joining their team?

Several companies have given me calls but nothing has ever come of it.

Dan, you’re deeply involved in web search. What’s wrong with it?

I’ve got some ideas that I’m working on for Daypop on the weekends, so I don’t really want to let the cat out of the bag, but it boils down to the fact that I still have to dig for relevant information. If I have to dig, at least give me the tools to do it.

Final Note: Beyond searching news and weblog content, be sure to look at the rankings Daypop provides, including the Top 40, Top News, Top Posts, Word Bursts, News Bursts, Top Wishlist and Top Weblogs. Links to all of these features appear at the top of every Daypop page.

If you’re really interested in weblogs, take a look at the Blogstats service, which “generates a page of statistics for any weblog in Daypop’s index.”

This interview with Dan Chan continues with part two and part three.

Gary Price runs Gary Price Library & Internet Research Consulting and is the author of ResourceShelf, a weblog about searching and the information industry.

