Screw Size! I Dare Google & Yahoo To Report On Relevancy

Ah, summer. Time to play on the beach, head out on vacation and if you’re a search engine, announce to the world that you’ve got the largest index.
Search Engine Size Wars & Google’s Supplemental Results, Search Engine Watch, Sept. 3, 2003

The quote above is from an article I wrote after Google and AllTheWeb played a game of “who’s biggest” in August 2003. They’d done the same thing in August 2002. Now here
we are in August 2005, and it’s another spat over size, this time between Yahoo and Google.

I cannot believe we’re going through this again. This is Search Engine Size Wars VI, by my count. It’s absurd. It’s annoying. It’s a friggin’ waste of time. Instead of
advancing to a commonly accepted relevancy figure, the search engines want to keep us mired in the mud of who’s biggest.

Who’s biggest really doesn’t matter, as I and others have written so, so, so, so, so many times before. Reasons? There are many. How about…

  • You need the whole haystack! Here, if I dump it all on your head, can you find the needle now?
  • If you have lots of documents but they are all near duplicates of each other, is that good?
  • How much of a document have you indexed — 101K, 500K, 1MB?

Pick your metaphor, your explanation, your qualification (Gary gives you even more here) — we’ve been
through this all before.

Nothing has changed. Size hasn’t suddenly gotten more important overnight. What has happened is that, for the first time, one search engine is strongly disputing the claims of
another. Google doesn’t believe the figures Yahoo is bandying about, as Gary covered earlier. Yahoo has been
steadfast that it’s not lying.

Well let’s do some testing! Let’s come up with some standards! Let’s audit the figures! Yeah, let’s do that. After all, it’s been
discussed since 1999, when Northern Light wanted to say definitively that it was biggest. Surely it’s
time for that to happen, right?

No, it’s not. If the search engines are all going to come together to figure out a standard on something, move forward! Move forward! Pull it together and unite to come up
with a way to test relevancy! That’s what matters, not this squawking and time wasting over size.

My 2002 article, In Search Of The Relevancy Figure, looks at the need for a relevancy figure and how,
without it, search engines will continue to use surrogates such as size for relevancy:

A relevancy figure would also free us from search engines playing the “size card” or the “freshness card” to quantify themselves as better than the competition. Yes, having
a large index is generally good. Yes, having a fresh index is desirable. However, neither of these stats indicates how relevant a search engine is. Nevertheless, the search
engines keep pushing them at us, and in particular at journalists, in an effort to trump their competitors.

Here we are in 2005 and what’s happening? Size is pushed again in our faces. Sure, Yahoo didn’t do a release on it. But it knew exactly the reaction it would get by
announcing via its blog that it was twice as big as Google. And Google? The company has pulled out all the stops in lobbying us at Search Engine Watch along with other
analysts to poke hard at the Yahoo numbers, because it doesn’t want to be seen as “second best” in any area.

The irony is deep. Google has never provided any proof when it trumped others on the size front. MSN says it’s at 5 billion in November? No problem — Google magically
announces on its home page that it’s at 8.1 billion. While MSN didn’t seriously question that Google was
larger than it, plenty of other rumblings went around that the count might not be correct. But since it had trumped everyone else, Google apparently didn’t feel the burning
concern it now has that size should somehow be verified. Sure, maybe Yahoo isn’t at 19 billion. But maybe Google isn’t at 8 billion, either.

This game is going to go on and on until someone is brave enough to change the rules. I’m daring either of the leaders, Google or Yahoo, to do just that. Both of them say
that size is one of only many factors to consider. Both of them tell you relevancy matters most. SO PROVE IT!

Ideally, I want to see the major search engines come together to develop a unified, accepted way to measure relevancy in various areas: web search, local search, advanced
queries, whatever. Establish a research center, a consortium or something and a methodology that all will agree upon. Then test every four to six months and pledge you’ll
accept the results publicly. Someone wins? Kudos all around! Didn’t win? Then do better next time.

That’s the challenge. Let’s see if someone steps up. As for size — yes, Gary and I will revisit the various claims and counter-claims in more depth later this week. In the
meantime, some past reading on the subject of size and the complications in measuring it:

  • New Estimate Puts Web Size At 11.5 Billion Pages & Compares Search Engine Coverage has an estimate of
    what search engines cover compared to self-reported claims. Despite the Ask Jeeves connection, that service doesn’t come out on top in terms of size.
  • Search Engine Size Wars V Erupts covers the self-reported figures and battle we had between Google and
    MSN last November, along with issues such as how much of a page is actually indexed.
  • Search Engine Size Wars & Google’s Supplemental Results covers more on deconstructing index size
    claims from 2003.
  • Search Engine Sizes has more articles than you can imagine covering size issues over the years.
  • How to count URLs is an archived page of what Excite used to do back in 1996
    — 1996! — to explain how it thought counting should be done. I and others have written how in many ways, it feels like we’ve gone right back in a big circle of portal
    features being rolled out, land grabs and inflated valuations that make it feel like we’re back in the 90s again. The size dispute is just another big spin of that wheel.

Want to discuss? Visit our forum thread, Ridiculous Increase In Yahoo Backlink Counts & Is Bigger Index Real?
