Do you think your site has been hit by Panda? While Google has now rolled the Panda algorithm into the main algorithm, many publishers have never recovered from the many Panda updates that took place before this happened. Even though we may not see announcements of updates or refreshes of the Panda algorithm any more, it’s still there.
As a publisher, it’s important to learn the lessons that Panda taught us, and what it means for our web publishing strategies.
In some cases of sites hit by Google’s Panda update, the problem lies in a lack of “material content differentiation” in major sections of a site. Last October I wrote a related article, The Need for Extreme Differentiation in your Content, that covers a bit of this ground. Here we’ll look at some ways to think about measuring whether your content stands out.
The prior article went into great detail on the concept of query deserves diversity (QDD). Put simply, the concept here is that if the user didn’t like the first result they clicked on, they probably don’t want to see the same information in the next result they go to.
In the case of a search on “jaguar”, a user could be looking for the car, the animal, the guitar, or the operating system. They could even be looking for the football team.
This notion of QDD is by itself reason enough to want to differentiate your content in material ways. Not everyone can be the top brand in a given market, so finding different approaches to crack the top 10 is a good idea. But, the issues with differentiating your content and Panda are, well, different.
Panda is a Document Classifier
In a Wired interview, Google’s Matt Cutts said this about Panda:
“And we actually came up with a classifier to say, okay, IRS or Wikipedia or New York Times is over on this side, and the low-quality sites are over on this side. And you can really see mathematical reasons.”
This is an important concept to focus on. Why is it that your site got hit by Panda? Is it because your site is spamming? Not necessarily. It may be because your site was seen as poor quality in nature.
From Google’s perspective you might think of this as “Index Deserves Quality” (note I am sure they would not call it that, but it works for this discussion). The reality is that Google looked for signatures of sites that prioritize revenue generation over the user experience.
Google knows that people like the IRS site (when they need it), Wikipedia, or the NY Times. They know that people don’t like other types of sites. So how might they tell the two apart?
Content Differentiation as a Panda Issue
I have participated in multiple Panda recovery projects. One site I worked on was a very high PageRank site (PR8 home page), implying (though not proving) that it was seen as a fairly authoritative site. Panda hit the site, and hit it really hard.
Overall site traffic dropped 31 percent, and the major revenue producing pages on the site dropped nearly 50 percent. Once this happened we began working hard to try and figure out why.
Ultimately, we decided to try no-indexing one particular set of pages on the site. While the organic search traffic to this particular set of pages was lost forever, the organic search traffic to the rest of the site returned to its original levels.
How bad was the content? Let me help you get an idea about it, as follows:
- The basis of the data was a public government database.
- Many other sites make use of the same data.
- The owner of this site performed extensive value-added analyses on the data (cross item comparisons and the like) and published the results on the pages as well. Almost none of the other publishers using the same data source did that.
- The owner of this site wrote hand-researched, actually valuable (subjective statement, I know), 200 to 300 word articles for each of the pages with the data. Almost none of the other publishers using the same data source did that.
The point of the above outline is to show that the site owner went through a lot of effort to differentiate the pages containing this data from the pages of all of the other publishers using the same data on the web. Yet, the site got hit by Panda. Why?
Content “Sameness” and Panda
Figuring out how to differentiate content is not enough in this world. That differentiation needs to be perceptible, both to users and search engines.
I’ve spent a lot of time trying to explain this concept to many publishers. In the site referenced above, the content was in fact differentiated, but the fact that it was different was not perceptible. Try this concept on for size:
Imperceivable Value Has NO Value
That might sound a bit like a small contradiction, but it is a very important publishing concept. If users and/or search engines cannot perceive that wonderful new custom feature or content you added to the site, then you wasted your time and effort in implementing it.
As for how this plays into Panda, the point is that large sections of content on a site that have no perceivable value are likely to be seen as doorway pages, and doorway pages are a signature of a poor quality site.
Think Like a Machine
How do you tell if you have a problem with “Sameness”? The key is to learn to think like a piece of programming code that is unable to discern things the way that humans do. While this is not trivial to do, try this exercise on for size.
Take a given page on your site and run it through a keyword density analysis. No, not because I am worried about keyword density (I never think about that), but because it can help you see a bit about the way a machine sees a page. Here are two pages side by side, using the Keyword Density Tool from Internet Marketing Ninjas:
Quick, what is the real difference in value between these two pages? Of course, you have no idea. And, of course, Google is going to do this type of analysis in a very different way, but the point is that the analysis is being done by software, not by a human spending 15 minutes looking over every detail of the page.
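If you want to try a rough version of this machine's-eye view yourself, here is a minimal Python sketch. To be clear about the assumptions: this is not how Google or Panda works, and it is not how the Internet Marketing Ninjas tool works either. It simply counts words and compares the word mix of two pages using cosine similarity. The function names and the two sample page texts are made up for illustration.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count every word of three or more letters."""
    words = re.findall(r"[a-z]{3,}", text.lower())
    return Counter(words)


def keyword_density(text, top=5):
    """Return the most common terms with their share of all counted words."""
    counts = term_counts(text)
    total = sum(counts.values())
    return [(word, count / total) for word, count in counts.most_common(top)]


def cosine_similarity(text_a, text_b):
    """Compare two term-count vectors: 0.0 means no shared words, 1.0 means the same word mix."""
    a, b = term_counts(text_a), term_counts(text_b)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical page text, invented for this example.
page_a = "School ratings for Springfield: test scores, enrollment, and student teacher ratio."
page_b = "Springfield school ratings: enrollment, test scores, and student teacher ratio data."

print(keyword_density(page_a))
print(round(cosine_similarity(page_a, page_b), 2))
```

A score close to 1 means that, to a simple word-counting program, the two pages look nearly the same, no matter how much effort went into one of them. Real search engines use far more sophisticated signals, but the exercise makes the "sameness" problem easy to see.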
What makes your page stand out? Why is it perceptibly different? It should look somewhat different than this block of content:
Differentiation may also be driven by the level of integration into the rest of a website.
For example, if you have a large block of pages that appear to be designed as landing pages where the sole intent is to directly convert that search visitor, with little thought put into deeper related information sources, then the content has no depth. It is much like an old-fashioned doorway page. Sounds like it could be a signature of poor quality content, doesn’t it?
The need to come up with true differentiation is increasing rapidly. End users are demanding this more and more, and search engines like Google and Bing want to give end users what they want.
In the Internet age, where publishing is cheap, and people don’t need millions of different sources providing the exact same answers, you can’t afford to fall victim to “sameness”, so do something special instead!