Panda, Google's High Quality Sites Algorithm, Officially Adopts User Feedback As Search Signal
This time around, the global rollout may prove to be a storm in a teacup: it will affect only 2% of search queries, rather than the 12% affected by the first update. Along with the update, Google seems to have put a more official badge on the ‘Panda update’, calling it the “high quality sites algorithm”. This may be in order to further differentiate this second update and to emphasize the fact that “user feedback signals” have been officially introduced into the algorithm. Google says these will most likely affect the long tail of search traffic.
These latest user signals come from the Chrome site block extension and from Google’s in-situ button for blocking sites directly in Google search. The official announcement says that these user signals only drop sites algorithmically in certain ‘high confidence’ situations.
What is a High Confidence situation?
Google’s Panda update is remarkable because completely new methods were used to craft the algorithm. Whilst other updates focussed on tweaking general aspects of Google’s ranking factors, such as the quality of in-bound links or anchor text, Panda was more of a panel-based update. In an interview with Wired, Matt Cutts and Amit Singhal, leaders in Google’s search quality team, described how surveys were conducted with outside testers to find out what users considered to be a low quality site.
“Wired.com: How do you recognize a shallow-content site? Do you have to wind up defining low quality content?
Singhal: That’s a very, very hard problem that we haven’t solved, and it’s an ongoing evolution how to solve that problem. We wanted to keep it strictly scientific, so we used our standard evaluation system that we’ve developed, where we basically sent out documents to outside testers. Then we asked the raters questions like: “Would you be comfortable giving this site your credit card? Would you be comfortable giving medicine prescribed by this site to your kids?””
This survey enabled Google to create a working definition of what was considered a low-quality site. Then, by launching the Chrome site blocker extension, they were able to compile more data to benchmark the algorithm change against. Although this data was never used in the original Panda update, it was a means to see if they could algorithmically determine a low quality site that matched a real user’s definition.
“Singhal: And based on that, we basically formed some definition of what could be considered low quality. In addition, we launched the Chrome Site Blocker [allowing users to specify sites they wanted blocked from their search results] earlier, and we didn’t use that data in this change. However, we compared and it was 84 percent overlap [between sites downloaded by the Chrome blocker and downgraded by the update]. So that said that we were in the right direction.
Wired.com: But how do you implement that algorithmically?
Cutts: I think you look for signals that recreate that same intuition, that same experience that you have as an engineer and that users have. Whenever we look at the most blocked sites, it did match our intuition and experience, but the key is, you also have your experience of the sorts of sites that are going to be adding value for users versus not adding value for users. And we actually came up with a classifier to say, okay, IRS or Wikipedia or New York Times is over on this side, and the low-quality sites are over on this side. And you can really see mathematical reasons …”
So, essentially, a “high confidence situation” is one in which user signals match or overlap with the new low-quality site detection algorithm. Whereas these matches created no search signal before, Google’s latest announcement clearly states that the data is being used.
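To make the idea of overlap concrete, here is a minimal sketch in Python. Google has not published how it combines these signals, so everything below is an assumption made for illustration: the function names, the min_blocks threshold, the choice of denominator for the overlap figure, and the example domains are all hypothetical. It shows only the general shape of the reasoning: a site counts as “high confidence” when enough users have blocked it and the classifier has independently flagged it.

```python
def overlap_ratio(user_blocked, classifier_flagged):
    """Share of user-blocked sites that the classifier also flags.

    Assumption: the denominator is the set of user-blocked sites.
    The interview does not say how the 84 percent figure was computed.
    """
    if not user_blocked:
        return 0.0
    return len(user_blocked & classifier_flagged) / len(user_blocked)


def high_confidence_sites(block_counts, classifier_flagged, min_blocks=100):
    """Sites blocked by at least min_blocks users AND flagged by the classifier.

    min_blocks is a hypothetical threshold, not a value Google has disclosed.
    """
    heavily_blocked = {site for site, n in block_counts.items() if n >= min_blocks}
    return heavily_blocked & classifier_flagged


# Made-up example data: number of users who blocked each site.
block_counts = {
    "spam-a.example": 950,
    "farm-b.example": 400,
    "news-c.example": 120,
    "blog-d.example": 3,
}
# Made-up output of a low-quality classifier.
classifier_flagged = {"spam-a.example", "farm-b.example", "thin-e.example"}

print(overlap_ratio(set(block_counts), classifier_flagged))
print(sorted(high_confidence_sites(block_counts, classifier_flagged)))
```

In this toy data, news-c.example is heavily blocked but not flagged by the classifier, and thin-e.example is flagged but not blocked, so neither would be treated as a high confidence case. Only sites where both signals agree are affected.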
Why is that important?
If Google’s battle to maintain search quality is essentially based on a user panel, then it highlights the importance of Google +1 and social signals to its anti-spam strategy.
Blekko and Bing both have the ability to generate similar user panels and so, as users come to understand more about how search works, they may prove to be attractive alternatives to Google. Nonetheless, it will be the search engine that can scale panel-based signals the quickest that will win.
Adoption of the Google +1 button for sites, which has still not been released, will be critical to maintaining its edge. This will mean that in the future the algorithm may be less about site structure, on-site SEO factors and off-site linking factors, and more about how sites live up to a popular definition of quality. Sites may live or die on the basis of user matches against the algorithm, rather than any traditional ranking factor.
However, in order for Google +1 to truly succeed it will have to achieve widespread adoption, and the only way to do that is to have users ‘playing for Google’ as unwitting panelists. My gut feeling is that this latest announcement on Panda quietly foreshadows a new trend for 2011: gamification of the web via Google +1.