Penguin – What Happens Next? 10 Data-Led Predictions
What will the latest update of Penguin bring and how can webmasters prepare?
Twitter has been alight for the past couple of weeks with news that the latest Penguin algorithm update is almost ready for action.
For those still impacted by the initial “shock and awe” tactics employed by Google some 12 months ago, “official news” that a refresh is on its way is good news indeed, given the length of time between reruns.
We know that the ship is about to sail again thanks to the welcome news from Google search engineer Gary Illyes that disavow files are now no longer being processed for this next update. It’s close.
The question is, however, what will this update bring to the party and how can webmasters prepare?
Clearly no official guidance has been, or ever will be, given here, but what we certainly do have is data from the past runs and, critically, the transformation of the "sister" algorithmic update, Panda, to help us understand where Penguin is heading.
As we know, and can see from the graphic below, Panda first launched in February 2011 as a clunky, issue-ridden but extremely impactful update which rolled out globally within six weeks.
The data processing piece for a filter based primarily on on-page factors, however, was much more straightforward, as it didn't involve mapping the entire link graph and understanding every nuance. That meant the iteration plan picked up pace quickly and, as talented data analysts do, the team began a series of smaller updates followed by analysis, iteration, and tweaks.
To date there have been at least 30 such updates that we know of, in a period spanning just over three-and-a-half years. If ever there was proof of Google's iterate-and-refine strategy, it is there for all to see.
We know that the same is planned for Penguin; the challenge has simply been the sheer amount of data that has needed to be processed in order to get a link-quality-based update "right."
While Panda measured, amongst other things, the code base of a website and its content, Penguin has had to map, define, and measure the entire link graph and to do that the search engine required help.
And, you guessed it, that help came from webmasters and from those impacted by the first five iterations of the "penalty." The disavow process allowed the search team to leverage tens of thousands of other "experts" to send through millions of examples of "poor-quality" links.
That human-sorted data set will undoubtedly now form the basis of the next update: a much more intelligent version based on a true "big data" understanding of what separates a good link from a bad one. Or, more precisely, a value-adding link from a worthless one designed only to manipulate PageRank.
So, if we know they have used some pretty smart gamification techniques to gather key data and help “process” the link graph already, what can we expect from an update that has been a whole year in the making? Here are some predictions:
A key aspect of the next iterations of Penguin will undoubtedly be its ability to understand the “provenance” of any link equity that a site “earns” from any link placement on it.
Rather than taking a link at face value, it is imperative for the filter to truly understand how the linking site got its own equity in the first place.
It’s like knowing the history of a car you are buying. If it’s missing lots of service history stamps and there are then also suspect repairs done on the bodywork, you would be right to question whether it really is the sound vehicle the seller says it is.
Links are the same, and in my view much of the wait has been due to Google digging into the link graph in a way that allows an algorithm to measure a link not just at face value but by looking at the linking site's "history."
If you look at the link graph, it is made up of a series of “nodes” that, when expanded out, look a little like this:
In almost all circumstances you can trace link equity right back to “neighborhoods” of shared “equity” and by doing this it is possible to work out where the “good” and “bad” ones are. Of course, like in real life, you get good and bad people in good and bad neighborhoods and figuring that level of precision out will be part of the ongoing iteration process that we will undoubtedly see in the coming months and years.
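To make the "neighborhoods" idea concrete, here is a minimal, purely illustrative sketch of how trust might spread from a few hand-vetted seed sites through a toy link graph, TrustRank-style. Google's actual method is unknown; the node names, seed set, and damping value are all arbitrary assumptions for the example.

```python
# Illustrative sketch only: TrustRank-style propagation over a toy link
# graph. Google's real implementation is unknown; node names, seed set,
# and the damping value are arbitrary assumptions.

def propagate_trust(graph, seed_trust, damping=0.85, iterations=20):
    """Spread trust from hand-labeled seed nodes along outbound links."""
    trust = dict(seed_trust)
    for _ in range(iterations):
        # Each node keeps a slice of its seed trust, then passes the rest on.
        new_trust = {node: (1 - damping) * seed_trust.get(node, 0.0)
                     for node in graph}
        for node, outlinks in graph.items():
            if not outlinks:
                continue
            share = damping * trust.get(node, 0.0) / len(outlinks)
            for target in outlinks:
                new_trust[target] = new_trust.get(target, 0.0) + share
        trust = new_trust
    return trust

# A "good neighborhood" (a -> b -> c) and an isolated "bad" one (x -> y).
graph = {
    "a": ["b"], "b": ["c"], "c": [],
    "x": ["y"], "y": [],
}
seeds = {"a": 1.0}  # 'a' is a hand-vetted, trusted site
scores = propagate_trust(graph, seeds)
print(scores["b"] > scores["y"])  # pages linked from the trusted neighborhood score higher: True
```

The point of the sketch is simply that equity earned inside a trusted neighborhood flows onward, while links from unvetted clusters carry nothing, which is why the provenance of a linking site matters more than the link itself.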
The key, of course, will be getting in with the right, virtuous, crowd and steering clear of those sites that have attempted to game their own equity.
Another thing that Panda has taught us is that Google likes to start with whole-of-site impacts, learn from the data, and then use that feedback to create more targeted impact.
We will see this in Penguin with sites hit at category or page level as opposed to randomly or across the board. This will make checking link profiles at that level more important.
As with Panda, we will now see a more regular refresh, as the heavy lifting is over. This should mean that those waiting for recovery should see the result of their work much faster, in either direction.
And combined with a move toward page-level impact as opposed to site-wide, this should mean Penguin becomes less of a business-destroyer and more a “clip-behind-the-ear” over time.
Initially the algorithmic update focused very much on obvious signals such as anchor text misuse, but as the data play gets smarter we will see link relevance and that aforementioned provenance, or trust, come more into play.
This will, of course, bring more challenges to those attempting still to outsmart the system, as previously “hidden” link networks and domain authority built from very powerful, but irrelevant or unnatural, sites will be more easily spotted.
We also know that Google has a new patent for Panda that looks at anchor text use, counting inbound link anchor text as part of the on-page calculation for content.
That basically means that even pages that are very natural “on page” may still be penalized for spam if they then have a lot of exact match anchor text off page. Another reason to steer clear of that tactic!
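A simple way to sanity-check your own exposure to this is to measure what share of your backlink anchors exactly match commercial "money" terms. The sketch below is illustrative only: the 10% threshold and the sample profile are assumptions for the example, not known Google figures.

```python
from collections import Counter

# Illustrative sketch: flag a link profile whose exact-match commercial
# anchors dominate. The 10% threshold is an arbitrary assumption, not a
# known Google figure.

def anchor_ratio(anchors, money_terms):
    """Share of backlinks whose anchor text exactly matches a money term."""
    counts = Counter(a.strip().lower() for a in anchors)
    total = sum(counts.values())
    exact = sum(n for text, n in counts.items() if text in money_terms)
    return exact / total if total else 0.0

# A hypothetical sample profile: brand, URL, and generic anchors mixed
# with repeated exact-match commercial anchors.
profile = ["Acme Widgets", "acmewidgets.com", "click here",
           "buy widgets online", "buy widgets online", "buy widgets online",
           "this article", "homepage", "Acme", "buy widgets online"]
ratio = anchor_ratio(profile, {"buy widgets online"})
print(f"{ratio:.0%} exact-match")         # 40% exact-match
print("risky" if ratio > 0.10 else "ok")  # risky
```

A natural profile is dominated by brand, URL, and generic anchors; when exact-match commercial phrases crowd those out, the profile stands out statistically, which is exactly the pattern a smarter Penguin would target.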
Unique IP or domain link count has always been important, but it will take on another dimension with future iterations of Penguin. Getting the natural balance between enough and too many for your niche will matter more. An unnatural count will stick out and be flagged, making an understanding of competitor balance very important.
What is acceptable in one niche will be very unnatural in another and a smarter Penguin will quickly sniff that out.
Google has long held its Hilltop patent, and it would make logical sense for Penguin to use elements of it to understand trust and relevance.
For those who do not know it, the patent looks at "expert" and "authority" pages, defining the former as a page that links out to lots of other relevant pages to add value to an article, while an authority is a page that those expert pages link out to.
The really valuable links are therefore those that come from expert pages, and earning lots of these is the way to rank well and avoid Penguin. The only way to do that, of course, is to share amazing content, becoming a thought leader and an authority in your space.
The number of links that go into deeper pages will also be looked at as part of that move to more precise measurement. Great sites earn deep links, but where there appear to be too many pointing at a commercial page, Penguin issues may be triggered.
The safer strategy would appear to be domain-level links and links from expert documents into thought leadership pages, which will most probably be found on your blog, or within a content or resources section.
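To audit where your backlinks actually land, a quick bucketing of link targets is enough. This is an illustrative sketch under stated assumptions: the path rules (`/blog`, `/resources`) and the example URLs are hypothetical, and what counts as a "healthy" split will vary by niche.

```python
from urllib.parse import urlparse

# Illustrative sketch: bucket backlinks by the kind of page they point at.
# The path rules and example URLs are assumptions for the example, not
# Google's criteria.

def link_targets(backlink_urls):
    """Count backlinks by target type: homepage, content, or commercial."""
    buckets = {"homepage": 0, "content": 0, "commercial": 0}
    for url in backlink_urls:
        path = urlparse(url).path.rstrip("/")
        if not path:
            buckets["homepage"] += 1
        elif path.startswith(("/blog", "/resources")):
            buckets["content"] += 1
        else:
            buckets["commercial"] += 1
    return buckets

links = ["http://example.com/",
         "http://example.com/blog/widget-guide",
         "http://example.com/resources/sizing-chart",
         "http://example.com/buy-widgets"]
print(link_targets(links))  # {'homepage': 1, 'content': 2, 'commercial': 1}
```

If the "commercial" bucket dwarfs the others, that is the pattern the article warns about; deep links earned by content and resource pages are the safer shape for a profile.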
This may be slightly more of a stretch, and could form part of Panda as opposed to Penguin, but the relationship between the number of links you have and the amount your brand is “talked about” online is a very logical way to validate link authority.
It’s something I have written about previously and makes absolute sense as a sense check for understanding if a link profile is real.
Google talks a lot about “brand building” and one of the best ways to measure brand is to do so via “listening” through either social or Web mentions and sentiment.
Tools to find these things are easy enough to build and so the might of engineering talent at the search company would have no problem doing that at scale.
Finally, there is the piece around usage data. We have certainly seen signs of that creeping in on the Panda side as Google looks to understand not just what a page, or site, might "look" like to a crawler or headless browser, but how real users engage with it.
Looking at, or measuring, the amount of “traffic” from certain links is within their reach through analytics and would be another way of validating link quality and relevance. After all, who clicks on a non-relevant link?
I wrote a post here a year ago examining some of the data the team at Zazzle Media had extracted from recent site recovery projects. It pointed toward a reducing percentage of allowable "suspect" or spammy links in a profile. The chart below shows how that has progressed, and we'll be testing again post-Penguin 3.0 to see how far it has been taken.
The future is uncertain and the predictions above are clearly just that. One thing we do know, however, is that Penguin will get smarter and, having had a whole year to work on it, the next version will be much more precise at doing its job: wiping out irrelevant linking behavior.
The wider challenge for those hit, of course, is distinguishing Panda impact from Penguin and as the two get closer together, and as Penguin is rolled into the main algorithm just as Panda has been, it will be more and more difficult to find the right “fix.”
For those struggling with it this simple Google Penalty cheat sheet is designed to help.