Google’s Distinguished Engineer Matt Cutts was the headline attraction this morning as SES San Francisco 2012 Day 2 kicked off. Cutts joined Incisive Media Global VP Mike Grehan for a discussion on Panda, Penguin, social signals, duplicate content, transparency, and more.
Knowledge Graph Expansion
Cutts began by briefly recapping what’s new in Google search. Google is using the Knowledge Graph more, and it has now rolled out around the world. Google has also added Gmail messages to the search results, but Cutts noted it’s something people have to ask for. So far, Google has determined that users prefer to see the Knowledge Graph results in a consistent place (at the top or right), and Google is collecting more feedback.
Google’s goal is to make it possible for people to do quick research and get quick stats – to basically make exploration faster, whether it’s by showing collections of rollercoasters in a strip along the top or basic info on people, places or things.
Interestingly, Cutts noted that the Search Quality Team is no longer known by that moniker. They are now known as the Knowledge Team.
Why Panda and Penguin?
Panda came from the last name of the engineer who worked on it. Since Google’s Penguin engineer wasn’t named after an animal, Cutts said they went over a list of the top 100 cutest animals, and Penguin was the final choice. Though the black and white animal was a coincidence, Cutts said it may become a theme for algorithmic updates.
Later during the keynote, Cutts encouraged the audience to do a search for Freebase to understand it. You’re able to download what’s under the hood.
Search Worlds Collide: SES, SMX & PubCon
Grehan and Cutts were then joined onstage by Danny Sullivan, who runs the SMX Conference and Search Engine Land, and Brett Tabke, who runs the PubCon conference and WebmasterWorld, marking the first time that this foursome of search veterans has ever appeared onstage at one time.
The ‘Father of All Penguins’
Is there any plan to release “the father of all Penguins”? Cutts said that, long term, Google wants rankings as close to ideal, and search results of the best quality, as possible. He could see social becoming a bigger signal in the long term. In the short term, Google won’t leave links behind.
Cutts noted Penguin is still in an early stage, whereas Panda is now monthly and they understand it very well. Google is still iterating Penguin, so the changes will be jolting for a while.
Ultimately, Google doesn’t want people to worry about Pandas or Penguins. Google wants to reward sites that have good signals, said Cutts.
Google Leery of Social Signals
Discussion turned to social signals as a ranking factor. The number of Twitter followers is a potential social signal. Google isn’t able to crawl Facebook, either because people set their profiles to private or Google is blocked from crawling.
When Google lost access to Twitter’s firehose, Twitter blocked Google for several weeks. If Google can’t crawl and see how many people you follow or who follow you, they can’t use that as a reliable signal.
Can Google tell how many times a page has been shared/Liked/tweeted? Cutts said they can do a relatively good job, but Google is a little leery of relying on social as a signal. Google crawls 20 billion pages a day.
Google Tries to Get More Transparent
After recounting Sergey Brin’s claim at a past SES that Google is spam-proof, Cutts discussed Google’s newfound transparency. He began by referencing a tweet that said it’s cheaper to do everything legit than to go under the radar.
Cutts recounted how Google started out slowly, sending messages in Webmaster Tools only for hidden text and parked domains. Then, earlier this year, Google sent out messages for everything but egregious black hat spam.
While Google decided to be more transparent, it won’t go so far as to publish the algorithm. Cutts said Google wants to debunk the idea that it gives its own properties an unfair advantage.
Google Doesn’t Hate SEO
As he has in the past, Cutts pointed out that Google doesn’t hate SEO. The goal of SEO is to make websites more crawlable and faster. SEO becomes an issue when spam comes into play, such as if you go overboard buying links, doing comment spam links, or keyword stuffing.
Google the Publisher
Discussion then turned to Google transitioning from being a search engine to a publisher with Knowledge Graph, Google Flights, and Google Places, among others. Webmasters are worried that Google will eat up their traffic and that if they get on Google’s radar, Google just may add another tab and launch a new product.
Cutts reminded the audience that Google’s aim with the Knowledge Graph, or providing a calculator in search results, or reporting sunrise/sunset times in search results is giving users the answer they want rather than making them go to a website.
Obviously, webmasters want as much traffic as possible. Google is looking at the “value add,” Cutts said, such as creating something original (research, analysis, opinion). In Google’s view, if a website can create pages that give users basic information that takes 3 seconds, then it’s fair for Google to give that same stuff to users directly within the search results.
User expectations go up every year. They expect natural language/query understanding. Google is driven by providing what users expect. Whatever you type into the search box, they try to give the best information.
Facts can’t be copyrighted, whether it’s a video game release date or the height of the Eiffel Tower. However, if your site is a collection of facts, then that would become a resource rather than something on the low end of quality.
Cutts says the Knowledge Team doesn’t care whether they make money, lose money, or are neutral. The search team isn’t talking about how much money they’re going to make off of search features, Cutts said.
Google still sends a ton of traffic. People ask for more and more, think Siri, so if Google doesn’t provide that info, everyone’s going to go to another search engine that does. The litmus test is what’s the value add and can Google help users get information faster.
Users have to come first, Cutts said, but Google understands that the web is made up of websites, and those websites have owners and webmasters. It has to be a good value for everyone, or everything will go to apps or walled gardens.
Google is more of an information company than pure search, Cutts noted.
Google Has a Sampling Problem
Google has seen more than 30 trillion URLs and crawls 20 billion pages a day. One hundred billion searches are conducted each month on Google (3 billion a day).
Despite this, Cutts said Google still has a sampling problem. There are always spider traps and the web is always changing.
Cutts said it’s hard to know who published what first. Over time, Google can detect when one site seems to be publishing original content, while another site seems to be publishing duplicate content.
Google has worked hard on returning original content. Google has to crawl, it can’t magically know the web.
Google Doesn’t Put a Lot of Weight on +1’s
How much does Google+ help rankings? It’s a signal Google will look at, and they’ll see how good it is. Over time, they will continue to experiment. Cutts said Google doesn’t put a lot of weight on +1’s yet.
Cutts also pointed out that, in response to feedback, when you do searches you’re much less likely to see Google+ in search results, compared to January. The 10-year trend might be that the more real you look in search results, the better you will do, but don’t assume it’s automatically a ranking signal.
Hey, Google: Tell Me What to Fix!
Why doesn’t Google rate sites, specifically telling website owners what’s wrong? Actually, Cutts revealed that’s the direction Google wants to move in. As an example, he noted you’ll be alerted if you’ve shot yourself in the foot with robots.txt.
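For readers who want to see what “shooting yourself in the foot with robots.txt” can look like, here is a minimal sketch using Python’s standard `urllib.robotparser` module. The sample robots.txt contents, the example.com URLs, and the `blocked_for` helper are all hypothetical illustrations; this is not how Google’s own alerts work, only a quick local check a site owner could run.

```python
from urllib.robotparser import RobotFileParser


def blocked_for(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text disallows user_agent from url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical mistake: a blanket Disallow left over from a staging site.
BROKEN_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

# Hypothetical fix: only one directory is kept out of the crawl.
FIXED_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    page = "http://example.com/products/widget.html"
    print("broken file blocks the page:", blocked_for(BROKEN_ROBOTS_TXT, "Googlebot", page))
    print("fixed file blocks the page:", blocked_for(FIXED_ROBOTS_TXT, "Googlebot", page))
```

A single `Disallow: /` line under `User-agent: *` is enough to keep every compliant crawler away from the whole site, which is the kind of self-inflicted problem Cutts described.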
Cutts says they will “turn up the knob” on transparency, to tell site owners that the site is good but Google might not trust some links.
Cutts acknowledged that it doesn’t do anybody any good if Google doesn’t give actionable advice. He thinks Google will get there toward the end of the year and going forward.
Panda! Penguin! Panic!
After Panda/Penguin, there’s a lot of extreme panic, with some people and websites taking extreme measures such as sending cease and desist letters over links. Google tries to make incremental changes that are, for the most part, gradual and logical.
Sometimes things have drifted off course with link spam techniques and directory schemes. Taking this action was a big change and sites were unhappy but Google thought it was best for users, though people were shaken by the course correction.
Create a site that will stand the test of time, Cutts said. A site that people will tell their friends about and bookmark. That’s the site Google wants people to build.
By this time next year, Cutts hopes people are less likely to do link buying, blog comment spam, etc.
Cutts said Google has been consistent. They try to show the content they think came first or has the most value. If you have 2,000 items from an affiliate feed then your site isn’t all that great in Google’s eyes.
You don’t need to worry about duplicate content on your own site unless it’s on a massive scale. If you repeat 2-3 paragraphs on every page, Google may not penalize the site, but the repeated text may not count in terms of value added.
Make a few pages with original content. If the repeated text is a sentence or a paragraph, it’s fine. Otherwise, Google doesn’t want to highly rank the same content with one different link, as that’s close to doorway pages.