WebPromo’s Q&A with Google’s Andrey Lipattsev [transcript]

Date published: 6 April 2016

As we reported at the end of March, Andrey Lipattsev, the Search Quality Senior Strategist at Google Ireland, revealed the three most important ranking signals used by Google.

These are content, links and RankBrain (although that last one is “hotly contested” and the list doesn’t have an order).

This information emerged during an excellent live Q&A from WebPromo, which also featured Rand Fishkin from Moz; Ammon Johns, Managing Director at Ammon Johns & Co.; and Eric Enge, CEO at Stone Temple Consulting. The whole thing was hosted by WebPromo’s Anton Shulke.

We’ve partnered with Anton to bring you a transcript of the entire one-hour long Q&A.

As you can imagine, it’s a very lengthy read. We have trimmed it for any repetition or digressions, but there was so much brilliant chat between the group that we’ve decided to keep it in its 7,000+ word form.

The discussion covers:

Removal of PageRank toolbar
Click-through-rates as a ranking signal
Google’s top ranking signals
Machine learning vs spam
The state of SEO outside of the US and Western Europe
New mobile-friendly update

Now set aside half an hour, pour a cup of coffee and enjoy.

A massive thank you to our staff writer Rebecca Sentance, who spent a huge portion of her day transcribing this video.

Removal of the PageRank toolbar

Ammon Johns: Why remove the PageRank toolbar if PageRank is still a part of Google’s ranking algorithms – which of course, we all believe it is – even if it wasn’t used on its own?

Andrey Lipattsev: I get it. And it’s a really good question – I promise you I do have a couple of answers, which I hope are reasonable – but let me ask you a question back. Why do you think it was useful in the first place? Why was it a good thing to have?

Ammon: When it was still being updated it helped give me a rough idea of what the crawl priority of a site might be. And for that reason, it was useful to me because I could see that if a client comes to me and they’ve been following some SEO advice, they’ve had the blog, they’ve had the news feed, they’ve built up thousands and thousands of pages – but they’ve got a Toolbar PageRank of two, I’m already thinking…

‘With a PageRank of two, they’ll be lucky to get a thousand pages regularly – spidered and indexed, every month – given that there are so many news sites, there are so many things that Google has to pick up, all of the time; this just isn’t going to be a high enough priority to have this number of pages. They may be giving Google more than it can digest, and therefore detracting from where the value is – the core pages.’

So it had a use there. It had a second use, which was more financial. Lots and lots of bad SEOs used to base their strategy on Toolbar PageRank. As long as they’re doing a bad job, there was more work for people doing a good job like myself.

Andrey: By all means. And I think the second reason alone would have been sufficient to get rid of it. But I mean, if that was the only reason, we probably would have tried to keep working on that.

You know that it wasn’t so much removed as it died a natural death more than anything. Nobody was looking at it, nobody was developing it because it wasn’t bringing very much value internally. Essentially, it became so out of date that, going back to Ammon’s first point about its usefulness, that kind of all went away, leaving only the second value there.

So it was no longer a valid benchmark for a site’s usefulness, for a site’s likelihood to be crawled more often, or ranked well, because a) it was out of date, and b) there were a lot of other things in place where – you’re saying Toolbar PageRank two would have made it less likely to be crawled? Not really.

There’s a lot of other stuff in place, and PageRank two could have been ranked above PageRank eight very easily depending on what else was going on.

Ammon: That’s ranking, though – I’m not talking about ranking, I’m just talking about indexing, and I found the correlation there, across thousands of sites, was very high.

Andrey: But that link to me is so tenuous, between Toolbar PageRank and indexing that I’m not even sure that thing ever really existed.

It was supposed to be a reflection of the actual PageRank of a page, and that has no bearing on how often the page gets crawled.

Ammon: If you gave us the actual PageRank, I’d have used that, believe me, but you wouldn’t share that. I did ask!

Andrey: I can understand, you know, the second-degree links between the page’s PageRank and how often we’d come back to it, but there’s a lot of other things in play there, that also need to be taken into account. So no matter what you’re thinking about, whether crawling or ranking… it a) has gradually become just one thing out of very many, and therefore not reflective of the real picture; and b) has stopped reflecting what it was supposed to reflect in the first place. So it wasn’t very useful.

And as I said, c) it became something that it was never supposed to be, it became something of a currency for some SEOs. And I’m not saying everybody was doing it, but clearly, as you yourself acknowledged, a lot of people started using it like that, and a lot of SEO contracts would be ‘I will get your PageRank to this’ which is kind of meaningless, really.

And so, what I was going to tell you beforehand is, this reduced meaningfulness and also, we are hoping that the improved stats we are providing now – for example to Search Console, the improved search analytics report – are the stats we’re looking at. They are the bots you’re looking for.

You know, your clicks, and your impressions, and the queries and the pages; that’s what you should be looking at, and as an owner, as an SEO, comparing it to other people if you have the data, and so on and so forth. And build your strategies, and your analysis on this kind of data, not just on one number, which was kind of neither here nor there.

Not to mention the fact that, last but not least, I have not seen that famous toolbar, in which it was supposed to be a plugin, on anyone’s computer for a long time. Granted I am a very biased sample, because most people around me have Macs and Linux machines anyway, but I haven’t seen anyone with that toolbar with the thing in it.
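As a rough illustration of the kind of analysis Andrey recommends (looking at clicks, impressions, queries and pages rather than one number), the sketch below aggregates clicks and impressions per query and derives a click-through rate. The rows are made-up sample data and the function name `ctr_by_query` is hypothetical; this is plain Python over exported-style figures, not a Search Console API.

```python
from collections import defaultdict

# Hypothetical sample rows: (query, page, clicks, impressions).
rows = [
    ("blue widgets", "/widgets", 30, 1000),
    ("blue widgets", "/blog/widgets", 5, 500),
    ("widget repair", "/repair", 12, 100),
]


def ctr_by_query(rows):
    """Sum clicks and impressions per query, then return clicks / impressions."""
    totals = defaultdict(lambda: [0, 0])  # query -> [clicks, impressions]
    for query, _page, clicks, impressions in rows:
        totals[query][0] += clicks
        totals[query][1] += impressions
    # Guard against queries with zero impressions to avoid dividing by zero.
    return {
        query: (clicks / impressions if impressions else 0.0)
        for query, (clicks, impressions) in totals.items()
    }


report = ctr_by_query(rows)
for query, ctr in sorted(report.items()):
    print(f"{query}: {ctr:.2%}")
```

Grouping by page instead of query only requires changing which field is used as the dictionary key.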

Click-through rate as a ranking signal

Eric: I want to talk a little bit about click-through rate, and I want to talk about it from a couple of perspectives. Rand, at SMX Munich, I believe, ran a fresh test that showed at least a temporary movement of the ranking of an item that they were trying to promote by sending a lot of clicks to the page, where it kind of jumped in the rankings, and then over time it came back down.

And then in addition, Paul Haahr gave a keynote at SMX West in which he walked through ‘how Google Works’ and he talked for a while about click-through rate, more in the context of controlled tests by Google, so [Google] will roll out some algorithm change, and one of the things it might look at is the user click-through rate on the revised search results, in order to see whether that’s a better result.

And then he also explained that the reason why Google doesn’t use it as a general ranking factor is because it’s too gameable, but in this controlled test environment, it allows you to use user interaction with the result in the way of measuring search quality, to decide whether to roll out a new algorithm update.

And just to finish my rather complex question, the point of that is that it seems to me that if you’re using click-through rate as a main measurement of other ranking factors, to better measure search quality, it doesn’t really matter to me that much whether it’s a direct ranking factor or an indirect ranking factor. It still is used in evaluating search quality.

Andrey: To be honest, I think I’m kind of with you there, and what Paul said, I’m not going to disagree with Paul, but also what we’ve been saying before. I think, if you look through the majority of our past comments on this topic, you won’t find anything to the contrary.

We do, if you like, use that as a factor to assess our quality, and treat it as you like, fair enough, in that sense, if you’d like to… It’s just, it’s very important, because you know how headlines go. Tomorrow’s headlines from anybody who watches us today will be, ‘Google uses behavioural factors for ranking!’ And you know what people will interpret from that.

Because on the one hand, yes, in the sense that you described; on the other hand, no, in the sense that most people understand.

So it’s very important to kind of come in with a slightly more complicated explanation. Somebody commented to me the other day about being able to answer yes or no to questions, and I told them that sometimes there are questions that are not yes or no, even if you phrase them in a yes or no fashion; it doesn’t make it possible to give a yes or no answer.

So anyway, coming back to what you said, I think you described it pretty accurately.

Rand: Why is it the case that seven or eight times in the last two years, I’ve done something, just having a little fun, so I’ll be standing on a stage in front of 500 to a couple thousand people, and I’ll ask them ‘hey, can you all jump on your mobile phones, or on your laptops, and do a search? And I want you to click the seventh, eighth, ninth, tenth result, and then over the next 24 hours, let’s observe what happens to that query, and what happens to that page’s ranking for that query.’

I’ve had seven or eight of those that have been successful, and I’ve had four or five where the ranking did not change. And I’ve run a few of them over Twitter, again, a few where the ranking did change, and changed pretty quickly, and usually sticks around a day or two, and a few where nothing happened.

So in your opinion, what’s happening there that’s making the ranking for the page that gets clicked on change so rapidly, and then what’s happening when it falls back down, again relatively rapidly over the next day to two/three days?

Andrey: It’s hard to judge immediately without actually looking at the data in front of me. In my opinion and what my best guess here would be, is the general interest that you generate around that subject – by doing that, you generate exactly the sort of signals that we are looking out for. Mentions, and links, and tweets and social mentions – which are basically more links to the page, more mentions of this context – I suppose it throws us off, for a while. Until we’re able to establish that none of that is relevant, to the user intent.

Eric: So back to the other part of my question, I just want to acknowledge that, I agree that a lot of people might run off and say, ‘oh my god, click-through rate is a ranking factor in a more general sense!’

And personally, I understand why that would be bad, because it’s so gameable, but if you look at it more holistically, it seems to me that engagement signals and ways that users interact with content, using that as an indirect thing where you’re using it to qualify search quality, so you can pick other ranking signals that reflect that well…

As I internalise that, it gives me I think more ammunition to point people to make better websites and better webpages. And that’s the reason why I was asking the question, because to me I like to be able to show people why that’s so important and why they should think to themselves that this will help them, over time, with their SEO.

Not because you’re employing it directly in your algorithm, but because you’re using it to qualify your algorithm.

Andrey: Eric, I think you’re absolutely right, and your message to the people that you work with is absolutely spot on.

I think it is already significant enough that we’ll look at what users do on our search result pages, as we change them, to evaluate the usefulness of these changes, for people to take into account, you know, when Google changes the algorithm the next time and my page gets exposed, people like it or don’t like it, come to it or don’t come to it, I should probably pay attention to how people like or don’t like what I’m trying to offer them, as part of what it appears like in Google search results.

The disadvantages that I’ve most often seen described for this approach on a clear, pure ranking factor basis is that we’d need to have broad enough and reliable enough data about bounce rates, click-through rates, depth of view for the vast majority of pages and the vast majority of websites everywhere, in order to be able to make meaningful comparisons all the time.

That is impossible, because we don’t have the technical means to do it. Even when you think about Google Analytics, not everybody has a Google Analytics code by far, so we can’t use that.

If we don’t use that, what else are we gonna use? Start trying to come up with something we could use, but it’s always going to be a struggle.

Going back to the original idea of links, you can kind of reasonably say, ‘I see a page, I see the links on that page, I see where they’re going’ – if we can see them, granted they can be no-followed, they can be hidden, that’s like little things, but by and large, they’re here, we can see them, we can use them. The words on the page.

This stuff, any reasonable person with a bit of experience could think of ways how you could measure user behaviour for particular pages and particular websites, but measuring it web-wide…

Rand: Andrey, correct me if I’m wrong, but you don’t need it particularly web-wide, you only need it on the search queries, right? So Google sees that, on average…

Andrey: Right, but is that enough? Imagine we only ever measured the links for pages that appear in our results. Or, like, the top 10 of our results. So you end up with a very small subset of everything that’s out there, and the worst thing about it is you end up reviewing only that, because it’s the only thing that ever gets clicked on, because it’s the only thing that ever gets shown.

You can find ways to get out of it, I agree with you. You can find ways around it, and solutions to it, but this is the reason we’ve been saying it’s a tough challenge. It’s gameable on the one hand, and it’s a tough challenge to actually make a very strong signal out of it.

If we solve it, good for us, but we’re not there yet.

Rand: I mean, I think that you have filed some patents, have written some papers about using pogo-sticking, and pogo-sticking certainly seems like a very reasonable way to measure the quality of search results. If something gets a lot of clicks, and then people click ‘back’, or they click on something else, clearly that result didn’t fully satisfy them. So that seems like a very reasonable user approach.
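To make the pogo-sticking idea concrete, here is a minimal sketch that computes, for each result, the share of clicks followed by a quick return to the results page. Everything in it is an assumption for illustration: the click log is invented, the 10-second threshold is arbitrary, and the function name `pogo_stick_rate` is hypothetical. It does not describe how Google measures anything.

```python
from collections import Counter

# Assumed cut-off: a visit shorter than this, followed by a return to the
# results page, is counted as a pogo-stick. The value is arbitrary.
SHORT_DWELL_SECONDS = 10

# Hypothetical click log: (result_url, dwell_seconds, returned_to_results).
clicks = [
    ("a.example/page", 4, True),
    ("a.example/page", 120, False),
    ("b.example/page", 3, True),
    ("b.example/page", 6, True),
    ("b.example/page", 45, True),
]


def pogo_stick_rate(clicks, threshold=SHORT_DWELL_SECONDS):
    """Return, per result, the fraction of clicks that were quick returns."""
    total = Counter()
    bounced = Counter()
    for url, dwell_seconds, returned in clicks:
        total[url] += 1
        if returned and dwell_seconds < threshold:
            bounced[url] += 1
    return {url: bounced[url] / total[url] for url in total}


rates = pogo_stick_rate(clicks)
for url, rate in sorted(rates.items()):
    print(f"{url}: {rate:.0%} of clicks were quick returns")
```

A single number like this says nothing about why the user came back; for queries where people compare several results, a high rate may be perfectly normal.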

Andrey: That is a reasonable approach; it is one of them, so as a patent goes, that’s an interesting idea, but take it into account, what if the nature of the query is such that you won’t go to that? You’re comparing things; you want to see this one, and you want to see another one, and then you want to make a decision.

So all these things come into account… We experiment and explore other ways, not just this. Can we look at what queries are? And can we just go back to the basics of what’s the content on the page? How can we understand that better? And how can we understand the entities on the page? And so on.

I think there can be more research done into user behaviour factors, and how we use them well. But there’s also like a million other avenues of research, and maybe some of them will be more promising.

What are the top signals Google uses for ranking?

Ammon: RankBrain has become this new keyword that everyone’s latched onto; I’m seeing already companies that are selling ‘We’re the SEOs that have taken into account the latest RankBrain upgrades’ – despite the fact they can’t possibly, because we’re all still examining what it does, what its limitations are, and there’s no way of knowing that from the outside completely, especially since it seems to combine with several others.

Now my understanding of Hummingbird is that it’s led to this, that Hummingbird brought in more context, more of the idea that the meaning of the query was more important than the words of the query. And I think the natural consequence of that is, there’s times when it isn’t. There’s times when the way we’ve worded it is very specific, and it seems that RankBrain is one method of being able to spot this – Gary Illyes’ example was the word ‘without’. That one word was the most important word in the query. ‘Can I complete this without such-and-such?’ ‘Without’ couldn’t be changed.

So… Is that kind of where we’re going with this? We’ve heard that this is the third most important signal contributing to results now. Would it be beneficial to us to know what the first two are? Could webmasters build better sites if they know what the first two are?

Andrey: Yes; I can tell you what they are. It’s content, and links pointing to your site.

Ammon: In that order, or another order?

Andrey: There is no order. Third place is a hotly contested issue. I think… It’s a funny one. Take this with a grain of salt.

It’s obviously up to Greg the way he chose to phrase it when he was doing it, and I understand where he was coming from, being somebody who worked on that. The way I interpret his meaning is that if you look at a slew of search results, and open up the debugger to see what has come into play to bring about these results, certain things pop up more or less often. Certain elements of the algorithm come into play for fewer or more pages, in fewer or more cases.

And so I guess, if you do that, then you’ll see elements of RankBrain having been involved in here, rewriting this query, applying it like this over here… And so you’d say, ‘I see this two times as often as the other thing, and two times as often as the other thing’. So it’s somewhere in number three.

It’s not like having three links is ‘X’ important, and having five keywords is ‘Y’ important, and RankBrain is some ‘Z’ factor that is also somehow important, and you multiply all of that… That’s not how this works.

The way we can look at it in a useful way is that we are trying to get better at understanding natural language, and applying machine learning and saying ‘What are the meanings behind the inputs?’

It’s still early days; we cannot claim that typed queries, whether mobile or desktop, have reasonably subsided or are going away. But more and more so, people are interacting with their devices using voice. So we can expect the use of stop words, words like ‘without’ more often.

People still tend to be a lot more mechanical and overthink their queries a bit – as I tend to, anyway – and think, ‘Okay, so what is a query that is completely not human in nature, that sounds like what the machine would understand?’ I don’t know if you guys catch yourselves doing that. You don’t generally type a question to Google as you would ask a real person.

Ammon: I do structured queries; what’s the most important concept? Right, I’ll put that first… What’s the modifier to that?

Andrey: You’ve gotta admit that in that sense we are pretty advanced users; a little bit outside the norm in this sense. As a company I guess we’re not so much looking to support us guys, because I think we’ll kind of figure our way out, but to start supporting people who are just joining the net. And to whom, for example, the mobile experience is the first experience, and the Google search application is their first experience of interacting with the web and getting answers to their questions.

And they don’t know that you need to say robotic words and omit commas, dots and stop words, everything; they just speak. And they’ll say, ‘How can I complete Mario Kart without cheating?’ They’re not going to think about the word ‘without’ in that sentence, so it’s up to us to figure out what’s behind all of that.

I think there was a bit of conversation on Twitter, as well, with Gary involved, about ‘So what does this affect?’ Does this affect indexing, does this affect ranking, and Gary was trying to say ‘Well no, it doesn’t affect ranking, it’s not a ranking factor’… It becomes a very complex conversation when you get to that.

Ammon: How about ‘everything affects ranking. Otherwise there’s no point having it.’

Andrey: Well ultimately, I guess so, yes; ultimately, even your webpages’ accessibility affects ranking, because if we can’t access them, we can’t rank them, even at that level. [RankBrain] doesn’t affect the ranking of an individual page, but what it does affect is our understanding of a query.

So once our understanding of a query changes, we’re more likely to throw something different as a result. That’s the effect that it has. But it’s not the same effect as knowing there’s so-and-so many links pointing to that page, or knowing there’s such-and-such words on that page. That’s something else.

And I think you mentioned Hummingbird – the very initial roots of that are in the synonymisation attempts, understanding synonyms better and replacing them the old way back then; those were just libraries, to some extent static libraries. This is much more interesting. Also as Greg has said, and some of the other guys have said: we don’t know what it’s doing. We’re not supposed to know, I guess, because it’s machine learning; that’s the whole point. You throw it out there and it does its stuff.

But the idea behind it is this, and it’s the same as when we started talking about Knowledge Graph and introduced it, and that became very much part and parcel of our search results; people expect it to be there, people expect it for entities.

Three years ago, I remember quite vividly – I think it must have been 2013; I was at a conference in Russia and we introduced this for the first time. It was a bit of a struggle to explain – ‘What is it that we’re doing, and what are we doing with all these new things? What is ‘entities’ and how is it different from words?’

Now people expect it. You’re looking for an actor, you’re looking for a city, you’re looking for an event; yeah, you want to see that card, or have a thing from a source, follow links, yeah – that’s the experience.

So this is hopefully enhancing that experience, allowing us to understand what else has that potential, what else can be explained in such a way, what else can be understood better. Negative queries, as Gary called them, that’s definitely one thing that hopefully is going to work better from now on.

Eric: There’s been so much confusion that’s spun out of this, and there’s some things that I think we can perhaps help people understand a little better here. So, as you described it, Andrey, and there’s conversations I’ve had with Gary about it and other Googlers.

‘Better understand language, to help us better match queries with appropriate webpages, from a relevance perspective.’ Put a period, end of sentence. ‘Doesn’t take over other ranking factors’ – is that a fair top-level assessment?

Andrey: I suppose, yes? Probably have to see that in writing to make sure I fully understand every single word of it and how they fit together, but yes, generally speaking, I’m there with you.

Rand: Can I slightly disagree with that last sentence? This is just based on what Gary and I tweeted back and forth the other day, which is – I think the idea of RankBrain is that it can re-order or re-weight ranking elements, in order to produce more relevant results.

Andrey: But I mean, at the very simple level, you know, in the example that Gary gave – Mario Kart without cheats – it’s not the weight you assign to the word ‘without’ in that query, and its presence on the pages you’re going to show for this, or not going to show for this.

It’s hard to kind of structure this answer clearly… When you look at it like that, you’re almost prone to saying ‘Yeah, it is a ranking factor because now we’re going to grab on to those pages that do have this word’ but it’s not just about the word ‘without’, it’s about the context of the whole discussion around this page.

It’s looking at what’s on the page but also how well it met the expectations of people who then interacted with that. Did us guessing that this particular word indicated a particular intent – was that a right kind of guess? And gathering that data continuously and then saying, ‘You know what, fair enough. Now we’re kind of confident that these words, or these combinations of words, indicate a particular intent, so we need to pay attention to such-and-such content.’

And it doesn’t necessarily mean that all of a sudden, links are more or less important for this page or that page. They’re still as important as they were, but now there’s also this other thing, on top of everything we know, we also know that for this query, this is particularly important.

Machine learning vs spam

Rand: Obviously RankBrain is one of the places where you’re most public about it, I think another place where Google is very public about using machine learning is in image search, and it’s really cool what you’ve been able to do, identifying locations of images and features of images.

A few years ago there was some talk about whether to use, or would Google use machine learning in webspam, and would they use it in other ranking factors, like around links or content? Are there any of those elements that you can answer and say, ‘Yes, we have been using machine learning in other factors like spam, or links, or content, or something else, and that’s been useful to us because XYZ’?

Andrey: I can say that we’ve definitely been looking at using machine learning across the board. Including webspam and other areas.

In webspam, I can’t point towards any particular huge success we’ve had, when usin
