Google RankBrain: Clearing up the myths and misconceptions
It’s been nearly 3½ years since Google announced RankBrain, but we have little detail about how it works. Here's what we actually know--and what is myth.
It’s been nearly 3½ years since Google first announced its use of RankBrain (October 26, 2015, although it had started rolling out in early 2015, in multiple languages).
In that time, there’s been little in the way of detail from Google (G) about what it is or how it works.
The result is that numerous SEOs have stepped up to fill that void with their own speculations and opinions, and in doing that, have caused all sorts of confusion.
This is my attempt to correct and clean up some of that mess.
(There is a TL;DR at the bottom if you want to skip the verbiage :D)
Though there isn’t much publicly available, what we do have is fairly specific:
“If RankBrain sees a word or phrase it isn’t familiar with, the machine can make a guess as to what words or phrases might have a similar meaning and filter the result accordingly, making it more effective at handling never-before-seen search queries.”
– Greg Corrado, from Bloomberg’s Google Turning Its Lucrative Web Search Over to AI Machines
Or, if you want it more succinct than that:
“… Lemme try one last time: Rankbrain lets us understand queries better. …”
– Gary Illyes (@methode), on Twitter
Google receives a fair percentage of queries per day that it hasn’t seen before: 15% at last check.
These may include misspellings and typos, elisions/omissions, unusual phrasing or syntactic structures, the wrong word(s) being used, negations (“not x”), things that have only just happened, and so on.
RankBrain (RB) receives these weird, wonderful, and new searches, and attempts to identify existing searches and results that are probably suitable for the searcher’s query.
Again, we aren’t exactly given a guided tour by G on this, but there are a few bits and pieces.
“… RankBrain uses artificial intelligence to embed vast amounts of written language into mathematical entities — called vectors — that the computer can understand. If RankBrain sees a word or phrase it isn’t familiar with, the machine can make a guess as to what words or phrases might have a similar meaning and filter the result accordingly, making it more effective at handling never-before-seen search queries. …”
– Greg Corrado, from Bloomberg’s Google Turning Its Lucrative Web Search Over to AI Machines
So, rather than looking at words and attempting to parse them and understand the semantics (traditional Natural Language Processing [NLP]), it converts them into numbers and plots them on a chart (with multiple dimensions, not just X and Y).
Items near each other possess some form of relationship. The type of relationship will be reflected by each term’s position and distance from its neighbors.
If that sounds vaguely familiar, that’s because it sounds very similar to Word2Vec.
So when G receives a query it doesn’t quite recognize, it can find semantically related pieces, and look at the results.
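To make the "numbers on a chart" idea a bit more concrete, here is a minimal sketch of finding the nearest neighbors of an unfamiliar term. Everything in it is an assumption for illustration: the three-dimensional vectors are hand-picked rather than learned, and the names `EMBEDDINGS`, `cosine_similarity` and `nearest_terms` are hypothetical. It is not how G implements RankBrain, just the general flavor of vector lookups.

```python
import math

# Hypothetical 3-dimensional embeddings, hand-picked for illustration only.
# Real systems learn far more dimensions from large amounts of text.
EMBEDDINGS = {
    "meme": (0.9, 0.1, 0.0),
    "gif": (0.6, 0.5, 0.1),
    "recipe": (0.1, 0.9, 0.1),
    "pronounce": (0.0, 0.2, 0.9),
}

def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_terms(vector, embeddings, top_n=2):
    """Rank known terms by how close their vectors are to the given vector."""
    scored = [(term, cosine_similarity(vector, vec)) for term, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

# An unfamiliar term whose (assumed) vector happens to land close to "meme".
unknown = (0.85, 0.15, 0.05)
neighbours = nearest_terms(unknown, EMBEDDINGS)
print(neighbours)
```

The closer the cosine similarity is to 1.0, the more closely related the two terms are treated as being.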
But, what if it’s wrong?
“…
RankBrain is a PR-sexy machine learning ranking component that uses historical search data to predict what would a user most likely click on for a previously unseen query. It is a really cool piece of engineering that saved our butts countless times whenever traditional algos were like, e.g. “oh look a “not” in the query string! let’s ignore the hell out of it!”, but it’s generally just relying on (sometimes) months old data about what happened on the results page itself, not on the landing page. Dwell time, CTR, … those are generally made up crap. Search is much more simple than people think.
…”
– Gary Illyes (@methode), on Reddit
I’ve added the bold to draw your eye to the key part.
G may go back and look at what gets clicked for different searches, and check how those results performed. This can help the system learn which suggestions are suitable and which ones fail.
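As a rough sketch of that kind of feedback loop, the snippet below tallies clicks on the results page for each interpretation that was tried for an unseen query, then keeps the one that got clicked most often. The log rows, the labels and the `click_rates` helper are all made up for illustration. We don't know what G actually records or how it weighs it.

```python
from collections import defaultdict

# Hypothetical log rows: (interpretation shown for the unseen query, was a result clicked?)
serp_log = [
    ("interpretation A", True),
    ("interpretation A", True),
    ("interpretation A", False),
    ("interpretation B", False),
    ("interpretation B", True),
    ("interpretation B", False),
    ("interpretation B", False),
]

def click_rates(log):
    """Return the share of impressions that led to a click, per interpretation."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for interpretation, was_clicked in log:
        shown[interpretation] += 1
        if was_clicked:
            clicked[interpretation] += 1
    return {name: clicked[name] / shown[name] for name in shown}

rates = click_rates(serp_log)
best_guess = max(rates, key=rates.get)
print(rates, best_guess)
```

Note that this only looks at what happened on the results page itself, which matches Gary's description, and nothing about what happened after the click.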
If so, I was lucky enough to get some help from Bill Slawski, who pointed me to two potentially interesting patents:
The first patent (computing numeric…) was worked on by Greg Corrado, from the Bloomberg quote previously referenced.
If you don’t fancy suffering the trauma of reading the patents, Bill has two far nicer bits that get you the insights without the need for painkillers:
How about we walk through a simple demo of the type of thing that RB does?
Query: How Nemee 2020
Google receives that query, and has nothing that appears to be a match and little that seems above a weak relevance.
So, it needs to do some work.
The query is vectorized, and the nearest neighbors for those vectors are found.
Included in the results are vectors that represent:
So we have two probable query types: “how to make” (create) queries and “how to say” (pronunciation) queries.
But we have a third factor, the “2020”. When we look at the result groups, there are barely any pre-existing queries or results that include time with pronunciation, whereas there are a moderate number of “how to” queries and results that do.
RB decides that the most likely results that match this query are those from the “how to make” queries, and so the results you would receive would match:
“how to make a meme 2020”.
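The walkthrough above can be sketched in a few lines. The numbers are invented: `similarity` stands in for how close each candidate group sits to "how nemee" in vector space, and `year_share` is an assumed fraction of past queries in that group that include a year. The `pick_rewrite` function is hypothetical and only shows how a secondary signal like "2020" could tip the balance between two otherwise close candidates.

```python
import re

# Hypothetical candidate groups surfaced by the nearest-neighbour step.
candidates = [
    {"query": "how to make a meme", "similarity": 0.80, "year_share": 0.30},
    {"query": "how to pronounce meme", "similarity": 0.82, "year_share": 0.02},
]

def has_year(query):
    """True if the query contains a four-digit year such as 1999 or 2020."""
    return re.search(r"\b(19|20)\d{2}\b", query) is not None

def pick_rewrite(query, candidates):
    """Pick the candidate with the best similarity, weighted by the year signal."""
    year_in_query = has_year(query)

    def score(candidate):
        # Only weigh in the year signal when the query actually contains a year.
        weight = candidate["year_share"] if year_in_query else 1.0
        return candidate["similarity"] * weight

    return max(candidates, key=score)["query"]

with_year = pick_rewrite("how nemee 2020", candidates)
without_year = pick_rewrite("how nemee", candidates)
print(with_year, "|", without_year)
```

With these made-up figures, the year pushes the choice toward the "how to make" group, while the same query without a year would lean toward pronunciation.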
No.
And that’s what this post is about — clearing up all the baloney some people have been pushing about “Dwell Time” and “Click Through Rate” and “Bounces” etc.
RankBrain doesn’t use UX signals from your pages.
For quick confirmation:
“… Dwell time, CTR, … those are generally made up crap …”
That’s from Gary’s AMA response I quoted above.
But, you can use a little common sense yourself at this point.
Ask yourself the following question:
Why would a system that is built to try to encapsulate relationships between text-strings be looking at how long someone spent on a page, or how fast they left?
When you stop and look at it that way, and consider the example above, you can see how site based UX signals have no relevance for RankBrain.
The only such metric we know they may use is SERP-based clicks, to identify what type of results appeared relevant to that type of query.
Yes.
Google has even told us that we can 😀
“…
Optimizing for RankBrain is actually super easy, and it is something we’ve probably been saying for fifteen years now, is – and the recommendation is – to write in natural language. Try to write content that sounds human. If you try to write like a machine then RankBrain will just get confused and probably just pushes you back. But if you have a content site, try to read out some of your articles or whatever you wrote, and ask people whether it sounds natural. If it sounds conversational, if it sounds like natural language that we would use in your day to day life, then sure, you are optimized for RankBrain. If it doesn’t, then you are “un-optimize
…”
– Gary Illyes (@methode), talking to TheSEMPost
I know — it’s a bit lame.
All you have to do is fly in the face of standard SEO practices, and aim for the exact opposite of what you would normally go for — high search volume.
Instead, look at all the queries, and then generate variants that aren’t in the lists.
I know, that’s even lamer!
(But, be honest, you did want to know :D)
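If you want to try the "variants that aren't in the lists" idea, a minimal sketch might look like the following. The keyword list, the verbs and the objects are placeholders you would swap for your own research data, and `unseen_variants` is a hypothetical helper, not part of any SEO tool.

```python
from itertools import product

# Placeholder for the queries your keyword tools already report.
known_queries = {"how to make a meme", "how to create a meme"}

# Placeholder building blocks for generating natural-sounding variants.
verbs = ["make", "create", "build"]
objects = ["a meme", "memes", "a meme template"]

def unseen_variants(verbs, objects, known):
    """Combine verbs and objects, then keep only variants missing from the known list."""
    variants = [f"how to {verb} {obj}" for verb, obj in product(verbs, objects)]
    return [variant for variant in variants if variant not in known]

gaps = unseen_variants(verbs, objects, known_queries)
for variant in gaps:
    print(variant)
```

The output is only a list of candidates to review by hand. Anything that doesn't read like something a person would actually type should be thrown out.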
But there is more — particularly for those that deal with time-relevant content; events and occurrences.
As these are “new”, the queries likely will be too (at least partially). To gain an advantage here, you might be able to look at similar searches yourself, and look at the patterns they possess. Once you have some samples and associated search volume data, you can pick and choose the ones you feel are most advantageous and relevant, and then weave them into your content.
If you want a little more insight into RB, and things like Association Rule Learning (delving deeper into the computing side of things), Dan Taylor has a previous article that may be of interest: Here’s how RankBrain does (and doesn’t) impact SEO
No — it’s a matter of inclusion.
Though Google has stated that RB is one of the most influential Ranking Factors, it’s not a typical SEO factor.
Unlike Titles or Link Text, it’s not a gradient or variable — it’s Boolean.
Either you are perceived as relevant, and included in the SERPs for a query — or you aren’t.
So you can optimize for RankBrain — but it isn’t a matter of ranking influence, it’s a matter of index inclusion.
It attempts to answer unknown queries by looking at previous search data and the relationship of the terms used in those searches.
By converting words into numbers and plotting them into vector-space.
It can then break a query into parts and look for similar terms in the vector space to try to understand the relationship and potential intent of the search.
Example:
Query : “how nemee 2020”
Convert query to vectors, find closest vectors, try to calculate probable matches.
Two distinct query types are surfaced: “create” and “say”.
“2020” associates more strongly with “create” than “say”.
RB will return SERPs for “how to make a meme 2020”.
No.
It handles words and vectors.
Things like Bounce Rate, Long Clicks etc. aren’t used.
Yes.
By writing naturally and ensuring your content contains variations.
For some types of content (occurrences/events/news) you may be able to check similar searches and get ahead of the pack.
Not in the traditional SEO sense. It’s not about “position”, it’s about whether you show for that query or not.