Many publishers continue to seek cheap and easy ways to publish large numbers of pages, with the goal of bringing in incremental search traffic. One method many people look at is content curation.
However, content curation can be a very risky practice. There may be a role for curated content on your site, but you may need to place noindex tags on those pages.
Two forms of content curation that can potentially be useful to users are:
- Posting a list of links of the latest news.
- Building pages listing the best resources you can find on a particular topic area.
But, what does Google think about it?
The Content Curation Spectrum
For some background on this, it is useful to watch this video, in which Matt Cutts answers a user-provided question on whether sites are better off removing sections that create duplicate content, such as press releases or news.
Cutts said the answer is probably “yes.” He then drew a line representing a spectrum, beginning with really poor quality sites and ending with the New York Times. I’ve taken the liberty of greatly enhancing his chart as follows:
The New York Times will be fine with curated content, as will certain other major media properties. Why? Getting into the New York Times is hard. They have a strong editorial process.
Let’s break this down a bit and look at how Google may algorithmically determine these things:
- Brand: Obviously the New York Times has a highly recognizable brand. One of the reasons they have a strong editorial policy is that publishing poor quality content would damage that brand. A simple algorithmic approach to evaluating brand strength is counting the number of searches on an entity’s brand name. Lots and lots of searches is a quick indicator of brand strength (I’m not suggesting that Google does this, by the way, but you can use it as a crude measure for yourself).
- Links: The link graph remains alive and well. A rich and robust link graph can be an indicator of an authoritative site. One of the most important patents in search engine history is the Jon Kleinberg patent on hubs and authorities. Regardless of exactly how search engines determine authority today, the link graph is likely still the main input they use. Does your link graph show a lot of authority? Then you move to the right on our Content Curation Spectrum.
- Quality of the Linked Resources: This is something that they could use as well. For an extreme example, if your curated list includes links to obvious spam sites, or really poor quality pages, then the whole list is called into question. Of course, there is a whole spectrum from obvious crap to obvious awesome stuff.
- Publisher Authority: Here I’m specifically referring to the potential use of rel=publisher tags on the content you create. While the tag has been known for a while, little was known about how it might be used until Google’s announcement of in-depth articles. Consider the possibility that Google will use this as a tool to help it track the overall quality of the content published on your site.
- Author Authority: Google has been actively using rel=author as a way to show rich snippets, including the author’s face, in the search results for some time. I predicted in January that Author Rank would become a ranking signal this year, and wrote about it as a potential signal in March. The in-depth articles announcement may even have been a step in that direction. For our purposes, Author Rank is a signal that isn’t tied to a given website, but to a specific author, and can travel with you wherever you as an author publish content.
These are all signals that can clue Google into where a given site is on the Content Curation Spectrum.
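To make the hubs and authorities idea mentioned under Links more concrete, here is a minimal sketch of the textbook iteration usually associated with that work. It is only an illustration: the function names, the toy graph, and the page labels are all made up for this example, and nothing here is meant to describe how Google actually scores sites.

```python
from math import sqrt


def normalize(scores):
    """Scale scores so their squared values sum to 1 (no-op if all are zero)."""
    norm = sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Return (hub, authority) scores for a link graph.

    links maps each page to the list of pages it links to.
    Illustrative sketch only; the names and inputs are hypothetical.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}

    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages linking to it.
        new_auth = {page: 0.0 for page in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]

        # A page's hub score is the sum of the authority scores of pages it links to.
        new_hub = {
            page: sum(new_auth[dst] for dst in links.get(page, []))
            for page in pages
        }

        auth = normalize(new_auth)
        hub = normalize(new_hub)

    return hub, auth


# Toy graph: two curated lists pointing at two resources.
toy_links = {
    "list_a": ["big_paper", "small_blog"],
    "list_b": ["big_paper"],
    "big_paper": [],
    "small_blog": [],
}
hub_scores, auth_scores = hits(toy_links)
```

In this toy graph, the resource that both lists link to ends up with the higher authority score, and the list that links to more well-regarded resources ends up with the higher hub score. A curated page is, in these terms, trying to be a good hub.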
How Will Google Use These Signals?
As explained in “Google Doesn’t Care if it Ranks Your Site Properly,” Google really isn’t targeting your site individually (unless you’re penalized). Google’s primary focus is on improving the quality of their product.
Perfect ranking of each individual site is fundamentally not an achievable goal for them. They operate at a different level, and they work hard at making their product better. How that affects organic search traffic to individual sites is a side effect for them.
As it relates to content curation, the main point is that Google already has a curated list of content. It’s called their search results. Yes, this list is algorithmically curated, but it’s a curated list nonetheless.
Google’s curated content is backed by an extraordinary amount of scientifically-driven testing. If you’re using software to curate content for you, the chance that your machine-generated list is better than Google’s is basically zero.
Curated Content vs. Search Results
Your hand-curated content isn’t that much different from a set of search results. Long ago Google made it clear that they don’t want to show your site’s search results within their search results.
Unless there is some clear differentiation, it is my opinion that curated content is in the same boat as a site’s search results. Google has little reason to show it, because a different set of search results, hand picked or not, doesn’t really add any value to people over what they get in their search results.
Don’t get me wrong, truly expert humans can probably pick a better list than Google, or perhaps even just a comparable list, with a somewhat different focus. But how might Google actually detect that added value?
Google can clearly detect the New York Times. They can clearly detect the people on the far left of the Content Curation Spectrum.
But what if you’re in the middle? If I were a Google engineer, I’d place no value on the middle either, and I wouldn’t rank it, unless other signals gave a clear indication that something was different about it.
The curated content list may be great stuff, but there’s no way to know really, and it isn’t worth the risk. Besides, there may be one or two highly authoritative lists of curated content in their search results (more power to you if you’re one of them!), so showing another one from someone of unknown authority doesn’t make sense for their product.
What Should Publishers Who Want to Curate Content Do?
Now, the curated content you create isn’t necessarily resulting in a poor quality page for your site, but for purposes of this discussion that doesn’t matter. From an SEO perspective, what matters is whether that unique and differentiated quality is detectable both by users and Google.
In the video, Cutts gives some indications of what this might take. Here are three key phrases he uses:
- “a lot of effort into curation”
- “editorial voice”
- “distinctive point of view”
He also talks about including access to information that isn’t otherwise generally available. Here are some ideas on how you can make it quite different:
- Recognized Expert Analysis: In addition to producing the list of resources, add commentary and analysis from recognized experts and cement that with rel=author tagging. This is where the “editorial voice” and “distinctive point of view” come in.
- Unique Expert Reviews: Include expert reviews and commentary on the curated content. The key here: these reviews aren’t published elsewhere. Rel=author tagging is a good idea here, too.
- Data Sources: Accessing data sources not available to Google is also useful. Bear in mind, though: it’s critical that these data sources be very distinct and differentiated in a way that will be immediately obvious to both users and Google.
- Freshness: If you have a method for updating this in real time, and significantly faster than Google does, this may work as well. Note: a regurgitated news feed fails this test.
If you can meet one or more of these tests, great! Or you may want to publish the page anyway because you think your users will value it. That’s fine, but if the Google-perceivable value isn’t there, I would noindex those pages.
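As a rough sketch of how that rule of thumb could be wired into a site’s templates, the snippet below decides per page whether to emit a noindex directive. The `CuratedPage` class, its field names, and the choice to keep `follow` on noindexed pages are assumptions made for this illustration; the only parts taken from real practice are the standard robots meta tag and the `X-Robots-Tag` HTTP header.

```python
from dataclasses import dataclass


@dataclass
class CuratedPage:
    """Hypothetical record describing one curated page (field names are assumptions)."""
    title: str
    has_recognized_expert_analysis: bool = False
    has_unique_expert_reviews: bool = False
    has_unique_data_sources: bool = False
    is_updated_in_real_time: bool = False


def passes_differentiation_test(page):
    """True if the page meets at least one of the four tests listed above."""
    return any([
        page.has_recognized_expert_analysis,
        page.has_unique_expert_reviews,
        page.has_unique_data_sources,
        page.is_updated_in_real_time,
    ])


def robots_directives(page):
    """Return the robots directive string for this page.

    Keeping 'follow' on noindexed pages is an assumption of this sketch.
    """
    if passes_differentiation_test(page):
        return "index, follow"
    return "noindex, follow"


def robots_meta_tag(page):
    """Build the robots meta tag to place in the page's <head>."""
    return '<meta name="robots" content="{}">'.format(robots_directives(page))


def robots_header(page):
    """Build the equivalent HTTP response header as a (name, value) pair."""
    return ("X-Robots-Tag", robots_directives(page))


plain_list = CuratedPage(title="Latest industry links")
reviewed_list = CuratedPage(
    title="Best resources, with expert reviews",
    has_unique_expert_reviews=True,
)
print(robots_meta_tag(plain_list))
print(robots_meta_tag(reviewed_list))
```

The flags still have to be set by an honest human judgment about each page. The code only makes sure the decision is applied consistently once it has been made.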