How to Detect and Deal With Toxic Content (That Could Poison Your Entire Site)

It's critical to habitually audit your domains for low-value content that could negatively impact organic search visibility. Here are some efficient ways to automate and scale your efforts by leveraging the following tools, tips, and tactics.

With the recent Panda 4.0 update, it’s more important than ever to tend your content flock. Panda isn’t going away. In fact, Google continues to dial up the importance of quality content and user engagement signals and is only ramping up the frequency of Panda updates and refreshes.

It’s critical to make Panda-proofing your site a priority, by habitually auditing your domains for low-value content that could negatively impact organic search visibility down the road.

The practice of regularly auditing content on your site should include:

  1. Hunting for and flagging content that might be low-value or poor-quality.
  2. Cleaning up your low-value content (either removing it, improving it, or noindexing it).
  3. Making it your mission to only publish high-value, informative, engaging content (the most effective way to Panda-proof your site).

Theoretically, if you only publish quality content, you should never be at risk. But often that’s easier said than done, especially if you run an ecommerce site, or you work with teams of contributors, or you don’t control what gets published.

Even if you feel you’ve been publishing amazing content exclusively, most sites host legacy content that can be pretty un-amazing. So you still may need to do some house cleaning to get your site in shape.

What Is Low-Quality Content?

Content that’s poor in quality or low-value can come in many different flavors. Examples include:

  • Poorly written content: Maybe you bought large blocks of content for $5 an article from content brokers or dollar sites. Or maybe you used article spinners or concatenation schemas to scale and produce “unique” content across hundreds of automated listing pages across your site. Or maybe it’s as simple as each time you read certain pieces of content on your site you think “this isn’t very good.”
  • Thin content: Pages that are too brief to convey any useful information could be seen as “thin.” So pages that are less than a few hundred words could be problematic. Now, I’m not saying it’s all about word count, since there are instances where you can deliver value in a few hundred words. But that’s generally not the case. So if you’re taking inventory, low word count is a key indicator to note for locating potential trouble spots across your site.
  • Duplicate content: Very similar or exact versions of the same page across different URLs, boilerplate copy across multiple pages where there’s very little unique text, duplicated manufacturer product descriptions, and so on. These are all real-world examples of duplicate content.
  • Expired content: Do you have expired content on your site? Examples would be special offer pages with expired dates, old job listings, expired real estate listings, out-of-stock products, and so on.
  • Dead content: Pages that serve a 404 error because that file/content no longer exists.
  • Un-engaging content: Even if your content doesn’t fall in any of the categories above, it may still be a potential threat because users don’t find it particularly interesting. Just because it’s 100 percent original, meaty, and grammatically sound doesn’t mean it’s engaging by default. It could contain flawed research, or the tone is dull, or it’s poorly structured, or maybe it’s totally self-promotional. Any of these factors could lead to poor user engagement signals (short dwell time, high bounce, etc.), which in a nutshell is at the core of Panda.

Methods to Unearth Low-Quality Content on Your Site

So how do you find this potentially toxic content? One way is to manually click around and review every page on your website to assess its quality and value to your audience. But that’s pretty laborious, especially if you plan to make it a regular practice.

Another more efficient way is to automate and scale your efforts by leveraging the following tools, tips, and tactics.

Screaming Frog

One tool I rely heavily on for a range of tasks is Screaming Frog. You can run a crawl to find three different content trouble spots, including thin content, duplicate content, and 404 errors.

  • Duplicate Content: One of the quickest ways to spot duplicate content with Screaming Frog is to run a crawl report of your site and identify duplicated page titles and/or file names. You can sort your spreadsheet crawl doc by title or file name, A-Z.

If you find page titles or file names that are identical, or even just very close, check whether those pages are duplicate versions of the same content.
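If you’d rather not eyeball a sorted spreadsheet, the same check can be roughed out in a few lines of Python. This is only a sketch under some assumptions: that you’ve exported the crawl to CSV, and that the export has `Address` and `Title 1` columns (check the header row of your own file). The function names and the 0.9 similarity threshold are my own illustrative choices, not anything built into Screaming Frog.

```python
import csv
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def load_rows(path):
    """Read a crawl export (CSV) into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return list(csv.DictReader(handle))


def normalize(title):
    """Lowercase and collapse whitespace so trivial differences don't hide duplicates."""
    return " ".join(title.lower().split())


def exact_duplicates(rows):
    """Group URLs that share the same normalized page title."""
    groups = defaultdict(list)
    for row in rows:
        groups[normalize(row["Title 1"])].append(row["Address"])
    return {title: urls for title, urls in groups.items() if len(urls) > 1}


def near_duplicates(rows, threshold=0.9):
    """Return URL pairs whose titles are very close but not identical."""
    pairs = []
    for a, b in combinations(rows, 2):
        title_a, title_b = normalize(a["Title 1"]), normalize(b["Title 1"])
        if title_a != title_b and SequenceMatcher(None, title_a, title_b).ratio() >= threshold:
            pairs.append((a["Address"], b["Address"]))
    return pairs


# Hypothetical sample rows standing in for load_rows("crawl_export.csv").
sample = [
    {"Address": "https://example.com/red-widgets", "Title 1": "Red Widgets | Acme"},
    {"Address": "https://example.com/red-widgets?ref=nav", "Title 1": "Red Widgets | Acme"},
    {"Address": "https://example.com/red-widget", "Title 1": "Red Widget | Acme"},
    {"Address": "https://example.com/about", "Title 1": "About Us | Acme"},
]

duplicate_groups = exact_duplicates(sample)
close_pairs = near_duplicates(sample)
```

The pairwise comparison grows quadratically with the number of pages, so on a very large site you may want to run it one section at a time. Either way, treat the output as a list of URLs to review by hand, not a verdict.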

  • Thin Content: Screaming Frog also allows you to automate the process of hunting for content that is potentially thin or low value by delivering word count totals for each page on your site. You can sort by smallest to largest and flag suspect URLs to investigate further.

  • 404 Errors: Screaming Frog also generates a status code column in each crawl report where you can sort by 404 errors on your site.
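The thin content and 404 checks can be scripted against the same crawl export. Again, this is a sketch rather than a definitive recipe: I’m assuming the rows have `Address`, `Status Code`, and `Word Count` columns, and the 300-word cutoff is just a stand-in for “a few hundred words” that you should tune for your own site.

```python
def thin_pages(rows, min_words=300):
    """Return (word_count, url) pairs for live pages under the cutoff, smallest first."""
    flagged = []
    for row in rows:
        if row["Status Code"] != "200":
            continue  # only count words on pages that actually resolve
        words = int(row["Word Count"] or 0)
        if words < min_words:
            flagged.append((words, row["Address"]))
    return sorted(flagged)


def dead_pages(rows):
    """Return the URLs that came back with a 404 status code."""
    return [row["Address"] for row in rows if row["Status Code"] == "404"]


# Hypothetical rows; CSV values arrive as strings, so the sample uses strings too.
crawl = [
    {"Address": "https://example.com/guide", "Status Code": "200", "Word Count": "1450"},
    {"Address": "https://example.com/tag/widgets", "Status Code": "200", "Word Count": "85"},
    {"Address": "https://example.com/faq", "Status Code": "200", "Word Count": "240"},
    {"Address": "https://example.com/old-offer", "Status Code": "404", "Word Count": "0"},
]

thin = thin_pages(crawl)
dead = dead_pages(crawl)
```

A low word count only marks a page as worth a look. As noted earlier, some pages deliver real value in very few words.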

Google Analytics Custom Report

Another way to unearth low-value content is to look at user engagement signals in Google Analytics. Typically, content that is poorly written, low in quality, or un-engaging will exhibit substandard engagement signals.

These engagement signals would include:

  • Bounce Rate: Whether a user returns to or “bounces” back to the search results without going deeper into your site may be an indicator of an unfulfilling or poor experience for users.
  • Average Time on Page: Are users spending a very short time on some specific pages on your site compared to others? Is there a reason why this dwell time is so low, and could it be because the content is lacking value?
  • Pageviews: Pages on your site that get very few unique views might be a source of concern. Why are they getting so few views, and does that indicate a lack of quality?
  • Pages per Session: Are users digesting additional pieces of content on your site in a session? If not, it could mean they didn’t find any real value in the initial piece of content they read.
  • Goal Completions: Low goal completions won’t negatively impact your organic performance or invoke the wrath of Panda, but they could be an indicator that your content needs an overhaul or your messaging is off. Again, it’s worth noting when looking for trouble spots.

To help quickly access and evaluate these metrics in Google Analytics, I’ve created a custom User Engagement Analysis Report you can apply to your own site. Click this link and assign it to a profile.

Now, it’s worth noting that none of these metrics on its own would necessarily constitute a poor user experience. But in aggregate, pages with low dwell time and high bounce rates, for example, may be problematic and are certainly worth a closer look as part of your comprehensive content audit.
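To make the “in aggregate” idea concrete, here is a small sketch that only flags a page when it trips at least two weak signals at once. It assumes you’ve already pulled the per-page numbers out of Google Analytics by whatever export you prefer. The `PageStats` fields and every threshold below are hypothetical placeholders, so compare pages against your own site’s norms rather than these numbers.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    url: str
    bounce_rate: float       # fraction between 0.0 and 1.0
    avg_time_on_page: float  # seconds
    pageviews: int


def weak_signals(page, max_bounce=0.80, min_time=20.0, min_views=50):
    """List the engagement signals on which this page looks weak."""
    signals = []
    if page.bounce_rate > max_bounce:
        signals.append("high bounce rate")
    if page.avg_time_on_page < min_time:
        signals.append("low time on page")
    if page.pageviews < min_views:
        signals.append("few pageviews")
    return signals


def flag_pages(pages, min_signals=2):
    """Flag only the pages that are weak on several signals at once."""
    flagged = {}
    for page in pages:
        signals = weak_signals(page)
        if len(signals) >= min_signals:
            flagged[page.url] = signals
    return flagged


# Hypothetical numbers for illustration.
pages = [
    PageStats("/guide", bounce_rate=0.45, avg_time_on_page=180.0, pageviews=1200),
    PageStats("/old-promo", bounce_rate=0.92, avg_time_on_page=8.0, pageviews=30),
    PageStats("/contact", bounce_rate=0.85, avg_time_on_page=95.0, pageviews=400),
]

flagged = flag_pages(pages)
```

In this sample, `/contact` has a high bounce rate but is left alone because that is its only weak signal, while `/old-promo` is flagged on all three.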

Dealing With Low-Quality Content

Once you’ve found low-quality content, you want to come up with a plan of action for how to deal with it. This usually means one of two things: you’re either going to keep the page in question and work to improve it, or you’re going to kill it altogether or maybe hide it from the engines.

If you do intend to purge content from your site, there are some key things to consider first, such as:

  • Does it drive traffic? Be sure to pull a landing page report from analytics for the past three to six months to determine if this page drives traffic to your site. If so, you probably want to think twice about removing it because, even though it may not have stellar engagement signals, it’s still performing. Deleting it effectively kills that traffic too.
  • Does it have inbound links? It’s difficult to get links into deep pages on your site naturally, so you want to think long and hard about deleting content with links. If you do decide to kill a page, be sure to redirect that link equity to another, relevant page on your site.
  • Does it provide real value? I strongly advocate for not letting data points alone dictate the fate of a page. Before you kill a page, be sure to actually take a look at it and determine if there’s value there that the data points might be missing. For example, maybe it’s a page that’s really thin but it’s a key step in your funnel or is critical for navigation. But you realize it isn’t necessarily a page you want in the index because it might be seen as low value by Google. So rather than killing it, you hide it from the engines.

So again, before removing any content from your site because it’s potentially low value or exhibits poor engagement signals, be sure to carefully weigh the risks and consequences and assess the impact that action might have.
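If you’re working through a long list of flagged URLs, the three questions above can be written down as a simple triage helper. This is a sketch of one possible ordering, not a rule: the function name, its inputs, and the order in which the questions are asked are all my assumptions, and the answer to “does it provide real value?” still has to come from a person looking at the page.

```python
def recommend(drives_traffic, has_inbound_links, needed_for_users):
    """Suggest a starting action for a flagged page; a human makes the final call."""
    if drives_traffic:
        # Deleting a page that still performs kills that traffic too.
        return "keep and improve"
    if needed_for_users:
        # Thin but useful pages (funnel steps, navigation) can stay out of the index.
        return "keep but noindex"
    if has_inbound_links:
        # Preserve link equity by pointing it at another relevant page.
        return "remove and redirect to a relevant page"
    return "remove"


# Hypothetical audit entries: (url, drives_traffic, has_inbound_links, needed_for_users)
audit = [
    ("/2012-widget-guide", True, True, False),
    ("/checkout/step-2", False, False, True),
    ("/press/old-award", False, True, False),
    ("/tag/misc", False, False, False),
]

plan = {url: recommend(traffic, links, needed) for url, traffic, links, needed in audit}
```

Use the resulting plan as a checklist to review, then weigh the risks of each removal before you act on it.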

