Website Indexation Audit: How to Find & Remove Non-Essential Content

Are you feeding search engines garbage? Taking a look at your entire index will allow you to gain an understanding of what Google and Bing see and index. Here's how to home in on and assess your content, see if it's indexed, and clean it up.

Date published: April 29, 2013

There’s been a lot of talk about quality content in the SEO world for the past couple of years. We do our best to be mindful of Google’s Content Quality Guidelines and know that a human rater may review the page quality of the content we provide for consumption by users.

Looking at the Big Picture

Creating fresh content that is insightful, entertaining, or informational is important. But that’s not all you have to worry about. What about the content that’s already on your site?

Site managers and marketers can become complacent with their sites, viewing the same top-level and product/service pages, blog pages, news pages, and so on. Looking at your entire index through the lens of a website indexation audit will allow you to gain an understanding of what Google sees and indexes.

How many junky, non-essential pages make up your site as a whole?

Finding Non-Essential Content

The first step in a site indexation audit is completing a scrape of your site’s URLs. My favorite tool for this is Screaming Frog.

While all the URLs you discover might not be indexed in search engines, this list will show you what you’re putting out there for crawler consumption.

A URL scrape may unearth some surprising results. Perhaps you’ll find that you need to redirect a lot of pages or use robots.txt to exclude content that has no place being seen by the search engine bots.
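One way to make a long URL list easier to review is to count URLs by their top-level folder, so that oversized or unexpected sections stand out. The sketch below assumes you have exported the crawled URLs as a plain list; the function name summarize_by_folder and the example.com URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def summarize_by_folder(urls):
    """Count crawled URLs by their first path segment (e.g. /blog/, /search/)."""
    counts = Counter()
    for url in urls:
        segments = [s for s in urlsplit(url).path.split("/") if s]
        # The home page and pages sitting directly at the root are grouped under "/".
        folder = "/" + segments[0] + "/" if len(segments) > 1 else "/"
        counts[folder] += 1
    return counts


# Hypothetical crawl export, for illustration only.
crawled_urls = [
    "https://www.example.com/",
    "https://www.example.com/about",
    "https://www.example.com/blog/post-1",
    "https://www.example.com/blog/post-2",
    "https://www.example.com/search/results?q=shoes",
    "https://www.example.com/search/results?q=boots",
    "https://www.example.com/search/results?q=hats",
]

# Largest folders first; a big /search/ or /tmp/ bucket is worth a closer look.
for folder, count in summarize_by_folder(crawled_urls).most_common():
    print(folder, count)
```

A folder that holds far more URLs than you expected is usually the first place to look for non-essential content.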

Review the URL scrape list for:

Is it Indexed?

Now, take some of the folder names and page names found above and run a site: operator query in Google and Bing to assess whether these pages are indexed. In Bing you can also review the Index Explorer to assess indexed pages. (Google, why don’t you have this?)
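If you have many folders to check, building the site: queries by hand gets tedious. Here is a minimal sketch that assembles the query text and a search URL for each engine to open in a browser; the helper names site_query and search_urls and the example.com domain are assumptions made for illustration.

```python
from urllib.parse import quote_plus


def site_query(domain, path=""):
    """Build a site: operator query such as 'site:example.com/search/'."""
    return f"site:{domain}{path}"


def search_urls(domain, path=""):
    """Return search result URLs for the site: query in Google and Bing."""
    q = quote_plus(site_query(domain, path))
    return {
        "google": f"https://www.google.com/search?q={q}",
        "bing": f"https://www.bing.com/search?q={q}",
    }


# Hypothetical folders pulled from the URL scrape.
for folder in ["/search/", "/tmp/"]:
    print(site_query("example.com", folder))
    for engine, url in search_urls("example.com", folder).items():
        print(" ", engine, url)
```

The sketch only prepares links for manual review; it does not fetch or parse search results.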

If these non-essential pages are getting indexed, you can now start preparing for robots.txt exclusion, meta robots usage, or, at the least, canonical tag utilization.
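Before publishing new robots.txt rules, it can help to test a draft against your crawled URLs and confirm that it blocks only what you intend. The sketch below uses Python's standard urllib.robotparser module; the draft rules, the blocked_urls helper, and the URLs are hypothetical examples, not recommendations for any particular site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical draft rules, for illustration only.
draft_robots_txt = """\
User-agent: *
Disallow: /search/
Disallow: /tmp/
"""


def blocked_urls(robots_txt, urls, user_agent="*"):
    """Return the URLs that the given robots.txt text would block for user_agent."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so the draft never has to be live.
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(user_agent, u)]


urls_to_check = [
    "https://www.example.com/",
    "https://www.example.com/blog/post-1",
    "https://www.example.com/search/results?q=shoes",
    "https://www.example.com/tmp/old-promo",
]

for url in blocked_urls(draft_robots_txt, urls_to_check):
    print("blocked:", url)
```

If an important page shows up in the blocked list, tighten the rule before it goes live.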

Before you take any leaps of exclusion or canonicalization, you have to review organic landing pages in your analytics profile. Use filters to search for the above content within Organic Landing Pages. It is highly unlikely that Google is crediting this content and that it is driving traffic, but if for some reason it is, consider the consequences of removing this content from the indices.
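The same check can be run against an export of your organic landing pages. This is a minimal sketch, assuming the export has been reduced to (landing page path, organic sessions) pairs; the function name landing_pages_at_risk, the folder prefixes, and the figures are all made up for illustration.

```python
def landing_pages_at_risk(rows, prefixes):
    """Return (page, sessions) pairs for cleanup candidates that still earn organic visits."""
    prefixes = tuple(prefixes)
    at_risk = [
        (page, sessions)
        for page, sessions in rows
        if sessions > 0 and page.startswith(prefixes)
    ]
    # Highest-traffic pages first, since they carry the most risk.
    return sorted(at_risk, key=lambda row: row[1], reverse=True)


# Hypothetical analytics export: (landing page path, organic sessions).
organic_landing_pages = [
    ("/blog/post-1", 420),
    ("/search/results?q=shoes", 37),
    ("/search/results?q=hats", 0),
    ("/tmp/old-promo", 3),
]
cleanup_prefixes = ["/search/", "/tmp/"]

for page, sessions in landing_pages_at_risk(organic_landing_pages, cleanup_prefixes):
    print(page, sessions)
```

Any page this returns is one to think twice about before excluding or canonicalizing.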

While you’re checking whether the page types above are indexed, it doesn’t hurt to review your overall index results in Google or Bing. You might be surprised at what shows up.

Do you see those pages from the old site eight years ago that never got redirected and are long forgotten in a dark corner on the server? Do you notice that many of your site’s internal search result pages are being indexed? These are a few good examples of the types of content you may not know are getting indexed by search engines.

Clean Up Your Site & Take Out the Garbage!

Don’t make web crawlers sift through the chaff to figure out what your site is about. Take the time to take out the garbage your site has been accumulating and feeding to the search engines.

After you’ve polished your overall site theme, you can get back to creating quality content that will hopefully have a better chance of “shining” on your site because you found and removed all of the non-essential content.
