Although sitemaps are simply lists of canonical URLs submitted to search engines, it’s surprisingly rare to come across a perfect one. Issues often arise when large site owners use sitemap auto-generation tools that aren’t configured properly, and these sites typically present crawler challenges such as pagination and URLs generated by faceted navigation.
Spiders decide which pages to crawl based on URLs placed in a queue from previous crawls, and that list is augmented with URLs from XML sitemaps. Sitemaps can therefore be a key factor in ensuring search crawlers access and assess the content most eligible to appear in search engine results.
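For reference, a minimal sitemap file follows the protocol’s simple structure (the URL and date below are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/blog/post-title/</loc>
    <lastmod>2013-01-15</lastmod>
  </url>
</urlset>
```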
The following is a quick overview of search engine sitemap guidelines and limitations, followed by a technique for identifying crawling and indexation issues using multiple sitemaps in Google Webmaster Tools.
Bing & Google Guideline & Limitation Overview
The sitemap protocol has been a standard since search engines adopted it in 2006. Since then, Bing and Google have developed useful Webmaster Tools dashboards to help site owners identify and fix errors.
Of the two search engines, Bing has a particularly low tolerance, or at least they outwardly state that they begin devaluing sitemaps if 1 percent of the URLs result in an error (return anything but a 200 status code).
Google provides clear guidelines, limitations, and a more robust error reporting system when using their webmaster dashboard. In addition to submitting quality sitemaps, ensure that files stay within the following hard limits applicable to Google.
- Limit sitemaps to 50,000 URLs
- File size should be under 50MB
- 500 sitemaps per account
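The per-file limits above are easy to sanity-check programmatically before submission. A minimal sketch (the function name and constants are my own, encoding Google’s stated limits):

```python
# Google's stated hard limits for a single sitemap file.
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50MB

def within_limits(url_count, file_size_bytes):
    """Return True if a single sitemap file stays within Google's limits."""
    return url_count <= MAX_URLS and file_size_bytes <= MAX_BYTES
```

A sitemap exceeding either limit should be split into multiple files and tied together with a sitemap index.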
Both search engines support sitemap index files. Rather than submitting multiple sitemap files individually, the sitemap index file makes it easier to submit several sitemap files of any type all at once.
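A sitemap index file is itself a small XML document that simply points at the individual sitemap files (file names here are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-posts.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml</loc>
  </sitemap>
</sitemapindex>
```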
Basic Sitemap Optimization
Basic sitemap optimization should include checking for pages that are:
- Duplicated (the same URL appearing in multiple sitemaps is OK)
- Returning status code errors – 3XX, 4XX, and 5XX
And any pages that specify:
- Canonical tags (rel="canonical") that are not self-referential
- Noindex meta robots tags
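The first step of any of these checks is pulling the URL list out of the sitemap file. A minimal sketch using only the standard library (function names are my own; the fetch-and-check step against live URLs is left out so the example stays self-contained):

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The sitemap protocol's XML namespace.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def extract_urls(sitemap_xml):
    """Pull every <loc> value out of a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", NS)]

def find_duplicates(urls):
    """Return URLs listed more than once."""
    return [url for url, count in Counter(urls).items() if count > 1]
```

Each extracted URL could then be requested to record its status code and inspect its canonical and meta robots tags.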
Tools like the Screaming Frog SEO crawler can quickly parse the URLs contained within XML sitemap files and surface this information.
Once comprehensive and quality XML sitemaps have been submitted to Google and Bing, breaking up sitemaps into categories can provide further insight into crawling and indexation issues.
A great place to start is by breaking up sitemaps by page type. Sitemaps can be diced up in any way that makes sense to provide feedback, the main goal being to expose any areas of a site with a low indexation rate.
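Splitting a master URL list by page type can be as simple as grouping on the first path segment. A rough sketch, assuming page types map to subdirectories (the function name is hypothetical):

```python
from collections import defaultdict
from urllib.parse import urlparse

def split_by_section(urls):
    """Group URLs by their first path segment (e.g. /blog/, /products/)."""
    groups = defaultdict(list)
    for url in urls:
        path = urlparse(url).path.strip("/")
        section = path.split("/")[0] if path else "root"
        groups[section].append(url)
    return dict(groups)
```

Each group can then be written out as its own sitemap file and submitted, giving a per-section submitted/indexed count in the Webmaster Tools dashboard.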
Once an area has been identified, finding the source of the issue can begin. Using Fetch as Googlebot to identify uncrawlable content and links is often very helpful.
Another particularly useful technique is to use multiple sitemap indexation identification (or MSII, an acronym I just made up…) in combination with advanced search operators in an attempt to find excess indexation.
So, for example, hypothetically speaking, if your website is having trouble getting posts indexed, it would be helpful to create an XML sitemap containing only blog posts. From this, an indexation rate can be calculated, and an advanced search can be used to see what other pages might be diluting the crawling and indexation of post pages.
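The indexation rate itself is just indexed pages over submitted pages; for instance, 300 posts indexed out of 1,200 submitted is a 25 percent rate. A trivial helper (name and figures are illustrative):

```python
def indexation_rate(indexed_count, submitted_count):
    """Indexed pages (from Webmaster Tools or a site: search) over submitted URLs."""
    if submitted_count == 0:
        raise ValueError("sitemap contains no URLs")
    return indexed_count / submitted_count
```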
The search below shows that tag and author pages could be the hypothetical hindrance.
Additional advanced Google searches like site:example.com/blog/ inurl:tag AND inurl:author could then be done to determine the scale of potential excess crawling and indexation. The same concept can be applied to dynamic parameters contained within URLs generated by faceted navigation, pagination, sorting products, etc.
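When checking several URL patterns this way, the operator queries can be generated rather than typed by hand. A small sketch (function name is my own; it reproduces the query form shown above):

```python
def excess_index_query(domain, subdirectory, patterns):
    """Build a query like: site:example.com/blog/ inurl:tag AND inurl:author"""
    clauses = " AND ".join(f"inurl:{p}" for p in patterns)
    return f"site:{domain}/{subdirectory.strip('/')}/ {clauses}"
```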