{"id":1025,"date":"2014-12-29T07:38:53","date_gmt":"2014-12-29T11:38:53","guid":{"rendered":"http:\/\/www.hmtweb.com\/marketing-blog\/?p=1025"},"modified":"2014-12-29T07:47:56","modified_gmt":"2014-12-29T11:47:56","slug":"xml-sitemaps-advanced-seo","status":"publish","type":"post","link":"https:\/\/www.gsqi.com\/marketing-blog\/xml-sitemaps-advanced-seo\/","title":{"rendered":"XML Sitemaps &#8211; 8 Facts, Tips, and Recommendations for the Advanced SEO"},"content":{"rendered":"<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone\" title=\"XML Sitemaps for Advanced SEOs\" src=\"https:\/\/www.gsqi.com\/images\/xml-sitemaps.jpg\" alt=\"XML Sitemaps for Advanced SEOs\" width=\"525\" height=\"326\" \/><\/p>\n<p>After publishing my last post about <a href=\"https:\/\/www.gsqi.com\/marketing-blog\/dangerous-rel-canonical-problems\/\">dangerous rel canonical problems<\/a>, I started receiving a lot of questions about other areas of technical SEO. One topic in particular that seemed to generate many questions was how to best use and set up xml sitemaps for larger and more complex websites.<\/p>\n<p>Sure, in its most basic form, webmasters can provide a list of urls that they want the search engines to crawl and index. Sounds easy, right? Well, for larger and more complex sites, the situation is often not so easy. And if the xml sitemap situation spirals out of control, you can end up feeding Google and Bing thousands, hundreds of thousands, or millions of bad urls. And that\u2019s never a good thing.<\/p>\n<p>While helping clients, it\u2019s not uncommon for me to audit a site and surface serious errors with regard to xml sitemaps. And when that\u2019s the case, websites can send Google and Bing mixed signals, urls might not get indexed properly, and both engines can end up losing trust in your sitemaps. 
And as Bing\u2019s Duane Forrester once said <a href=\"https:\/\/www.stonetemple.com\/search-algorithms-and-bing-webmaster-tools-with-duane-forrester\/\">in this interview<\/a> with Eric Enge:<\/p>\n<p><em>\u201cYour Sitemaps need to be clean. We have a 1% allowance for dirt in a Sitemap. If we see more than a 1% level of dirt, we begin losing trust in the Sitemap.\u201d<br \/>\n<\/em><br \/>\nClearly that\u2019s not what you want happening\u2026<\/p>\n<p>So, based on the technical SEO work I perform for clients, including conducting many audits, I decided to list some important facts, tips, and answers for those looking to maximize their xml sitemaps. My hope is that you can learn something new from the bullets listed below, and implement changes quickly.<\/p>\n<p>&nbsp;<\/p>\n<p><strong>1. Use RSS\/Atom and XML For Maximum Coverage<br \/>\n<\/strong>This past fall, Google\u00a0<a href=\"http:\/\/googlewebmastercentral.blogspot.com\/2014\/10\/best-practices-for-xml-sitemaps-rssatom.html\">published a post<\/a>\u00a0on the webmaster central blog about best practices for xml sitemaps. In that post, they explained that sites should use a combination of xml sitemaps and RSS\/Atom feeds for maximum coverage.<\/p>\n<p>Xml sitemaps should contain\u00a0<strong>all canonical urls<\/strong>\u00a0on your site, while RSS\/Atom feeds should contain the\u00a0<strong>latest additions or recently updated urls<\/strong>. XML sitemaps will contain many urls, where RSS\/Atom feeds will only contain a limited set of new or recently changed urls.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" title=\"RSS\/Atom Feed and XML Sitemaps\" src=\"https:\/\/www.gsqi.com\/images\/xml-sitemaps-rss-xml.jpg\" alt=\"RSS\/Atom Feed and XML Sitemaps\" width=\"493\" height=\"227\" \/><\/p>\n<p>So, if you have new urls (or recently updated urls) that you want Google to prioritize, then use both xml sitemaps and RSS\/Atom feeds. 
Google says that using RSS can help it \u201ckeep your content fresher in its index\u201d. I don\u2019t know about you, but I like the idea of Google keeping my content fresher. :)<\/p>\n<p>Also, it\u2019s worth noting that\u00a0Google recommends\u00a0maximizing the number of urls per xml sitemap. For example, don\u2019t cut up your xml sitemaps into many smaller files (if possible). Instead, use the space you have in each sitemap to include all of your urls. If you don\u2019t, Google explains that \u201cit can impact the speed and efficiency of crawling your urls.\u201d\u00a0I recommend reading Google\u2019s post to learn how to best use xml sitemaps and RSS\/Atom feeds to maximize your efforts. By the way, you can include 50K urls per sitemap and each sitemap must be less than 10MB uncompressed.<\/p>\n<p>&nbsp;<\/p>\n<p><strong>2. XML Sitemaps By Protocol and Subdomain<br \/>\n<\/strong>I find a lot of webmasters are confused by protocol and subdomains, and both can end up impacting how urls in sitemaps get crawled and indexed.<\/p>\n<p>URLs included in xml sitemaps <strong>must use<\/strong> the same protocol and subdomain as the sitemap itself. This means that https urls <strong>should not<\/strong> be included in an http sitemap. This also means that urls on sample.domain.com <strong>cannot<\/strong> be located in the sitemap on www.domain.com. So on and so forth.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" title=\"XML Sitemaps and Protocol and Subdomains\" src=\"https:\/\/www.gsqi.com\/images\/xml-sitemaps-protocol-subdomains.jpg\" alt=\"XML Sitemaps and Protocol and Subdomains\" width=\"525\" height=\"460\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>This is a common problem when sites employ multiple subdomains or they have sections using https and http (like ecommerce retailers). And then of course we have many sites that are starting to switch to https for all urls, but haven\u2019t changed their xml sitemaps to reflect the changes. 
My recommendation is to check your xml sitemaps reporting today, while also manually checking the sitemaps. You might just find issues that you can fix quickly.<\/p>\n<p>&nbsp;<\/p>\n<p><strong>3. Dirty Sitemaps \u2013 Hate Them, Avoid Them<br \/>\n<\/strong>When auditing sites, I often crawl the xml sitemaps myself to see what I find. And it\u2019s not uncommon to find many urls that resolve with non-200 header response codes. For example, urls that 404, 302, 301, return 500s, etc.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" title=\"Dirty XML Sitemaps\" src=\"https:\/\/www.gsqi.com\/images\/xml-sitemaps-dirty.jpg\" alt=\"Dirty XML Sitemaps\" width=\"525\" height=\"421\" \/><\/p>\n<p>You should only provide\u00a0<strong>canonical urls<\/strong>\u00a0in your xml sitemaps. You\u00a0<strong>should not provide<\/strong>\u00a0non-200 header response code urls (or non-canonical urls that point to other urls). The engines do not like \u201cdirty sitemaps\u201d since they can send Google and Bing on a wild goose chase throughout your site. For example, imagine driving Google and Bing to 50K urls that end up 404ing, redirecting, or not resolving. Not good, to say the least.<\/p>\n<p>Remember Duane\u2019s comment from earlier about \u201cdirt\u201d in sitemaps. The engines can lose trust in your sitemaps, which is never a good thing SEO-wise.\u00a0More about crawling your sitemaps later in this post.<\/p>\n<p>&nbsp;<\/p>\n<p><strong>4. View Trending in Google Webmaster Tools<br \/>\n<\/strong>Many SEOs are familiar with xml sitemaps reporting in Google Webmaster Tools, which can help surface various problems, while also providing important indexation statistics. Well there\u2019s a hidden visual gem in the report that\u2019s easy to miss. The default view will show the number of pages submitted in your xml sitemaps and the number indexed. But if you click the &#8220;sitemaps content&#8221; box for each category, you can view trending over the past 30 days. 
This can help you identify bumps in the road, or surges, as you make changes.<\/p>\n<p>For example, check out the trending below. You can see the number of images submitted and indexed drop significantly over a period of time, only to climb back up. You would definitely want to know why that happened, so you can avoid problems down the line. Sending this to your dev team can help them identify potential problems that can build over time.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" title=\"XML Sitemaps Trending in Google Webmaster Tools\" src=\"https:\/\/www.gsqi.com\/images\/xml-sitemaps-gwt-trending.jpg\" alt=\"XML Sitemaps Trending in Google Webmaster Tools\" width=\"525\" height=\"242\" \/><\/p>\n<p>&nbsp;<\/p>\n<p><strong>5.\u00a0Using Rel Alternate in Sitemaps for Mobile URLs<br \/>\n<\/strong>When using mobile urls (like m.), it\u2019s incredibly important to ensure you have the proper technical SEO setup. For example, you should be using rel alternate on the desktop pages pointing to the mobile pages, and then rel canonical on the mobile pages pointing back to the desktop pages.<\/p>\n<p>Although not an approach I often push for, you can provide rel alternate annotations in your xml sitemaps. The annotations look like this:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" title=\"Rel Alternate in XML Sitemaps\" src=\"https:\/\/www.gsqi.com\/images\/xml-sitemaps-rel-alternate2.jpg\" alt=\"Rel Alternate in XML Sitemaps\" width=\"519\" height=\"302\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>It\u2019s worth noting that you should still add rel canonical to the source code of your mobile pages pointing to your desktop pages.<\/p>\n<p>&nbsp;<\/p>\n<p><strong>6. Using hreflang in Sitemaps for Multi-Language Pages<br \/>\n<\/strong>If you have pages that target different languages, then you are probably already familiar with <a href=\"https:\/\/support.google.com\/webmasters\/answer\/189077?hl=en\">hreflang<\/a>. 
Using hreflang, you can tell Google which pages should target which languages. Then Google can surface the correct pages in the SERPs based on the language\/country of the person searching Google.<\/p>\n<p>Similar to rel alternate, you can either provide the hreflang code in a page\u2019s html code (page by page), or you can use xml sitemaps to provide the hreflang code. For example, you could provide the following hreflang attributes when you have the same content targeting different languages:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" title=\"Hreflang in XML Sitemaps\" src=\"https:\/\/www.gsqi.com\/images\/xml-sitemaps-hreflang2.jpg\" alt=\"Hreflang in XML Sitemaps\" width=\"503\" height=\"515\" \/><\/p>\n<p>Just be sure to include a separate &lt;loc&gt; element for each url that contains alternative language content (i.e. all of the sister urls should be listed in the sitemap via a &lt;loc&gt; element).<\/p>\n<p>&nbsp;<\/p>\n<p><strong>7. Testing XML Sitemaps in Google Webmaster Tools<br \/>\n<\/strong>Last, but not least, you can test your xml sitemaps or other feeds in Google Webmaster Tools. Although easy to miss, there is a red \u201cAdd\/Test Sitemap\u201d button in the upper right-hand corner of the Sitemaps reporting page in Google Webmaster Tools.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" title=\"Test XML Sitemaps in Google Webmaster Tools\" src=\"https:\/\/www.gsqi.com\/images\/xml-sitemaps-gwt-test-button.jpg\" alt=\"Test XML Sitemaps in Google Webmaster Tools\" width=\"398\" height=\"150\" \/><\/p>\n<p>When you click that button, you can add the url of your sitemap or feed. Once you click &#8220;Test Sitemap&#8221;, Google will provide results based on analyzing the sitemap\/feed. Then you can rectify those issues before submitting the sitemap. I think too many webmasters use a \u201cset it and forget it\u201d approach to xml sitemaps. Using the test functionality in GWT, you can nip some problems in the bud. 
And it\u2019s simple to use.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" title=\"Results of XML Sitemaps Test in Google Webmaster Tools\" src=\"https:\/\/www.gsqi.com\/images\/xml-sitemaps-gwt-test-results.jpg\" alt=\"Results of XML Sitemaps Test in Google Webmaster Tools\" width=\"525\" height=\"392\" \/><\/p>\n<p>&nbsp;<\/p>\n<p><strong>8. Bonus: Crawl Your XML Sitemap Via Screaming Frog<br \/>\n<\/strong>In SEO, you can either test and know, or read and believe. As you can probably guess, I\u2019m a big fan of the former\u2026 For xml sitemaps, you should test them thoroughly to ensure all is ok. One way to do this is to crawl your own sitemaps. By doing so, you can identify problematic tags, non-200 header response codes, and other little gremlins that can cause sitemap issues.<\/p>\n<p>One of my favorite tools for crawling sitemaps is Screaming Frog (which I have mentioned many times in my previous posts). By setting the crawl mode to \u201clist mode\u201d, you can crawl your sitemaps directly. Screaming Frog <strong>natively<\/strong> <strong>handles<\/strong> xml sitemaps, meaning you don\u2019t need to convert your xml sitemaps into another format before crawling (which is awesome).<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" title=\"Crawling Sitemaps in Screaming Frog\" src=\"https:\/\/www.gsqi.com\/images\/xml-sitemaps-screaming-frog.jpg\" alt=\"Crawling Sitemaps in Screaming Frog\" width=\"517\" height=\"368\" \/><\/p>\n<p>Screaming Frog will then load your sitemap and begin crawling the urls it contains. In real-time, you can view the results of the crawl. And if you have Graph View up and running during the crawl, you can visually graph the results as the crawler collects data. I love that feature. 
Then it\u2019s up to you to rectify any problems that are surfaced.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" title=\"Graph View in Screaming Frog\" src=\"https:\/\/www.gsqi.com\/images\/xml-sitemaps-screaming-frog-graph.jpg\" alt=\"Graph View in Screaming Frog\" width=\"381\" height=\"333\" \/><\/p>\n<p>&nbsp;<\/p>\n<p><strong>Summary \u2013 Maximize and Optimize Your XML Sitemaps<br \/>\n<\/strong>As I\u2019ve covered throughout this post, there are many ways to use xml sitemaps to maximize your SEO efforts. <em>Clean<\/em> xml sitemaps can help you inform the engines about all of the urls on your site, including the most recent additions and updates. It\u2019s a direct feed to the engines, so it\u2019s important to get it right (especially for larger and more complex websites).<\/p>\n<p>I hope my post provided some helpful nuggets of sitemap information that enable you to enhance your own efforts. I recommend setting some time aside soon to review, crawl, audit, and then refine your xml sitemaps. There may be some low-hanging fruit changes that can yield nice wins. Now excuse me while I review the latest sitemap crawl. :)<\/p>\n<p>GG<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>After publishing my last post about dangerous rel canonical problems, I started receiving a lot of questions about other areas of technical SEO. One topic in particular that seemed to generate many questions was how to best use and set up xml sitemaps for larger and more complex websites. 
Sure, in its most basic form, &#8230; <a title=\"XML Sitemaps &#8211; 8 Facts, Tips, and Recommendations for the Advanced SEO\" class=\"read-more\" href=\"https:\/\/www.gsqi.com\/marketing-blog\/xml-sitemaps-advanced-seo\/\" aria-label=\"Read more about XML Sitemaps &#8211; 8 Facts, Tips, and Recommendations for the Advanced SEO\">Read more<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4,3,5],"tags":[],"class_list":["post-1025","post","type-post","status-publish","format-standard","hentry","category-google","category-seo","category-tools","generate-columns","tablet-grid-50","mobile-grid-100","grid-parent","grid-50"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.gsqi.com\/marketing-blog\/wp-json\/wp\/v2\/posts\/1025","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.gsqi.com\/marketing-blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.gsqi.com\/marketing-blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.gsqi.com\/marketing-blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.gsqi.com\/marketing-blog\/wp-json\/wp\/v2\/comments?post=1025"}],"version-history":[{"count":18,"href":"https:\/\/www.gsqi.com\/marketing-blog\/wp-json\/wp\/v2\/posts\/1025\/revisions"}],"predecessor-version":[{"id":1041,"href":"https:\/\/www.gsqi.com\/marketing-blog\/wp-json\/wp\/v2\/posts\/1025\/revisions\/1041"}],"wp:attachment":[{"href":"https:\/\/www.gsqi.com\/marketing-blog\/wp-json\/wp\/v2\/media?parent=1025"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.gsqi.com\/marketing-blog\/wp-json\/wp\/v2\/categories?post=1025"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.gsqi.com\/marketing-blog\/wp-json\/wp\/v2\/tags?post=1025"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{r
el}","templated":true}]}}