{"id":295,"date":"2013-01-04T10:38:53","date_gmt":"2013-01-04T14:38:53","guid":{"rendered":"http:\/\/www.hmtweb.com\/marketing-blog\/?p=295"},"modified":"2013-01-04T10:38:53","modified_gmt":"2013-01-04T14:38:53","slug":"redirects-duplicate-content-seo","status":"publish","type":"post","link":"https:\/\/www.gsqi.com\/marketing-blog\/redirects-duplicate-content-seo\/","title":{"rendered":"Faulty Redirects, Duplicate Content, and SEO &#8211; How a Redirect Glitch Created Hundreds of Thousands of Duplicate Pages"},"content":{"rendered":"<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone\" title=\"Redirect Glitch Causing SEO Problems\" src=\"https:\/\/www.gsqi.com\/images\/redirect-seo-problems.jpg\" alt=\"Redirect Glitch Causing SEO Problems\" width=\"525\" height=\"319\" \/><\/p>\n<p>With the release of <a href=\"https:\/\/www.gsqi.com\/marketing-blog\/how-to-use-index-status-in-google-webmaster-tools-to-diagnose-indexation-problems\/\">index status in Google Webmaster Tools<\/a>, many webmasters are now questioning why their \u201cnot selected\u201d numbers are so high.\u00a0 They wonder whether those numbers are good, bad, or normal.\u00a0 Unfortunately, there\u2019s no easy answer to that question, since it depends on the site at hand.\u00a0 But you can look at the ratio of \u201cnot selected\u201d pages to indexed pages to start understanding whether a technical problem is causing a spike in pages being categorized as \u201cnot selected\u201d.<\/p>\n<p>For example, if you have 200 pages indexed on your site and see 350 categorized as \u201cnot selected\u201d, that might be OK.\u00a0 But if you see 25K or more pages categorized as \u201cnot selected\u201d, that could be a red flag that something isn\u2019t right with the site.\u00a0 For instance, is there a site structure issue that\u2019s causing thousands of variations of pages with extremely similar content (duplicate content)?<\/p>\n<p><strong>A Recent Example of a Poor \u201cNot 
Selected\u201d Ratio<br \/>\n<\/strong>During SEO audits, I sometimes come across significant problems like the one mentioned above, and those problems can seriously inhibit a company\u2019s search efforts (to say the least).\u00a0 During a recent SEO audit, I came across a very interesting situation.\u00a0 Index status revealed an extremely high number of \u201cnot selected\u201d pages (compared to the number of pages indexed), and I found myself digging into the site to find out why.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone\" title=\"High Percentage of &quot;Not Selected&quot; Pages in Index Status\" src=\"https:\/\/www.gsqi.com\/images\/redirect-seo-not-selected-high.jpg\" alt=\"High Percentage of &quot;Not Selected&quot; Pages in Index Status\" width=\"525\" height=\"375\" \/><\/p>\n<p>I found several issues contributing to the problem, so there wasn\u2019t just one issue pumping up the number.\u00a0 That said, the problem I\u2019m going to cover today was causing thousands of duplicate pages to be created without the site owner knowing.\u00a0 The more pages I checked, the more duplicates I found.\u00a0 This is a problem that can easily slip through the cracks for many webmasters, especially when a small or medium-sized business is handling all website development on its own.\u00a0 Below, I\u2019m going to cover what I found and, more importantly, how you can avoid the problem in the first place.<\/p>\n<p><strong>The Danger of an Extra Character<br \/>\n<\/strong>As I was analyzing the site, both manually and via a number of test crawls, I came across some URLs that contained an extra character.\u00a0 Specifically, the extra character was being appended to each canonical URL.\u00a0 All of those URLs were from one specific section of the site (which contained thousands of URLs).\u00a0 After digging into that section, I found that this problem was 
happening to almost every URL linked from a certain element within each page.\u00a0 So, I homed in on that element to find out how the duplicate pages were being created.\u00a0 And by the way, that section of the site contains nearly 200K pages.\u00a0 Yes, this was a huge problem.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone\" title=\"How One Extra Character Created Thousands of Duplicate Pages\" src=\"https:\/\/www.gsqi.com\/images\/redirect-seo-extra-character.jpg\" alt=\"How One Extra Character Created Thousands of Duplicate Pages\" width=\"525\" height=\"328\" \/><\/p>\n<p><strong>The Result \u2013 Duplicate Content Ad Infinitum<br \/>\n<\/strong>The core problem is that the extra character created a new URL, and that new URL was an exact duplicate of the canonical URL (the URL that should be resolving).\u00a0 And as you can guess, both versions could be accessed on the site.\u00a0 One part of the site linked to the canonical versions of the pages, while this problematic section linked to the duplicate versions.<\/p>\n<p>So, right off the bat, we are dealing with roughly <strong>200K duplicate pages<\/strong>.\u00a0 In addition, as more content is added to this section, more duplicate pages will be created over time (since the extra character is added to each new URL).\u00a0 Also, the canonical URL tag was not being used on the duplicate pages, so it wasn\u2019t helping in this case.\u00a0 That said, I wouldn\u2019t advocate using the canonical URL tag to fix this problem\u2026\u00a0 Technical problems like this should be addressed at the code or structure level.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone\" title=\"1 to 1 Ratio of Duplicate URLs to Canonical URLs\" src=\"https:\/\/www.gsqi.com\/images\/redirect-seo-duplicate-urls.jpg\" alt=\"1 to 1 Ratio of Duplicate URLs to Canonical URLs\" width=\"525\" height=\"400\" 
\/><\/p>\n<p>So, if this were left in place, the problem could generate an unlimited number of duplicate pages.\u00a0 If 500K pages ended up in that section, there would be 500K pages of duplicate content.\u00a0 Not good, so I dug deeper to find out exactly what was causing the problem.<\/p>\n<p><strong>The Root Problem &#8211; Faulty Redirects<br \/>\n<\/strong>Let\u2019s face it, we all need to implement redirects at some point.\u00a0 And that introduces the possibility of a poor implementation, which can be catastrophic SEO-wise.\u00a0 It\u2019s one of the reasons that website redesigns and CMS migrations are so risky.\u00a0 On that note, to learn <a href=\"http:\/\/www.searchenginejournal.com\/how-to-avoid-seo-disaster-during-a-website-redesign\/42824\/\">how to avoid SEO disaster during a redesign or migration<\/a>, you should check out my Search Engine Journal column on the subject.<\/p>\n<p>Poor redirect implementations take many forms: using 302s instead of 301s, using meta refresh redirects, redirecting to the wrong pages, or having the redirect code bomb (mangle) the destination URLs.\u00a0 In this situation, we ran into the \u201cbombing of URLs\u201d problem.\u00a0 The redirects were faulty and were pointing to URLs with an extra character.<\/p>\n<p><strong>The Solution \u2013 Fix the Redirect Code!<br \/>\n<\/strong>So, hundreds of thousands of duplicate pages were being generated, all due to one piece of redirect code on the server.\u00a0 The 301 redirects being generated simply added an extra character to the destination URL.\u00a0 That\u2019s it.\u00a0 The fix will be implemented soon, and once the new redirect code is rolled out, the correct URLs will resolve.<\/p>\n<p>This situation underscores the fact that even <strong><em><a href=\"http:\/\/www.searchenginejournal.com\/two-examples-of-how-one-line-of-code-could-kill-your-seo-case-studies\/37004\/\">one small piece of code<\/a><\/em><\/strong> can have serious implications SEO-wise.\u00a0 If this situation 
had been left unchanged, it would have kept generating duplicate pages indefinitely.\u00a0 Knowing the content on this site, my guess is the problem would have generated 500K-750K pages of duplicate content over the next 2-3 years.<\/p>\n<p><strong>How To Avoid This Situation<br \/>\n<\/strong>After reading this post, you might be worried that this could happen to you, or worse, that it\u2019s happening right now.\u00a0 Below is a short list of things you can do to make sure it doesn\u2019t.\u00a0 Of course, if you suspect you are already having problems, you should have an SEO audit performed.<\/p>\n<ul>\n<li>First, whenever you create redirects, make sure you have a system for testing those redirects <strong>before they launch<\/strong>.\u00a0 You can do this in a number of ways, including on a local server or test server, prior to releasing the final code to production.\u00a0 Thoroughly testing your redirects can nip serious problems in the bud.<\/li>\n<li>Second, make sure your XML sitemaps contain the canonical URLs for the pages at hand.\u00a0 Feeding Google and Bing the correct URLs can help them understand which ones should be considered canonical.<\/li>\n<li>Third, you should develop a strategy for using the canonical URL tag on the site.\u00a0 If the tag is present, it can help ensure that any duplicate pages pass their search equity to the canonical URLs.\u00a0 Note, I\u2019m not saying that you should leave a technical problem in place!\u00a0 Instead, I\u2019m saying that having the canonical URL tag in place helps the engines pass search equity to the correct pages on your site while you work out solutions to your technical problems.<\/li>\n<li>This final bullet assumes you are already experiencing duplicate content caused by a technical problem.\u00a0 If you are, and you cannot determine what\u2019s going on, then invest in having a <a 
href=\"https:\/\/www.gsqi.com\/blog\/2009\/09\/seo-technical-audits-logical-first-step.html\">technical SEO audit<\/a> completed.\u00a0 To me, audits provide the most bang for your SEO buck.\u00a0 They\u2019re a great way to find out what\u2019s truly going on with your site (beyond just the problem I covered here).<\/li>\n<\/ul>\n<p><strong>Summary \u2013 Know Your Site<br \/>\n<\/strong>This case emphasizes something I\u2019ve said a thousand times over the past few years.\u00a0 It\u2019s incredibly important to have a sound site structure in order to perform at your highest level SEO-wise.\u00a0 Coding problems, site structure issues, flawed redirects, etc. can kill your SEO efforts.\u00a0 It\u2019s one of the reasons that I believe SEO audits are critically important.\u00a0 They can catch all types of SEO issues and provide ways to remedy those problems.\u00a0 You know, like generating hundreds of thousands of duplicate pages.\u00a0 :)<\/p>\n<p>GG<\/p>\n","protected":false},"excerpt":{"rendered":"<p>With the release of index status in Google Webmaster Tools, many webmasters are now questioning why their \u201cnot selected\u201d numbers are high.\u00a0 They wonder whether those numbers are good, bad, or normal.\u00a0 Unfortunately, there\u2019s not an easy answer to that question, since it depends on the site at hand.\u00a0 But, you can definitely look at &#8230; <a title=\"Faulty Redirects, Duplicate Content, and SEO &#8211; How a Redirect Glitch Created Hundreds of Thousands of Duplicate Pages\" class=\"read-more\" href=\"https:\/\/www.gsqi.com\/marketing-blog\/redirects-duplicate-content-seo\/\" aria-label=\"Read more about Faulty Redirects, Duplicate Content, and SEO &#8211; How a Redirect Glitch Created Hundreds of Thousands of Duplicate Pages\">Read 
more<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4,3],"tags":[],"class_list":["post-295","post","type-post","status-publish","format-standard","hentry","category-google","category-seo","generate-columns","tablet-grid-50","mobile-grid-100","grid-parent","grid-50"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.gsqi.com\/marketing-blog\/wp-json\/wp\/v2\/posts\/295","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.gsqi.com\/marketing-blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.gsqi.com\/marketing-blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.gsqi.com\/marketing-blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.gsqi.com\/marketing-blog\/wp-json\/wp\/v2\/comments?post=295"}],"version-history":[{"count":6,"href":"https:\/\/www.gsqi.com\/marketing-blog\/wp-json\/wp\/v2\/posts\/295\/revisions"}],"predecessor-version":[{"id":301,"href":"https:\/\/www.gsqi.com\/marketing-blog\/wp-json\/wp\/v2\/posts\/295\/revisions\/301"}],"wp:attachment":[{"href":"https:\/\/www.gsqi.com\/marketing-blog\/wp-json\/wp\/v2\/media?parent=295"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.gsqi.com\/marketing-blog\/wp-json\/wp\/v2\/categories?post=295"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.gsqi.com\/marketing-blog\/wp-json\/wp\/v2\/tags?post=295"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}