Faulty Redirects, Duplicate Content, and SEO – How a Redirect Glitch Created Hundreds of Thousands of Duplicate Pages

January 4, 2013 By Glenn Gabe 4 Comments


[Image: Redirect Glitch Causing SEO Problems]

With the release of Index Status in Google Webmaster Tools, many webmasters are now questioning why their “not selected” numbers are so high. They wonder whether those numbers are good, bad, or normal. Unfortunately, there’s no easy answer to that question, since it depends on the site at hand. But you can definitely look at the ratio of “not selected” pages to pages indexed to start to understand whether a technical problem is causing a spike in pages being categorized as “not selected.”

For example, if you have 200 pages indexed on your site and you see 350 categorized as “not selected,” that might be OK. But if you see 25K or more pages “not selected,” that could raise a red flag that something isn’t right with the site. For example, is there a site structure issue creating thousands of variations of pages with extremely similar content (duplicate content)?
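To make that ratio check concrete, here’s a minimal sketch in Python. The figures and the 10:1 threshold are hypothetical assumptions for illustration; there is no official Google cutoff, and the right ratio depends entirely on the site at hand.

```python
# A minimal sketch for sanity-checking the "not selected" ratio.
# The figures and the 10:1 threshold below are hypothetical; a healthy
# ratio depends on the site, and Google publishes no official cutoff.

def not_selected_ratio(indexed: int, not_selected: int) -> float:
    """Ratio of 'not selected' pages to pages indexed."""
    return not_selected / indexed if indexed else float("inf")

# Hypothetical numbers read off the Index Status report:
indexed = 200
not_selected = 25_000

ratio = not_selected_ratio(indexed, not_selected)
print(f"'Not selected' to indexed ratio: {ratio:.1f}:1")

if ratio > 10:  # illustrative rule of thumb, not an official threshold
    print("Red flag: dig into site structure and duplicate content.")
```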

A Recent Example of a Poor “Not Selected” Ratio
During SEO audits, I sometimes come across significant problems like the one mentioned above, and those problems can seriously inhibit a company’s search efforts (to say the least). During a recent SEO audit, I came across a very interesting situation. Index Status revealed an extremely high number of “not selected” pages (as compared to the number of pages indexed), and I found myself digging into the site to find out why.

[Image: High Percentage of “Not Selected” Pages in Index Status]

I found several issues causing the problem, so there wasn’t just one issue inflating the number. That said, the problem I’m going to cover today was creating thousands of duplicate pages without the site owner knowing. The more pages I checked, the more duplicates I found. This is a problem that can easily slip through the cracks for many webmasters, especially when a small or medium-sized business handles all website development on its own. Below, I’ll cover what I found and, more importantly, how you can avoid the problem in the first place.

The Danger of an Extra Character
As I was analyzing the site manually, and via a number of test crawls, I came across some URLs that contained an extra character. Specifically, the extra character was being appended to each canonical URL. All of those URLs were from one specific section of the site (which contained thousands of URLs). After digging into that section, I found that this was happening to almost every URL being linked to from a certain element within each page. So I homed in on that element to find out how the duplicate pages were being created. And by the way, it just so happens that this section of the site contains nearly 200K pages. Yes, this was a huge problem.

[Image: How One Extra Character Created Thousands of Duplicate Pages]

The Result – Duplicate Content Ad Infinitum
The core problem was that the extra character created a new URL, and that new URL was an exact duplicate of the canonical URL (the URL that should be resolving). As you can guess, both pages could be accessed on the site. One part of the site linked to the canonical version of each page, while this problematic section linked to the duplicate version.

So, right off the bat, we were dealing with at least 200K duplicate pages. In addition, as more content is added to this section, more duplicate pages will be created over time (since the extra character gets appended to each new URL). Also, the canonical URL tag was not being used on the duplicate pages, so that wasn’t helping this specific case. That said, I wouldn’t advocate using the canonical URL tag to fix this problem. Technical problems like this should be addressed at the code or structure level.

[Image: 1:1 Ratio of Duplicate URLs to Canonical URLs]

So, if left in place, this problem could generate an unlimited number of duplicate pages. If 500K pages ended up in that section, then there would be 500K pages of duplicate content. Not good, so I dug deeper to find out exactly what was causing the problem.
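If you want to check your own crawl data for this pattern, one simple approach is to flag URL pairs that differ only by one trailing character. Here’s a minimal Python sketch; the example.com URLs are hypothetical stand-ins for real crawler output.

```python
# Hypothetical sketch: flag crawled URLs that are an exact copy of another
# crawled URL plus one trailing character. The URLs below are stand-ins
# for the output of whatever crawler you use.

crawled_urls = {
    "http://www.example.com/widgets/blue-widget/",
    "http://www.example.com/widgets/blue-widget/x",  # extra character appended
    "http://www.example.com/widgets/red-widget/",
}

# If a URL minus its final character is also in the crawl, odds are good
# that something is appending an extra character somewhere.
suspects = sorted(u for u in crawled_urls if u[:-1] in crawled_urls)

for url in suspects:
    print(f"Possible duplicate of {url[:-1]}  ->  {url}")
```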

The Root Problem – Faulty Redirects
Let’s face it, we all need to implement redirects at some point.  And that introduces the possibility of a poor implementation, which can be catastrophic SEO-wise.  It’s one of the reasons that website redesigns and CMS migrations are so risky.  On that note, to learn how to avoid SEO disaster during a redesign or migration, you should check out my Search Engine Journal column on the subject.

For example, a faulty implementation might use 302s instead of 301s, use meta refresh redirects, redirect to the wrong pages, or have the redirect code “bomb” the URLs. In this situation, we ran into the URL-bombing problem: the redirects were faulty and were sending visitors (and crawlers) to URLs with an extra character.

The Solution – Fix the Redirect Code!
So, hundreds of thousands of duplicate pages were being generated, and it was all due to one piece of redirect code on the server. The 301 redirects being generated simply added an extra character to the destination URL. That’s it. The fix will be implemented soon, and once the new redirect code is rolled out, the correct URLs will resolve.
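The post doesn’t show the actual server-side code, but here is a hypothetical reconstruction in Python of how a single stray character in redirect logic can poison every destination URL. The function names, the domain, and the doubled-slash bug are all illustrative assumptions, not the client’s real code.

```python
# Hypothetical reconstruction of the bug class: the redirect target is
# built with one extra character (here, a doubled trailing slash), so
# every 301 lands on a duplicate of the canonical URL.

def build_redirect_target_buggy(path: str) -> str:
    # Bug: blindly appends a slash even when the path already ends in one,
    # producing a URL one character longer than the canonical URL.
    return "http://www.example.com" + path + "/"

def build_redirect_target_fixed(path: str) -> str:
    # Fix: normalize the path first so the destination always matches the
    # canonical URL exactly.
    return "http://www.example.com" + path.rstrip("/") + "/"

old_path = "/widgets/blue-widget/"
print(build_redirect_target_buggy(old_path))  # .../blue-widget//  (duplicate)
print(build_redirect_target_fixed(old_path))  # .../blue-widget/   (canonical)
```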

This situation underscores the fact that even one small piece of code can have serious implications SEO-wise. Had it been left unchanged, it could have generated an unlimited number of duplicate pages. Knowing the content on this site, my guess is the problem would have generated 500K-750K pages of duplicate content over the next 2-3 years.

How To Avoid This Situation
After reading this post, you might be scared that this could happen to you, or worse, that it’s happening right now.  I’m going to provide a short list of things you can do to make sure this doesn’t happen.  Of course, if you feel you are having problems already, you should have an SEO audit performed.

  • First, whenever you create redirects, make sure you have a system for testing them before they launch.  You can do this a number of ways, including on a local or staging server prior to releasing the final code to production.  If you thoroughly test the redirects, you can nip serious problems in the bud (see the sketch after this list).
  • Second, make sure your XML sitemaps contain the canonical URLs for the pages at hand.  Feeding Google and Bing the correct URLs helps them understand which ones should be considered canonical.
  • Third, develop a strategy for using the canonical URL tag on the site.  If the tag is present, any duplicate pages will pass their search equity to the canonical URLs.  Note, I’m not saying you should leave a technical problem in place!  I’m saying that having the canonical URL tag in place will make sure the engines pass search equity to the correct pages on your site while you work out solutions to your technical problems.
  • This final bullet assumes you are already experiencing duplicate content problems from a technical issue.  If you are, and you cannot determine what’s going on, then invest in having a technical SEO audit completed.  To me, they provide the most bang for your SEO buck.  It’s a great way to find out what’s truly going on with your site (beyond just the problem I covered here).
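To make the first bullet concrete, here is a minimal sketch of a pre-launch redirect check using Python’s requests library. The URL pairs in redirect_map are hypothetical placeholders; point the list at your own old-to-new mapping and run it against your staging server before releasing to production.

```python
# Minimal pre-launch redirect check: verify each old URL returns a 301
# whose Location header matches the expected canonical URL exactly.
# The URL pairs below are hypothetical placeholders.

import requests

redirect_map = [
    ("http://staging.example.com/old-page", "http://www.example.com/new-page/"),
    ("http://staging.example.com/widgets/blue", "http://www.example.com/widgets/blue-widget/"),
]

for old_url, expected in redirect_map:
    # Don't follow the redirect; inspect the first hop directly so both
    # the status code and the exact destination can be verified.
    resp = requests.get(old_url, allow_redirects=False, timeout=10)
    location = resp.headers.get("Location", "")
    if resp.status_code != 301:
        print(f"FAIL {old_url}: expected 301, got {resp.status_code}")
    elif location != expected:
        print(f"FAIL {old_url}: redirects to {location}, not {expected}")
    else:
        print(f"OK   {old_url} -> {location}")
```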

Summary – Know Your Site
This case emphasizes something I’ve said a thousand times over the past few years.  It’s incredibly important to have a sound site structure in order to perform at your highest level SEO-wise.  Coding problems, site structure issues, flawed redirects, etc. can kill your SEO efforts.  It’s one of the reasons that I believe SEO audits are critically important.  They can catch all types of SEO issues and provide ways to remedy those problems.  You know, like generating hundreds of thousands of duplicate pages.  :)

GG

 


Filed Under: google, seo
