
Faulty Redirects, Duplicate Content, and SEO – How a Redirect Glitch Created Hundreds of Thousands of Duplicate Pages

January 4, 2013 By Glenn Gabe 4 Comments

Redirect Glitch Causing SEO Problems

With the release of Index Status in Google Webmaster Tools, many webmasters are now questioning why their “not selected” numbers are high.  They wonder whether those numbers are good, bad, or normal.  Unfortunately, there’s not an easy answer to that question, since it depends on the site at hand.  But you can definitely look at the ratio of “not selected” to pages indexed to start to understand whether a technical problem is causing a spike in pages being categorized as “not selected”.

For example, if you have 200 pages indexed on your site and you see 350 pages categorized as “not selected”, that might be OK.  But if you see 25K or more pages categorized as “not selected”, that could raise a red flag that something isn’t right with the site.  For instance, is there a site structure issue that’s causing thousands of variations of pages with extremely similar content (duplicate content)?
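
To make that check concrete, here’s a minimal sketch in Python, assuming you’ve manually pulled the two totals from the Index Status report (the threshold is just an illustrative starting point, not an official rule):

    # Totals pulled by hand from Index Status in Google Webmaster Tools.
    # These numbers are illustrative only -- plug in your own.
    pages_indexed = 200
    not_selected = 25000

    ratio = float(not_selected) / pages_indexed
    print("Not selected vs. indexed ratio: %.1f to 1" % ratio)

    # Example threshold only; what counts as "normal" depends on the site at hand.
    if ratio > 10:
        print("Red flag: dig into site structure and duplicate content.")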

A Recent Example of a Poor “Not Selected” Ratio
During SEO audits, there are times when I come across significant problems like the one I mentioned above.  And those problems could be inhibiting a company’s search efforts (to say the least).  During a recent SEO audit, I ran into a very interesting situation.  Index Status revealed an extremely high number of “not selected” pages (as compared to the number of pages indexed), and I found myself digging into the site to find out why.

High Percentage of "Not Selected" Pages in Index Status

I found several issues causing the problem, so there wasn’t just one issue pumping up the number.  That said, the problem I’m going to cover today was causing thousands of duplicate pages to be created without the site owner knowing.  The more pages I checked, the more duplicates I found.  This is a problem that can easily slip through the cracks for many webmasters, especially if a small or medium-sized business is handling all website development on its own.  Below, I’m going to cover what I found, and more importantly, how you can avoid the problem in the first place.

The Danger of an Extra Character
As I was analyzing the site manually, and via a number of test crawls, I came across some URLs that contained an extra character.  Specifically, the extra character was being appended to each canonical URL.  All of those URLs were from one specific section of the site (which contained thousands of URLs).  After digging into that section, I found out that this problem was happening to almost every URL being linked to from a certain element within each page.  So, I homed in on that element to find out how the duplicate pages were being created.  And by the way, it just so happens that this section of the site contains nearly 200K pages.  Yes, this was a huge problem.

How One Extra Character Created Thousands of Duplicate Pages

The Result – Duplicate Content Ad Infinitum
The core problem is that the extra character created a new URL, but that new URL was an exact duplicate of the canonical URL (the URL that should be resolving).  And as you can guess, both pages could be accessed on the site.  One part of the site linked to the canonical version of the pages, while this problematic section linked to the duplicate versions.

So, right off the bat, we are dealing with at least 200K duplicate pages.  In addition, as more content is added to this section, more duplicate pages will be created over time (based on the extra character being added to each URL).  Also, the canonical URL tag was not being used on the duplicate pages, so that wasn’t helping this specific case.  And on that note, I wouldn’t advocate using the canonical URL tag to fix this problem…  Technical problems like this should be addressed at the code or structure level.
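
As an aside, a quick way to confirm that a suspected pair like this is a true duplicate is to fetch both URLs and compare the response bodies.  The URLs below are hypothetical stand-ins (the actual site isn’t named here), and the snippet is just a sketch that assumes the third-party requests library:

    import hashlib
    import requests  # third-party: pip install requests

    # Hypothetical canonical URL and the extra-character duplicate that the
    # faulty redirects were pointing at.
    canonical_url = "http://www.example.com/products/widget"
    duplicate_url = "http://www.example.com/products/widgetx"

    def page_fingerprint(url):
        """Fetch a URL and return its status code plus an MD5 hash of the body."""
        response = requests.get(url)
        return response.status_code, hashlib.md5(response.content).hexdigest()

    if page_fingerprint(canonical_url) == page_fingerprint(duplicate_url):
        print("Both URLs resolve with identical content -- a true duplicate.")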

1 to 1 Ratio of Duplicate URLs to Canonical URLs

So, if this were left in place, the problem could generate an unlimited number of duplicate pages.  If 500K pages ended up in that section, then there would be 500K pages of duplicate content to match.  Not good, so I dug deeper to find out exactly what was causing the problem.

The Root Problem – Faulty Redirects
Let’s face it, we all need to implement redirects at some point.  And that introduces the possibility of a poor implementation, which can be catastrophic SEO-wise.  It’s one of the reasons that website redesigns and CMS migrations are so risky.  On that note, to learn how to avoid SEO disaster during a redesign or migration, you should check out my Search Engine Journal column on the subject.

For example, a poor implementation could mean using 302s versus 301s, using meta refresh redirects, redirecting to the wrong pages, or having the redirect code bomb the URLs.  In this situation, we ran into the “bombing of URLs” problem.  The redirects were faulty and were redirecting to URLs with an extra character.

The Solution – Fix the Redirect Code!
So, hundreds of thousands of duplicate pages were being generated, and it was due to one piece of redirect code on the server.  The 301 redirects being generated simply added an extra character to the destination URL.  That’s it.  The fix will be implemented soon, and once the new redirect code is rolled out, the correct URLs will resolve.
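
The actual server-side code isn’t shown here, so the sketch below is purely hypothetical, but it illustrates how a single stray character in redirect-generation logic can produce exactly this pattern, and how small the fix can be:

    # Hypothetical redirect-building logic (the real code isn't shown in this post).
    # The bug: one stray character concatenated onto every destination URL, so each
    # 301 points at a duplicate page instead of the canonical one.

    def build_redirect_target_buggy(old_path):
        slug = old_path.rstrip("/").split("/")[-1]
        return "/products/" + slug + "x"  # extra character creates the duplicate URL

    def build_redirect_target_fixed(old_path):
        slug = old_path.rstrip("/").split("/")[-1]
        return "/products/" + slug  # 301 now resolves to the canonical URL

    print(build_redirect_target_buggy("/old-catalog/widget"))  # /products/widgetx
    print(build_redirect_target_fixed("/old-catalog/widget"))  # /products/widget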

This situation underscores the fact that even one small piece of code can have serious implications SEO-wise.  If it had been left unchanged, it could have ended up generating an unlimited number of duplicate pages.  Knowing the content on this site, my guess is the problem would have generated 500K-750K pages of duplicate content over the next 2-3 years.

How To Avoid This Situation
After reading this post, you might be scared that this could happen to you, or worse, that it’s happening right now.  I’m going to provide a short list of things you can do to make sure this doesn’t happen.  Of course, if you feel you are having problems already, you should have an SEO audit performed.

  • First, whenever you create redirects, make sure you have a system for testing those redirects before they launch.  You can do this a number of ways, including on a local server or test server prior to releasing the final code to production.  If you thoroughly test the redirects, you can nip serious problems in the bud (see the sketch after this list for one way to automate that check).
  • Second, make sure your XML sitemaps contain the canonical URLs for the pages at hand.  Feeding Google and Bing the correct URLs can help them understand which ones should be considered canonical.
  • Third, you should develop a strategy for using the canonical URL tag on the site.  If the tag is present, then you can ensure that any duplicate pages pass their search equity to the canonical URLs.  Note: I’m not saying that you should leave a technical problem in place!  Instead, I’m saying that having the canonical URL tag in place will make sure the engines pass any search equity to the correct pages on your site while you figure out solutions to your technical problems.
  • This final bullet assumes you are already experiencing problems with duplicate content caused by a technical problem.  If you are, and you cannot determine what’s going on, then invest in having a technical SEO audit completed.  To me, audits provide the most bang for your SEO buck.  They’re a great way to find out what’s truly going on with your site (beyond just the problem I listed here).
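
To follow up on the first bullet above, here’s a minimal sketch of that kind of pre-launch redirect test, assuming you keep a simple mapping of old URLs to the canonical destinations you expect (the URLs below are hypothetical, and the snippet uses the third-party requests library):

    import requests  # third-party: pip install requests

    # Hypothetical mapping of old URLs to the canonical URLs they should 301 to.
    expected_redirects = {
        "http://test.example.com/old-catalog/widget": "http://test.example.com/products/widget",
        "http://test.example.com/old-catalog/gadget": "http://test.example.com/products/gadget",
    }

    for old_url, expected in expected_redirects.items():
        # Don't follow the redirect -- we want to inspect the raw response.
        response = requests.get(old_url, allow_redirects=False)
        location = response.headers.get("Location", "")
        if response.status_code != 301:
            print("FAIL: %s returned %s instead of a 301" % (old_url, response.status_code))
        elif location != expected:
            print("FAIL: %s redirects to %s, expected %s" % (old_url, location, expected))
        else:
            print("PASS: %s -> %s" % (old_url, expected))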

Summary – Know Your Site
This case emphasizes something I’ve said a thousand times over the past few years.  It’s incredibly important to have a sound site structure in order to perform at your highest level SEO-wise.  Coding problems, site structure issues, flawed redirects, etc. can kill your SEO efforts.  It’s one of the reasons that I believe SEO audits are critically important.  They can catch all types of SEO issues and provide ways to remedy those problems.  You know, like generating hundreds of thousands of duplicate pages.  :)

GG

 


Filed Under: google, seo
