The Internet Marketing Driver

Archives for November 2017

How To Quickly Remove A Rogue Subdomain From Google Using The Remove URLs Tool In GSC – And Then Make Sure It Stays Out (Case Study)

November 27, 2017 By Glenn Gabe 10 Comments

Deindex urls using the remove urls tool in GSC.

I’ve written about Murphy’s Law for SEO before, and it’s scary as heck. And that’s especially the case for large-scale websites with many moving parts. Murphy’s Law is the old adage that says, “anything that can go wrong, will go wrong.” In SEO terms, no matter how much you plan and prep for large-scale changes, there’s a good possibility that something will go wrong.

And when SEOs are implementing major changes across complex sites, whether that be CMS migrations, domain name changes, https migrations, etc., you can bet Murphy will be hanging around. You just need to be prepared for Murphy to pay a visit.

I recently helped a company move to https, and overall, it was a clean migration. I’ve helped this company with a number of SEO projects over the past several years and I was extremely familiar with their site, content, and technical SEO setup. The site was doing extremely well SEO-wise leading up to the migration, so we definitely wanted to make sure the move to https went as smoothly as possible.

We prepared heavily leading up to the migration, so we could all be confident when finally pulling the trigger. When a site moves to https, there are a number of checks I perform prior to, during, and after the migration takes place. And that includes several crawls to ensure everything is ok. In addition, my client is SEO savvy and they were checking the site from their end as well.

But remember our friend Murphy? Read on.

Site Commands Are Useless, Or Are They?
As we were tracking some of the normal volatility you can see during a migration, my client was checking indexation across some important urls. While performing some site commands, they noticed a rogue subdomain showing up with copies of the core website’s content. When I quickly checked that subdomain (via a site command), it revealed close to 7K pages indexed.

Site command revealing rogue subdomain indexed.

It seems when they pulled the trigger and moved to https, Murphy came to pay a visit. A subdomain that had never existed suddenly appeared. And for some reason, it was duplicating all of the pages from the core site.

Luckily, we were relatively early into the migration so we were able to start tackling the problem quickly. We clearly wanted to get those urls out of Google’s index as quickly as possible and then handle the subdomain properly. Below, I’ll document what we did to overcome a rogue subdomain showing up containing a lot of duplicate content.

Indexation, Rankings, and Traffic
One of the first things I told my client was to add that rogue subdomain to Google Search Console (GSC). That would enable us to see reporting directly from Google for that subdomain, including indexation, search analytics data, crawl errors, etc. It would also enable us to submit xml sitemaps and use the Remove URLs Tool if we thought that would be helpful. They quickly set up the GSC property and we waited for data to come in.

While we waited for data to arrive in GSC, I told my client to make sure all of the urls across the rogue subdomain 404. They did that pretty quickly. So the urls were only active for a relatively short period of time, but long enough for thousands to be indexed.

Once data arrived in GSC for the subdomain, we were able to see exactly what we were dealing with. There were 7,479 urls indexed according to Index Status reporting in GSC.

Surge in indexation via Index Status report in GSC.

In addition, some of the urls were actually ranking for queries and driving clicks. We saw 759 clicks and 28,724 impressions once the urls were indexed. Note, that was a small percentage of Google organic traffic compared to the core website, but clearly not a good thing that the rogue subdomain was driving impressions and clicks.

Traffic to rogue subdomain incorrectly indexed.

So we had a problem on our hands. Below, I’ll document the steps we took to remove the rogue subdomain from the Google search results while also working to have the pages removed from the index long-term. We also wanted to make sure that any subdomain urls that were ranking in the SERPs were correctly switched to the core site urls.

How To Remove Rogue URLs From Google’s Index:

1. 404 (or 410) all of the pages.
I mentioned this earlier, but I told my client to make sure all urls on the subdomain were 404ing. And yes, you can use 410s if you want. That could quicken up the process slightly for having the urls removed from the index. John Mueller has explained in the past that it could quicken up the process “a tiny bit”.
Here’s a video of John explaining that:

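If you want to double-check that the 404s/410s are actually in place across the subdomain (beyond spot-checking in a browser or crawler), here’s a minimal sketch in Python using the requests library. The file rogue_urls.txt is just a hypothetical export of the subdomain’s urls from a crawl:

import requests

# Hypothetical export of the rogue subdomain's urls (one per line).
with open("rogue_urls.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

for url in urls:
    # HEAD request with no redirects, so we see the subdomain's own response code.
    response = requests.head(url, allow_redirects=False, timeout=10)
    if response.status_code not in (404, 410):
        print(f"Still resolving: {url} returned {response.status_code}")
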
2. Verify the property in Google Search Console (GSC) – for several reasons.
We verified the subdomain in GSC so we could gain reporting directly from Google related to indexation, search traffic and impressions, xml sitemaps, crawl errors, and more. In addition, we would have access to the Remove Urls Tool, which can be very helpful with quickly removing urls from the Google search results (temporarily).

3. Create and submit an xml sitemap with all urls that are 404ing (via GSC).
We created an xml sitemap with all rogue urls and submitted that sitemap via GSC for the subdomain in question. That can help Google find the rogue urls that are now 404ing and hopefully get them out of the index quicker.
XML sitemap submitted with rogue urls.
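
If you need to build that sitemap yourself, a bare-bones version can be generated from the same list of urls. Here’s a minimal sketch in Python (rogue_urls.txt is again a hypothetical export, and the output file name is just an example):

from xml.sax.saxutils import escape

with open("rogue_urls.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

# Build a minimal sitemap containing only <loc> entries for the rogue urls.
entries = "\n".join(f"  <url><loc>{escape(u)}</loc></url>" for u in urls)
sitemap = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    + entries + "\n</urlset>\n"
)

with open("rogue-urls-sitemap.xml", "w", encoding="utf-8") as out:
    out.write(sitemap)

Upload the resulting file to the subdomain and submit it via the Sitemaps reporting for that GSC property.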

4. Use the Removal URLs Tool in GSC for the new property.
In Google Search Console (GSC), there is a very powerful, yet dangerous, tool called the Remove URLs Tool. I find there’s a lot of confusion about how to use the tool, how it works, what will happen long-term, etc.

First, it’s important to know that it’s a temporary removal (90 days). Using the tool, you can temporarily remove urls from the Google SERPs (for example, if the urls contain sensitive or confidential information that shouldn’t be displayed in the search results). This is probably the most common misconception. Some site owners remove a url or a directory using the tool and then think it’s gone forever. That’s not the case. You have to make sure you remove the urls via 404 or 410, or by noindexing them, in order for them to be truly removed from the index. You can also put the content behind a login (forcing a username/password to log in). If not, the urls can reappear 90 days later.

Second, you can remove specific urls, directories, or entire sites by using the tool in GSC. And the Remove URLs Tool is case-sensitive and precise (character by character). So make sure you enter the exact url when using the tool, or it might not work the way you want it to. Also, to remove an entire site or subdomain, just leave the form blank and click the button labeled “Continue”. All of the urls for the root GSC property you are dealing with will be removed.

The Remove URLs Tool in GSC.

Since we were dealing with a rogue subdomain, we wanted to nuke all of the urls residing there. So we simply left the field blank for the subdomain in GSC and pushed the giant nuclear button. Below you can see the subdomain removal request based on using the Remove URLs Tool.
Rogue subdomain removed from Google using the Remove URLs Tool in GSC.

Important Note: Make sure you are working with the right GSC property or you can incorrectly remove your core website from the SERPs. Remember I said the tool can be dangerous?

How Fast Will The URLs Be Removed?
That’s a trick question. There are two ways to think about the problem. First, how fast will the urls be removed from the SERPs? And second, how fast will Google truly deindex the urls (for the long-term)? The Remove URLs Tool can work very quickly. Google’s John Mueller has said it can take less than a day for the urls to be removed from the SERPs, but I’ve seen it work even faster.

For this situation, it was just a few hours before a site command showed 0 results.
Site command revealing urls removed in just 3 hours.

But remember, it’s a temporary removal. You still need to handle the urls on the site correctly in order for them to be removed over the long-term. That means having the urls resolve with a 404 or 410, putting the content behind a login, or using the meta robots tag with “noindex”. If you do one of those things, then the urls won’t reappear in Google’s index 90 days later.
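
Most sites will handle this at the web server or CMS level, but to make the two options concrete, here’s a purely illustrative sketch of what they could look like if the subdomain happened to be served by a small Python (Flask) app. The route names are mine, not anything from the case study:

from flask import Flask, make_response

app = Flask(__name__)

# Catch-all: every url on the rogue subdomain returns 410 Gone.
@app.route("/", defaults={"path": ""})
@app.route("/<path:path>")
def gone(path):
    return "Gone", 410

# Alternative for pages that should stay available to users but drop out of
# Google's index: serve them normally, but add a noindex directive via the
# X-Robots-Tag response header.
@app.route("/keep-for-users")
def keep_for_users():
    resp = make_response("Page content for visitors")
    resp.headers["X-Robots-Tag"] = "noindex"
    return resp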

Bonus: Double Check Logs To Ensure The Pages Aren’t Being Visited Anymore
And for those of you that dig analyzing log files, you can check your logs to make sure traffic drops off to the urls on the rogue subdomain. You can use Screaming Frog Log File Analyzer to import your logs and check activity (and hopefully see declining activity.)
Screaming Frog Log Analyzer for checking traffic.
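
If you’d rather script it, here’s a rough sketch in Python that tallies Googlebot requests per day from an access log in the common combined format. The file name and the simple “Googlebot” user-agent check are assumptions; adjust them for your own setup:

import re
from collections import Counter
from datetime import datetime

hits_per_day = Counter()

# Assumes a combined-format access log for the rogue subdomain (file name is a placeholder).
with open("access.log") as f:
    for line in f:
        # Only count Googlebot requests (verify the user-agent separately if needed).
        if "Googlebot" not in line:
            continue
        match = re.search(r"\[(\d{2}/\w{3}/\d{4})", line)
        if match:
            day = datetime.strptime(match.group(1), "%d/%b/%Y").date()
            hits_per_day[day] += 1

# Crawl activity should taper off as Google processes the 404s/410s.
for day in sorted(hits_per_day):
    print(day, hits_per_day[day])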

Moving Forward: Monitoring The Problem Over The Long-Term
Once the above steps are completed, you shouldn’t just sit back with a frozen margarita in your hand gazing at Google Analytics. Instead, you should monitor the situation to make sure all is ok long-term. That includes reviewing the GSC reporting for the rogue subdomain, which can reveal several key things.

1. Make sure indexation drops.
Check the Index Status report and perform site commands to make sure the urls remain out of Google’s index. If you see them come back for some reason, make sure the urls are 404ing, properly noindexed, or behind a login. You should also make sure someone didn’t reinclude the urls via the Remove URLs Tool in GSC.
Index Status reveals urls being deindexed.

2. Make sure 404s show up in crawl errors.
By checking the crawl errors reporting in GSC, make sure the urls are showing up as 404s. For the subdomain we were working on, you can see the rise in 404s after my client implemented the correct changes. If you don’t see 404s increasing, they might not be correctly implemented across the site or subdomain.
404s increasing as rogue urls are crawled by Google.

3. Make sure traffic drops off.
By checking the search analytics reporting in GSC, you can see impressions and clicks over time. Make sure they are dropping after removing the urls from Google. If you don’t see a drop in impressions and clicks for the rogue urls, then double check the Remove URLs tool, make sure the urls are indeed 404ing, being noindexed, etc.
Clicks and impressions drop after using the Remove URLs Tool.
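
You can also pull that search analytics trending programmatically. Here’s a rough sketch using the Search Console API (webmasters v3) via google-api-python-client. The credentials file and subdomain url are placeholders, and the service account would need to be added as a user on the GSC property:

from googleapiclient.discovery import build
from google.oauth2 import service_account

SCOPES = ["https://www.googleapis.com/auth/webmasters.readonly"]
credentials = service_account.Credentials.from_service_account_file(
    "service-account.json", scopes=SCOPES)
service = build("webmasters", "v3", credentials=credentials)

# Pull clicks and impressions by date for the rogue subdomain property.
response = service.searchanalytics().query(
    siteUrl="https://rogue.example.com/",
    body={
        "startDate": "2017-11-01",
        "endDate": "2017-11-30",
        "dimensions": ["date"],
    },
).execute()

# Clicks and impressions should trend toward zero after the removal.
for row in response.get("rows", []):
    print(row["keys"][0], row["clicks"], row["impressions"])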

Summary – Removing URLs… and Showing Murphy The Exit Door
If you find yourself in a situation where a rogue subdomain gets indexed with duplicate content, then use all of the tools available to you to rectify the situation. And that includes the Remove URLs Tool in GSC. Unfortunately, when implementing large-scale changes on complex sites, site owners should understand that things can, and will, go wrong. Murphy’s Law for SEO is real and can definitely throw a wrench into your operation. My advice is to be ready for Murphy to show up, and then move quickly to kick him out.

GG

 

Filed Under: google, seo, tools

Should You Remove Low-Quality Or Thin Content Versus Improving It? Seeing The Forest Through The Trees

November 2, 2017 By Glenn Gabe 7 Comments

Removing low-quality and thin content.

Google’s quality algorithms are always at work. And Google’s John Mueller has explained a number of times that if you are seeing a decrease in rankings during algorithm updates, and over the long-term, then it could mean that Google’s quality algorithms might not be convinced that your site is the best possible result for users. I would even go a step further and say that evaluation also includes user experience.

In addition, John has explained several times that ALL pages indexed are taken into account when Google evaluates quality. So yes, every page that’s indexed counts towards your site’s “quality score”.

Here are some clips from John explaining this:
At 10:06 in the video:

At 25:20 in the video:

And here’s a quote from a blog post from Jennifer Slegg where she covers John explaining this in a webmaster hangout video.

In the post John explains:
“From our point of view, our quality algorithms do look at the website overall, so they do look at everything that’s indexed.”

In addition, Panda (which has changed since being incorporated into Google’s core ranking algorithm) is also on the hunt for sites with low-quality content. Panda now continually runs and slowly rolls out over time. It still requires a refresh, but does not roll out on one day. Last October, Gary Illyes was interviewed and explained that Panda evaluates quality for the entire site by looking at a vast amount of its pages. Then it will adjust rankings accordingly.

Here’s the quote from the interview with Gary (you can read Barry’s post and listen to the audio interview via that post):

“It measures the quality of a site pretty much by looking at the vast majority of the pages at least. But essentially allows us to take quality of the whole site into account when ranking pages from that particular site and adjust the ranking accordingly for the pages.”

So from what we’ve been told, Panda is still a site-level score, which then can impact specific pieces of content based on query. That means site-level quality is still evaluated, which also means that all content on the site is evaluated. And remember, John has said that ALL pages indexed are taken into account by Google’s quality algorithms.

The reason I bring all of this up is because Gary Illyes recently said at SMX East that removing low-quality or thin content shouldn’t help with Panda, and it never should have. So there’s definitely confusion on the subject and many are taking Gary’s statement and quickly believing that removing low-quality or thin content won’t help their situation from a quality standpoint. Or, that they need to boost all low-quality content on a site, even when that includes thousands, tens of thousands, or more urls.

Anyone that has worked on large-scale sites (1M+ urls indexed) knows that it’s nearly impossible to boost the content on thousands of thin pages, or more (at least in the short-term). And then there are times that you surface many urls that are blank, contain almost no content, are autogenerated, etc. In that situation, you would obviously want to remove all of that low-quality and thin content from the site. And you would do that for both users and SEO. Yes, BOTH.

Clarification From John Mueller AGAIN About Removing Low-Quality Content:
John Mueller held another webmaster hangout on Tuesday and I was able to submit a pretty detailed question about this situation. I was hoping John could clarify when it’s ok to nuke content, boost content, etc. John answered my question, and it was a great answer.

He explained that when you surface low-quality or thin content across your site, you have two options. First, you can boost content quality. He highly recommends doing that if you can, and he explained that Google’s search engineers confirm that’s the best approach. So if you have content that might be lacking, then definitely try to boost it. I totally agree with that.

Your second option is for situations where there is so much low-quality or thin content that it’s just not feasible to boost it all. For example, if you find 20K thin urls with no real purpose. In that case, John explained it’s totally ok to nuke that content via 404s or by noindexing the urls. In addition, you could use a 410 header response (Gone) to signal to Google that the content is definitely being removed for good. That can quicken up the process of having the content removed from the index. It’s not much quicker, but slightly quicker. John has explained that before as well.

Here is John’s response (at 6:21 in the video):
Note, it was on Halloween, hence the wig. :)

Seeing The Forest Through The Trees – Pruning Content For Users And For SEO:
Removing low-quality content from your site is NOT just for SEO purposes. It’s for users and for SEO. You should always look at your site through the lens of users (just read my posts about Google’s quality updates to learn how a negative user experience can lead to horrible drops in traffic when Google refreshes its quality algorithms.)

Well, how do you think users feel when they search, click a result, and then land on a thin article that doesn’t meet their needs? Or worse, hit a page with thin or low-quality content and aggressive ads? Maybe there are more ads than content and the content didn’t meet their needs. That combination is the kiss of death.

They probably won’t be so thrilled with your site. And Google can pick up user happiness in aggregate. And if that looks horrible over time, then you will probably not fare well when Google refreshes its quality algorithms down the line.

So it’s ultra-important that you provide the best content and user experience as searchers click a result and reach your site. That’s probably the most important reason to maintain “quality indexation”. More on that soon, but my core point is that index management is important for users and not just some weird SEO tactic.

Below, I’ll quickly cover the age-old problem of dealing with low-quality content. And I’ll also provide some examples of how increasing your quality by managing “quality indexation” can positively impact a site SEO-wise.

Nuking Bad Content – An Age-Old Problem
Way back in web history, there was an algorithm named Panda. It was mean, tough, and you could turn to stone by looking into its eyes. OK, that’s a bit dramatic, but I often call old-school Panda “medieval Panda” based on how severe the hits were.

For example, the worst hit I’ve ever seen was 91%. Yes, the site lost 91% of its traffic overnight.

Here’s what a big-time Panda hit looked like. And there were many like this when medieval Panda rolled out:

Medieval Panda Attack

The Decision Matrix From 2014 – That Works
When helping companies with Panda, you were always looking for low-quality content, thin content, etc. On larger-scale victims (1M+ pages indexed), it wasn’t unusual to find thousands or tens of thousands of low-quality pages. And when you found that type of thin content on a site that had been hammered by Panda, you would explain to the client that they needed to deal with it, and soon.

That’s when I came up with a very simple decision matrix. If you can boost the content, then do that. If you can’t improve it, but it’s fine for users to find once they are on the site, then noindex it. And if it’s ok to remove for good, then 404 it.

Here is the decision matrix from my older Panda posts (and it’s still relevant today):
Panda Decision Matrix For Removing or Improving Low-Quality Content
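
To make the matrix concrete, here’s the same three-way decision expressed as a tiny, purely illustrative Python helper (the function and argument names are mine, not any official tooling):

def handle_low_quality_content(can_be_improved, useful_to_visitors_on_site):
    """The decision matrix: improve, noindex, or remove (404/410)."""
    if can_be_improved:
        return "improve the content (boost quality)"
    if useful_to_visitors_on_site:
        return "keep it on the site, but noindex it"
    return "remove it for good (404 or 410)"

# Example: a thin page that can't realistically be improved, but still helps
# visitors who are already on the site.
print(handle_low_quality_content(False, True))  # keep it on the site, but noindex it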

I called this “managing quality indexation” and it’s extremely important for keeping the highest quality content in Google’s index. And when you do that, you can steer clear of getting hit by Google’s quality algorithms, and Panda.

So, it was never just about nuking content. It was about making sure users find what they need, ensuring it’s high-quality and that it meets or exceeds user expectations, and making sure Google only indexes your highest quality content. You can think of it as index management as well.

Therefore, boost low-quality content where you can, nuke where you can’t (404 or 410), and keep on the site with noindex if it’s ok for users once they are on the site. This is about IMPROVING QUALITY OVERALL. It’s not a trick. It’s about crafting the best possible site for users. And that can help SEO-wise.

It’s Never Just Nuking Content, It’s Improving Quality Overall
There’s also another important point I wanted to make. When helping clients that have been negatively impacted by major algorithm updates focused on quality, they NEVER just nuke content. There’s a full remediation plan that takes many factors into account, from improving content quality, to removing UX barriers, to cutting down on aggressive, disruptive, or deceptive advertising, and more.

Therefore, I’m not saying that JUST nuking content will result in some amazing turnaround for a site. It’s about improving quality overall. And by the way, Google’s John Mueller has explained this before. He said if you have been impacted by quality updates, then you should look to significantly improve quality over the long-term.

Here’s a video of John explaining that (at 39:33 in the video):

And What Can This All Lead To? Some examples:
Below, you can see trending for sites that worked hard on improving quality overall, including improving their index management (making sure only their highest-quality content is indexed). Note, they didn’t just nuke content… They fixed technical SEO problems and published more high-quality content, but they also removed a large chunk of low-quality or thin content. Some of the sites removed significant amounts of low-quality content (tens or even hundreds of thousands of urls in total). They are large-scale sites with millions of pages indexed.

Surge after improving quality indexation.

Surge after improving quality overall.

Surging during multiple quality updates.

Surge after increasing quality overall.

Surge after improving quality overall including index management.

And there are many more examples that fit into this situation. Always look to improve quality indexation by boosting content quality AND removing low-quality or thin content. If you do, great things can happen.

My Advice, Which Hasn’t Changed At All (nor should it)
After reading this post, I hope you take a few important points away. First, you should focus on “quality indexation”. Google’s quality algorithms take all pages indexed into account when evaluating quality for a site. So make sure your best content is indexed. From a low-quality or thin content standpoint, you should either improve, noindex, or 404 the content. That’s based on how much there is, whether it’s worthwhile to improve, whether it’s valuable to users once they visit your site, etc. You can read the section earlier for more information on the decision matrix.

In closing, don’t be afraid to remove low-quality or thin content from your site. And no, you DON’T need to boost every piece of low-quality content found on your site. That’s nearly impossible for large-scale sites. Instead, you can noindex or 404 that content. Google’s John Mueller even said that’s a viable strategy.

So fire away. :)

GG

 

Filed Under: algorithm-updates, google, seo
