Archives for May 2013

Penguin 2.0 Initial Findings – A Deeper Update, But Not Broader [Analysis]

May 29, 2013 By Glenn Gabe 26 Comments

Penguin 2.0 Initial Findings

Penguin 2.0 launched on Wednesday, May 22nd, and it’s an update that most SEOs have been eagerly awaiting.  Leading up to the rollout, all signs pointed to a nasty algorithm update that was going to be bigger and badder than Penguin 1.0.  There was a lot of speculation about how aggressive it would be, what it would target, and how much impact it would have across the web.  Well, now that it has rolled out, was it as nasty as many thought it would be?  Did it target more types of webspam?  And if you were hit, how can you recover?  I’ll try to answer some of these questions below, based on analyzing thirteen sites hit by Penguin 2.0.

In addition to the official Penguin 2.0 update, there was a Phantom Update on May 8th, which could have been Penguin 2.0 being tested in the wild.  I wrote a post explaining my findings, based on analyzing four (now seven) sites hit by that update.  It’s important to read that post as well as this one, so you can start to understand the various factors that led to a drop in rankings and traffic for sites hit on that day.

Analyzing Penguin 2.0
Since Penguin 2.0 launched, I have taken the same approach I took with Penguin 1.0.  I’ve been heavily analyzing sites hit by our new, black and white friend.  I’ve been monitoring webmaster forums, receiving emails from P2.0 victims, and digging into each site.  My goal has been to identify consistent factors that impacted sites hit by the latest algorithm update.  I have also been looking for any new additions that Penguin 2.0 might be targeting webspam-wise.

As of today, I have analyzed thirteen sites hit by Penguin 2.0 (I know, unlucky number).  That includes drilling into their link profiles, reviewing their content, and deciphering what led to the Penguin hit.  This post details my initial findings.

Deeper: Yes, Broader: No – Unnatural Links Still the Driving Force, Other Webspam Factors Not Being Targeted
As I explained earlier, we heard that Penguin 2.0 was going to be bigger and nastier than 1.0, but nobody knew exactly what that meant.  Personally, I thought it could mean that more webspam tactics could be targeted, versus just spammy inbound links.  In case you aren’t that familiar with Penguin 1.0, it heavily targeted unnatural links.  If you had a spammy link profile, then you were susceptible to getting pecked.  And a peck could mean a significant drop in Google organic search traffic (sometimes by over 90% overnight).

So, did Penguin 2.0 target additional forms of webspam?  Not in my opinion.  Again, I’ve analyzed thirteen sites hit by P2.0 and all of them had serious link profile issues.  Some had more balanced link profiles than sites hit by Penguin 1.0, but you could easily see the gaming of links from a mile away.  The heavy use of exact match anchor text stood out like a sore thumb.  And some of the sites I analyzed had hundreds of thousands of links using exact match anchor text from questionable sites.  More about those links and websites later in this post.

Penguin 2.0 Heavily Targets Unnatural Links

Homepage vs. Deeper Pages
One important point to note is the “deeper” reference earlier.  During TWiG (This Week in Google) on 5/22, Matt Cutts announced the release of Penguin 2.0.  During that interview, he explained that Penguin 1.0 only analyzed your homepage links and not pages deeper on your website.  To clarify that point, your domain could still be hit, but Penguin 1.0 only analyzed the homepage link profile to identify the gaming of links.  Looking back, I can see why they launched the first version of Penguin this way.  There were many low quality sites using exact match anchor text leading to their homepages (in an attempt to rank for those keywords).  That was a logical way to launch the first version of Penguin and see how it impacted sites across the web.

But Matt also explained that Penguin 2.0 now analyzed deeper pages on the site.  And that line made a lot of sense to me…  I had some companies reaching out to me after Penguin 1.0 launched complaining that their competitors were using the same tactics they were.  They wanted to know why those companies weren’t getting hit!  Now that we know Penguin 1.0 heavily analyzed the homepage, and didn’t take deeper pages into account, we understand that could be the factor that saved those companies (at least for the time being).  Now that P2.0 is out, those companies using spammy links pointing to deeper pages very well could have gotten hit.

Penguin 2.0 Analyzes Deeper Pages

Am I Seeing The “Deeper Pages” Factor?
I am absolutely seeing websites using exact match anchor text leading to a number of pages on their sites (versus just the homepage).  Actually, every one of the thirteen sites I analyzed had this issue.  So Matt might be telling us the truth when he explained that Penguin 2.0 is deeper.  But again, it’s not broader (taking other webspam tactics into account).

To quickly recap, I have not seen any sign that additional types of webspam were targeted by Penguin 2.0.  It’s still extremely link-based.  I have also seen sites with unnatural links pointing to deeper pages on the site get hit by Penguin 2.0.
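
If you want to check your own site against this “deeper pages” factor, one simple approach is to look at where your exact match anchor text actually points.  Here is a rough sketch in Python, assuming you have a backlink export in CSV form (the file name, column names, and keywords below are hypothetical placeholders; adjust them to match whatever tool you export from).  It simply tallies how many exact match links point at the homepage versus deeper URLs:

import csv
from collections import Counter
from urllib.parse import urlparse

# Hypothetical export and keyword list - adjust to your own data.
BACKLINK_EXPORT = "backlinks.csv"   # assumed columns: target_url, anchor_text
MONEY_KEYWORDS = {"cheap blue widgets", "best blue widgets"}

targets = Counter()

with open(BACKLINK_EXPORT, newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        if row["anchor_text"].strip().lower() in MONEY_KEYWORDS:
            path = urlparse(row["target_url"]).path or "/"
            # Bucket exact match links as homepage vs. deeper pages.
            targets["homepage" if path == "/" else "deeper page"] += 1

# Prints the counts of exact match links hitting the homepage vs. deeper pages.
print(targets)

A heavy skew toward deeper pages with exact match anchors is exactly the type of pattern described above.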

Collateral Damage
During major algorithm updates, there are always webmasters that claim they were unfairly hit.  That’s definitely the case sometimes, but I can tell you that I have not seen any collateral damage from Penguin 2.0 first-hand.  All of the sites I have analyzed clearly had unnatural link issues.  And some had extreme unnatural link issues that are going to take a lot of work to rectify… And yes, I can hear the frustration in the voices of the business owners calling me.  Some have a long and tough road ahead of them if they want to recover.

Types of Unnatural Links Remain Consistent
When analyzing unnatural links of sites hit by Penguin 2.0, did the types of unnatural links change at all?  Not from what I can see.  I saw many familiar link types, including comment spam, article sites, spammy directories, blogroll links, link networks (public and private), etc.  Basically, the same types of link manipulation are being targeted by Penguin 2.0 as were targeted by Penguin 1.0 (based on my analysis).

And similar to what I saw with Penguin 1.0, risky sites continually showed up in link profiles.  For example, attack sites, sites hit by malware, etc.  I’m not saying that sites hit by malware, or sites that have been hacked, are targeted by Penguin 2.0, but leaving problems like that unfixed over the long term is a clear signal about the quality of the site.  Think about it, most webmasters hit by malware, or being flagged as an attack site, would fix those problems asap.  They wouldn’t let them sit for weeks or months.  I noticed the same situation when analyzing sites hit by Penguin 1.0.

In case you are wondering what a link scheme is, here is a screenshot from Google’s Webmaster Guidelines listing various types of link schemes:

Link Schemes

What To Do If You’ve Been Hit
Similar to Penguin 1.0, you need to heavily analyze your link profile to identify unnatural links.  You should organize them by quality and start to create a master list of links to nuke.  And by “nuke”, I don’t mean you should simply disavow all of the unnatural links.  Google wants to know that you tried as hard as possible to manually remove them.  That means setting up a communication plan for reaching out to the webmasters in control of the sites that contain spammy links leading to your website.  No, that process isn’t easy, and you can expect a lot of interesting messages back (with some people trying to charge you for link removal).  You can also 404 pages receiving spammy links, but that obviously guts the content on your site.  That’s not the best approach for all situations.
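
To make that first pass more concrete, here is a minimal sketch (not my exact process) that flags exact match anchor text in a hypothetical CSV backlink export and groups the offending links by linking domain, so you know which webmasters to contact first.  Again, the file name, column names, and keywords are placeholders:

import csv
from collections import defaultdict

BACKLINK_EXPORT = "backlinks.csv"   # assumed columns: source_url, source_domain, anchor_text
MONEY_KEYWORDS = {"cheap blue widgets", "best blue widgets"}

flagged = defaultdict(list)

with open(BACKLINK_EXPORT, newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        # Exact match anchors go on the removal/review list.
        if row["anchor_text"].strip().lower() in MONEY_KEYWORDS:
            flagged[row["source_domain"]].append(row["source_url"])

# Domains sending the most exact match links rise to the top of the outreach list.
for domain, urls in sorted(flagged.items(), key=lambda kv: len(kv[1]), reverse=True):
    print(domain, len(urls))

Whatever you can’t get removed through outreach becomes a candidate for the disavow file covered below.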

Once you work hard to remove as many links as possible, you can absolutely use the disavow tool for the remaining links.  But again, that shouldn’t be used for the majority of links…  Once you take care of the link situation, you’ll need to wait for another Penguin update in order to see positive movement.  Then again, I have seen Penguin updates during Panda updates (which makes me think they are connected somehow).  You can read my Penguin recovery case studies to learn more about how my clients recovered from Penguin 1.0.
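
For reference, the disavow file itself is just a plain text file that you upload through Google’s disavow tool: one URL per line, or an entire domain prefixed with “domain:”, with comment lines starting with “#”.  A minimal illustration (the domains below are made up):

# Removal requests sent - no response from these webmasters
domain:spammy-directory-example.com
domain:article-farm-example.net
http://blog-example.org/spammy-guest-post.html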

Penguin 2.0 – Now is the time to take action
That’s what I have for now.  I’ll continue analyzing websites hit by Penguin 2.0 and will write follow-up posts covering additional findings.  I’m already helping several clients deal with Penguin 2.0, and I anticipate helping more in the coming weeks and months.  If you have any questions, feel free to post them in the comments.  Good luck.

GG

 

Filed Under: algorithm-updates, google, seo

SEO Findings From Google’s Phantom Update on May 8th, 2013 | Was It Penguin 2.0 in the Wild?

May 20, 2013 By Glenn Gabe 47 Comments

Google Phantom Update May 2013

In early May there was a lot of chatter in the webmaster forums about a major Google update.  Google wouldn’t confirm that it occurred (big shock), but the level of chatter was significant.  Not long after that, Matt Cutts announced that Penguin 2.0 would be rolling out within the next few weeks, and that it would be big.  All signs point to a major update that will be larger and nastier than Penguin 1.0.  Now, I’ve done a lot of Penguin work since April 24, 2012 when Penguin 1.0 rolled out, and I can’t imagine a larger, nastier Penguin.  But that’s exactly what’s coming.

So what was this “phantom update” that occurred on May 8th?  Was it a Panda update, some other type of targeted action, or was it actually Penguin 2.0 being tested in the wild?  I’m a firm believer that Google rolls out major updates to a subset of websites prior to the full rollout in order to gauge its impact.  If Penguin 2.0 is rolling out soon, then what we just saw (the phantom update) could very well be our new, cute black and white friend.  I guess we’ll all know soon enough.

Google Phantom Update Trending Graph

 

*Update: Penguin 2.0 Launched on 5/22*
Penguin 2.0 launched on May 22nd and I have analyzed a number of sites hit by the latest algorithm update.  I published a post containing my findings, based on analyzing 13 sites hit by Penguin 2.0.  You should check out that post to learn more about our new, icy friend.

The First Emails From Webmasters Arrived on May 9th
The first webmasters to contact me about this phantom update were confused, nervous and seeking help and guidance.  They noticed a significant drop in Google organic search traffic starting on the prior day (May 8th), and wanted to find out what was going on.  Four websites in particular saw large drops in traffic and were worried that they got hit by another serious Google algorithm update.

So, I began to heavily analyze the sites in question to determine the keywords that dropped, the content that was hit, the possible causes of the drop, etc.  I wanted to know more about this phantom update, especially if this could be the beginnings of Penguin 2.0.

Phantom Update Findings
After digging into each of the four sites since 5/9, I have some information that I would like to share about what this phantom update was targeting.  Since Penguin 1.0 heavily targeted unnatural links, I wanted to know if this update followed the same pattern, or if there were other webspam factors involved (and being targeted).   Now, my analysis covers four sites, and not hundreds (yet), but there were some interesting findings that stood out.

Below, I’ll cover five findings based on analyzing websites hit by Google’s Phantom Update on May 8th.  And as Penguin 2.0 officially rolls out, keep an eye on my blog.  I’ll be covering Penguin 2.0 findings in future blog posts, just like I did with Penguin 1.0.

Google Phantom Update Findings:

  1. Link Source vs. Destination
    One of the websites I analyzed was upstream unnatural links-wise.  It definitely isn’t a spammy website, directory, or anything like that, but the site was linking to many other websites using followed links (when a number of those links should have been nofollowed).  Also, the site can be considered an authority in its space, but it was violating Google’s guidelines with its external linking situation.  I’ve analyzed over 170 sites hit by Penguin since April 24, 2012, and this site didn’t fit the typical Penguin profile exactly…  There were additional factors, some of which I’ll cover below.  But being an upstream source of unnatural links was a big factor in why this site got hit (in my opinion).  So, if this is a pre-Penguin 2.0 rollout, I’m wondering how many other sites with authority will get hit when the full rollout occurs.  I’m sure there are many site owners that believe they can’t get hit, since they think they are in great shape as an authority…  I can tell you this site got hit hard.
  2. Cross-Linking (Network-like)
    Two of the sites that were hit were cross-linking heavily to each other (as sister websites).  And to make matters worse, they were heavily using exact match anchor text links.  Checking the link profiles of both sites, the sister sites accounted for a significant amount of links to each other… and again, they were using exact match anchor text for many of those links. It’s worth noting that I’ve helped other companies (before this update) with a similar situation.  If you own a bunch of domains, and you are cross linking the heck out of them using exact match anchor text, you should absolutely revisit your strategy.  This phantom update confirms my point.
    Google Phantom Update Crosslinking
  3. Risky Link Profiles (historically as well as current)
    This was more traditional Penguin 1.0, but each of the four sites had risky links.  Now, one site in particular had a relatively strong link profile, has been around for a long time, and had built up a lot of links over time.  But, there were pockets of serious link problems.  Spammy directories, comment spam, and spun articles were driving many unnatural links to the site.  But again, this wasn’t overwhelming percentage-wise.  I’ve analyzed some sites hit by Penguin 1.0 that had 80-90% spammy links.  This wasn’t the case with the site mentioned above.  Two of the sites I analyzed had more spammy links.  Their situation looked more Penguin 1.0-like.  And they got hit hard.  There were many spammy directories linking to the sites using exact match anchor text, comment spam was a big problem, etc.  And drilling into their historic link profiles, there were many more spammy links that had already been deleted.  So, their link profiles had “unnatural link baggage”.  And I already mentioned the cross-linking situation earlier (with two sites).  So yes, links seemed to have a lot to do with this phantom update (at least based on what I’ve seen).
    Google Phantom Update Unnatural Links
  4. Scraping Content
    To make matters more complex, two of the sites were also scraping some content to help flesh out pages on the site.  This wasn’t a huge percentage of the content across either of those two sites, but it was definitely a big enough problem that it stood out during my analysis.  The other two sites didn’t seem to have this problem at all.  Scraping-wise, one site was providing excerpts from destination webpages, and then linking to those pages if users wanted more information (this was happening across many pages).  The other site had included larger portions of text from the destination page without linking to it (more traditional scraping).
  5. Already Hit by Panda
    OK, this was interesting…  All four sites I analyzed had been hit by at least one Panda update in the past.  Two were hit by the first rollout (Feb 2011), one was hit in July of 2012, and the other in September 2012.  Clearly, Google had a problem at one point with the content on these sites.  So, how does that factor into the phantom update or Penguin 2.0?  I’m not sure, but it was very interesting to see that all four had Panda baggage.  So, does your Google baggage hurt you down the line, and does the combination of unnatural links and content spam exponentially hurt your site with this update?  I’ll need to analyze more sites before I can say with confidence, but it’s worth noting.
    Google Phantom Update Panda History


Reminder: Algorithmic Updates = Tough Stuff
By now, you might notice something important…  how hard it can be for SEOs to analyze updates like this and pinpoint the root cause.  Was it unnatural links that got these sites hit by the phantom update, or was it the content piece?  Or was it the combination of both?  If this is Penguin 2.0, does it score a site based on a number of webspam tactics?  Based on what I’ve heard and read about Penguin 2.0, this very well could be the case.  Again, more data will hopefully lead to a clearer view of what was targeted.

The Big Question – Was this Pre-Penguin 2.0 or something else?
Based on my analysis of sites hit by the phantom update, I’m wondering if we are looking at a two-headed monster.  Maybe some combination of Panda and Penguin that Google is calling Penguin 2.0?  If that’s the case, it could be disastrous for many webmasters.  And maybe that’s why Matt Cutts is saying this is going to be larger than 1.0 (and why he even shot a video about what’s coming).

Based on what I found, links seemed to definitely be an issue, but there were content issues and earlier Panda hits too.  There were several factors at play between the four sites I analyzed, and it’s hard to tell what exactly triggered each hit.  The only good news for webmasters out there is that none of the four seemed like collateral damage.  Each had its own issues to deal with webspam-wise.

Stay Tuned – There’s an Icy Update Approaching
That’s all I have for now.  If you have any questions, feel free to include them in the comments below.  And definitely keep an eye on my blog as Penguin 2.0 rolls out.  As I mentioned above, I’ll be heavily analyzing websites hit by the new and nastier Penguin.  Good luck.

GG

Filed Under: algorithm-updates, google, seo

Robots.txt and Invisible Characters – How One Hidden Character Could Cause SEO Problems

May 13, 2013 By Glenn Gabe 1 Comment

How syntax errors in robots.txt can cause SEO problems.

If you’ve read some of my blog posts in the past, then you know I perform a lot of technical SEO audits.  As one of the checks during SEO audits, I always analyze a client’s robots.txt file to ensure it’s not blocking important directories or files.  If you’re not familiar with robots.txt, it’s a text file that sits in the root directory of your website and should be used to inform the search engine bots which directories or files they should not crawl.  You can also add autodiscovery for your XML sitemaps (which is a smart directive to add to a robots.txt file).

Anyway, I came across an interesting situation recently that I wanted to share.  My hope is that this post can help some companies avoid a potentially serious SEO issue that was not readily apparent.  Actually, the problem could not be detected by the naked eye.  And when a problem impacts your robots.txt file, the bots won’t follow your instructions.  And when the bots don’t follow instructions, they can potentially be unleashed into content that should never get crawled.  Let’s explore this situation in greater detail.

A sample robots.txt file:

Sample Robots.txt File
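
Since the screenshot above doesn’t translate to text, here is an illustrative robots.txt along the same lines (the directories and sitemap URL are generic placeholders, not taken from the client’s actual file):

User-agent: *
Disallow: /admin/
Disallow: /search/
Disallow: /tmp/

Sitemap: http://www.example.com/sitemap.xml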

Technical SEO – Cloaked Danger in a Robots.txt File
During my first check of the robots.txt file, everything looked fine.  There were a number of directories being blocked for all search engines.  Autodiscovery was added, which was great.  All looked good.  Then I checked Google Webmaster Tools to perform some manual checks on various files and directories (based on Google’s “Blocked URLs” functionality).  Unfortunately, there were a number of errors showing within the analysis section.

The first error message started with the User-agent line (the first line in the file).  Googlebot was choking on that line for some reason, but it looked completely fine.  And as you can guess, none of the directives listed in the file were being adhered to.  This meant that potentially thousands of files would be crawled that shouldn’t be crawled, and all because of a problem that was hiding below the surface…  literally.

Blocked URLs reporting in Google Webmaster Tools:

Blocked URLs in Google Webmaster Tools

 

Word Processors and Hidden Characters
So I started checking several robots.txt tools to see what they would return.  Again, the file looked completely fine to me.  The first few checks returned errors, but wouldn’t explain exactly what was wrong.  And then I came across one that revealed more information.  The tool revealed an extra character (hidden character) at the beginning of the robots.txt file.  This hidden character was throwing off the format of the file, and the bots were choking on it.  And based on the robots syntax being thrown off, the bots wouldn’t follow the instructions.  Not good.

Invisible Character in Robots.txt

I immediately sent this off to my client, and their dev team tracked down the hidden character and created a new robots.txt file.  The new file was uploaded pretty quickly (within a few hours).  And all checks are fine now.  The bots are also adhering to the directives included in robots.txt.
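
If you want to run a similar check on your own file, looking at the raw bytes is the quickest way to spot something like this.  A UTF-8 byte order mark (BOM) is one common example of an invisible character that gets prepended when a file is saved from the wrong application; the sketch below uses it as the example, but the exact character in your file may differ:

# Inspect the first bytes of robots.txt for invisible characters.
with open("robots.txt", "rb") as f:
    head = f.read(20)

print(repr(head))  # e.g. b'\xef\xbb\xbfUser-agent: *' would reveal a BOM before "User-agent"

if head.startswith(b"\xef\xbb\xbf"):
    print("UTF-8 BOM detected - re-save the file as plain text without a BOM.")

If a stray character does show up, recreating the file in a plain text editor (covered below) solves the problem.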

 

The SEO Problems This Scenario Raises
I think this simple example underscores the fact that there’s not a lot of room for error with technical SEO… it must be precise.  In this case, one hidden character in a robots.txt file unleashed the bots on a lot of content that should never be crawled.  Sure, there are other mechanisms to make sure content doesn’t get indexed, like the proper use of the meta robots tag, but that’s for another post.  For my client, a robots.txt file was created, it looked completely fine, but one character was off (and it was hidden).  And that one character forced the bots to choke on the file.

 

How To Avoid Robots.txt Formatting Issues
I think one person at my client’s company summed up this situation perfectly when she said, “it seems you have little room for error, SEO seems so delicate”.  Yes, she’s right (with technical SEO).  Below, I’m going to list some simple things you can do to avoid this scenario.   If you follow these steps, you could avoid faulty robots.txt files that seem accurate to the naked eye.

1. Text Editors
Always use a text editor when creating your robots.txt file.  Don’t use a word processing application like Microsoft Word.  A text editor is meant to create raw text files, and it won’t throw extra characters into your file by accident.

2. Double and Triple Check Your robots.txt Directives
Make sure each directive does exactly what you think it will do.  If you aren’t 100% sure you know, then ask for help.  Don’t upload a robots.txt file that could potentially block a bunch of important content (or vice versa).

3. Test Your robots.txt File in Google Webmaster Tools and Via Third Party Tools
Make sure the syntax of your robots.txt file is correct and that it’s blocking the directories and files you want it to.  Note, Google Webmaster Tools enables you to copy and paste a new robots file into a form and test it out.  I highly recommend you do this BEFORE uploading a new file to your site.
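
If you’d rather script a quick check as well, Python’s standard library includes a basic robots.txt parser.  This is only a rough sketch to complement the Webmaster Tools test (the URLs and paths below are placeholders):

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("http://www.example.com/robots.txt")  # placeholder URL
rp.read()

# Spot-check a few paths you expect to be blocked or crawlable.
for path in ["/admin/secret-page.html", "/blog/public-post.html"]:
    url = "http://www.example.com" + path
    status = "allowed" if rp.can_fetch("Googlebot", url) else "blocked"
    print(path, "->", status)

Just remember that third party parsers won’t always behave exactly like Googlebot, so the Webmaster Tools test should still be your final check.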

4. Monitor Google Webmaster Tools “Blocked URLs” Reporting
The Blocked URLs functionality will reveal problems associated with your robots.txt file under the “analysis” section.  Remember, this is where I picked up the problem covered in this post.

 

Extra Characters in Robots.txt – Cloaked in Danger
There you have it.  One hidden character bombed a robots.txt file.  The problem was hidden to the naked eye, but the bots were choking on it.  And depending on your specific site, that one character could have led to thousands of pages getting crawled that shouldn’t be.  I hope this post helped you understand that your robots.txt format and syntax are extremely important, that you should double and triple check your file, and that you can test and monitor that file over time.  If the wrong file is uploaded to your website, bad things can happen.  Avoid this scenario.

GG

 

Filed Under: bing, google, seo, tools
