There’s never a dull moment in Google Land. And that’s especially the case for health and medical sites over the past year starting with the Medic Update in August of 2018. Since then, we have seen several major core updates, which seem to include a new method of evaluating quality for health and medical sites. Google’s John Mueller explained more about the changes in a webmaster hangout video in May.
That algorithmic change has led to some insane volatility for various sites in the health and medical niche. Some are surging and dropping with every update, which is a clear sign that Google is heavily tinkering with those algorithms (turning the dial up and down to try and find the right balance). I’ve been pretty vocal that I believe Google hasn’t figured this out yet… and I believe we’ll see even more volatility in the health space with the next core update (which we are due for).
I’ve had the opportunity to help a number of health and medical sites over the years, and even more since last August when the Medic Update landed. It’s been fascinating to see the impact, surface problems across those sites, watch some sites surge, others drop more, and some ride the “Google roller coaster” (surging and dropping with each core update). Like I said earlier, there’s never a dull moment in Google Land.
A Multi-Core Victim with an interesting content problem:
I recently started helping another health and medical site that has gotten hammered over several core updates. Their traffic dropped 54% during the June core update, and they are down 67% since the March core update. After the latest drop in June, the site owners finally decided to have someone come in and heavily analyze the site through the lens of Google’s core updates to see what’s going on.
Here are the drops from the March and June core updates:
During my first wave of analysis, I surfaced a huge problem that I’m sure is contributing to the site’s drop during core updates. Sure, there’s never one smoking gun with Google’s core updates… there is typically a battery of problems. For example, I’m still heavily analyzing the site and surfacing more issues as I write this post. But, the issue I found was big enough that I actually stopped analyzing the site to craft a separate deliverable just about the topic. That’s what I’m going to cover in this post.
The Find – The Domino Effect of Negative Impact From Copied Content
One of the first things I like to do when helping a company that was impacted by a core update is to run a traffic drop report (what I previously called a Panda Report). The report enables me to see the content that dropped the most after a major core update rolls through. It can often reveal glaring problems across once-popular content on a site.
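For readers who want to try something similar, here is a minimal sketch of that kind of report in Python. It assumes you have already exported organic clicks per URL for two equal-length windows (before and after the update). The names `PageDrop`, `traffic_drop_report`, and `min_before` are hypothetical and only for illustration; this is not the actual tooling behind the report described above.

```python
from dataclasses import dataclass


@dataclass
class PageDrop:
    """One URL with its clicks before and after an update (hypothetical shape)."""
    url: str
    before: int
    after: int

    @property
    def change(self) -> int:
        # Negative when the page lost clicks.
        return self.after - self.before

    @property
    def pct_change(self) -> float:
        # Percentage change relative to the "before" window.
        return 100.0 * self.change / self.before if self.before else 0.0


def traffic_drop_report(before, after, min_before=1):
    """Return the pages that lost clicks, biggest absolute loss first.

    `before` and `after` map URL -> clicks for equal-length windows.
    `min_before` is an assumed threshold that skips pages with too
    little traffic to be meaningful.
    """
    rows = []
    for url, clicks_before in before.items():
        if clicks_before < min_before:
            continue
        clicks_after = after.get(url, 0)  # a missing URL means no clicks at all
        if clicks_after < clicks_before:
            rows.append(PageDrop(url, clicks_before, clicks_after))
    rows.sort(key=lambda row: row.change)  # most negative change first
    return rows
```

Sorting by absolute loss rather than percentage keeps once-popular pages at the top, which is usually where the glaring problems show up.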
It wasn’t long before I noticed a disturbing trend. Many of the articles that dropped were copied from third-party health sites and blogs. And to make matters worse (and what should be obvious to SEOs reading this post), the copied pages were fully indexable. In other words, they weren’t being noindexed and they weren’t being canonicalized to the original articles. By the way, it’s not like those tactics make copying content better… but they at least decrease the chance of those pages outranking the original.
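If you want to spot-check a page for the same two signals, the sketch below parses a page’s HTML and reports whether it carries a robots `noindex` meta tag or a `rel="canonical"` link pointing somewhere else. It is a simplified illustration using only Python’s standard library: the class and function names are hypothetical, and it deliberately ignores other signals such as `X-Robots-Tag` response headers and robots.txt. Keep in mind that a canonical is a hint that can be ignored, so "canonicalized elsewhere" here only means the tag is present.

```python
from html.parser import HTMLParser


class IndexabilityParser(HTMLParser):
    """Collects robots noindex and rel=canonical signals from a page's HTML."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Attribute names arrive lowercased; values may be None.
        a = {name: (value or "") for name, value in attrs}
        if tag == "meta" and a.get("name", "").lower() in ("robots", "googlebot"):
            if "noindex" in a.get("content", "").lower():
                self.noindex = True
        elif tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.canonical = a.get("href") or None


def check_indexability(html, page_url):
    """Return the signals found, plus a rough 'fully_indexable' flag."""
    parser = IndexabilityParser()
    parser.feed(html)
    canonicalized_elsewhere = (
        parser.canonical is not None
        and parser.canonical.rstrip("/") != page_url.rstrip("/")
    )
    return {
        "noindex": parser.noindex,
        "canonical": parser.canonical,
        "fully_indexable": not parser.noindex and not canonicalized_elsewhere,
    }
```

A copied article that comes back with `fully_indexable` set to `True` is in the situation described above: nothing on the page tells Google to prefer the original.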
Here is an example of one copied page that was ranking for over 1K queries in the top 10 results BEFORE the March core update pummeled the site (and the copied content):
Important reminder: YMYL sites held to a higher standard
Remember, this is a site focused on “your money or your life” (YMYL) topics in health and medical. And many sites in the health/medical niche have gotten hammered since the Medic Update, seeing a big drop in rankings and traffic during several core updates in 2019. It’s a volatile space, that’s for sure.
Based on that impact, many health and medical sites have been working hard to publish killer content, showcase their expertise, hire medical experts to write and review their content, build medical review boards, and more. So, having content that’s copied from other sites in their niche is clearly not a good thing. And some of the sites the content was copied from are the biggest players in the health and medical category.
In total, there were hundreds of articles that fell into this category. And to clarify, there was original content on the site beyond the copied content. It’s not like the entire site was filled with copied content. But again, there were hundreds of copied articles that were freely indexable (and some were ranking well at certain points in time).
When checking the trending of those copied articles over time, many had dropped heavily in rankings and organic search traffic. Trending for many of those specific pieces of content looked like this:
The Domino Effect: Wait, were those sites hit too?
When checking the content that was copied, I noticed some of those articles weren’t high quality. So I decided to check the sites where the content was being copied from. And lo and behold, several of those sites had been hit by major core updates as well! Therefore, we had a classic SEO double whammy. The site I was analyzing was copying content from other sites in the niche, and several of those source sites were themselves being hit hard by core updates.
Here is search visibility trending for two of those sites.
Copied Content and Major Google Algorithm Updates (Including Medieval Panda)
I’ve seen copied content, and the incorrect use of syndicated content, cause massive problems over the years. I wrote about it (check the Q&A) when medieval Panda roamed the web and I’ve seen it cause problems with major core updates as well. It makes sense. If Google believes you are copying another site’s content, and then ranking for that content, then Google’s quality algorithms can have a big problem with that situation. And in a YMYL world, it can be a very serious issue.
In case you are wondering, Google has explained the problems with copying content a number of times, including publishing a section in its Quality Rater Guidelines (QRG). Google also has a Pirate algorithm for extreme cases, which I have analyzed heavily.
For example, John Mueller has explained that if Google believes a majority of your content is copied, then it can lose trust in your site. In extreme cases, the webspam team might even get involved. Again, I have seen this play out many times over the years (algorithmically and via manual actions). It’s not pretty.
Here is a video of John explaining this (at 26:03 in the video):
And here is a section from the QRG about copied main content:
DMCA threats, but no official takedowns.
After surfacing this situation, I wondered how many DMCA takedowns the site had received. It turns out none were filed, for some reason. I found out that there were a few threats of DMCA takedowns, but none of the sites ever acted on those threats. I believe the site did remove those articles when contacted by third parties whose content had been copied, but there were no official DMCA takedowns filed.
By the way, that should have been a sign that something wasn’t right… When you are being threatened with DMCA takedowns, that’s never a good sign.
Moving forward: How to resolve the issue
In order to rectify the situation, there are multiple paths the site can take. I’ll cover them below:
1) Nuke it: 404, 410
First, I’m typically aggressive with core update remediation. So, you can probably guess what my recommendation was. I would nuke the copied content via 404s or 410s. Let’s face it… it’s not your content to begin with! Remember, Google takes every page indexed into account when evaluating quality, so removing low-quality content is always a good thing (including copied content).
2) Seek permission and then canonicalize:
Next, if a site really wants to provide that third-party content for some reason, then the site owners can seek permission to use that content and then use rel canonical pointing back to the original page (basically canonicalizing the URLs on your site to the original URLs). In other words, request authorization to republish the syndicated content first, and only then canonicalize each URL to the original.
3) Seek permission and noindex:
You could also seek permission for publishing the content on your site and then noindex that content. Remember, if it’s not indexed, it can’t hurt the site quality-wise. And to be clear, this is AFTER receiving approval for publishing the content from the original author and site. Don’t just go and start copying content from anywhere thinking it’s fine. It’s definitely not fine.
4) Remove and 301 redirect to extremely relevant content (if possible)
You could also 301 redirect the old pages you are removing to extremely relevant content on your site. But keep in mind that Google can simply treat the old pages as soft 404s if you redirect to non-relevant content. You can read my case study about that, and my recent post about Google ignoring rel canonical when the content wasn’t the same or extremely relevant. Those are important points to consider.
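To tie the four options together, here is a rough decision sketch in Python. It only encodes the logic described above; the `CopiedPage` fields, the `plan_remediation` function, and the `prefer_noindex` flag are hypothetical names for illustration, and whether a page on your own site counts as "extremely relevant" is an editorial call that code can’t make for you. The sketch returns a 410 for removals, but a 404 is the other option named above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CopiedPage:
    """A copied article and the facts needed to pick a fix (hypothetical shape)."""
    url: str
    original_url: str
    has_permission: bool = False   # the original publisher approved republishing
    keep_on_site: bool = False     # the site owners really want to keep the article
    relevant_own_url: Optional[str] = None  # extremely relevant original page, if any


def plan_remediation(page, prefer_noindex=False):
    """Map a copied page to one of the four options described above."""
    # Options 2 and 3: only valid AFTER permission has been granted.
    if page.keep_on_site and page.has_permission:
        if prefer_noindex:
            return {"status": 200, "head_tag": '<meta name="robots" content="noindex">'}
        return {
            "status": 200,
            "head_tag": f'<link rel="canonical" href="{page.original_url}">',
        }
    # Option 4: remove, but redirect to extremely relevant content of your own.
    if page.relevant_own_url:
        return {"status": 301, "location": page.relevant_own_url}
    # Option 1: nuke it (410 here; a 404 works as well).
    return {"status": 410}
```

Note that a page the owners want to keep but have no permission for falls through to removal, which matches the point that permission has to come first.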
Summary: The moral of the (SEO) story:
Don’t copy another site’s content. Don’t try to rank based on content that’s not yours. It’s bad karma and as the saying goes… “karma will take care of it”. In this situation, some of those articles were copied from sites that also got hammered by Google’s core updates. So those pieces of copied content could have contributed to the quality problems that this site is dealing with now. Again, it’s a great example of an SEO double whammy.
Again, I’m definitely not saying this is the only problem the site is dealing with. I’m surfacing a number of issues across categories, but having hundreds of copied articles on the site is definitely not helping the situation.
Instead of taking the easy path and copying content, you should always be striving to create 10X content that blows your audience away, content that can naturally attract links, spark social sharing, and drive searches for your brand plus the topic. If you do that, then you won’t have to worry about the domino effect of negative impact from copied content, or about bad karma. Instead, start building good karma… that’s how you win.