The Internet Marketing Driver

  • GSQi Home
  • About Glenn Gabe
  • SEO Services
    • Algorithm Update Recovery
    • Technical SEO Audits
    • Website Redesigns and Site Migrations
    • SEO Training
  • Blog
    • Web Stories
  • Contact GSQi

Archives for September 2018

The Magically Moving Meta Robots Tag And The Potential SEO Danger It Brings [Case Study]

September 19, 2018 By Glenn Gabe Leave a Comment

The Magical Meta Robots Tag

{Update: Google’s John Mueller replied to my question on Twitter about this situation. You can read John’s feedback here.}

{Update 10/27/18: My client’s dev team figured out the problem and I provided more information about that below.}

I’ve been helping a client with a large-scale CMS migration recently and came across a very interesting, and potentially scary, situation. It’s a large and complex site with many moving parts, so there is a lot to process (both figuratively and literally).

The site has over one million pages indexed, and many more that get crawled based on the proper use of noindex in certain areas of the site. That was a big change for the site when I first started helping them a few years ago (after they got pummeled by a major algorithm update). Therefore, I’m always making sure those areas of the site are handled properly (as changes and updates go live on the site).

And now that a CMS migration was under way, I made sure to check on those areas of the site to make sure all was ok. It ends up they were ok… but they sort of weren’t either. Confused? Then read on.

The Magic Show Begins – Typical checks after a CMS migration
As I started crawling and checking the site after the migration went live, I noticed something very strange. The number of noindexed pages during my crawls was sometimes lower than what I expected. And while manually checking those pages, I noticed that my trusty AYIMA plugin sometimes went red (indicating the page was not indexable), but sometimes it didn’t. I found that odd and began to dig in further. Note, other plugins like the DeepCrawl indexability plugin also showed those urls were indexable. Again, strange…

DeepCrawl Chrome Plugin for Indexation

Static HTML Correct, Rendered HTML Not So Much…
When checking the areas of the site I mentioned earlier, and digging into the source code (the static html), I noticed that the meta robots tag was in the proper place and contained the correct directives. For example, it was placed in the head of the document and contained “noindex, follow”.

Meta robots tag from static html.

But we know that Google renders the page and that rendered html can be different than the static html. For example, JavaScript-based content will be rendered and can be indexed by Google. So, I started to check the rendered html via a number of tools like Google’s mobile-friendly test, the rich results testing tool, Chrome Dev Tools (using inspect element), and then various crawling tools like DeepCrawl, Screaming Frog, and Sitebulb. They have the ability to render the page, revealing what the final rendered html would reveal.

Note, the rich results testing tool will render the page via Googlebot desktop, while the mobile-friendly testing tool will render the page as Googlebot for Smartphones. Chrome Dev Tools will show you the rendered html via “inspect element” and then crawling tools like Screaming Frog, Sitebulb, and DeepCrawl have options for crawling with JavaScript rendering on. Therefore, it’s good to test across all of the tools to see what the results are across both mobile and desktop.

Render JavaScript option in Screaming Frog (first image) and then DeepCrawl (second image):

Render JavaScript option in Screaming Frog.

Render JavaScript option in DeepCrawl.

Wait, what just happened?
When I checked the rendered html across those tools, I witnessed the magically moving meta robots tag. The robots tag, and some other head elements, were now in the body of the document and not the head. I had to double and triple check this to make sure I was seeing it properly.

The amazing moving meta robots tag.

The meta robots tag went from line 105 in the source code to line 1,181 in the rendered version of the code. And again, it was NOT in the head of the document, where it needs to reside. It was now in the body! Here is a screenshot from Screaming Frog showing both the source html and the rendered html. Notice the line numbers next to each meta robots tag:

Screaming Frog static versus rendered html.

But Is It Safe?
When spot-checking urls containing the magically moving meta robots tag, they indeed were noindexed. Therefore, I’m assuming Google is using the meta robots tag from the static html and not in the rendered html (more about this in the next section). But will that always be the case? Again, on a site with millions of noindexed pages, and a history of getting hit by major algorithm updates, is it safe to believe that the magically moving meta robots tag won’t eventually cause problems? All good questions.

Checking the meta robots tag via the inspect url tool in GSC.

Google’s 2-phase approach for indexing JavaScript-based content
I passed all of this information along to my client and that was sent to the dev team. I’m still waiting to hear back about why this is happening, but needless to say, it could be very dangerous on a large-scale site with many moving parts. In addition, this is happening in sections of the site that absolutely should be noindexed and not released into the SEO wild.

Also, Google has explained that it uses a two-phase approach for indexing JavaScript-based content. The first phase will crawl and index the static html. And the second phase, which can take a few days or longer, will process and index the rendered html. And remember, the rendered html is where the meta robots tag ends up in the body of the document (for this case).

Here is John Mueller explaining that inserting canonical tags or meta robots tags via JavaScript can cause issues based on the initial phase of indexing versus the second phase (where it renders and indexes the JavaScript version):

So, theoretically for my client, Google would first see and process the proper meta robots tag in the static html, but then it would process and index the rendered html, which has the meta robots tag in the body of the document. If that’s the case, then Google may not decide to noindex the page in the long-term. I have to dig in further across those areas of the site to see how many pages are being noindexed versus indexed over the long-term (even though I’m hoping we just fix this problem sooner than later so we don’t need to worry about this!)

Again, with millions of pages currently being noindexed, I would NOT want to see a chunk of those pages suddenly get indexed. Remember, this site has felt the wrath of major Google algorithm updates in the past. It’s doing very well now, but the battle scars remain.

The CMS or the implementation?
Since my client just moved to a new CMS (a popular enterprise CMS), I naturally starting wondering if the problem was CMS-based or if it was due to the implementation of the CMS. Companies typically tailor the CMS implementation with various customized elements and code, and that can always cause unforeseen issues.

So, I decided to check several others sites running the CMS and they looked fine. The meta robots tag was on the same line in the html in the static and rendered versions (and it was located in the head of the document). Therefore, this looks to be an implementation problem. That’s often the case with CMS migrations. It’s not necessarily the CMS, it’s more about how the implementation goes. That’s where bugs can creep in and potentially cause big problems SEO-wise.

Screaming Frog static versus rendered html for another site running the same CMS.

Update: Feedback From Google’s John Mueller
John Mueller replied to my question about this situation today on Twitter. John explained that if the meta robots tag is in the right place in the static html, then that should be fine. He explained that it’s different when you need to do something specific after indexing and rendering (like adding hreflang via JavaScript). That’s great to know. I am still going to make sure the problem is fixed, since I’m not a fan of having magically moving meta robots tags, but it’s great to know this shouldn’t be a big problem. That makes sense, since I am seeing those pages accurately noindexed. Here’s John’s tweet:

John Mueller's tweet about the magically moving robots tag.

Update: The problem was identified and a fix was implemented
Well, my client’s dev team finally figure out the problem, which is great! It ends up there was a conflict with a GDPR script where it was validating which code to execute (production versus staging). And sometimes the validation was failing, or just late to determine the logic that was necessary (which resulted in a conflict with other JavaScript code on the page). The fix was implemented and the meta robots tag is now correctly being published in the head of the document for both the static and rendered version of the html. See the screenshot below after crawling the pages with rendering on.

Static HTML:
Bug fixed and meta robots tag in the head of the static html.

Rendered HTML:
Bug fixed and meta robots tag in the head of the rendered html.

Quick Tips For Stopping Magically Moving Meta Robots Tags
Based on this case, I have some closing tips for SEOs and site owners that are going through a CMS migration (or for anyone that wants to make sure the meta robots tag, rel canonical, or other important SEO code is set up correctly).

  • Test your site via multiple tools that enable you to view the rendered html. That includes the mobile-friendly test (mobile), the rich results testing tool (desktop), Chrome Dev Tools (using inspect element), etc.
  • You can also test your site via multiple crawling tools that render JavaScript, including Screaming Frog, DeepCrawl, and Sitebulb. Then check the static html versus the rendered html. And make sure to check important reports to see what has changed compared to previous static crawls. For example, you might see noindexed pages drop off a cliff when you compare a static crawl to a JavaScript-rendered crawl. If so, why?
  • If it’s a third-party CMS (and not custom built by your company), then test other sites running the CMS. You might see consistent behavior across those others sites, which would signal a problem with the CMS itself. But if they look fine, then it’s probably your implementation of the CMS. That’s important to know.


Summary – For SEO, don’t rely on magically moving code
This was an interesting case for sure. Any time you see important directives magically moving around the html code, you definitely want to dig in and find out what’s going on. And then it’s important to quickly address the situation with the dev team. That’s especially the case for large-scale sites with many moving parts. I’ll try and post an update after my client surfaces the problem and applies a fix. Needless to say, I’m hoping the magic show ends soon. :)

GG

 

Filed Under: google, seo, tools

Connect with Glenn Gabe today!

Latest Blog Posts

  • Continuous Scroll And The GSC Void: Did The Launch Of Continuous Scroll In Google’s Desktop Search Results Impact Impressions And Clicks? [Study]
  • How to analyze the impact of continuous scroll in Google’s desktop search results using Analytics Edge and the GSC API
  • Percent Human: A list of tools for detecting lower-quality AI content
  • True Destination – Demystifying the confusing, but often accurate, true destination url for redirects in Google Search Console’s coverage reporting
  • Google’s September 2022 Broad Core Product Reviews Update (BCPRU) – The complexity and confusion when major algorithm updates overlap
  • Google Multisearch – Exploring how “Searching outside the box” is being tracked in Google Search Console (GSC) and Google Analytics (GA)
  • Sitebulb Server – Technical Tips And Tricks For Setting Up A Powerful DIY Enterprise Crawler (On A Budget)
  • Google’s Helpful Content Update Introduces A New Site-wide Ranking Signal Targeting “Search engine-first Content”, and It’s Always Running
  • The Google May 2022 Broad Core Update – 5 micro-case studies that once again underscore the complexity of broad core algorithm updates
  • Amazing Search Experiments and New SERP Features In Google Land (2022 Edition)

Web Stories

  • Google’s December 2021 Product Reviews Update – Key Findings
  • Google’s April 2021 Product Reviews Update – Key Points For Site Owners and Affiliate Marketers
  • Google’s New Page Experience Signal
  • Google’s Disqus Indexing Bug
  • Learn more about Web Stories developed by Glenn Gabe

Archives

  • January 2023
  • December 2022
  • November 2022
  • October 2022
  • September 2022
  • August 2022
  • July 2022
  • June 2022
  • May 2022
  • April 2022
  • March 2022
  • February 2022
  • January 2022
  • December 2021
  • November 2021
  • October 2021
  • August 2021
  • July 2021
  • June 2021
  • April 2021
  • March 2021
  • February 2021
  • January 2021
  • December 2020
  • November 2020
  • October 2020
  • September 2020
  • August 2020
  • July 2020
  • June 2020
  • May 2020
  • April 2020
  • March 2020
  • February 2020
  • January 2020
  • December 2019
  • November 2019
  • October 2019
  • September 2019
  • August 2019
  • July 2019
  • June 2019
  • May 2019
  • April 2019
  • March 2019
  • February 2019
  • January 2019
  • December 2018
  • November 2018
  • October 2018
  • September 2018
  • August 2018
  • July 2018
  • June 2018
  • May 2018
  • April 2018
  • March 2018
  • February 2018
  • January 2018
  • December 2017
  • November 2017
  • October 2017
  • September 2017
  • August 2017
  • July 2017
  • June 2017
  • May 2017
  • April 2017
  • March 2017
  • February 2017
  • January 2017
  • December 2016
  • November 2016
  • October 2016
  • August 2016
  • July 2016
  • June 2016
  • May 2016
  • April 2016
  • March 2016
  • February 2016
  • January 2016
  • December 2015
  • November 2015
  • October 2015
  • September 2015
  • August 2015
  • July 2015
  • June 2015
  • May 2015
  • April 2015
  • March 2015
  • February 2015
  • January 2015
  • December 2014
  • November 2014
  • October 2014
  • September 2014
  • August 2014
  • July 2014
  • June 2014
  • May 2014
  • April 2014
  • March 2014
  • February 2014
  • January 2014
  • December 2013
  • November 2013
  • October 2013
  • September 2013
  • August 2013
  • July 2013
  • June 2013
  • May 2013
  • April 2013
  • March 2013
  • February 2013
  • January 2013
  • December 2012
  • November 2012
  • October 2012
  • September 2012
  • August 2012
  • July 2012
  • June 2012
  • May 2012
  • April 2012
  • March 2012
  • GSQi Home
  • About Glenn Gabe
  • SEO Services
  • Blog
  • Contact GSQi
Copyright © 2023 G-Squared Interactive LLC. All Rights Reserved. | Privacy Policy
This website uses cookies to improve your experience. Are you ok with the site using cookies? You can opt-out at a later time if you wish. Cookie settings ACCEPT
Privacy & Cookies Policy

Privacy Overview

This website uses cookies to improve your experience while you navigate through the website. Out of these cookies, the cookies that are categorized as necessary are stored on your browser as they are essential for the working of basic functionalities of the website. We also use third-party cookies that help us analyze and understand how you use this website. These cookies will be stored in your browser only with your consent. You also have the option to opt-out of these cookies. But opting out of some of these cookies may have an effect on your browsing experience. You can read our privacy policy for more information.
Cookie Consent