Fixing A Google Images Indexing Problem Caused By Redirect Chains and Robots.txt Directives – Case Study

Glenn Gabe

google, seo, tools

Google has done a lot of work over the past two years with Google Images. And all signs point to Google driving forward with even more new features and functionality. Although sometimes an afterthought for many companies when it comes to Search, Google Images can drive meaningful traffic. In addition, having your images properly indexed can lead to stronger listings in Google’s Top Stories (if you’re a news publisher).

Therefore, if you have strong visual assets, and you know people are searching for those images, then it’s important to give your site the best shot possible to rank for relevant queries.

By the way, I attended the Google Webmaster Conference in Mountain View on Monday, November 4, and the presentations from the various Googlers were awesome. One “lightning round” covered Image Search and the various projects Google has been working on recently.

One important point from that presentation was about “Google moving from just showing images to providing much more context about the great content behind those images”.

Google Images lightning round from the Google Webmaster Conference on Nov 4, 2019
Source: Holly Miller Anderson on Twitter

That has manifested itself with a new design that includes titles and descriptions directly in image search, swipe up with AMP to visit a site, and more. In the past, only images would be displayed, so Google is definitely trying to help users find the content behind the images, which can translate to higher-quality image traffic to your site. So again, it’s important to rank for the images you want to rank for.

That’s a good segue to the case study I’m presenting in this post. :)

Case Study: When One “Top Stories” Listing Is Not Like The Others – The Image Advantage
I received an email one morning from a company with an interesting question. The news publisher was experiencing a weird problem with Google Images, including images in Top Stories. The publisher performed well in Google News, including Top Stories, but images would not show up in the Top Stories listing a good percentage of the time. Sometimes they did, but often they didn’t. And they were ultra-confused about why that was happening.

They explained that there were images in the articles and you could clearly visit those images and see them perfectly when visiting the CDN urls (like many companies, they were using a CDN to store and deliver their images). But when their articles ended up ranking in Top Stories (an extremely visible search feature that can drive a ton of traffic), images often did not accompany the article.

And to make matters worse, the other sites listed in the same Top Stories module did have images, which put this company at a disadvantage from a click-through rate standpoint. Here’s a mockup of what it looked like:

A mockup of what was happening in Top Stories. Images were not showing up.

So, I dug in to hopefully provide some answers. And like many other things in SEO, what’s invisible to the naked eye was causing serious (but logical) problems. Read on.

So what the heck was wrong? The land mine of extra CDN hops…
As I mentioned earlier, the company was using a CDN to house and deliver images. That’s totally fine for handling images and many sites employ CDNs for that very purpose. Even John Mueller has fielded that question a number of times and explained it’s totally fine to use CDNs for that.

But the devil is in the details. Sure, a CDN is fine, but you need to follow the path that Googlebot must follow in order to see if it’s truly fine. So I crawled part of the site and began checking the situation manually (via Chrome and several plugins). It wasn’t long before I could see the problem.

Tracking the path from the site to the CDN revealed an extra hop along the way (via a redirect). And that extra hop was extremely problematic. It was through another CDN subdomain… which had me immediately checking the robots.txt file on that subdomain. And lo and behold, the urls were blocked by robots.txt.

So the image urls sent Googlebot to a CDN url, which then sent it to another CDN url (which was blocked by robots.txt), and then to the final CDN url where the image was housed. Since Googlebot couldn’t crawl the extra hop, it was never making its way to the image file. And that’s why images weren’t being indexed, ranking in Google Images, or showing in Top Stories. Not good, to say the least.

The path to the image files revealed an extra hop, which was blocked by robots.txt

To the naked eye (for non-SEOs), you could enter the image url and see that the image was displayed fine. But the path to that image was problematic. Again, it didn’t take long to see the 301 to 301 to 200 redirect chain. Then, checking the second 301 more closely, I cross-referenced the robots.txt file for that subdomain, and boom, it was blocked.
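If you want to surface this kind of chain without clicking through each hop by hand, here is a minimal Python sketch of the idea. It is not the tooling used in this case study. The function names (`trace_redirects`, `http_head_fetch`) and the example.com hosts are hypothetical stand-ins, and the fake responses simply mimic the 301 to 301 to 200 pattern described above.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def trace_redirects(url, fetch, max_hops=10):
    """Follow redirects one hop at a time and return [(url, status), ...].

    `fetch(url)` must return (status_code, location_or_None) and must NOT
    follow redirects itself.
    """
    chain = []
    seen = set()
    while len(chain) < max_hops:
        status, location = fetch(url)
        chain.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break
        seen.add(url)
        next_url = urljoin(url, location)  # Location can be relative
        if next_url in seen:  # stop on a redirect loop
            break
        url = next_url
    return chain


def http_head_fetch(url):
    """One possible live fetcher: a single HEAD request, no redirect following.

    Assumption: the server answers HEAD the same way it answers GET.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


# Hypothetical responses standing in for the publisher's setup.
FAKE_RESPONSES = {
    "https://www.example.com/img/photo.jpg": (301, "https://cdn-a.example.com/photo.jpg"),
    "https://cdn-a.example.com/photo.jpg": (301, "https://cdn-b.example.com/photo.jpg"),
    "https://cdn-b.example.com/photo.jpg": (200, None),
}


def fake_fetch(url):
    return FAKE_RESPONSES[url]


chain = trace_redirects("https://www.example.com/img/photo.jpg", fake_fetch)
for hop_url, status in chain:
    print(status, hop_url)
```

To run this against a real image url, pass `http_head_fetch` instead of `fake_fetch`. Any chain with more than one 3xx hop is worth a closer look, since each hop has to be crawlable on its own.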

The good news is that this was a relatively straightforward fix (although many of the image urls across a large and complex site needed to be refined). So, the company’s dev team dug in and worked on a better solution. Again, it’s a large and complex site, so no change is easy… But they clearly wanted to move quickly to get this done.  

Side note: It turns out not all images were set up this way across the site. That’s one of the reasons the company was confused about why images weren’t showing up in Top Stories. Some of the images were being indexed fine in Google Images. That’s when the extra hop wasn’t present. This made it harder for the company to debug. For example, they saw some images being indexed fine, and showing up in Top Stories, but many weren’t.

The issue was sort of sinister that way… almost cloaking the bad results with some good.  That said, many images in the latest news articles were having problems in image search (and Top Stories). So that led them to start asking the right questions, which led to the root problem that I covered above.

A fix was implemented and… image traffic surged!
The dev team worked on a better solution and removed that extra hop. Now Googlebot is able to crawl and index the images efficiently and associate those images with the pages at hand. I was eager to see how Google would handle the change. And I was pretty excited with the results.

It’s like Google was just waiting for the fix… Google image search traffic surged and has continued to increase over time. The trending clearly shows the jump after the fix was put into place in April, and those metrics continue to move up and to the right. Here are some screenshots of the surge over time:

Since the change, an extra 334K clicks have been driven from image search alone. Unfortunately, it’s hard to gauge the changes in traffic from Top Stories, since there’s no way to specifically report on that in GSC across desktop and mobile, but articles are showing up with images now versus just text (which should be helping with click-through rate).  

For Top Stories, this change levels the playing field for my client. The other articles showing up from other news organizations often had images associated with them, which can greatly help with click-through rate. Now the company’s Top Stories listings also have visuals. It’s great to see.

Driving Google Image Performance – Closing Tips
Google Images can definitely drive meaningful traffic, and images can be an important part of Top Stories (which can also drive a lot of traffic for news publishers). As I mentioned earlier, Google has made a serious effort to enhance image search over the past two years or so, and I’m sure more is coming on that front. So, it’s important to make sure you are providing a clear path for crawling and indexing your image content (so that content can end up ranking in Search).

Here are some final bullets containing important image search tips:

  • In order to rank in Google Images, Google needs an image and landing page combination. Images alone will not suffice. Google’s John Mueller has covered this several times.
  • Don’t block Googlebot from crawling your images, whether they are hosted on your own site, on a subdomain, or housed on a CDN. And make sure you follow the full path to those images, which could include extra hops that might be blocked by robots.txt.
  • Crawling your site on a regular basis using tools like DeepCrawl, Screaming Frog, and Sitebulb can help you surface redirect chains and various files blocked by robots.txt. And they do this in bulk. When you combine your toolset with your brainset, good things can happen. :)
  • Check the actual search results and use tools to view screenshots of SERP history to ensure your listings and images look ok. For example, perform actual queries that are yielding Top Stories and view your listings. You can also use tools like SEMrush to view historical screenshots of the SERPs, which can also help you understand how your listings looked over time. All of this can help you understand what’s going on, but can also provide data to stakeholders when a problem exists. As they say, a good image is worth a thousand words. Well, a good SEO screenshot is worth a thousand votes when you need changes implemented.
  • Unfortunately, sitemap reporting in GSC does not contain the number of images indexed anymore. John Mueller once said that’s probably a good thing to have back, so maybe we’ll see that at some point in the future. For now, you really don’t have a solid idea of how many images are being indexed or the problems that could be causing indexing issues. Maybe we’ll see a new sub-report in GSC’s coverage reporting for this in the future… who knows? (Yes, that’s a hint for GSC product managers and engineers!) :)
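
To make the robots.txt tip above more concrete, here is a rough Python sketch that checks every url in a redirect chain against the robots.txt rules for its host. The hosts, the `Disallow: /images/` rule, and the `blocked_hops` helper are all hypothetical, since the actual rules on the publisher's CDN subdomain aren't shown in this post. Also note that Python's built-in `urllib.robotparser` does not replicate Google's exact rule-matching behavior, so treat the result as a first-pass check and not a definitive answer.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def blocked_hops(chain, robots_by_host, user_agent="Googlebot-Image"):
    """Return the urls in `chain` that robots.txt disallows for `user_agent`.

    `robots_by_host` maps a hostname to the text of its robots.txt file.
    Hosts missing from the mapping are treated as having no rules.
    """
    parsers = {}
    blocked = []
    for url in chain:
        host = urlsplit(url).netloc
        if host not in parsers:
            parser = RobotFileParser()
            parser.parse(robots_by_host.get(host, "").splitlines())
            parsers[host] = parser
        if not parsers[host].can_fetch(user_agent, url):
            blocked.append(url)
    return blocked


# Hypothetical chain: site url -> intermediate CDN hop -> final CDN url.
chain = [
    "https://www.example.com/images/photo.jpg",
    "https://cdn-a.example.com/images/photo.jpg",
    "https://cdn-b.example.com/images/photo.jpg",
]

# Hypothetical robots.txt content; only the intermediate hop has a rule.
robots_by_host = {
    "cdn-a.example.com": "User-agent: *\nDisallow: /images/\n",
}

print(blocked_hops(chain, robots_by_host))
```

In this made-up example only the middle hop is reported as blocked, which is the same shape of problem covered in the case study: the final image url is fine, but Googlebot can't get past the hop in front of it.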

Summary – Let your images be seen (and crawled and indexed properly)
In closing, a redirect chain combined with a robots.txt directive was causing major problems with image search (and Top Stories) for the news publisher that contacted me. With Google’s focus on improving image search, you don’t want to be left out in the cold because you put barriers in Googlebot’s way when it’s crawling and indexing your image content.

Sometimes the answer lies in the path to those images… so make sure you double check redirects, redirect chains, and whether those urls are accessible to Googlebot. There just might be some obstacles in Google’s way. And that’s typically not a good thing.