If you’re using hreflang with pages written in the same language, but targeting different countries, you might be surprised to learn how Google can handle the situation (especially when multiple pages are being canonicalized to one). John Mueller confirmed this behavior in a recent webmaster hangout video, which is provided below. Read my post to learn more about this hreflang magic trick.
Because I help a lot of companies deal with major algorithm updates, I often work with sites that target users in multiple countries and languages. Those sites might be geotargeting directories or subdomains, or they might be separate sites targeting different countries. When that’s the case, I’m heavily involved in their international SEO strategy, which often includes guiding the proper hreflang setup.
I have a relatively new client that has seen interesting swings during major algorithm updates across their large-scale site targeting multiple countries. While digging into the site via a crawl analysis and audit, I noticed that hreflang wasn’t set up correctly. And since the site targets over 40 different countries (and a number of languages), that could be a serious problem.
Note, hreflang can be a very confusing topic for site owners, developers, and even SEOs. I’ve come across many issues over the years while analyzing international sites. But this specific issue was clear. The setup just wasn’t correct. I’ll cover more about that shortly.
First, The Proper hreflang Setup
I first wanted to cover the proper setup, so we’re clear about the right way to implement hreflang tags. This is the Google-approved and recommended way to set up hreflang by the way. Let’s say you had a piece of content that’s been translated from English to Spanish and French. Therefore, you have three different versions of the content. You want to signal to Google that it’s the same content, but in different languages, so you decide to implement hreflang tags.
All three urls in the cluster should contain hreflang tags. Each page would contain an hreflang tag pointing to itself, plus tags pointing to the two other pages in the cluster. That same set of hreflang tags should be copied to each page in the cluster. For example, the English page would list itself, the Spanish page, and the French page, and the Spanish and French pages would carry that identical set.
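To make the three-page cluster concrete, here is a minimal Python sketch that prints the set of hreflang tags each page would carry. The example.com urls and the hreflang_tags helper are my own placeholders for illustration, not anything from a real site.

```python
from html import escape

def hreflang_tags(cluster):
    """Build the <link> tags that every page in the cluster should carry."""
    return [
        f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}" />'
        for lang, url in cluster.items()
    ]

# Hypothetical cluster: one piece of content in English, Spanish, and French.
cluster = {
    "en": "https://example.com/en/page/",
    "es": "https://example.com/es/page/",
    "fr": "https://example.com/fr/page/",
}

# The same three tags go in the <head> of all three pages.
for tag in hreflang_tags(cluster):
    print(tag)
```

Because the set is identical on every page, each page ends up with one tag pointing to itself and two pointing to the other versions.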
From a canonical standpoint, each page should contain a self-referencing canonical tag. You should NOT canonicalize those pages to one page (like the English page). The reason is simple. In order to surface those pages in the SERPs, Google needs to index them. If you canonicalize them to another url, you’re telling Google not to index them, while the hreflang tags ask Google to surface them. That’s a confusing setup for Google. You can also leave out rel canonical if you want, but I don’t recommend doing that. I would always look to provide Google the strongest signals possible about which url is the canonical (without forcing Google to decide on its own).
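Here is a rough sketch of how the canonical and hreflang tags fit together in the head of each page. Again, the urls and the head_tags helper are hypothetical. The point is only that the canonical changes per page while the hreflang set stays the same.

```python
def head_tags(page_lang, cluster):
    """Return the canonical + hreflang tags for one page in the cluster."""
    own_url = cluster[page_lang]
    # Self-referencing canonical: each page names itself, not the English page.
    tags = [f'<link rel="canonical" href="{own_url}" />']
    # The hreflang set is identical on every page in the cluster.
    tags += [
        f'<link rel="alternate" hreflang="{lang}" href="{url}" />'
        for lang, url in cluster.items()
    ]
    return tags

# Hypothetical urls for illustration only.
cluster = {
    "en": "https://example.com/en/page/",
    "es": "https://example.com/es/page/",
    "fr": "https://example.com/fr/page/",
}

for lang in cluster:
    print(f"--- head of the {lang} page ---")
    for tag in head_tags(lang, cluster):
        print(tag)
```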
When you provide the setup documented above, Google can swap out the English page for the Spanish or French page when that’s appropriate. For example, if someone who speaks Spanish searched for a query that would yield the English page, then Google could provide the Spanish page in the SERPs instead. Ditto for French. Again, it’s a smart way to go when you target multiple languages.
This all makes sense, but as with many things with Google, there are unique situations that arise where you are left confused… Just when you think you know the proper way to set something up, Google throws you for a loop. My new client’s setup did just that. I’ll cover their setup below.
The Anomaly (and my hreflang advice)
I mentioned earlier that a new client of mine has the incorrect hreflang setup. I picked this up while auditing the site and quickly reached out to them about the problems I uncovered. But that email led us down a rabbit hole, since they were already aware the setup was at risk. More about that soon.
The site provides content by country in various directories (for example, /us/, /uk/, /es/, etc.), and those directories are geotargeted via Google Search Console (GSC). Those two points, the directory structure by country and the geotargeting, are important by the way, as you’ll learn later. The site is using hreflang to signal to Google when the same content appears in different languages (or for different countries), which is smart. By doing this, Google can surface the right content for the right user by language and country. As I mentioned earlier, this approach can be extremely effective.
But, and this is where my hreflang spidey-sense kicked in, there was a problem with the canonical setup. Instead of using self-referencing canonicals, which is the correct way to set up canonical tags across urls with hreflang tags, the site canonicalized multiple pages to the /us/ version. My client’s thought process was to cut down on duplicate content (since there are multiple pieces of content in English targeting multiple countries). For example, UK and IN urls are being canonicalized to US. They are not doing this when the languages are actually different (like Spanish, French, etc.).
My client explained that they knew this went against Google’s recommendations, but IT WAS WORKING. The correct urls were being surfaced for the right countries and languages even for the pages that were being canonicalized. And that made no sense to me at all!
When you canonicalize one url to another url, you are telling Google that the canonicalized url essentially contains the same content as the canonical url, and that Google shouldn’t index the canonicalized url. That’s how it works, so I was shocked to hear that those urls were being surfaced in the search results.
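The conflict described above (hreflang tags on a page whose canonical points somewhere else) is the kind of thing a quick script can flag during an audit. This is only a sketch using Python’s standard html.parser module. The urls, the sample markup, and the audit helper are assumptions of mine and not my client’s actual code. I’m also assuming en-us, en-gb, and en-in as the hreflang values, since hreflang uses the region code GB for the UK.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the rel=canonical url and the hreflang alternates from a page."""
    def __init__(self):
        super().__init__()
        self.canonical = None
        self.alternates = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        rel = (a.get("rel") or "").lower().split()
        if "canonical" in rel:
            self.canonical = a.get("href")
        elif "alternate" in rel and a.get("hreflang"):
            self.alternates[a["hreflang"].lower()] = a.get("href")

def audit(page_url, html):
    """Return a list of hreflang/canonical problems found on one page."""
    parser = LinkCollector()
    parser.feed(html)
    issues = []
    if parser.alternates and parser.canonical and parser.canonical != page_url:
        issues.append(f"canonical points to {parser.canonical}, not to the page itself")
    if parser.alternates and page_url not in parser.alternates.values():
        issues.append("no self-referencing hreflang entry")
    return issues

# Hypothetical /uk/ page that is canonicalized to the /us/ version.
uk_html = """
<head>
<link rel="canonical" href="https://example.com/us/page/" />
<link rel="alternate" hreflang="en-us" href="https://example.com/us/page/" />
<link rel="alternate" hreflang="en-gb" href="https://example.com/uk/page/" />
<link rel="alternate" hreflang="en-in" href="https://example.com/in/page/" />
</head>
"""

for issue in audit("https://example.com/uk/page/", uk_html):
    print(issue)
```

A real audit would compare urls after normalizing things like trailing slashes and protocol, which this sketch skips.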
Google ignoring rel canonical?
My first thought was that Google was simply ignoring rel canonical, which it can. I wrote a post about how that can happen and you should check that out to learn more about how it works. Rel canonical is a hint, not a directive. So Google’s algorithms can believe the site owner made a mistake and simply ignore rel canonical completely.
But that wasn’t happening here. When I checked many of the canonicalized urls that were part of an hreflang cluster, I could see they weren’t being indexed (which is correct). Instead, the urls they were being canonicalized to were being indexed (again, which is correct).
So what the heck was going on here? My client was confused and just kept the setup in place. And I was left scratching my head. Did I find a bug in Google’s algorithms related to hreflang? Was this SEO magic? Or was this just a minor glitch in the Google Matrix? I needed to find out.
Seeking Clarity: Asking Google’s John Mueller
During the next webmaster hangout video, I asked John a question about what I was seeing. I was expecting him to explain that this shouldn’t happen, and that urls must be indexable in order to be surfaced in the SERPs when using hreflang – but that’s not what he said.
John explained that what I was witnessing could absolutely happen! He explained that Google can follow the hreflang tags even when it chooses one version as the canonical url. He said it’s more common when the same language content is used across multiple countries, which is exactly what’s going on here! Remember, the site has multiple pieces of content in English that target different countries.
John continued and said that Google understands the hreflang links between versions of the content, and that it’s the same language across countries. So, they will choose one page to index, but swap out the url in the SERPs to show the correct url by language/country combination. John also explained that other factors could cause this to happen, including the url structure, internal links, and more.
Therefore, Google can surface the right url in the SERPs even without the proper guidance (even if those urls are being canonicalized). John ended by saying that using the proper hreflang guidance (which would include the right canonical setup) increases your chances of having the desired outcome. I’m glad he added that, since I’ve come across quite a few botched hreflang setups while auditing sites.
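To picture what John described, here is a toy model in Python. To be clear, this is NOT how Google implements it. It’s just my own simplified illustration of "index one url, then swap the displayed url by language/country", and the urls, locale codes, and url_to_show helper are all hypothetical.

```python
def url_to_show(indexed_url, alternates, searcher_locale):
    """Toy model: show the hreflang alternate matching the searcher's locale,
    and fall back to the indexed (canonical) url when there is no match."""
    return alternates.get(searcher_locale.lower(), indexed_url)

# Same-language (English) content targeting different countries.
alternates = {
    "en-us": "https://example.com/us/page/",
    "en-gb": "https://example.com/uk/page/",
    "en-in": "https://example.com/in/page/",
}

# Only the /us/ url is indexed, since the others are canonicalized to it.
indexed_url = "https://example.com/us/page/"

print(url_to_show(indexed_url, alternates, "en-GB"))  # the /uk/ url is swapped in
print(url_to_show(indexed_url, alternates, "en-AU"))  # no alternate, so the /us/ url is shown
```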
John’s response is at 38:17 in the video.
Summary – A fascinating SEO mystery, but my hreflang recommendation remains the same:
This was a fascinating case to me. You don’t see this type of behavior in the SERPs often, so I was intrigued by what I was witnessing and analyzing. That said, I would not rely on Google always handling hreflang like this… Instead, I would use the proper hreflang setup, which includes self-referencing canonical tags. Then you can give yourself the best chance possible to have the right urls surface in the SERPs, based on language and country. And remember, this was only for urls containing the same language, but targeting different countries (versus multiple language/country combinations).
My client was thrilled to hear the news from John. They always knew their setup wasn’t technically correct, but they did see it was working correctly. Now they have an answer to the hreflang mystery I described above. It’s a great example of why John Mueller is ultra-valuable to the industry. Without John’s information about this, my client would continue to wonder if their setup would cause the wrong urls to surface in the SERPs. And I would continue to scratch my head as I point to the “right” hreflang setup over and over again.
Now we can just move forward and tackle other important items I’m surfacing based on my audit. And in an ever-changing Google world, that’s always a good thing.
Adios, au revoir, addio. :)