The Internet Marketing Driver


Percent Human: A list of tools for detecting lower-quality AI content

November 9, 2022 By Glenn Gabe Leave a Comment

AI content

Updated on 2/1/23: OpenAI’s AI content detection tool (AI Text Classifier) was added to the list. GPTZeroX was also included (an upgrade to GPTZero).
Updated on 1/10/23: GPTZero was added to the list of AI content detection tools (created by a Princeton University senior).
Updated on 12/29/22: Content at Scale’s AI content detection tool was added.
Updated on 12/14/22: Writer’s AI content detection tool has been updated to detect GPT-3, GPT 3.5, and ChatGPT.
Updated on 12/13/22: Originality.ai was added to the list of AI content detection tools.

———-

As I’ve been sharing examples of sites getting pummeled by the Helpful Content Update (HCU) or the October Spam Update, I’ve also been sharing screenshots from tools that detect AI content (since some sites getting hit are using AI to pump out a lot of lower-quality content – among other things they were doing that could get them in trouble). And based on those screenshots, many people have been asking me which tools I’m using.

So, instead of answering that question a million times (seriously, it might be a million), I figured I would write a quick post listing the top tools I have come across. Then I can just quickly point people to this post versus answering the question over and over.

And note, I’m not saying these tools are foolproof. I have just found them to be pretty darn good at detecting lower-quality AI content. And that’s what we should be trying to detect by the way (not all AI content… but just low-quality AI content that could potentially get a site in trouble SEO-wise).

For example, here is high-quality human content run through a tool:

Detecting human content

And here is an example of lower-quality AI content run through a tool:

Detecting lower quality AI content

Again, it’s not foolproof, but it can give you a quick feel for whether AI was used to generate the content. Below, I’ll cover my favorite AI content detectors I’ve come across so far. I’ll also keep adding to this list, so feel free to ping me on Twitter if you have a tool that’s great at detecting lower-quality AI content!

Here is a list of tools covered in this post for detecting AI content:

  1. Writer’s AI content detector tool
  2. Huggingface GPT-2 Output Detector Demo
  3. Giant Language Model Test Room (GLTR)
  4. Originality.ai (AI content and plagiarism detection)
  5. Content at Scale’s AI content detection tool
  6. GPTZeroX
  7. OpenAI’s AI Text Classifier

1. Writer’s AI content detector tool:
The first tool I’ll cover is from a company that has an AI writing platform (sort of ironic, but does make sense). Also, it seems like the platform is more for assisting writers from what I can see. You can check out their site for more information about the platform. Well, they also have a nifty AI content detector that works very well. You have probably seen my screenshots from the tool several times on Twitter and LinkedIn. :)

Update: 12/14/22 – While I was testing content created via GPT-3.5 and ChatGPT, I noticed that Writer’s detection tool was accurately detecting the content as created by AI. That was a change, since the tool was originally focused on GPT-2, so I quickly reached out to Writer’s CEO for more information. And I was correct! Writer’s AI content detection tool has been updated to detect GPT-3, GPT-3.5, and ChatGPT. So it’s now the second tool on the list that can achieve that.

AI content is progressing, but so are the tools. Below are some examples of using Writer’s AI content detection tool.

Here is Writer’s tool detecting higher-quality human content:

Writer's AI content detection tool measuring high quality human content

And here is Writer’s tool detecting content created via GPT-3.5 (using text-davinci-003, the latest model as of 12/14/22):

Writer's AI content detection tool accurately detecting content created via GPT-3.5 (using davinci-003)

2. Huggingface GPT-2 Output Detector Demo:
If you’re not familiar with Huggingface, it’s one of the top communities and platforms for machine learning. You can check out their site for more information about what they do. Well, they also have a helpful AI content detector tool. Just paste some text and see what it returns. I have found it to be pretty good for detecting lower-quality AI content. 

For example, here is Huggingface’s tool detecting higher quality human content:

Huggingface's AI content detection tool measuring high quality human content

And here is Huggingface’s tool detecting lower-quality AI content:

Huggingface's AI content detection tool measuring lower quality AI content

3. Giant Language Model Test Room (GLTR)
The third tool I’ll cover was actually down recently, but I had heard good things about it from several people (when it was working). It turns out there was a server issue and the tool was hanging. Well, GLTR is back online now and I’ve been testing it to see how well it detects AI content.

The tool was developed by Hendrik Strobelt, Sebastian Gehrmann, and Alexander Rush from the MIT-IBM Watson AI Lab and Harvard NLP. It’s definitely not as intuitive as the first tools I covered, but once you get the hang of it, it can definitely be helpful.

How it works:
You can paste text into the tool and view a visual representation of the analysis, along with several histograms providing statistics about the text. I think most people will focus on the visual representation to get a feel for how likely each word would be as the predicted word based on the context to its left. And that can help you identify whether a text was written by AI or by a human. Again, nothing is foolproof, but it can be helpful (and I’ve found the tool works well). To learn more about GLTR and how it works, you can read the detailed introduction on the site.

For example, if a word is highlighted in green, it’s in the top 10 most likely predicted words based on the context to its left. Yellow highlighting indicates it’s in the top 100 predictions, red in the top 1,000, and the rest would be highlighted in purple (even less likely to be predicted).

The fraction of red and purple words (unlikely predictions) increases when the text was written by a human. If you see a lot of green and yellow highlighting, then it can indicate the text contains many predicted words based on the language model (signaling the text could have been written by AI).
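To make GLTR’s color buckets concrete, here is a minimal Python sketch of the bucketing logic described above. The per-token ranks would normally come from a language model like GPT-2 (the rank of each actual word in the model’s predictions given the left context); here they are supplied by hand, so treat this as an illustration rather than GLTR’s actual code:

```python
def gltr_bucket(rank):
    """Map a token's prediction rank to GLTR's highlight color."""
    if rank <= 10:
        return "green"    # top 10 most likely predictions
    elif rank <= 100:
        return "yellow"   # top 100
    elif rank <= 1000:
        return "red"      # top 1,000
    else:
        return "purple"   # very unlikely prediction

def unlikely_fraction(ranks):
    """Fraction of tokens in the red/purple buckets.
    This tends to be higher for human-written text."""
    colors = [gltr_bucket(r) for r in ranks]
    return sum(c in ("red", "purple") for c in colors) / len(colors)

# Hypothetical rank sequences (a real run would get these from a model):
ai_like_ranks = [1, 3, 2, 8, 1, 45, 2, 1]          # mostly top-10 picks
human_like_ranks = [1, 250, 4, 3000, 90, 1200, 7]  # more surprising words

print(unlikely_fraction(ai_like_ranks))     # 0.0
print(unlikely_fraction(human_like_ranks))  # ~0.43
```

A high fraction of green and yellow tokens (a low `unlikely_fraction`) is the pattern that suggests AI-generated text in GLTR’s visualization.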

Here are two examples. The first shows AI content (many words highlighted in green and yellow). This text was generated via GPT-2.

Giant Language Model Test Room (GLTR) analysis of AI generated content.

And here is an example from one of my articles about broad core updates. Notice there are many words highlighted in red, and several purple words as well (signaling this is human-written text).

Giant Language Model Test Room (GLTR) analysis of human-written content.

4. Originality.ai (for detecting GPT-3, GPT-3.5, and ChatGPT)
I was able to test Originality.ai recently and I’ve been extremely impressed with their platform. The CEO emailed me and explained they were one of the few tools able to detect GPT-3, GPT-3.5, and ChatGPT (as of December 13, 2022). Needless to say, I was excited to jump in and test its AI content detection tool. Also, it’s worth noting that the tool can detect plagiarism as well (which is an added benefit). They have also released a Chrome extension, and they have an API for handling requests in bulk. I’ll cover more about the Chrome extension below.

So, I fired up OpenAI and selected text-davinci-003 (the latest model as of 12/13/22) and started generating essays, short articles, how-tos, and more. I also used ChatGPT to generate a number of examples I could test.

And when testing those examples in Originality.ai’s detection tool, it picked up the work as AI every time. Again, I was extremely impressed with the solution.

For example, here was a short essay based on GPT 3.5:

Originality.ai AI content detection tool

And here was a how-to containing several paragraphs and then a bulleted list of steps. I also checked for plagiarism:

Originality.ai AI and plagiarism detection tool

It’s not a free tool, so you will need to sign up and pay for credits. That said, it’s been a solid solution based on my testing. Note, they are providing a coupon code (BeOriginal) that gets you 50% off your first 2000 credits. One credit scans 100 words according to the site.
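If it helps, the credit math works out like this (a quick sketch based on the pricing described above; check Originality.ai’s site for current rates):

```python
import math

# Sketch of the credit math described in the post:
# 1 credit scans 100 words (partial hundreds still cost a full credit).

def credits_needed(word_count):
    """Credits required to scan a piece of content."""
    return math.ceil(word_count / 100)

print(credits_needed(1850))  # 19 credits for an ~1,850-word article
```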

Originality.ai Chrome Extension:
I mentioned earlier that Originality.ai has both a Chrome extension and an API. The Chrome extension enables you to highlight text on a page in Chrome and quickly check to see if it was written by AI. You must log in and use the credits you have purchased, so it’s not free. It works very well based on my testing so far.

For example, here is an article created via Automated Insights. By highlighting the article text, right-clicking, and selecting Originality.ai in the menu, you can check to see if the content was created by AI.

AI content created via Automated Insights
Originality.ai Chrome extension detecting AI content

5. Content at Scale
Next up is an AI content detector tool from Content at Scale. Like Writer, they provide a platform for AI content generation that uses an interesting approach. You can read more about the platform on their site. They also have an AI content detection tool. You can include up to 2,500 characters, and the tool will analyze the text and determine if it’s AI content or human content. And like Originality.ai and Writer, it can detect GPT-3, GPT-3.5, and ChatGPT.

For example, here is the tool detecting AI content generated by ChatGPT (a short essay):

Content at Scale's AI content detector detecting AI-generated content.

And here is the tool detecting content from one of Barry’s blog posts as human:

Content at Scale's AI content detector detecting human-generated content.

6. GPTZeroX
Next up is a new AI content detection tool created by a Princeton University student! And it’s causing quite the buzz. I’ve read a number of articles across major publications about Edward Tian and his tool called GPTZero, which works to detect if content was written by ChatGPT.

{Update: 2/1} GPTZeroX was just released and can highlight which parts of the text being tested are AI-generated. It’s more granular with its detection, which was a top feature that Edward Tian heard from educators.

Beyond that, Edward explains that “GPTZeroX also supports larger text inputs, multiple .txt, word, and pdf file uploading, and lightning-fast processing speeds.” There is also an API now that can handle high-volume requests.

Here is GPTZeroX detecting a part of text as AI-generated:

GPTZeroX from Edward Tian

With GPTZero, Edward’s approach is interesting, since it uses “perplexity” and “burstiness” in writing to detect if a human or AI wrote the content. “Perplexity” aims to measure the complexity of the content being tested, or what Edward explains as the “randomness of text”. And “burstiness” aims to measure the uniformity of the sentences being tested. For example, Edward explains that “human written language exhibits non-common items appearing in random clusters.” Humans tend to write with more burstiness, while AI tends to be more consistent and uniform.
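As a rough illustration of the burstiness idea, here is a toy Python sketch that measures sentence-length variation. Note this is not GPTZero’s actual implementation (which works from model perplexity scores); it’s just a simple stand-in for the uniformity Edward describes:

```python
import re
import statistics

# A crude proxy for "burstiness": how much sentence lengths vary across
# a text. Human writing tends to mix short and long sentences; AI output
# tends to be more uniform.

def sentence_lengths(text):
    """Split text into sentences and return each sentence's word count."""
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def burstiness(text):
    """Standard deviation of sentence lengths. Higher values suggest the
    uneven, 'bursty' rhythm typical of human writing."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return statistics.stdev(lengths)

uniform = "The cat sat down. The dog ran off. The bird flew away."
bursty = ("Stop. The storm rolled in faster than anyone on the crew "
          "had predicted that morning. We ran.")

print(burstiness(uniform) < burstiness(bursty))  # True
```

Again, GPTZero combines this kind of signal with perplexity measured against a language model; sentence-length variance alone would be easy to fool.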

I’ve been testing the tool over the past few days, and it has worked well (and has been pretty accurate). The site has definitely had some growing pains since launching (I’m sure Edward didn’t expect the tool to become so popular so quickly), but site performance has improved greatly recently. Also, the homepage now explains he is creating a “tailored solution for educators”. I’m eager to hear more about that, but for now, you can add GPTZero as yet another tool in your AI detection arsenal. I think you’ll like it.

For example, here is the tool measuring “perplexity” and “burstiness” of content (based on an essay written by ChatGPT):

And here is the final result accurately detecting AI content written by ChatGPT:

7. OpenAI’s AI Text Classifier
Well, this was an interesting development! OpenAI, the creator of ChatGPT, just released its own AI content detection tool. And as you would guess, it can detect when a piece of content was written by ChatGPT (like several other tools in my post). Based on my testing, it works well (when taking direct output from ChatGPT and testing it). Like other AI content detection tools, it’s not foolproof, but does seem to catch a number of examples of AI content that I tested.

Note, it requires a minimum of 1,000 characters of input and provides one of five responses:

  • Very unlikely to be AI-generated.
  • Unlikely to be AI-generated.
  • Unclear if it is AI written.
  • Possibly AI-generated.
  • Likely AI-generated.
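For illustration, here is a sketch of how those five buckets might be surfaced programmatically, including the 1,000-character minimum. The probability thresholds here are hypothetical, since OpenAI hasn’t published the exact cutoffs behind each label:

```python
# Hypothetical thresholds mapping an AI-probability score to the five
# labels listed above. OpenAI's real cutoffs are not public.
LABELS = [
    (0.10, "Very unlikely to be AI-generated."),
    (0.45, "Unlikely to be AI-generated."),
    (0.65, "Unclear if it is AI written."),
    (0.90, "Possibly AI-generated."),
    (1.01, "Likely AI-generated."),
]

def classify(text, ai_probability):
    """Map a model's AI-probability score to one of the five labels,
    enforcing the classifier's 1,000-character input minimum."""
    if len(text) < 1000:
        raise ValueError("The classifier requires at least 1,000 characters.")
    for threshold, label in LABELS:
        if ai_probability < threshold:
            return label
    return LABELS[-1][1]

sample = "x" * 1200  # placeholder text long enough to pass the minimum
print(classify(sample, 0.95))  # Likely AI-generated.
```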

Here is a quick example based on an essay I created via ChatGPT. As you can see, it’s accurately being detected as AI content:

OpenAI's content detector called AI Text Classifier accurately detecting AI content.

And here is an example of OpenAI’s tool accurately detecting a blog post of mine as human:

OpenAI's content detector called AI Text Classifier accurately detecting human content.

Summary: Although not foolproof, tools can be helpful for detecting AI content.
Again, I’ve received a ton of questions about which tools I’ve been using to detect lower-quality AI content, so I decided to write this quick post versus answering that question over and over. I hope you find these tools helpful in your own projects. And again, if you know of other tools that I should try out, feel free to ping me on Twitter!

GG

Filed Under: google, seo, tools

True Destination – Demystifying the confusing, but often accurate, true destination url for redirects in Google Search Console’s coverage reporting

November 3, 2022 By Glenn Gabe Leave a Comment

If you are confused when Google reports redirects as other categories, like “blocked by robots.txt”, “soft 404s”, “noindexed”, “404s”, and others, it could be Google silently following the redirect and reporting the status of the true destination url instead. My post covers the situation in detail, and provides examples of this happening in the wild.

While heavily analyzing websites from an SEO standpoint, you will undoubtedly find yourself deep in Google Search Console (GSC) reporting. GSC contains a boatload of data directly from Google and can help site owners and SEOs surface key insights. That said, it’s important to understand the nuances involved with GSC reporting, and how Google determines the information it provides in those reports. Having a clear understanding of what the data is showing is important when taking action to improve SEO.

And there’s no better example of GSC data confusion than the dreaded true destination url for redirects in GSC’s index coverage reporting (and URL inspection tool). I have received so many questions about this from clients that I decided to write this post so I can just point people here versus explaining it again and again.

So, join me on a GSC adventure where we uncover the secrets of the true destination url. Some of you might already know this, but I know some do not. And for those that don’t, this will all make sense very soon. You might not be happy with how this is working, but at least you’ll understand why urls are categorized in certain ways in GSC (and via the URL inspection tool).

What is the dreaded true destination url situation in GSC for redirects?
When viewing the indexing status in GSC of urls that are being redirected, Google reports on the true destination url (even if that url is outside of your own site). For example, if you redirect a url to another url, and the destination url is not indexable for some reason, GSC will silently follow the redirect and report on the final destination’s status. And that can be super confusing for site owners and SEOs who don’t know this is happening.

Yes, that means you can see urls showing up as “blocked by robots.txt”, “noindexed”, “soft 404”, “404”, and more (when the url you are inspecting is actually redirecting). As you can imagine, many site owners are left confused when they see “blocked by robots.txt” when they know 100% that a url is redirecting.

Google’s John Mueller has been asked about this many times, and he has replied with what I explained above (and does admit it can be a bit confusing). Also, Barry wrote a post covering how this happens with the URL inspection tool based on John’s comments. Even though this has been documented, I find it’s still a very confusing situation for many site owners and SEOs (which is why I’m writing this post).

Here is a tweet of mine with a link to John explaining how Google silently follows redirects (and how that shows up in GSC):

Right, that's why I said "reminder". :) John has explained this before in webmaster hangout videos. For example, here is one from 2019 where he explains how Google silently follows redirects for the URL inspection tool (and it's what shows up in Coverage): https://t.co/XG0aGNPOSW

— Glenn Gabe (@glenngabe) January 5, 2021

Now that you know this is happening, you might be wondering what this actually looks like in GSC. I’ll cover that next with examples of this happening in the wild.

Examples of Google silently following redirects and reporting the true destination url status in GSC:
Below, I’ll provide examples with screenshots of Google reporting on the true destination urls versus the redirect. Again, this is when the final destination urls are not indexable for some reason.

Blocked by robots.txt:
The url is redirected outside the site to a url that is blocked by robots.txt. Google reports the redirecting url as being “blocked by robots.txt” since the final destination is actually disallowed.

A twist on blocked by robots.txt:
This url redirects first to a tracking url, which is blocked by robots.txt. The final destination is not blocked, but Google can’t follow the first redirect to find the final destination url since it’s disallowed. It just knows that the first url in the chain is blocked and reports that in GSC. Below, you can see the second step shows the url is actually blocked by robots.txt (and that’s what is reported in GSC).

Soft 404:
The url redirects to a page that’s a soft 404 (a product is unavailable). Google reports that the redirecting url is a soft 404 (since the true destination url is being seen as a soft 404).

Here is the page the url redirects to (with the product “currently unavailable”). Hence the soft 404:

Noindexed:
Yep, you guessed it. The url redirects to a page that’s noindexed. Google reports the url that is redirecting as noindexed in the coverage reporting:

Crawled, not indexed:
At first glance, you might think the redirect is being reported as “Crawled, not indexed”. Not true! It’s the final destination url that’s not being indexed by Google. Google is reporting “Crawled, not indexed” for the true destination url.

The final destination url is indeed not indexed:

404:
How can Google see a redirect as a 404? It doesn’t. It’s the true destination url that 404s and that’s what is reported in GSC.

404 with domain name change:
This is just a variation on the 404 situation to explain how this works when changing domain names. The url on the old domain redirects to a url on the new domain name, but the url was never migrated (it 404s). So Google reports that the redirecting url is a 404.
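The behavior in the examples above can be modeled with a small sketch: follow the chain until the first non-redirect hop, and report that hop’s status for the url being inspected. The chain format and status names here are illustrative, not a real GSC API:

```python
# Toy model of GSC's "true destination url" reporting: Google silently
# follows a redirect chain and surfaces the status of the first
# non-redirect hop for the url a site owner inspects.

def reported_status(chain):
    """Given a redirect chain as a list of (url, status) hops, return
    the status GSC would report for the FIRST url in the chain."""
    for url, status in chain:
        if status == "redirect":
            continue  # Google silently follows the hop
        return status  # first non-redirect hop = true destination status
    return "redirect error"  # chain never resolved

# The inspected url redirects to a page blocked by robots.txt:
chain = [
    ("https://example.com/old-page", "redirect"),
    ("https://example.org/landing", "blocked by robots.txt"),
]
print(reported_status(chain))  # blocked by robots.txt

# A redirect to a 404 on a new domain (the domain-change example above):
chain_404 = [
    ("https://old-domain.com/page", "redirect"),
    ("https://new-domain.com/page", "404"),
]
print(reported_status(chain_404))  # 404
```

In other words, the status you see in GSC belongs to the end of the chain, not to the redirecting url you inspected.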

Sorry, more confusion with redirects:
When a url redirects to a page that resolves with a 200 header response code and is indexed, the URL inspection tool reports accurately on the redirect (it says that initial url is a redirect and not indexed), but Google shows the canonical as the true destination url (where the redirect leads). Talk about confusing, especially given the other examples above, where the redirecting urls are reported as something different than a redirect.

A possible solution in GSC to clear up the confusion:
So, how can this be more intuitive? I think if GSC actually provided a message that it’s reporting on the true destination url, it could clear up the confusion for site owners and SEOs. Below, I have mocked up what this can look like in GSC. If Daniel Waisberg is reading (and I hope you are), then please add this!

Summary: Clearing up the confusion with redirects and destination url reporting.
I hope this post helped you understand how Google is silently following redirects and reporting on the true destination urls in GSC. I know it’s a confusing topic for many site owners and SEOs and I’m sure it has led to many head-scratching moments. Just keep in mind that as of now, GSC is reporting on the true destination urls when a url redirects. So don’t be surprised when you notice redirects in other categories in GSC’s coverage reporting (or when using the url inspection tool). And who knows, maybe the GSC product team will implement that message I mocked up above…

GG

Filed Under: google, seo, tools
