How do you spot text generated by LLM/AI

restertest · 19 September 2024 07:33

Since we are testers, we tend to spot patterns and issues even when we do not want to.

So I was wondering: how do you spot text that is generated by an LLM/AI?
Please do not say “I use a scanner”.

For me, the tells are usually these 3:

Using the phrase “In the world of”
Using words that one does not usually use in day-to-day conversation, like “delve”
The post is marked with emojis all the way through.

What about you?

keerthivaddi · 19 September 2024 18:25

@restertest AI generally uses too many adverbs and adjectives to make its point.

kristof · 20 September 2024 04:38

I actually do these 2 myself xD
Damn I might be a robot!

restertest · 20 September 2024 04:55

I knew it, @kristof.

That explains how you are so good at security.

You talk to the computer in robot language.

Got you

ujjwal.singh · 21 September 2024 03:19

It appears very polished.

What I have noticed is that ChatGPT responses insert semicolons between sentences, so wherever I see a semicolon joining lengthy sentences, I take it that the text was copy-pasted from ChatGPT, because in real life we usually use a comma or full stop.

It uses very complex words that we usually don’t use in day-to-day life.

Instead of “are”, the contraction “’re” is used, as in “we’re” and “you’re”.

mistercwood · 23 September 2024 01:54

Two main things stand out, I don’t know 100% how to describe it, but it’s super obvious when I see it. The paragraphs are broken up almost like fleshed out dot points? Like each one was a brief thought that has been expanded upon. And the last one is always a clear conclusion paragraph - much of the time even starting with the phrase “In conclusion, xyz”. It reminds me of high school essay writing, very formulaic.

kristof · 23 September 2024 05:52

I think it’s because people write like they talk, whereas LLM systems write too polished and it doesn’t feel like somebody talking.

shad0wpuppet · 1 October 2024 13:08

Why?
Some posts/texts/messages have this artificial vibe and sometimes I don’t even think about why exactly and I don’t really care if they are AI-generated or not.
Another point is that many of the characteristics mentioned above are just features of people for whom English isn’t their native language but who have reached quite a good level by studying it from books and other learning materials and passing exams, rather than by using it naturally in conversations with native speakers. People from some regions have quite sophisticated language but, again, they don’t use it naturally, so sometimes I can say that it sounds like a person from a particular country (I won’t mention the country because it might be stereotyping).
Additionally, using tools like Grammarly to correct mistakes and improve readability may add this AI scent to the text. Is that okay with you?

I’m writing this because there might be lots of different cases (AI might be used in various ways, even to develop unique ideas, which won’t make the text completely AI-generated but may make it look so), and what a text seems like at first glance doesn’t mean a lot, even if it’s completely AI-generated. Better to concentrate on the content, ideas, and usefulness, and if you need to determine for sure whether a text was AI-generated (e.g. maybe you’re a teacher), then use particular tools and methods, not just your intuition.

PS: Was this text AI-generated? Or AI-enhanced? Does it sound natural to you?

darth_piriteze · 3 October 2024 21:31

As one of those rare people who tries to keep the semicolon alive, I think you might be right.

c32hedge · 7 February 2025 20:18

I agree with a lot of the “tells” listed here, but disagree with others. I also see people hold up some grammar feature that they never use themselves (e.g. em dashes) as “the” way to tell a post is AI-generated. I tend to avoid relying on a single “tell” and use them as heuristics instead. Some of mine:

“In the [ever-[changing/evolving]/dynamic] world of ”… (actually, this one is pretty reliable on its own).
Certain uncommon words/phrases like “delve”, “game-changer”.
Extreme overuse of similes. Sometimes metaphors too, but LLMs way overdo it on similes for things that should be obvious without one.
Frequent use of the sentence structure “It’s not about X; it’s about Y”. Personally, when I write I tend to make contrasts by putting the thing I am talking about first, otherwise it feels like the other thing steals the thunder a bit. E.g. my catch phrase “testing is activities, not artifacts”.
Waffling both-sides-ism - not making a strong point in favor of anything. Carries extra weight if it’s prefaced with something like “spicy take ”.
Amazingly consistent sentence lengths.
Particularly in replies (e.g. LinkedIn comments), essentially just rephrasing/summarizing the original content without adding any new thought or insight. Often paired with the word “indeed” near the start.
Extreme wordiness/word salad.
Never quite coming to the point.
General structure, particularly in posts, consisting of an introductory paragraph or two, a bulleted list with emojis for bullets, a summary paragraph saying basically the same thing at the end, and ending with a bland, generic hook question.
Sounding a whole lot like a whole bunch of other people. No personality or style.
“Botsplaining” - related to several of the above but LLMs often just can’t help spelling out the most obvious of points well after they’ve already generically beaten the point to death. I can’t remember some of the common phrases, but something like “this is important because” or “it is important to consider”, etc.
As a grammar stickler, I’d say proper grammar and spelling are not an LLM tell, but if a post is littered with problems in those areas, it may be a good indicator that it was not AI-generated.
Extremely vague “personal” anecdotes. I actually broke down a prime example of this here. (Note that I used a very specific personal example!)
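
A few of the heuristics above (stock phrases, the “It’s not about …; it’s about …” structure, and amazingly consistent sentence lengths) are mechanical enough to rough out in code. The following is only a minimal Python sketch, not something anyone in this thread described using: the phrase list, the function names, the three-sentence minimum, and the 0.25 threshold are all assumptions made for illustration. In keeping with the advice not to rely on a single “tell”, it reports hints rather than a verdict.

```python
import re
from statistics import mean, pstdev

# Hypothetical phrase list, taken from tells mentioned in this thread.
PHRASE_TELLS = [
    r"\bin the (?:ever-(?:changing|evolving) |dynamic )?world of\b",
    r"\bdelve\b",
    r"\bgame-changer\b",
    r"\bin conclusion\b",
    r"\bit is important to consider\b",
    r"\bit[’']s not about\b.*?\bit[’']s about\b",
]


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def sentence_length_cv(text, min_sentences=3):
    """Coefficient of variation of sentence lengths (in words).

    Returns None when there are too few sentences to say anything.
    """
    lengths = [len(s.split()) for s in split_sentences(text)]
    if len(lengths) < min_sentences:
        return None
    return pstdev(lengths) / mean(lengths)


def find_tells(text, cv_threshold=0.25):
    """Collect heuristic hints; the threshold is an arbitrary assumption."""
    lowered = text.lower()
    hits = [p for p in PHRASE_TELLS if re.search(p, lowered)]
    cv = sentence_length_cv(text)
    return {
        "phrase_hits": hits,
        "length_cv": cv,
        "uniform_lengths": cv is not None and cv < cv_threshold,
    }


if __name__ == "__main__":
    sample = (
        "In the ever-evolving world of testing, we delve into quality. "
        "It's not about tools; it's about people."
    )
    print(find_tells(sample))
```

A low coefficient of variation just means the sentences are all about the same length. Plenty of human writing will trip these checks too, so treat the output as a prompt to read more closely rather than as evidence.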

mirekdlugosz · 8 February 2025 13:00

Apart from overused phrases and certain stylistic choices, one thing I would stress is:
AI-generated content is shallow. There’s no insight, no knowledge behind it, it does not speak of real experiences.

It’s not necessarily wrong. Very often it is more or less correct, insofar as any writing may be correct. But there’s this veil of correctness, and there’s nothing behind it. It’s just an empty shell.

It’s also not about being vague. There are many reasons to be vague or to hide some details. But when you interact with a work that is purposefully vague, you can see hints that there was something more that was omitted. An AI-generated thing does not leave such hints, because there was never anything more.

It gives a very similar feeling to SEO presell texts. If you know anything about the subject that is covered, you quickly realize that the author has only theoretical knowledge of what they are talking about. They aren’t wrong and they might use the phrases or concepts correctly, but it’s all completely abstract for them.

Unfortunately, there are way too many articles written by real humans that show these exact properties.

mistercwood · 11 February 2025 01:20

You both raise very strong points that I think lock down what bugs me the most about LLM responses - the ambivalence, and the emptiness.

hananurrehman · 11 February 2025 07:41

A more in-depth analysis could be done by trying to engage the author in a meaningful discussion. If they change their style or don’t respond to any jibes or jokes, there’s your tell.

rosie · 11 February 2025 14:10

We lean on GPTZero to help inform us if text is written by AI. We do this as part of our article submission review process.

c32hedge · 20 February 2025 07:02

Amazingly, many I’ve asked just double down and their comments are also often clearly AI-generated. It’s almost as if thinking just isn’t very popular.

sebastianclavijo · 20 February 2025 17:07

I think there is a big difference between a text written by an AI and a text written from scratch by a content creator who uses an AI just to correct the syntax (the content, structure and style of the narrative are still the creator’s).
In this case the AI is just a tool or an assistant, like asking a collaborator to have a look.
To me, the creativity in an article is the essence, the vibe, the meaningful content without unnecessary embellishments, which an AI cannot nail down (at least not yet).

sebastianclavijo · 20 February 2025 17:18

@rosie, is that tool able to distinguish when something is AI-written/generated vs AI-reviewed?

mariem_safi · 22 February 2025 14:01

I completely agree with you.
As a non-native speaker, I tend to write my thoughts in an academic way which seems unnatural.
I also use AI to enhance my expressions and/or my grammar if I’m not sure how to say things.

komalgc · 22 February 2025 14:10

No grammar mistakes
Tone is very polished
Often repeats the same concepts, just reworded

Also, you can check out Hugging Face’s Detect AI
