simonw
5 months ago
I'm a bit suspicious of this report - they don't reveal nearly enough about their methodology for me to evaluate how credible this is.
When it says "The 10 leading AI tools repeated false information on topics in the news more than one third of the time — 35 percent — in August 2025, up from 18 percent in August 2024" - 35% of what?
Their previous 2024 report refused to even distinguish between different tools - mixing the results from Gemini and ChatGPT and Perplexity and suchlike into a single score.
This year they thankfully dropped that policy. But they still talk about "ChatGPT" without clarifying whether their results were against GPT-4o, o3, or GPT-5.
hydrox24
5 months ago
I posted this because I thought HN would find it interesting, and agree that the methodology is a little thin on the ground. Having said that, they have another page (a little hard to find) on the methodology here[0] and a methodology FAQ page here[1].
Basically it seems to be an ongoing report, run on ten claims per month as they identify new "false narratives" in their database. They test each claim with a mix of three prompt types against the various AI products (I say "products" rather than "models" because Perplexity and others are in there): an innocent question, one that assumes the falsehood is true, and one that deliberately tries to elicit a false response.
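To make that concrete, here's a rough sketch of what a harness along those lines might look like. This is purely my own illustration, not anything taken from their methodology pages; every product name, claim, and number in it is invented:

    # Illustrative only: not NewsGuard's actual harness. A monthly audit like
    # the one described above might tally a "repeated the false claim" rate
    # per product roughly like this. All data here is made up.
    from dataclasses import dataclass

    PROMPT_STYLES = ["innocent", "assumes_claim_true", "malign_actor"]

    @dataclass
    class Trial:
        product: str          # e.g. "ChatGPT" (report doesn't say which model)
        claim_id: int         # one of the ~10 false narratives for the month
        style: str            # which of the three prompt framings was used
        repeated_claim: bool  # did the response repeat the falsehood?
        declined: bool        # did the product refuse to answer?

    def repeat_rate(trials, product):
        """Share of prompts where the product repeated the false claim.

        Note the denominator question raised elsewhere in the thread: here it
        is all prompts sent to the product, refusals included. Excluding
        refusals would give a different (higher) number.
        """
        relevant = [t for t in trials if t.product == product]
        if not relevant:
            return 0.0
        return sum(t.repeated_claim for t in relevant) / len(relevant)

    # 10 claims x 3 prompt styles = 30 prompts per product per month.
    fake_trials = [
        Trial("ChatGPT", claim_id=c, style=s,
              repeated_claim=(c % 3 == 0 and s != "innocent"),
              declined=False)
        for c in range(10) for s in PROMPT_STYLES
    ]
    print(f"repeat rate: {repeat_rate(fake_trials, 'ChatGPT'):.0%}")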
Unfortunately their "False Claim Fingerprints" database looks like it's a commercial product, so the details of its contents probably won't be released.
[0]: https://www.newsguardtech.com/ai-false-claims-monitor-method...
[1]: https://www.newsguardtech.com/frequently-asked-questions-abo...
Lerc
5 months ago
Are they doing it on current events? Because by the very nature of 'current' you don't have a repeatable experiment that you can do a year apart.
simonw
5 months ago
There are so many legitimate questions about how one would design a benchmark of this type.
I don't feel like they're answering those questions.
mallowdram
5 months ago
News narratives are neither random nor specific; they are arbitrary. There is nothing really accurate about any narrative. The idea that we rely on the news for things other than immediate survival is somewhat bizarre. In effect, AI's role is to make narratives even more arbitrary and force us to develop a format that replaces them and, by nature, cannot itself be automated.
We should welcome AI into the system in order to destroy it and then recognize AI is purely for entertainment purposes.
“Flawed stories of the past shape our views of the world and our expectations for the future. Narrative fallacies arise inevitably from our continuous attempt to make sense of the world. The explanatory stories that people find compelling are simple; are concrete rather than abstract; assign a larger role to talent, stupidity, and intentions than to luck; and focus on a few striking events that happened rather than on the countless events that failed to happen. Any recent salient event is a candidate to become the kernel of a causal narrative.” Daniel Kahneman, Thinking, Fast and Slow
“The same science that reveals why we view the world through the lens of narrative also shows that the lens not only distorts what we see but is the source of illusions we can neither shake nor even correct for…all narratives are wrong, uncovering what bedevils all narrative is crucial for the future of humanity.” Alex Rosenberg, How History Gets Things Wrong: The Neuroscience of Our Addiction to Stories (2018)
bmitc
5 months ago
Just go on Facebook. More than 90% of ads and linked articles are AI-generated falsehoods.
jrflowers
5 months ago
>35% of what?
Well, it says 35% of the time, so I would guess that they're talking about the number of incidents in a given time frame.
For example if you asked me what color the sky is ten times and I said “carrot” four times, you could say that my answer is “carrot” 40% of the time
Lerc
5 months ago
And if you said azure, cerulean, or black, are those correct answers or not?
simonw
5 months ago
But what's an "incident"?
There is an enormous difference between "35% of times a user asks a question about news" and "35% of the time against our deliberately hand-picked collection of test questions that we have not published".
jrflowers
5 months ago
>what's an "incident"?
Those are valid questions.
I simply found the phrasing “[quote of a sentence saying 35% of the time]. 35% of what?” funny, because you would either have to not read the sentence you pasted or be an English speaker who does not understand what “% of the time” means.
I personally didn’t download the study linked in the article. It is interesting that they (I’m assuming) did not include anything about their methodology in this study, since they usually do with other things they publish.
simonw
5 months ago
"... of the time" is ambiguous if you don't describe what "the time" is referring to.
jrflowers
5 months ago
This is a good point. We know that the periods of time over which they conducted their tests must be finite, because there is a year between them. We also know that the tests happened in August:
>Their non-response rates fell from 31 percent in August 2024 to 0 percent in August 2025
Was it a day or a week or a month? This matters greatly, and you can tell that I’m not deliberately misunderstanding simple phrases that are not difficult to understand because