simonw
12 hours ago
I'm a bit suspicious of this report - they don't reveal nearly enough about their methodology for me to evaluate how credible this is.
When it says "The 10 leading AI tools repeated false information on topics in the news more than one third of the time — 35 percent — in August 2025, up from 18 percent in August 2024" - 35% of what?
Their previous 2024 report refused to even distinguish between different tools - mixing the results from Gemini and ChatGPT and Perplexity and suchlike into a single score.
This year they thankfully dropped that policy. But they still talk about "ChatGPT" without clarifying if their results were against GPT-4o or o3 or GPT-5.
hydrox24
11 hours ago
I posted this because I thought HN would find it interesting, and I agree that the methodology detail is a little thin. Having said that, they have another page (a little hard to find) on the methodology here[0] and a methodology FAQ page here[1].
Basically it seems to be an "ongoing" report, run on ten claims per month as they identify new "false narratives" in their database, using a mix of three prompt types against the various AI products (I say products rather than models because Perplexity and others are in there). The three prompt types are: an innocent question, one that assumes the falsehood is true, and one that deliberately tries to elicit the false response.
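For anyone trying to picture what a figure like "35%" could mean under that setup, here's a minimal sketch of how a monthly repeat rate might be tallied across 10 claims x 3 prompt styles. To be clear, the Trial structure, the claim texts, and the pass/fail flags are all invented for illustration; this is my guess at the shape of the calculation, not NewsGuard's actual pipeline.

    from dataclasses import dataclass

    # The three prompt styles described in their methodology FAQ (names are mine).
    PROMPT_STYLES = ["innocent", "assumes_claim_is_true", "tries_to_elicit_falsehood"]

    @dataclass
    class Trial:
        claim: str                 # the false narrative being tested
        style: str                 # which of the three prompt styles was used
        repeated_falsehood: bool   # did the response repeat the false claim?

    def repeat_rate(trials: list[Trial]) -> float:
        """Share of responses that repeated the false claim."""
        if not trials:
            return 0.0
        return sum(t.repeated_falsehood for t in trials) / len(trials)

    if __name__ == "__main__":
        # Fabricated results: 10 claims x 3 prompt styles = 30 responses per month.
        claims = [f"false narrative #{i}" for i in range(1, 11)]
        trials = [
            Trial(claim=c, style=s, repeated_falsehood=(i + j) % 3 == 0)
            for i, c in enumerate(claims)
            for j, s in enumerate(PROMPT_STYLES)
        ]
        print(f"monthly repeat rate: {repeat_rate(trials):.0%}")  # 33% with this fake data

If it really is only 30 responses per tool per month, a couple of borderline judgment calls about what counts as "repeating" a claim can move the headline number by several points.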
Unfortunately their "False Claim Fingerprints" database looks like a commercial product, so its contents probably won't be released.
[0]: https://www.newsguardtech.com/ai-false-claims-monitor-method...
[1]: https://www.newsguardtech.com/frequently-asked-questions-abo...
mallowdram
10 hours ago
News narratives are neither random nor specific; they are arbitrary. There is nothing really accurate about any narrative. The idea that we rely on the news for anything other than immediate survival is somewhat bizarre. In effect, AI's role is to make narratives even more arbitrary and to force us to develop a format that replaces them, one that by its nature cannot be automated.
We should welcome AI into the system in order to destroy it and then recognize AI is purely for entertainment purposes.
“Flawed stories of the past shape our views of the world and our expectations for the future. Narrative fallacies arise inevitably from our continuous attempt to make sense of the world. The explanatory stories that people find compelling are simple; are concrete rather than abstract; assign a larger role to talent, stupidity, and intentions than to luck; and focus on a few striking events that happened rather than on the countless events that failed to happen. Any recent salient event is a candidate to become the kernel of a causal narrative.” (Daniel Kahneman, Thinking, Fast and Slow)
“The same science that reveals why we view the world through the lens of narrative also shows that the lens not only distorts what we see but is the source of illusions we can neither shake nor even correct for…all narratives are wrong, uncovering what bedevils all narrative is crucial for the future of humanity.” (Alex Rosenberg, How History Gets Things Wrong: The Neuroscience of Our Addiction to Stories, 2018)
Lerc
12 hours ago
Are they doing it on current events? Because by the very nature of 'current' you don't have a repeatable experiment that you can do a year apart.
simonw
12 hours ago
There are so many legitimate questions about how one would design a benchmark of this type.
I don't feel like they're answering those questions.
bmitc
11 hours ago
Just go on Facebook. More than 90% of ads and linked articles are AI-generated falsehoods.
jrflowers
12 hours ago
>35% of what?
Well, it says "35% of the time", so I would guess they're talking about the number of incidents in a given time frame.
For example if you asked me what color the sky is ten times and I said “carrot” four times, you could say that my answer is “carrot” 40% of the time
Lerc
12 hours ago
And if you said azure, cerulean, or black, are those correct answers or not?
simonw
10 hours ago
But what's an "incident"?
There is an enormous difference between "35% of times a user asks a question about news" and "35% of the time against our deliberately hand-picked collection of test questions that we have not published".
jrflowers
6 hours ago
>what's an "incident"?
Those are valid questions.
I simply found the phrasing “[quote of a sentence saying 35% of the time]. 35% of what?” funny, because you would either have to not have read the sentence you pasted or be an English speaker who does not understand what “% of the time” means.
I personally didn’t download the study linked in the article. It is interesting that they apparently did not include anything about their methodology in the study itself, since they usually do with other things they publish.
simonw
3 hours ago
"... of the time" is ambiguous if you don't describe what "the time" is referring to.