Timsky
4 hours ago
> GPT-5 is proving useful as a literature review assistant
No, it is not. It only produces a highly convincing counterfeit. I am honestly happy for people who are satisfied with its output: life is way easier for them than for me. Obviously, the machine discriminates against me personally. When I spend hours in the library looking for some engineering-related math done in the 70s-80s, as a last-resort measure I can try this gamble with the chat, hoping for any tiny clue to answer my question. And then for the following hours I try to understand what is wrong with the chat's output. Most often I experience the "it simply can't be" feeling, and I know I am not the only one having it.
crazygringo
3 hours ago
In my experience doing literature super-deep-dives, it hallucinates sources about 50% of the time. (For higher-level literature surveys, it's maybe 5%.)
Of the other 50% that are real, it's often ~evenly split between sources I'm familiar with and sources I'm not.
So it's hugely useful in surfacing papers that I may very well never have found otherwise using e.g. Google Scholar. It's particularly useful in finding relevant work in parallel subfields -- e.g. if you work in physics but it turns out there are math results, or you work in political science and it turns out there are relevant findings from anthropology. And also just obscure stuff -- a random thesis that never got published or cited but the PDF is online and turns out to be relevant.
It doesn't matter if 75% of the results are hallucinated or not useful to me. Those only cost me minutes. The other 25% more than make up for it -- they're things I simply might never have found otherwise.
andrewflnr
3 hours ago
So, the exact stuff Google used to be good at.
ramenbytes
an hour ago
The exact stuff I now use Kagi for. Finding obscure relevant PDFs that Google didn't find is literally one of the things that made me switch.
georgemcbay
2 hours ago
Pretty much, though Google got bad at these things well before LLMs really came on the scene. We can all debate which project manager was responsible and the month and year things took a downward turn, but the obvious catalyst, IMO, was that "barely good enough" search creates more ad impressions, especially when virtually all of the bad results you are serving are links to sites that also serve Google-managed ads.
xiphias2
5 minutes ago
There was a very clear turning point: when Amit Singhal was kicked out for sexual harassment in the MeToo era. He was the heart of search quality, but he went too far when he was drinking.
andrewflnr
2 hours ago
Oh, sure, Google was starting to take a dive almost a decade before LLMs came on the scene.
macrolime
3 hours ago
What is "it". Gpt-5 auto? Gpt-5 pro? Deep research? These have wildly different hallucination rates.
bathtub365
2 hours ago
If these rates are known, it would be great for OpenAI to be open about them so customers can make an informed decision.
Maxatar
2 minutes ago
OpenAI has published a great deal of information about hallucination rates, as have the other major LLM providers.
You can't just give one single global hallucination rate, since the rates depend on the use case. And despite the abundant information available on how to pick the appropriate tool for a given task, it seems no one cares to take the time to first recognize that these LLMs are tools, and that you do need to learn how to use them in order to be productive with them.
malfist
an hour ago
"Known" implies that these rates are consistent and measurable. It seems to me, that this is highly unlikely to be the case
clbrmbr
an hour ago
OpenAI goes into great detail on the hallucination rates of GPT-5 models versus o3 in the GPT-5 System Card [1], section 3.7.
scosman
2 hours ago
Saying it isn't useful is a bit of an overstatement. It can search, churn through 500k words in a few minutes, and come back with summaries, answers, and sources for each point.
Should you blindly trust the summary? No. Should you verify key claims by clicking through to the source? Yes. Is it still incredibly useful as a search tool and productivity booster? Absolutely.
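Part of that verification can even be automated. A minimal sketch, assuming the answer cites DOIs: check each one against the public Crossref REST API before spending time on it. The DOIs below are placeholders (one real, one obviously fake).

    # Sketch: flag chatbot citations whose DOIs don't resolve in Crossref.
    # A 404 strongly suggests a hallucinated citation; a 200 only proves
    # the DOI exists, not that the paper supports the claim.
    import requests

    def crossref_lookup(doi: str):
        resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
        if resp.status_code != 200:
            return None
        return resp.json()["message"]

    for doi in ["10.1038/nature14539", "10.9999/not.a.real.doi"]:
        record = crossref_lookup(doi)
        if record is None:
            print(f"{doi}: no Crossref record (possible hallucination)")
        else:
            print(f"{doi}: {(record.get('title') or ['(no title)'])[0]}")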
scruple
2 hours ago
I gave it a PDF recently and asked it to help me generate some tables based on the information therein. I thought I'd be saving myself time. I spent easily twice as long as I would have if I had done it myself. It kept making trivial mistakes, misunderstanding what was in the PDF, hallucinating, etc.
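For comparison, a deterministic extractor sidesteps the hallucination problem entirely for this kind of task. A minimal sketch with pdfplumber, assuming the PDF contains real text rather than scanned images ("report.pdf" is a placeholder path):

    # Deterministic alternative for pulling tables out of a PDF:
    # pdfplumber's table extraction. No LLM involved, nothing to
    # hallucinate.
    import pdfplumber

    with pdfplumber.open("report.pdf") as pdf:
        for page_number, page in enumerate(pdf.pages, start=1):
            for table in page.extract_tables():
                print(f"table on page {page_number}:")
                for row in table:
                    print(row)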
Timsky
2 hours ago
It is excellent when just finding something is enough. Most often in my practice, I am dealing with questions that have no written-down answers, meaning the probability of finding a book/article that provides one is negligible. Instead, I am looking for indirect answers or proofs before I make a final engineering decision. Yet another problem is that the language itself changes over time. For instance, at the beginning of the 20th century, integers were called integral numbers. IMHO, LLMs handle such cases poorly when considered as a substitute for search engines. For full-text search, I am using https://www.recoll.org/, a real time saver for me, especially for desktop search.
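One crude workaround for that terminology drift is to expand queries with hand-curated historical synonyms before handing them to a full-text engine like Recoll (whose query language accepts OR and quoted phrases). A sketch, with a made-up synonym table:

    # Expand a query with historical synonyms before full-text search.
    # The synonym table is an illustration, not shipped by any tool.
    HISTORICAL_SYNONYMS = {
        "integer": ["integral number", "whole number"],
        "capacitor": ["condenser"],
    }

    def expand_query(term: str) -> str:
        variants = [term] + HISTORICAL_SYNONYMS.get(term, [])
        return " OR ".join(f'"{v}"' for v in variants)

    print(expand_query("integer"))
    # "integer" OR "integral number" OR "whole number"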
happy_dog1
2 hours ago
I wonder whether, for a lot of the search and literature-review use cases where people are trying to use GPT-5 and similar, we'd honestly be much better off with a really powerful semantic search engine? Any time you ask a chatbot to summarize the literature or answer your question, there's a risk it will hallucinate and give you an unreliable answer. Using LLM-generated embeddings to retrieve the nearest-matching documents, by contrast, doesn't run any risk of hallucination and might be a powerful way to retrieve things that Google / Bing etc. wouldn't be able to find with their current algorithms.
I don't know if something like this already exists and I'm just not aware of it to be fair.
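Something along these lines is straightforward to prototype. A minimal sketch of embedding-based retrieval, assuming the sentence-transformers package; the model name and documents are illustrative:

    # Embed documents once, embed the query, rank by cosine similarity.
    # Retrieval returns real documents, so there is nothing to
    # hallucinate -- only ranking quality varies.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    docs = [
        "On the factorization properties of integral numbers",  # 1920s phrasing
        "A survey of prime factorization algorithms for integers",
        "Field notes on political organization in highland villages",
    ]

    model = SentenceTransformer("all-MiniLM-L6-v2")
    doc_emb = model.encode(docs, normalize_embeddings=True)
    query_emb = model.encode(["how do integers factor into primes"],
                             normalize_embeddings=True)

    # Unit-norm vectors, so the dot product is cosine similarity.
    scores = (doc_emb @ query_emb.T).ravel()
    for i in np.argsort(-scores):
        print(f"{scores[i]:.3f}  {docs[i]}")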
Timsky
42 minutes ago
I think you have a very good point here: semantic search would be the best option for this. The items would have unique identifiers, so variations in language could be avoided. But unfortunately, I am not aware of any publicly available projects of this kind that massively analyze scientific reports, apart from DBpedia and some biology-oriented ontologies.
Currently, I am applying RDF/OWL to describe factual information and contradictions in the scientific literature, on an amateur level, thus mostly manually. The GPT discourse brings up not only human perception problems, such as cognitive biases, but also truly philosophical questions of epistemology that should be resolved beforehand. LLM developers cannot solve this because it is not under their control; they can only choose what to learn from. For instance, a scientific text is not absolute truth but rather a carefully verified and reviewed opinion, based on previous authorized opinions and subject to change in the future. The same author may hold various opinions over time, and more recent opinions are not necessarily more "truthful". Now imagine a corresponding RDF triple (subject-predicate-object tuple) that describes all that. Pretty heavy thing, and no NLTK can decide for us what is true and what is not.
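For a sense of how heavy those triples get, here is a rough rdflib sketch of modelling a claim as an author's dated, supersedable opinion rather than a bare fact; the namespace and all terms are illustrative, not a published ontology:

    # A claim recorded as an opinion with author, source, date, and a
    # pointer to the same author's later, revised view.
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF, XSD

    EX = Namespace("http://example.org/")
    g = Graph()

    claim = EX["claim-42"]
    g.add((claim, RDF.type, EX.Opinion))
    g.add((claim, EX.author, EX.Smith))
    g.add((claim, EX.assertedIn, EX["smith-1974-paper"]))
    g.add((claim, EX.statedOn, Literal("1974-06-01", datatype=XSD.date)))
    g.add((claim, EX.states, Literal("Method A outperforms method B.")))
    g.add((claim, EX.supersededBy, EX["claim-57"]))  # same author, later view

    print(g.serialize(format="turtle"))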
andai
2 hours ago
There's this principle, I forget the name (Gell-Mann amnesia, I believe), where everyone reading the newspaper, on a subject they're familiar with, instantly spots all the holes and all the errors, and asks themselves: how was this even published in the first place?
But then they flip to the next page, read a story on a subject they're not an expert on, and just accept all of it without question.
I think people might have a similar relationship with ChatGPT.
malshe
28 minutes ago
I think its scope is narrower than that of a full lit-review assistant. I use it mainly for finding papers that I or my RAs might have missed in our lit review.
I have a recent example where it helped me locate a highly relevant paper for my research. It was from an obscure journal and wouldn't show up in the first few pages of Google Scholar search. The paper was real and recently published.
However, using LLMs to do the lit review itself has been fraught with peril. LLMs often misinterpret research findings or extrapolate them into incorrect inferences.
glenstein
3 hours ago
Struggling to understand this one. Is it that (1) it's lopsided toward reference materials found on the modern internet and not as useful for reviewing literature from the Before Times or (2) it's offering specific solutions but you're skeptical of them?
kianN
2 hours ago
If you’re interested in a literature review tool, I built a public one for some friends in grad school that uses hierarchical mixture models to organize bulk searches and citation networks.
Example: https://platform.sturdystatistics.com/deepdive?search_type=e...
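As a toy illustration of the general idea (a much-simplified, flat stand-in, not the platform's actual hierarchical models): cluster paper abstracts with a Gaussian mixture over TF-IDF vectors and group results by cluster.

    # Cluster abstracts by topic with a Gaussian mixture over TF-IDF
    # vectors; the real tool uses hierarchical models over searches and
    # citation networks.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.mixture import GaussianMixture

    abstracts = [
        "Bayesian inference for hierarchical mixture models",
        "Variational methods for topic discovery in text corpora",
        "Citation network analysis of physics preprints",
        "Community detection in co-authorship graphs",
    ]

    X = TfidfVectorizer(stop_words="english").fit_transform(abstracts).toarray()
    labels = GaussianMixture(n_components=2, random_state=0).fit_predict(X)

    for k in range(2):
        print(f"cluster {k}:", [a for a, l in zip(abstracts, labels) if l == k])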
Timsky
31 minutes ago
Thank you for sharing! I like your dendrogram-like circular graphs! They are way more intuitive. That could be a nice companion to the bibliometrix/biblioshiny library for bibliometric analysis, https://www.bibliometrix.org/. I tried "Deep Dive" with my own request, and ... it unfortunately stops at the end of "Organizing results". Maybe I should try again later.
kianN
23 minutes ago
Haha, that's embarrassing! The progress bars are an estimate. If a paper has a lot of citations, it may take a bit longer than the duration of the bars, but it will hopefully finish relatively soon!
Edit: Got home and checked the error logs. There was a very long search query with no results. Bug on my end to not return an error in that case.