A_D_E_P_T
16 hours ago
Something you've gotta understand is that the majority of English-language scientific journal articles are written by authors who aren't native English speakers/writers.
> https://ncses.nsf.gov/pubs/nsb202333/publication-output-by-r...
In the past, just after they submitted a poorly written paper, the sleazeballs at Wiley/Elsevier/Springer would "encourage" said authors to employ their "Author Services", which would edit and rewrite the paper for them and resubmit it for publication. This didn't come cheap.
Today, everybody just uses LLMs. LLMs are masterful translators of language and concepts. It's a win-win for everybody but Author Services.
Admittedly, an LLM's word choice and sentence structure are much more limited. (This is not necessarily a bad thing. LLM-written scientific papers can be clearer, and are often tidier, than human-written scientific papers.) I don't like seeing LLM-written text in journalism, literature, and other media, but I don't mind it so much in scientific literature -- if there's no fraud involved, that is.
perching_aix
15 hours ago
Sounds a bit scary from a correctness standpoint. I have to wonder whether such paper authors have the necessary language skills to ascertain that their LLM of choice didn't, for example, add in any nuance that isn't correct or intended.
I do think it's possible they do; I myself am a non-native speaker who would consider doing something like this, and I think I could notice if that happened. But then I'm also well beyond the level where this would be a necessity for me, rather than just a convenience and an additional polishing step.
diggan
15 hours ago
Wouldn't it be the same whether they use "Author Services" or an LLM? Essentially, "how do you evaluate something that might go over your head?" remains the same regardless of the tool/service used.
perching_aix
15 hours ago
I don't think modeling this as Author Services = good, LLM = bad (or vice versa) makes for a fruitful conversation, and modeling it that way wasn't my intention either. I'd definitely expect Author Services to do a better job though, even just based on this much description.
diggan
15 hours ago
> I'd definitely expect Author Services to do a better job though, even just based on this much description
Yeah, me too, but the question was whether "paper authors have the necessary language skills to ascertain" the correctness of the translation. Regardless of how good the LLM or Author Services is, the problem remains the same.
perching_aix
15 hours ago
Right, that's true. Also scary in its own way. I'm not too big a fan of natural languages - something I notice I voice quite frequently these days :)
AlienRobot
15 hours ago
The main worry, in my opinion, is an issue of responsibility. If the author services makes a mistake, a person will be blamed for it. But when an LLM makes a mistake, people tend to shift the responsibility from the person who used the LLM to the LLM itself.
People don't simply want to shift the effort of a task away from themselves, they also want to shift away the responsibility for doing it wrong.
diggan
15 hours ago
> If the author services makes a mistake, a person will be blamed for it
Wouldn't the paper authors be "blamed" (held responsible) for it, rather than the "Author Services"? Ultimately, they're responsible for the quality of their paper.
Just like I'm responsible for making sure my taxes are paid correctly, even if I use an accountant, tax lawyer, and financial advisor for the whole shebang. If anything is incorrect, I'll be held responsible.
AlienRobot
12 hours ago
Yes, but this will only happen after a mistake occurs.
What I mean is that if it's your responsibility to do something right, it acts as a deterrent that motivates you to make sure of it.
LLMs allow people to defer this responsibility to the LLM, so they may avoid making sure of things because they let the LLM take on that responsibility.
diggan
2 hours ago
If anything, wouldn't it be the opposite? When you offload something to a human, at least you can say "it was their fault!" and point to an actual name, or similar.
But you can't use a tool and then say "it was their fault!" and point at the screen; a tool isn't another entity the way "Author Services" is.
mihaaly
15 hours ago
I believe both arguments assume honest contributors, which may not be completely true. I have no reference, only a vague recollection, of doctored data and made-up results (basically cheating), due to publication pressure or just dishonesty/laziness, plaguing the reliability of scientific publications. Some may even be happy about the twisted reality or hallucinations producing novel-sounding results that advance their scientific careers, similar to cases that occurred in the past. The reproducibility crisis is their friend, letting them get away with it long enough to harvest the personal gain.
perching_aix
15 hours ago
The reason I "assume honest contributors" is because assuming anything else produces trivial conclusions. Obviously someone who just wants to pump out something, anything, won't care about the accuracy of delivery, nor will anyone be impacted by the lack of it (as the paper was rubbish from the get-go). I just don't find that whole can of worms relevant here.
If I really think about it, I guess I can see it being relevant like how typesetting software is, for making a paper feel more serious than it really is. Not really the angle I was going for though.
dev_l1x_be
15 hours ago
Absolutely. I think that even for non-scientific journals, LLMs have a net positive effect on expressiveness and clarity when instructed like that.
leakycap
15 hours ago
> It's a win-win for everybody but Author Services.
Having work edited by a human who understands the concepts (or at least the words) adds real value to the work itself.
Running your rough draft in one language through an LLM so that, at surface glance, it resembles a publication that a team of humans was involved in doesn't actually provide value to the author or the reader. Now the author has a paper they cannot say they wrote (or indeed, say who wrote it), and the reader cannot assume that what they read was intended to mean what it now means in English.
If anything, people who write papers in another language to be published in English should be leaning more than ever on human editors and quality assurance measures, not slop machines that hallucinate with good grammar.
raincole
15 hours ago
In my (non-English-speaking) country, paying for an editing service before submitting a paper was very common. Now it's all AI-edited.
That's not the interesting part, though. The interesting part is that the biggest editing service provider here explicitly states that it's okay for clients to send their papers through AI first and then buy the service for 'additional human editing.' They even offer a cheaper price for papers that have already been edited by AI.
The irony is not lost on me.
A_D_E_P_T
15 hours ago
I think you overestimate the extent of Author Services' involvement. It's near nil; there's no back-and-forth; you just send them your article and they send back a version with fewer grammatical/textual errors and correct formatting for the journal. It's a lot like an LLM, just slower and considerably more expensive.
Also, you severely underestimate the ability of the authors to edit and verify the correctness of their own work. They wrote it in the first place, after all. They know, better than anybody else, what's important to convey.
What's more, this editing step happens before peer review, so if the paper is junk or has glaring errors it'll probably be turned down by most respectable publications. Even the marginal publications won't publish stuff that's really bad.
leakycap
15 hours ago
You're pointing out issues with one departmental service provided by one publishing company. I'm pointing out issues with using any LLM.
If you don't like Author Services, use someone else. Involve a coauthor. This is not even a remotely hard problem to solve.
auggierose
3 hours ago
Yes, it is not hard at all to solve. Just use an LLM.
DontBreakAlex
15 hours ago
I find it quite insulting that you seem to think non-native English speakers are incapable of reading the output of an LLM to assess whether it still means what they intended to say.
leakycap
15 hours ago
Telling me you're insulted by my comment makes me question whether it is worth the time to reply. In the spirit of goodwill, let me provide some context that your emotional response might not have given you time to consider:
I work in a multilingual healthcare field, and my output is often translated into different languages. Forms. Posters. Take-home advice following surgery. We provide all of this in every language spoken by more than about 5% of our customers: English, Vietnamese, Korean, Spanish, Tagalog, and Mandarin.
In addition to English, I speak and read one of these other languages fluently and have since I was about 9 years old, but I don't live in the culture and don't understand the conveyed meaning of translated health-related phrases.
Do you think I use an LLM, or an editor who does? No, because that would be silly and could convey information incorrectly to the audience who can only speak that language.
If you want to be quite insulted, turn on the news and get a realistic perspective on what is going on in the world. The people hurt by text going through LLMs are going to be those in extreme poverty and minorities subjected to machine-generated translations without human review. You're fighting on a site where most of us would likely be on the same side of so many issues. Let's discuss, and not make this a Facebook full of thoughtless responses.
perching_aix
15 hours ago
Why?
I don't have academic paper publishing peers with bad language skills, but I do have colleagues with bad language skills, and the misunderstandings and petty catfights they get themselves into over poorly worded sentences, missing linguistic cues, and misinterpretations are utterly bonkers.
They're all otherwise perfectly smart, capable people; they just happen to have this as a gap in their skillset. And no, they don't notice if transformative details get added in or left out.
leakycap
13 hours ago
> I do have colleagues with bad language skills, and the misunderstandings and petty catfights they get themselves into over poorly worded sentences, missing linguistic cues, and misinterpretations are utterly bonkers.
Is this a widespread systemic issue within the organization, or do you work somewhere large enough that it is easy to find examples like this due to the number of people involved?
If it is the former, I would not want to work somewhere that people get into petty catfights over editing and have no ability to write a sentence or understand linguistic cues. I don't remember working anywhere I would describe in the way you do in your second paragraph.
> And no, they don't notice if transformative details get added in or left out.
I guess I don't have to tell you not to select them as the people to review your work output?
Again, all the examples I'm reading make me think it would be beneficial for folks to include competent team members or external support for projects that will be published in a language they don't speak natively.
perching_aix
13 hours ago
> Is this a widespread systemic issue within the organization, or do you work somewhere large enough that it is easy to find examples like this due to the number of people involved?
Can't tell you for sure (would require me to have comprehensive knowledge of the language skills around the company). I do know a few folks with proper language skills, but they're a rarity (and I treasure them greatly). Could definitely be just my neck of the woods in the company being like this.
> If it is the former, I would not want to work somewhere [like that where] (...)
Yeah, it's not great. The way I solved this was by simply not caring and just talking to them in proper English, hammering them until they provide me (and each other) with enough cross-verifiable information that it definitely cannot be wrong (or will be wrong in a very defensible way), with an additional serving of double- and triple-checking everything. Some are annoyed by this, others appreciate it. Such is life, I suppose.
> I guess I don't have to tell you not to select them as the people to review your work output?
I don't really have a choice. I think you might misunderstand what it is that I deliver though. I work with cloud technologies, so while I do sometimes deliver technical writing, most of my output is configuration changes and code. When I speak of language barrier issues, that's about chat, email, and ticket communications. I think that's plenty bad enough to have these kinds of troubles in though, especially when it's managers who are having difficulties.
leakycap
12 hours ago
> I don't really have a choice.
When does your employment contract end?
perching_aix
12 hours ago
Not a fixed contract, so when either side terminates it (I understand the question was rhetorical). Where I live, opportunities are not so plentiful, though I am working on polishing up my CV to compensate. Benefits are decent though; I can WFH all the time, so that's also a consideration. Most everywhere, they're doing the silly hybrid-presence thing now, which would suck (back and joint issues don't mesh too well with having to move around in the office and travel back and forth every (other) day) - maybe more so than the linguistic landscape, which at this point I'm fairly used to.
This hesitance to switch is definitely put to the test a lot these days though :)
stackbutterflow
15 hours ago
Especially because it's so much easier to understand text than to produce it. I can read difficult authors in a foreign language and understand them perfectly, but there's no way I could write like them.
leakycap
15 hours ago
This just tells me you don't work with multiple languages very often.
I have spoken a second language fluently since I was about 9. I produce work that is translated into that language regularly... by a translator.
Being able to read words does not mean I understand the meaning they convey to a person who only speaks that language. These are scientific papers we're talking about; conveyed meaning is valuable, and it is completely lost when a non-native speaker publishes machine-generated output that they could not have written themselves.
jltsiren
12 hours ago
These are scientific papers we're talking about, typically written by non-native speakers to a primarily non-native audience. Scientific writing practices have evolved over a long time to convey meaning reliably between non-native speakers. You are supposed to write directly and avoid ambiguity. To rely on literal meaning and avoid idioms used by native speakers. To repeat yourself and summarize.
Based on what I've seen, LLMs can write scientific English just fine. Some struggle with the style, preferring big pretentious words and long vague sentences. Like in some caricature of academic writing. And sometimes they struggle with the nuances of the substance, but so do editors and translators who are not experts in the field.
Scientific communication is often difficult. Sometimes everyone uses slightly different terminology. Sometimes the necessary words just don't exist in any language. Sometimes a non-native speaker struggles with the language (but they are usually aware of it). And sometimes a native speaker fails to communicate (and keeps doing that for a while), because they are not used to international audiences. I don't know how much LLMs can help, but I don't see much harm in using them either.
stackbutterflow
15 hours ago
This just reminds me never to assume someone's reality. I speak more than two languages. And I disagree.
Papers are read by all types of people. I don't know why you assume that scientific papers, which are almost all written in English, are read solely by native English speakers.
People have been doing science in broken English, French, German, Arabic, Latin, and more for as long as there has been science to be made.
leakycap
15 hours ago
Sounds like you would be a perfect person to help clear up misunderstandings in texts being translated if your skills are as keen as you describe.
You mention that our conversation reminded you not to assume someone else's reality. I would encourage you also to be reminded of the common fallacy whereby people wildly overestimate their own abilities, especially when it comes to claiming to speak/read/write multiple languages with knowledge akin to a native speaker's.
It is unfortunately very common to mislead oneself about one's abilities when one hasn't had to rely on that skillset in a real environment.
I would venture that you don't regularly work with multiple languages in your work outputs, or you would likely have received feedback by now that could help provide understanding about the nuances of language and communication.
perching_aix
14 hours ago
If you disagree with the assertion that people generally have an easier time understanding language (correctly or not) than producing it, that's one thing and that's fine. But if you consider it a claim outright, and find it incorrect, then that's gonna need some beyond-anecdotal supporting evidence, or you should ask for some from the other side. Digging into each other's backgrounds is not this.
Keeping to anecdotes and opinions, though: I only speak one foreign language sadly, that being English, but this effect is very familiar to me, and is frequently demonstrated and echoed by my peers too. It even comes up with language loss, not just language learning. It goes hand in hand with reading, writing, listening, and speaking being very different areas of language ability, the latter two being areas I'm personally weak in. That's already a disparity that a cursory read of your position says shouldn't exist (do correct me if I'm misinterpreting what your stance is, though).
And all this is completely ignoring how even the native language output one produces can be just straight up wrong sometimes, and not faithful to intentions. I've been talking about it like there's a finish line too, but there really isn't. This is why things like mechanized math proofs are so useful. They are composed in formal languages rather than natural ones, enabling their evaluation to be automated (they are machine-checkable). No unintended semantics lost or added.
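To make "machine-checkable" concrete, here's a minimal sketch in Lean 4 (the theorem name is mine; Nat.add_comm is a lemma from Lean's standard library). The checker either accepts the proof exactly as stated or rejects it - no room for a reader to take away unintended meaning:

    -- A trivial machine-checkable statement: addition on naturals commutes.
    -- The proof checker verifies this mechanically; nothing is left open
    -- to interpretation.
    theorem my_add_comm (a b : Nat) : a + b = b + a :=
      Nat.add_comm a b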
leakycap
13 hours ago
> If you disagree with the assertion that people generally have an easier time understanding language (correctly or not) than producing it, that's one thing and that's fine.
I disagree with the assertion that a person should rely on an LLM to publish in a language they don't understand well enough to write in without involving a word machine.
> Digging into each other's backgrounds is not this.
I spoke from experience, and it was then skewered by someone cosplaying the Duolingo owl on the internet. You can take it up with them if you have an issue.
> And all this is completely ignoring how even the native language output one produces can be just straight up wrong sometimes, and not faithful to intentions.
How does the inability you point out of even a native speaker to clearly and effectively communicate sometimes not simply make it more obvious that a person less familiar with the language should involve a person who is?
perching_aix
13 hours ago
> How does the inability you point out (...) not simply make it more obvious that a person less familiar with the language should involve a person who is?
I think that's a perfectly obvious point that the person you were replying to, you, and me, are all on board with and have been throughout. Inviting their or your attention to this was not the purpose of that sentence.
> I spoke from experience
Great.
> and it was then skewered by someone cosplaying the Duolingo owl on the internet. You can take it up with them if you have an issue.
But my issue was/is with you. I wanted you to stop engaging in the use of combative and emotionally charged language. I understand that you feel justified in doing so, but nevertheless, I'm asking you to please stop. It dilutes your points and makes it significantly harder to engage with them. I further don't think you guys were disagreeing nearly hard enough to justify it, but that's really not my place to say in the end.
> I disagree with the assertion that a person should rely on an LLM to publish in a language they don't understand well enough to write in without involving a word machine.
Thanks for clarifying - it genuinely looked like you were disagreeing with what I mentioned too.
leakycap
13 hours ago
> But my issue was/is with you. I wanted you to stop engaging in the use of combative and emotionally charged language.
You seem very intelligent, I truly believe your time and energy would be better spent doing literally anything else than providing me feedback on my commenting etiquette. Please, I implore you to do more with your time that will provide value! You genuinely seem smart.
(See how that felt? That's the effectiveness of telling someone on the internet you want them to behave differently. It's really pointless.)
perching_aix
12 hours ago
> See how that felt? That's the effectiveness of telling someone on the internet you want them to behave differently. It's really pointless.
I mean, I think this was pretty alright? I appreciate the advice too, and even generally agree with it. This was just my extremely poor attempt at de-escalation, because I thought it might work out nevertheless.
einpoklum
15 hours ago
> LLMs are masterful translators of language and concepts
No, they are not. They will - speaking generally - try to produce what seems like a typical translation, which is often the same thing, but not always.
diggan
14 hours ago
They're pretty good at translation, to be honest. The transformer architecture was initially invented/discovered in order to build something better for doing translations with Google Translate, so it's hardly a surprise they're pretty good at it.
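For what it's worth, trying this yourself takes a few lines of Python. A minimal sketch using the Hugging Face transformers library (the model below is just one publicly available French-to-English translation model, picked for illustration, not a recommendation):

    # Minimal machine-translation sketch with a pretrained transformer.
    # Assumes `pip install transformers sentencepiece` plus a backend
    # such as PyTorch. The model choice here is illustrative.
    from transformers import pipeline

    translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")
    result = translator("Les modèles de langage sont de bons traducteurs.")
    print(result[0]["translation_text"])  # e.g. "Language models are good translators."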