efitz
a day ago
There is a general problem with rewarding people for the volume of stuff they create, rather than the quality.
If you incentivize researchers to publish papers, individuals will find ways to game the system: meeting the minimum quality bar with the least effort to create the most papers and thereby receive the greatest reward.
Similarly, if you reward content creators based on views, you will get view maximization behaviors. If you reward ad placement based on impressions, you will see gaming for impressions.
Bad metrics or bad rewards cause bad behavior.
We see this over and over because the reward issuers are designing systems to optimize for their upstream metrics.
Put differently, the online world is optimized for algorithms, not humans.
noobermin
a day ago
Sure, just as long as we don't blame LLMs.
Blame people, bad actors, systems of incentives, the gods, the devils, but never broach the fault of LLMs and their widespread abuse.
miki123211
a day ago
LLMs are tools that make it easier to hack incentives, but you still need a person to decide that they'll use an LLM to do so.
Blaming LLMs is unproductive. They are not going anywhere (especially since open source LLMs are so good).
If we want to achieve real change, we need to accept that they exist, understand how that changes the scientific landscape, and work out our options from there.
noobermin
20 hours ago
everyone keeps claiming "they're here to stay" as if it's gospel. this constant drumbeat is rather tiresome and without much hard evidence.
LunaSea
15 hours ago
Genuinely curious, did we ever manage to ban a piece of technology worldwide and effectively?
oscaracso
8 hours ago
A large part of geopolitics is concerned with limiting the spread of weapons of mass destruction worldwide and to the greatest possible degree of efficacy. Moreover, the investment to train state-of-the-art models is greater than the Manhattan project and involves larger and more complex supply chains-- it cannot be done clandestinely. Because the scope of the project is large and resource-intensive there are not many bodies that would have to cooperate in order to place impassable obstacles on the path that is presently being taken. 'What if they won't cooperate toward this goal?' -- Worth considering, but the fact is that they can and are choosing not to. If the choice is there it is not an inevitability but a decision.
andrybak
14 hours ago
Do chlorofluorocarbons (CFCs) mostly banned by the Montreal Protocol count?
tbrownaw
9 hours ago
And lead in gasoline, and probably quite a few other things where we found a way to get similar end results with fewer annoying side effects.
gus_massa
15 hours ago
If they go away, it's because they have been replaced by something better (worse) like LLLM or LLMM or whatever.
I'm old enough to remember when GANs were going to be used to scam millions of people and flood social media with fake profiles.
latentsea
16 hours ago
What evidence do you need exactly?
I think such statements are likely projections of people's own unwillingness to part with such tools given their own personal perceived utility.
I, for one, wouldn't give up LLMs. Too useful to me personally. So, I will always seek them out.
Alex2037
9 hours ago
your spiritual predecessors campaigned against electricity, radio, audio recordings, TV, computers, video games, CGI, the internet, cellphones, smartphones, and perhaps a myriad other things.
https://upload.wikimedia.org/wikipedia/commons/8/85/The_Unre...
https://www.smithsonianmag.com/history/musicians-wage-war-ag...
etc.
but yes, of course, this time it's going to be different, because unlike those boomers, you and your internet friends are on the right side of history.
>without much hard evidence.
1. China. China has the tech, the talent, and the hardware. you could (you can't), for example, equate LLMs to CSAM in the West to make it absolutely verboten, but China wouldn't give a shit, and 93% of the world would use Chinese tech, dismissing your dollar store Butlerian Jihad as yet another bout of America's schizophrenia.
2. it's been less than 3 years since ChatGPT's release, and it now has 800 million active weekly users. and it's not even available in China and Russia, where Deepseek and other Chinese models easily add another 200-300 million users. no other technology has had such explosive proliferation before. good luck convincing all these people who now use it every day to give it up because... because it's bad, mkay?
3. unlike the previous one, the current US administration - which will remain in power for at least three more years - is not hostile to this technology. there will be no regoolations, no moratoriums, and no matter how utterly detached from reality the next administration might end up being, in three years it will be too late to do anything about it (even more so than now).
4. trillion dollar corporations have collectively invested hundreds of billions into this technology. oh, they would love some regulations to hamstring their competitors, but if you try to step on their toes, well, good luck.
5. local models are already good enough to be perpetually useful. what the fuck are you going to do, order door-to-door seizure of fully semi-automatic GPUs?
cyco130
a day ago
LLMs are not people. We can’t blame them.
wvenable
a day ago
What would be the point of blaming LLMs? What would that accomplish? What does it even mean to blame LLMs?
LLMs are not submitting these papers on their own, people are. As far as I'm concerned, whatever blame exists rests on those people and the system that rewards them.
jsrozner
a day ago
Perhaps what is meant is "blame the development of LLMs." We don't "blame guns" for shootings, but certainly with reduced access to guns, shootings would be fewer.
nandomrumber
a day ago
Guns have absolutely nothing to do with access to guns.
Guns are entirely inert objects, devoid of either free will nor volition, they have no rights and no responsibilities.
LLMs likewise.
nsagent
a day ago
To every man is given the key to the gates of heaven. The same key opens the gates of hell.
-Richard Feynman
https://www.goodreads.com/quotes/421467-to-every-man-is-give...
xandrius
a day ago
I blame keyboards, without them there wouldn't be these problems.
anonym29
a day ago
This was a problem before LLMs and it would remain a problem if you could magically make all of them disappear.
LLMs are not the root of the problem here.
RobotToaster
a day ago
See Goodhart's law: "When a measure becomes a target, it ceases to be a good measure"
hammock
a day ago
> There is a general problem with rewarding people for the volume of stuff they create, rather than the quality. If you incentivize researchers to publish papers, individuals will find ways to game the system,
I heard someone say something similar about the “homeless industrial complex” on a podcast recently. I think it was San Francisco that pays NGOs funds for homeless aid based on how many homeless people they serve. So the incentive is to keep as many homeless around as possible, for as long as possible.
djeastm
a day ago
I don't really buy it. Are we to believe they go out of their way to keep people homeless? Does the same logic apply to doctors keeping people sick?
ssivark
a day ago
ICYMI, this drew a lot of attention a few years ago.
https://www.cnbc.com/2018/04/11/goldman-asks-is-curing-patie...
SOLAR_FIELDS
11 hours ago
This could literally be an Onion headline
alfalfasprout
a day ago
It's a metric attribution problem. The real metric should be reduction in homelessness, for example (though even that can be gamed through bussing them out, etc. -- tactics that unfortunately other cities have adopted). But attributing that to a single NGO is tough.
Ditto for views, etc. Really what you care about as, e.g., YouTube is conversions for the products that are advertised. Not impressions. But there's an attribution problem there.
wizzwizz4
12 hours ago
Define the metric as "people helped": then bussing them out to abandon them somewhere else isn't a solution, because the adjudicators can go "yes, you made the number go down, but you did so by decoupling the metric from what it was supposed to measure, so we're not rewarding you for it".
SOLAR_FIELDS
11 hours ago
My spouse works in the homelessness field, and the correct metric to follow is the number of homeless people given housing. It's the "housing first" approach. It's harder to game when you count people directly placed into homes - someone is paying rent and maintaining a trackable occupied space that you can verify the client is actually utilizing - and this approach cannot be gamed by "bus them somewhere else".
What many people don’t realize is just how many normal life hurdles are significantly easier to overcome with a stable housing environment, even if the client is willing and available to work. Employment, for example, has several precursors that you need. Often you need an address. You need an ID. For that you need a birth certificate. To get the birth certificate you need to have the resources and know how to contact the correct agency. All of these things are much harder to achieve without a stable housing environment for the client.
wizzwizz4
7 hours ago
"Number of homeless given housing" is only the correct measure due to the nature of the domain-specific problem. I'm wary of this strategy in general, because the people responsible for deciding how things are accounted for are rarely experts enough to identify sensible domain-specific metrics, so they'll have to consult experts. But that creates a vulnerable point of significant interest to would-be grifters, and if they're not experts enough to assess expert consensus, you end up with metrics that don't work, baked in.
But yes, if we're only looking at homelessness, "how many formerly-homeless people have been given housing?" is a very good way to measure successful interventions.
xhkkffbf
11 hours ago
And then some will wander back closing the loop and preserving jobs.
watwut
17 hours ago
Yeah, it is totally NGO that creates homelessness /s
godelski
a day ago
> rewarding people for the volume ... rather than the quality.
I suspect this is a major part of the appeal of LLMs themselves. They produce lines very fast, so it appears as if work is being done fast. But that's very hard to know, because the number of lines is actually a zero signal of code quality, as is the number of commits. It's a bit insane that we use lines and commits as measures in the first place; they're trivial to hack. You end up rewarding that annoying dude who keeps changing the file so the diff is the entire file and not the 3 lines they edited.
I've been thinking we're living in "Goodhart's Hell", where metric hacking has become the intent: we've decided metrics are all that matter and are perfectly aligned with our goals.
But hey, who am I to critique? I'm just a math nerd. I don't run a multi trillion dollar business that lays off tons of workers because the current ones are so productive due to AI that they created one of the largest outages in the history of their platform (and you don't even know which of the two I'm referencing!). Maybe when I run a multi trillion dollar business I'll have the right to an opinion about data.
slashdave
a day ago
I think you will discover that few organizations use the size or number of edits as a metric of effort. Instead, you might be judged by some measure of productivity (such as resolving issues). Fortunately, language agents are actually useful at coding, when applied judiciously.
godelski
6 hours ago
Yet it's common enough that we see it. Your point also brings up the 10x engineer joke: there are two types of 10x engineers, those who do 10x the work and those who solve 10x the Jira tickets but are the cause of 100x of them.
The point is that people metric hack and very bureaucratic structures tend to incentivize metric hacking, not dissuade them. See Pournelle's Iron Law of Bureaucracy.
> Fortunately, language agents are actually useful at coding, when applied judiciously.
I'm not sure this is in doubt by anyone. By definition it really must be true. The problem is that they're not being used judiciously but haphazardly. The problem is that people in large organizations are more concerned with politics than the product they make.
If you cannot see how quality is decreasing then I'm not sure what to tell you. Yes, there are metrics where it's getting better, but at the same time user frustration is increasing. AWS and Azure just had recent major outages. CrowdStrike took down a large chunk of the world's computers over an avoidable mistake. Microsoft is fumbling the Windows upgrade. Apple Intelligence was a disaster. YouTube search is beyond infuriating. Google search is so bad we turn to LLMs now. These are major and obvious issues. We don't even have time to talk about the million minor issues, like YouTube captions covering captions embedded in the video, which is not a majorly complicated problem to solve with AI; instead they're pushing AI upscaling, which is getting a lot of backlash.
So you can claim things are being used judiciously all you want, but I'm not convinced when looking at the results. I'm not happy that every device I use is buggy as shit and simultaneously getting harder to fix myself.
canjobear
7 hours ago
Who is getting rewarded for uploading tons of stuff to the arXiv?
pwlm
a day ago
What would a system that rewards people for quality rather than volume look like?
What would an online world that is optimized for humans, not algorithms, look like?
Should content creators get paid?
pjdesno
11 hours ago
> What would a system that rewards people for quality rather than volume look like?
Hiring and tenure review based on a candidate’s selected 5 best papers.
Already standard practice at a few enlightened places, I think. (of course this also probably increases the review workload for top venues)
To a lesser extent, bean-counting metrics like citations and h-index are an attempt to quantify non-volume-based metrics. (for non-academics, h-index is the largest N such that your N-th most cited paper has >= N citations)
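A quick illustrative sketch of that definition in Python (made-up citation counts, purely for demonstration):

    def h_index(citations):
        # Largest N such that the N-th most cited paper has >= N citations.
        cites = sorted(citations, reverse=True)
        h = 0
        for n, c in enumerate(cites, start=1):
            if c >= n:
                h = n
            else:
                break
        return h

    print(h_index([10, 8, 5, 4, 3]))  # -> 4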
Note that most approaches like this have evolved to counter “salami-slicing”, where you divide your work into “minimum publishable units”. LLMs are a different threat - from my selfish point of view, one of the biggest risks is that it takes less time to write a bogus paper with an LLM than it does for a single reviewer to review it. That threatens to upend the entire peer reviewing process.
drnick1
a day ago
> Should content creators get paid?
I don't think so. Youtube was a better place when it was just amateurs posting random shit.
vladms
a day ago
> Should content creators get paid?
Everybody "creates content" (like me when I take a picture of beautiful sunset).
There is no such thing as "quality". There is quality for me and quality for you. That is part of the problem: we can't just relate to some external, predefined scale. We (the sum of people) are the approximate, chaotic, inefficient scale.
Be my guest to propose a "perfect system", but - just in case there is no such system - we should make sure each of us "rewards" what we find to be of quality (be it people or content creators), and hope it will prevail. It seems to have worked so far.
MangoToupe
20 hours ago
Crazily, I think the easiest way is to remove any and all incentives, awards, finite funding, and allegedly merit-based positions. Allow anyone who wants to research to research. Natural recognition of peers seems to be the only way to my thinking. Of course this relies on a post-scarcity society so short of actually achieving communism we'll likely never see it happen.
js8
18 hours ago
You don't need post-scarcity to do that. I was born in communist Czechoslovakia (my father was an academic). The government allocated jobs for academics and researchers, and they pretty much had tenure. So you could coast by being unproductive, or get by using your connections to party members (the real currency in the CSSR).
After 1989, most academics complained that the system was not merit-based and practical (applied) enough. So we changed it to grants and publication metrics (modeled after the West). For a while, it worked... until the bureaucracy became overbearing and some learned how to game the system again.
I would say, both systems have failure modes of a similar magnitude, although the first one is probably less hoops and less stress on each individual. (During communism, academia - if you could get there, especially technical sciences - was an oasis of freedom.)
epolanski
18 hours ago
The prize in science is being cited/quoted, not publishing.
Sure, publishing on important papers has its weight, but not as much as getting cited.
PeterStuer
18 hours ago
That might be the "prize", but the "bar" is most certainly publish-or-perish as you work your way up the early academic career ladder. Every conference or workshop attendance needs a paper, regardless of whether you had any breakthrough. And early metrics are most often quantity based (at least 4 accepted journal articles), not citation based.
kjkjadksj
a day ago
I think many with this opinion actually misunderstand. Slop will not save your scientific career. Really it is not about papers but about securing grant funding by writing compelling proposals, and delivering on the research outlined in those proposals.
porcoda
a day ago
Ideally that is true. I do see the volume-over-quality phenomenon with some early career folks who are trying to expand their CVs. It varies by subfield though. While grant metrics tend to dominate career progression, paper metrics still exist. Plus, it’s super common in those proposals to want to have a bunch of your own papers to cite to argue that you are an expert in the area. That can also drive excess paper production.