AI Self-preferencing in Algorithmic Hiring: Empirical Evidence and Insights

315 points, posted 9 hours ago
by laurex

160 Comments

hyperpape

8 hours ago

I'll copy what I wrote on LinkedIn (note: I read roughly 25 pages, which is half the paper, and read it quickly)[0]:

"If I read the paper correctly, they don’t actually show that LLMs prefer resumes they generate.

Their actual method seems to be taking a human written resume, deleting the executive summary, having an LLM rewrite the executive summary based on the rest of the resume and then having another LLM rate the executive summary without the rest of the resume.

That’s likely to massively overstate any real impact, if you can even rely on it capturing a real effect.

I really wonder if I read that correctly, because I can’t come up with a justification for that study design."

[0] I couldn't help but mildly copy-edit before pasting here.

Edit: yes, the authors present a reason for their design, and an ideal version of my comment would've said that. I do not consider it much of a justification. See below: https://news.ycombinator.com/item?id=47987256#47987727.

b112

7 hours ago

Could be an ad for 'use LLMs more'. A generic ad like this helps all in the market, but if you own 30% of LLM market share, it still helps you 30% of the time.

Now that I think of it, every other industry has an 'advocacy group', whether cheese, oil, or nutmeg. So surely there is now some sort of LLM 'consortium', and group funding studies like this just fuels the FOMO. You can be sure such groups exist, and are pummeling every government in the world thusly. But I bet they're also looking here.

After all, it's a circle. Uh-oh! HR is using LLMs, you'd better too potential employee! Then later? Uh-oh! The best employees you can hire are using LLMs, you'd better too HR!

They already FOMOed us into basically everything else, why not LLMs too?

charliebwrites

8 hours ago

Anecdata, sample size of one:

When I was looking for my next role after being laid off, I didn’t get much of a response with my human handmade resume despite my experience

Just for kicks, I asked ChatGPT to “Analyze my resume and give it a score for what percentage it was in” then I asked it to revise it to make it score as high as possible

I still tweaked and fact checked it but after I started sending that out, I got a much higher hit rate than before

But who knows, maybe the market changed, was a better time of year, etc

I still had to pass interviews and prove my worth. But it probably helped me get my foot in the door

leonidasv

8 hours ago

Same thing happened to my wife as well. I helped her tailor her LinkedIn profile and resume with a lot of attention to detail: adding metrics, keywords, results, etc. Nevertheless, she never received any outreach from recruiters and got very few application responses. It went like that for months, almost a year.

Then she asked ChatGPT 5.x for help. I was skeptical about the changes it recommended (and skeptical in general about using AI for this, given the homogenization it tends to produce). But somehow it worked: a few days later, a recruiter reached out, then another, then applications started moving forward, etc.

My guess is that, as LLMs are shoveled into every phase of the recruiting process, not having an LLM write your resume for you is now playing on hard mode. The LLMs reviewing resumes are downranking resumes and profiles that are not "speaking" the same language and activating the correct neurons, thus preventing you from moving forward. This contrasts with years ago when we had more humans in the loop and the pasteurised writing of GPT 3.5/4o would make you look less worthy. Again, just a theory, but...

andsoitis

7 hours ago

> I helped her tailor her LinkedIn profile and resume with a lot of attention to detail: adding metrics, keywords, results, etc.

FWIW, when I see a resume with metrics and keywords, I immediately filter it out.

schrodinger

7 hours ago

Same.

If it's something like "Refactored the apartment list service improving P99 Latency from 2s to 180ms", it definitely boosts the resumé in my mind. A good engineer would be measuring their impact and likely have numbers like that off the top of their head.

But if it's like "Increased revenue by $18.7M by reducing time-to-first-interaction latency from 2.3s to 117ms, increasing conversion by 47% and LTV by 28%," with the same fidelity on each bullet, I'm very skeptical.

--

I don't summarily reject AI-written resumés to be clear, as honestly, it's basically a necessity at this point to be competitive with others; it'd be putting yourself at a severe disadvantage on pure principles in a way that has no real positive net effect on society. Even if you disagree with AI resumé screeners, you're only hurting yourself — especially at a time that has the largest impact on your compensation (i.e. negotiating salary at job start is one of the most valuable ways to spend your time since it will pay you back every paycheck).

Though I _do_ tend to question resumés that look like they were written almost entirely by an LLM without the candidate providing significant context and refinement.

nerdsniper

6 hours ago

I wish it was at least normalized to submit two resumes - one for AI and one for humans. Threading the needle to please both audiences is such a crap-shoot.

8note

5 hours ago

I'm kinda thinking about adding an LLM resume to my resume as, like, tiny clear text somewhere in the corner.

jwolfe

6 hours ago

> If it's something like "Refactored the apartment list service improving P99 Latency from 2s to 180ms", it definitely boosts the resumé in my mind. A good engineer would be measuring their impact and likely have numbers like that off the top of their head.

> But if it's like "Increased revenue by $18.7M by reducing time-to-first-interaction latency from 2.3s to 117ms, increasing conversion by 47% and LTV by 28%," with the same fidelity on each bullet, I'm very skeptical.

Do you mind explaining why? The former doesn't indicate caring about business impact whatsoever (is this service in the critical path of any online process? Who knows!) while the latter does.

majormajor

4 hours ago

A couple issues I have with this in particular:

> "Increased revenue by $18.7M by reducing time-to-first-interaction latency from 2.3s to 117ms, increasing conversion by 47% and LTV by 28%,"

The first is that they're playing fast and loose with their numbers. Latency has before/after, conversion and LTV have percentages; revenue is just a single number. Did that double revenue? Or is that half a percent, and is it lost in the statistical noise?

The other is that there's nothing there to convince me that the technical work was the full cause, instead of, say, a new marketing promotion that launched at the same time, or another team redesigning the landing page flow, or another team re-doing all the product photography, or any other concurrent work.

Maybe all those questions have good answers, but I would at least want some nod in there to how they validated it. I find people who focus on "business impact" but don't know how to do the math to have confidence in it dangerous, because it's so easy to cherry-pick numbers that will make execs happy at a glance and prioritize for those things instead of actual long-term system or product or customer-facing improvements.

I'm not binning the resume for it, and maybe it helps get past the people who see it before I do, but I'm gonna dig in on it. And I'm usually disappointed by the answers.
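To make the first point concrete: the same $18.7M reads very differently depending on the baseline it is measured against. A minimal sketch, where both baseline revenue figures are invented purely for illustration (the quoted bullet gives none) and `relative_lift` is a hypothetical helper:

```python
def relative_lift(increase: float, baseline: float) -> float:
    """Return the revenue increase as a fraction of baseline revenue."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return increase / baseline


INCREASE = 18.7e6  # the single revenue number from the quoted bullet

# Hypothetical baselines: the bullet does not say which world we are in.
for baseline in (18.7e6, 3.74e9):
    lift = relative_lift(INCREASE, baseline)
    print(f"baseline ${baseline:,.0f}: lift {lift:.1%}")
```

Against the first made-up baseline the number means revenue doubled; against the second it is half a percent, which could easily sit inside normal variation.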

dolebirchwood

6 hours ago

Because the latter's "business impact" is clearly made-up bull shit?

mikeyouse

7 hours ago

Which is a very “HN” sentiment when the vast majority of recruiters and hiring managers are absolutely not doing the same. Especially for roles outside of tech.

andsoitis

7 hours ago

Yeah I don’t know what others are doing, but I work in the valley and those elements signal checklist mentality. To wit, those keyword lists often include, in my experience, proficiency in specific tool use, rather than communicating skills that transcend tools, which tells me the person is likely not very dynamic or creative.

Izkata

7 hours ago

> those keyword lists often include, in my experience, proficiency in specific tool use

This used to be called "buzzword bingo" and was pretty much required. It was how you got past the initial automated filtering step before a human even saw your resume.

andsoitis

7 hours ago

I don’t know whether it was ever effective strategy for candidates, but I will simply say that as a hiring manager for over 12 years, I have never been interested in anyone’s resume when I see that.

pqtyw

4 hours ago

The problem is that the candidate doesn't know. It's not even a good proxy either way, just like everything on the resume besides the list of companies the person worked at.

Most applicants have no idea about your internal HR procedures or what the pipeline is before the resume even gets to you, so they might as well optimize for what generally seems the most "successful" approach. Maybe they actually think writing metrics and keywords is a good idea, maybe they think it's stupid and resent it but can't get any interviews without it. It's really impossible to tell without other variables.

schrodinger

7 hours ago

As someone who's been a hiring manager for around 7 years, I agree with you, but note that the people who screen resumés before they even _get to you_ very well may be looking for those references.

For my own resumé, I include the stack used at each job which I feel strikes a fair balance.

mcv

6 hours ago

That's what I always did too. Then I removed it because I wanted to focus more on the kind of problems I solve rather than the languages I've worked in, and recruiters complained, so I put it back in.

Melatonic

6 hours ago

Most HR departments have been filtering resumes (or LinkedIn) based on things like keywords for years before they got to you. So your reaction to resumes that heavily use those may be a reaction to being presented with tons of them (by whoever filtered them before you).

3eb7988a1663

4 hours ago

I once had an application rejected less than five minutes after submission. Hard to imagine that was anything but lacking a certain threshold keyword.

rustystump

6 hours ago

Not "used to be". It still is standard. Large companies that do not use external recruiters still use keyword and skills matching to find candidates, and it drives me nuts.

mcv

6 hours ago

I rewrote my resume in a way that sounds like exactly what you want: focus on skills that transcend tools instead of just the tools, and every recruiter asks me about tools.

ponector

6 hours ago

And then during the screening call they ask questions like this: how many years of experience do you have with Jira?

david-gpu

7 hours ago

Please, do show us your resume so that we can judge the heck out of you as well. It is a fun game, apparently.

andsoitis

7 hours ago

I’m giving advice that I judge to be useful.

RajT88

7 hours ago

Same. I am well aware how the metrics game goes - even inside the company it can be hard to disprove the metrics claimed, and people count on that. Even managers coach you on putting metrics you cannot prove or disprove.

hiAndrewQuinn

7 hours ago

What counts as a keyword here? If you're hiring for a frontend developer and you see e.g. "Redux" do you just can it?

andsoitis

7 hours ago

Knowing or having experience with Redux isn’t going to cause me to pick you over someone else who doesn’t list it for a job where I’m paying you hundreds of thousands of dollars. I look at other skills.

I would not can it in isolation, but if I see a comma-separated list like: “proficient in redux, react, html, JavaScript, sql, kubernetes, word and excel”… then yes, you don’t make the cut.

Or if you list your Microsoft qualifications or your MIT continuing education courses. These are all negative signals.

coder68

6 hours ago

What would be an example of a positive signal?

mcv

6 hours ago

Unfortunately many recruiters do look at that. I'm always a bit disappointed when someone wants me to rate my Java experience, or complains that my CV doesn't mention REST experience.

tkiolp4

7 hours ago

Metrics: I increased retention 2x; I reduced latency from X ms to Y ms; increased slo to 99.999… those are all meaningless. It was in fashion to put such numbers in cvs maybe 5-10 years ago. Not anymore

andsoitis

7 hours ago

They were always lies because they’re imprecise. “I” didn’t do any of those things, you did other things together with other people leveraging company infrastructure to accomplish those things. Tell me about the SKILLS you excel in that make those things happen.

Melatonic

6 hours ago

Why would you not want a general idea of what specific technology someone is familiar with? Someone could be an "infrastructure engineer" and be more proficient in some specific tools than others. Don't you want to match that to the job you're hiring for?

mcv

6 hours ago

In my case it's not a lie: I reduced the time for a complex import process from 1 hour to 3 minutes, a 20-fold improvement. I included it in my CV, but now I wonder if I should take it out.

reillyse

7 hours ago

You must be a pre 5.0 model so.

ai_slop_hater

6 hours ago

Gigachad. Just don’t forget to signal somehow that you aren’t like everyone else, so that legitimate candidates can send their real resume instead of AI generated one.

j45

7 hours ago

Having implemented more than a few applicant tracking systems, too many are so anchored in the past, that they would probably try to boil the ocean at once by letting AI loose on it, leaving an ability for ai resumes to ai applicant tracking systems.

The key insight here is humans are responsible for improved articulation to the ai, who in turn will improve the rest, and that can be as detailed and informative, and educational as the human likes.

spike021

7 hours ago

I was recently job hunting and did something similar. Had it check my bullets and see if they "read well" and it suggested many many tweaks. I tried a few. I'm not sure how much more it helped the applications though.

Esophagus4

8 hours ago

There are services that will do this as well - I’ve used them both on my LinkedIn and resume with decent success.

fuzzy_biscuit

8 hours ago

I've done as you described and then edited it down to sound human again.

luotuoshangdui

4 hours ago

It makes sense. An LLM can definitely help polish your resume.

amelius

8 hours ago

I suppose the HR folks gave you a "+1 knows how to use AI".

grey-area

8 hours ago

It seems more likely the HR people depend on LLMs to do the job of screening and LLMs unsurprisingly prefer LLM output and rank it highly.

It’s not lazy incompetence, it’s quietly getting the job done with 1% of the effort (that was a sarcastic pastiche, in case anyone was unsure).

zdragnar

7 hours ago

It's not uncommon to get hundreds or thousands of applications per opening for web tech, if the position is advertised on LinkedIn or a similar job board.

They'd need to use some automation, even if it is just picking ten at random.

normie3000

7 hours ago

Maybe? I've filtered 300-400 CVs by hand before, and didn't find it particularly time consuming to bin the ones which clearly didn't meet requirements or have any redeeming features. And hiring was not my full-time role.

anthuswilliams

6 hours ago

At 90 seconds per resume, that would take up a full 8 hour day. Having gone through this myself, I don't think it's possible to do this much faster than that, even if you have an ATS that optimizes for that workflow.

I often found myself falling into patterns of poor judgement, e.g. mentally filtering out resumes based on the layout because, to my tired and bored mind, they looked similar to the resumes I had seen from unqualified candidates. I actually think some automation is helpful in evaluating them more rigorously.
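The back-of-envelope estimate above (a few hundred CVs at 90 seconds each) can be checked with a short sketch. The 300 and 400 counts come from the parent comment; `screening_hours` is a hypothetical helper name:

```python
def screening_hours(num_resumes: int, seconds_each: float = 90.0) -> float:
    """Total hours needed to read num_resumes at a fixed time per resume."""
    return num_resumes * seconds_each / 3600


# 300 to 400 CVs is the batch size mentioned in the parent comment.
for n in (300, 400):
    print(f"{n} resumes at 90 s each: {screening_hours(n)} hours")
```

That gives 7.5 hours for 300 resumes and 10 hours for 400, which is where the "full 8 hour day" figure comes from.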

cyberax

6 hours ago

The last time I posted on HN in the first-of-the-month hiring post, I got around 2 thousand resumes. Pretty much all of them were this kind of "Increased the performance of the service by 23.123213%" collection of bullet points.

PS: I replied to most of them, I think, but I'm sorry if I missed somebody :(

ben_w

8 hours ago

Some will, others openly say on the job ad they will fail you for using AI.

izacus

8 hours ago

And then still use a CV scanning service that rejects non-AI resumes.

dawnerd

8 hours ago

I know if I got a resume from someone that had obviously used AI to generate it, it would be a pass.

drillsteps5

7 hours ago

Before the resume ends up in the hiring manager's inbox it needs to be picked by the recruiter from literally hundreds of others. The recruiter uses HR software to determine the match (usually the percentage), and then picks top 5% or top 20 or whatever highest ranked resumes.

Guess what's doing the ranking.

dawnerd

2 hours ago

That’s assuming everyone is doing it that way.

bell-cot

8 hours ago

What if your own HR's LLM didn't send you any other kind?

p_stuart82

6 hours ago

that's the loop though. if GPT does the screening, people learn to write for GPT. once that loop exists, why would the company selling the filter want it gone?

davebren

7 hours ago

Probably gonna get downvoted for this, but when you give an anecdote you don't have to preface it with "anecdata, n=1 sample size".

We know it's from your individual experience because it's a story about your individual experience. We've been doing this for all of human history. This is some kind of strange milieu of trying to always sound scientific, or it's fear of the "well akshually I'm gonna need to see a random placebo controlled trial", which is equally annoying.

Fezzik

7 hours ago

It became necessary because, for years (decades), if you made a comment online that your personal experience informed you in such-and-such a way, the first comment would always be some moronic comment dismissing that personal experience because it is just one person’s experience. So, to avoid that idiocy, people started to preface their anecdotes by acknowledging that they know it is an anecdote. It sets the tone for the conversation.

davebren

6 hours ago

Yeah but we can't let the insufferable dictate our way of speaking. In spoken language I hear it mainly by people that don't have a scientific background trying to sound more scientific.

peyton

7 hours ago

I’ve been told explicitly to do what GP said, so it’s perhaps becoming word-of-mouth career advice at this point. In my case it told a different career story that is maybe more easily digestible.

tayo42

7 hours ago

LLMs were good for being objective and helping cut out stuff from mine. Harder to do when you personally think everything you ever did is important.

davebren

7 hours ago

It actually is important and if I was hiring you I'd find it useful to get a more comprehensive understanding of your experience, especially if there's something I'm aware is a very challenging problem to solve. And it would provide more things to cross-examine in interviews to make sure it's not fake. The idea that people hiring are saving time by not reading an extra resume page when deciding on someone that will hopefully work there for years is ridiculous.

For some reason that's the minority opinion because everything has to be dumbed down now.

And how is a resume with the most important or recent work highlighted and at the top worse than a resume with that plus the rest of your experience after it?

nickserv

7 hours ago

Same. I've gotten into disagreements with HR over this when hiring for my teams.

tayo42

6 hours ago

If I could give two resumes, one for the recruiter and one for the hiring manager, I would.

But as an applicant, I'm dealing with recruiters who think Java and Javascript are basically the same.

benashford

8 hours ago

Intuitively this feels obvious. Content generated by the model will be shaped by its training, therefore when reading it back it will resonate with that same training and have a positive view as a result.

Human when preparing a CV: "Make my CV more professional"

LLM many days later presenting a report to HR: "This CV is really professional"

There's probably more to it than that of course.

But it justifies my personal policy of using a different LLM family for code review tasks than for code generation tasks. To avoid the "marking your own homework" problem.

bendergarcia

8 hours ago

We are without our consent introducing a party in between people. The models become the arbiters of who does and does not get a job. It feels problematic.

justonceokay

7 hours ago

There will be a great arbitrage for people who do not use LLMs.

If your HR department is using ChatGPT to filter resumes, you’ll end up with people who used ChatGPT to generate resumes. I don’t want to make a “slippery slope“ argument, but my gut feeling is that the quality of your organization will deteriorate quickly.

On the other hand, I am a handyman/subcontractor. Almost all of my work comes through phone calls, texts, and one-off emails. I only work with people who are recommended by trusted sources. I haven’t handled a traditional resume (mine or other people’s) in over eight years.

If I started interacting with somebody and they seemed like they were a computer, that would be the fastest way for me to know I should move on to another client. If they can’t take the time to interact with me, how am I supposed to perform hundreds of hours of physical labor for them?

bendergarcia

8 hours ago

And I feel the common response is: well, just use the model that’s available. AI is and will probably always be resource constrained and profit driven. That means we will eventually see a world where poor people have worse resumes than rich people, and there really won’t be any way around it because the man in the middle has the final say.

adrianN

7 hours ago

Not too long ago I bet resumes that were printed from a computer were preferred to resumes typed on a typewriter. What happened was that computers became commodities. It is reasonable to assume that LLMs will become commodified too.

YurgenJurgensen

7 hours ago

That would hardly be surprising. Monospaced fonts make natural language a pain to read, so what that would prove is that well-presented resumes are preferred to poorly-presented ones.

This case is different, as the LLM output isn’t measurably better than the human output (unless you have a particular love of bland corpo-speak).

Nuzzerino

7 hours ago

This is a terrible way to soften an obvious alignment failure with AI rollout.

falcor84

7 hours ago

The ship sailed as soon as hiring managers stopped reading CVs directly and we got recruiters as a profession.

ekianjo

8 hours ago

before it used to be HR, so you always had a party in between "actual" people. HR (mostly) never cared about the CV, they just look at a checklist and see if it matches.

sneak

8 hours ago

We already did that when we all created LinkedIn accounts.

sxg

8 hours ago

Take a look at how things worked before (and still do): employers decide who gets jobs based on a combination of personal biases, nepotism, and ulterior motives while applicants present distorted versions of themselves and network/pull strings to put the odds in their favor. That seems more problematic.

1attice

7 hours ago

You would be surprised at the process in other industries. What you are describing is the tech job market specifically.

Other fields have their own problems, including credentialism and ballooning concomitant student loans, but do, by strict convention, not hire based on vibes or pulled strings. Often to their partial detriment, as the cure -- ie, strict oversight of hiring that also forces the hiring manager to ignore important implicit signals -- is alive and well in medicine, law, civil engineering, education, and the trades. Notable exceptions include entertainment, sales, real estate, and software engineering.

By optimizing for vibes, the tech industry gains "Spidey senses" in the hiring loop but pays for it in impartiality.

IMO this precipitated the DEI movement's advent, as it was seen as a way of remediating the drawbacks while preserving the information channel.

Without it, expect homophily and, eventually, a harsh and remedial credentialism.

sxg

3 hours ago

I'm a physician and have recently been on both sides of the hiring process for new physicians and residents at a few different institutions. It's absolutely not meritocratic--you'd be shocked at how strong a role connections and pedigree play. The hard requirements are just table stakes, but the selection process from there is completely subjective and susceptible to all kinds of problematic biases. Generally people don't want to rock the boat and discuss this stuff openly, but it's absolutely a problem that needs to be pointed out.

rogermarley

8 hours ago

I think resumes will eventually (or have already) become obsolete in tech. The SNR is so low, they offer very thin filtering value.

Even taking the tiny bits of the resume that are "hard signal", like GPA, certifications, prior roles, etc, it doesn't translate into their performance in the initial screening interview.

This is why I think what the industry sorely needs is examination consortia.

Rather than trying to guess capability from the name of the university someone went to, leading tech companies would create standardized tests in various fields, and your test scores would form your "resume", so that developers can just focus on improving their scores rather than wasting time on resume/application/repetitive-screening toil.

indiv0

8 hours ago

Eventually even a system like that can be gamed, similarly to how Leetcode-maxxing and the like sprang up in response to typical SV interview questions. Studying for the job becomes studying for the test becomes studying for the pre-test test.

qwytw

4 hours ago

Maybe just a lottery instead? Would be approximately as useful just way simpler.

Also don't all of the "enterprise" certificates already provide all that, anyway?

aDyslecticCrow

7 hours ago

> standardized tests in various fields

This is itself a massively difficult problem. Standardised tests are a bad indicator of topic understanding (setting aside the massive incentive for blatant cheating).

You're effectively advocating for leetcode being an effective hiring tool, which many would highly criticize.

cyberax

6 hours ago

It's hard to design tests for CS. Leetcode is too simplistic, it just tests the basic algorithmic knowledge that is nearly useless for regular software development.

AlexB138

8 hours ago

This may lead to some interesting gamesmanship. For instance, if I am applying to a company, and I know they use a certain applicant tracking system, and I know that ATS uses a certain model provider for its filter, I should then use that model to write the version of my resume I send to the company.

mft_

7 hours ago

Good observation. There are so many versions of the future that just become an LLM arms race.

ivansmf

7 hours ago

I suspect the entire industry uses "auto-raters", where an agent instance is used to score the agent's output. The idea is similar in intent to using adversarial networks to train image generation, minus the human labelers. Raising the scores of the auto-rater then becomes the metric teams optimize, and it is no wonder the end result is that the agent scores its own generated content the highest.

danielodievich

5 hours ago

So just to test, loaded qwen/qwen3-v1-30b locally, and fed my 100% human-written resume and asked it "Make this resume more professional".

Mucho bullets came out.

My sentence "I specialized in enterprise data modeling and worked on Cost of Goods Sold optimizations across entire customer base." became a bullet sentence "Specialized in enterprise data modeling and performance optimization, driving $5M+ in recurring cost savings across the customer base.".

The $5M+ sure sounds awesome, and clearly the corpus of resumes leans towards metrics, but it's not true and I didn't ask the model to make up numbers.

Oh and it awarded me a "Bachelor of Science in Computer Science from University of California, Berkeley | 1996 – 1998" out of thin air. My resume has an SDE job between 1996 and 1998. Oh man.

voncheese

4 hours ago

Oh man is right! The making stuff up is going to make this problem even bigger.

There will be people who correct those hallucinations; in that scenario it’s “only” the applicant's time that is wasted.

There will be other people that don’t correct those hallucinations, in that scenario the best case outcome is wasted time for the applicants and interviewers (who find the mistake later). The worst case scenario is people are hired who aren’t capable of doing the job and that’s all kinds of messy and inefficient for all.

mcv

6 hours ago

Timely topic for me. My CV had grown to 7 pages, and I kept reading everywhere that it should be no more than 2, so I asked Gemini to rewrite it. Took a lot of time, because Gemini loves to exaggerate everything, but I'm quite happy with the result.

The first couple of recruiters I sent it to preferred my old 7 page CV. I guess they're not using enough AI yet.

onlyrealcuzzo

6 hours ago

Further, LLMs consistently think LLM written content is "good".

Ask an LLM to write some design doc for you, wait until you get one that's very bad, send it to other LLMs and get their feedback, they will typically have good things to say.

Compare that to a very well written document you have. They will typically have a lot more bad things to say, even if the premise is solid.

Someone should study this.

LLMs clearly have a lot of value. But IMO this is very interesting and points out a weakness that's not entirely clear what the full ramifications of it are.

I suspect LLMs also have a major bias to code they write.

Take something universally considered to be well written like Redis, feed it to an LLM for feedback. They'll probably find much to pick apart (and a lot of it may be flat out wrong).

Feed the same LLM some clearly garbage LLM repository. Do they have a similar response as they do with design? Do they treat language different than code, and they're just susceptible to the way they write regular language that's different from logical code? Or do they have the same problem?

Has anyone done this?
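One way to set up the comparison described above is to hold the judge fixed and compare its mean score on LLM-written documents against human-written ones. This is only a sketch of the harness, not a result: `toy_judge` is a deliberately crude stand-in for a real model call, the two sample documents are invented, and `score_gap` is a hypothetical helper name.

```python
from statistics import mean
from typing import Callable, Sequence


def score_gap(
    judge: Callable[[str], float],
    llm_docs: Sequence[str],
    human_docs: Sequence[str],
) -> float:
    """Mean judge score on LLM-written docs minus mean score on human-written docs."""
    return mean(judge(d) for d in llm_docs) - mean(judge(d) for d in human_docs)


def toy_judge(text: str) -> float:
    """Placeholder for an LLM reviewer (assumption): it just counts buzzwords."""
    buzzwords = ("leverage", "robust", "scalable")
    lowered = text.lower()
    return float(sum(lowered.count(word) for word in buzzwords))


# Invented sample documents, one per group, purely to exercise the harness.
llm_docs = ["We leverage a robust, scalable pipeline."]
human_docs = ["The cache is wrong when two writers race."]

print(score_gap(toy_judge, llm_docs, human_docs))
```

A positive gap only suggests self-preference if document quality is controlled separately, for example by having humans rate the same documents blind to their origin.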

drillsteps5

8 hours ago

That's what people on both sides have been doing for at least a couple of years already.

Recruiters scan resumes for the best match with LLMs, candidates use the same LLMs (there's only like 3 of them) to tweak their resume for better match. I don't know what research you need to see why that makes sense.

yagi0x00

7 hours ago

This indicates that resumes created by the same model may have an advantage over those created by other models, so I suppose technically you may have a small advantage if an insider tells you the resume parsing tool is powered by Gemini as opposed to the other models.

My broader discomfort is that we are still learning about model biases while human biases are arguably better understood, and I don't like the ethics of rejecting a person based on criteria I don't fully understand.

drillsteps5

6 hours ago

I wasn't saying that this is the optimal solution (it clearly is not). I was saying that it makes perfect sense for both sides - HR has their work automated and candidates have a better chance of being noticed - and has therefore become a common practice in many places.

The well has been already poisoned, to survive you have to get in on the action.

Don't want to play this game? Make connections, set up the network, and use it to get/stay employed.

aDyslecticCrow

7 hours ago

It further makes expecting, or spending the effort on, a properly hand-written introduction useless, which then undermines the entire purpose of it.

visarga

8 hours ago

When classifying resumes, it is better to use the LLM as a feature extractor: think of 10-20 features you base your decision on, and extract them with the LLM. The LLM only needs to do the lower-level task of question answering. Then you fit a classical ML model (xgboost, for example) on the extracted features, using the company's triage data points as labels. This way you don't rely on the biases in the model, and you can decide what criteria to use and how to judge cases without retraining the LLM. The feature extractor is generic, and the actual triage model is a toy you can retrain in seconds on new data points. It is also much more explainable: you can see how features influence decisions.
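
A minimal sketch of the two-stage setup described above, not a production pipeline. Everything in it is an assumption for illustration: `extract_features` is a hypothetical stand-in for the LLM question-answering step (plain keyword checks, so the example runs offline), the three features and the labelled triage history are made up, and a small hand-rolled logistic regression is swapped in for xgboost to keep the example dependency-free.

```python
import math

# Hypothetical yes/no questions the LLM would answer about each resume.
FEATURES = ["has_python", "has_degree", "has_management"]

def extract_features(resume_text):
    """Stand-in for the LLM extractor: one 0/1 answer per question.

    A real pipeline would prompt an LLM here; keyword checks are used
    only so this sketch runs without any model or network access.
    """
    text = resume_text.lower()
    return [
        1.0 if "python" in text else 0.0,
        1.0 if "bsc" in text or "msc" in text else 0.0,
        1.0 if "managed" in text else 0.0,
    ]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, epochs=2000, lr=0.5):
    """Fit weights and bias by plain batch gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict(weights, bias, x):
    """Probability that a resume with features x advances."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Made-up triage history: (resume text, 1 = advanced to interview).
history = [
    ("Python developer, MSc, managed a team of 4", 1),
    ("Python developer, BSc", 1),
    ("Java developer, BSc, managed releases", 0),
    ("Designer, no degree listed", 0),
    ("Python scripting, self-taught", 1),
    ("Accountant, MSc, managed budgets", 0),
]
rows = [extract_features(text) for text, _ in history]
labels = [label for _, label in history]
weights, bias = fit_logistic(rows, labels)

# The learned weights show how each feature moves the decision.
for name, w in zip(FEATURES, weights):
    print(f"{name}: {w:+.2f}")
```

Because the triage model only sees the extracted 0/1 answers, the wording of the resume cannot influence the score directly; it can only matter through whatever the extractor reports.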

aDyslecticCrow

7 hours ago

I'd rather my employer just did the classic of shredding a random 80% and looking at the remainder properly.

cyberax

6 hours ago

Ah, the good old "we don't need unlucky losers here" stratagem.

aykutseker

7 hours ago

The uncomfortable part is that this is probably rational behavior for both sides.

Employers use models to filter resumes, candidates optimize resumes for those models, and suddenly the resume is no longer written for a human at all.

logicalfails

8 hours ago

I suspect this is more a function of the corporate sanitization of language within the models. When I have passed my resume through the models for refinement, they often sanitize some of the more easygoing or simpler wording. They expand the vocabulary, make it more dense, and use more corpo speak in the bullets and formatting.

Each model likely has its own biases in terms of what constitutes correct corporate speak, and it chooses the resumes that best fit this. Ultimately, I suspect it's more a function of the model saying "this grammar, syntax structure, and formatting is most aligned with what is correct corporate language, so flag as high quality".

ilia-a

8 hours ago

Seems kinda obvious, given that most large recruiting firms/HR use algos to analyze resumes and AI-written versions likely do a better job at hitting the keywords/structure that algos/LLMs pick up on...

embedding-shape

8 hours ago

You'll find the same is true if you have two different LLMs first independently come up with a plan for an implementation, then ask each one of them to say which of the two designs/plans is the best. They're much more likely to favor the plans generated by the same model than those from other models. I'm sure, internally, this somehow makes sense, but it's worth thinking about if you're doing the whole "ask N models for voting/rating N plans to find the best" charade.
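
One way to see how much of an N-model vote is self-preference is to tally it twice, with and without each judge's vote for its own plan. A rough sketch follows; the model names and the votes are made up, and no real model is called.

```python
from collections import Counter

def tally(votes, exclude_self=False):
    """Count votes per plan author.

    votes maps each judge model to the author of the plan it picked.
    With exclude_self=True, a judge's vote for its own plan is dropped.
    """
    counts = Counter()
    for judge, picked in votes.items():
        if exclude_self and judge == picked:
            continue
        counts[picked] += 1
    return counts

def self_vote_rate(votes):
    """Fraction of judges that picked their own plan."""
    return sum(1 for judge, picked in votes.items() if judge == picked) / len(votes)

# Made-up outcome of one round: judge -> author of the chosen plan.
votes = {"model_a": "model_a", "model_b": "model_b", "model_c": "model_a"}

print(self_vote_rate(votes))            # share of self-votes
print(tally(votes))                     # raw tally
print(tally(votes, exclude_self=True))  # tally with self-votes removed
```

If the judges were indifferent to authorship and the plans were of similar quality, the self-vote rate over many rounds would sit near 1/N; a rate well above that suggests the tally is partly measuring who wrote the plan.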

SeriousM

8 hours ago

That's why I let the LM write its own AGENT.md or SAFESPOT.md because it "knows" best how to write it so it can resume next time without issues.

It hits the same spot as the way I would take different notes than anyone else, and no one could follow them as easily as I do. Everyone leaves the "of course" parts out of the notes if they're for their own use.

mpurbo

8 hours ago

At this point, all these are becoming almost like comedy.

sb057

8 hours ago

Well yeah, LLMs generate resumes (and other text) that they judge as superior to alternative plausible texts. Why would that judgement change just because a different instance hasn't seen it before? To anthropomorphize it, it's like having a hiring manager write a resume, get amnesia, and then have to judge it among other resumes.

Ekaros

8 hours ago

Seems like an obvious thing. If an LLM has some weights involved in what makes a good resume to write, there is very likely a correlation with what makes a good resume to rate. And this is probably even a good thing, at least from a model quality perspective. The model itself should rate highly whatever it produces. There should be a correlation between output and review of the same output.

bendergarcia

8 hours ago

I wouldn’t put it past these tech companies to prefer ai outputs to encourage ai inputs

jamiecurle

8 hours ago

disclaimer: Not a lawyer, but studying towards CIPP/E.

You'd make no friends doing it, but as I understand it, for those that have GDPR as a statutory right then under "[Article 22 - Automated individual decision-making, including profiling][0]" you can request to know if your CV was screened by AI and what (and this is key) "meaningful human interaction" led to that decision. Technically this falls under a data subject access request and so a response is mandatory (but who really is going to enforce that - ICO / <insert your data protection agency here> probably isn't). Companies can't just smash a button and claim meaningful interaction, it has to be, well, meaningful and smashing a "nope" button obviously isn't meaningful.

If it turns out that it was only AI that screened it you can request a human review. Do not hold your breath.

Again, you'd make no friends doing it, but sooner or later a test case will emerge to generate some case law around "AI said no" because employment, or lack of because AI says no, does have significant impact on a human.

[0]: https://gdpr.algolia.com/gdpr-article-22

noprocrasted

8 hours ago

The issue is that indeed, nobody is going to enforce that.

ryeguy_24

8 hours ago

Does anyone know of any HR departments actually using LLMs for scoring, selection, extraction, classification or any real use cases? I'm curious to hear about it and how they are using it.

oogetyboogety

7 hours ago

We were told by HR that NY has strict state laws against this

jimnotgym

8 hours ago

I just guessed that and got Copilot to rewrite my profile on the internal HR system. I also got a job spec benchmarked higher by getting Copilot to write it with that exact aim given in the prompt

fecalmatter

8 hours ago

i straight up lied about my work experience

we are exactly the same

einpoklum

8 hours ago

> As artificial intelligence (AI) tools become widely adopted, large language models (LLMs) are increasingly involved ... [in] ... decision-making processes

That's the problem right there.

bendergarcia

8 hours ago

Absolutely! I don’t think people are really considering the full effects of just letting ai be the middle man. I mean Sam Altman basically said this is what he wants when he said intelligence is a commodity, no?

analog8374

7 hours ago

This means that LLM human resource departments will only hire LLMs. Which is kind of beautiful.

booleandilemma

7 hours ago

HR departments aren't using LLMs to select candidates for jobs are they?

bjourne

8 hours ago

The only test that has worked 100% of the time for me is to read the candidate's code. Two hours is enough to precisely estimate the candidate's qualities as a software developer. I never understood why companies waste time with tests and quizzes because, since it is so easy for me, it should be just as easy for other software developers too. Of course, a candidate may be a jerk or unfit for other reasons, but ranking them on a software developer hot-or-not scale is not very difficult.

noprocrasted

8 hours ago

Just like they'll send you an LLM'd resume, they will send you LLM'd code.

bjourne

7 hours ago

Conceptually no different from copy-pasting someone else's code.

parentheses

8 hours ago

Reading only the abstract: LLMs prefer output of their own generation over humans or even other models.

This is a very good reason to avoid using model-generated data to train future models. We'd be deepening this bias by continuing to do that, essentially forcing society to reshape their output using LLMs to increase engagement. This feels like a form of enshittification that doesn't just touch one product but all of society.

jonahs197

8 hours ago

Will people snap over this?

oytis

4 hours ago

That makes sense to me. "Write me a good CV" and "find me CVs in this pile" are kind of two queries for the same underlying data.

bdangubic

7 hours ago

My new CV contains 37 emdashes

Der_Einzige

8 hours ago

This is extremely obvious to anyone who's read other papers. There are tons of papers showing LLMs prefer their own outputs. It's a big enough problem that LLM-as-judge has to be a different LLM from the LLM you are testing in papers.

jqpabc123

8 hours ago

Repeat after me --- it makes no sense to try and prompt a language prediction engine to display good judgment.

interstice

7 hours ago

"I'm not just good, I'm amazing"

randomdrake

8 hours ago

I wonder if this extends to training models on new content as well. Are we creating a cyclical information-consumption and training situation in which models being trained are more likely to pick up on and reference content created by themselves or by other LLMs than by other humans?

johndhi

8 hours ago

Another way to phrase this might be that LLMs make better resumes no?

budoso

8 hours ago

If that were the case they would select the ones generated by other models at a similar rate to the ones they generated themselves.

delecti

8 hours ago

You'd have to define "better".

All this shows is that LLMs generate resumes that fit the heuristics LLMs use to judge resumes. And that makes sense, but isn't necessarily a given.

mrktf

8 hours ago

Or in other words: the LLM is optimizing a function that was generated by the same LLM. Think of a random variable y whose generator is sin(x+r), while your optimizer tries to fit the function sin(x+unknown1) + unknown2 (the "unknown" function). It is obvious that it will find the best fit.
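
The analogy above can be run as a toy experiment. In this sketch the phase value, the noise level, the sample points, and the grid-search fitting method are all assumptions chosen for illustration; the only point is that a fitter drawn from the same function family as the generator recovers it almost exactly.

```python
import math
import random

def generate(xs, phase, noise, rng):
    """Generator: y = sin(x + phase) plus small Gaussian noise."""
    return [math.sin(x + phase) + rng.gauss(0.0, noise) for x in xs]

def fit(xs, ys, steps=200):
    """Fit sin(x + a) + b by grid search over the phase a.

    For each candidate a, the least-squares offset b is the mean residual,
    so only a needs to be searched.
    """
    best = None
    for i in range(steps):
        a = 2 * math.pi * i / steps
        residuals = [y - math.sin(x + a) for x, y in zip(xs, ys)]
        b = sum(residuals) / len(residuals)
        loss = sum((r - b) ** 2 for r in residuals)
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best[1], best[2]

rng = random.Random(0)               # fixed seed so runs are repeatable
xs = [i * 0.1 for i in range(100)]   # x from 0.0 to 9.9
true_phase = 1.0                     # the "r" in sin(x + r)
ys = generate(xs, true_phase, noise=0.05, rng=rng)

a, b = fit(xs, ys)
print(f"true phase {true_phase:.2f}, fitted phase {a:.2f}, fitted offset {b:.2f}")
```

The fitted phase lands close to the true one and the offset close to zero, which is the sense in which a judge that shares the generator's function family will "find the best fit".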

rectang

8 hours ago

By one metric, yes!

If you are a candidate who wants to be hired, and your target employers use LLMs to filter resumes, then an LLM-generated resume that the employer LLM-powered resume filters favor is "better" — as in "more likely to get you the job".

jezzamon

8 hours ago

In text generation, LLM language is full of very emphatic phrases. At a surface level it might sound stronger. But as a human reader, it's not necessarily better

mathgeek

8 hours ago

*for getting past ATS reviews.

Emanation

8 hours ago

Where I work, my boss decided to make an application that uses AI to score long text field entries to ensure required information is present.

The AI lacks the ability to extract nuance and implicit information, which means entries end up being long-winded and repetitive. Each requirement it's looking for must be explicitly expressed. It's quite unnatural, and almost feels like solving a puzzle, to which the obvious solution is to write a comment, then give the failing comment and the AI's feedback on it to an AI, so it can generate the proper structure the rubric-AI is looking for.

LLMs are statistically driven, and I can only imagine having the AI rewrite the comment produces a result that's more statistically fitting to the model than if any given human were to write it. So, it might mean, yeah, LLMs are better at writing resumes that the LLM can successfully classify-- are they better for a human to consume? Who knows.

nottorp

8 hours ago

Easy then. Apply N times, each time with a resume generated by a different LLM.

No human is going to notice anyway. Or add an N+1 resume written by yourself in which you describe your strategy, just in case.

zipy124

8 hours ago

Do you really believe no human is going to read your resume at some point in the process and notice the classic AI tells?

Further, de-duplication is rather easy, and will likely see you blacklisted by competent organisations.
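
As a rough illustration of why de-duplication is cheap, here is a sketch that groups applications by a normalized email address. The record layout and the sample data are made up, and treating a `+tag` suffix as an alias is an assumption that does not hold for every mail provider; real systems would likely combine several signals.

```python
from collections import defaultdict

def normalize_email(email):
    """Lowercase, trim, and drop any '+tag' suffix in the local part."""
    local, _, domain = email.strip().lower().partition("@")
    local = local.split("+", 1)[0]
    return f"{local}@{domain}"

def find_duplicates(applications):
    """Group application ids by normalized email; keep groups larger than one."""
    groups = defaultdict(list)
    for app in applications:
        groups[normalize_email(app["email"])].append(app["id"])
    return {email: ids for email, ids in groups.items() if len(ids) > 1}

# Made-up applications: one person applying with three resume variants.
applications = [
    {"id": 1, "email": "Jane.Doe@example.com"},
    {"id": 2, "email": "jane.doe+v2@example.com"},
    {"id": 3, "email": " jane.doe@example.com "},
    {"id": 4, "email": "sam@example.com"},
]

print(find_duplicates(applications))
```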

stingraycharles

8 hours ago

“Do you really believe no human is going to read your resume at some point in the process and notice the classic AI tells?”

Even here on HN many people don’t recognize AI tells that are obvious. Pretty much 100% of all articles posted on HN have been AI generated for months and months already and people don’t seem to care.

I have very little faith in humanity being able to deal with the chaos that LLMs are going to unleash on society.

Heck, most resumes are probably skimmed at best already.

cl0ckt0wer

8 hours ago

The only resumes that make it past the ai to a human are ai generated

Esophagus4

8 hours ago

When I’m hiring, a human recruiter (or the hiring manager) reads most resumes.

For us, there is some sorting by basic keyword analysis and we start near the top, but there is no proverbial black box that rejects candidates outright.

If candidates are ignored by humans, it’s not because AI rejected them, it’s because we are starting with candidates earlier in the list and might not make it to applicant 537.

zipy124

8 hours ago

Rather unlikely to be the case, and the original article itself supports that: if your statement were true, they would find that the human-generated resume is 100% less likely to be shortlisted.

stingraycharles

8 hours ago

Obviously not 100% of human resumes are going to be filtered out, but it’s quite damning that human resumes are more likely to be filtered out just because they didn’t LLM-ify it.

nottorp

7 hours ago

In organizations where LLMs sort the resumes yes, I believe no human will read my resume until it's too late.

almostdeadguy

8 hours ago

Happy for everyone trying to invent SEO hacking for resumes.

stingraycharles

8 hours ago

You don’t understand the problem.

Companies are using AI / LLMs to pre-filter resumes. These AIs prefer their own slop resumes. Not just human vs LLMs, but Claude prefers Claude resumes over ChatGPT. Nothing good can come out of that, when resumes are pre-filtered like that.

Unless, of course, you’re not being serious and just trying to be edgy on HN.

DiscourseFan

8 hours ago

Why would I want to work for a company where all the employees made slop to get hired by slop to do slop? It’s slop all the way down!

stingraycharles

8 hours ago

Because this is where the industry as a whole is moving towards, and you don’t want to be out of a job I presume.

skeledrew

7 hours ago

Pretty straightforward IMO. The model is looking for particular qualities in a given resume, and strives to ensure the qualities it looks for are present in resumes it creates. Humans do the exact same thing (unless forced by something like DEI, etc to do otherwise), so I see nothing noteworthy here.

idopmstuff

8 hours ago

Even if we take this to be true, I'm not sure that it really matters?

It's comparing two resumes with the same information and picking one of the two. That's obviously a situation that would never occur in actual hiring. This doesn't demonstrate anything at all that indicates that LLMs would incorrectly preference LLM-written resumes in the real world.

It'd be interesting to do the same thing but with two resumes that are almost identical. One is slightly better (an extra year of experience or a specific note of some skill that is relevant to the role), and the other slightly worse one is written by an LLM. If the reviewing LLM picks the worse one in that case, you're potentially establishing a bias that would matter. As it stands this experiment just seems contrived and pointless.

neom

7 hours ago

I did a very hack-job version of this as your question got me curious, but again, not at all a rigorous test. I took my resume and had an LLM rewrite the exec summary, then changed the names of the businesses to comparable ones and gave one a couple more years of experience, then prompted: "2 Candidates that are very similar on paper, If you had to pick just based on these 2 resumes for a GM & Marketing of a series A robotics startup, who would you pick? Just reply with the name."

It picked the one with the LLM exec summary and less experience, then I said: "Interesting, less experience, why did you pick him?"

It said:

"Honestly? Looking back at the two, I shouldn't have answered that confidently. They're nearly identical on paper — same career arc, same era, same kind of advisory portfolio, both went from community roles → hosting/cloud operator → founder → VC partner. The scale claims are even comparable (Vultr "hundreds of millions ARR" vs. DigitalOcean "$500M+ ARR," both 10 → hundreds of employees).

If I'm being rigorous about why John edged it for me: The DigitalOcean tenure is more legible and verifiable — "2nd largest cloud provider," reporting to the CEO, specific scope (devrel, people ops, product ops). Joshua's Vultr description is a bit vaguer on role and outcome. For a GM hire at a Series A robotics company, where you're betting on someone to actually run the operating function, I lean toward the resume that shows clearer operational ownership rather than a marketing/CMO title.

John also has Tiny Mile (sidewalk delivery robots) and Starship-adjacent exposure — wait, actually Joshua has Starship Technologies as an advisor, which is the more relevant robotics signal. That cuts the other way.

Truthfully, this is a coin flip and I called it too quickly. You'd want to dig into references, what they actually did day-to-day at Vultr/DO, and how they think about hardware/capex businesses before picking."

Opus 4.7 and again, pretty hack job test but I was curious.

idopmstuff

5 hours ago

Interesting, thanks for testing.

I feel like a more detailed prompt and/or some scaffolding to have it extract experience, put it in a structured format, give numerical ratings against specific criteria then use all of that would be able to consistently get the right result, but I am too lazy to actually test.
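
A sketch of what that scaffolding could look like once the experience has been extracted into a structured format. The fields, the criteria, the weights, and the two candidate records are all hypothetical, and the extraction step itself (the part an LLM would do) is assumed to have already happened.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    years_experience: int
    robotics_exposure: bool
    ran_operations: bool

# Hypothetical rubric: criterion -> (weight, scoring function returning 0..1).
RUBRIC = {
    "experience": (0.5, lambda c: min(c.years_experience, 20) / 20),
    "robotics": (0.3, lambda c: 1.0 if c.robotics_exposure else 0.0),
    "operations": (0.2, lambda c: 1.0 if c.ran_operations else 0.0),
}

def score(candidate):
    """Weighted sum over rubric criteria; prose style never enters."""
    return sum(weight * fn(candidate) for weight, fn in RUBRIC.values())

def rank(candidates):
    """Highest score first."""
    return sorted(candidates, key=score, reverse=True)

# Made-up records, as an extraction step might produce them.
candidates = [
    Candidate("A", years_experience=12, robotics_exposure=True, ran_operations=True),
    Candidate("B", years_experience=14, robotics_exposure=True, ran_operations=True),
]

for c in rank(candidates):
    print(c.name, round(score(c), 3))
```

With otherwise identical records, the candidate with two more years of experience ranks first regardless of who wrote the summary, which is the consistency the comment is after. Whether the extraction step is itself unbiased is a separate question this sketch does not address.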