Scraping Shock: Why Web Data Is Getting Too Expensive to Scrape

4 points, posted 4 hours ago
by Ian_Kerins

9 Comments

Ian_Kerins

4 hours ago

One of the main ideas we explored here is how scraping has shifted from being mainly a technical challenge to an economic one:

- Infrastructure and proxies have gotten cheaper, but anti-bot defenses have evolved fast.

- Because of that, the real cost of scraping is now the cost per successful result, and spikes of 5x–20x can happen when defenses tighten (a rough worked example follows this list).

- The bottleneck today isn’t just “can you scrape it?”, it’s whether you can do it profitably and efficiently.
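
To make the arithmetic concrete, here's a minimal sketch of the cost-per-success point. All the numbers are hypothetical; the only thing that matters is the shape:

```python
# Hypothetical numbers: cost per successful record is total spend divided by
# successes, so a falling success rate inflates it even at a flat request cost.
requests_sent = 100_000
cost_per_request = 0.002  # USD: proxies + compute, illustrative only

for success_rate in (0.80, 0.10):  # before vs. after a defense upgrade
    successes = requests_sent * success_rate
    cost_per_success = (requests_sent * cost_per_request) / successes
    print(f"success rate {success_rate:.0%}: ${cost_per_success:.4f} per record")

# 80% -> $0.0025 per record; 10% -> $0.0200 per record: an 8x jump
# before you even add retries or pricier residential proxies.
```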

I’d love to hear how folks here are dealing with rising scraping costs or what strategies have worked when data value doesn’t obviously outweigh defense costs.

joe_91

4 hours ago

Nice concept. I've definitely seen this play out in practice.

A lot of sites aren't impossible to scrape, but they're steadily getting more expensive. We're having to lean more on residential proxies, headless browsers, etc., just to get the same data that used to be straightforward...

fidansin

4 hours ago

I'm not fully convinced scraping has actually gotten harder. It feels more like the average approach has gotten softer.

Lately everything gets framed as rising costs or unstoppable anti-bot systems, but most sites didn't suddenly become impenetrable. What changed is how people react to friction.

We're in an AI-autopilot phase now. Hit a block and the instinct is to buy more credits, switch vendors, or let an API abstract the problem away. Meanwhile, teams still doing basic engineering work around sessions, behavior, pacing, and retries are often scraping the same targets just fine.
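
To be concrete, the kind of basic engineering I mean is nothing exotic. A minimal sketch, where the user agent, pacing windows, and backoff numbers are all illustrative:

```python
import random
import time

import requests

session = requests.Session()  # reuse cookies and TCP connections across requests
session.headers.update({"User-Agent": "Mozilla/5.0 ..."})  # placeholder UA string


def fetch(url: str, max_retries: int = 4) -> requests.Response | None:
    """Fetch with jittered pacing and exponential backoff on likely blocks."""
    for attempt in range(max_retries):
        time.sleep(random.uniform(1.0, 3.0))  # pacing: don't hammer the target
        resp = session.get(url, timeout=30)
        if resp.status_code == 200:
            return resp
        if resp.status_code in (403, 429, 503):  # rate-limited or blocked
            time.sleep(2 ** attempt)  # back off before retrying
            continue
        resp.raise_for_status()
    return None
```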

Honest question: have scraping costs really exploded, or have engineering standards quietly dropped as abstraction layers piled up?

Ian_Kerins

4 hours ago

Interesting take. Some people probably wouldn't like to be called soft, but there is likely some truth to it.

I feel it really comes down to priorities.

Scraping has always been a means to an end for most companies: get data, then use it for something valuable. Before, getting the data was easy; now it is getting increasingly hard.

I think the key point here is that the era of cheap/easy/low-skill access to web data is ending. Companies either need to skill up on bypassing anti-bot systems or pay someone else to do it for them while they focus on the data.

fidansin

4 hours ago

I just worry we're collapsing two things into one bucket: harder in absolute terms vs harder relative to how much real engineering effort teams are willing to invest.

Those aren't the same, and to me the distinction matters.

lucas_camargo

3 hours ago

Good article! The cost-per-success metric really is the overlooked part.

bediger4000

3 hours ago

Ethically dubious article. It treats using "residential proxies", which are probably installed by some kind of cybercriminal, as a legitimate thing to do. Similarly, it treats circumventing anti-scraping measures as a legitimate thing to do. They aren't. Take the hint: ignore web sites with some kind of anti-bot or anti-scraper system. Ignore web sites with a scraper junkyard. Those people don't want you to have their content.

> When a website upgrades its anti-bot system, it doesn't just make scraping slightly harder. It can make it 5X, 10X, or even 50X more expensive overnight.

This, of course, is very good news. Keep up the good work, folks!

joe_91

3 hours ago

Tell that to the thousands of apps/sites out there that rely on scraped data ;) (including all search engines, LLMs, price comparison sites, etc.)

bediger4000

2 hours ago

You should see my robots.txt file. I have told the legit ones to stay away. Every scraper and clanker that circumvents "anti-bot" technology can go straight to hell - they've been warned that I don't want them.
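
For context, the gist of it is just a blanket disallow, something like this (illustrative, not my literal file):

```
# Illustrative robots.txt: tell all crawlers to stay away
User-agent: *
Disallow: /
```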

But your observation doesn't address the unethical nature of the original article, which advocates benefiting from cybercrime and ignoring the explicit wishes of web sites that use "anti-bot" technology.