Ask HN: How do you search the web programmatically these days?

2 points, posted 12 hours ago
by coreyp_1

Item id: 47809373

3 Comments

pwg

12 hours ago

> and found that most of them block the use of curl

Try again, but have curl provide a user agent string from one of the real browsers. You'll likely find that the request goes through.
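For anyone who wants to try this from a script rather than the shell, here is a minimal sketch in Python using only the standard library. The User-Agent value is an illustrative browser-style string, and `build_request` and `fetch` are hypothetical helper names, not anything from the thread. With curl itself, the equivalent is the `-A` / `--user-agent` option.

```python
import urllib.request

# Illustrative browser-style User-Agent (an assumption; copy the
# current string from your own browser for a realistic value).
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


def build_request(url: str, user_agent: str = BROWSER_UA) -> urllib.request.Request:
    """Build a GET request that sends a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL and return the response body as text."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Calling `fetch("https://example.com/")` would send the request with the browser-style header. Some sites check more than the User-Agent, so this may not be enough on its own.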

raw_anon_1111

2 hours ago

Can’t speak for search engines specifically. But I recently had to do a project which required me to crawl the customer’s large site and index it into a vector search for RAG for a call center.

My first attempt was to crawl it with plain GET requests (i.e., the same thing as using curl). That got me nowhere. I had to use headless Chrome and Playwright.

Do any modern websites work with just curl, even if they don’t block it, i.e. without being able to run JS?
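A rough sketch of the headless Chrome plus Playwright approach mentioned above, in Python. It assumes the `playwright` package and its Chromium build are installed; `fetch_rendered_html` is a hypothetical helper name, and waiting for `networkidle` is one possible heuristic for "the JS has finished", not necessarily what the commenter used.

```python
def fetch_rendered_html(url: str, timeout_ms: int = 30_000) -> str:
    """Load a page in headless Chromium and return the HTML after JS has run."""
    # Imported here so the module loads even where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # Wait until network activity settles, a heuristic for
            # client-side rendering having completed.
            page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()
```

The returned string is the DOM serialized after scripts have run, which is what a plain GET request cannot give you on a JS-rendered site.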