PyPI in 2025: A Year in Review

66 points, posted 13 hours ago
by miketheman

19 Comments

heavyset_go

9 hours ago

One of the big companies making billions on Python software should step up and fund the infrastructure needed to enable PyPI package search via the CLI, like you could with `pip search` in the past.

talideon

5 hours ago

I upvoted you because I broadly agree with you, but search is never coming back in the API. They previously outlined the cost involved, and given how little value it provides more broadly, there's no way it's coming back any time soon. It's basically an abuse vector because of the compute cost.

woodruffw

8 hours ago

Serious question: how important is `pip search` to your workflows? I don’t think I ever used it, back when PyPI still had an XMLRPC search endpoint.

(I think the biggest blocker on CLI search isn’t infrastructure, but that there’s no clear agreement on the value of CLI search without a clear scope of what that search would do. Just listing matches over the package names would be less useful than structured metadata search for example, but the latter makes a lot of assumptions about the availability of structured metadata!)

firesteelrain

8 hours ago

Funding could help, but it still requires PyPI/Warehouse to ship and operate a new public search interface that is safe at internet scale.

coldtea

8 hours ago

They operate a public package hosting interface, how is a search one any harder?

miketheman

7 hours ago

PyPI responses are cached at 99% or higher, with less infrastructure to run.

Search is an unbounded context and does not lend itself to caching very well, as every search can contain anything.
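To illustrate the caching point, here is a toy sketch (not PyPI's actual setup): a small in-memory cache is fed a repeated, fixed path and then a stream of distinct search strings. The `CountingCache` class, the `simulate` helper, and the request lists are all hypothetical and exist only for this illustration.

```python
class CountingCache:
    """Toy in-memory cache that counts hits and misses (illustration only)."""

    def __init__(self):
        self.store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key, compute):
        # Serve from the cache when possible, otherwise compute and remember.
        if key in self.store:
            self.hits += 1
        else:
            self.misses += 1
            self.store[key] = compute(key)
        return self.store[key]

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def simulate(keys):
    """Replay a list of request keys through a fresh cache; return the hit rate."""
    cache = CountingCache()
    for key in keys:
        cache.get(key, lambda k: f"response for {k}")
    return cache.hit_rate()


# Assumed workloads: one popular path requested 100 times,
# versus 100 search queries that never repeat.
package_requests = ["/simple/requests/"] * 100
search_requests = [f"/search?q=term-{i}" for i in range(100)]
```

With these made-up workloads, the repeated path misses once and then always hits, while the never-repeating queries never hit. Real search traffic would fall somewhere in between, depending on how often queries repeat.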

bastawhiz

7 hours ago

PyPI has fewer than one million projects. The searchable content for each package is what? 300 bytes? That's a 200 MB index. You don't even need fancy full text search, you could literally split the query by word and do a grep over a text file. No need for Elasticsearch or anything fancy.

And anyway, hit rates are going to be pretty good. You're not taking arbitrary queries, the domain is pretty narrow. Half the queries are going to be for requests, pytorch, numpy, httpx, and the other usual suspects.
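A minimal sketch of the "split the query by word and grep a text file" idea, assuming a hypothetical index with one `name<TAB>summary` line per project. The `search` function and `SAMPLE_INDEX` data are made up for illustration and do not reflect any real PyPI format.

```python
def search(index_lines, query):
    """Return the index lines that contain every word of the query (case-insensitive)."""
    words = query.lower().split()
    if not words:
        return []
    return [
        line
        for line in index_lines
        if all(word in line.lower() for word in words)
    ]


# Hypothetical sample index: one "name<TAB>summary" line per project.
SAMPLE_INDEX = [
    "requests\tPython HTTP for Humans.",
    "httpx\tThe next generation HTTP client.",
    "numpy\tFundamental package for array computing in Python.",
]
```

For example, `search(SAMPLE_INDEX, "http client")` returns only the `httpx` line. This is plain substring matching with no ranking, so it says nothing about result quality, only that the mechanics are simple.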

woodruffw

4 hours ago

The searchable context for a distribution on PyPI is unbounded in the general case, assuming the goal is to allow search over READMEs, distribution metadata, etc.

(Which isn’t to say I disagree with you about scale not being the main issue, just to offer some nuance. Another piece of nuance is the fact that distributions are the source of metadata but users think in terms of projects/releases.)

froh

5 hours ago

I wonder how a PyPI search index could be statically served and locally evaluated on `pip search`?
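One way this could look, as a rough sketch under stated assumptions: the server periodically publishes a compressed JSON blob mapping project names to summaries, and the client downloads it once and searches it locally. The `build_static_index` and `search_local` functions and the blob format are hypothetical; nothing here is an existing pip or PyPI feature, and the download step is left out.

```python
import gzip
import json


def build_static_index(projects):
    """Server side (hypothetical): serialize {name: summary} into a gzip-compressed blob."""
    payload = json.dumps(projects, sort_keys=True).encode("utf-8")
    return gzip.compress(payload)


def search_local(blob, query):
    """Client side: decompress the downloaded blob and match names/summaries locally."""
    projects = json.loads(gzip.decompress(blob).decode("utf-8"))
    words = query.lower().split()
    if not words:
        return []
    return sorted(
        name
        for name, summary in projects.items()
        if all(word in f"{name} {summary}".lower() for word in words)
    )
```

Usage would be something like `search_local(build_static_index({"alpha-http": "tiny http client"}), "http")`, with made-up project data. The blob is a static file, so it could be cached like any other response; the open questions are how often it gets rebuilt and how large it grows.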

firesteelrain

4 hours ago

PyPI servers would have to be constantly rebuilding a central index and making it available for download. Seems inefficient.

bastawhiz

7 hours ago

PyPI has a search interface on its public website, though?

rat9988

6 hours ago

They probably don't need it. You can start a crowdfunding campaign if you do.

zahlman

3 hours ago

> 1.92 exabytes of total data transferred

That's something like triple the amount from 2023, yes?

nodesocket

an hour ago

Are the compute and network required to serve PyPI all funded by donations, or do they have a business arm that generates income?

nmstoker

7 hours ago

Great work!

Side issue: anyone else seeing that none of the links in the article work? They're all 404s.

miketheman

7 hours ago

Whoops, sorry about that. Should be fixed now. Happy New Year!

fud101

6 hours ago

This seems to suggest once the bubble pops, it will take Python down with it. The next AI winter will definitely replace Lisp with Python.

talideon

5 hours ago

Appropriate username!