Show HN: How This Graybeard Built the Fastest and Freest Postgres BM25 Search

32 points posted 4 hours ago
by tjgreen

3 Comments

simonw

21 minutes ago

This is really cool. I've built things on PostgreSQL's tsvector full-text search (FTS) in the past, which works well but doesn't have whole-index ranking algorithms, so it can't do BM25.

It's a bit surprising to me that this doesn't appear to have a mechanism to say "filter for just documents matching terms X and Y, then sort by BM25 relevance" - it looks like this extension currently handles just the BM25 ranking but not the FTS filtering. Are you planning to address that in the future?

I found this example in the README quite confusing:

  SELECT * FROM documents
  WHERE content <@> to_bm25query('search terms', 'docs_idx') < -5.0
  ORDER BY content <@> 'search terms'
  LIMIT 10;

That -5.0 is a magic number which, based on my understanding of BM25, is difficult to predict in advance since the threshold you would want to pick varies for different datasets.
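
To illustrate why a fixed cutoff is hard to pick, here is a minimal sketch that scores the same document against the same query in two corpora of different sizes. It uses a textbook Okapi BM25 formula with a Lucene-style IDF and default k1/b values, which is an assumption and not necessarily what the extension computes. The scores are negated only to mimic the sign convention implied by the `< -5.0` threshold in the README example.

```python
import math
from collections import Counter

def bm25_scores(docs, query, k1=1.2, b=0.75):
    """Score every document against the query with textbook Okapi BM25.

    Assumption: Lucene-style IDF, whitespace tokenization, default k1 and b.
    """
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical corpora: the first document is identical in both.
small = [
    "postgres full text search",
    "postgres search tips",
    "search engines compared",
]
large = small + ["cooking recipes for winter"] * 50

score_small = bm25_scores(small, "postgres search")[0]
score_large = bm25_scores(large, "postgres search")[0]

# Negated to match the "more negative is better" convention assumed above.
print(-score_small, -score_large)
```

In this toy setup the same document clears a 5.0 cutoff in the larger corpus and misses it in the smaller one, because the IDF of each query term depends on how rare the term is across the whole index.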

gplprotects

20 minutes ago

> ParadeDB, is guarded behind AGPL

What a wonderful ad for ParadeDB, and a clear signal that "TigerData" is a pernicious entity.