BM25 search and Claude = efficient precision

2 points posted 12 hours ago
by marwamc

4 Comments

marwamc

12 hours ago

When using AI coding assistants to refactor symbols across large codebases (6k+ files), developers face a binary choice: precision (LSP-based tools) or efficiency (grep/ripgrep). Shebe attempts to address this trade-off by way of a good old BM25 index, which is surprisingly fast and efficient.
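
For readers unfamiliar with BM25, here is a minimal sketch of how a BM25 index can rank source files for a symbol query. It is an illustration only, not Shebe's actual implementation: the `BM25Index` class, the identifier-based tokenizer, and the parameter defaults `k1=1.2` and `b=0.75` are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: treat identifier-like runs as terms.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokenize(text):
    return TOKEN.findall(text)


class BM25Index:
    """Toy in-memory BM25 index over {filename: source_text} (illustrative only)."""

    def __init__(self, docs, k1=1.2, b=0.75):
        self.k1 = k1  # term-frequency saturation
        self.b = b    # strength of document-length normalization
        self.names = list(docs)
        self.tfs = [Counter(tokenize(docs[name])) for name in self.names]
        self.lengths = [sum(tf.values()) for tf in self.tfs]
        self.avgdl = sum(self.lengths) / len(self.lengths) if self.lengths else 0.0
        # Document frequency: number of files containing each term.
        self.df = Counter()
        for tf in self.tfs:
            self.df.update(tf.keys())

    def idf(self, term):
        n = len(self.names)
        df = self.df.get(term, 0)
        # The "+ 1.0" inside the log keeps the IDF non-negative.
        return math.log((n - df + 0.5) / (df + 0.5) + 1.0)

    def search(self, query, top_k=5):
        terms = tokenize(query)
        results = []
        for name, tf, dl in zip(self.names, self.tfs, self.lengths):
            score = 0.0
            for term in terms:
                f = tf.get(term, 0)
                if f == 0:
                    continue
                norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                score += self.idf(term) * f * (self.k1 + 1) / norm
            if score > 0:
                results.append((name, score))
        results.sort(key=lambda item: item[1], reverse=True)
        return results[:top_k]
```

Files that mention the symbol more often, relative to their length, rank higher, and files that never mention it are dropped. That ranking is what separates this from a plain grep, which returns every match unordered.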

icsa

12 hours ago

How well does this approach work with C++ source code - which is notoriously difficult to parse, given context-dependent semantics?

marwamc

11 hours ago

Shebe asks a simple question: "where does this symbol appear as text?" For C++ codebases that make heavy use of templates and macros, Shebe will struggle. But I'm curious how it would actually perform, so I'm currently running a search on https://gitlab.com/libeigen/eigen. Will report the results shortly.