bflesch
3 hours ago
Somehow "clerk" is on my ublock origin blocklist and therefore the whole website is not loading. I didn't add "clerk" to the blocklist so it must've been added by one of the blocklists that ublock origin is subscribed to, so there must be a good reason why "clerk" is on that blocklist.
When building a product for medical audience which might care a lot about privacy maybe don't use components which are shady enough that they end up on blocklists.
Edit:
> Why no Vector DB? In medicine, "freshness" is critical. If a new trial drops today, a pre-indexed vector store might miss it. My real-time approach ensures the answer includes papers published today.
This is total rubbish - did you talk to a single medical practitioner when building this? Nobody will try new treatments on their patients just because a new paper was "published" (whatever that means - just being added to some search index). These people require trusted sources; experimental treatments are only done for private clients who have tried all other options.
amber_raza
3 hours ago
Thanks for the feedback—this is helpful.
1. Re: Clerk/uBlock: You were spot on. The default Clerk domain often gets flagged by strict blocklists. I just updated the DNS records to serve auth from a first-party subdomain (clerk.getevidex.com) to resolve this. It should be working now (see the DNS sketch after point 2).
2. Re: Freshness & 'Rubbish': You are absolutely right that standard of care doesn't (and shouldn't) change overnight based on one new paper.
However, the decision to ditch the Vector DB for Live Search wasn't about pushing 'experimental treatments'—it was about Safety and Engineering constraints:
Retractions & Safety Alerts: A stale vector index is a safety risk. If a major paper is retracted or a drug gets a black-box warning today, a live API call to PubMed/EuropePMC reflects that immediately (see the sketch below). A vector store is only as good as its last re-index.
The 'Long Tail': Vectorizing the entire PubMed corpus (35M+ citations) is expensive and hard to keep in sync. By using the search APIs directly, we get the full breadth of the database (including older, obscure case reports for rare diseases) without maintaining a massive, potentially stale index.
The goal isn't to be 'bleeding edge'—it's to be 'currently accurate'.
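To make the retraction point concrete, here's roughly what that live check looks like against NCBI's E-utilities. A minimal sketch, not my production code: the esummary endpoint and the "Retracted Publication" publication type are real PubMed features, but the function name and the stripped-down error handling are just illustrative.

    import requests

    EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

    def is_retracted(pmid: str) -> bool:
        # esummary returns the record's publication types among its fields.
        resp = requests.get(
            f"{EUTILS}/esummary.fcgi",
            params={"db": "pubmed", "id": pmid, "retmode": "json"},
            timeout=10,
        )
        resp.raise_for_status()
        record = resp.json()["result"][pmid]
        # PubMed tags retractions with the "Retracted Publication" pubtype,
        # so a live call sees the flag as soon as NLM applies it -- no
        # re-index cycle in between.
        return "Retracted Publication" in record.get("pubtype", [])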
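And back on the Clerk point, for anyone hitting the same blocklist issue: Clerk supports serving its frontend API from a first-party subdomain via a CNAME record. Roughly this (a sketch only; the exact target hostname is whatever your Clerk dashboard gives you, so treat the right-hand value as illustrative):

    ; point a first-party subdomain at Clerk's frontend API
    clerk.getevidex.com.  CNAME  frontend-api.clerk.services.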
breadislove
2 hours ago
a good system (like openevidence) indexes every paper released, and semantic search can be incredibly helpful since the search apis of all those providers are extremely limited in terms of quality.
now you get why those systems are not cheap. keeping indexes fresh, maintaining high quality at large scale and being extremely precise is challenging. by relying on distributed indexes you are at the mercy of the api providers, and i can tell you from previous experience that it won't be 'currently accurate'.
for transparency: i am building a search api, so i am biased. but i have also been building medical retrieval systems for some time.
amber_raza
2 hours ago
Appreciate the transparency and the insight from a fellow builder.
You are spot on that maintaining a fresh, high-quality index at scale is the 'hard problem' (and why tools like OpenEvidence are expensive).
However, I found that for clinical queries, Vector/Semantic Search often suffers from 'Semantic Drift': fuzzily matching concepts that sound similar but are medically distinct (e.g., an embedding model can place 'hypoglycemia' and 'hyperglycemia' as near-neighbors even though they are opposite conditions).
My architectural bet is on Hybrid RAG:
Trust the MeSH: I rely on PubMed's strict Boolean/MeSH search for retrieval, because for specific drug names or gene variants, exact keyword matching beats vector cosine similarity.
LLM as the Reranker: Since API search relevance can indeed be noisy, I cast a wider net (fetching the top ~30-50 abstracts) and use the LLM's context window to 'rerank' and filter them before synthesis.
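In code, the shape of that pipeline is roughly this. A sketch under stated assumptions: the esearch/efetch endpoints and the [mh]/[tiab] field tags are real PubMed E-utilities syntax, but the function names, the example query, and the call_llm stub are mine, standing in for whatever model client you use.

    import requests

    EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

    def search_pubmed(query: str, n: int = 50) -> list[str]:
        # Strict Boolean/MeSH retrieval, e.g.
        # 'apixaban[tiab] AND "atrial fibrillation"[mh]' -- exact field
        # tags beat cosine similarity for drug names and gene variants.
        r = requests.get(
            f"{EUTILS}/esearch.fcgi",
            params={"db": "pubmed", "term": query, "retmax": n,
                    "sort": "relevance", "retmode": "json"},
            timeout=10,
        )
        return r.json()["esearchresult"]["idlist"]

    def fetch_abstracts(pmids: list[str]) -> str:
        # Pull the plain-text abstracts for the candidate set.
        r = requests.get(
            f"{EUTILS}/efetch.fcgi",
            params={"db": "pubmed", "id": ",".join(pmids),
                    "rettype": "abstract", "retmode": "text"},
            timeout=30,
        )
        return r.text

    def call_llm(prompt: str) -> str:
        raise NotImplementedError  # wire up your model client here

    def rerank(question: str, abstracts: str) -> str:
        # Cast a wide net, then let the LLM filter and order the
        # candidates by clinical relevance before the synthesis step.
        prompt = (f"Question: {question}\n\nAbstracts:\n{abstracts}\n\n"
                  "Return only the PMIDs directly relevant to the "
                  "question, most relevant first.")
        return call_llm(prompt)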
It's definitely a trade-off (latency vs. index freshness), but for a bootstrapped tool, leveraging the NLM's billions of dollars in indexing infrastructure feels like the right lever to pull vs. trying to out-index them.