mahmood726
10 hours ago
I’m an academic working on reliability for high-stakes LLM use (coding plus scientific/medical workflows). This repo proposes a “fail-closed” certification gate: an output ships only if it passes a set of published checks; otherwise it is rejected. The benchmark emphasis is on false-ship rate (the fraction of outputs that shipped but were wrong), not just accuracy.

Looking for critique and real failure cases: where do LLMs most often produce plausible outputs that are silently wrong (C#/.NET, SQL, Python notebooks, data extraction, etc.)? And which validation checks would you consider non-negotiable?
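To make the shape concrete, here’s a minimal sketch of the gate and the metric. The check functions and names here are illustrative, not the repo’s actual API; the key property is that any failing check, or any check that itself crashes, rejects the output rather than shipping it:

```python
from dataclasses import dataclass
from typing import Callable

# A check takes the model output and returns (passed, reason).
Check = Callable[[str], tuple[bool, str]]

@dataclass
class GateResult:
    shipped: bool
    failures: list[str]

def fail_closed_gate(output: str, checks: list[Check]) -> GateResult:
    """Ship only if every published check passes.

    Fail-closed: a check that raises is treated as a failure,
    never as a pass."""
    failures: list[str] = []
    for check in checks:
        try:
            passed, reason = check(output)
        except Exception as exc:  # a broken check must not fail open
            passed, reason = False, f"check raised {exc!r}"
        if not passed:
            failures.append(reason)
    return GateResult(shipped=not failures, failures=failures)

# Illustrative checks (hypothetical; real ones would be task-specific):
def parses_as_python(output: str) -> tuple[bool, str]:
    try:
        compile(output, "<llm-output>", "exec")
        return True, ""
    except SyntaxError as exc:
        return False, f"syntax error: {exc}"

def non_empty(output: str) -> tuple[bool, str]:
    return (bool(output.strip()), "empty output")

def false_ship_rate(records: list[tuple[bool, bool]]) -> float:
    """records: (shipped, correct) pairs over the benchmark.

    False-ship rate = fraction of all cases that shipped but were wrong;
    this is the metric the gate is meant to drive toward zero."""
    wrongly_shipped = sum(1 for shipped, correct in records
                          if shipped and not correct)
    return wrongly_shipped / len(records) if records else 0.0
```

The design question I’d most like pushback on is the check layer itself: a gate like this is only as strong as the weakest published check, so checks that merely confirm plausibility (parses, non-empty, type-checks) can still let silently wrong outputs through.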