Looking for collaborators to validate an AI-driven reliability framework

1 point, posted 7 hours ago
by TempleOfTwo

1 comment

TempleOfTwo

7 hours ago

Hi everyone, I’m an independent researcher working on a reproducible, open-source framework that measures how large language models express certainty — not just what they say.

After a year of cross-model experiments (GPT-5, Claude 4.5, Gemini, Grok), I found that AI systems naturally organize their answers into four distinct “epistemic types,” defined by measurable confidence ratios and convergence width. We call this framework IRIS Gate: in effect, a map of knowledge reliability for AI output.

Core Idea

Every multi-model run produces a numerical confidence pattern that separates cleanly into four types:

• Type 0 – Crisis / Conditional (confidence ratio ≈ 1.26): known emergency logic that activates only with triggers → Trust if trigger
• Type 1 – Facts (≈ 1.27): established knowledge → Trust
• Type 2 – Exploration (≈ 0.49): emerging hypotheses → Verify
• Type 3 – Speculation (≈ 0.11): unverifiable or future claims → Override

It’s like a real-time reliability gauge for AI reasoning. The system self-labels outputs as “Trust / Verify / Override,” which makes scientific and technical research auditable instead of opaque.
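For concreteness, here's a rough sketch of what that self-labeling step can look like. The threshold values and the classify_epistemic_type name are illustrative placeholders keyed to the ratios in the table above, not the actual logic in iris_orchestrator.py:

```python
# Illustrative Trust / Verify / Override labeler.
# NOTE: thresholds are placeholders loosely based on the ratios above,
# not the real iris_orchestrator.py implementation.

from dataclasses import dataclass

@dataclass
class EpistemicLabel:
    type_id: int
    name: str
    action: str

def classify_epistemic_type(confidence_ratio: float, has_trigger: bool = False) -> EpistemicLabel:
    """Map a measured confidence ratio onto one of the four epistemic types."""
    if confidence_ratio >= 1.0:
        # Types 0 and 1 sit at nearly the same ratio (~1.26 vs. ~1.27);
        # what separates them is whether an emergency trigger is present.
        if has_trigger:
            return EpistemicLabel(0, "Crisis / Conditional", "Trust if trigger")
        return EpistemicLabel(1, "Facts", "Trust")
    if confidence_ratio >= 0.3:
        return EpistemicLabel(2, "Exploration", "Verify")
    return EpistemicLabel(3, "Speculation", "Override")

# Example: a run measuring a ratio of 0.52 gets flagged for human verification.
print(classify_epistemic_type(0.52).action)  # -> "Verify"
```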

What’s Been Done

• 7 full multi-model runs (49 convergence chambers, ~1,200 data points)
• Reproducibility bundle with SHA-256 checksums and citation metadata (a generic verification sketch follows below)
• Validated against a real biomedical use case (CBD mechanism discovery, including the VDAC1 paradox)
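If you want to sanity-check the bundle before attempting a replication, a generic SHA-256 check is enough. The checksums.txt manifest name and the "digest  path" line format below are assumptions on my part, so adjust them to however the bundle actually ships its checksums:

```python
# Generic SHA-256 verification for a reproducibility bundle.
# Assumes a plain-text manifest ("checksums.txt") with lines of the form
# "<hex digest>  <relative path>"; adapt to the bundle's real layout.

import hashlib
from pathlib import Path

def verify_bundle(manifest: str = "checksums.txt") -> bool:
    ok = True
    for line in Path(manifest).read_text().splitlines():
        if not line.strip():
            continue
        expected, _, rel_path = line.strip().partition("  ")
        digest = hashlib.sha256(Path(rel_path).read_bytes()).hexdigest()
        if digest != expected:
            print(f"MISMATCH: {rel_path}")
            ok = False
    return ok

if __name__ == "__main__":
    print("bundle OK" if verify_bundle() else "bundle FAILED")
```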

What I Need Help With

I’d love collaborators who can help with:

1. Independent replication of the confidence-ratio experiment on other models
2. Code review and architecture feedback on the iris_orchestrator.py classifier (Python)
3. Statistical validation: is the 4-type separation significant under bootstrapping? (see the sketch after this list)
4. Open-science publication / peer-review advice
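For point 3, the kind of check I have in mind is a simple bootstrap over per-type confidence ratios, asking how often adjacent types overlap under resampling. The sketch below uses made-up example ratios, not the project's real data:

```python
# Bootstrap sketch for point 3: is the gap between adjacent epistemic
# types (e.g. Exploration vs. Speculation) robust under resampling?
# The example ratios at the bottom are invented, not project data.

import random
import statistics

def bootstrap_overlap(a: list[float], b: list[float], n_boot: int = 10_000) -> float:
    """Fraction of bootstrap resamples in which mean(a) <= mean(b).
    Values near 0 suggest the separation between the two groups is robust."""
    count = 0
    for _ in range(n_boot):
        resample_a = [random.choice(a) for _ in a]
        resample_b = [random.choice(b) for _ in b]
        if statistics.mean(resample_a) <= statistics.mean(resample_b):
            count += 1
    return count / n_boot

# Hypothetical per-run ratios for two types (placeholders only):
exploration = [0.45, 0.52, 0.49, 0.47, 0.51]
speculation = [0.09, 0.12, 0.11, 0.10, 0.13]
print(f"P(overlap) ≈ {bootstrap_overlap(exploration, speculation):.4f}")
```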

Everything is MIT-licensed and reproducible; repo + data bundles are ready.

Links

• GitHub: (add your repo link here)
• Full documentation: EPISTEMIC_MAP_COMPLETE.md
• Paper draft (preprint): link if available

Why It Matters

LLMs are powerful but opaque. This project aims to give them an epistemic dashboard — a way to quantify what kind of knowing each answer represents. If it works, it could improve safety, reproducibility, and trust in scientific AI systems.

If you’re into interpretability, epistemology, or reproducible AI science — come help kick the tires.