Automated PR risk scoring with LLMs

1 point, posted 10 hours ago
by KinanNasri


I built PRScope after noticing that large pull requests often get merged without anyone fully understanding their real risk surface.

The idea is simple:

Take the raw unified diff from a GitHub PR

Parse and structure it

Feed it into an LLM with a deterministic prompt

Generate a structured Markdown review that includes:

Severity levels

Risk assessment

Suggested improvements

Positive observations

The hard parts weren’t “using AI” — they were:

• Handling large diffs without blowing token limits

• Keeping output consistent across runs

• Avoiding hallucinated issues

• Making scoring feel rational instead of arbitrary

• Supporting both hosted APIs and local inference (Ollama)
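For the token-limit point, one common approach (which I'm sketching here as an assumption, not as PRScope's exact strategy) is to chunk the diff at file boundaries and review each chunk separately, so no single prompt exceeds the context window:

```python
def chunk_by_file(file_diffs, max_chars=12000):
    """Greedily pack per-file diff texts into chunks under a character
    budget. The 12000-char default is an illustrative stand-in for a
    token budget; real code would use the model's tokenizer."""
    chunks, current, size = [], [], 0
    for fd in file_diffs:
        if current and size + len(fd) > max_chars:
            # Current chunk is full; flush it and start a new one.
            chunks.append("\n".join(current))
            current, size = [], 0
        current.append(fd)
        size += len(fd)
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Splitting at file boundaries keeps each chunk semantically coherent, which matters more for review quality than packing chunks as tightly as possible.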

One thing that improved reliability significantly was separating the prompt into:

Analysis phase

Structured output phase
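
In sketch form, the two-phase split looks something like this. `call_llm` is a placeholder for any chat-completion backend (hosted API or Ollama), and the prompts are illustrative, not PRScope's actual prompts:

```python
# Phase 1: free-form analysis grounded in the diff.
ANALYSIS_PROMPT = (
    "You are reviewing a pull request diff. List concrete risks you can "
    "point to in the diff. Quote the relevant lines; do not speculate.\n\n"
    "{diff}"
)

# Phase 2: convert the analysis into a fixed Markdown schema.
FORMAT_PROMPT = (
    "Convert the analysis below into Markdown with sections: Severity, "
    "Risk Assessment, Suggested Improvements, Positive Observations. "
    "Do not add issues that are not present in the analysis.\n\n"
    "{analysis}"
)

def review(diff, call_llm):
    """Run the analysis phase, then format its output into the schema."""
    analysis = call_llm(ANALYSIS_PROMPT.format(diff=diff))
    return call_llm(FORMAT_PROMPT.format(analysis=analysis))
```

The second phase never sees the raw diff, only the first phase's analysis, which is what makes it hard for the formatting step to introduce brand-new hallucinated issues.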

It currently works as a CLI and GitHub Action.

I’m especially curious about:

Deterministic scoring approaches

Handling monorepo-scale PRs

Preventing false positives in AI review

CI performance tradeoffs

Repo: https://github.com/KinanNasri/PRScope

Happy to answer questions.