Agentic AI Code Review: From Confidently Wrong to Evidence-Based

2 points, posted 5 hours ago
by sharp-dev

1 comment

sharp-dev

5 hours ago

TL;DR: AI code reviewer went from "confidently wrong" to actually useful. Fix: stopped pre-selecting context, gave the model tools to fetch evidence itself. Now it either cites file:line or stays quiet.

The Problem

Our AI reviewer flagged a "blocker." It cited the diff, built a plausible argument, and suggested a fix. A senior engineer spent 20 minutes disproving it. The guard clause it claimed was missing? Two files away. The model never had that file, so it guessed and sounded certain. Pre-selecting context doesn't work: code review follows evidence chains, and those chains aren't predictable.

The Fix

Agentic loop:

Model: "This calls validate()" → search_code("validate")
Model: "Two call sites use withRetry(). Third doesn't." → get_file_content("config/defaults.go")
Model: "Missing timeout. Bug found." → submit_code_review(structured_output)

The model fetches what it needs. The loop ends when it submits structured findings (path, line, severity, evidence, not prose).
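The loop above can be sketched in a few lines. Everything here is an assumption for illustration: the model client interface, the message format, and the tool stubs stand in for a real repo-backed implementation.

```python
def run_tool(name, args):
    # Hypothetical tool dispatch; real tools would hit the repo or a code index.
    tools = {
        "search_code": lambda q: f"3 call sites for {q!r}",
        "get_file_content": lambda path: f"<contents of {path}>",
    }
    return tools[name](**args)

def review_loop(model, diff, max_turns=10):
    # Model turn → tool turn → repeat, until the terminal action fires.
    messages = [{"role": "user", "content": f"Review this diff:\n{diff}"}]
    for _ in range(max_turns):
        turn = model(messages)                      # model turn
        if turn["action"] == "submit_code_review":  # terminal action
            return turn["findings"]                 # structured, not prose
        result = run_tool(turn["action"], turn["args"])  # tool turn
        messages.append({"role": "tool", "content": result})
    return []  # iteration cap hit: return nothing rather than guess
```

The iteration cap doubles as a guardrail: if the model can't reach a submission in the budget, the review comes back empty instead of confident.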

What Changed

- Before: "This might break retries."
- After: "In foo/bar.go:123, call bypasses withRetry(). Other call sites use it (see search results). Wrap or document."
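One way to make the "after" shape concrete is a finding type that refuses to validate without a path, line, and evidence. The field names are illustrative, not the post's exact schema:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    path: str       # e.g. "foo/bar.go"
    line: int       # exact line, not "somewhere in the file"
    severity: str   # e.g. "blocker", "warning"
    claim: str      # what is wrong
    evidence: str   # tool output that supports the claim

    def validate(self):
        # No file:line or no evidence → the finding cannot be submitted.
        if not (self.path and self.line > 0 and self.evidence):
            raise ValueError("finding rejected: missing file:line or evidence")
        return self
```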

The Pieces

1. Tools — boring, fast, deterministic. get_file_content, search_code. Treat them like production APIs.
2. Terminal action — structured JSON submission, not Markdown. No evidence? Can't submit.
3. Loop — model turn → tool turn → repeat. Aggressive context shrinking (old results truncated, diff stays).
4. Guardrails — iteration caps, timeouts, self-critique checklist.
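The "aggressive context shrinking" in piece 3 might look like this: older tool results get truncated, the diff and the most recent results stay whole. The limits are made-up numbers, not tuned values from the post.

```python
def shrink_context(messages, keep_last=2, max_chars=200):
    # Keep the newest tool results intact; truncate everything older.
    tool_msgs = [m for m in messages if m["role"] == "tool"]
    recent = set(map(id, tool_msgs[-keep_last:]))
    out = []
    for m in messages:
        if m["role"] == "tool" and id(m) not in recent:
            out.append({**m, "content": m["content"][:max_chars] + "…[truncated]"})
        else:
            out.append(m)  # the diff and recent tool results survive untouched
    return out
```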

Evaluation

Pick 5-10 PRs where you know the real risks. Check:

- Found the issue?
- Cited exact file:line?
- Hallucinated anything?
- Fetched evidence when uncertain?

Iterate on tools, not prompts.
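The first three checks can be scored mechanically against hand-labeled issues per PR. The scoring rules here are an assumption (exact path:line match); a real harness might allow fuzzier matching.

```python
def score_review(findings, known_issues):
    # Compare the reviewer's findings against hand-labeled issues for one PR.
    found = {(f["path"], f["line"]) for f in findings}
    expected = {(i["path"], i["line"]) for i in known_issues}
    return {
        "found": len(found & expected),        # real issues it caught
        "missed": len(expected - found),       # real issues it skipped
        "hallucinated": len(found - expected), # findings with no labeled issue
    }
```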

The Pattern

Don't build bigger prompts. Build a loop where the model can fetch evidence, test hypotheses, and submit only when it can cite sources. That's the difference between "sounds right" and "is right."