LLMs use "safety" specific neuron layers to identify vulnerabilities in code

5 points, posted 12 hours ago
by summarity

2 Comments

westurner

12 hours ago

> Circuit Tracer on Gemma-2-2b

decoderesearch/circuit-tracer: https://github.com/decoderesearch/circuit-tracer

ScholarlyArticle: "Dissecting the Black Box: Circuit-Level Analysis of LLM Vulnerability Detection" (2026-05) https://arxiv.org/abs/2605.29901v1