
Exploring the internal representations of Pangram 3.3.2

31 points, posted 12 hours ago

5 Comments

saithound

10 hours ago

I use Pangram quite extensively (burning through my 600-token allowance every month). They managed to get their false positive rate impressively low: if Pangram says something is 100% AI-written, you can trust that.

But they need to improve their humanizer dataset. Right now, most models can be given system prompts that cause them to emit text classified as 100% human. It looks like their automated humanizers do worse than these system prompts. Or (alarming, if so) they chose not to include humanizers that would make their product look unreliable.

meander_water

9 hours ago

GPTZero is much better at handling humanized outputs. It also has a false positive rate similar to Pangram's.

Chu4eeno

11 hours ago

I wonder whether, given enough material from individual humans, they could've distinguished between them as well. It really seems like their model is learning to recognize some general form of writer's "voice", so to speak (and I assume their final layer just knows which voices are supposed to be tagged as what).

andai

10 hours ago

I heard an author say recently (I think it was a blog posted here) that an LLM was able to identify him from one of his unpublished high school essays.

The DoD claimed to have de-anonymized Satoshi Nakamoto by similar means a while back. (Well, I think it was before LLMs. By similar means I mean stylometry, running statistics on a person's use of language.)
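For anyone curious what "running statistics on a person's use of language" can look like in its simplest form, here is a rough sketch. It is not what the DoD, Pangram, or any LLM does; it only assumes a tiny hand-picked list of function words (FUNCTION_WORDS is a hypothetical feature set) and compares texts by the cosine similarity of their word-frequency profiles. Real stylometry uses far richer features and much more text.

```python
import math
import re
from collections import Counter

# Hypothetical feature set: a handful of common English function words.
# Real stylometric studies use hundreds of features, not ten.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "is", "i"]


def profile(text):
    """Return the relative frequency of each function word in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def closest_author(unknown_text, known_texts):
    """Pick the author in `known_texts` (name -> writing sample) whose profile is most similar."""
    target = profile(unknown_text)
    return max(
        known_texts,
        key=lambda name: cosine_similarity(target, profile(known_texts[name])),
    )
```

Calling closest_author(essay, {"alice": sample_a, "bob": sample_b}) returns whichever name has the most similar profile. With only ten features and short samples the result is mostly noise, so treat it as an illustration of the idea rather than a usable attribution tool.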

jazzpush2

10 hours ago

Hoping for a follow-up with Sparse Autoencoders.