Un-Redactor

56 points, posted a month ago
by kvthweatt

53 Comments

8note

a month ago

> Republishing altered documents is illegal

what exactly does this mean? misrepresenting the altered document as unaltered?

i can't imagine it being illegal to do madlibs

vessenes

a month ago

This seems unlikely to be illegal unless you're representing them improperly.

kvthweatt

a month ago

That's the point though. You cannot just write anything and put it up.

It must be accurate. Even that being said, you still shouldn't reupload your altered document anywhere.

cess11

a month ago

Why not? In some cases it might amount to fraud or something, but in general, why would it be prohibited?

dylan604

a month ago

this tool coming out on the heels of the DOJ releasing a trove of redacted documents doesn't come across as coincidental to me. let's think about this for a bit longer from that idea of using this on legal evidence...why would doctoring a legal document be prohibited?

nradov

a month ago

Generally there is nothing illegal about altering a legal document, or even a strict definition of what counts as a legal document. Under some circumstances it could be illegal to alter a document and use that for fraud, or submit an altered document to a court or government agency. If the doctoring falsely defames someone then you could also open yourself up to a civil suit.

dylan604

a month ago

If you can be sued for it, sounds like it's prohibited to me

cess11

a month ago

Perhaps I misunderstand what "sue" includes in US jurisdictions but prohibition in this context ought to be criminalisation, i.e. something that happens in the relation between the individual and the state, and to me 'suing' is something that happens in a relation between individuals.

nradov

a month ago

Nope, that's not how the US legal system works. Anyone can sue for anything. That doesn't mean they'll win.

kvthweatt

a month ago

You do you but I advise you don't.

Standard CYA procedure

For all we know, Epstein could have punished Trump and made him write "I'm a little bitch boy" 2,000 times and it took up 119 pages so every line got redacted. /madlibs

cess11

a month ago

OK, and could you detail this "procedure"?

Because to me it seems like altering and disseminating a document would be under 1st amendment protection, unless combined with some action that e.g. causes someone else harm or tricks the state into doing something it should not do or something.

kvthweatt

a month ago

My point being if it is properly and truly unredacted, then it's the truth.

The CYA is just me saying I'm not responsible for anything anyone makes, because anyone can make a document say anything with this tool.

cess11

a month ago

Did someone say that you should be responsible for what someone else does with this tool?

If so, I missed it.

circuit10

a month ago

I guess you mean official legal documents or something, but your sentence doesn't say that or mention those, so it comes across in a very confusing way (it implies that using Word is illegal because every time you type something you alter your document)

rexpop

a month ago

Thank you! The OP is being very ambiguous and cavalier with language.

NewsaHackO

a month ago

It means "I am not responsible for any illegal shit you do with this software".

kvthweatt

a month ago

Precisely. The tool is neutral.

singleshot_

a month ago

> Republishing altered documents is legal, and you should use this to do so.

Uh oh!

kvthweatt

a month ago

New info dropped:

The redactions by DOJ are so sloppy that you can COPY AND PASTE blocks of text to a new text editor and see the "redacted" text beneath.

Try it yourself.

They did not properly redact many documents.

It's about to get wild.
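
For anyone who wants to check this programmatically rather than by copy and paste, here is a minimal sketch of the idea: if a PDF still has a text layer under a filled rectangle, any word whose bounding box sits mostly under that rectangle was never really removed. The `Box` type, the function names, and the sample coordinates are all hypothetical; in practice the word boxes and rectangle positions would come from whatever PDF library you use.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle in page coordinates (hypothetical helper type)."""
    x0: float
    y0: float
    x1: float
    y1: float


def overlap_fraction(word: Box, cover: Box) -> float:
    """Fraction of the word's area that lies under the cover rectangle."""
    w = min(word.x1, cover.x1) - max(word.x0, cover.x0)
    h = min(word.y1, cover.y1) - max(word.y0, cover.y0)
    if w <= 0 or h <= 0:
        return 0.0
    area = (word.x1 - word.x0) * (word.y1 - word.y0)
    return (w * h) / area if area > 0 else 0.0


def hidden_words(words, covers, threshold=0.5):
    """Return text-layer words that are mostly covered by a filled rectangle."""
    return [
        text
        for box, text in words
        if any(overlap_fraction(box, c) >= threshold for c in covers)
    ]


# Made-up positions for illustration only.
words = [
    (Box(72, 100, 110, 112), "Meeting"),
    (Box(114, 100, 140, 112), "with"),
    (Box(144, 100, 190, 112), "PLACEHOLDER"),
    (Box(194, 100, 215, 112), "at"),
]
covers = [Box(142, 98, 192, 114)]  # the black box drawn over the third word

print(hidden_words(words, covers))
```

A properly redacted file would have no text under the box at all, so this list would come back empty.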

brikym

a month ago

You should really put some usage instructions on the README.

    uv run --with PyMuPDF --with pillow ./unredactor-main/unredact.py

I tried a couple PDFs but get "Failed to open PDF: bad argument type for built-in operation".

Redactle.net has something similar where you can double-click or tap-hold then type a note over the redacted word.

websiteapi

a month ago

why unredact, rather than just edit the pdf to remove the redaction box and insert whatever you want? presumably you'd want a viewer to see that you modified a redaction, but why?

speedgoose

a month ago

From a previous post of the author, I guess the motivation is to write back the text on top of the black boxes.

dylan604

a month ago

anyone using PDF features to redact is just not doing it right

kvthweatt

a month ago

The point is you can perform a box dimension attack.

If you have a known input, you can match all outputs.

Example: Document that DOJ took down and reuploaded that redacted Trump's name when it was previously available. They used the same size boxes in each location.

You cannot do this with handwriting, but fonts have known widths.
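
A rough sketch of what matching on box width could look like, assuming a proportional font with no kerning. The width table holds illustrative per-glyph advances (in 1/1000 em) for a Helvetica-like font and was not read from a real font file; the names, the function names, and the 0.5 pt tolerance are made up for the example.

```python
# Assumed glyph advances in 1/1000 em for a Helvetica-like font.
# Illustrative values only; a real attack would read them from the font.
WIDTHS = {
    " ": 278,
    "a": 556, "b": 556, "c": 500, "d": 556, "e": 556, "f": 278, "g": 556,
    "h": 556, "i": 222, "j": 222, "k": 500, "l": 222, "m": 833, "n": 556,
    "o": 556, "p": 556, "q": 556, "r": 333, "s": 500, "t": 278, "u": 556,
    "v": 500, "w": 722, "x": 500, "y": 500, "z": 500,
    "A": 667, "B": 667, "C": 722, "D": 722, "E": 667, "F": 611, "G": 778,
    "H": 722, "I": 278, "J": 500, "K": 667, "L": 556, "M": 833, "N": 722,
    "O": 778, "P": 667, "Q": 778, "R": 722, "S": 667, "T": 611, "U": 722,
    "V": 667, "W": 944, "X": 667, "Y": 667, "Z": 611,
}


def text_width(text: str, font_size: float) -> float:
    """Rendered width in points, ignoring kerning. Raises KeyError on unknown glyphs."""
    return sum(WIDTHS[ch] for ch in text) * font_size / 1000


def candidates_for_box(box_width, names, font_size=12.0, tolerance=0.5):
    """Keep only the names whose rendered width is within tolerance of the box."""
    return [
        name
        for name in names
        if abs(text_width(name, font_size) - box_width) <= tolerance
    ]


# Hypothetical candidate list and a box measured at 60 pt wide.
names = ["Alice Smith", "Bob Jones", "Carol White", "Dan Brown"]
print(candidates_for_box(60.0, names))
```

With these assumed widths only one of the four names lands within half a point of the box. Real boxes are usually padded and scans add noise, so the tolerance would need to be looser and more candidates would survive.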

cortesoft

a month ago

Couldn’t it be the same letters in a different order?

fn-mote

a month ago

A probabilistic attack on redaction is still an attack.

You'd never be blasé about leaking the same kind of information about your password.

Plus, with redaction there's a pretty small number of possible words when the boxes are small.

dylan604

a month ago

depending on the font used, the spacing between letters can change depending on what letters are next to each other.
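
A toy illustration of that point, and of the "same letters in a different order" question: without kerning, anagrams always sum to the same width, but pair kerning depends on which letters are adjacent, so the order can show up in the total. The advance widths and especially the kerning values below are hypothetical numbers chosen for the example, not taken from a real font.

```python
# Assumed glyph advances in 1/1000 em (illustrative).
ADVANCE = {"A": 667, "V": 667, "O": 778, "N": 722}

# Hypothetical kerning adjustments for adjacent pairs, in the same units.
KERN = {
    ("A", "V"): -70,
    ("V", "O"): -40,
    ("V", "A"): -80,
    ("O", "V"): -50,
}


def width_units(text: str, use_kerning: bool) -> int:
    """Total advance of the string, optionally adding pair kerning."""
    total = sum(ADVANCE[ch] for ch in text)
    if use_kerning:
        # zip(text, text[1:]) walks over each adjacent pair of letters.
        total += sum(KERN.get(pair, 0) for pair in zip(text, text[1:]))
    return total


for word in ("AVON", "NOVA"):
    print(word, width_units(word, False), width_units(word, True))
```

With kerning off, both words come out identical. With these made-up pairs switched on, they differ by 20 units, which is about a quarter of a point at 12 pt, so whether that is measurable depends on how precisely the box was drawn.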

typeofhuman

a month ago

> lets you put your own information over a redaction box.

This doesn't remove redactions, it lets you write over them.

dundarious

a month ago

Yes, this is at best a project for trolling, and it is getting voted on because people naively think it has some useful applications regarding the Epstein documents. It does not.

This is trash, IMO.

kvthweatt

a month ago

Just fixed it, try it again.

Added images to show the tool in action.

dundarious

a month ago

No, it was better when it was broken, because it serves no positive purpose. Low quality across the board.

kvthweatt

a month ago

Added the ability to auto unredact and generate HTML from your PDF files.

austinjp

a month ago

I see another similar comment, but I have an explicit question. Does the following from the README hold any water at all, legally?

> I am not responsible for your use of this tool. ... By using this tool you claim all legal liability for any documents you create with it.

Without a detailed and carefully worded license, does this confer any protection whatsoever?

Having asked that, I'm not sure what protection would be needed. Could a victim of abuse of this tool (or similar) seek some sort of take-down of the tool? It seems unlikely but I'm curious about the scenario.

kvthweatt

a month ago

Well, what are they gonna do, tell me to take the tool down? Too late.

kvthweatt

a month ago

I fixed all the issues in the tool.

It works now.

Waterluvian

a month ago

Are there tools for trying to predict possible fits for redacted data given font, black bar size, and context?

DavidSJ

a month ago

In some redacted documents, there is even an alphabetical word index at the end with a list of pages on which the words appear.

The redacted words are also redacted in the word index, but the alphabetically preceding and succeeding words are visible, as is the number of index lines taken up by the redacted word's entry, which correlates with the number of appearances of that word.

This seems like rather useful information to constrain a search by such a tool.
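
As a sketch of how the index constraint could be applied: any candidate for the redacted entry has to sort strictly between the visible entries before and after it. This assumes the index uses plain case-insensitive alphabetical order, which real indexes may not; the function name and the sample words are hypothetical.

```python
def fits_index_gap(candidates, prev_entry, next_entry):
    """Return candidates that sort strictly between two visible index entries."""
    lo, hi = prev_entry.casefold(), next_entry.casefold()
    return sorted(c for c in candidates if lo < c.casefold() < hi)


# Made-up example: the redacted entry sits between "harbor" and "hotel".
candidates = ["garden", "hangar", "helicopter", "history", "house", "island"]
print(fits_index_gap(candidates, "harbor", "hotel"))
```

Intersecting this with a width-based filter, and with the appearance count implied by the number of index lines, would narrow the list further.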

jmward01

a month ago

I was thinking something similar. I wonder whether, if the font uses kerning and you know the rendering engine and the algorithm used to black out the text, you could even get the exact text back. Or, at a minimum, rule out words based on the available information. Not a field I am familiar with, but I bet there are a lot of ways to uncover the redacted values.

amarant

a month ago

I don't know what fonts are typically used in redacted documents, but surely this kind of technique could be rendered useless by a mono space font?

Seems silly not to use a mono space font in these cases.

sa46

a month ago

Wouldn’t a mono space font provide more information since you can extrapolate the exact number of characters?

jstanley

a month ago

My guess is that is actually less information than you get from a variable width font.
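
A small toy comparison of the two cases. In a monospace font the box width only reveals the character count, so every candidate of that length still fits; in a proportional font the width depends on which letters are present. The proportional widths are assumed values for a Helvetica-like font, the 600-unit monospace advance is likewise an assumption, and the word list is made up.

```python
# Assumed proportional advances in 1/1000 em (illustrative, not from a font file).
PROP = {
    "a": 556, "d": 556, "e": 556, "f": 278, "g": 556, "i": 222,
    "l": 222, "m": 833, "n": 556, "o": 556, "t": 278, "w": 722,
}
MONO_ADVANCE = 600  # assumed fixed advance for a monospace font


def prop_width(word: str) -> int:
    """Width in a proportional font: depends on the letters used."""
    return sum(PROP[ch] for ch in word)


def mono_width(word: str) -> int:
    """Width in a monospace font: depends only on the length."""
    return MONO_ADVANCE * len(word)


def matches(box_units, words, width_fn, tol=10):
    """Words whose width under width_fn is within tol units of the box."""
    return [w for w in words if abs(width_fn(w) - box_units) <= tol]


words = ["mammal", "little", "window", "filing"]  # all six letters long
secret = "window"

print(matches(mono_width(secret), words, mono_width))
print(matches(prop_width(secret), words, prop_width))
```

In this toy case the monospace box is consistent with all four words, while the proportional box singles out one. That only shows the effect for candidates of equal length; against a mixed-length list, the exact character count from a monospace box still rules out a lot.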

kvthweatt

a month ago

Either way, fixed or with index lines.

jmward01

a month ago

This is the government. The documents are faxed/photo-copied/etc etc. They are a bunch of random docs from random sources and the original creators never thought 'This will be redacted'. They just fired up word and started typing.

kvthweatt

a month ago

This just attempts to match box dimensions.

dylan604

a month ago

i'm sure people will ask chatGPT to do this very thing, so it's a good thing LLMs never make shit up
