A Reminder Why Digitising Old Printed Material Is Important (2025)

2 points, posted 11 hours ago
by walterbell

1 comment

k310

10 hours ago

A note on the Internet Archive and a request.

Many documents scanned by the Internet Archive seem to use a compromise scan setting: images come out more or less accurate, but text is usually black against a dishwater-gray background.

Using free tools (Apple's Preview app), I have managed to threshold plain text and music notation. That step destroys any images that aren't line art, though, so I have used the contact sheet view to replace the trashed images in the new document with the ones from the original. [0]

It would be wonderful if someone could devise software that recognizes text areas and image areas and treats each appropriately, so that the result approaches the clarity of the original or even improves on it, as with faded originals.

And, of course, it should be free or inexpensive. I don't have scads of money to spend.

[0] Sometimes it takes a "reduce lightness" Quartz filter to get proper thresholding on a second pass. And it's very slow.
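
For what it's worth, the "treat text areas and image areas differently" idea can be roughed out in a few lines. This is only a sketch under stated assumptions, not an existing tool: the page is a plain list of rows of 0-255 gray values, it is cut into fixed-size blocks, and a block counts as an image when many of its pixels are mid-tones. The function names, the block size, and the `midtone_limit` and `min_contrast` values are all made up for illustration. Text blocks get a per-block Otsu threshold, image blocks are left untouched, and low-contrast blocks are assumed to be blank paper.

```python
def otsu_threshold(pixels):
    """Otsu's method: gray level that best splits pixels into dark and light."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue                      # no dark pixels yet
        w_light = total - w_dark
        if w_light == 0:
            break                         # nothing left on the light side
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def clean_block(pixels, midtone_limit=0.3, min_contrast=50):
    """Classify one block (flat list of grays) and return (kind, new_pixels).

    midtone_limit and min_contrast are illustrative guesses, not tuned values.
    """
    lo, hi = min(pixels), max(pixels)
    span = hi - lo
    if span < min_contrast:
        # Assumption: a nearly uniform block is blank paper, so whiten it.
        return "blank", [255] * len(pixels)
    band_lo, band_hi = lo + span / 4, hi - span / 4
    midtones = sum(1 for p in pixels if band_lo < p < band_hi)
    if midtones / len(pixels) > midtone_limit:
        return "image", list(pixels)      # leave photos and halftones alone
    t = otsu_threshold(pixels)
    return "text", [0 if p <= t else 255 for p in pixels]


def clean_page(page, block=8):
    """Apply clean_block to each block; returns (cleaned page, block kinds)."""
    height, width = len(page), len(page[0])
    out = [row[:] for row in page]
    kinds = {}
    for top in range(0, height, block):
        for left in range(0, width, block):
            rows = range(top, min(top + block, height))
            cols = range(left, min(left + block, width))
            pixels = [page[y][x] for y in rows for x in cols]
            kind, cleaned = clean_block(pixels)
            kinds[(top, left)] = kind
            values = iter(cleaned)
            for y in rows:
                for x in cols:
                    out[y][x] = next(values)
    return out, kinds
```

The obvious weak spots: a uniformly dark picture would be mistaken for blank paper, heavily blurred text can look like mid-tones, and fixed blocks will cut through the edges of illustrations. A real tool would need proper layout analysis, but even this crude split avoids thresholding the photos along with the text.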