hollerith
3 months ago
If this had been ordinary text rather than an image with text in it, I would've read it.
n1xis10t
3 months ago
Alright. I OCRed it with Tesseract, proofread it, fixed the things that were broken (links, for example), and put it up on GitHub: https://n1xis10t.github.io/search-engine-article/
It is a single left-aligned column of preformatted text in an HTML file that contains no style information or JavaScript.
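For anyone who wants to produce a page in the same spirit, here is a minimal sketch of the last step: wrapping already-proofread plain text in a bare HTML file. It is not necessarily how the page above was built. The function name `text_to_plain_html` and its `title` parameter are made up for illustration, and the OCR and proofreading steps are assumed to have happened already.

```python
import html


def text_to_plain_html(text: str, title: str = "Untitled") -> str:
    """Wrap plain text in a minimal HTML page: one <pre> block, no CSS, no JavaScript.

    Hypothetical helper for illustration only.
    """
    # Escape &, < and > so the text is displayed literally instead of
    # being interpreted as markup.
    body = html.escape(text, quote=False)
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        f'<head><meta charset="utf-8"><title>{html.escape(title)}</title></head>\n'
        f"<body><pre>{body}</pre></body>\n"
        "</html>\n"
    )


if __name__ == "__main__":
    # Example input; in practice this would be the proofread OCR output.
    print(text_to_plain_html("a < b & c\nsecond line", title="Example"))
```

Because everything sits in one `<pre>` element, line breaks are kept exactly as they are in the input, and the text stays selectable, searchable, and zoomable in the browser.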
n1xis10t
3 months ago
Is this because of the malware risk inherent in images?
hollerith
3 months ago
No, it is several things.
One is that I cannot select part of the text, then copy and paste it into the "reference material" I keep in plain-text files on my local computer.
Another is that there is no convenient way to search, i.e., Ctrl+F does not work.
Another is that making the text larger[1] is less convenient: in general I need to right-click > "Open image in new tab" and then zoom, rather than just zoom, and the zooming is somehow jumpier (which I can handle, but it slows me down) and the text never reflows. Sometimes, after I zoom the image enough to read the text easily, a column of text becomes too wide for my screen, so I need to scroll back and forth horizontally for every line. That turned out not to be a problem with this specific image file, but IIRC I gave up before I noticed that.
Finally, even when the text I want to read is actual text (not an image), sometimes I have to resort to Ctrl+A (select all), then copy and paste the text into a text editor (because every web site is a snowflake, and some of the snowflakes are gnarly); knowing that this last resort is not available with an image file makes me less likely to try to read the image file.
The reasons I give above add up to a situation in which it is slower and more annoying to read from an image file than to read ordinary HTML text or plain text.
Also, I was already annoyed by archive.org even before I figured out that the thing I might want to read is an image file. With my browser configured the way it is (OS zoom set to 200%, then Chrome's zoom set to 80%, so the effective zoom of the viewport is 2.0 × 0.8 = 160%, but the browser chrome occupies twice as much vertical real estate as it would if OS zoom were 100%), when I land on the web page, more than half of the viewport is occupied by a plea for money at the top plus the two menu bars in the site header. (And the text in the plea for money is larger and sharper than the text of the content.)
[1] My eyesight is substandard.
horseradish7k
3 months ago
malware in images is as prevalent as razorblades in candy
n1xis10t
3 months ago
Either you are saying that malware in images isn’t very common, or you are saying that it is common and you live in a really scary place where you should never buy candy