zahlman

5 months ago

This is not insightful. It describes (in a very typical, padded AI style) some generic reasons that PDFs are hard to parse (which apply just as much in any programming language), and then makes a few common suggestions for Python PDF parsing libraries that anyone who needs them could trivially find with a simple Web search (https://duckduckgo.com/?q=python+pdf+library), or by looking around more specialized places (e.g. https://stackoverflow.com/search?tab=votes&q=%5bpython%5d%20... , even though library recommendation questions are considered off-topic on Stack Overflow now). Or by asking here, for that matter.

Admittedly, PyPI's own search is quite poor for this sort of thing.

Dozens of other articles have been submitted from the same domain, almost all by the author (https://news.ycombinator.com/from?site=theseattledataguy.com), and a large fraction of these have been filtered.

Challenges You Will Face When Parsing PDFs with Python
