I built a pure Go PDF library with full XFA support

1 point, posted 5 hours ago by b-g-d

I built a pure Go PDF processing library after repeatedly running into limitations with existing options, especially around XFA forms and encrypted PDFs.

Most PDF libraries I tried fall into one (or more) of these buckets:
- Depend on C/C++ via CGO, which complicates builds, cross-compilation, and deployment
- Treat XFA as an edge case or don’t support it at all
- Struggle with encrypted PDFs that use object streams and incremental updates

XFA is still widely used in government and enterprise workflows (tax forms, applications, compliance docs), but it’s notoriously hard to work with programmatically.

What pdfer does:

pdfer is a zero-dependency, pure Go PDF library that supports:
- Full XFA handling: extract, parse, modify, and rebuild XFA forms (including encrypted PDFs)
- Automatic handling of both AcroForm and XFA via a unified API
- Correct parsing of encrypted PDFs (RC4, AES-128/256) with object streams
- Byte-preserving parsing and reconstruction (important for legal/compliance use cases)
- Incremental update parsing (PDFs modified multiple times)
- Content extraction (text, images, graphics, annotations with positioning)
- Document operations (merge, split, rotate, extract pages)
- PDF diffing using an LCS-based approach with configurable granularity
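To make the "LCS-based approach with configurable granularity" item concrete, here is a minimal sketch of the general technique using only the standard library. It is not pdfer's API or implementation: `Op`, `tokenize`, `Diff`, and the `"line"`/`"word"` granularity names are hypothetical, and it works on plain strings rather than PDF content.

```go
package main

import (
	"fmt"
	"strings"
)

// Op is one step of an edit script (hypothetical type, for illustration only).
type Op struct {
	Kind  byte // ' ' unchanged, '-' removed from a, '+' added in b
	Token string
}

// tokenize splits text at the chosen granularity: "word" or (default) line.
func tokenize(s, granularity string) []string {
	if granularity == "word" {
		return strings.Fields(s)
	}
	return strings.Split(s, "\n")
}

// Diff builds an edit script from a to b using a classic LCS table.
func Diff(a, b []string) []Op {
	n, m := len(a), len(b)
	// lcs[i][j] holds the LCS length of a[i:] and b[j:].
	lcs := make([][]int, n+1)
	for i := range lcs {
		lcs[i] = make([]int, m+1)
	}
	for i := n - 1; i >= 0; i-- {
		for j := m - 1; j >= 0; j-- {
			switch {
			case a[i] == b[j]:
				lcs[i][j] = lcs[i+1][j+1] + 1
			case lcs[i+1][j] >= lcs[i][j+1]:
				lcs[i][j] = lcs[i+1][j]
			default:
				lcs[i][j] = lcs[i][j+1]
			}
		}
	}
	// Walk the table from the top-left to emit the edit script.
	var ops []Op
	i, j := 0, 0
	for i < n && j < m {
		switch {
		case a[i] == b[j]:
			ops = append(ops, Op{' ', a[i]})
			i++
			j++
		case lcs[i+1][j] >= lcs[i][j+1]:
			ops = append(ops, Op{'-', a[i]})
			i++
		default:
			ops = append(ops, Op{'+', b[j]})
			j++
		}
	}
	for ; i < n; i++ {
		ops = append(ops, Op{'-', a[i]})
	}
	for ; j < m; j++ {
		ops = append(ops, Op{'+', b[j]})
	}
	return ops
}

func main() {
	oldText := "Name: Alice\nTotal: 100"
	newText := "Name: Alice\nTotal: 120"
	for _, g := range []string{"line", "word"} {
		fmt.Println("granularity:", g)
		for _, op := range Diff(tokenize(oldText, g), tokenize(newText, g)) {
			fmt.Printf("%c %s\n", op.Kind, op.Token)
		}
	}
}
```

Switching the granularity changes only the tokenizer. The LCS step stays the same, which is why one diff engine can report changes at different levels of detail. The table costs O(n·m) time and memory, so a real implementation would likely need something smarter for large documents.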

Design notes:
- Uses a parse-then-decrypt strategy to correctly handle encrypted object streams (similar in spirit to pypdf)
- XFA was treated as a first-class feature from the start rather than bolted on
- No CGO or external dependencies, so it is easy to embed in Go services and cross-compile
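For anyone unfamiliar with why parse order matters here, this is a rough sketch of the parse-then-decrypt idea for the RC4 case. It assumes the file encryption key has already been derived from the password, and it is not pdfer's code: `rawObject`, `objectKey`, and `decryptStream` are hypothetical names. The key derivation follows the per-object scheme of the PDF standard security handler (MD5 over the file key plus the low bytes of the object and generation numbers). AES-128 and AES-256 differ in details that are not shown.

```go
package main

import (
	"bytes"
	"crypto/md5"
	"crypto/rc4"
	"fmt"
)

// rawObject is what the parser yields first: object identity plus the
// still-encrypted stream bytes exactly as found in the file (hypothetical type).
type rawObject struct {
	Num, Gen uint32
	Stream   []byte
}

// objectKey derives the per-object RC4 key: MD5(fileKey || low 3 bytes of the
// object number || low 2 bytes of the generation), truncated to
// min(len(fileKey)+5, 16) bytes.
func objectKey(fileKey []byte, objNum, gen uint32) []byte {
	h := md5.New()
	h.Write(fileKey)
	h.Write([]byte{byte(objNum), byte(objNum >> 8), byte(objNum >> 16)})
	h.Write([]byte{byte(gen), byte(gen >> 8)})
	n := len(fileKey) + 5
	if n > 16 {
		n = 16
	}
	return h.Sum(nil)[:n]
}

// decryptStream runs only after parsing, because the key depends on the
// object and generation numbers discovered by the parser.
func decryptStream(fileKey []byte, obj rawObject) ([]byte, error) {
	c, err := rc4.NewCipher(objectKey(fileKey, obj.Num, obj.Gen))
	if err != nil {
		return nil, err
	}
	out := make([]byte, len(obj.Stream))
	c.XORKeyStream(out, obj.Stream)
	return out, nil
}

func main() {
	fileKey := []byte("12345") // placeholder 40-bit key, assumption for the demo
	plain := []byte("<< /Type /ObjStm /N 2 /First 10 >>")

	// RC4 is symmetric, so the same function simulates encryption here.
	enc, err := decryptStream(fileKey, rawObject{Num: 12, Gen: 0, Stream: plain})
	if err != nil {
		panic(err)
	}
	dec, err := decryptStream(fileKey, rawObject{Num: 12, Gen: 0, Stream: enc})
	if err != nil {
		panic(err)
	}
	fmt.Println("round trip ok:", bytes.Equal(dec, plain))
}
```

The relevant detail for object streams is that the objects packed inside one are not encrypted individually. The container stream is decrypted as a whole with the container's key, and its contents are then parsed as plaintext. Decrypting before knowing which object the bytes belong to, or decrypting the inner objects a second time, would corrupt the data.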

Primary use cases so far have been automated form processing, document comparison/versioning, and PDF handling in backend services where external dependencies are undesirable.

The project is open source: https://github.com/benedoc-inc/pdfer

Happy to hear feedback, especially from others who’ve had to deal with XFA or encrypted PDFs in production.