Have you tried using non-LLM based methods? Like starting with something rules-based and working through a layered multi-model setup?
That’s what we’ve been using for document extraction where precision matters (capital markets documents, medical assessments). We tried pure LLM extraction on medical documents, but the output was poor, and it felt like making it robust would take substantial investment.
I'm working on this exact problem with https://citellm.com .
Every extracted field comes with a precise citation back to the source document (page + snippet + bounding box + confidence score) so reviewers can verify where each value came from.
Hallucinations get flagged automatically when an extracted value has no supporting text in the source.
The goal is to make HITL fast, so reviewers don't have to read through the whole document.
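The verification idea above can be sketched roughly like this. This is a hypothetical illustration, not CiteLLM's actual API; the `Extraction` record and `verify` function are made up to show how a cited snippet can be checked against the source page, with unmatched snippets routed to a reviewer as likely hallucinations.

```python
# Hypothetical sketch of citation-backed verification (not a real API).
from dataclasses import dataclass

@dataclass
class Extraction:
    field: str
    value: str
    page: int        # cited page number (1-indexed)
    snippet: str     # source text the value was supposedly read from

def verify(extraction: Extraction, pages: list[str]) -> bool:
    """Return True if the cited snippet really appears on the cited page.

    A field whose snippet can't be found in the source is treated as a
    likely hallucination and flagged for human review.
    """
    if not (1 <= extraction.page <= len(pages)):
        return False
    return extraction.snippet in pages[extraction.page - 1]

pages = ["Plate: ABC-1234. Driver: J. Smith.", "Incident occurred at 14:05."]
ok = verify(Extraction("plate", "ABC-1234", 1, "Plate: ABC-1234"), pages)
bad = verify(Extraction("vin", "1HGCM82633A004352", 2, "VIN: 1HG"), pages)
# ok is True; bad is False, so that field gets flagged for review
```

The reviewer then only checks the flagged fields (or spot-checks the cited snippets) instead of reading the whole document.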
I have a project with them processing auto insurance claims, mostly extracting details from police reports: license plate numbers, details of the incident, and so on.
"Human in the loop doesn't help because the human would just have to read the document themselves to ensure accuracy, defeating the point of the automation."
They're doing it manually without it, and semi-auto beats manual readily. There are still checks, like submitting the plate number to pull the details of the individuals involved; if the names, vehicle type, etc. don't match, that automatically flags that something's off.
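That cross-check can be sketched as a simple field comparison. This is a hypothetical illustration under the assumption that a plate-number lookup returns a record with the same field names; the function and data are made up:

```python
# Hypothetical consistency check: compare fields extracted from the police
# report against the record returned by a plate-number lookup. Any mismatch
# flags the claim for human review.

def mismatched_fields(extracted: dict, registry: dict) -> list[str]:
    """Return the keys present in both records whose values disagree."""
    def norm(s: str) -> str:
        # Case/whitespace-insensitive comparison to avoid trivial mismatches.
        return " ".join(s.lower().split())
    return [k for k in extracted.keys() & registry.keys()
            if norm(extracted[k]) != norm(registry[k])]

extracted = {"name": "Jane Smith", "vehicle_type": "Sedan", "plate": "ABC-1234"}
registry  = {"name": "Jane Smith", "vehicle_type": "SUV",   "plate": "ABC-1234"}

flags = mismatched_fields(extracted, registry)
# flags == ["vehicle_type"] -> something's off, route to a reviewer
```

The point is the human only gets involved when the two sources disagree, which is much cheaper than re-reading every report.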
We are. But our use case is more tolerant of failures, so it's probably not as much of an issue.
How do you remediate failures?
We're using it at SummaryForge.