Have you tried using non-LLM based methods? Like starting with something rules-based and working through a layered multi-model setup?
That’s what we’ve been using for document extraction where precision matters (capital markets documents, medical assessments). We tried pure LLM extraction on medical documents, but the output was poor, and it felt like making it robust would take substantial investment.
I'm working on this exact problem with https://citellm.com .
Every extracted field comes with a precise citation back to the source document (page + snippet + bounding box + confidence score) so reviewers can verify where each value came from.
Hallucinations get flagged automatically when an extracted value has no supporting text in the source.
The goal is to make HITL fast, so reviewers don't have to read through the whole document.
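The verification idea above can be sketched roughly like this. This is a hypothetical illustration, not CiteLLM's actual API; the `Extraction` record and `verify` function are made up to show how a cited snippet can be checked against the source page, with unmatched snippets routed to a reviewer as likely hallucinations.

```python
# Hypothetical sketch of citation-backed verification (not a real API).
from dataclasses import dataclass

@dataclass
class Extraction:
    field: str
    value: str
    page: int        # cited page number (1-indexed)
    snippet: str     # source text the value was supposedly read from

def verify(extraction: Extraction, pages: list[str]) -> bool:
    """Return True if the cited snippet really appears on the cited page.

    A field whose snippet can't be found in the source is treated as a
    likely hallucination and flagged for human review.
    """
    if not (1 <= extraction.page <= len(pages)):
        return False
    return extraction.snippet in pages[extraction.page - 1]

pages = ["Plate: ABC-1234. Driver: J. Smith.", "Incident occurred at 14:05."]
ok = verify(Extraction("plate", "ABC-1234", 1, "Plate: ABC-1234"), pages)
bad = verify(Extraction("vin", "1HGCM82633A004352", 2, "VIN: 1HG"), pages)
# ok is True; bad is False, so that field gets flagged for review
```

The reviewer then only checks the flagged fields (or spot-checks the cited snippets) instead of reading the whole document.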
I have a project with them processing auto insurance claims, mostly extracting details from police reports: license plate numbers, details of the incident, and so on.
"Human in the loop doesn't help because the human would just have to read the document themselves to ensure accuracy, defeating the point of the automation."
They're doing it manually without it, and semi-auto beats manual readily. There are still checks, like submitting the plate number to pull the details of the individuals involved; if the names, vehicle type, etc. don't match, that automatically flags that something's off.
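That cross-check can be sketched as a simple field comparison. This is a hypothetical illustration under the assumption that a plate-number lookup returns a record with the same field names; the function and data are made up:

```python
# Hypothetical consistency check: compare fields extracted from the police
# report against the record returned by a plate-number lookup. Any mismatch
# flags the claim for human review.

def mismatched_fields(extracted: dict, registry: dict) -> list[str]:
    """Return the keys present in both records whose values disagree."""
    def norm(s: str) -> str:
        # Case/whitespace-insensitive comparison to avoid trivial mismatches.
        return " ".join(s.lower().split())
    return [k for k in extracted.keys() & registry.keys()
            if norm(extracted[k]) != norm(registry[k])]

extracted = {"name": "Jane Smith", "vehicle_type": "Sedan", "plate": "ABC-1234"}
registry  = {"name": "Jane Smith", "vehicle_type": "SUV",   "plate": "ABC-1234"}

flags = mismatched_fields(extracted, registry)
# flags == ["vehicle_type"] -> something's off, route to a reviewer
```

The point is the human only gets involved when the two sources disagree, which is much cheaper than re-reading every report.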
We are. But our use case is more tolerant of failures, so it's probably not as much of an issue.
How do you remediate failures?
We're using it at SummaryForge.