serjester
a day ago
This is just a company advertisement, not even one that’s well done. They didn’t benchmark any of the real leaders in the space (reducto, extend, etc) and left Gemini out of the first two tests, presumably because it was the best performer (while also being multiple orders of magnitude cheaper).
diptanu
a day ago
Hey! I am the founder of Tensorlake. We benchmarked the models that our customers consider using in enterprises or regulated industries where there is a big need for processing documents for various automation. Benchmarking takes a lot of time so we focussed on the ones that we get asked about.
On Gemini and other VLMs - we excluded these models because they don't do visual grounding - aka they don't provide page layouts, bounding boxes of elements on the pages. This is a table stakes feature for use-cases customers are building with Tensorlake. It wouldn't be possible to build citations without bounding boxes.
On pricing - we are probably the only company offer a pure on-demand pricing without any tiers. With Tensorlake, you can get back markdown from every page, summaries of figures, tables and charts, structured data, page classification, etc - in ONE api call. This means we are running a bunch of different models under the hood. If you add up the token count, and complexity of infrastructure to build a complex pipeline around Gemini, and other OCR/Layout detection model I bet the price you would end up with won't be any cheaper than what we provide :) Plus doing this at scale is very very complex - it requires building a lot of sophisticated infrastructure - another source of cost behind modern Document Ingestion services.
coderintherye
a day ago
Google's Vertex API for document processing absolutely does bounding boxes. In fact, some of the document processors are just a wrap around Google's product.
diptanu
a day ago
OP mentioned Gemini and not Google’s Vertex OCR API which has very different performance and accuracy characteristics than Gemini
ianhawes
a day ago
I just tested a non-English document and it rendered English text. Does your model not support anything other than English?
diptanu
a day ago
It does, we have users in Europe and Asia using it with non English languages. Can you please send me a message at diptanu at tensorlake dot ai, would love to see why it didn’t work.
hotpaper75
a day ago
Thanks for mentioning them, indeed their post seem to only surface a couple of names in the field and maybe not the most relevant ones.
JLO64
a day ago
Personally I use OpenAI models via the API for transcription of PDF files. Is there a big difference between them and Gemini models?