Yes, of course I review everything.
I treat it like hiring a consultant. They do a lot of work, but I still review the output before making a decision or passing it on.
Sending something with errors to my boss or peers makes me look stupid. Blaming it on unreviewed AI output makes me look even stupider.
Ever? More like always. Keeping humans in the loop is the current best practice. If you truly need to automate something that cannot afford a human checkpoint, use a deterministic solution for it, not an LLM.
We are not yet at the point of letting AI make business-critical decisions based on its own outputs. It's meant to serve as a decision-support system rather than a decision maker.
To minimize hallucinations, yes, AI should be set up for deterministic behaviour where the use case calls for it. In recruiting, for example, it should produce the same evaluation for the same candidate every time. Secondly, having another AI check for hallucinations can be a good starting point; assigning scores and penalizing the first AI can also lead to more grounded responses.
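One way to get "same evaluation for the same candidate every time" at the system level is to store the first evaluation and reuse it. This is a minimal sketch, not a way of making the model itself deterministic: `EvaluationCache` and the `evaluate` callable are hypothetical names, and the in-memory dict stands in for whatever storage you would actually use.

```python
import hashlib
import json
from typing import Callable


class EvaluationCache:
    """Returns the stored evaluation for a candidate that was seen before.

    Assumption: `evaluate` wraps your LLM call and may return a different
    answer each time it runs; the cache makes the system's answer repeatable.
    """

    def __init__(self, evaluate: Callable[[dict], str]) -> None:
        self._evaluate = evaluate
        self._store: dict[str, str] = {}

    def get(self, candidate: dict) -> str:
        # Serialize with sorted keys so field order does not change the hash.
        payload = json.dumps(candidate, sort_keys=True).encode("utf-8")
        key = hashlib.sha256(payload).hexdigest()
        if key not in self._store:
            self._store[key] = self._evaluate(candidate)
        return self._store[key]
```

The trade-off is that a bad first evaluation gets frozen too, so it still needs a human look before it is stored for good.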
In my opinion, the way this will play out is with a significant amount of validation and human oversight to fully utilize these LLMs. As you mentioned, I recommend giving the AI room for error and improving the experience of manually checking everything. Maybe create a tool to facilitate manually checking the output?
This is a valuable read: https://www.ufried.com/blog/ironies_of_ai_1/
Build validation layers, not trust. For structured outputs (invoices, emails), use JSON schemas + fact-checking prompts where a second AI call verifies critical fields against source data before you see it.
Real pattern: AI generates → automated validation catches type/format errors → second LLM does adversarial review ("check for hallucinated numbers/dates") → you review only flagged items + random samples. Turns "check everything" into "check exceptions" and cuts review time by 80%.
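A rough sketch of that pipeline, under stated assumptions: the invoice fields in `REQUIRED_FIELDS` are made up for illustration, and `grounding_check` is a plain string comparison standing in for the second LLM call, so treat it as a placeholder rather than the adversarial review itself.

```python
import random
from typing import Optional

# Hypothetical schema for an extracted invoice (illustration only).
REQUIRED_FIELDS = {"invoice_id": str, "total": float, "due_date": str}


def validate_format(item: dict) -> list[str]:
    """Deterministic layer: required fields exist and have the right type."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in item:
            problems.append(f"missing field: {name}")
        elif not isinstance(item[name], expected):
            problems.append(f"wrong type for {name}")
    return problems


def grounding_check(item: dict, source_text: str) -> list[str]:
    """Placeholder for the second LLM call: critical values must appear in the source."""
    problems = []
    total = item.get("total")
    if isinstance(total, float) and f"{total:.2f}" not in source_text:
        problems.append("total not found in source")
    due = item.get("due_date")
    if isinstance(due, str) and due not in source_text:
        problems.append("due_date not found in source")
    return problems


def triage(items: list[dict], sources: list[str], sample_rate: float = 0.1,
           rng: Optional[random.Random] = None) -> list[tuple[dict, list[str]]]:
    """Return the human review queue: every flagged item plus a random sample of clean ones."""
    rng = rng or random.Random()
    queue = []
    for item, source in zip(items, sources):
        flags = validate_format(item) + grounding_check(item, source)
        if flags:
            queue.append((item, flags))
        elif rng.random() < sample_rate:
            queue.append((item, ["random sample"]))
    return queue
```

The random sample is what tells you how much the automated layers are missing, so it is worth keeping `sample_rate` above zero even when few items get flagged.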
Also lets 50% of errors through
The new guys on my team do not check it. They already had problems checking their own work; AI is just amplifying the underlying human problem.
You have put your finger on why agent-assisted coding often doesn't suck, and other use cases of LLMs often do suck. Lint and the compiler get their licks in before you even smoke test the code. There aren't two layers of deterministic, algorithmic checking for your emails or invoices.
So before anyone concludes that coding agents prove that AI can be useful, find some use cases with similar characteristics.
I also don't trust LLMs, but I still find automations useful. Even with human-in-the-loop they save a bunch of time. Clicking "Approve & Send" is much quicker than manually writing out the email, and I just rewrite the 5% that contains hallucinations.
I have been building research automation with LangGraph for the past 2 months. We always put a human-in-the-loop checkpoint after each critical step. It might be annoying now, but I think it will save us long-term.
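For anyone who wants the general shape of a checkpoint after each critical step, here is a framework-free sketch. It is plain Python and deliberately does not use LangGraph's API; `run_with_checkpoints`, the step tuples, and the `approve` callback are hypothetical names for illustration.

```python
from typing import Callable

# (step name, step function, whether a human must approve its result)
Step = tuple[str, Callable[[dict], dict], bool]


def run_with_checkpoints(steps: list[Step], state: dict,
                         approve: Callable[[str, dict], bool]) -> dict:
    """Run steps in order and pause for approval after every critical one.

    Assumption: `approve` shows the state to a person and returns their decision.
    If they reject, the run stops and records where it halted.
    """
    for name, fn, critical in steps:
        state = fn(state)
        if critical and not approve(name, state):
            return {**state, "halted_at": name}
    return state
```

In a real setup `approve` would block on a UI or a message to a reviewer; here it is just a callable so the control flow is easy to see.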
You can't be 100% sure the AI won't hallucinate. If you don't want to manually check it, you can have a different AI check it and, if it finds something suspect, flag it for a human to verify. Even better, have 2 different AIs check the output and flag it if they don't agree.
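The two-checker idea boils down to a small routing rule. A minimal sketch, assuming each checker is a callable that returns True when the output looks grounded in the source: `numbers_grounded` is a regex stand-in for a real model call, and sending "both reject" to a human as well is my own choice, not something the pattern requires.

```python
import re
from typing import Callable

# (output, source) -> True if the output looks grounded in the source
Checker = Callable[[str, str], bool]


def numbers_grounded(output: str, source: str) -> bool:
    """Stand-in checker: every number in the output must also appear in the source."""
    pattern = r"\d+(?:\.\d+)?"
    return set(re.findall(pattern, output)) <= set(re.findall(pattern, source))


def route(output: str, source: str, checker_a: Checker, checker_b: Checker) -> str:
    """Auto-accept only when both checkers approve; anything else goes to a person."""
    a_ok = checker_a(output, source)
    b_ok = checker_b(output, source)
    if a_ok and b_ok:
        return "auto_accept"
    if a_ok != b_ok:
        return "human_review: checkers disagree"
    return "human_review: both checkers flagged it"
```

Two checkers built on the same model tend to share blind spots, so agreement between them is weaker evidence than it looks.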