Running a real consumer app on a 70B LLM at sub-cent cost per scan

by rs1996

We built a consumer app that does deep ingredient and health analysis (food, supplements, skincare, cat treats, etc.) using llama-3.3-70b in production.

Some numbers from the last month:

- 3.0M+ tokens processed
- ~$2.07 total inference cost
- ~0.5–0.6 cents per scan
- Median latency ~3s, typical range 3–5s
- Long prompts, structured outputs, ingredient-level caching (sketched below)
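
The post doesn't show the caching layer, so here's a minimal sketch of what ingredient-level caching can look like: analyses are keyed by normalized ingredient name, and only cache misses reach the model. The analyze_batch callable and the in-memory dict are illustrative assumptions, not the app's actual code; a Redis or SQLite store would work the same way.

```python
import hashlib

# Hypothetical cache keyed by normalized ingredient name; the post
# doesn't say what store is actually used.
_cache: dict[str, dict] = {}

def _cache_key(ingredient: str) -> str:
    """Normalize so 'Ascorbic Acid' and 'ascorbic  acid' share one entry."""
    normalized = " ".join(ingredient.lower().split())
    return hashlib.sha256(normalized.encode()).hexdigest()

def analyze_ingredients(ingredients: list[str], analyze_batch) -> dict[str, dict]:
    """Return an analysis per ingredient, hitting the LLM only on cache misses.

    `analyze_batch` is a hypothetical callable that takes a list of
    ingredient names and returns {name: analysis_dict} from the model.
    """
    results: dict[str, dict] = {}
    misses: list[str] = []

    for ing in ingredients:
        cached = _cache.get(_cache_key(ing))
        if cached is not None:
            results[ing] = cached
        else:
            misses.append(ing)

    if misses:
        # One model call covers every uncached ingredient in the scan.
        fresh = analyze_batch(misses)
        for ing, analysis in fresh.items():
            _cache[_cache_key(ing)] = analysis
            results[ing] = analysis

    return results
```

Since common ingredients repeat heavily across products, most scans end up paying for only a handful of novel ingredients, which is how per-scan cost can stay below a cent.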

This isn’t a demo or batch job — it’s a real latency-constrained mobile workload with thousands of active scanning users.

The main takeaway for us was that deep, high-quality inference can be surprisingly cheap and predictable if you design for it intentionally.
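
As a concrete illustration, here's a hedged sketch of the structured-output side, assuming an OpenAI-compatible provider hosting llama-3.3-70b. The endpoint URL, model id, and JSON schema are illustrative, not the app's actual setup; this could play the role of the analyze_batch callable used in the caching sketch above.

```python
import json
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-provider.com/v1",  # hypothetical provider endpoint
    api_key="...",
)

def analyze_batch(ingredients: list[str]) -> dict[str, dict]:
    """One structured call covering every uncached ingredient in a scan."""
    resp = client.chat.completions.create(
        model="llama-3.3-70b",  # exact model id varies by provider
        messages=[
            {
                "role": "system",
                "content": (
                    "You are an ingredient analyst. Respond with a JSON "
                    "object keyed by ingredient name, where each value "
                    "holds 'risk' and 'summary' fields."
                ),
            },
            {"role": "user", "content": json.dumps(ingredients)},
        ],
        # Many OpenAI-compatible providers accept json_object mode;
        # check your provider's docs.
        response_format={"type": "json_object"},
        temperature=0,  # deterministic answers are safer to cache and reuse
    )
    return json.loads(resp.choices[0].message.content)
```

Deterministic, schema-constrained responses are what make cached ingredient analyses safe to reuse across scans.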

Happy to answer questions or share more details if useful.