simonw
2 months ago
JustHTML https://github.com/EmilStenstrom/justhtml is a neat new Python library - it implements a compliant HTML5 parser in ~3,000 lines of code that passes the full existing 9,200-test HTML5 conformance suite.
Emil Stenström wrote it with a variety of coding agent tools over the course of a couple of months. It's a really interesting case study in using coding agents to take on a very challenging project, taking advantage of their ability to iterate against existing tests.
I wrote a bit more about it here: https://simonwillison.net/2025/Dec/14/justhtml/
EmilStenstrom
2 months ago
Thanks for sharing, Simon! Writing a parser is a really good job for a coding agent, because there's a clear right/wrong answer. In this case, the path there is the challenging part. The hours I've spent trying to convince agents to implement the adoption agency algorithm well... :)
msephton
2 months ago
RSS on website is erroring. I'd like to follow!
EmilStenstrom
2 months ago
Thanks! Now fixed.
gabrielsroka
2 months ago
> 3,000 loc
I cloned the repo and ran `wc -l` on the src directory and got closer to 9,500. Am I missing something?
Edit: maybe you meant just the parser
HPsquared
2 months ago
Better to use something like `cloc`, which excludes blank and comment lines.
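To illustrate why the two counts differ, here's a minimal Python sketch of that kind of count. It is not how `cloc` works internally, just a rough approximation: the function names `count_code_lines` and `count_tree` are made up for this example, and it only skips blank lines and lines that are purely `#` comments (docstrings and multi-line strings still count, which `cloc` handles more carefully).

```python
from pathlib import Path


def count_code_lines(source: str) -> int:
    """Count lines that are neither blank nor pure `#` comments.

    Hypothetical helper for illustration; a line with code followed by
    a trailing comment still counts as code.
    """
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            count += 1
    return count


def count_tree(root: str) -> int:
    """Sum count_code_lines over every .py file under root (recursive)."""
    return sum(
        count_code_lines(path.read_text(encoding="utf-8"))
        for path in Path(root).rglob("*.py")
    )
```

Calling `count_tree("src")` from the repo root would give a number comparable in spirit to `cloc`'s code column, whereas `wc -l` counts every line, including blanks and comments.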