simonw
8 hours ago
I think the most interesting thing about this is how it demonstrates that a very particular kind of project is now massively more feasible: library porting projects that can be executed against implementation-independent tests.
The big unlock here is https://github.com/html5lib/html5lib-tests - a collection of 9,000+ HTML5 parser tests stored in their own implementation-independent file format, e.g. this one: https://github.com/html5lib/html5lib-tests/blob/master/tree-...
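For anyone who hasn't read them, the tree-construction test files look roughly like this (paraphrased from memory, not copied from the repo) - an input document, the expected parse errors, and the expected DOM tree in a plain-text serialization that any implementation can compare against:

    #data
    <p>One<p>Two
    #errors
    (1,3): expected-doctype-but-got-start-tag
    #document
    | <html>
    |   <head>
    |   <body>
    |     <p>
    |       "One"
    |     <p>
    |       "Two"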
The Servo html5ever Rust codebase uses them. Emil's JustHTML Python library used them too. Now my JavaScript version gets to tap into the same collection.
This meant that I could set a coding agent loose to crunch away on porting that Python code to JavaScript and have it keep going until that enormous existing test suite passed.
Sadly conformance test suites like html5lib-tests aren't that common... but they do exist elsewhere. I think it would be interesting to collect as many of those as possible.
tracnar
an hour ago
If you're porting a library, you can use the original implementation as an 'oracle' for your tests: you only need a way to write or generate inputs, then verify that the output matches the original implementation.
It doesn't work for everything, of course, but it's a nice way to get bug-for-bug compatible rewrites.
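A minimal sketch of the idea in Python, using Hypothesis to generate the inputs - original_impl and ported_impl are toy stand-ins here, not any real library's API:

    # Toy stand-ins: in a real port, original_impl would call the reference
    # implementation and ported_impl the rewrite under test.
    from hypothesis import given, strategies as st

    def original_impl(s: str) -> str:
        return s.strip().lower()

    def ported_impl(s: str) -> str:
        return s.strip().lower()

    @given(st.text())
    def test_port_matches_oracle(s: str) -> None:
        # The oracle supplies the expected output, so any divergence
        # (including bug-for-bug differences) fails the test.
        assert ported_impl(s) == original_impl(s)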
gwking
7 hours ago
I’ve idly wondered about this sort of thing quite a bit. The next step would seem to be taking a project’s implementation-dependent tests, converting them to an independent format, verifying that format against the original project, and then conducting the port.
pbowyer
37 minutes ago
I think I've asked this before on HN, but is there a language-independent test format? There are multiple libraries (date/time manipulation is a good example) where the tests should be the same across all languages, yet every library has developed its own test suite.
Having a standard test input/output format would let test definitions be shared between libraries.
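Even plain JSON fixtures would get most of the way there. A sketch of what I mean, with a completely made-up schema - any language's test runner could consume the same file:

    # Hypothetical fixture schema: {"input": {...}, "expected": ...} per case.
    # The JSON file itself is what would be shared across language ports.
    import json
    from datetime import datetime, timedelta

    FIXTURES = json.loads("""
    [
      {"input": {"date": "2024-02-28", "add_days": 1}, "expected": "2024-02-29"},
      {"input": {"date": "2023-02-28", "add_days": 1}, "expected": "2023-03-01"}
    ]
    """)

    for case in FIXTURES:
        d = datetime.fromisoformat(case["input"]["date"])
        got = (d + timedelta(days=case["input"]["add_days"])).date().isoformat()
        assert got == case["expected"], case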
skissane
4 hours ago
Give a coding agent some software. Ask it to write tests that maximise code coverage (source coverage if you have source code; if not, binary coverage). Consider using concolic fuzzing. Then give another agent the generated test suite and ask it to write an implementation that passes. Automated software cloning. I wonder what results you might get?
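A rough sketch of the first half in Python, using coverage.py to decide which generated inputs are worth keeping - original_parse is a toy stand-in, and random strings are a deliberately naive generation strategy (a real attempt would want a proper fuzzer):

    import random

    import coverage  # pip install coverage

    def original_parse(s: str) -> str:
        # Toy stand-in for the software being cloned.
        if not s:
            return "empty"
        if s.startswith("<"):
            return "tag"
        return "text"

    cov = coverage.Coverage()
    kept = []        # (input, expected output) pairs for the clone's suite
    covered = set()  # (file, line) pairs exercised so far

    for _ in range(500):
        candidate = "".join(random.choices("<>ab ", k=random.randint(0, 6)))
        cov.erase()
        cov.start()
        result = original_parse(candidate)
        cov.stop()
        data = cov.get_data()
        lines = set()
        for filename in data.measured_files():
            for line in data.lines(filename) or []:
                lines.add((filename, line))
        if lines - covered:  # keep only inputs that reach new lines
            covered |= lines
            kept.append((candidate, result))

    print(f"{len(kept)} kept cases covering {len(covered)} lines")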
gaigalas
an hour ago
> Ask it to write tests that maximise code coverage
That is significantly harder to do than writing an implementation from tests, especially for codebases that previously didn't have any testing infrastructure.
skissane
5 minutes ago
Give a coding agent a code base with no tests and tell it to write some, and it will; if you don’t tell it which framework to use, it will just pick one. There's no denying you’ll get much better results if an experienced developer provides it with some prompting on how to test than if you just let it decide for itself.
cr125rider
5 hours ago
I’ve got to imagine it would be very hard for a suite of end-to-end tests (the most common pattern is probably fixture file in, assert against an output fixture file) to nail all of the possible branches and paths. As the example here shows, thousands of well-made tests are required.
aadishv
6 hours ago
I wonder if this makes AI models particularly well-suited to ML tasks, or at least ML implementation tasks, where you are given a target architecture and dataset and have to implement and train the given architecture on the given dataset. There are strong signals to the model, such as loss, which are essentially a slightly less restricted version of "tests".
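As a toy illustration of loss-as-a-test (the data, model, and threshold are all arbitrary): a hand-rolled gradient-descent fit whose final loss is asserted like a unit test, which is exactly the kind of loop an agent could iterate on:

    # Fit y = 2x + 1 with plain gradient descent; pure Python, toy data.
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [1.0, 3.0, 5.0, 7.0]

    w, b, lr = 0.0, 0.0, 0.01
    for _ in range(5000):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad_w
        b -= lr * grad_b

    mse = sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    # The "test": an agent can tweak lr/steps and rerun until this passes.
    assert mse < 1e-3, f"final loss {mse:.6f} too high"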
simonw
6 hours ago
I'm certain this is the case. Iterating on ML models can actually be pretty tedious - lots of different parameters to try out, then you have to wait a bunch, then exercise the models, then change parameters and try again.
Coding agents are fantastic at these kinds of loops.
montroser
4 hours ago
We've been doing this at work a bunch with great success. The most impressive moment for me was when the model we were training showed a type of overfitting, and rather than just claiming victory (as it all too often does), this time Claude went and added a bunch more robust, human-grade examples to our training data and hold-out set, and kept iterating until the model effectively learned the actual crux of what we were trying to teach it.
heavyset_go
7 hours ago
This is one of the reasons I'm keeping tests to myself for a current project. Usually I release libraries as open source, but I've been rethinking that, as well.
simonw
7 hours ago
Oddly enough my conclusion is the opposite: I should invest more of my open source development work in creating language-independent test suites, because they can be used to quickly create all sorts of useful follow-on projects.
heavyset_go
5 hours ago
I'm not that generous with my time lol
cortesoft
5 hours ago
Isn't the point that you might be one of the people who benefits from one of those follow-on projects? That is kind of the whole point of open source.
Why are you making your stuff open source in the first place if you don't want other people to build off of it?
heavyset_go
4 hours ago
> Why are you making your stuff open source in the first place if you don't want other people to build off of it?
Because I enjoy the craft. I will enjoy it less if I know I'm being ripped off, likely for profit, hence my deliberate choices of licenses, what gets released and what gets siloed.
I'm happy if someone builds off of my work, as long as it's on my own terms.
bgwalter
5 hours ago
Open source has three main purposes, in decreasing order of importance:
1) Ensuring that there is no malicious code and enabling you to build it yourself.
2) Making modifications for yourself (Stallman's printer is the famous example).
3) Using other people's code in your own projects.
Item 3) is wildly over-propagandized as the sole reason for open source. Hard forks have traditionally led to massive flame wars.
We are now being told by corporations and their "AI" shills that we should diligently publish everything for free so the IP thieves can profit more easily. There is no reason to oblige them. Hiding test suites in order to make translations more difficult is a great first step.
inejge
2 hours ago
> Hard forks have traditionally led to massive flame wars.
Provided that the project is popular and has a community, especially a contributor community (the two don't have to go together). Most projects aren't that prominent.
visarga
3 hours ago
I think the only non-slop parts of the web are: open source, Wikipedia, arXiv, some game worlds, and social network comments in well-behaved/moderated communities. What do they have in common? They all allow building on top, they are social-first, and people come together for interaction and collaboration.
The rest is enshittified web, focused on attention grabbing, retention dark patterns and misinformation. They all exist to make a profit off our backs.
A pattern I see is that we moved on from passive consumption and now want interactivity, sociality and reuse. We like to create together.
cies
8 hours ago
This is an interesting case. It may be good to feed it to other models and see how they do.
Also: it may be interesting to port it to other languages too and see how they do.
JS and Py are both runtime-typed and very well "spoken" by LLMs. Other languages may require a lot more "work" (data types, etc.) to get the port done.