
A new era for software testing

87 points, posted 4 days ago

26 Comments

rglover

3 hours ago

> I have the feeling that the introduction of automatic QA may raise the bar of quality for new releases of software, and maybe partially compensate for the lower quality of the code produced at high speed with the use of automatic programming.

In theory. The only difference between today and "the aughts" is that we have machines that can spit out a ton of code very quickly.

Nothing has changed about the discipline or honesty around testing (you can skip automated tests even faster now if you wish). You can and should work with AI to write tests, but you have to know the difference between a good test and a "looks good on paper" test in order for it to truly be effective and raise the quality of what you're building.

mlmonkey

3 hours ago

Writing unit tests used to be the bane of my existence. I used to hate them. Oftentimes, the LoC for unit tests was 3X the LoC of the actual code.

But not any more! Now I point the LLM to the code and order it to write unit tests, covering all edge cases, etc. I'd rather spend 3 hours arguing with the LLM than writing unit tests! :-D

dkn

2 hours ago

I am curious in your experience how often the LLM must also update the tests. I find that if LLMs write tests after the implementation exists, they are either extremely brittle because they are coupled to the implementation, or they cover little of value because they mock everything to the point of testing nothing.
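To make the "mock everything" failure mode concrete, here is a minimal Python sketch. Every name in it (`total_price`, `FlatTax`, `broken_total_price`) is hypothetical and invented for illustration. The over-mocked test only checks that a collaborator was called, so it keeps passing against a deliberately broken implementation; the behavior test asserts the returned total and catches it.

```python
from unittest.mock import Mock


def total_price(cart, tax_service):
    """Hypothetical production code: subtotal plus tax from a collaborator."""
    subtotal = sum(item["price"] * item["qty"] for item in cart)
    return subtotal + tax_service.tax_for(subtotal)


def broken_total_price(cart, tax_service):
    """Deliberately wrong variant: calls the tax service but ignores the cart."""
    tax_service.tax_for(0)
    return 0


def over_mocked_test(fn):
    # Coupled to the implementation: only checks that a call happened,
    # never what the function returned.
    tax = Mock()
    tax.tax_for.return_value = 0
    fn([{"price": 10, "qty": 2}], tax)
    tax.tax_for.assert_called_once()


class FlatTax:
    """Hand-written fake (an assumption for this sketch): flat 10% tax."""

    def tax_for(self, subtotal):
        return subtotal * 10 // 100


def behavior_test(fn):
    # Asserts the observable result, so internals can change freely.
    assert fn([{"price": 10, "qty": 2}], FlatTax()) == 22
```

Both tests pass against `total_price`, but only `behavior_test` fails against `broken_total_price`.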

mplanchard

25 minutes ago

I have found a decent trick to be to write a parameterized test with e.g. a `cases` array that tests a function how you want it tested. Then ask the LLM to fill out more cases. It’s not perfect, but results in much less brittleness since you’ve already defined the specifics of what gets tested and what doesn’t.
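A minimal sketch of that trick in Python, assuming a hypothetical `slugify` function as the code under test. The `CASES` rows and the `run_cases` helper are also made up for illustration; the point is only that the human fixes the shape of the test and the LLM is asked to append rows.

```python
import re


def slugify(title):
    """Hypothetical function under test: lowercase, hyphen-separated slug."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


# Each case is (input, expected). The first few rows are written by hand to
# pin down what matters; the LLM is asked only to add more rows.
CASES = [
    ("Hello World", "hello-world"),
    ("  leading and trailing  ", "leading-and-trailing"),
    ("Already-slugged", "already-slugged"),
    ("Symbols & punctuation!", "symbols-punctuation"),
    ("", ""),
]


def run_cases(fn, cases):
    """Return every failing case as (input, expected, actual)."""
    failures = []
    for given, expected in cases:
        actual = fn(given)
        if actual != expected:
            failures.append((given, expected, actual))
    return failures


if __name__ == "__main__":
    assert run_cases(slugify, CASES) == []
```

With pytest, the same list could be passed to `@pytest.mark.parametrize` instead of a hand-rolled loop.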

dcastm

an hour ago

Same for me. I actively ask the LLM to write as few tests as possible. Otherwise you end up with redundant, low-value tests.

dkn

43 minutes ago

Yep, and wasted token spend on an ongoing basis.

I instruct the LLM to follow TDD practices in certain areas, but otherwise prioritize integration style tests at the edges.

avensec

an hour ago

> The idea is to create a markdown file where an AI agent is asked to work as a QA engineer

Given your code-base is mature enough, please don't have a single Skill/Steering/Persona/Ruleset (or whatever) for your "QA Engineer." This is just the same "my behavioral file can one-shot the entire system build" kind of thinking that will give you expensive, marginal results as the system grows.

If you want to have success in this space, get really fine-grained. Every single test scope needs its own behavioral files.

Have your core behavioral file define some simple specifics around Test Pyramid, Test Purposes, checks for tautological tests, etc. Then get _really_ specific:

<test-type>-architect (plan)

<test-type>-engineer (execute)

<test-type>-resolver (problem solver, maintenance, how to manage a failure, etc.)

e.g., playwright-architect, etc.

Then create additional ones for Unit tests, API tests, contract tests, or any other required test layer for the SUT.

Overengineered? Maybe, given the size of your codebase. But for anything significant, you are codifying what humans and their skillsets do.

simianwords

4 days ago

Scenario testing is the new word for it and I think this is a game changer.

Two of the reasons I never liked writing tests are

- they didn’t seem to usually assert much internal logic

- they would have to be maintained along with the original code

I think scenario testing is much better instead because the actual way a person uses a feature hardly changes but the internals might change a lot.

So imagine I’m making an e-commerce website. There are lots of internal mechanisms. I’ll have an agent testing all the functionalities as if it were a customer. This gives me much, much more confidence while writing code because it is more uncorrelated with the code.

Tomorrow I can change a lot of internals but the testing agent stays the same.

There’s something to note though: not all code can be scenario tested, e.g. data engineering and other areas where the feedback time is huge.
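For illustration, the e-commerce example might look roughly like the Python sketch below. This is a scripted, deterministic stand-in for what the testing agent would do, not an agent itself, and the `Shop` class and its methods are hypothetical. What it shows is the boundary: the scenario touches only the public interface, so the internals of `Shop` can be rewritten without changing the test.

```python
class Shop:
    """Hypothetical in-memory shop; stands in for the real site."""

    def __init__(self, catalog):
        self._catalog = dict(catalog)  # item name -> unit price in cents
        self._cart = {}                # item name -> quantity
        self.orders = []

    def add_to_cart(self, name, qty=1):
        if name not in self._catalog:
            raise KeyError(name)
        self._cart[name] = self._cart.get(name, 0) + qty

    def cart_total(self):
        return sum(self._catalog[n] * q for n, q in self._cart.items())

    def checkout(self):
        if not self._cart:
            raise ValueError("cart is empty")
        order = {"items": dict(self._cart), "total": self.cart_total()}
        self.orders.append(order)
        self._cart = {}
        return order


def scenario_customer_buys_two_items(shop):
    # Drives only the public interface, the way a customer would.
    shop.add_to_cart("mug", 2)
    shop.add_to_cart("poster")
    assert shop.cart_total() == 2 * 900 + 1500
    order = shop.checkout()
    assert order["total"] == 3300
    assert shop.cart_total() == 0  # cart is emptied after checkout
    assert len(shop.orders) == 1


if __name__ == "__main__":
    scenario_customer_buys_two_items(Shop({"mug": 900, "poster": 1500}))
```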

anthonypasq

3 hours ago

are we just re-inventing playwright tests except 10x slower and infinity times more expensive?

i feel like im going insane

righthand

4 minutes ago

Well playwright tests used to be called puppeteer tests which used to be called selenium tests, so you tell me.

hugs

3 hours ago

since the rise of agentic coding tools, it feels like we're in a new "eternal september" of people discovering ui end-to-end test automation.

acdha

3 hours ago

Also the merits of documentation and specs. It’s been eye-opening to see the subset of developers who were almost disdainful about writing documentation for their colleagues but are now tripping over themselves to do so for their clanker.

simianwords

40 minutes ago

Clanker is the new excuse to use hard R against something you don't like.

inigyou

3 hours ago

People are rediscovering everything. Some people have proposed using a more formal language to tell the AI precisely what code to write. That's a compiler.

righthand

7 minutes ago

This already exists. You mean capturing user flows, which should already be supplied by product to the developer. A decent system is Behavior Driven Development (though honestly a poor acronym for its use).

avensec

an hour ago

So, throw out the traditional test pyramid, shift right, and rely more on persona testing than fine-grained atomic tests? I would hope teams don't need to re-learn that lesson for themselves, but...

konart

3 hours ago

> Scenario testing is the new word

How is scenario different from a behavior (as in Behavior-Driven Development)?

Gherkin and things like Cucumber are not something new, are they?

hulitu

3 days ago

> Two of the reasons I never liked writing tests

Are you an engineer? You must test your "creation". Or would you expect the microwave oven you just bought to be tested by your child while getting burned?

robotresearcher

4 hours ago

'I never liked writing tests' is not the same as 'I don't write tests'.

marshalhq

2 hours ago

I ran mutation testing on a side project recently and found a test that passed even if the production method returned an empty string. AI-generated tests at scale will have exactly this problem. High coverage, confident test names, zero actual verification.
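A hand-rolled Python sketch of that situation, with all names (`greeting`, `greeting_mutant`, `weak_test`, `strong_test`, `survives`) invented for illustration. A real mutation testing tool generates and runs the mutants automatically; here a single empty-string mutant is written by hand to show why the weak test gives coverage without verification.

```python
def greeting(name):
    """Hypothetical production method."""
    return f"Hello, {name}!"


def greeting_mutant(name):
    """Hand-written mutant: the method now returns an empty string."""
    return ""


def weak_test(fn):
    # Executes the code (so it counts toward coverage) but verifies
    # almost nothing: any string, including "", passes.
    result = fn("Ada")
    assert isinstance(result, str)


def strong_test(fn):
    # Pins the actual output, so the empty-string mutant is killed.
    assert fn("Ada") == "Hello, Ada!"


def survives(test, mutant):
    """True if the test still passes against the mutant (a bad sign)."""
    try:
        test(mutant)
    except AssertionError:
        return False
    return True
```

Here `survives(weak_test, greeting_mutant)` is true while `survives(strong_test, greeting_mutant)` is false, which is the signal mutation testing surfaces.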

onemoresoop

an hour ago

Don't worry, AI maximalists have a solution: create tests for the tests.

wesselbindt

an hour ago

The idea of injecting more indeterminacy in pipelines is beyond me.

devin

an hour ago

Well you see, you just run the same test 10,000 times, and then...

wrxd

4 days ago

I believe this can work if done on top of traditional testing. I would feel very uneasy replacing deterministic (OK, not always, but mostly) test suites with something that is not deterministic at all.

simianwords

4 days ago

I think this is just TDD or unit test dogma and I’m personally not a fan.

Unit tests and deterministic tests are hard to get right and need to be done at the correct boundary.

I have seen many people push unit tests dogmatically, but this often leads to very hard-to-maintain tests that mostly exist just to change along with the main code itself.

A good way to understand if your unit tests are good: are you changing them along with changing your actual code? Then it’s a bad test. I think the argument for “it’s just documentation” is weak.

fcarraldo

4 days ago

I don’t disagree with your point, but there is still value in having unit tests that change along with the code. It’s less than a “proper” test, but when these tests break _unexpectedly_, it’s still more signal than you’d have without them. Like, always changing `file.go` alongside `file_test.go` may be acceptable if you catch errors that impact `serve_test.go` unexpectedly.

Of course, if you’re just watching Claude changing both and saying “LGTM” then it’s not very valuable.