Delete tests

111 points, posted 3 days ago
by mooreds

146 Comments

recursivedoubts

20 hours ago

One of the most important things you can do is move your tests up the abstraction layers and away from unit tests. For lack of a better term, to move to integration tests. End-to-end tests are often too far from the system to easily understand what's wrong when they break, and can overwhelm a development org. Integration tests (or whatever you want to call them) are often the sweet spot: not tied to a particular implementation, able to survive fairly significant system changes, but also easy enough to debug when they break.

https://grugbrain.dev/#grug-on-testing

BinaryIgor

16 hours ago

Was exactly about to point that out - if you mostly have integration tests (aka in-between tests), you rarely need to refactor your tests. It's about testing mostly at the right abstraction level: https://binaryigor.com/unit-integration-e2e-contract-x-tests...

RaftPeople

2 hours ago

Agree.

I don't think people realize the stats based on studies:

Unit testing catches about 30% of bugs

Visual code inspection catches about 70%

End to end testing also catches about 70%

All are important, but emphasis should be on the more effective methods.

> We’ve had decades of thought leadership around testing

I really disagree that the industry has had thought leadership here. I think we've had people pushing automated unit testing very hard when end-to-end testing is more effective. I don't think that position was based on facts; it was more a few people's opinions.

RHSeeger

18 hours ago

Integration tests and Unit tests are different tools; and each has their place and purpose. Using one "instead" of the other is a mistake.

MrJohz

17 hours ago

I've never really found this to be the case in practice. When I look at well-written unit tests and well-written integration tests, they're usually doing exactly the same sort of thing and have very similar concerns in terms of code organisation and test structure.

For example, in both cases, the tests work best if I test the subject under test as a black box (i.e. interact only with its public interface) but use my knowledge of its internals to identify the weaknesses that will most require testing. In both cases, I want to structure the code so that the subject under test is as isolated as possible - i.e. no complex interactions with global state, no mocking of unrelated modules, and no complex mechanism to reset anything after the test is done. In both cases, I want the test to run fast, ideally instantaneously, so I get immediate results.

The biggest difference is that it's usually harder to write good integration tests because they're interacting with external systems that are generally slower and stateful, so I've got to put extra work into getting the tests themselves to be fast and stateless. But when that works, there's really not much difference at all between a test that tests a single function, and a test that tests a service class with a database dependency.
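A minimal sketch of that point, with hypothetical names: a test of a single function and a test of a service class with an injected store end up looking structurally identical when both go through the public interface with fresh, isolated state.

```python
# Hypothetical sketch: a "unit" test and an "integration-style" test
# written the same way -- black box, fast, stateless.

def normalize_email(raw: str) -> str:
    """Pure function under test."""
    return raw.strip().lower()

class UserService:
    """Service under test; depends only on an injected store."""
    def __init__(self, store: dict):
        self._store = store

    def register(self, email: str) -> bool:
        email = normalize_email(email)
        if email in self._store:
            return False
        self._store[email] = {"email": email}
        return True

# Both tests interact only with the public interface.
def test_normalize_email():
    assert normalize_email("  Bob@Example.COM ") == "bob@example.com"

def test_register_rejects_duplicates():
    service = UserService(store={})  # fresh state per test: stateless
    assert service.register("Bob@Example.com") is True
    assert service.register("bob@example.com") is False

test_normalize_email()
test_register_rejects_duplicates()
```

The only difference in practice is what sits behind the injected dependency; the test's shape doesn't change.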

rkomorn

16 hours ago

I've found that well-written unit tests help me narrow down problems faster during development (eg one unit test failing for a function would show that a change or refactor missed an edge case).

I've found that well-written integration tests help me catch workflow-level issues (eg something changed in a dependency that might be mocked in unit tests).

So while I think good integration tests are the best way to make sure things should ship, I see a lot of value in good unit tests for day-to-day velocity, particularly in code that's being maintained or updated instead of new code.

CuriouslyC

15 hours ago

Unit tests are good for testing isolated units of code, integration tests test integration. If you wait until you have enough code to test integration, when you actually write the tests you're going to find you've checked in a bunch of almost-working code.

ffsm8

13 hours ago

Let's say you have a function that's being called to compute a state using hundreds of attributes spread across tens of different objects, with various levels of nesting.

Now, you could create hundreds of different integration tests for each branch of the computation..., most of which will assert the same final output state, but achieved through different transitions

Or you can make some integration tests which make sure the logic itself is being called, and then only unittest the specific criteria in isolation.

What you're talking about is likely founded in either frontend testing (component tests vs unit tests) or backends with generally pretty trivial logic complexity. In those cases, just doing an integration test gets it done for the most part, but as soon as you have multiple stakeholders giving you separate requirements and the consumed inputs get bigger and multiply, testing via integration tests becomes essentially impossible in practice.
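The split being described might look like this (all names illustrative): one coarse test verifies the computation is actually wired into the workflow, while cheap isolated tests cover its branches without going through the whole flow each time.

```python
# Sketch of the split above: one coarse test that the computation is
# wired in at all, plus isolated tests for each branch of the logic.

def risk_level(age: int, balance: int, flagged: bool) -> str:
    """Pure state computation with several branches (a small stand-in
    for the 'hundreds of attributes' case)."""
    if flagged:
        return "high"
    if balance < 0:
        return "high"
    if age < 18:
        return "medium"
    return "low"

def review_account(account: dict) -> dict:
    """Workflow that *uses* the computation."""
    account["risk"] = risk_level(account["age"], account["balance"],
                                 account["flagged"])
    return account

# One integration-style test: the logic is actually called and wired up.
def test_review_account_sets_risk():
    out = review_account({"age": 30, "balance": 100, "flagged": False})
    assert out["risk"] == "low"

# Many cheap unit tests: every branch of the computation, in isolation.
def test_risk_branches():
    assert risk_level(30, 100, True) == "high"
    assert risk_level(30, -1, False) == "high"
    assert risk_level(16, 100, False) == "medium"
    assert risk_level(30, 100, False) == "low"

test_review_account_sets_risk()
test_risk_branches()
```

With hundreds of real attributes, the branch tests multiply while the integration tests stay few and coarse.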

chamomeal

12 hours ago

I feel like that’s where property based testing comes in. Quickcheck style libraries.

I only recently started looking into Quickcheck style libraries in the typescript world, and fast-check is fantastic. Like super high quality. Great support for shrinking in all sorts of cases, very well typed, etc.

Hooking fast-check up to a real database/redis instance has been incredible for finding bugs. Pair it up with some regular ol case by case integration tests for some seriously robust typescript!
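fast-check is TypeScript, but the quickcheck idea itself needs no library; a hand-rolled sketch in stdlib Python (without fast-check's shrinking) shows the shape: generate random inputs, assert properties that must hold for all of them.

```python
import random

# Quickcheck-style property testing, hand-rolled: generate random
# inputs and check invariants, rather than enumerating cases by hand.

def dedupe_preserving_order(items):
    """Function under test."""
    seen, out = set(), []
    for x in items:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

def check_property(prop, gen, runs=200):
    for _ in range(runs):
        case = gen()
        assert prop(case), f"property failed for input {case!r}"

rng = random.Random(42)  # seeded for reproducible runs
def gen_list():
    return [rng.randint(0, 9) for _ in range(rng.randint(0, 20))]

# Property 1: output contains no duplicates.
check_property(lambda xs: len(set(dedupe_preserving_order(xs)))
                          == len(dedupe_preserving_order(xs)), gen_list)
# Property 2: output keeps exactly the input's set of elements.
check_property(lambda xs: set(dedupe_preserving_order(xs)) == set(xs),
               gen_list)
```

Real libraries (fast-check, Hypothesis) add the crucial shrinking step the comment praises: minimizing a failing input to its simplest form.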

RHSeeger

16 hours ago

I'll go with a bank account, because that was one of the initial examples for automated testing.

I would write integration/system tests (different, but similar, imo) to test that the black box integrations with the system work as expected. Generally closer to the "user story" end of things.

I would write unit tests for smaller, targeted things. Like making sure the sort method works in various cases, etc. Individual methods, especially ones that don't interact with data outside what is passed into them (functional methods), are good for unit testing.

9rx

16 hours ago

> to test that the black box integrations with the system work as expected. Generally closer to the "user story" end of things.

This is what unit testing was originally described as. Which confirms my belief that unit testing and integration testing have always been the very same thing.

> Individual methods, especially ones that don't interact with data outside what is passed into them (functional methods), are good for unit testing.

Perhaps unit testing has come to mean this, but these kinds of tests are rarely ever worth writing, so it is questionable if it even needs a name. Sometimes it can be helpful to isolate a function like that for the sake of pinning down complex logic or edge cases, but it is likely you'll want to delete that kind of test once you're done. This is where testing brittleness is born.

MrJohz

8 hours ago

I find if you figure out the right unit boundaries, and find a good way of testing the code, you can often keep the tests around long-term, and they'll be very stable. Even when you update the code you're testing, if the tests are well-written, updating the tests is often just a case of running a find-and-replace job.

That said, I think it takes a real knack to figure out the right sort of tests, and it sometimes takes me a couple of attempts to get it right. In that case, being willing to delete or completely rewrite tests that just aren't being useful is important!

RHSeeger

15 hours ago

I've described this before on occasion; I consider there to be a wide variety of tests.

- Unit test = my code works

- Functional test = my design works

- Integration test = my code is using your 3rd party stuff correctly (databases, etc)

- Factory Acceptance Test = my system works

- Site Acceptance Test = your code sucks, this totally isn't what I asked for!?!

Then there's more "concern oriented" groupings, like "regression tests", which could fall into any number of the above.

That being said, there's a pretty wide set of opinions on the topic, and that doesn't really seem to change over time.

> these kinds of tests are rarely ever worth writing

I strongly disagree. I find it very helpful to write unit tests for specific implementations of things (like a specific sort, to make sure it works correctly with the various edge cases). Do they get discarded if you completely change the implementation? Sure. But that doesn't detract from the fact that they help make sure the current implementation works the way I say it does.

9rx

15 hours ago

> I find it very helpful to write unit tests for specific implementations of things (like a specific sort, to make sure it works correctly with the various edge cases).

Sorting mightn't be the greatest example as sorting could quite reasonably be the entire program (i.e. a library).

But if you needed some kind of custom sort function to serve features within a greater application, you are already going to know that your sort function works correctly by virtue of the greater application working correctly. Testing the sort function in isolation is ultimately pointless.

As before, there may be some benefit in writing code to run that sort function in isolation during development to help pinpoint what edge cases need to be considered, but there isn't any real value in keeping that around after development is done. The edge cases you discovered need to be moved up in the abstraction to the greater program anyway.

MrJohz

8 hours ago

It's very often easier to trigger edge cases when just testing a smaller part of a system than when testing the whole system. Moreover, you'll probably write more useful tests if you write them knowing what's going on in the code. In these cases, colocating the tests with the thing they're meant to be testing is really useful.

I find the problem with trying to move the tests up a level of abstraction is that eventually the code you're writing is probably going to change, and the tests that were useful for development the first time round will probably continue to be useful the second time round as well. So keeping them in place, even if they're really implementation-specific, is useful for as long as that implementation exists. (Of course, if the implementation changes for one with different edge cases, then you should probably get rid of the tests that were only useful for the old implementation.)

Importantly, this only works if the boundaries of the unit are fairly well-defined. If you're implementing a whole new sort algorithm, that's probably the case. But if I was just writing a function that compares two operands, that could be passed to a built-in sort function, I might look to see if there's a better level of abstraction to test at, because I can imagine the use of that compare function being something that changes a lot during refactorings.

9rx

8 hours ago

> eventually the code you're writing is probably going to change

Ideally your units/integrations will never change. If they do change, that means the users of your code will face breakage and that's not good citizenry. Life is messy and sometimes you have little choice, but such changes should be as rare as possible.

What is actually likely to change is the little helper functions you create to support the units, like said bespoke sort function. This is where testing can quickly make code fragile and is ultimately unnecessary. If the sort function is more useful than just a helper then you will move it out into its own library and, like before, the sort function will become the entire program and thus the full integration.

MrJohz

4 hours ago

The interface ideally doesn't change, but the implementation probably will. And most of the units you're writing are probably internal-facing, which means that even if the interface does change, fixing that is just an internal refactoring change - with types and a good IDE, it's often just a couple of key presses away.

I think this is what you're saying about moving useful units out into their own library. I agree, and I think it sounds like we'd draw the testing boundaries in similar places, but I don't think it's necessary to move these sorts of units into separate libraries for them to be isolated modules that can be usefully tested.

The sort function is one of the edge cases where how I'd test it would probably depend a lot on the context, but in theory a generic sort function has a very standard interface that I wouldn't expect to change much, if at all. So I'd be quite happy treating it as a unit in its own right and writing a bunch of tests for it. But if it's something really implementation-specific that depends on the exact structure of the thing it's sorting, then it's probably better tested in context. But I'm quite willing to write tests for little helper functions that I'm sure will be quite stable.

9rx

4 hours ago

> The interface ideally doesn't change

The whole of the interface is the unit, as Beck originally defined it. As it is the integration point. Hence why there is no difference between them.

> And most of the units you're writing are probably internal-facing

No. As before, it is a mistake to test internal functions. They are just an implementation detail. I understand that some have taken unit test to mean this, but I posit that as it is foolish to do it, there is no need to talk about it, allowing unit test to refer to its original and much more sensible definition. It only serves to confuse people into writing useless, brittle tests.

> So I'd be quite happy treating it as a unit in its own right

Right, and, likewise, you'd put it in its own package in its own right so that it is available to all sort cases you have. Thus, it is really its own program — and thus would have its own tests.

MrJohz

2 hours ago

> Right, and, likewise, you'd put it in its own package in its own right so that it is available to all sort cases you have. Thus, it is really its own program — and thus would have its own tests.

Sure, yeah, I think we're saying the same thing. A unit is a chunk of code that can act as its own program or library - it has an interface that will remain fairly fixed, and an implementation that could change over time. (Or, a unit is the interface that contains this chunk of code - I don't think the difference between these two definitions is so important here.) You could pull it out into its own library, or you can keep it as a module/file/class/function in a larger piece of software, but it is a self-contained unit.

I think the important thing that I was trying to get across earlier, though, is that this unit can contain other units. At the most maximal scale, the entire application is a single unit made up of multiple sub-units. This is why I think a definition of unit/integration test that is based on whether a unit integrates other units doesn't really make much sense, because it doesn't actually change how you test the code. You still want quick, isolated tests, you still want to test the interface and not the internals (although you should be guided by the internals), and you still want to avoid mocking. So distinguishing between unit tests and integration tests in this way isn't particularly useful.

9rx

an hour ago

> and you still want to avoid mocking.

Assuming by mock you mean an alternate implementation (e.g. an in-memory database repository) that relieves dependence on a service that is outside of immediate control, nah. There is no reason to avoid that. That's just an implementation detail and, as before, your tests shouldn't be bothered by implementation details. And since you can run your 'mock' against the same test suite as the 'real thing', you know that it fulfills the same contract as the 'real thing'. Mocks in that sense are also useful outside of testing.

If you mean something more like what is more commonly known as a stub, still no. This is essential for injecting failure states. You don't want to have to actually crash your hard drive to test your code under a hard-drive-crash condition. Failure cases are the most important tests you will write, so you will definitely be using these in all but the simplest programs.
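Both ideas above can be sketched together (all names hypothetical): a contract test suite that runs unchanged against any implementation, and a stub that injects a failure state you could never trigger for real in a test run.

```python
# Sketch: contract tests shared across implementations, plus a stub
# that injects a failure state.

class InMemoryRepo:
    """The 'mock' sense: a real alternate implementation."""
    def __init__(self):
        self._rows = {}
    def save(self, key, value):
        self._rows[key] = value
    def load(self, key):
        return self._rows[key]

def contract_tests(make_repo):
    """Run against ANY implementation to show it honors the contract."""
    repo = make_repo()
    repo.save("a", 1)
    assert repo.load("a") == 1

contract_tests(InMemoryRepo)  # a DB-backed repo would run here too

class CrashingRepo(InMemoryRepo):
    """The 'stub' sense: simulates the storage layer failing."""
    def save(self, key, value):
        raise OSError("disk crashed")

def store_safely(repo, key, value):
    """Code under test: must survive a storage failure."""
    try:
        repo.save(key, value)
        return True
    except OSError:
        return False

assert store_safely(InMemoryRepo(), "a", 1) is True
assert store_safely(CrashingRepo(), "a", 1) is False
```

The contract suite is what lets the in-memory version stand in for the real thing without the tests caring which one they got.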

mrugge

16 hours ago

In test-driven development, fast unit tests are a must-have. Integration tests are too slow. If you are not doing test-driven development, you can go heavier into integration tests. I find the developer experience is not as fun without good unit tests, and even if velocity metrics are the same, that factor alone is a good reason to focus on writing more fast unit tests.

MrJohz

14 hours ago

In general, fast tests are a must-have, but I find that means figuring out how to write fast integration tests as well so that they can also be run as part of a TDD-like cycle. In my experience, integration tests can generally be written to be very quick, but maybe my definition of an integration test is different from yours?

For me, heavy tests implies end-to-end tests, because at that point you're interacting with the whole system including potentially a browser, and that's just going to be slow whichever way you look at it. But just accessing a database, or parsing and sending http requests doesn't have to be particularly slow, at least not compared to the speed at which I develop. I'd expect to be able to run hundreds of those sorts of tests in less than a second, which is fast enough for me.
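One common way to get there (a sketch, not a claim about any particular stack): back the tests with an in-memory SQLite database rebuilt per test, so each database-touching test is both stateless and fast enough for a TDD loop.

```python
import sqlite3
import time

# Sketch: database-touching tests kept fast and stateless via an
# in-memory SQLite instance created fresh for every test.

def fresh_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    return db

def add_user(db, name):
    """Code under test: a small data-access function."""
    cur = db.execute("INSERT INTO users (name) VALUES (?)", (name,))
    db.commit()
    return cur.lastrowid

def test_add_user():
    db = fresh_db()  # fresh schema per test: nothing to reset afterwards
    uid = add_user(db, "alice")
    row = db.execute("SELECT name FROM users WHERE id = ?",
                     (uid,)).fetchone()
    assert row == ("alice",)

# Run the full setup-plus-query test 200 times and time it.
start = time.perf_counter()
for _ in range(200):
    test_add_user()
elapsed = time.perf_counter() - start
```

On typical hardware a couple hundred of these finish in well under a second, which is the "hundreds of tests per second" budget described above.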

mrugge

9 hours ago

I inherited a django project which has mostly 'unit' tests that flex the ORM and the db, so they are really integration tests and are painfully slow. There is some important logic that happens in the ORM layer and that needs to be tested. At some point I want to find the time to mock the database so that they can be faster, but in some cases I worry about missing important interactions. Domain is highly specialized so not very easy to just know how to untangle the mess.

9rx

7 hours ago

> I worry about missing important interactions.

If you are concerned that the ORM won't behave as it claims to, you can write tests targeted at it directly. You can then run the same tests against your mock implementation to show that it conforms to the same contract.

But an ORM of any decent quality will already be well tested and shouldn't do unexpected things, so perhaps the worry is for naught?

barrkel

9 hours ago

What if, instead of a bank account, it's FooSystemFrobnicationPreparer? Something which is necessary today, but probably should be refactored within the next year or two?

Maybe FooSystem will be redesigned to take different inputs, maybe the upstream will change to provide different outputs, maybe responsibility will shift around due to changes in the number of dependencies and it makes sense to vertically integrate some prep to upstream to share it.

Unit tests in these circumstances - and they're the majority of unit tests, IME - can act as a drag on the quality of the system. It's better to test things like this at a component level instead of units.

MrJohz

2 hours ago

I mean, you get to decide what the unit is. I think this is one of the biggest issues with Java and some similar languages, in that it puts so much emphasis on classes (each class gets its own file and is the unit of import) that people used to Java think of classes as _the_ unit boundary, as opposed to being one type of boundary that can sometimes be useful.

So `BankAccount` as a class is probably a useful unit boundary: once you've designed the class, you're probably not going to change the interface much, except for possibly adding new methods occasionally. You have a stable boundary there, where in theory you could completely rewrite the internals of the class but the external boundary will stay the same.

`FooSystemFrobnicatorPreparer` sounds much more like an internal detail of some other system, I agree, and its interface could easily be rewritten or the class removed entirely if we decide to prepare our frobnication in a different way. But in that case, maybe the `foo.system.frobnicator` package is the unit we want to test as a whole, rather than one specific internal class inside that package.

I think a lot of good test and system design is finding these natural fault lines where it's possible to create a relatively stable interface that can hide internal implementation details.

soanvig

15 hours ago

I think this discussion has to open with what a "unit" in unit tests is. "Integration" consists of many units working together. But my unit can be a function or an entire module. That's what people ignore in most discussions about test types.

JackSlateur

13 hours ago

My integration tests test things that must work

My unit tests test things that must not work

globular-toast

14 hours ago

It depends what you are doing. Let's say your module implements a way to declare rules and then run some validation function to check objects against those rules. You can't just test every possible set of rules and object that you want to check, even though this is, of course, all that matters. You have to unit test the implementation of the module to be at all confident that it's doing the right thing.

So ultimately we write tests at a lower level to deal with the combinatorial explosion of possible inputs at the edge.

You should push your tests as far to the edge as possible but no further. If a test at the edge duplicates a test in the middle, delete the test in the middle. But if a test at the edge can't possibly account for everything, you're going to need a test in the middle.
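A toy version of that rules module (names illustrative) shows the division of labor: "middle" tests pin each rule's edge cases directly, while a single "edge" test checks that rules compose, rather than enumerating every rule-and-object combination.

```python
# Sketch of the rules/validation example: unit tests per rule,
# one composition test at the edge.

RULES = {
    "non_empty": lambda obj: bool(obj.get("name")),
    "adult": lambda obj: obj.get("age", 0) >= 18,
}

def validate(obj, rule_names):
    """Return the names of all rules the object fails."""
    return [name for name in rule_names if not RULES[name](obj)]

# Middle (unit) tests: each rule's edge cases, tested directly.
assert RULES["non_empty"]({"name": "x"}) is True
assert RULES["non_empty"]({"name": ""}) is False
assert RULES["adult"]({"age": 18}) is True
assert RULES["adult"]({"age": 17}) is False

# Edge (integration) test: rules compose; failures reported by name.
assert validate({"name": "", "age": 17},
                ["non_empty", "adult"]) == ["non_empty", "adult"]
assert validate({"name": "x", "age": 30},
                ["non_empty", "adult"]) == []
```

With two rules the difference is cosmetic; with fifty rules and nested objects, the per-rule tests are the only tractable way to cover the branches.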

MrJohz

13 hours ago

Yeah, that's similar to how I'd look for the correct place to put my tests. But at that point, a unit test is just the innermost layer of tests, which doesn't feel like a useful distinction. In your example, I might have a set of tests checking how the rules are parsed and interpreted (say), and then a set of tests one level up checking that the validation engine as a whole works, and then another set of tests a level up testing a module that uses the validation engine. The tests for the validation engine won't retest parsing, and the tests for the module using the validation engine won't test validation per se, but there are multiple layers there where each layer contains unit tests focusing on that layer's code specifically.

troupo

13 hours ago

> You can't just test every possible set of rules and object that you want to check, even though this is, of course, all that matters.

If it matters, why can't you check? Will your product/app/system not run into these possible sets eventually?

> So ultimately we write tests at a lower level to deal with the combinatorial explosion of possible inputs at the edge.

Don't you have to write the combinatorial explosion of inputs for the unit tests, too, to test "every possible combination"? If not, and you're only testing a subset, then why not test the whole flow while you're at it?

globular-toast

11 hours ago

You can't check because the numbers quickly become astronomical. Can you test the Python parser on all possible Python programs? Even if you limited the length of a program you're still talking about an absurdly large number of possible inputs.

What you do is write more primitive components and either unit test them, prove them to be correct or make them small enough to be correct by inspection. An integration test is just testing that the interfaces do indeed fit together, it won't normally be close to testing all possible code paths internally.

I think of it like building any other large machine with many inputs. You can't possibly test a car under every conceivable condition. Imagine if someone was like "but wait, did you even test going round a corner at 60mph in the wet with the radio on?!"

troupo

an hour ago

> You can't check because the numbers quickly become astronomical.

But you can with unit tests?

> Can you test the Python parser on all possible Python programs?

A parser is one of the few cases where unit tests work. Very few people write parsers.

See also my sibling reply here: https://news.ycombinator.com/item?id=45078047

> What you do is write more primitive components and either unit test them, prove them to be correct or make them small enough to be correct by inspection. An integration test is just testing that the interfaces do indeed fit together, it won't normally be close to testing all possible code paths internally.

Ah yes. Somehow "behaviour of unit tests is correct" but "just testing interfaces in just a few integration tests". Funny how that becomes a PagerDuty alert at 3 in the morning because "correct behaviour" in one unit wasn't tested together with "correct behaviour" in another unit.

But when you actually write an actual integration test over actual (or simulated) inputs, suddenly 99%+ of your unit tests become redundant because actually using your app/system as intended covers most of the code paths you could possibly use.

JimDabell

14 hours ago

In my experience, a bug that causes a unit test to fail also causes an integration or E2E test to fail. Also, it’s relatively easy to determine the cause of the problem given a change and a failing integration/E2E test. Unit tests are usually much quicker to run, but you also need a lot more of them. I think when you combine these things, it’s easy to reach the conclusion that unit tests are redundant.

barrkel

9 hours ago

I don't agree. When you implement some new code and want to cover it in testing, you have a choice where you call into the system from your tests. Calling at the lowest level is not always the right choice, but it's the way the testing pyramid will bias you.

Instead - instead, there really is an instead here - you can call at a higher level which is less brittle to refactoring, has less complex setup, doesn't involve mocks that may not behave like the real thing, but still runs quickly due to fakes stubbing out expensive dependencies.

s_ting765

14 hours ago

Integration tests make unit tests absolutely redundant.

integralid

12 hours ago

Integration tests are as old as unit tests, and both predate their names. When exactly were unit tests made redundant? I don't see the point of your quip without a trace of actual argument.

I feel like I don't write enough tests, and when I do they're usually integration tests, but some things - algorithms, complex but pure functions, data structures - absolutely deserve their unit tests that can't be reasonably replaced by integration/e2e tests.

s_ting765

12 hours ago

Here's the argument backing up my claim.

Unit tests don't matter when you have other types of testing like functional or integration testing that will tell you whether your code has the intended behavior and effect when run.

In the above statement unit tests is also considered as code.

That's where the redundancy comes from.

skydhash

11 hours ago

Unit tests do matter, especially when the logic is somewhat complex or very defined (splitting money, parsing some message). So unless the specs change, you rarely have to modify the tests. So it helps more in a technical sense, catching developer mistakes. Just like qa tests on some small part of the car can spot defect early on.

Integration tests are more about ensuring what matters to Product. A car that refuses to start is worthless for most cases. But the engine light and a window that can't open is not usually a dealbreaker.

Unit tests can help pinpoint an issue or ensure that a spec is implemented. But that's mostly relevant to the developer world. So for a proper DX, add unit tests to help pinpoint bugs faster, especially in code that doesn't change much and where knowledge can be lost.
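The "splitting money" case mentioned above is a classic unit-test target: the spec (no cent may be lost or invented) rarely changes, so the tests rarely change either. A hypothetical sketch:

```python
# Sketch: splitting an amount of money into shares without losing
# or inventing a cent -- a well-defined spec worth pinning in unit tests.

def split_cents(total_cents: int, parts: int) -> list[int]:
    """Split an amount into `parts` shares, giving the remainder
    one cent at a time to the earliest shares."""
    base, remainder = divmod(total_cents, parts)
    return [base + (1 if i < remainder else 0) for i in range(parts)]

# Unit tests pin the developer-facing spec:
assert split_cents(100, 3) == [34, 33, 33]   # remainder goes to early shares
assert sum(split_cents(100, 3)) == 100       # nothing lost, nothing invented
assert split_cents(99, 3) == [33, 33, 33]
assert sum(split_cents(7, 4)) == 7
```

An integration test would only show that *some* split happened; the per-function tests are what catch an off-by-one-cent bug the moment it's introduced.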

yakshaving_jgt

12 hours ago

No they don’t.

If you’re, for example, writing a web application, and you have an endpoint which parses some data from the request and then responds with the result of that computation, why the hell would you test the fine-grained behaviour of your parser by emulating HTTP requests against your server?

Testing the parsing function in isolation is orders of magnitude cheaper.
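The shape of that argument, sketched with illustrative names (not from any framework): the parser gets exercised directly for its many edge cases, while one or two coarse tests confirm the endpoint wires it up.

```python
# Sketch: fine-grained tests against the parser, coarse tests
# against the endpoint glue that uses it.

def parse_quantity(raw: str) -> int:
    """Parse a 'quantity' parameter; raise ValueError on bad input."""
    value = int(raw)
    if value < 1:
        raise ValueError("quantity must be positive")
    return value

def handle_request(query: dict) -> tuple[int, str]:
    """The endpoint: thin glue around the parser."""
    try:
        qty = parse_quantity(query.get("quantity", ""))
    except ValueError:
        return 400, "bad quantity"
    return 200, f"ordered {qty}"

# Fine-grained edge cases go against the parser directly...
for bad in ("", "0", "-3", "abc", "1.5"):
    try:
        parse_quantity(bad)
        assert False, f"{bad!r} should have been rejected"
    except ValueError:
        pass
assert parse_quantity("7") == 7

# ...while a couple of coarse tests confirm the endpoint wires it up.
assert handle_request({"quantity": "2"}) == (200, "ordered 2")
assert handle_request({"quantity": "no"}) == (400, "bad quantity")
```

Each parser case here costs a function call; driving the same five cases through an emulated HTTP stack pays the request-construction and routing cost five times over for no extra coverage of the parsing logic.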

s_ting765

12 hours ago

Cheaper to run tests does not mean better. And your example does not test for real world behavior therefore the test is lower quality by definition.

yakshaving_jgt

31 minutes ago

All else being equal, cheaper is absolutely better.

How does my example not test real world behaviour? I mean, I didn’t even provide any code here so what exactly are you imagining?

simianwords

16 hours ago

Wow I hate this dogmatism. It is indeed better to use one instead of the other. Let’s stop pretending all are equally good and we need every type of test.

Sometimes you just don’t need unit tests and it’s okay to admit it and work accordingly.

RHSeeger

16 hours ago

And sometimes you only need screws, instead of nails; or vice versa. But that doesn't invalidate the tool; it just means your use case doesn't need it.

imiric

15 hours ago

You can't build a house without nails and screws, though.

Sure, if you're only writing a small script, you might not need tests at all. But as soon as that program evolves into a system that interacts with other systems, you need to test each component in isolation, as well as how it interacts with other systems.

So this idea that unit tests are not useful is coming from a place of laziness. Some developers see it as a chore that slows them down, instead of seeing it as insurance that makes their life easier in the long run, while also ensuring the system works as intended at all layers.

CuriouslyC

15 hours ago

If you don't write unit tests, how do you know something works? Just manual QA? How long does that take you relative to unit tests? How do you know if something broke due to an indirect change? Just more manual QA? Do you really think this is saving you time?

tsimionescu

15 hours ago

You can write many other kinds of automated tests. Unit tests are rarely worth it, since they only look at the code in isolation, and often miss the forest for the trees if they're the only kind of test you have. But then, if you have other higher-level tests that test that your components are working well together, they're already implicitly covering that each component individually works well too - so your unit tests for that component are just duplicating the work the integration tests are already doing.

skydhash

11 hours ago

Sometimes you really need to ensure that something is a tree. And you do not need the whole forest around for that. Sure, you can't have an adventure with only a tree. But if you need a tree, you need to make sure someone doesn't bring a concrete tree sculpture.

user

14 hours ago

[deleted]

troupo

13 hours ago

> If you don't write unit tests, how do you know something works?

Integration tests. Unlike unit tests they actually test if something works.

yakshaving_jgt

12 hours ago

This is utter nonsense.

troupo

an hour ago

Unit tests test units in isolation.

Integration tests test that your system works. Testing how a system works covers the absolute vast majority of functionality you'd test with unit tests because you will hit the same code paths, and test the same behaviours you'd do with unit tests, and not in isolation.

This is a joke, but it's not: https://i.sstatic.net/yHGn1.gif

yakshaving_jgt

an hour ago

I have been doing TDD for over a decade, and I don’t know why you’re trying to explain the basics to me.

Yes, you can exercise the same code paths with integrated tests as you might with unit tests. There are multiple approaches to driving integrated tests, from the relatively inexpensive approach of emulating an HTTP env, to something more expensive and brittle like Selenium. You could also just test everything with manual QA. Literally pay some humans to click through your application following a defined path and asserting outcomes. Every time you make a change.

Obviously all of these have different costs. And obviously, testing a pure function with unit tests (whether example based or property based) is going to be cheaper than testing the behaviour of that same function while incidentally testing how it integrates with its collaborators.
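As a hand-rolled sketch of what property-based testing of a pure function can look like (the function and its properties here are hypothetical examples, not from this thread):

```python
import random

def normalize_ws(s):
    """Hypothetical pure function under test: collapse runs of whitespace."""
    return " ".join(s.split())

def check_properties(trials=200, seed=42):
    """Cheap property-based check: random inputs, invariant assertions."""
    rng = random.Random(seed)
    alphabet = "ab \t\n"
    for _ in range(trials):
        s = "".join(rng.choice(alphabet) for _ in range(rng.randrange(0, 20)))
        out = normalize_ws(s)
        assert normalize_ws(out) == out   # idempotent: normalizing twice changes nothing
        assert "  " not in out            # no double spaces survive
        # non-whitespace characters are preserved in order
        assert out.replace(" ", "") == (
            s.replace(" ", "").replace("\t", "").replace("\n", "")
        )
    return True
```

A library like Hypothesis does input generation and shrinking far better; this just shows that the idea costs almost nothing even without one.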

imiric

15 hours ago

You claim it's dogmatism, yet do the same thing in reverse. (:

Unit and integration tests test different layers of the system, and one isn't inherently better or more useful than the other. They complement each other to cover behavior that is impossible to test otherwise. You can't test low-level functionality in integration tests, just as you can't test high-level functionality in unit tests.

There's nothing dogmatic about that statement. If you disagree with it, that's your prerogative, but it's also my opinion that it is a mistake. It is a harmful mentality that makes code bases risky to change, and regressions more likely. So feel free to adopt it in your personal projects if you wish, but don't be surprised if you get push back on it when working in a team. Unless your teammates think the same, in which case, good luck to you all.

tsimionescu

14 hours ago

The problem with this line of argument is that, in general, high-level behavior (covered by integration tests) depends on low-level behavior. So if your code is ascertained to work at the high level, you also know that it must be working at the lower level too. So, integration tests also tell you whether your component works at a low level, not just a high level.

The converse is not true, however. It's perfectly possible for individual components to "work" well, but to not do the right thing from a high level perspective. Say, one component provides a good fast quicksort function, but the other component requires a stable sort to work properly - each is OK in isolation, but you need an integration test to figure out the mistake.
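A toy sketch of that failure mode (hypothetical code): a sort that passes its own unit test but breaks a consumer that silently relies on stability.

```python
def unstable_sort(records, key):
    """A correct sort with no stability guarantee: ties come out in
    REVERSE input order (standing in for a typical quicksort)."""
    decorated = [(key(r), -i, r) for i, r in enumerate(records)]
    decorated.sort()
    return [r for _, _, r in decorated]

# Unit test: the output really is ordered by key -- this passes.
events = [("bob", "login"), ("ann", "login"), ("bob", "logout")]
out = unstable_sort(events, key=lambda e: e[0])
assert [e[0] for e in out] == ["ann", "bob", "bob"]

# Integration-level expectation: the consumer assumes each user's events
# keep their original (chronological) order. With unstable_sort, bob's
# "logout" now precedes his "login", so the "first event" is wrong --
# something only a test of the combined components would catch.
first_event = {}
for user, action in out:
    first_event.setdefault(user, action)
```

Each piece is fine in isolation; only the combination violates the (undocumented) stability assumption.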

Unit tests are typically good scaffolding. They allow you to test bits of your infrastructure as you're building it, but before it's ready for integration into the larger project. But they give you relatively little assurance at the project level, and are not worth it unless you're pretty sure you're building the right thing in the first place.

integralid

13 hours ago

> So if your code is ascertained to work at the high level, you also know that it must be working at the lower level too

In the ideal world, maybe. But it's very hard to test edge cases of a sorting algorithm with an integration test. In general my experience is that algorithms and some complex but pure functions are worth writing unit tests for. CRUD app boilerplate is not.

MoreQARespect

11 hours ago

I've never in my life written a test for a sorting algorithm, nor, I'm sure, will I ever need to.

The bias most developers have towards integration tests reflects the fact that even though we're often interviewed on it, it's quite rare that most developers actually have to write complex algorithms.

It's one of the ironies of the profession.

yakshaving_jgt

19 minutes ago

I write parsers all the time.

Why wouldn’t you test parsers in isolation?
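A minimal illustration (hypothetical parser, not the poster's code) of why parsers are natural unit-test territory: strings in, structured data or errors out, no collaborators to stand up.

```python
def parse_kv(line):
    """Parse a single 'key = value' config line."""
    key, sep, value = line.partition("=")
    if not sep or not key.strip():
        raise ValueError(f"malformed line: {line!r}")
    return key.strip(), value.strip()

# Isolated tests cover edge cases cheaply -- no server, no database.
assert parse_kv("host = example.org") == ("host", "example.org")
assert parse_kv("flag=") == ("flag", "")  # empty value is allowed
try:
    parse_kv("no separator")
except ValueError:
    pass
else:
    raise AssertionError("expected ValueError")
```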

EliRivers

12 hours ago

> So if your code is ascertained to work at the high level, you also know that it must be working at the lower level too.

I have 100% seen bugs that cancel each other out; code that's just plain wrong at the lower level, coming together by chance to work at the higher level such that one or more integration tests pass. When one piece of that lower level code then gets fixed, either deliberately or because of a library update or hardware improvement or some other change that should have nothing to do with the functionality, and the top level integration tests starts failing, it can be so painful to figure it out.

I've also seen bugs that cancel each other out to make one integration test pass, but don't cancel each other out such that other integration tests fail. That can be a mindmelt; surely if THIS test works, then ALL THIS low level code must be correct, but simultaneously if THAT test fails, then ALL THIS low level code is NOT correct. At which point, people start wishing they had lower level tests.

imiric

13 hours ago

> So if your code is ascertained to work at the high level, you also know that it must be working at the lower level too.

No, that is not guaranteed.

Integration and E2E tests can only cover certain code paths, because they depend on the input and output from other systems (frontend, databases, etc.). This I/O might be crafted in ways that never trigger a failure scenario or expose a bug within the lower-level components. This doesn't mean that the issue doesn't exist—it just means that you're not seeing it.

Furthermore, because integration and E2E tests are by their nature often more expensive to set up and run, there will be fewer of them, which means they will not have full coverage of the underlying components. Another issue is that these tests, particularly E2E and acceptance tests, are often written only with the happy path in mind, and ignore the myriad inputs that might trigger a failure in the real world.

Another problem with your argument is that it ignores that tests have different audiences. E2E and acceptance tests are written for the end user; integration tests are written for system integrators and operators; and unit tests are written for users of the API, which includes the author and other programmers. If you disregard one set of tests, you are disregarding that audience.

To a programmer and maintainer of the software, E2E and acceptance tests have little value. They might not use the software at all. What they do care about is that the function, method, object, module, or package does what it says on the tin; that it returns the correct output when given a specific input; that it's performant, efficient, well documented, and so on. These users matter because they are the ones who will maintain the software in the long run.

So thinking that unit tests are useless because they're a chore to maintain is a very shortsighted mentality. Instead, it's more beneficial to see them as guardrails that make your future work easier, by giving you the confidence that you're not inadvertently breaking an API contract whenever you make a change, even when all higher-level tests remain green across the board.

troupo

13 hours ago

> This I/O might be crafted in ways that never trigger a failure scenario or expose a bug within the lower-level components.

You mean just like unit tests where every useful interaction between units is mocked out of existence?

> Furthermore, the fact that, by their nature, integration and E2E tests are often more expensive to setup and run, there will be fewer of them

And that's the main issue: people pretend that only unit tests matter, and as a result all other forms of testing are an afterthought. Every test harness and library is geared towards unit testing, and unit testing only.

imiric

12 hours ago

> You mean just like unit tests where every useful interaction between units is mocked out of existence?

Sure, that is a risk. But not all unit tests require mocking or stubbing. There may be plenty of pure functions that are worth testing.

Writing good tests requires care and effort, like any other code, regardless of the test type.

> And that's the main issue: people pretend that only unit tests matter, and as a result all other forms of testing are an afterthought.

Huh? Who is saying this?

The argument is coming from the other side with the claim that unit tests don't matter. Everyone arguing against this is saying that, no, all tests matter. (Let's not devolve into politics... :))

The idea of the test pyramid has nothing to do with one type of test being more important than another. It's simply a matter of practicality and utility. Higher-level tests can cover much more code than lower-level ones. In projects that keep track of code coverage, it's not unheard of for a few E2E and integration tests to cover a large percentage of the code base, e.g. >50% of lines or statements. This doesn't mean that these tests are more valuable. It simply means that they have a larger reach by their nature.

These tests also require more boilerplate to set up, depend on external systems, take more time to run, and so on. It is often impractical to rely on them during development, since they slow down the write-test loop. Instead, running the full unit test suite and a select couple of integration and E2E tests can serve as a quick sanity check, while the entire test suite runs in CI.

Conversely, achieving >50% of line or statement coverage with unit tests alone also doesn't mean that the software works as it should when it interacts with other systems, or the end user.

So, again, all test types are important and useful in their own way, and help ensure that the software doesn't regress.

troupo

an hour ago

> Sure, that is a risk. But not all unit tests require mocking or stubbing.

Not all integration tests require mocking or stubbing either. Yet somehow your argument against integration tests is that they won't trigger failure scenarios.

> The argument is coming from the other side with the claim that unit tests don't matter.

My argument is that the absolute vast majority of unit tests are redundant and not required.

> The idea of the test pyramid has nothing to do with one type of test being more important than another. It's simply a matter of practicality and utility.

You're sort of implying that all tests are of equal importance, but that is not the case. Unit tests are the worst of all tests, and provide very little value in comparison to most other tests, and especially in comparison to how many unit tests you have to write.

> it's not unheard of for a few E2E and integration tests to cover a large percentage of the code base, e.g. >50% of lines or statements. This doesn't mean that these tests are more valuable.

So, a single E2E test covers a scenario that exercises >50% of the code. This is somehow "not valuable", despite the fact that you'd often need up to an order of magnitude more unit tests covering the same code paths for that same scenario (and without any guarantees that the units tested actually work correctly with each other).

What you've shown, instead, is that E2E tests are significantly more valuable than unit tests.

However, true, E2E tests are often difficult to set up and run. That's why there's a middle ground: integration tests. You mock/stub out any external calls (file systems, API calls, databases), but you test your entire system using only exposed APIs/interfaces/capabilities.

> These tests also require more boilerplate to setup, external system dependencies, they take more time to run, and so on.

And the only reason for that is this: "people pretend that only unit tests matter, and as a result all other forms of testing are an afterthought." It shouldn't be difficult to test your system/app the way your users will use it, but it always is. It shouldn't be difficult to mock/stub external access, but it always is.

That's why instead of writing a single integration test that tests a scenario across multiple units at once (at the same time testing that all units actually work with each other), you end up writing dozens of useless unit tests that test every single unit in isolation, and you often don't even know if they are glued together correctly until you get a weird error at 3 AM.

troupo

13 hours ago

In the absolute vast majority of cases unit tests are useless. Because your end result should be a working system, not isolated working units with everything useful mocked out of existence.

Integration tests will cover every use case unit tests are supposed to cover, if you actually test the system behavior.

EliRivers

13 hours ago

At my last employer, some of our customers used our software to run systems of which the hardware value alone exceeded the entire market cap of my employer. They ran systems with a physical footprint many many times the area of my employer's leased offices and workspaces. The physical hardware devices that our software ran alone exceeded the value of my employer's market cap; just buying one of each would have been an enormous expense, ignoring that many of them were no longer available new and some of them were decades old, sitting alongside hardware that was made in the last six months. All the physical hardware working in synchronicity, linking up with similar sites in two other locations around the world, handing off to each other to follow the sun.

It would be nice to fully test system behaviour, but to do so would have bankrupted the company long before even coming close.

troupo

13 hours ago

How were you making sure that your system actually works?

So you have unit tests testing things in isolation? Did you not test how they work together? Did you never run your system to see it actually works and behaves as expected? You just YOLO'd it over to customers and prayed?

yakshaving_jgt

12 hours ago

It’s a terrible tragedy isn’t it, that we can only choose one or the other.

troupo

an hour ago

You don't have to.

Unit tests work well for well-defined, contained units and library-like code.

E.g. you have code that calculates royalties based on criteria. You can and should test code like that with unit tests (better still, with property-based testing if possible)

Such code is in a tiny minority.

What you really want to do, is test that your system behaves as advertised. E.g. that if your API is called with param=1 it returns { a: "hello" }, and when with param=-1, it returns HTTP 401 or something.

The best way to do that is, of course E2E tests, but those are often quite difficult to set up (you need databases, external services, file systems etc.)

So you go for the middle ground: integration tests. Mock/stub unavailable external services. Test your full code flow. With one test you're likely to hit code paths that would require multiple unit tests to test, for a single scenario. You'll quickly find that easily 99%+ of your unit tests are absolutely redundant.
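A sketch of that middle ground (everything here is a hypothetical stand-in): drive the app through its real request-handling entry point in-process, so one test exercises the whole param=1 / param=-1 flow without a running server.

```python
import json
from urllib.parse import parse_qs

def app(environ, start_response):
    """Hypothetical WSGI app: param=1 -> 200 {"a": "hello"}, param=-1 -> 401."""
    qs = parse_qs(environ.get("QUERY_STRING", ""))
    param = int(qs.get("param", ["0"])[0])
    if param < 0:
        start_response("401 Unauthorized", [("Content-Type", "application/json")])
        return [b'{"error": "unauthorized"}']
    start_response("200 OK", [("Content-Type", "application/json")])
    return [json.dumps({"a": "hello"}).encode()]

def call(application, query):
    """Drive the WSGI app directly: no network, no server process."""
    captured = {}
    def start_response(status, headers):
        captured["status"] = status
    environ = {"QUERY_STRING": query, "REQUEST_METHOD": "GET"}
    body = b"".join(application(environ, start_response))
    return captured["status"], body

# One integration-style test per scenario, hitting the real code path.
status, body = call(app, "param=1")
assert status == "200 OK" and json.loads(body) == {"a": "hello"}
status, _ = call(app, "param=-1")
assert status == "401 Unauthorized"
```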

---

Offtop/rant/sidetrack.

This is especially egregious in "by the book" Java code. You'd have your controller that hits a service that collects data from facades that each hit external services via some modules. Each of those is tested in unit tests, mocking the living daylights out of everything.

So for param=1 you'd have a unit test for controller (service mocked), service (facades mocked), each of the facades (if there are more than one, external services modules mocked), each of the external service modules (actual external services mocked).

Replace that with a single integration test where just the external service is mocked, and boom, you've covered all of those tests, and can trivially expand it to test external service being unavailable, timing out, returning invalid data etc.
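As a sketch of that replacement (hypothetical layering, Python standing in for Java): stub only the external client, then exercise controller → service → facade together in one test.

```python
class FakeExternalClient:
    """Stub for the real external service -- the only thing faked."""
    def __init__(self, responses):
        self.responses = responses
    def get_user(self, param):
        result = self.responses[param]
        if isinstance(result, Exception):
            raise result
        return result

class Facade:
    def __init__(self, client):
        self.client = client
    def fetch(self, param):
        return self.client.get_user(param)

class Service:
    def __init__(self, facade):
        self.facade = facade
    def greeting(self, param):
        return {"a": self.facade.fetch(param)["name"]}

class Controller:
    def __init__(self, service):
        self.service = service
    def handle(self, param):
        try:
            return 200, self.service.greeting(param)
        except Exception:
            return 502, {"error": "upstream failed"}

# One integration test covers the happy path AND an upstream failure,
# through every layer, with real wiring between the units.
client = FakeExternalClient({1: {"name": "hello"}, 2: TimeoutError("slow upstream")})
controller = Controller(Service(Facade(client)))
assert controller.handle(1) == (200, {"a": "hello"})
assert controller.handle(2) == (502, {"error": "upstream failed"})
```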

EliRivers

12 hours ago

I think the record uptime for a customer before they shut down one process to upgrade the software (leaving a dozen other such processes running, taking over the work in turn as each was upgraded) was on the order of six years. This was a set of 24 hour broadcast channels.

"How were you making sure that your system actually works?"

Good design and good software engineering goes a long way.

When you know that you cannot test by simply doing everything the customer will do, you have to think about what tests you can do that will indicate how the system will operate under a load that's orders of magnitude greater than what you can do yourself, with hardware you've never even seen. You have to think about how to write software that is likely to be high quality even if you can't test it how you'd like to.

For example, one can design the architecture in such a way that adding more load, more devices, will only linearly increase the demands on resources, and then from testing infer what the loads will be on actual customer sites. Any non-linearities in that regard were identified, if not at the design stage, in the unit testing thereof.

One can design the code in such a way that the internal mechanisms of how devices work are suitably abstracted away, leaving as best one can manage common interfaces, and then rather than have to test with the exact arrangements of hardware customers will have, test with devices that to the extent possible, simulate the interactions our software will see. In this regard, it turns out that many devices that purport to meet standard protocols actually meet "variations about the theme" of protocols. But this too can be mitigated and handled to a degree by careful design and thought in the software engineering. The learning from doing this with one set of devices and protocols spreads to significantly different devices and protocols; every subsequent fresh design for the next iteration or generation of hardware is better and more resilient. A software engineering organisation that learns and retains knowledge and experience can go long way.

One can recognise that running live on customers sites is itself an opportunity. Some customers would never say a thing, for years on end. Some would want to be involved and would regularly talk about things they'd seen, unexpected things that happened, loads and events and so on; one can ensure that all that information is gathered by the sales people, the support reps, anyone and everyone who talks to the customers, and passed back effectively to have the results of that testing applied. For doomsday scenarios, such as crashes, resource exhaustion, pathological behaviours and so on; good logging and live measurements and dump catching etc can at least feed back so that this situation (which we would never be able to truly test ourselves) is not wasted, and gets fixed, and the lessons of it applied forwards into design and development. Harsh for the customer who finds an issue, but great for the hundred customers who will never hit it because it was tested by that unlucky customer. We'd be fools not to gather as much information as we could from poor customer experiences.

One can get hold of cheap, twenty year old devices that in theory match the same protocol, and go to town on them (some customers will actually be using that exact device and contemporaries - some customers will have brand new hardware that costs a tenth of my employer's market cap). From that, get an idea of how the software performs. Get another cheap device from ebay that is a decade old, and test it; see where it fails, but don't just fix those failures. From them, and similar repeats of the process, learn at a more fundamental level how devices differ and develop more general solutions that will either then be resilient to some new piece of hardware that hasn't even been made yet, or at least will not go wrong in such a way that the whole system is taken out and the poorly-supported brand new hardware is clearly seen by the software and reported on.

There's more. There's so much more, but once you have no choice but to come up with cheap, fast testing that nonetheless give a good indication of how the system will work when someone spends tens of millions on the hardware, software engineers can really come up with some smart, reliable ideas. It can also be really fun and satisfying to work on this.

"You just YOLO'd it over to customers and prayed?"

Absolutely not. It was all tested, repeatedly, over and over, and over the course of about fifteen year became remarkably resilient, adaptable, resource light, and so on. All the good things one would hope for. In a pinch, a small system could be run from someone's laptop; at the top end, banks and banks of servers with their fans banshee wailing 24 hours a day, with dozens of the principal processes (i.e. the main executable that runs) all running, all talking to each other across countries and time zones, handling their own redundancy against individual processes turning off. Again, when you begin knowing that the software has to deliver on such a range of systems, where one customer is two college kids in a basement and one customer is valued in the tens of billions (although doing a lot more, of course, than just what our software let them do), design and good software engineering goes a very long way.

troupo

an hour ago

How do you "design a good system" without testing it? Oh wait:

> It was all tested, repeatedly, over and over, and over the course of about fifteen year

So, you do test how your system actually works, and not just isolated unit tests.

> Again, when you begin knowing that the software has to deliver on such a range of systems, where one customer is two college kids in a basement and one customer is valued in the tens of billions (although doing a lot more, of course, than just what our software let them do), design and good software engineering goes a very long way.

Indeed. And that good engineering would include a simple wisdom "unit tests are useless without integration and E2E tests, otherwise you wouldn't be able to run your software anywhere because units just wouldn't fit together".

And once you have proper integration tests, 99%+ of unit tests become redundant.

yeswecatan

18 hours ago

I find testing terminology very confusing and inconsistent. Personally, I prefer tests that cover multiple components. Is that an integration test because you test multiple components? What if your system is designed in such a way that these tests are _fast_ because the data access is abstracted away and you can use in-memory repositories instead of hitting the database?

creesch

14 hours ago

> I find testing terminology very confusing and inconsistent.

That's because it is both confusing and inconsistent. In my experience, every company uses slightly different names for different types of tests. Unit tests are generally fairly well understood as testing the single unit (a method/function) but after that things get murky fast.

For example, integration tests as reflected by the confused conversation in this thread already has wildly different definitions depending on who you ask.

For example, someone might interpret them as "unit integration tests" where it reflects a test that tests a class, builder, etc. Basically something where a few units are combined. But, in some companies I have seen these being called "component tests".

Then there is the term "functional tests", which in some companies means the same as "manual tests done by QA" but for others simply means automated front-end tests. But in yet other companies those automated tests are called end-to-end tests.

What's interesting to me when viewing these online discussions is the complete lack of awareness people display about this.

You will see people very confidently say that "test X should by done in such and such way" in response to someone where it is very clear they are actually talking about different types of tests.

MoreQARespect

11 hours ago

Unit tests don't have a coherent, agreed-upon definition either.

In fact, when I first saw Kent Beck's definition I did a double take because it covered what I would have called hermetic end to end tests.

The industry badly needs new words because it's barely possible to have a coherent conversation within the confines of the current terminology.

jessekv

17 hours ago

I think it's relative, right? That's how abstractions and interfaces work.

I can write a module with integration tests at the module level and unit tests on its functions.

I can now write an application that uses my module. From the perspective of my application, my module's integration tests look like unit tests.

My module might, for example, implicitly depend on the test suite of CPython, the C compiler, the QA at the chip fab. But I don't need to run those tests any more.

In your case you hope the in-memory database matches the production one enough that you can write fast isolated unit tests on your application logic. You can trust this works because something else unit-tested the in-memory database, and integration tested the db client against the various db backends.

3036e4

16 hours ago

I remember reading blogs (and Testing on the Toilet) around 2010 about how Google divided tests into Small/Medium/Large, with specific definitions, rather than trying to use more vague and overloaded terminology that no one ever agreed on. Seems like they are no longer doing that? Too bad, since I think it was a clever trick to avoid having to get into pointless discussions about things like "what is a unit?". Having experienced more than one project where a unit test was uselessly defined to "have to only run a single method, everything else must be mocked" I like the idea of not having any level of tests below "small" (that is still above a level most would call "unit").

Found this long 2011 post now that goes into some detail on the background and the reasons for introducing that ("The Testing Grouplet"?): https://mike-bland.com/2011/11/01/small-medium-large.html

But I am not sure even after reading all that if the SML terminology was still used in 2011 or if they had moved on already? Can't really find any newer sources that mention it.

MathMonkeyMan

19 hours ago

Integration tests at $DAY_JOB are often slow (sleeps, retries, inadequate synchronization, starting up and shutting down 8 processes that are slow to start and stop), flaky (the metrics for this rate limiter should be within 5%, this should be true within 3 seconds, the output of this shell command is the same on all platforms), undocumented, and sometimes cannot be run locally or with locally available configurations. When I run a set of integration tests associated with some code I'm modifying, I have no idea what they are, why they were written, what they do, how long they will take to run, or whether I should take failures seriously.

Integration tests are closer to what you want to know, but they're also more. If I want to make sure that my state machine returns an error when it receives a message for which no state transition is defined, I could spin up a process and set up log collection and orchestrate with python and... or I could write a unit test that instantiates a state machine, gives it a message, and checks the result.

My point is that we need both. Write a unit test to ensure that your component behaves to its spec, especially with respect to edge cases. Write an integration test to make sure that the feature of which your component is a part behaves as expected.
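That state-machine unit test can be sketched like so (hypothetical machine, not the poster's code): instantiate, feed a message, check the result, with no processes or log collection needed.

```python
from enum import Enum

class State(Enum):
    IDLE = "idle"
    RUNNING = "running"

class ProtocolError(Exception):
    pass

class Machine:
    """Tiny table-driven state machine."""
    TRANSITIONS = {
        (State.IDLE, "start"): State.RUNNING,
        (State.RUNNING, "stop"): State.IDLE,
    }

    def __init__(self):
        self.state = State.IDLE

    def handle(self, message):
        nxt = self.TRANSITIONS.get((self.state, message))
        if nxt is None:
            raise ProtocolError(f"no transition for {message!r} in {self.state}")
        self.state = nxt

# Unit test: an undefined transition is an error.
m = Machine()
try:
    m.handle("stop")  # IDLE + "stop" has no defined transition
except ProtocolError:
    pass
else:
    raise AssertionError("expected ProtocolError")

m.handle("start")
assert m.state is State.RUNNING
```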

majormajor

18 hours ago

You need to test contracts with external code without having to include full external systems. Unit tests on internal implementation details are fragile as behavior changes. Unit tests on your module's contracts give you confidence in refactoring.

Passing params in instead of making external calls inside your business logic functions can help. DI can help if that's too impractical or unwieldy for whatever reason in the domain.

It's hard to do right the first time - sometimes it's fuzzy what's an internal detail vs what's an external contract - but you need to get there ASAP.

skydhash

19 hours ago

My current mental model is a car. If it's a function or something else where you're fully confident you've captured the domain, add unit tests to capture that. Just like an engine. But the most important thing is integration tests that couple something like the engine and the ignition system, and test that when the user presses the start button, the engine starts and the dashboard lights up.

Unit tests are great for DX, but only integration tests and above matter business-wise.

MoreQARespect

11 hours ago

The way some programmers treat test flakiness is weird.

With other types of bug programmers want to fix it. With flakiness they either want to rerun the test until it passes or tear it down and write an entirely different type of test - as if it is in fact not a bug, but some immutable fact of life.

lenkite

15 hours ago

These are "system tests" at your $DAY_JOB, not "integration tests".

MathMonkeyMan

5 hours ago

There is an even higher level of testing that they call "system tests." Those are for things like tracking performance regressions. I think there's always a spectrum among the terms "system," "integration," and "unit."

barrkel

9 hours ago

Yes. When testing a component, use its public API, inject real implementations where you can, and use fakes where it's too expensive. Don't use mocks, don't test interfaces that have very complex setups to invoke, if possible.

strogonoff

16 hours ago

If there is one single test-related thing you must have, that would be e2e testing.

Integration tests are, in a way, the worst of both worlds: they are more complicated than unit tests, they require involved setup, and yet they can’t really guarantee that things work in production.

End-to-end tests, meanwhile, do show whether things work or not. If something fails with an error, error reporting should be good enough in the first place to show you what exactly is wrong. If something failed without an error but you know it failed, make it fail with an error first by writing another test case. If there was an error but error reporting somehow doesn’t capture it, you have a bigger problem than tests.

At the end of the day, you want certainty that you deliver working software. If it’s too difficult to identify the failure, improve your error reporting system. Giving up that certainty because your error reporting is not good enough seems like a bad tradeoff.

Incidentally, grug-friendly e2e tests absolutely exist: just take your software, exactly as it’s normally built, and run a script that uses it like it would be used in production. This gives you a good enough guarantee that it works. If there is no script, just do it yourself, go through a checklist, write a script later. It doesn’t get more grug than that.
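Such a script can be as small as this sketch (the command here is a stand-in; a real one would invoke your actual binary): run the software as built and assert on observable behaviour.

```python
import subprocess
import sys

def smoke_test(cmd, expected_stdout):
    """Run the shipped command exactly as a user would, check its output."""
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
    assert result.returncode == 0, result.stderr
    assert result.stdout.strip() == expected_stdout
    return True

# Stand-in "application": any CLI invocation is driven the same way.
smoke_test([sys.executable, "-c", "print('hello')"], "hello")
```

Start with one happy-path check like this, then grow the checklist as production incidents teach you what else to assert.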

dsego

15 hours ago

E2e tests are the hardest to maintain and take a lot of time for little benefit in my experience. I'm talking about simulating a browser to open pages and click on buttons. They are flaky and brittle, the UI is easily the component which gets updated the most often, it's also easy to manually test while developing during QA and UAT. It's hard to mock out things, so you either have to bootstrap or maintain a whole 2nd working system with all the bells and whistles, including authentication, users, real data in the database, 3rd party integrations etc. It's just too overwhelming for little benefit. It's also hard to cover all error cases to see if a thing works correctly or breaks subtly. Most commonly in e2e we test for the happy path just to see that the thing doesn't fall over.

strogonoff

15 hours ago

The benefit is certainty that the system you are building and delivering to people works. If that benefit is little, then I don’t quite understand the point of testing.

> it's also easy to manually test while developing during QA and UAT.

As I said in the original comment, e2e tests can definitely be manual. Invoke your CLI, curl your API, click around in GUI. That said, comprehensively testing it that way quickly becomes infeasible as your software grows.

dsego

10 hours ago

> The benefit is certainty that the system you are building and delivering to people works.

I'd say that "works", "works correctly", and "covers all edge cases" are different scenarios in my mind. Looking at an exaggerated example, if I build a tax calculator or something that crunches numbers, I'd have more confidence with a few unit tests matching the output of the main method that does the calculation part than with a whole end-to-end test suite. It seems wasteful to run end to end (login, click buttons, check that a UI element appears, etc.) to cover the logical output of one part that does the serious business logic. A simple e2e suite could be useful to check for regressions, as a smoke test, but it still needs to be kept less specific, otherwise it will break on minor UX changes, which makes it a pain to maintain.

strogonoff

10 hours ago

An e2e test shows that it works. If your tax calculator’s business logic perfectly calculates the tax, but the app fails with a blank screen and a TypeError in console because a function from some UI widget lib dependency changed its signature, your calculator is as good as useless for all intents and purposes. A good unit test will not catch this, because you are not testing third-party code. An integration test that catches it approaches the complexity of an e2e.

Sure, you wouldn’t have all possible datasets and scenarios, but you can easily have a few, so that e2e test fails if results don’t make sense.

Of course, unit tests for your business logic make sense in this case. Ideally, you would express tax calculation rules as a declarative dataset and take care of that one function that applies these rules to data; if the rules are wrong, that is now a concern for the legal subject matter experts, not a bug in the app that you would need to bother writing unit tests for.

However, your unit test passing is just not a signal you can use for “ship it”. It is a development aid (hence the requirement for them to be fast). Meanwhile, an e2e test is that signal. It is not meant to be fast, but then when it comes to a release things can wait a few minutes.

dsego

8 hours ago

What's more likely to fail or cause issues? Dependencies failing and parsing errors are usually handled by the build system (type checkers and linters). In the cases where they are triggered in production, they can be easily caught by monitoring services like Sentry. Ideally any changes are manually tested before releasing, and a bug in one part of the app that's being worked on is not likely to affect a different section, e.g. it's not necessary to retest the password reset flow if you're working on the home dashboard. Having a suite of usually flaky end-to-end tests seems like the sloppiest and most cumbersome way to ensure the application runs fine, especially for a small team.

jraph

15 hours ago

Integration tests are a compromise. e2e tests may be quite expensive to run (for a web application, you might need to run your backend and a web browser, possibly in a docker container, and the whole thing will also run slower). Efficiency matters a lot.

You can have robust testing by combining the two. You can check that the whole thing runs end to end once, and then test all the little features / variations using integration tests.

That's what we do for XWiki.

https://dev.xwiki.org/xwiki/bin/view/Community/Testing/#HTes...

ivanb

15 hours ago

In practice e2e tests don't cover all code paths and raise a question: what is the point? There is a code path explosion when going from a unit to an endpoint. A more low-level test can cover all code paths of every small unit, whereas tests at service boundary do not in practice do that. Even if they did, there would be a lot of duplication because different service endpoints would reuse the same units. Thus, I find e2e tests very limited in usability. They can demonstrate that the whole stack works together on a happy path, but that's about it.

strogonoff

13 hours ago

You are testing that the software works. I think that is higher value than testing all possible code paths in isolation, and then still not having the guarantee that it all works.

user

16 hours ago

[deleted]

jhhh

17 hours ago

If you are having to refactor 150 things each time you change your codebase, then maybe you need to refactor your test suite first. Direct calls in tests to constructed/mocked objects are usually something you can just stuff into a private method so you only need to change them in one place.

Not quite sure I agree with the conclusion of the tiers of testing section either. If a test suite takes a long time but still covers something useful, then just deleting it because it takes too long makes no sense. Yes, if you have a 'fastTests' profile that doesn't run it that could temporarily convince you your changes are fine when they aren't. But the alternative is just never knowing your change is bad until it breaks production instead of just breaking your CI prior to that point.

strogonoff

16 hours ago

Tests are code. Code has bugs. More complex code has more bugs. The more complex your tests, the more bugs in your tests. Who tests the tests? It’s one thing if you rely on functionality provided by a stable testing framework, but I bet grug no like call stacks in own test code.

jessekv

16 hours ago

> Who tests the tests?

To me it's a bit like double entry bookkeeping. Two layers is valuable, but there's rapidly diminishing returns beyond two.

pydry

12 hours ago

Tests get implicitly tested by being run against code. When they fail despite there being no bug, congratulations: you've found a bug in your test.

strogonoff

11 hours ago

What if a test passes as a result of the bug?

pydry

6 hours ago

What I did say: tests get tested implicitly by the code they test.

What I didn't say: this catches 100% of all bugs in your tests.

eru

16 hours ago

A simple thing you can set up is to run short tests on every push to a feature branch, but run the long tests only when merging into master.

Basically, you provisionally make the merge commit, run the expensive tests against it, and iff they pass, declare the newly created commit to be the new master.

lenkite

15 hours ago

In my last org, we just separated "flaky" system tests into its own independent suite. They were still valuable to run - just not all the time.

vitonsky

15 hours ago

No, thanks. I already spent time writing tests while implementing features; now I have a lot of tests that prove the features work fine, and I no longer fear making changes, because the tests keep me safe from regression bugs.

The typical problems of any code base with no tests are regression bugs, a rigid team (because they must keep in mind all the cases where code may destroy everything), and fear-driven development (because even a team with zero rotation doesn't actually remember all the problems they've fixed).

willio58

15 hours ago

Did you read the article?

What is your answer to the points the author makes around flaky tests/changing business requirements/too many tests confirming the same functionality and taking too long to run?

snovv_crash

15 hours ago

Flaky tests: tests should be deterministic. If your tests are flakey in a 100% controlled environment, probably your real system is unreliable too.

Changing business requirements: business logic should be tested separately. It is expected to change, so if all of your tests include it, then yes of course it will be hard to maintain.

Too many tests for the same thing: yeah then maybe delete some of the duplicates?

Taking too long: mock stuff out. Also, maybe reconsider some architectural decisions you made, if your tests take too long it's probably going to bother your customers with slow behaviour too.

ikari_pl

15 hours ago

I think the point of article is to delete the BAD tests.

Just like you need to delete the bad code, not all the code. ;)

CPLX

15 hours ago

Hacker News is a prominent message board where users create wide ranging conversations based on article titles.

dcminter

18 hours ago

How about you fix the flakey tests?

The tests I'd delete are the ones that just test that the code is written in a particular way instead of testing the expected behaviour of the code.

Shank

16 hours ago

> How about you fix the flakey tests?

Often a flakey test is not a well-written test with something else strange failing. Often the test reveals something about the system that is somewhat non-deterministic, but not non-deterministic in a detrimental way. When you have multiple levels of abstraction and parallelization and interdependent behavior, fixing a single test becomes a time-consuming process that is difficult to work with (because it's flakey, you can't always replicate the failure).

If a test fails in CI and the traceback is unclear, many people will re-run once and let it continue to flake. Obvious flakes around time and other dependencies are much easier to spot and fix, so they are. It's only the weird ones that lead to pain and regret.

Izkata

an hour ago

My favorite flakey test where nothing was actually wrong with the code or test: The system used some of the same settings between development and CI, including the memcached server. The test would fail if one of the devs happened to be using their development site within 15 minutes of the next CI run, because the code would retrieve a nonexistent object from the cache and fail with a really strange error.

lexicality

13 hours ago

Sounds like it's not actually well written in that case. Either you're testing the wrong output if it's non-deterministic or you have a consistency bug that's corrupting data in production.

dcminter

12 hours ago

Exactly; a very occasionally flakey test may be tolerable but is almost by definition not well written.

The commonest type I see is one where instead of waiting until expected behaviour is exhibited with a suitable timeout, the test sleeps for some shorter period and then checks to see if the behaviour was exhibited.
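The usual fix is a small poll-with-deadline helper instead of the fixed sleep; a minimal sketch (the helper name and defaults are mine):

```python
import time

def wait_until(condition, timeout=5.0, interval=0.05):
    """Poll `condition` until it returns truthily or the deadline passes.

    Unlike a fixed sleep, this returns as soon as the behaviour shows up,
    and only burns the full timeout in the genuine-failure case."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return bool(condition())  # one last check right at the deadline
```

In a test you would then write `assert wait_until(lambda: job.done)` rather than `time.sleep(2)` followed by `assert job.done`: the suite stays fast when things work, and only slow when something is genuinely broken.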

These tests not only flake occasionally when the CI server or dev laptop is under unusual load, but worse, accumulate until the test suite is so full of "short" sleeps that the full set of tests takes half an hour to run.

Often the sleeps were seen as being acceptable because the plan was to run the tests in parallel, but then the increased load results in the tests becoming flakey.

Once you have dozens of these flaking tests for this or other reasons, it becomes a project in itself to refactor them back to something sane.

Flakey tests should always be fixed immediately unless you're in the middle of an incident or something.

silversmith

17 hours ago

Came here to comment this. Most of the flakey tests are badly written, some warn you about bugs you don't yet understand.

A couple of years ago I helped bring a project back on track. They had a notoriously flakey part of the test suite, which turned out to be caused by a race condition. And a very puzzling case of occasional data corruption - also, it turns out, caused by the same race condition.

MoreQARespect

11 hours ago

I tend to find that those bugs are in the extreme minority.

Most flakiness ends up being a bug in the test, or nondeterminism exhibited by the code which users don't actually care about.

XorNot

17 hours ago

This: anything which starts doing stuff like "called API N times" is utterly worthless (looking at you whole bunch of AWS API mock tests...)

avg_dev

21 hours ago

idk, i never thought

> “it is blasphemy to delete a test”,

was ever a thing. i still don't.

if a test is flaky, but it covers something useful, it should be made not flaky somehow. if tests are slow, but they are useful, then they should be optimized so they run faster. if tests cover some bit of functionality that the software no longer needs to provide, the functionality should be deleted from the code and the tests. if updating a small bit of code causes many tests to need to be adjusted, and that's a pain, and it happens frequently, then the tests should be refactored or adjusted.

> Confidence is the point of writing tests.

yes, agreed. but tests are code, too. just maintain the tests with the code in a sensible way. if there is something worth deleting, delete it; there is no gospel that says you can't. but tests provide value just like the author describes in the "fix after revert after fix after revert" counterexample. just remember they're code like anything else is all and treat them accordingly.

rcktmrtn

20 hours ago

> > “it is blasphemy to delete a test”,

> was ever a thing. i still don't.

I experienced this when working at a giant company where all the teams were required to report their "code coverage" metrics to middle management.

We had the flaky test problem too, but I think another angle of it is being shackled to test tech debt. The "coverage goals" in practice encouraged writing a lot of low-quality tests with questionable and complex fixtures (using regular expressions to yoink C++ functions/variables out of their modules and place them into test fixtures).

Fiddling with tests slowed down a lot of things, but there was general agreement that the whole project needed to be re-architected (it was split up over a zillion different little "libraries" that pretended to be independent, but were actually highly interdependent), and while I was there I always felt like we needed to cut the Gordian knot and accept that it might decrease the sacred code coverage.

Not sure if I was right or what ever happened with that project but it sure was a learning experience.

skybrian

20 hours ago

I think the argument is that sometimes updating a flaky test is not worth the effort, so consider deleting it.

arkis22

18 hours ago

Adding a test is easy. Deleting a test should involve like 3-4 people who all know the codebase.

dgunay

12 hours ago

I see a lot of debate about the definition and merits of unit, integration, and e2e testing here.

For my workplace, we have recognized a few issues with an overreliance on unit tests.

When you have important behaviors and invariants enforced by your database, you mock it out at your own peril. This literally caused a bug to slip through to prod this week. Unit tests just don't help here.

We use clean architecture. There are pockets of our codebase where, through deliberate or accidental deviations from our architecture, stuff like e.g. controllers with business logic in them exists. In some cases it is easier to just integration- or e2e-test this code instead of doing the ugly refactoring to bring it into compliance. Doing the test first will even make the refactor easier.

Parts of our codebase are just big, pure functions. Arguments go in, return values come out. These are the ideal candidates for unit testing, and we do so extensively because they're cheap and fast.

I think what occurs to me as I write this is that if you live in an idyllic codebase which can express every single state transition in memory, without any dependency on external systems, sure, unit tests are great. For those of us who are not so lucky, e2e tests can be a lifeline and a way to maintain control with minimal mock-induced churn in the test suite.

bubblebeard

18 hours ago

The author has a point. Obsolete tests serve no one, but deleting a test because it will randomly fail is an indication of an unstable process. Maybe there is a race condition; maybe your code has some dependency that is sporadically unavailable. Deleting such tests is just turning a blind eye to the problem. Unstable tests mean either you didn't write the test very well to begin with, or the process you are testing is itself unstable.

sitkack

15 hours ago

What a poorly written article. I should delete my tests because they fail randomly. My tests don't fail randomly.

jampa

17 hours ago

I work in an app where bugs are unacceptable due to the nature of the company's reputation. We've been having a lot of success with E2E, but getting there was NOT easy. Some tips:

- False negative results will make your devs hate the tests. People want to get things done and will start ignoring them if you unnecessarily break their workflow. In the CI, you should always retry on failure to avoid flaky false-negative tests.

- E2E Tests can fail suddenly. To avoid breaking people's workflow, we do a megabenchmark every day at 1 AM, and the test runs multiple times - even if it passes - so that we can measure flakiness. If a test fails in the benchmark, we remove it from the CI so we don't break other developers' workflows. The next day, we either fix the test or the bug.

- Claude Code SDK has been a blessing for E2E. Before, you couldn't run all the E2E in the PR's CI due to the time they all take. Now, we can send the branch to the Claude Code SDK to determine what E2E tests should run.

- Also, MCPs and Claude Code now write most of my E2E tests. I wrote a detailed Claude.md to let it run autonomously -- writing, validating, and repeating -- while I do something else. It gets there in 3 to 4 shots. For the price of a cup of coffee, it saves me 30-60 minutes per test.
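A stripped-down sketch of that kind of flakiness measurement: run each test repeatedly and quarantine anything that isn't fully deterministic (the function names, run count and threshold here are invented, not from any particular CI system):

```python
def flake_rate(test_fn, runs: int = 20) -> float:
    """Run one test repeatedly and report the fraction of failing runs.

    0.0 means solid, 1.0 means consistently broken; anything in
    between means the test is flaky."""
    failures = 0
    for _ in range(runs):
        try:
            test_fn()
        except AssertionError:
            failures += 1
    return failures / runs

def quarantine(tests: dict, max_flake: float = 0.0) -> list[str]:
    """Names of tests to pull out of the blocking CI suite until
    someone fixes (or deletes) them."""
    return [name for name, fn in tests.items() if flake_rate(fn) > max_flake]
```

The key property is that a flaky test gets moved out of other developers' critical path automatically, instead of being silently re-run until green.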

_caw

8 hours ago

Would love to hear more about using Claude to determine which E2E tests to run. What context are you giving it?

Is it like, "this looks like a billing feature, let me run any tests that seem relevant"?

4ndrewl

15 hours ago

> If your test is creating confidence in broken code with failing tests, it would be better for it to not exist.

The author never considers the other option of fixing the flaky tests. I find this odd.

teiferer

15 hours ago

All seems like "fix tests" is the better advice.

Flaky test? Fix it! Make it rock solid! Slow test? Fix it! Make it fast! That can be hard (if it was easy, people would have already done it), but it's vastly more useful than deleting.

Even the mentioned overtesting requires a fix by focusing the tests on separate things. You could call that "deleting" but that's oversimplifying what's going on. Same with changed requirements.

egeozcan

15 hours ago

At one of the companies I worked with when I was doing consulting, they could make the slow tests, which used to take around 3 hours, run much faster and in parallel by throwing engineering and hardware resources at the problem. First it was 30 minutes, then it was 10, then around 2-3 minutes.

I think it was one of the best investments that company made.

So my point is, don't delete slow tests, just make them fast.

nine_k

15 hours ago

Un-clickbaiting the title: "Delete useless tests".

I once faced a suite of half-broken tests; so many were broken that engineers stopped caring if their changes broke another test. I suggested to separate a subset of still-working, useful tests, keep them always green, and make them passing a required check in CI/CD. Ignore the rest of the tests for CI/CD purposes. Gradually fix some of the flaky or out-of-sync tests if they are still useful, and promote them to the evergreen subset. Delete tests that are found to be beyond repair (like the article suggests).

This worked.

throwmeaway222

16 hours ago

delete all mocked tests imo

  exists = Mock()                # unittest.mock.Mock
  code_under_test(exists)
  exists.assert_called_once()
so so so useless so that you can increase your coverage. just move to integration tests

simianwords

16 hours ago

A good heuristic for a test: how often you have to fix it when you change real code.

If every small change in the code base causes you to go back and fix the tests then your tests are bad. They should not get in the way so often. There should be a concept of “test maintenance overhead” that is weighted against the number of bugs it catches. You could also think of it as false positives vs true positives.

huflungdung

13 hours ago

Contrarian blog post trying to challenge the status quo without understanding the implications in order to look like a visionary.

People say this in interviews just to look smart, and others think it's revolutionary. Everyone loves an outspoken opposition - what do they know, where did they get this knowledge?!

Only a couple have ever pushed back and those that do are the companies that I want to work with.

A wild example: let's delete a test that ensures a heart pump works at the correct duty cycle given its parameters. Now someone comes along and redefines milliseconds as microseconds for some unrelated component. The tests are all fine. The patient now has a 60000 bpm heart rate.

Stupid idea.

efitz

18 hours ago

I have had a weird thought lately about testing at runtime. My thought is just to log violations of expectations, i.e. log when the test would have failed.

This doesn’t prevent the bug from being introduced but can remove a huge amount of complexity for test cases that are hard to set up.

seer

17 hours ago

I'm kinda of the opinion that if introducing tests, especially the useful integration tests, is hard and complex, then it is a code smell.

Most of the time, especially while I was learning, making your code "more testable" has always involved things that should have been done in the first place, but we were lazy/didn't know better.

Things like reducing dependencies, moving state away from the core and into the shell. Using more formal state machines etc. Once the “painful changes” were done I’ve found that it was actually beneficial in a lot of other contexts.

That given, I’ve kinda almost stopped writing unit tests - with the advent of expressive types everywhere, the job of unit tests has now been shifted to the compiler.

In one typescript project I’ve managed to set it up, the part that kept the state was statically typed (a database) making sure any data that went in and out was _exactly_ like the compiler expected.

After typing and validating all the other user / non-user inputs into the code, it ended up in a situation where “if the code compiles, it will work” and that was glorious. We had very minimal unit tests - only around actual business logic with state machines, the rest was kinda handled by the compiler and we didn’t feel the need to do it manually.

Apart from that, the integration tests had the philosophy of “don’t specify anything that the user is not seeing” so no button test ids, urls or weird expectation of the underlying code, just an explanation of “the user is on the page with this title, they see a button named this and they press it, expecting they are now in a page titled this”

The concept was taken from the capybara ruby testing library way back in the day, and the tests this produced have been incredibly resilient. Any update that changes the user experience would fail the tests (as they should) and any refactor, up to the level of changing urls or even changing the underlying libraries and frameworks, would be ignored.

runstop

16 hours ago

Sounds a bit like "design by contract", leaving the assertions enabled in production code. It would be great to have solid DbC support in mainstream languages.

rustystump

16 hours ago

At the end of the day, you need to have some kind of way to know if shit works or doesn't. This article feels a bit contrived to make an edgy point of "delete the test", which feels like it misses the real why behind testing.

gijoeyguerra

18 hours ago

I've always deleted tests. I've never heard anyone say not to delete tests.

fritzo

18 hours ago

I repeatedly, emphatically tell AI coding assistants not to delete tests.

readthenotes1

18 hours ago

I groaned when a co-worker deleted a test that was pointing out his code was broken.

I didn't tell him not to delete tests. It wouldn't have done any good.

cjfd

15 hours ago

I think the word you were looking for was 'cow orker'.

rotbart

15 hours ago

So... clickbait title for an article that could have been called "Delete flakey tests"... but then most of us would have just gone "yep" and not clicked.

mirekrusin

16 hours ago

We add .skip and QAs are taking over in the background to address those issues.

imiric

17 hours ago

I'm a big believer in the utility of tests, and I do think the author has a point. There is a time and place when a test is not useful, and should be deleted.

However...

> If the future bug occurs, fix it and write a new test that doesn’t flake. Today, delete the tests.

How is this different from simply fixing the flaky test today?

Tests are code, and can also incur technical debt. The reason the test is flaky is likely because nobody is willing to take the time to address it properly. Sometimes it requires a refactoring of the SUT to allow making the test more reliable. Sometimes the test itself is too convoluted and difficult to change. All of this is chore work, and is often underappreciated. Nobody got promoted or celebrated for fixing something that is an issue a random percentage of times. After all, how do we know for sure that it's permanently fixed? Only time will tell.

But the flaky test might still deliver confidence and be valuable when it does run successfully. So deleting it would bring more uncertainty. That doesn't seem like a fair tradeoff for removing an annoyance. The better approach would be to deal with the annoyance.

> What if your tests are written so that a one line code change means updating 150 tests?

That might be a sign that the tests are too brittle, and too "whiteboxy". So fix the tests.

That said, there are situations when a change does require updating many tests. These are usually large refactors or major business logic changes. This doesn't mean that the tests are and won't be useful. It's just a side effect of the change. Tests are code, so fix the tests.

I've often heard negativity around unit tests, from programmers who strongly believe that more utility comes from integration tests (the inverted test pyramid, etc.). One of the primary reasons is this belief that unit tests slow you down because they need to be constantly updated. This is a harmful mentality, coming from a place of laziness.

Tests are code, and require maintenance just as well. Unit tests in particular are tightly coupled to the SUT, which makes them require maintenance more frequently. There should also be more unit tests than other types, adding more maintenance burden. But none of these are reasons to not write unit tests, and codebases without them are more difficult to change, and more susceptible to regressions.

> What if your tests take so long to run that you can’t run them all between merges, and you start skipping some?

That is an organizational problem. Label your tests by category (unit, integration, E2E), and provide quick ways to run a subset of them. During development, you can run the quick tests for a sanity check, while the more expensive tests run in CI.

There's also the problem of long test suites because the tests are inefficient.

Again: *fix the tests*.

> Even worse, what if your business requirements have changed, and now you have thousands of lines of tests failing because they test the wrong thing?

That is a general maintenance task. Would you say the same because you had to update a library that depended on the previous business logic? Would you simply delete the library because it would take a lot of effort to update it?

No?

Then *fix the tests*. :)

mattlondon

15 hours ago

Don't delete flaky tests, fix them.

juped

15 hours ago

I think all the listed reasons are good reasons to delete tests. I like to keep the test suite running in a single-digit number of seconds. (Sometimes a test you really need takes a while, and you can skip it by default and enable it on the CI test runner or whatever.)

Another one I really agree with is "What if your tests are written so that a one line code change means updating 150 tests?". If you update a test, basically, ever, it's probably a bad test and is better off not existing than being like that. It's meant to distinguish main code with errors from main code without errors; if it must be updated in tandem with main code, it's just a detector that anything changed, which is counterproductive. Of course you're changing things, that's why you fenced them with tests.

jessekv

15 hours ago

I hate to admit it, but flaky tests almost always highlight weaknesses in my software architecture.

And fixing a flaky test usually involves making the actual code more robust.

codeulike

13 hours ago

But before you delete the test, write a test that tests whether the test is deleted, and make sure that test is failing as expected. Then delete the test. Then run the other test that makes sure the test is deleted and it should now pass /s