Hyrum’s Law in Golang

317 points, posted 2 months ago
by thunderbong

180 Comments

kibwen

2 months ago

Hyrum's Law is one of those observations that's certainly useful, but be careful not to fixate on it and draw the wrong conclusions. Consider that even the total runtime of a function is an observable property, which means that optimizing a function to make it faster is a breaking change (what if suddenly one of your queues clears too fast and triggers a deadlock??), despite the fact that 99.99999999% of your users would probably appreciate having code that runs faster for no effort on their part.

Therefore it's unavoidable that what constitutes a "breaking change" is a social contract, not a technical contract, because the alternative is that literally nothing is ever allowed to change. So as a library author, document what parts of your API are guaranteed not to change, be reasonable, and have empathy for your users. And as a library consumer, understand that making undocumented interfaces into load-bearing constructs is done at your own risk, and have empathy for the library authors.

materielle

2 months ago

I think everything you said is totally correct for open source library owners.

But let me offer a different perspective: Hyrum’s law is neither a technical contract nor a social contract. It’s an emergent technical property in a sufficiently used system.

How you respond to that emergent property depends on the social context.

If you are a FOSS maintainer, and an optimization speeds up 99.99% of users and requires 0.01% to either fix their code or upgrade to a new API, you ship it.

If you are working at a big tech company, you need both the optimization and breaking 0% of the company. So you will work across teams to find the sweet spot.

If you are an enterprise software company, and your change breaks 0.1% of users, but one of those users is one of your top 5 contracts, you don’t ship.

kmacdough

2 months ago

Seems like you're saying the same thing, just using "social contract" differently. I think they use social contract not to mean binding, but to highlight the fact that Hyrum's Law must be taken in the social context of the project. In the case of a large software company, the social contract would be to not break services, even when folks are misusing an API. And for a popular open source project, it would mean not breaking a widely used behavior, even if it isn't specified or officially supported. Determining the social contract seems to be precisely what you describe as "not a social contract".

uoaei

2 months ago

> It’s an emergent ... property in a sufficiently used system

This is also a sufficient description of "social contract" for this context.

SatvikBeri

2 months ago

I once sped up a very suboptimal routine from ~100s to ~0.1s (it was doing database lookups in a for loop), and that broke the reporting system because the original author had made several asynchronous function calls and assumed they would all have finished running by the time the (formerly) slow routine was done. Figuring out exactly what happened took forever.

bluGill

2 months ago

That used to be a problem in the 1980s. Thus PCs came with a turbo button to slow them down, and 8 bit computers went the entire decade without upgrading their speed even though faster CPUs were available. These days nearly everything runs on more than one CPU and so nobody relies on function runtime (other than whether it is fast enough). Even in embedded they have been burned by their one CPU going out of production and so try to avoid that dependency because it cannot be relied on anymore.

thayne

2 months ago

> nobody relies on function runtime

Maybe not intentionally.

But there have been several times where I've seen bugs where two tasks are done concurrently, but task A always takes longer than task B, then someone makes A faster, and that exposes some race condition or deadlock that only occurs if A completes before B.
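
A minimal Go sketch of the usual fix for this class of bug, assuming the work can be expressed as goroutines: the caller waits on the background tasks explicitly, so it no longer matters which one happens to be slower. The function name `runReport` and the squared-index "work" are hypothetical placeholders.

```go
package main

import (
	"fmt"
	"sync"
)

// runReport starts n background tasks and waits for all of them explicitly,
// instead of assuming they finish while some other slow routine is running.
func runReport(n int) []int {
	results := make([]int, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = i * i // each goroutine writes only its own slot
		}(i)
	}
	// Any other routine could run here; its duration no longer affects correctness.
	wg.Wait()
	return results
}

func main() {
	fmt.Println(runReport(4)) // [0 1 4 9]
}
```

With the wait in place, speeding up either side changes only latency, not the outcome.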

hinkley

2 months ago

I found a used copy of Warcraft III and found it was unplayable because the scrolling algorithm ran as fast as possible with no minimum time. Any map bigger than 2x2 screens you could not scroll to the middle.

marcus_holmes

2 months ago

I used to enjoy Wing Commander back in the 90's. Then I upgraded my PC and it became unplayably fast - 1 second after I took off the "you died" screen appeared.

outworlder

2 months ago

In the 8 bit computer era, we knew exactly how much time any given instruction took. Retrieving some precision clock (not available!) and computing the time delta between runs - as is trivially done today - would probably be more computing power than they had at the time. Every cycle counted. Not very surprising that it wasn't done at that era. Also, there wasn't a "winning" instruction set or compilers able to target different architectures, so there was far more at stake than just clock speeds. If they changed the processor, you lost all your software.

DOS didn't have any precision clocks either as far as I know (it seems that there's interrupt 1A but it only updates 18 times a second, which is an eternity). Apparently there's 8254 based timer code after a few PC generations.

Windows 95 came up with QueryPerformanceCounter() and that simplified life quite a bit.

rjst01

2 months ago

One day I will give a lightning talk about the load bearing teapot, or how and why I made HTTP Status 418 a load bearing part of an internal API, and why it was the least bad option considering the constraints.

renewiltord

2 months ago

It’s a classic. Binance will give you 429 errors to back off then 418s to tell you you will be IP banned and then they’ll ban you.

hinkley

2 months ago

Google’s spiders will punish you for giving them too many 429 responses. It’s hell for hosting sites with vanity urls. They can’t tell they’re sending you 50+ req/s.

It’s practically a protection racket. Only AWS gets the money.

wbl

2 months ago

50 requests/sec? Did you forget a few zeros?

hinkley

2 months ago

Little’s law is a bitch, and you can get away with a little throttling but not much.

Also, that’s a bit dismissive for HN.

ljm

2 months ago

I feel like this is approaching absurdity, if only because something like the total runtime of a function is not under the control of the author of the function. The operating environment will have an impact on that, for example, as will the load the system is currently experiencing. A GC pass can affect it.

In short, I wouldn't consider emergent behaviours of a machine as part of an intentional interface or any kind of contract and therefore I wouldn't see it as a breaking change, the same as fixing a subtle bug in a function wouldn't be seen as a breaking change even if someone depended on the unintentional behaviour.

I think it's more of a testament to Go's hardcore commitment to backwards compatibility, in this case, than anything else.

skybrian

2 months ago

Yes, it’s an absurd example to make a point. We don’t normally consider performance in scope for what’s considered a breaking API change and there are good reasons for that, including being non-portable. Performance guarantees are what hard real-time systems do and they’re hardware-specific. (There is also the “gas fee” system that Ethereum has, to make a performance limit consistent across architectures.)

But there are still informal limits. If the performance impact is bad enough (say, 5x slower, or changing a linear algorithm to quadratic), it’s probably going to be reverted anyway. We just don’t have any practical way of formalizing rough performance guarantees at an API boundary.

kibwen

2 months ago

> If the performance impact is bad enough

Even worse, it's possible to select a new algorithm that improves the best-case and average-case runtimes while degrading the worst-case runtime, so no matter what you do it will punish some users and reward others.

AlotOfReading

2 months ago

It's quite common in cryptography for the runtime to be important. For example, password verification time shouldn't depend on the value of the key or the password. Systems have been broken because someone wrote a string compare that returned early.

ljm

2 months ago

And, since most languages short circuit on basic string comparisons, you'd have some form of `secure_compare` function that compares two strings in constant time, and that behaviour is contracted in the name of the function.
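
In Go, for instance, such a function can be sketched on top of `crypto/subtle.ConstantTimeCompare` from the standard library. The wrapper name `secureCompare` is hypothetical. Note that `ConstantTimeCompare` returns immediately when the lengths differ, so in practice it is applied to fixed-length values such as MACs or hashes.

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// secureCompare reports whether a and b are equal. For inputs of equal
// length, the time taken does not depend on where the inputs differ.
func secureCompare(a, b string) bool {
	return subtle.ConstantTimeCompare([]byte(a), []byte(b)) == 1
}

func main() {
	fmt.Println(secureCompare("hunter2", "hunter2")) // true
	fmt.Println(secureCompare("hunter2", "hunter3")) // false
}
```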

Nobody is rewriting `==` to compare strings in constant time, not because it breaks some kind of API contract, but because it would result in a massive waste of CPU time. The point is, though, that they could. But then they are deciding to sacrifice performance for this one problem.

Crypto is obviously a case of its own when it comes to optimisations and as much as I called out the parent for approaching the absurd, we can pull out many similar special cases of our own.

tshaddox

2 months ago

> Consider that even the total runtime of a function is an observable property, which means that optimizing a function to make it faster is a breaking change

Well yeah, that's pretty much the textbook example of Hyrum's Law (or some funnier variation like "I was relying on the heat from the CPU to warm my bedroom, can you please revert your change that improved CPU performance").

tester756

2 months ago

>which means that optimizing a function to make it faster is a breaking change (what if suddenly one of your queues clears too fast and triggers a deadlock??), despite the fact that 99.99999999% of your users would probably appreciate having code that runs faster for no effort on their part.

I agree with your point, but that's a poor example because you can't rely on a function's speed reliably or easily.

Timing differs between hw, OS, OS updates, whatever.

Meanwhile it is trivial and easy to rely on error messages.

skovati

2 months ago

reminds me of: https://xkcd.com/1172/

citizenpaul

2 months ago

That one always fell flat for me, but I get it. The idea that an emacs user would communicate with another human rather than tinker with their config to deal with the change is unrealistic. /s /sorta

FiloSottile

2 months ago

Hah, I wrote the crypto/rsa comments. We take Hyrum's Law (and backwards compatibility [1]) extremely seriously in Go. Here are a couple more examples:

- We randomly read an extra byte from random streams in various GenerateKey functions (which are not marked like the ones in OP) with MaybeReadByte [2] to avoid having our algorithm locked in

- Just yesterday someone reported that a private ECDSA key with a nil public key used to work, and now it doesn't, so we probably have to make it work again [3]

- Iterating over a map uses a randomized order to avoid exposing the internals

- The output of rand.Rand is considered part of the compatibility promise, so we had to go to great lengths to improve it [4]

- We discuss all the time what commitments to make in docs and what behaviors to disclaim, knowing we can never change something documented and probably something that's not explicitly documented as "this may change" [6]

[1]: https://go.dev/doc/go1compat

[2]: https://pkg.go.dev/crypto/internal/randutil#MaybeReadByte

[3]: https://go.dev/issue/70468

[4]: https://go.dev/blog/randv2

[5]: https://go.dev/blog/chacha8rand

[6]: https://go-review.googlesource.com/c/go/+/598336/comment/5d6...

mjw_byrne

2 months ago

The map iteration order change helps to avoid breaking changes in future, by preventing reliance on any specific ordering, but when the change was made it was breaking for anything that was relying on the previous ordering behaviour.

IMO this is a worthwhile tradeoff. I use Go a lot and love the strong backwards compatibility, but I would happily accept a (slightly) higher rate of breaking changes if it meant greater freedom for the Go devs to improve performance, add features etc.

Based on the kind of hell users of other ecosystems seem willing to tolerate (cough Python cough), I believe I am not alone in this viewpoint.

wild_egg

2 months ago

Data point of one, but I've been using Go since 2012 and would drop it instantly if any of the backwards compatibility guarantees were relaxed.

Having bugs imposed on you from outside your project is a waste of time to deal with and there are dozens of other languages you can pick from if you enjoy that time sink. Most of them give you greater capabilities as the balance.

Go's stability is a core feature and compensates for the lack of other niceties. Adding features isn't a good reason to break things. I can go use something else if I want to make that trade.

otterley

2 months ago

Respectfully, I don’t think you would just pack up and leave. The cost of switching to an entirely different language—which might have even worse backwards compatibility issues—is significantly higher than fixing bugs you inadvertently introduced due to prior invalid assumptions.

I’d call your bluff.

wild_egg

2 months ago

That's a bit bold when you know nothing about me, but sure.

I exist in a polyglot environment and we use Go for things that we expect to sit and do their job for years without modification.

Nothing more annoying with these than needing to update a runtime to patch a CVE and suddenly needing to invest two weeks to handle all the breaking changes. Go lets us take 5 minutes to bump the version number in the Dockerfile and CI configs and move on to more important work.

I'm not suggesting we'd go rewrite all of those if Go relaxed its guarantees but we'd stop picking it to write new things in and it would slowly disappear as we decommission the existing services over the years.

otterley

2 months ago

Every language and its environment has issues. Switching always introduces a new set of problems, some of which could be worse, and many of which you won't have anticipated when you encounter them.

hn34381

2 months ago

Also, there is a time and a place for things.

Breaking API changes in a minor version update sucks and is often an unexpected time sink, and often mandatory because it has some security patch, critical bug fix, or something.

Breaking API changes in a major version update is expected, can be planned for, and often can be delayed if one chooses.

dsymonds

2 months ago

The map iteration order was always "random", but imperfectly so prior to Go 1.3. From memory, it picked a random initial bucket, but then always iterated over that bucket in order, so small maps (e.g. only a handful of elements) actually got deterministic iteration order. We fixed that in Go 1.3, but it broke a huge number of tests across Google that had inadvertently depended on that quirk; I spent quite a few weeks fixing tests before we could roll out Go 1.3 inside Google. I imagine there were quite a few broken tests on the outside too, but the benefit was deemed big enough to tolerate that.
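
For tests that need a stable order, the usual remedy is to sort the keys before iterating. A minimal sketch, with a hypothetical helper name `sortedKeys`:

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys returns the keys of m in ascending order, so callers get a
// deterministic traversal regardless of the map's internal iteration order.
func sortedKeys(m map[string]int) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	m := map[string]int{"b": 2, "a": 1, "c": 3}
	for _, k := range sortedKeys(m) {
		fmt.Println(k, m[k]) // a 1, then b 2, then c 3
	}
}
```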

jerf

2 months ago

Breaking iteration order was also well established as a valid move. Several other languages had already made a similar change, much later in their own lifecycle than Go did. That helps a lot, because it shows it is largely just an annoyance, mostly affecting tests.

ljm

2 months ago

I'd consider stuff like that part of the opinion the language has. Go's opinion is that backwards compatibility at all reasonable cost is a priority.

When it comes to ecosystems, the opinions have trade-offs. I would say that Go's approach to dependencies, modules and workspaces is one of those. As a language it mostly stays out of your way, but correcting imports because it pulled in the wrong version, or dealing with go.mod, go.work and replace directives in a monorepo, gets old pretty fast (to the extent it's easier to just have a monorepo-wide go.mod with literally every dependency in it). At least it's an improvement over having to use a fairly specific directory structure though.

hinkley

2 months ago

Java 5 was a fun upgrade for a lot of people because it caused JUnit tests to run in a different order. Due to hashtable changes altering the iteration of the reflected function names.

Don’t couple your tests, kids!

unscaled

2 months ago

> We randomly read an extra byte from random streams in various GenerateKey functions (which are not marked like the ones in OP) with MaybeReadByte [2] to avoid having our algorithm locked in

You don't seem to do that in ed25519. Back before ed25519.NewKeyFromSeed() existed, that was the only way to derive a public Ed25519 key from a private key, and I'm pretty sure I've written code that relied on that (it's easy to remember, since I wasn't very happy about it, but this was all I could do). The documentation of ed25519.GenerateKey mentions that the output is deterministic, so kudos for that. It seems you've really done a great job with investigating and maintaining ossified behavior in the Go cryptography APIs and preventing new ones from happening.

mkesper

2 months ago

The nil key case really makes me wonder how sane it is to support these cases. You will be forced to lug this broken behavior with you forever, like the infamous A20 line (https://en.wikipedia.org/wiki/A20_line).

FiloSottile

2 months ago

> You will be forced to lug this broken behavior with you forever

Yep, welcome to my life.

atsjie

2 months ago

Wouldn't that broken behaviour be a potential security issue by itself?

I do remember Go making backwards incompatible changes in some rare scenarios like that.

(and technically the loopvar fix was a big backwards incompatible change; granted that was done with a lot of consideration)

whizzter

2 months ago

Wouldn't a nil ECDSA key be a security risk?

unscaled

2 months ago

If a private key is available, the public key can be derived from the private key using scalar multiplication. This is how ecdsa.GenerateKey works by itself - it first generates a private key from the provided random byte stream and then derives a public key from that private key.

I don't see how this can be a security risk, but allowing a public key that has a curve but a nil value is definitely a messy API.

boloust

2 months ago

Ironically, I once wrote a load balancer in Go that relied on the randomized map iteration ordering.

OskarS

2 months ago

Man, you really can’t escape Hyrum’s Law ever! Now we have people depending on the iteration order being random!

dwattttt

2 months ago

Clearly you need to randomly decide whether or not to randomise it.

ahoka

2 months ago

That's why it's totally stupid to randomize it.

abtinf

2 months ago

This is one of the least appreciated aspects of Go. Code I wrote 12 years ago still just works.

gnfargbl

2 months ago

As a user of your code this is true, and I'm very grateful indeed that you take this approach.

I would add as a slight caveat that to benefit from this policy, users absolutely must read the release notes on major go versions before upgrading. We recently didn't, and we were burnt somewhat by the change to disallow negative serial numbers in the x509 parser without enabling the new feature flag. Completely our fault and not yours, but I add the caveat nevertheless.

FiloSottile

2 months ago

We have gotten a liiiiittle more liberal ever since we introduced the new GODEBUG feature flag mechanism.

I've been meaning to write a "how to safely update Go" post for a while, because the GODEBUG mechanism is very powerful but not well-known and we could build a bit of tooling around it.

In short, you can upgrade your toolchain without changing the go.mod version, and these things will keep working like they did, and set a metric every time the behavior would have changed, but didn't. (Here's where we could build a bit of tooling to check that metric in prod/tests/CLIs more easily.) Then you can update the go.mod version, which updates the default set of GODEBUGs, and if anything breaks, try reverting GODEBUGs one by one.

gnfargbl

2 months ago

That sounds good.

Breaking changes in major version updates are a completely normal thing in most software and we usually check for them. Ironically the only reason we weren't previously bothering in go is that the maintainers were historically so hyper-focused on absolute backwards compatibility that there were never any breaking changes!

hambes

2 months ago

Solution to the specifically mentioned problem: Don't use string-based errors, use sentinel errors [1].

More generally: Don't produce code where consumers of your API are the least bit inclined to rely on non-technical strings. Instead use first-level language constructs like predefined error values, types or even constants that contain the non-technical string so that API consumers can compare the return value against the constant instead of hard-coding the contained string themselves.

Hyrum's Law is definitely a thing, but its effects can be mitigated.

[1]: https://thomas-guettler.de/go/wrapping-and-sentinel-errors

gwd

2 months ago

The frustrating thing is that the error in question already is a sentinel error -- Grafana (the top-level culprit in the linked search) should be using `errors.As` with a `*http.MaxBytesError` target rather than doing a string compare.
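
A sketch of that check, assuming Go 1.19 or later (where `http.MaxBytesError` exists). The helpers `readBody` and `tooLarge` are hypothetical, and `httptest` is used only to build a request without a running server.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// readBody reads a request body that is capped at limit bytes.
func readBody(body string, limit int64) error {
	req := httptest.NewRequest(http.MethodPost, "/upload", strings.NewReader(body))
	rec := httptest.NewRecorder()
	_, err := io.ReadAll(http.MaxBytesReader(rec, req.Body, limit))
	return err
}

// tooLarge reports whether err is (or wraps) *http.MaxBytesError and, if so,
// returns the limit that was exceeded. No string comparison is involved.
func tooLarge(err error) (int64, bool) {
	var mbe *http.MaxBytesError
	if errors.As(err, &mbe) {
		return mbe.Limit, true
	}
	return 0, false
}

func main() {
	limit, ok := tooLarge(readBody("0123456789", 4))
	fmt.Println(limit, ok) // 4 true
}
```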

The whole point of Hyrum's Law is that it doesn't matter how well you design your API: no matter what, people will depend on its behavior rather than its contract.

sssddfffdssasdf

2 months ago

But it looks like that until 3 years ago, this string comparison was the only way to do it. https://github.com/golang/go/pull/49359/files

gwd

2 months ago

Good catch. So in a sense this isn't really Hyrum's Law (which would be more appropriate to things like the Sim City / Windows 3.x UAF bug described in a sibling comment); it's more like, if people need to do something, and you don't give people an explicit way to do it, they'll find an implicit way, and then you're stuck supporting whatever that happened to be.

ekidd

2 months ago

There was a well-known trick in MacOS development in the 90s. You couldn't always avoid relying on undocumented behavior. The docs were incomplete and occasionally vague.

What you could do was try to rely on the same undocumented behavior as everyone else. This way, if Apple broke you, they'd break half their ecosystem at the same time.

lokar

2 months ago

Or they could have fixed the error (adding the type) instead of matching the string.

LudwigNagasena

2 months ago

Early Go lacked lots of features such as errors.As. It was and still is sometimes idiomatic to generate Go because it is so featureless and writing it is often a chore. So it is very much about how well you design your API.

Svip

2 months ago

In your example, the onus is on the consumer not the provider. I could still be writing code that checks if `err.Error() == "no more tea available."`. I agree, I shouldn't do that, but nothing is preventing me from doing that. Additionally, errors.Is is a relatively recent addition to Go, so by the time people would check for errors like this, it was just easier to check the literal string. But as an API provider in Go, you cannot prevent your consumers from checking the return values of .Error().

hambes

2 months ago

Unfortunately true. The Go maintainers might not agree with me on this, but I think in this case consumers have to learn the hard way. Go tries to always be backwards compatible, but I don't think that trying to be backwards compatible with incorrect usage is ever the right choice.

LudwigNagasena

2 months ago

So the people who decided to make a stringly typed error with `errors.New("http: request body too large")` and make you suffer, now can remove a stringly typed error and make you suffer even more? What would the lesson be? What would consumers learn?

hambes

2 months ago

I don't understand your point. The lesson is "don't rely on magic strings, instead rely on exported and documented constants, otherwise your code might break".

LudwigNagasena

2 months ago

My point is that a few years ago there was no exported and documented constant. The lesson should be "provide sensible tools, otherwise your consumers will have to rely on implementation details for the most basic expected stuff".

stonemetal12

2 months ago

>My point is that a few years ago there was no exported and documented constant.

Then the feature didn't exist. Figuring out undocumented implementation details to "make it work" is asking for it to be broken in the future. So if you are unwilling or unable to support fixing it in the future then don't do that.

If it is "the most basic expected stuff" then quite literally make the determination that it isn't ready for use. A lot of Go was, and maybe still is, half-baked and not ready for production. It is ok to recognize that and not use it.

Joker_vD

2 months ago

I am glad that your circumstances are such that you can just stop working on a project when the tooling it uses turns out to be inadequate, wait five years, and then come back when it improves.

Unfortunately, many people can't really do that: when the ecosystem turns out to be somewhat inadequate in a project that's already been in use for a couple of years, their options are either "just make it work one way or another, who cares if it's a hardcoded string, we have to ship the fix ASAP" or "rewrite it all in Rust/X, allegedly their ecosystem is production-ready".

outworlder

2 months ago

> I am glad that your circumstances are such that you can just stop working on a project when the tooling it uses turns out to be inadequate, wait five years, and then come back when it improves.

Is it that terrible to just handle an error as an error, without having to know exactly what the error was? If you see some of the codebases which rely on the error, they are trying to be too clever and doing things like returning a 400 instead of 500 if that's the specific error message returned. Is that really necessary?

Unless the codebase can take corrective actions (and it could still attempt to do it regardless if that's the case), there's really no point trying to be cute. An error is returned, and that's that.

dwattttt

2 months ago

> "just make it work one way or another, who cares if it's a hardcoded string, we have to ship the fix ASAP"

Sure, but now that there's a "correct" way to do this, you don't get to complain that the hacky thing you did needs to keep being supported. You fix the hacky thing you did, or you make peace that you're still doing the hacky thing, problems it causes and all.

beautron

2 months ago

I love that the Go project takes compatibility so seriously. And I think taking Hyrum's Law into account is necessary, if what you're serious about is compatibility itself.

Being serious about compatibility allows the concept of a piece of software being finished. If I finished writing a book twelve years ago, you could still read it today. But if I finished writing a piece of software twelve years ago, could you still build and run it today? Without having to fix anything? Without having to fix lots of things?

> Sure, but now that there's a "correct" way to do this, you don't get to complain that the hacky thing you did needs to keep being supported.

But that's the whole point and beauty of Go's compatibility promise. Once you finish getting something working, you finished getting it working. It works.

What I don't want, is for my programming platform to suddenly say that the way I got the thing working is no longer supported. I am no longer finished getting it working. I will never be finished getting it working.

Go is proving that a world with permanently working software is possible (vs a world with software that breaks over time).

estebarb

2 months ago

That is the kind of stuff I would have expected `go vet` to fix.

karel-3d

2 months ago

Using string error comparisons was the only way to do this a few years ago; and Go has a backwards compatibility promise.

cedws

2 months ago

Code that checks raw error strings is just plain bad and should be exempt from Go’s backwards compatibility guarantees. There is almost never an excuse for it, especially in stdlib.

pjmlp

2 months ago

Go's original design is to blame: for a long time string-based errors were the only way, and some standard library packages still have them if I am not mistaken, let alone the whole ecosystem.

That is what happens when history of programming languages is ignored on purpose, followed by a "design as we go" approach.

adontz

2 months ago

Honestly, this is so much worse than "catch". It's what a "catch" would look like in "C".

hambes

2 months ago

It might look worse than catch, but it's much more predictable and less goto-y.

guappa

2 months ago

goto was only bad when used to save code and jump indiscriminately. To handle errors is no problem at all.

froh

2 months ago

yes, yes, yes! see the Linux Kernel for plenty of such good and readable uses of go-to, considered useful: "on error, jump there in the cleanup sequence ..."

_flux

2 months ago

..as long as you don't make mistakes. I fixed enough goto bugs in Xorg when I was fixing Coverity-issues in Xorg that I can see the downsides of this easy way of error handling.

int_19h

2 months ago

If "catch" is goto-y (and it kinda is), then so is "defer".

kbolino

2 months ago

The biggest difference between try-catch and error values syntactically IMO is that the former allows you to handle a specific type of error from an unspecified place and the latter allows you to handle an unspecified type of error from a specific place. So the type checking is more cumbersome with error values whereas enclosing every individual source of exceptions in its own try-catch block is more cumbersome than error values. You usually don't do that, but you usually don't type-check error values either.

lovasoa

2 months ago

An interesting topic is how to fight Hyrum's law. A possibility is to add randomness in things you don't want people to rely on. If I remember well, this is what the QUIC protocol does. Some fields are unused in the current version, but required by the specification to be set to random values, not null bytes, so that routers don't start relying on them to identify the packets.

EDIT.

I think I found the source: https://www.rfc-editor.org/rfc/rfc9000#section-17.2.1

> The value in the Unused field is set to an arbitrary value by the server. Clients MUST ignore the value of this field. [...] Note that other versions of QUIC might not make a similar recommendation.

I think they call it "greasing", to prevent "ossification".
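
A toy Go sketch of the idea, loosely modelled on the first byte of a Version Negotiation packet (Header Form bit set, remaining seven bits unused). The function names are hypothetical and this is not taken from any QUIC implementation.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

const headerForm = 0x80 // most significant bit: long header

// firstByte sets the Header Form bit and fills the seven unused bits with
// random data, so middleboxes cannot come to expect a fixed value there.
func firstByte() (byte, error) {
	var b [1]byte
	if _, err := rand.Read(b[:]); err != nil {
		return 0, err
	}
	return headerForm | b[0]&0x7f, nil
}

// isLongHeader looks only at the Header Form bit and ignores the unused bits,
// as a well-behaved receiver should.
func isLongHeader(b byte) bool {
	return b&headerForm != 0
}

func main() {
	b, err := firstByte()
	if err != nil {
		panic(err)
	}
	fmt.Printf("%08b long=%v\n", b, isLongHeader(b))
}
```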

klabb3

2 months ago

This is wonderful. I’m quite familiar with QUIC but hadn’t heard about this.

Nothing like waking up after 10 years, realize you now really need those bits, and 20 different routers from 10 brands have decided that those bits must be a certain way.

Bonus points for checksums/crypto that breaks on the other end if the bits have been messed with. Curse those middle-boxes and their “clever hacks”.

rho4

2 months ago

Interesting thanks! Might indeed be valuable to add to one's toolbox.

adontz

2 months ago

This is a good example of "stringly typed" software. Golang designers did not want exceptions (still have them with panic/recover), but untyped errors are evil. On the other hand, how would one process typed errors without pattern matching? Because "catch" in most languages is a [rudimentary] pattern matching.

https://learn.microsoft.com/en-us/dotnet/csharp/language-ref...

KRAKRISMOTT

2 months ago

Go has typed errors, it just didn't use it in this case.

simiones

2 months ago

In principle. In practice, most Go code, and even significant parts of the Go standard library, return arbitrary error strings. And error returning functions never return anything more specific than `error` (you could count the exceptions in the top 20 Go codebases on your fingers, most likely).

Returning non-specific exceptions is virtually encouraged by the standard library (if you return an error struct, you run into major issues with the ubiquitous `if err != nil` "error handling" logic). You have both errors.New() and fmt.Errorf() for returning stringly-typed errors. errors.Is and errors.As only work easily if you return error constants, not error types (they can support error types, but then you have to do more work to manually implement Is() and As() in your custom error type) - so you can't easily both have a specific error, but also include extra information with that error.

For the example in the OP, you have to do a lot of extra work to return an error that can be checked without string comparisons, but also tells you what the actual limit was. So much work that this was only introduced in Go 1.19, despite MaxBytesReader existing since Go 1.0. Before that, it simply returned errors.New("http: request body too large") [0].

And this is true throughout the standard library. Despite all of their talk about the importance of handling errors, Go's standard library was full of stringly-typed errors for most of its lifetime, and while it's getting better, it's still a common occurrence. And even when they were at least using sentinel errors, they rarely included any kind of machine-readable context you could use for taking a decision based on the error value.

[0] https://cs.opensource.google/go/go/+/refs/tags/go1:src/pkg/n...

kbolino

2 months ago

You do not have to do more work to use errors.Is or errors.As. They work out of the box in most cases just fine. For example:

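    package example

    import (
        "errors"
        "fmt"
    )

    // ErrValue is a sentinel error, matched by identity with errors.Is.
    var ErrValue = errors.New("stringly")

    // ErrType is a typed error that carries machine-readable context.
    type ErrType struct {
        Code    int
        Message string
    }

    // Error implements the error interface on the value receiver.
    func (e ErrType) Error() string {
        return fmt.Sprintf("%s (%d)", e.Message, e.Code)
    }

You can now use errors.Is with a target of ErrValue and errors.As with a target of *ErrType. No extra methods are needed.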

However, you can't compare ErrValue to another errors.New("stringly") by design (under the hood, errors.New returns a pointer, and errors.Is uses simple equality). If you want pure value semantics, use your own type instead.

There are Is and As interfaces that you can implement, but you rarely need to implement them. You can use the type system (subtyping, value vs. pointer method receivers) to control comparability in most cases instead. The only time to break out custom implementations of Is or As is when you want semantic equality to differ from ==, such as making two ErrType values match if just their Code fields match.

The one special case that the average developer should be aware of is unwrapping the cause of custom errors. If you do your own error wrapping (which is itself rarely necessary, thanks to the %w specifier on fmt.Errorf), then you need to provide an Unwrap method (returning either an error or a slice of errors).
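
To make that concrete, here is a small runnable sketch using the same two definitions. The helper names `isErrValue` and `codeOf` are made up for the example; wrapping is done with `%w`, so no custom `Unwrap`, `Is`, or `As` method is involved.

```go
package main

import (
	"errors"
	"fmt"
)

var ErrValue = errors.New("stringly")

type ErrType struct {
	Code    int
	Message string
}

func (e ErrType) Error() string {
	return fmt.Sprintf("%s (%d)", e.Message, e.Code)
}

// isErrValue reports whether err is, or wraps, the ErrValue sentinel.
func isErrValue(err error) bool {
	return errors.Is(err, ErrValue)
}

// codeOf extracts the Code of an ErrType found anywhere in err's chain.
func codeOf(err error) (int, bool) {
	var target ErrType
	if errors.As(err, &target) {
		return target.Code, true
	}
	return 0, false
}

func main() {
	wrappedValue := fmt.Errorf("loading config: %w", ErrValue)
	wrappedType := fmt.Errorf("calling backend: %w", ErrType{Code: 42, Message: "boom"})

	fmt.Println(isErrValue(wrappedValue)) // true
	fmt.Println(codeOf(wrappedType))      // 42 true

	// A second errors.New with the same text is a different pointer.
	fmt.Println(isErrValue(errors.New("stringly"))) // false
}
```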

simiones

2 months ago

Your example is half right, I had misread the documentation of errors.As [0].

errors.As does work as you describe, but errors.Is doesn't: that only compares the error argument for equality, unless it implements Is() itself to do something different. So given `var e error = ErrType{Code: 1, Message: "Good"}`, the call `errors.Is(e, ErrType{})` will return false. But indeed errors.As will work for this case and allow you to check if an error is an instance of ErrType.

[0] https://play.golang.com/p/qXj3SMiBE2K

dgunay

2 months ago

If you want errors to behave more like value types, you can also implement `Is`. For example, you could have your `ErrType`'s `Is` implementation return true if the other error `As` an `ErrType` also has the same code.
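
A rough sketch of what that could look like, reusing the `ErrType` from upthread. The `sameCode` helper is just for the demo, and matching on `Code` alone is one possible choice of semantics, not the only one.

```go
package main

import (
	"errors"
	"fmt"
)

type ErrType struct {
	Code    int
	Message string
}

func (e ErrType) Error() string {
	return fmt.Sprintf("%s (%d)", e.Message, e.Code)
}

// Is makes errors.Is treat two ErrType values as matching when their
// codes are equal, regardless of Message.
func (e ErrType) Is(target error) bool {
	var t ErrType
	if !errors.As(target, &t) {
		return false
	}
	return e.Code == t.Code
}

// sameCode reports whether err contains an ErrType with the given code.
func sameCode(err error, code int) bool {
	return errors.Is(err, ErrType{Code: code})
}

func main() {
	err := fmt.Errorf("request failed: %w", ErrType{Code: 1, Message: "Good"})
	fmt.Println(sameCode(err, 1)) // true: codes match, messages differ
	fmt.Println(sameCode(err, 2)) // false
}
```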

kbolino

2 months ago

Probably worth noting that errors.As uses assignability to match errors, while errors.Is is what uses simple equality. Either way, both work well without custom implementations in the usual cases.

unscaled

2 months ago

Go errors cannot be string-typed, since they need to implement the error interface. The reason testing error types sometimes won't work is that the error types themselves may be private to the package where they are defined or that the error is just a generic error created by errors.New().

In this case the Error has an easy-to-check public type (*MaxBytesError) and the documentation clearly indicates that. But that has not always been the case. The original sin is that the API returned a generic error and the only way to test that error was to use a string comparison.

This is important context to have when you need to make balanced decisions about Hyrum's law. As some commenters already mentioned, you should be wary of taking the extreme version of the law, which suggests that every single observable behavior of the API becomes part of the API itself and needs to be preserved. If you follow this extreme version, every error or exception message in every language must be left unchanged forever. But most client code doesn't just go around happily comparing exception messages to strings if there is another method to detect the exception.

karel-3d

2 months ago

They didn't have them when they implemented this code.

Back then, error was a glorified string. Then it started getting smarter errors, mostly due to popular third-party packages, and then the logic of those popular packages was more or less* folded back into Go.

* except for stacktraces in native errors. I understand that they are not there for speed reasons but dang it would be nice to have them sometimes

adontz

2 months ago

Nobody teaches people to use them. There is no analog to "catch most specific exceptions" culture in other languages.

TheDong

2 months ago

It has typed errors, except every function that returns an error returns the 'error' interface, which gives you no information on the set of errors you might have.

In other statically typed languages, you can do things like 'match err' and have the compiler tell you if you handled all the variants. In java you can `try { x } catch (SomeTypedException)` and have the compiler tell you if you missed any checked exceptions.

In go, you have to read the recursive call stack of the entire function you called to know if a certain error type is returned.

Can 'pgx.Connect' return an `io.EOF` error? Can it return a "tls: unknown certificate authority" (unexported string only error)?

The only way to know is to recursively read every line of code `pgx.Connect` calls and take note of every returned error.

In other languages, it's part of the type-signature.

Go doesn't have _useful_ typed errors since idiomatically they're type-erased into 'error' the second they're returned up from any method.

inlined

2 months ago

You actually should never return a specific error pointer because you can eventually break nil checks. I caused a production outage because interfaces are tuples of type and pointer and the literal nil turns to [nil, nil] when getting passed to a comparator whereas your struct return value will be [nil, *Type]
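
A minimal reproduction of that trap, with a made-up `MyErr` type: the nil pointer is equal to nil as a pointer, but stops being equal to nil once it is converted to the `error` interface.

```go
package main

import "fmt"

// MyErr is a hypothetical concrete error type with a pointer receiver.
type MyErr struct{}

func (*MyErr) Error() string { return "my error" }

// concrete returns a nil *MyErr; as a plain pointer it compares equal to nil.
func concrete() *MyErr { return nil }

// asInterface converts that nil pointer to the error interface. The
// interface now holds (type=*MyErr, value=nil), which is not a nil interface.
func asInterface() error { return concrete() }

// fixed returns an untyped nil when there is no error.
func fixed() error {
	if p := concrete(); p != nil {
		return p
	}
	return nil
}

func main() {
	fmt.Println(concrete() == nil)    // true
	fmt.Println(asInterface() == nil) // false
	fmt.Println(fixed() == nil)       // true
}
```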

stouset

2 months ago

It's really hard to reconcile behavior like this with people's seemingly unshakeable love for golang's error handling.

rocqua

2 months ago

Exceptions in Python and C are the same. The idea with these is, either you know exactly what error to expect to handle and recover it, or you just treat it as a general error and retry, drop the result, propagate the error up, or log and abort. None of those require understanding the error.

Should an unexpected error propagate from deep down in your call stack to your current call site, do you really think that error should be handled at this specific call-site?

TheDong

2 months ago

Yes, python and C also do not have properly statically typed errors.

In python, well, python's a dynamically typed language so of course it doesn't have statically typed exceptions.

"a better type system than C" is a really low bar.

Go should be held to a higher bar than that.

Svip

2 months ago

The consumer didn't, but the error in the example is typed, it's called `MaxBytesError`.

eptcyka

2 months ago

Matching the underlying type when using an interface never feels natural and is definitely the more foreign part of Go's syntax to people who are not super proficient with it. Thus, they fall back on what they know - string comparison.

simiones

2 months ago

Only since go 1.19. It was a stringy error since go 1.0 until then.

stouset

2 months ago

Go didn't have them at the time.

Pessimistically this is yet another example of the language's authors relearning why other languages have the features they do one problem at a time.

dhosek

2 months ago

At one job, I found a misspelling in an error message and fixed it, only to discover that the web of dependencies on that misspelled text was so deep that it was impractical to fix and I had to return to the misspelled text. It still bugs me.

fulafel

2 months ago

At least in the post context there's still time to fix "Golang".

zaptheimpaler

2 months ago

It's sort of Hyrum's Law but it's really just Go being Go. The error could've been an enum type that could be changed with only a string replace for consumers. Instead they are using strings as types, so now you have no idea how consumers might rely on it. They could check the middle 6 chars of the error and break if you change it. It's another terrible anachronistic design decision when better alternatives have been in use in other languages for decades. Early mistakes + inability to change things means you're stuck forever.

nitwit005

2 months ago

Yes, sadly, the comment is essentially incorrect. The strings were their official API in many cases.

adrianmsmith

2 months ago

It's interesting that this law is the exact opposite of the Robustness Principle / Postel's Law.

> be conservative in what you send, be liberal in what you accept

If you are liberal in what you accept, you'd better understand the ways in which you've been liberal, and document them (at least) internally, because you're going to have to support all those ways forever, even after huge codebase changes, due to Hyrum's Law.

I try to avoid creating APIs which are "liberal in what they accept" for exactly that reason.

remus

2 months ago

> I try to avoid creating APIs which are "liberal in what they accept" for exactly that reason.

That's my preference too. When you have relaxed criteria about what kind of data you accept via an API I find you inevitably end up having to make decisions about how to massage that data in to some sort of canonical format, and those decisions almost always seem to end up leading to behaviour that's surprising to users in one way or another.

mox111

2 months ago

Perhaps some package authors are more accepting of this than others. I stumbled upon this comment in the `json` package the other day:

    // isValidNumber reports whether s is a valid JSON number literal.
    //
    // isValidNumber should be an internal detail,
    // but widely used packages access it using linkname.
    // Notable members of the hall of shame include:
    //   - github.com/bytedance/sonic

jameson

2 months ago

what I learned from shipping APIs:

1. Clients will do whatever they need to do to get their job done, even if it's not the publisher's intended way

2. Clients don't read documentation

3. Bugs will become part of API once enough clients rely on their behavior

4. The number of API calls does not necessarily equate to importance.

---

As such, I aim for the following when developing an API

1. Ship the beta API early and see how they use it to minimize the surprise. (This may not always be possible)

2. In most cases, bump up the major version while supporting the previous version. This means you'll need to define SLA for your API

3. Most clients are OK with breakages as long as they are given enough time to migrate, or the API provider gives them a tool to auto-migrate the code (if that's possible in your product)

turtleyacht

2 months ago

In Docker's error response for `docker rmi`, the fifteenth word is "container" and the sixteenth is the container ID.

apitman

2 months ago

When I clicked on the link to codebases relying on the specific error string, I was expecting to see random side projects. Wasn't expecting to see Grafana and Caddy on the list.

gwd

2 months ago

To be fair to those projects, the type was introduced only three years ago:

https://github.com/golang/go/pull/49359/files

Before that, doing a string compare was basically the only way to detect that specific error. That was definitely an omission on the part of the original authors of the stdlib code; I don't think it should be classified as "Hyrum's Law".

apitman

2 months ago

Yeah I don't doubt it was the best option. Just a bit surprised.

mholt

2 months ago

Hey Anders; Francis notified us of this today. I didn't realize a proper type had been created. We'll update our code.

Cthulhu_

2 months ago

Never underestimate the mediocrity of known large codebases, lol.

(just kidding, they're not mediocre, but they're not infallible or perfect either)

apitman

2 months ago

This instance doesn't necessarily indicate they did anything wrong. See sibling.

aidenn0

2 months ago

Many occurrences of Hyrum's Law are "desire paths[1]" of APIs. For many people, the most obvious way to determine if the error was a MaxBytesError was to check the string. I'm not familiar with Go, but assuming it has RTTI, I'm guessing the intended path for people to take was to check the type of the error against MaxBytesError, and occurrences of this were people who either didn't know that, or found the error string to be immediately available in their tooling, but the type not immediately available.

[edit]

Per [2], it looks like desire paths is an even more apt analogy than I thought; until 3 years ago, this code returned a generic Error type.

1: https://en.wikipedia.org/wiki/Desire_path

2: https://news.ycombinator.com/item?id=42202472

algorithmsRcool

2 months ago

Another related effect of this is Protocol Ossification [0] which happens when implementers of a public API/Protocol surface area take implicit dependencies on common but not standardized behaviors of the API/Protocol implementation.

That being said, you can take proactive steps to defeat this. For example, the default Hash for strings in .NET is randomly seeded each time a process starts[1] in order to strongly dissuade folks from taking an implicit dependency on the underlying algorithm which is not guaranteed to be stable

[0] : https://en.wikipedia.org/wiki/Protocol_ossification

[1] : https://andrewlock.net/why-is-string-gethashcode-different-e...
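
Go's standard library has something comparable in `hash/maphash` (this is a separate mechanism, not the .NET one described above): a `Seed` from `maphash.MakeSeed` is random and local to the process, so hashes are stable within a run but cannot be persisted or compared across runs. A minimal sketch, with helper names invented for the example:

```go
package main

import (
	"fmt"
	"hash/maphash"
)

// hashWith hashes s under the given seed.
func hashWith(seed maphash.Seed, s string) uint64 {
	var h maphash.Hash
	h.SetSeed(seed)
	h.WriteString(s)
	return h.Sum64()
}

// stableWithinProcess checks that hashing the same string twice under one
// seed gives the same value.
func stableWithinProcess(s string) bool {
	seed := maphash.MakeSeed() // random seed, local to this process
	return hashWith(seed, s) == hashWith(seed, s)
}

func main() {
	fmt.Println(stableWithinProcess("hello"))

	// The numeric value itself depends on the random seed, so it is not
	// something callers can store or hard-code.
	fmt.Println(hashWith(maphash.MakeSeed(), "hello"))
}
```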

thayne

2 months ago

IME, one of the places Hyrum's Law shows up the most is in tests.

I've seen tests of various types (unit, integration, end-to-end) break because they made assumptions about behaviors that weren't guaranteed, and supposedly backwards compatible updates broke them. Here are some examples of things that have broken tests:

- an update resulted in a change in the order of elements in a Hashmap or set.

- a change in an error message (or other user-facing message)

- a change in how leap days are handled for datetime arithmetic

- change in the format of locale-specific datetimes

- the timezone offset for a given area

- removal of internal-only APIs that were accessed using reflection

- something performed faster, which revealed race conditions in the testing code

- changing the precise representation of some data format. To give a specific example, changing a single byte when gzip compressing a file, that has no impact on the compressed content.

raverbashing

2 months ago

It's like an inverted game of cat and mouse

1 - Lang/OS/Lib developer puts out a quirky or buggy API (or even just an ok API)

2 - Developers rely on a quirky, weird or unexpected side effect because it's easier/more obvious or it just works this way due to a bug

3 - Original developer can't fix it because it would break compatibility

4 - GOTO 1

withinboredom

2 months ago

Immediately reminded of this: https://externals.io/message/126011. It is an ongoing conversation in php-internals about removing a quirky/buggy behavior from PHP where, at the very end (at least as of this comment's time), someone jumps in and says "yep, it's useful, please keep it"

Cthulhu_

2 months ago

And this isn't even quirky/buggy, it's just the string representation of an error. That said, Go took a while to improve its core error mechanisms and add utilities for matching errors by type instead of its string representation.

simiones

2 months ago

In this case, it really is - because until Go 1.19, that function simply returned `errors.New("http: request body too large")`. So until Go 1.19, there really was no other way to check if this error occurred than `err.Error() == "http: request body too large"`. Even if we had had errors.Is/As earlier, it wouldn't have helped in this case.
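
For comparison, a sketch of the type-based check that became possible with Go 1.19. `readLimited` is a made-up helper; passing a nil ResponseWriter to `http.MaxBytesReader` is only done here to keep the demo outside a real handler.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// readLimited reads body through http.MaxBytesReader and reports whether
// the limit was exceeded, along with the limit carried by the error.
// It needs Go 1.19 or later for *http.MaxBytesError.
func readLimited(body string, limit int64) (tooLarge bool, reportedLimit int64) {
	r := http.MaxBytesReader(nil, io.NopCloser(strings.NewReader(body)), limit)
	_, err := io.ReadAll(r)

	var maxErr *http.MaxBytesError
	if errors.As(err, &maxErr) {
		// No string comparison needed, and the limit is machine-readable.
		return true, maxErr.Limit
	}
	return false, 0
}

func main() {
	fmt.Println(readLimited("hi", 5))          // false 0
	fmt.Println(readLimited("hello world", 5)) // true 5
}
```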

sudhirj

2 months ago

Weren’t there a couple of anecdotes where Windows couldn’t fix a bug because some popular game (maybe SimCity?) depended on it, so the devs hardcoded a SimCity check inside Windows and made the bug happen if it was running?

masklinn

2 months ago

It was not a bug in windows, it was a bug in SimCity: it would UAF some memory, but the Windows 3.x allocator did not unmap / clear that memory so it worked.

Windows 95 changed that, and so one of the compatibility shims it got is that the allocator had a 3.x adjacent mode, which would be turned on when running SimCity (and probably other similarly misbehaving software as well).

Nowadays this is formalised in the compatibility engine (dating back to windows do), which can enable special modes or compatibility shims for applications (windows admins trying to run legacy or unmaintained applications can manage the application of compatibility modes via the “compatibility administrator”).

praptak

2 months ago

Still a pretty good example of having to support something which is definitely not part of the official spec.

guappa

2 months ago

Had it been open source, they could have just fixed the software instead

masklinn

2 months ago

Fixing the upstream would not have updated it on the millions of machines running it, which is what they wanted to not break.

adontz

2 months ago

https://www.joelonsoftware.com/2000/05/24/strategy-letter-ii...

Jon Ross, who wrote the original version of SimCity for Windows 3.x, told me that he accidentally left a bug in SimCity where he read memory that he had just freed. Yep. It worked fine on Windows 3.x, because the memory never went anywhere. Here’s the amazing part: On beta versions of Windows 95, SimCity wasn’t working in testing. Microsoft tracked down the bug and added specific code to Windows 95 that looks for SimCity. If it finds SimCity running, it runs the memory allocator in a special mode that doesn’t free memory right away. That’s the kind of obsession with backward compatibility that made people willing to upgrade to Windows 95.

_rlh

2 months ago

Fred Brooks discussed this in the unfortunately named pun "The Mythical Man-Month". Most of the graybeards have read it; ask to borrow it, it will make their day. The punchline was that on the IBM 360 they stopped fixing bugs when the fix caused the same or more bugs than the unfixed bug, which soon became all bugs.

Well aware of Brooks, when the loop var semantics were changed Go did an analysis showing that many more bugs were fixed than created by the change.

littlestymaar

2 months ago

> so per Hyrum's Law it's probably relied upon by some.

Yikes. this kind of defensive posture with respect to Hyrum's law is extreme and absurd. Per Hyrum's Law everything is potentially relied upon by someone, keeping stuff that may be relied upon means you cannot change anything (see this infamous xkcd on this[1])!

Thinking that no change is acceptable at all isn't the right take-away from Hyrum's Law: instead you should be ready to have to roll back changes that break people's workflow even when you didn't expect the change to break anything (and it also means that you need to have a way for your users to communicate their issues to you, which definitely isn't something Google is well-known for …).

[1]: https://xkcd.com/1172/

earthboundkid

2 months ago

I wrote "// Due to Hyrum's law, this text cannot be changed." AMA.

rotbart

2 months ago

Hyrum's Law especially applies when you have consumers of your APIs that violate Postel's Law. To minimise those in the past, we've introduced intentional jitter in our API responses that, while it didn't violate the schema, prevented reliance on behaviour that wasn't intentional[1].

[1]: <https://medium.com/pageup-tech/update-on-driving-client-resi...>

praptak

2 months ago

Corollary: uptime is part of the de facto spec being relied on.

One of the SRE practices is breaking your service on purpose to bring the actual service level closer to what is promised and supported.

Cthulhu_

2 months ago

As another commenter pointed out, this is to a point what Go does as well; for example, map iteration is randomised so no implementation will rely on insertion order.

dangfault

2 months ago

another one, you pay me below market rate and you get below market rate code

jgeada

2 months ago

I think Hyrum's law really depends on APIs not applying consequences to people that depend on non-guaranteed behavior. The world needs more consequences for poor behavior.

Just randomly change the non-guaranteed stuff in every release and this behavior likely would stop and/or you'd lose the users that don't know any better. Both sides of that sound like a win to me.

kazinator

2 months ago

The only reason we care about this law is that the end users matter.

If we can change a misused API such that things break only for the misusing developer, then that is usually a who-cares.

Sometimes that developer is in the same organization that you're in, and their users are your users.

Sometimes that developer is no longer in your organization and their code is your code.

souenzzo

2 months ago

Clojure manages to improve the language without breaking its users.

Most of the improvements are "additions"; it is never a "change" or "re-do something better".

It is an awesome experience to be able to upgrade the language anytime with no fear or pain.

inlined

2 months ago

Go seems really sensitive to this subject. Maps used to iterate in order, but one day they said “this is incidental and we said not to rely on it. You do, so we’re breaking it in a minor release” and now maps iterate in order… from a random offset

kbolino

2 months ago

On the one hand, I never realized that map iteration order was consistent, but it's just the starting point that changes. On the other hand, I guess there's no other way to do it, since a proper shuffle would require O(n) bookkeeping. I suppose you could also flip a coin for going backwards too.
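
Either way, the practical upshot is the same: if you need a stable order, impose it yourself. A minimal sketch (the `sortedKeys` helper is just a name picked for the example):

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys returns the keys of m in ascending order, so callers do not
// depend on whatever order a range over the map happens to produce.
func sortedKeys(m map[string]int) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	m := map[string]int{"b": 2, "a": 1, "c": 3}
	for _, k := range sortedKeys(m) {
		fmt.Println(k, m[k])
	}
}
```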

sublimefire

2 months ago

There are cases when you need to make a choice if you want to fix the bug as it might break many people who rely on it. There is no real good answer but to be able to look forward and anticipate the misuse.

fullstackchris

2 months ago

Sure... but this is why we have sem versioning and release notes. It's always nice to try and support all users but sometimes you just need to ship breaking changes...

Cthulhu_

2 months ago

While in principle you're correct, Go the language is very dedicated to backwards and forwards compatibility; while there's been talk of a Go 2 for a long time now, they're not eager to go there and if they do, they intend to make the transition low impact.

That said, I'd say this is an excellent candidate to deprecate or warn about now, and to make impossible in a version 2. Then again, how would you even stop this? A string representation of an error is common in any language, you need it to log things.

I think at best there will be a static analysis rule (in e.g. go vet) that tries to figure out if any matching is done on the string representation of an error.

TheDong

2 months ago

> I think at best there will be a static analysis rule (in e.g. go vet) that tries to figure out if any matching is done on the string representation of an error.

First they'd need to export the errors the stdlib returns https://news.ycombinator.com/item?id=41507714

I wouldn't hold my breath on that one.

fullstackchris

2 months ago

I'm not talking about Go itself, I'm talking about building an API. All this talk of "string vs type" is not the solution to the root problem - sure, types can be better to return but what if the type changes? You still have breaking changes.

hifromwork

2 months ago

Hyrum's law is specifically about how every change is a breaking change if you have enough users. So it's always a bit subjective. No sane person considers changing an error message a breaking change in context of semver. It's just go going above and beyond to take care of backward compatibility.

jnordwick

2 months ago

Does anybody else always read this as Hyrule's Law?

jimjimjim

2 months ago

I wish someone in the Chrome team knew about Hyrum's Law before they released a breaking change in the way chrome validates custom URLs.

sixfiveotwo

2 months ago

Quite interesting, thank you.

However, in this specific instance, even if the text cannot be changed, couldn't the error itself in the server be processed and signaled differently, eg. by returning a Status Code 413[1], since clients ought to recognize that status code anyway?

[1]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/413

majewsky

2 months ago

Since the caller gets this as an error object, instead of as a plain string, it seems likely that this is within the same process, i.e. a library function returns the MaxBytesError to a level higher in the business logic, without a network transmission inbetween.
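
The two fit together, though: the handler checks the error type in-process and then answers with a 413. A rough sketch, assuming Go 1.19+; the `upload` handler, the `/upload` path, and the tiny `maxBody` limit are all made up for the demo, and `httptest` is used so nothing goes over the network.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

const maxBody = 8 // hypothetical limit in bytes, kept tiny for the demo

// upload maps an oversized body to 413 by checking the error type,
// not the error text.
func upload(w http.ResponseWriter, r *http.Request) {
	r.Body = http.MaxBytesReader(w, r.Body, maxBody)
	if _, err := io.ReadAll(r.Body); err != nil {
		var maxErr *http.MaxBytesError
		if errors.As(err, &maxErr) {
			http.Error(w, "body too large", http.StatusRequestEntityTooLarge)
			return
		}
		http.Error(w, "bad request", http.StatusBadRequest)
		return
	}
	w.WriteHeader(http.StatusNoContent)
}

// statusFor sends body to the handler in-process and returns the status code.
func statusFor(body string) int {
	req := httptest.NewRequest(http.MethodPost, "/upload", strings.NewReader(body))
	rec := httptest.NewRecorder()
	upload(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("short"))                 // 204
	fmt.Println(statusFor("this body is too long")) // 413
}
```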

wodenokoto

2 months ago

At work we have a data provider whose API always returns 200, often with the text content “ERROR: …”

And that’s how you put Hyrum’s law into effect.

indulona

2 months ago

This is why we have semantic versioning.

simiones

2 months ago

Semantic versioning does nothing to help here. If you don't realize that people are depending on such a behavior, you won't increment the major version number.

mseepgood

2 months ago

And if you realize it (as in this case) you probably don't want to increase the major version number either, but leave it as-is (unless you follow the CADT model of maintainership).

cloverich

2 months ago

I think a very nice middle ground is when you decide to remove something, to mark it as deprecated in the next major version, and remove it in the one after. Not always possible, but IIRC React does this; so I'd frequently upgrade, then start seeing deprecation warning messages (in dev); I'd then have a clear signal before upgrading to the next version. It helped that major versions did not arrive often so making this kind of change was only occasionally necessary.

A bit trickier in this case no doubt, and there are trade-offs. I've not minded the React updates over the years, but busting out the Go code I wrote many years ago and having it still run flawlessly is amazing too.

gr4vityWall

2 months ago

This line of the article resonates with me a lot: > A good reminder to be careful when changing code others might depend on

If you maintain a widely used Free Software library, please consider avoiding breaking changes when possible.

I'm not going to imply you have any obligation to do so, but your users will appreciate it.

evanfarrar

2 months ago

What do they call the law that says they can’t increase the major version?

remus

2 months ago

"Guido's Law" perhaps?

red_admiral

2 months ago

Should this not be handled by checking "resp.status == 413" ?

ivanjermakov

2 months ago

I don't know. I think Hyrum's law should not prevent the project from advancing. If the user is so dependent on non-contract behavior of the API, they have to expect that some logic might break even on minor version updates.

lokar

2 months ago

People who parse text error messages deserve what they get

tonymet

2 months ago

> to design systems in a way that minimizes the chances of unintended

The Go team did their best to define a custom MaxBytesError to discourage developers from the flimsy string dependency.

Every system hits a limit on the amount of guard rails and protections needed to protect foolish customers from their own bad behavior.

Sometimes you need to deliberately break dependencies that were never meant to exist to reveal the vulnerabilities in a system.

djoldman

2 months ago

One interesting metric for LLMs is that for some tasks their precision is garbage but recall is high. (in essence: their top 5 answers are wrong but top 100 have the right answer).

As relates to infinite context, if one pairs the above with some kind of intelligent "solution-checker," it's interesting if models may be able to provide value across absolute monstrous text sizes where it's critical to tie two facts that are worlds apart.

mormegil

2 months ago

This probably didn't belong here?