OpenBSD now enforcing no invalid NUL characters in shell scripts

129 points, posted 6 hours ago
by CTOSian

110 Comments

jrockway

21 minutes ago

I like the term post-Postel.

There are two reliability constraints that all software faces: security and interoperability. The more lax you are about validation, the more likely interoperability is. "That's weird, I'll just do whatever" is doing SOMETHING, and it's often to the end user's liking. But you also enter a more and more undefined state inside the software on the other side, and that's where weird things happen. Weird things happening typically manifest as security problems. So the more effort you put into minimizing the possibility of entering a weird state, the more confidence you have that your software is working as specified.

Postel's Law made a lot of sense to me when developing the early Internet. A lot of people were reading imperfect RFCs, and it was nice when your HP server could communicate with a Sun workstation, even though maybe some bit in the TCP header was set wrong. But now? You just gotta get it right and push a hotfix when you realize you messed something up. (Sadly, I don't think it's possible. Middleboxes are getting more and more popular. At work, we make a product where the CLI talks to the server over HTTP/2. We also install Zscaler on every workstation. Zscaler simply blocks HTTP/2. So you can't use our product. Awkward.)

Thiez

16 minutes ago

This is also where Google went right with QUIC: encrypt as much as possible to show middleboxes the least possible. This combats ossification. Then again it seems likely middleboxes will just block QUIC (or UDP in general).

amiga386

5 hours ago

Here's the actual diff:

https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/bin/ksh/shf.c....

And it looks like that covers all parsed parts of the shell script or history file, including heredocs. I get the feeling it's going to break all shar archives with binary files (not that they're particularly common). It will stop NULs being in the script itself, but it won't stop them coming from other sources, e.g.

    $ var=$(printf '\0hello')
    -bash: warning: command substitution: ignored null byte in input
    $ echo $var
    hello

It remains to be seen if this will be adopted by anyone else, or if it'll be another reason to use OpenBSD only as a restricted environment and not as a general computing platform.
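
A minimal sketch of how one might check a file for NUL bytes before handing it to a shell. `count_nul` is a hypothetical helper name, and it scans the whole file rather than only the parsed parts that the OpenBSD change covers:

```sh
#!/bin/sh
# count_nul FILE: print the number of NUL bytes in FILE.
# (Hypothetical helper for illustration; not part of ksh or bash.)
count_nul() {
    # tr -dc '\000' deletes everything except NUL; wc -c counts what is left.
    # The arithmetic expansion strips the padding some wc implementations add.
    echo "$(( $(tr -dc '\000' < "$1" | wc -c) ))"
}
```

A clean script yields 0; anything higher means the file has bytes that a C-string based shell cannot represent.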

> "If there is ONE THING the Unix world needs, it is for bash/ksh/sh to stop diverging further"

> OpenBSD ksh: diverges further

matrix2003

4 hours ago

Eh - I actually like developing on OpenBSD first, because of restrictions like this. If it runs on OpenBSD, you are likely to have fewer bugs around things like malloc.

OpenBSD is also really good about upstreaming bug fixes, which is a good thing. Firefox used to be a dumpster fire of core dumps on OpenBSD, and many issues were uncovered and fixed that way.

bell-cot

an hour ago

> Here's the actual diff:

Only 8 short, simple lines of C code. Beautiful.

mcculley

5 hours ago

"We are in a post-Postel world" is a great way to put it. This needs to be repeated by everyone working with file formats or accepting untrusted input.

stackghost

5 hours ago

"Postel" is not a term that carries any significance for me, and Googling that word didn't turn anything up that seemed relevant.

Who or what is a Postel?

Ndymium

5 hours ago

It's a reference to Jon Postel who wrote the following in RFC 761[0]:

    TCP implementations should follow a general principle of robustness:
    be conservative in what you do, be liberal in what you accept from
    others.

Postel's Law is also known as the Robustness principle. [1]

[0] https://datatracker.ietf.org/doc/html/rfc761#section-2.10

[1] https://en.wikipedia.org/wiki/Robustness_principle

arcanemachiner

3 hours ago

I've always felt that this was a misguided principle, to be avoided when possible. When designing APIs, I think about this principle a lot.

My philosophy is more along the lines of "I will begrudgingly give you enough rope to hang yourself, but I won't give you enough to hang everybody else."

quesera

2 hours ago

HTML parsing is the modern-ish layer-uplifted example of liberal acceptance.

I won't argue that this hasn't been a disaster for technologists, but there are many arguments that this was core to the success of HTML and consequently the web.

Which, yes, could be considered its own separate disaster, but here we are!

IshKebab

35 minutes ago

Ironically it leads to less robust systems in the long term.

komon

5 hours ago

A reference to Postel's Law: be conservative in what you produce and liberal in what you accept.

The law says that you should strive to follow all standards in your own output, but make a best effort to accept content that may break a standard.

This is useful in the context of open standards and evolving ecosystems since it allows peers speaking different versions of a protocol to continue to communicate.

The assertion being made here is that the world has become too fraught with exploiting this attitude for it to continue being a useful rule.

godshatter

5 hours ago

What would have been the result of Jon Postel advocating for conservative inputs, I wonder? Would the most common protocols, had they all done this, have been bypassed by other protocols that allowed more liberal inputs?

miki123211

2 hours ago

Probably more convoluted protocols, because there are always things that you do accept and that can be used to negotiate protocol extensions.

Imagine a protocol where both sides have to speak JSON with a rigidly-defined structure, and neither side is allowed to ask whether the other supports any extension. Such a protocol looks impossible to extend, but that is not the case: you can indicate that you speak a "relaxed" version of that protocol by e.g. following your first left brace with a predefined, large number of whitespace characters. If you see a client doing this, you know they won't drop the connection if you include a supported_extensions field, and you're still able to speak the rigid version to strict clients.

quesera

2 hours ago

This made me laugh, because it's even more terrible than the most ridiculous chicanery we had to vomit into HTML and CSS over the years (most of which was the fault of MSIE6).

Joker_vD

5 hours ago

Yep. Which is why Postel's law is, sadly, more like a law of nature (see also "worse is better") than an engineering principle you may or may not follow.

mrighele

5 hours ago

I know it is a single example and we shouldn't extrapolate much out of it, but in the case of HTML those who accepted more liberal input (HTML4/5) won over those that were more conservative (XHTML).

mikaraento

2 hours ago

RFC 9413 referenced in a parent mentions HTML. It points out that formats meant to be human-authored may benefit more from being liberally accepted.

I also read that XHTML made template authoring hard, as the template itself might not be valid XHTML and/or different template inputs might make output invalid. (I sadly can't find the source of this point right now, but I can't claim credit for it).

0cf8612b2e1e

3 hours ago

I would almost argue a failing of so many standards is the lack of surrounding tooling. Is this implementation correct? Who knows! Try it against this other version and see if they kind of agree. More specifications need to require test suites.

jancsika

4 hours ago

Am I correct that malformed pages in xhtml would have triggered the browser to output a red XML error and fail to render the page at all?

Calavar

3 hours ago

Yes, but only if you served the XHTML with the proper MIME type of application/xhtml+xml. Nearly everyone served it as text/html, which would lead to the document being interpreted as this weird pseudo XHTML/HTML4 hybrid dialect with all sorts of browser idiosyncrasies [1].

[1] https://www.hixie.ch/advocacy/xhtml

edflsafoiewq

3 hours ago

Not really, since in the end HTML5 defined a precise parsing algorithm that AFAIK everyone follows.

quesera

2 hours ago

HTML5 was born in an era of decent HTML authoring tooling. Very few people write HTML by hand nowadays. This was not true of earlier versions.

Also note that HTML5 codified into liberal acceptance some of the "lazy" manual errors that people made in the early days (many of which were strictly and noisily rejected in XHTML, for example).

ok123456

5 hours ago

The fact that googling Postel was worthless also indicates we're in a post-google search world.

Brian_K_White

4 hours ago

2nd result on Kagi was about him but in the form of another critic.

https://datatracker.ietf.org/doc/draft-thomson-postel-was-wr...

Hard disagree.

It's a valid argument, but I say it's merely an argument, not an argument that wins or should win.

But also, I say that detecting out of spec or unexpected input and handling it in any other way than crashing IS adhering to Postel.

Refusing to process a request is better than munging the data according to your own creative interpretation of reasonable or likely, and then processing that munged data.

I consider that to be within Postel to return a nice error (or not if that would be a security divulgence). Failing Postel would be to crash or do anything unintended.

skybrian

3 hours ago

Google’s results for “Postel’s law” and “Jon Postel” are fine. “Postel” is ambiguous, a fairly common surname, so websites of unrelated companies show up, along with a disambiguation page on Wikipedia that links to Jon Postel and several other people.

ok123456

3 hours ago

I thought the whole point of letting Google surveil your entire life was that, if you're interested in computing and networks to the point of participating on news.ycombinator.com, they'd know that when you search for "Postel," you probably want Postel's law on the first page.

We're back at pre-1998 search, where we have to specify more and more context just to get results that aren't noise.

stackghost

3 hours ago

I'm actually astounded at how quickly the quality of Google search results has tanked in recent years.

AStonesThrow

4 hours ago

Bing had no trouble at all finding him from my device.

runjake

4 hours ago

Jon Postel was instrumental in making the Internet what it is today.

https://en.wikipedia.org/wiki/Jon_Postel

The Wikipedia article is kinda unclear and doesn't provide the proper context, so:

- Ran IANA, which assigned IP addresses for the Internet.

- Editor of RFCs, which are documents that defined protocols in use by the Internet.

- He wrote a bunch of important RFCs that defined how some very important protocols should work.

- Created or helped create SMTP, DNS, TCP/IP, ARPANET, etc.

teraflop

5 hours ago

It's a reference to "Postel's law" which is a pretty well-known principle in the networking world, and in software more broadly. Named after Jon Postel, who edited and published many of the RFCs describing core Internet protocols.

https://en.wikipedia.org/wiki/Robustness_principle

nabla9

5 hours ago

Agreed.

When every implementation in wide use has their own quirks, you must support them all to make your program widely used. Every special case is yet another potential bug to chase down.

It also allows the "Embrace, extend, and extinguish" strategy that Microsoft used so successfully to assfuck the internet over a decade.

pjmlp

5 hours ago

I think you mean Google.

nabla9

4 hours ago

No. The Microsoft. MS invented the term. DOJ found that MS used "Embrace, extend, and extinguish" in internal documents.

Younger people don't know how absolutely ruthless and harmful the Wintel monopoly was under Gates. Java was broken on purpose. JavaScript was broken on purpose.

   <!--[if IE]> 
everywhere.

They attempted to kill the open web in the crib with their Blackbird project. Only MSN (The Microsoft Network) for normal people.

pjmlp

2 hours ago

Except it's Google that morphed the Web into ChromeOS, with the help of EVERYONE that ships it alongside their applications, as they can't be bothered to learn cross-platform frameworks.

Many of them are people who used to complain about Micro$oft and should know better.

IshKebab

32 minutes ago

Anyone who was around for the IE6 era knows how much worse it was than the current Chrome era. It's not even close.

bigstrat2003

2 hours ago

Agreed that it is a Microsoft term. But in my experience, it is older people who incorrectly judge Microsoft ruthlessness, not younger people. I am of an age where I remember well what Microsoft was like in those days, and it frankly was not as bad as people make it out to be. Nor was it really worse than the ruthless tech companies of today.

quesera

2 hours ago

I was there too, and I disagree completely. Microsoft was not just ruthless, they were ubiquitous. They sabotaged any perceived competitors in anticompetitive, market- and industry-damaging ways.

You (the generic "you") can complain all you want about Apple today, but you have another perfectly viable option. And Apple is (almost-entirely) happy to grow market share on merits without salting the earth of any rivals.

In Microsoft's heyday, that was not true. Those of us who rejected MS back then did so at a much higher cost than green chat bubbles.

It was worth it though. And we did win, eventually.

chasil

an hour ago

Microsoft could not win, although they tried very hard.

Windows was never going to scale down to the portable devices that we now use (because defeating Apple would have been very difficult, and AOSP made it insurmountable).

Windows was never going to scale up to the top 500 supercomputer list (for largely economic reasons).

Microsoft itself has tacitly admitted that Azure is better served by Linux, and we ponder why.

Did the DoJ actions against Microsoft really have an impact? I don't know.

Brian_K_White

3 hours ago

There is no such thing as a post-Postel world. Handling the input in any way other than crashing or undefined behavior IS perfectly Postel.

Deciding that nul is invalid data, and refusing to allow it, and refusing to munge the data and proceed based on the munged data that you essentially made up, as long as whatever you did do instead was graceful and intentional, to me that is perfectly Postel.
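
One way to read that distinction, as a rough sketch: reject the input outright and say why, rather than silently dropping the bytes and carrying on. `strict_run` is a hypothetical wrapper, the exit status is arbitrary, and unlike the OpenBSD change it checks the whole file up front:

```sh
#!/bin/sh
# strict_run SCRIPT [ARGS...]: run SCRIPT with sh only if it has no NUL bytes.
# (Hypothetical wrapper for illustration.)
strict_run() {
    script=$1
    shift
    # Strip NULs and compare with the original; any difference means the
    # file contained at least one NUL byte.
    if tr -d '\000' < "$script" | cmp -s - "$script"; then
        sh "$script" "$@"
    else
        echo "strict_run: $script contains NUL bytes, refusing to run" >&2
        return 65   # arbitrary non-zero status chosen for this sketch
    fi
}
```

The point of the sketch is the `else` branch: nothing is munged, nothing is guessed, and the caller gets a clear error.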

sneela

5 hours ago

> This was in snapshots for more than 2 months, and only spotted one other program depending on the behaviour (and that test program did not observe that it was therefore depending in incorrect behaviour!!)

Fascinating. I wonder what that program is, and why it depends on the NUL character.

parasense

2 hours ago

Is this going to murder those fancy shell scripts that self-extract a program appended to the tail, which is really just an encoded blob of some kind, presumably compressed, etc.?

talideon

2 hours ago

Not if it was done competently. Shar files and the like shouldn't contain NULs, even if they contain compressed data. The appended data should be binary safe.

Thiez

12 minutes ago

And in case your data does contain NULs, presumably one could add a layer of base64 encoding. Not nice for the filesize, but also much less likely to upset a text editor when the script is opened (even in the absence of NUL bytes).
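
A minimal sketch of that idea, assuming a `base64` utility that decodes with `-d` (GNU coreutils does; other systems may differ). `make_sfx` and the `__PAYLOAD__` marker are hypothetical names made up for this example:

```sh
#!/bin/sh
# make_sfx OUT PAYLOAD: write a self-extracting script OUT that carries PAYLOAD.
# (Hypothetical helper; assumes a base64 utility that decodes with -d.)
make_sfx() {
    out=$1
    payload=$2
    # Stub: copy everything after the marker line, decode it, then exit
    # so the shell never parses the payload.
    cat > "$out" <<'EOF'
#!/bin/sh
sed '1,/^__PAYLOAD__$/d' "$0" | base64 -d > "${1:-payload.out}"
exit 0
__PAYLOAD__
EOF
    # Base64 output is plain ASCII, so no NUL byte can end up in OUT.
    base64 < "$payload" >> "$out"
}
```

Running `make_sfx installer.sh blob.bin` and then `sh installer.sh restored.bin` should reproduce the original bytes, at the cost of roughly a third more file size.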

saagarjha

5 hours ago

> There appears to be one piece of software which is misinterpreting guidance of this, and trying to depend upon embedded NUL.

Curious what this is

semiquaver

5 hours ago

I wonder if it’s https://justine.lol/ape.html / cosmopolitan libc

chubot

3 hours ago

I'm pretty sure it is, I remember reading something about this

Yeah I found it here

https://news.ycombinator.com/item?id=41030960

2019 bug - https://austingroupbugs.net/view.php?id=1250

https://justine.lol/cosmo3/

> This is an idea whose time has come; POSIX even changed their rules about binary in shell scripts specifically to let us do it.

FWIW I agree with this OpenBSD change, which says more pointedly:

> All the shells are written in C, and majority of them use C strings for everything, which means they cannot embed a NUL, so this is not surprising. It is quite unbelievable there are people trying to rewrite history on a lark, and expecting the world to follow along.

i.e. it's not worth it to change a bunch of old code in order to allow making code more esoteric.

We want systems code to be more predictable, reliable, and less esoteric ... not more esoteric

eesmith

3 hours ago

Shouldn't be. See the "exit 1" in your link? That's the end of the shell script, and as the OpenBSD link says:

> It remains possible to put arbitrary bytes AFTER the parts of the shell script that get parsed & executed (like some Solaris patch files do). But you can't put arbitrary bytes in the middle,

oguz-ismail

3 hours ago

It is. Binaries generated by cosmocc have NUL in the middle.

chasil

5 hours ago

I was going to check the status of mksh (the Android system shell), but the project page returns:

"Unavailable For Legal Reasons - Sorry, no detailled error message available."

http://www.mirbsd.org/mksh.htm

The Android system shell is now abandoned? This is also in RHEL 9 BaseOS.

talideon

2 hours ago

Fine for me. I just got an HTTP warning and nothing else.

~~I believe Android uses toybox, not mksh.~~ It does use toybox, but toybox doesn't appear to include a shell.

kbolino

5 hours ago

It's blocked for me too, but only on my home Internet (Xfinity), not my phone (Google Fi/T-Mobile).

torstenvl

5 hours ago

Works fine for me on Xfinity Home via WiFi, Xfinity Mobile, T-Mobile, and Visible by Verizon.

kbolino

3 hours ago

Whatever the issue was, it seems to have been resolved sometime after I last checked.

chasil

5 hours ago

I see it on my T-Mobile device also. Strange.

chaosite

5 hours ago

Looks fine here, maybe they're blocking your IP range for some reason?

tux3

5 hours ago

Works from an EU IP, so whatever it is, it's probably not GDPR?

fragmede

4 hours ago

What's your browser? The server is using an old TLS version which is no longer supported, and some clients will try https and fail there and not try http.

chasil

2 hours ago

I'm using Edge on my corporate desktop.

Edge first tries TLS and comes back with: "SSL handshake error '-1' sslerr='1' sslerrdesc='error:1425F102:SSL routines:ssl_choose_client_version:unsupported protocol' sslerrfunc='607' sslerrreason='258'"

Setting to http:// results in the above error, along with "httpd/3.30A Server at www.mirbsd.org Port 80" - I think that the target itself is blocking me.

blueflow

4 hours ago

> Android system shell

This hurt a little.

Taikonerd

5 hours ago

On a similar note, I sometimes think about how newline characters are allowed in filenames, and how that can break simple...

    for each $filename in `ls`

loops -- because in many contexts, UNIX treats newlines as a delimiter.

Is there any legitimate use for filenames with newlines?

bityard

5 hours ago

Well, knowing how to deal with wacky input and corner cases are a requirement of learning ANY programming language. Bourne-style shells are no exception.

Your example has illegal syntax, but the biggest issue is that you should never parse the output of ls. The shell has built-in globbing. This is how you would loop over all entries (files, dirs, symlinks, etc) in the current directory without getting tripped up by whitespace:

    for e in *; do echo "got: $e"; done

Taikonerd

5 hours ago

> knowing how to deal with wacky input and corner cases are a requirement of learning ANY programming language.

In general, I agree. But if there's a corner case that occasionally breaks naive code but otherwise doesn't do anything, then I'm going to think, "maybe we should just remove that corner case."

bell-cot

4 hours ago

Replace "maybe" with "OBVIOUSLY". Keeping useless-but-hazardous "features" in any language is as idiotic as keeping a heap of oily rags in the furniture factory warehouse.

chuckadams

4 hours ago

> Is there any legitimate use for filenames with newlines?

IMHO no, but they can exist, so you need to handle them without blowing up. Also, even spaces are considered delimiters here, which is why it's bad form to parse the output of ls.

    $ touch "foo bar baz"
    $ for f in `ls`; do echo $f; done
    foo
    bar
    baz

    # always use double quotes, though they aren't needed here
    $ for f in *; do echo "$f"; done 
    foo bar baz

At least the OS guarantees you won't run into NUL though.

kstrauser

3 hours ago

I’m not in a place where I can easily check. What happens there if the file name contains a quote?

chuckadams

3 hours ago

It's fine, the content of an expanded variable isn't parsed further:

    $ touch "foo \"bar baz"; for f in *; do echo "$f"; done
    foo "bar baz

    # quotes don't affect it either
    $ touch "foo \"bar baz"; for f in *; do echo $f; done
    foo "bar baz

Though once you start passing args with quotes to other scripts, things get ugly. Rule of thumb is to always pass with "$@", and if that isn't enough to preserve quoting for whatever use case, write them out to a tempfile instead, or don't use a shell script for it in the first place.
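
To make the "$@" rule concrete, here is a small sketch; the three function names are made up for illustration:

```sh
#!/bin/sh
# count_args: print how many arguments it received.
count_args() { echo "$#"; }

# Forwarding with "$@" keeps each original argument intact.
forward_quoted() { count_args "$@"; }

# Forwarding with unquoted $* re-splits every argument on whitespace
# (and would also glob-expand them).
forward_unquoted() { count_args $*; }

forward_quoted 'foo "bar baz' qux     # prints 2
forward_unquoted 'foo "bar baz' qux   # prints 4
```

The embedded quote character is never the problem; the word splitting that happens without "$@" is.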

kstrauser

2 hours ago

What about in the case of

    for f in `ls`; do echo "$f"; done

Same behavior, for the same reason?

chuckadams

2 hours ago

The quotes are preserved, but backquote expansion fills the argument list using any whitespace as a delimiter.

    $ for f in `ls`; do echo "$f"; done
    foo
    "bar
    baz

If you absolutely must parse ls (let's assume it's some other script that outputs items with spaces) and the output can contain spaces, you have a few options:

    # IFS= and -r keep leading whitespace and backslashes intact
    $ ls | while IFS= read -r f; do echo "$f"; done
    foo "bar baz

    # parens keep the IFS change isolated to a subshell;
    # $'\n' (bash/ksh) is a real newline, whereas "\n" is a literal backslash and n
    $ (IFS=$'\n'; for f in `ls`; do echo "$f"; done)
    foo "bar baz

But if your filenames contain newlines, you'll really want to stick with the glob expansion, or output custom delimiters and set IFS to that.

IsTom

5 hours ago

You can also create files named e.g. '--help' (if you're not particularly malicious) and with globbing it'll cause e.g. 'ls *' to print help.

jasonjayr

5 hours ago

    touch -- '-f ..'

(If you want to lay an evil trap)

Remember that in most option parsing libraries, putting '--' in your arguments stops option parsing, so you can safely run:

    rm -- '-f ..'
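
A short sketch combining `--` with a `./` prefix on the glob, which protects even utilities that don't honor `--`. `remove_all` is a hypothetical helper for illustration:

```sh
#!/bin/sh
# remove_all DIR: delete every non-hidden file in DIR.
# (Hypothetical helper for illustration.)
remove_all() (
    cd "$1" || exit 1
    # Prefixing the glob with ./ means no expanded name can start with "-",
    # and "--" additionally ends option parsing for utilities that honor it.
    rm -f -- ./*
)
```

The function body runs in a subshell, so the `cd` does not leak into the caller.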

xxpor

an hour ago

This is why things like `find -print0` exist, which is IMO the easiest way to handle this robustly.
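
For anyone who hasn't used it, a minimal sketch, assuming a `find` with `-print0` and an `xargs` with `-0` (GNU and BSD userlands have both). `print_each` is a hypothetical helper name:

```sh
#!/bin/sh
# print_each DIR: print one "got: <path>" record per regular file under DIR.
# (Hypothetical helper; assumes find -print0 and xargs -0.)
print_each() {
    # Names are separated by NUL, the one byte a path can never contain,
    # so spaces and newlines inside names cannot split a record.
    find "$1" -type f -print0 | xargs -0 -n 1 printf 'got: <%s>\n'
}
```

A file whose name contains a newline still arrives as a single argument; only the printed record spans two lines.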

Joker_vD

5 hours ago

Sticky notes on the desktop :) Who needs data storage when you can store it all in the metadata?

fragmede

4 hours ago

A GUI file browser will display the filename with a newline in it as a new line (and an icon above it) so as to be aesthetically pleasing.

klooney

2 hours ago

Does this break the self extracting tarball trick, where you have a bootstrap shell script with a binary payload appended?

whiterknight

4 hours ago

Side note: tell your startup to switch its “hardware with Ubuntu Linux inside” to BSD. You will have a much more stable and simple platform that can last a long time.

quesera

2 hours ago

The recommendation is solid, but FWIW no one looking for stability would choose Ubuntu, among the Linuxen!

bell-cot

6 hours ago

Kudos to OpenBSD!

Similar to the olde-tyme "-o noexec" and "-o nosuid" options for `mount`, there should be easy, no-exceptions ways to blanket ban other types of simply obvious red-flag activity.

raverbashing

an hour ago

> There appears to be one piece of software which is misinterpreting guidance of this, and trying to depend upon embedded NUL.

Big oof here. Why? How?

> If there is ONE THING the Unix world needs, it is for bash/ksh/sh to stop diverging further by permitting STUPID INPUT that cannot plausibly work in all other shells. We are in a post-Postel world.

Amen

2snakes

2 hours ago

Surprised no one has mentioned the CrowdStrike issue, which was due to NUL characters, wasn't it?

soupbowl

5 hours ago

I wish FreeBSD replaced /bin/sh with OpenBSD's.

rollcat

4 hours ago

FreeBSD made many cool moves in the 14.0 release, like finally getting rid of sendmail and adopting DMA (the irony), so perhaps there's a chance?

But FreeBSD has always been much less focused on polish/cleanliness than OpenBSD; I mean - they have THREE firewalls, wtf.

nubinetwork

6 hours ago

So I can't bury a tarball inside a shell script anymore?

josephcsible

6 hours ago

You still can; it just needs to go at the end:

> It remains possible to put arbitrary bytes AFTER the parts of the shell script that get parsed & executed (like some Solaris patch files do).

volkadav

6 hours ago

Looks like you might be able to at the end of the file, reading the commit message, just not willy-nilly in the middle. :)

enriquto

2 hours ago

Great. Now forbid spaces in filenames.

sph

5 hours ago

Is this in reference to something? Judging from the comments, NUL bytes in shell scripts are such a common occurrence that everybody is celebrating this change as if it were groundbreaking.

I mean, it's a good idea, but I wonder what am I missing here. Also what do they mean by post-Postel?

BlackFly

5 hours ago

An early spec of TCP had a section on the robustness principle, which became generally known as Postel's law (https://datatracker.ietf.org/doc/html/rfc761#section-2.10). At the time and until recently this was considered good design. Nowadays people generally want servers to be stricter in what they accept, since decades of experience have shown that diverging interpretations of a specification create problems for interoperability.

eesmith

3 hours ago

"until recently"? More than 10 years just going by HN. https://news.ycombinator.com/item?id=5161214

I think HTML showed the problem with Postel's principle. Quoting "Postel’s Law is not for you" at http://trevorjim.com/postels-law-is-not-for-you/ from 2011

> The next version of HTML, HTML5, should considerably reduce the problem of browser incompatibilities. It does this, in part, by rejecting Postel’s Law for browser implementors. Instead of allowing browsers to be liberal when dealing with “flawed” markup, HTML5 requires them to parse it exactly as in the HTML5 specification, and that specification is given much more precisely than before, in the form of a deterministic state machine, in fact. HTML5 is trying to give implementors no leeway at all in this, in the name of browser compatibility.

cesarb

an hour ago

> "until recently"? More than 10 years just going by HN.

The TCP protocol is from the 1970s (according to Wikipedia, it's from 1974, which is 50 years ago). Something which only happened 10 years ago is recent.

JimDabell

5 hours ago

Postel’s Law, also known as the Robustness Principle:

> be conservative in what you do, be liberal in what you accept from others

It’s intended as a way to maximise compatibility, and people have generally followed it when designing protocols and file formats. However it’s led to many security vulnerabilities and has caused a lot of compatibility problems itself. These days a lot of people are realising that it’s more harmful than helpful.

lupusreal

6 hours ago

Does this break those self-extracting script/tar files? I forget how those are done, I haven't seen one in many years.

zx2c4

6 hours ago

From the article: "It remains possible to put arbitrary bytes AFTER the parts of the shell script that get parsed & executed (like some Solaris patch files do). "

jancsika

4 hours ago

If you don't know anything about OpenBSD, here's a fun thing:

1. Randomly choose "yes" or "no" to this question.

2. Read the post and get the answer.

3. Repeat until you begin to get a tingly "Spidey sense" that overrides your random-choice.

My Spidey sense here was, "Yes, because OpenBSD would have already thought about and covered that use-case." And indeed, toward the end of the post, that contingency is covered and documented.

Note: if you try this at your job and sense that the company will almost always choose the worst option, you should probably leave that job.

ape4

5 hours ago

That was a neat idea back in the day but should be disallowed now. Running downloaded executables considered harmful.

osmsucks

4 hours ago

> Running downloaded executables considered harmful

Most executables are downloaded. :)

Joker_vD

5 hours ago

Not in the "Installation: just run `docker run kekw/our-shiny-ai-chatbot` in your shell" world we're living in today.

73kl4453dz

4 hours ago

They were generally uuencoded or similar