Stringly Typed

41 points, posted 2 months ago

76 Comments

cartoffal

2 months ago

The idea of "type safety over the network" is a fiction.

When it comes down to it, what is being sent over the network is 1s and 0s. At some point, some layer (probably multiple layers) is going to have to interpret that sequence of 1s and 0s into a particular shape. There are two main ways of doing that:

- Know what format the data is in, but not the content (self-describing formats) - in which case the data you end up with might be of an arbitrary shape, and casting/interpreting the data is left to the application

- Know what format the data is in, AND know the content (non-self-describing formats) - in which case the runtime will parse out the data for you from the raw bits and you get to use it in its structured form.

The fact that this conversion happens doesn't depend on the language; JS is no more unsafe than any other language in this regard, and JSON is no better or worse of a data serialisation format for it. The boundary already exists, and someone has to handle what happens when the data is the wrong shape. Where that boundary ends up influences the shape of the application but what is preferable will depend on the application and developer.

rappatic

2 months ago

> The idea of "type safety over the network" is a fiction

> When it comes down to it, what is being sent over the network is 1s and 0s

When it comes down to it, all of computing is 1s and 0s. This is not some feature that's particular to the wire.

lblume

2 months ago

The difference being that a client can be malicious, while e.g. a local file is assumed to behave with the same intent as another. Programs that run on one computer can always be statically verified, while the task is harder for server-client applications — the client could always be an untrusted impersonator!

sophacles

2 months ago

A local file can be "newly local" and have recently been saved from the network or via a usb drive, etc.

And assuming a file is going to behave with good intent, or even the same intent as another file of the same format, is bad. It's how we get jpeg/png/etc parsing errors. It's how we end up with PDFs that are also valid executables, and 1000 more issues.

munchler

2 months ago

This happens with local files also, and was originally called “DLL hell”. The mismatch isn’t malicious, but the effect is the same.

JadeNB

2 months ago

> The difference being that a client can be malicious, while e.g. a local file is assumed to behave with the same intent as another.

I'm not sure what it means to assume something about the behavior of a file, presumably thought of as a static piece of data, but I'd certainly disagree that a modern computing system is entitled to assume that all local apps behave with the same intent as one another (except to the extent that it assumes that all local apps behave maliciously).

denkmoon

2 months ago

What does that have to do with type safety though? If anything, type safety improves whichever piece of the puzzle you do have control over by reducing the likelihood of you accepting malformed data.

tbrownaw

2 months ago

> The idea of "type safety over the network" is a fiction.

Not really, it just has to be enforced at run time rather than compile or link time.

cartoffal

2 months ago

In many statically-typed languages, types do not exist at runtime - they are erased once the program is known to be internally consistent. What is left is not type safety, it is parsing and validation of unstructured binary blobs (or arbitrary strings, depending on the protocol) into structured data. Structure and types are not the same thing, and in many languages they barely even overlap.

setr

2 months ago

Any input data has the same problem. Type safety exists after validation, and its guarantees hold only if your original validation was upheld.

Files, database, user input, network protocols, etc. I don’t know why the network would be any way special. You parse/validate unstructured binary blobs into structured data, and what’s left is type safety. It’s not in the runtime only because, if the compiler has done its job correctly, it is typesafe by construction.

In other words, how many times are you going to check your data structure is correct, before you start assuming it’s correct? Once — at parsing and validation — after that, you’re working with structured data, and your types are just recording the fact
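
A minimal sketch of that "validate once at the boundary" idea, in Python. The `User` type, its fields, and `parse_user` are hypothetical names made up for illustration; nothing here comes from the article.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """Hypothetical structured type; an instance exists only after validation."""
    id: int
    name: str


def parse_user(raw: bytes) -> User:
    """Turn untrusted bytes into a User, or raise ValueError."""
    try:
        data = json.loads(raw)
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    user_id, name = data.get("id"), data.get("name")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(user_id, int) or isinstance(user_id, bool):
        raise ValueError("'id' must be an integer")
    if not isinstance(name, str):
        raise ValueError("'name' must be a string")
    return User(id=user_id, name=name)


user = parse_user(b'{"id": 7, "name": "ada"}')
print(user)
```

Everything downstream of `parse_user` can take a `User` and skip re-checking; the type records that the check already happened.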

cartoffal

2 months ago

> I don’t know why the network would be any way special.

The network isn't special. This applies locally too. But the article we are commenting on (at least, _I_ was commenting on) is about the network, and it uses the phrase "type safety over the network" in particular.

> You parse/validate unstructured binary blobs into structured data, and what’s left _is type safety_.

That is in fact exactly what I said. The point is that at some point you started with unstructured binary blobs. As soon as data leaves the application, it is no longer "safe", and it is unsafe until it is ingested and (re)validated. So my point can be freely generalised to "type safety beyond the application boundary is a fiction". And the application boundary will always exist, whether you are working with a strongly or weakly, statically or dynamically typed language.

bluGill

2 months ago

While it is all 1s and 0s, what those bits mean can easily be encoded. When you say what those bits mean in detail (which we need to anyway: what code page is that string?), we can then define what is valid, and in turn we can reject messages that, while they are only 1s and 0s, are still wrong. Also, by assigning meaning we can get closer to what we want. The string "12345" and the number 12345 can both mean the same thing, but we can put the number into 2 bytes if we want, while the string is at least 5 bytes. Not to mention a number is easier to parse; turning a string into a number is not always trivial (depending, of course, on which code page is active).
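
To make the size comparison concrete, here is a small Python sketch. The choice of a big-endian unsigned 16-bit field is an assumption for illustration, not something the comment specifies.

```python
import struct

as_text = "12345".encode("ascii")   # one byte per digit
as_u16 = struct.pack(">H", 12345)   # big-endian unsigned 16-bit integer

print(len(as_text), len(as_u16))

# Decoding needs the same agreed-upon meaning on the receiving side.
(value,) = struct.unpack(">H", as_u16)
assert value == int(as_text)

# Assigning meaning also gives us something to reject:
# 70000 does not fit into 16 bits, so packing it fails.
try:
    struct.pack(">H", 70000)
    rejected = False
except struct.error:
    rejected = True
print("out-of-range value rejected:", rejected)
```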

p0w3n3d

2 months ago

I agree with you. I think that, except for memory cost, having everything in strings just moves the logic of validating unknown or malformed strings from the deserializer to a validator method. But it must happen. And it will break nevertheless when a new possible value occurs.

You've just implemented POST and GET enum? Here's the PATCH. You have all http codes in your enum? And what about teapot? You had STARTING, PENDING, and FINISHED in your ``state`` allowed values? Our business analyst wants FAILED. Etc. Etc.
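
A short Python sketch of that failure mode, using the `state` values from the example above. The `parse_state` helper is a hypothetical name, and returning `None` for unknown values is just one possible policy.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    STARTING = "STARTING"
    PENDING = "PENDING"
    FINISHED = "FINISHED"


def parse_state(raw: str) -> Optional[State]:
    """Return the matching State, or None for a value this version does not know."""
    try:
        return State(raw)
    except ValueError:
        return None


print(parse_state("PENDING"))
print(parse_state("FAILED"))  # the value the business analyst added later
```

Whether the unknown value is rejected, mapped to a fallback, or passed through is a decision someone has to make either way; the enum only makes it explicit.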

teeray

2 months ago

> At some point, some layer (probably multiple layers) are going to have to interpret that sequence of 1s and 0s into a particular shape

You do it once, at the application border. Doing it multiple times, in multiple layers, is a path to madness.

cartoffal

2 months ago

My point was more that the layers below the application will also have to parse the data into a particular format - in the case of networked applications, into TCP/IP packets, then anything particular to the message protocol, before hitting the application. And then the application will, at runtime (regardless of whether you are using a type safe language or not) have to parse and validate the shape of the data before it can be used.

01HNNWZ0MV43FF

2 months ago

Not to agree, but this is the same for files on disk or data in a database, where the malicious or misbehaving peer might just be an older version of the same program

bqmjjx0kac

2 months ago

Not trying to be mean, but there's not much content here. It's a definition of the term "stringly typed" (from another blog) followed by the idea of using appropriate types.

zahlman

2 months ago

I guess the author is "one of today's 10,000", as they say. Wiktionary attests the term from 2019 but I'm sure I've been hearing it much longer than that.

smnrchrds

2 months ago

The post is a true web-log. Someone logged something they learned and put it on the web.

matsemann

2 months ago

I first heard of it from Jeff Atwood in 2012, loads of fun concepts here I reference often. Favorite must be "shrug report"

https://blog.codinghorror.com/new-programming-jargon/

crabmusket

2 months ago

I was working with the Torque Game Engine in like 2008 which had a scripting language where almost all data was strings. Vectors? String of three numbers with spaces in between. Looking back I think it was kind of TCL inspired. But I definitely heard it called "stringly typed".

gskye

2 months ago

xkcd has a relevant take on this: https://xkcd.com/1053/

TLDR, we should totally be celebrating learning in public

nothrabannosir

2 months ago

That's not just a take, that's the origin of the phrase OP used :)

In a beautiful meta moment, you are one of today's 10k about the origin of 10k :D

codydkdc

2 months ago

I'm witnessing...something!

user

2 months ago

[deleted]

titzer

2 months ago

> JavaScript is a weakly typed language without much type safety

This is a little inaccurate. JavaScript is dynamically typed. Values carry strong, unforgeable type information (tags), though tags for numbers are extremely cheap and optimized away whenever possible. It is not possible that JavaScript "forgets" that 1.0 is a number and allows a program to use it as a "pointer".

Jtsummers

2 months ago

Strong and weak are ill-defined, but JS is generally considered weakly typed. It's about how the language treats the results of expressions on mixed types and whether it permits implicit conversions or not. Contrast with Python, which is also dynamically typed but is strongly typed (by default, you could add in your own operations to weaken these runtime type checks).

JavaScript: https://tio.run/##bdBBCoMwEIXhfU4xuNFG01C6K3iYF2tLixhREXr6NJ...

Python: https://tio.run/##K6gsycjPM/7/P1HBVkHdUJ0rCUgbc3EVFGXmlWgkKm...

Few languages are truly strongly typed (zero implicit conversions, numeric operations commonly allow them without requiring explicit casts) so it's really more of a spectrum. How much does a language allow in comparison to other languages.
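
A small Python illustration of the spectrum (this is a separate sketch, not a copy of the linked snippets): mixing `str` and `int` is refused, while numeric operations still convert implicitly.

```python
# Mixed str/int addition: Python refuses to guess and raises TypeError.
try:
    "1" + 1
except TypeError as exc:
    mixed = f"TypeError: {exc}"

# Numeric operations still convert implicitly: int + float gives a float.
promoted = 1 + 1.5

# bool is a subclass of int, so this is accepted as well.
summed = True + 1

print(mixed)
print(type(promoted).__name__, promoted)
print(summed)
```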

RHSeeger

2 months ago

Most of the well thought out writeups on types that I've seen put them on two different scales

- Strong vs Weak - how does the language convert between types

- Static vs Dynamic - Does type information apply "early" or "late" (not sure what the right term is here, but generally "when it's compiled" vs "when it's run"). I've also seen this as whether the type information is applied to the variable (name of/thing that points at the value) or the value itself; which tends to work out to the same thing.

Which is, I think, more or less what you're saying here. Just with more words.

bhk

2 months ago

Re: conversion between types

Most languages can be placed somewhere on a strong-weak spectrum, but JavaScript is an outlier. Two "equal" values can implicitly convert to opposite boolean values, or different numeric values, or different string values. Not just weak... maybe pathological is a better term.

bloppe

2 months ago

In static languages, variables have types. In dynamic languages, values have types.

andai

2 months ago

I think this is referring to JS's unfortunate habit of doing nonsensical things with types unless the user takes special precautions.

For years I thought I needed explicit static typing, until I tried Python, which is also dynamically typed, and found that I had none of the problems I had in JS. This is because Python is strongly typed.

Indeed Python has the opposite problem of being a bit too pedantic with the type conversions. I thought it was interesting that C#, which is also strongly typed, lets you do string + number. (IIRC it has something to do with how they both descend from Object...)

Worth mentioning that I do think static typing is a very good idea in any nontrivial program, and I wish more languages forced the programmer to be explicit here — TypeScript and even Rust have both bitten me in the ass with the type system making incorrect assumptions instead of just asking me (i.e. forcing me to actually specify the program and eliminate guesswork).

chowells

2 months ago

> IIRC it has something to do with how they both descend from Object...

It's far less interesting than that. The truth is that the symbol + means a wide variety of different things in the language, and it uses compile-time type information to select exactly which of those things it means. In the case where either operand is a string, it means string concatenation, with string conversion for a non-string operand if one exists.

That specific conversion part is the only thing that cares that stuff descends from Object. The conversion algorithm is "call ToString() and let dynamic dispatch sort it out."

(For completeness, the other meanings are various forms of addition, but adding two doubles is a different machine instruction than adding two floats. It really does still need to use type information to figure out which of those operations it is.)

user

2 months ago

[deleted]

RHSeeger

2 months ago

Static typing provides different benefits to strong typing. Both are useful in their own ways. And both can get in the way (and not be worth it) in some cases.

t-writescode

2 months ago

I have such an addiction to types that the first thing I do to anything as it comes in and out of systems I own (even across systems I own) is put it back into types and error all the things if it’s invalid.

I assumed this was the regular action because it seems so much safer to me.

turnsout

2 months ago

Yes, once you start noticing the (mis)use of strings, it's everywhere. I set my IDE to make strings a bright orange color so they're very noticeable in the code.

With that said, you can overdo it. For example, if you're constructing a URL in an internal method that will never be seen by the caller, it's okay to just use a bare "https" without turning it into an enum like Scheme.HTTPSecure.

zzo38computer

2 months ago

I think that JSON is overused (DER is better), but even without JSON, string data is also overused, in cases where numbers or other types would do better. (Unicode string types are also overused, but if a different type other than strings is better anyways, then Unicode is not the main issue here anyways.)

> If you're using a strongly typed language like TypeScript, receiving the user object as any or unknown type is unfortunate. You'll lose all the type safety and you can only regain it with manual type checking.

This is not specific to "stringly typed" stuff or to JSON, but is just the case when you transfer data that may use multiple types. In strongly typed programming languages, your program can parse it as the data that it expects and use an error handler when it is not what is expected. (If you do expect that it may have any type, then you might be able to pass the unparsed value if appropriate; for example, I have an ASN1_Value structure in some of my C programs for this purpose.)

inejge

2 months ago

> I think that JSON is overused (DER is better)

When I was younger and more enthusiastic about some aspects of network communications, I set out to understand ASN.1 by reading the specifications. That was when ISO and ITU-T were even more stingy with access to their standards/recommendations, so it wasn't easy to get them as someone with no connection to standards bodies, and also without a few hundred CHF to spare. Reading those specs is an art in itself, but one gets the hang of it after a while. It went pretty well until this part of X.208:

The resulting type and value of an instance of use of the new value notation is determined by the value (and the type of the value) finally assigned to the distinguished local reference identified by the keyword VALUE, according to the processing of the macrodefinition for the new type notation followed by that for the new value notation.

That's where I burst out laughing and finally deeply understood why the ISO networking stack crashed and burned despite having some solid ideas.

All this is to point out that yes, DER is not bad at all, but the whole infrastructure it rests on is simply too alien to people outside of telecom space and those who have to deal with it by necessity because of its use in various security protocols.

xg15

2 months ago

> What if we considered API endpoints to be remote function calls?

What if we considered API endpoints to be part of the frontend applications instead of remote interfaces?

I fear that every 5-10 years, a new generation of devs discovers those questions, is awestruck at what it believes to be fundamental insights, and then goes on to reinvent RMI.

I would argue like this: "Stringly typed" systems inside a single process are an anti pattern. It's nonsensical for a process to write a constant string into memory, only to verify, character by character, that it's in fact that same constant string in later parts of the program. There are almost certainly other organization schemes that would make the program more type safe, more safe from bugs, less memory intensive and more performant.

(Even in those situations, "string typing" can be acceptable technical debt if switching to such an alternate system would be very costly or work-intensive. You can still follow the "one good deed every day" pattern and try to reduce the string typing step-by-step whenever you're working on those parts of the code anyway.)

BUT: "String typing" on network boundaries (or even just general IO boundaries) is something completely different: There, your underlying data model are strings (of bytes, not of ASCII chars/Unicode chars/whatever else) and every attempt to model a more "truthful" representation on top of it is essentially a fiction. This fiction is often very useful, but I'd always keep in mind that it is an abstraction, and provide an "escape hatch" that lets you easily map back to the wire format representation. Because that is what's relevant when you have to debug issues or investigate security vulnerabilities.

Also: Never ever try to abstract away a network or IO boundary. Many have tried and it usually ends in tears.

somat

2 months ago

OpenBSD went with a stringly typed system for their pledge API.

I think it was for simplicity of use. But I find it a very strange interface from a bunch of C programmers.

https://man.openbsd.org/pledge

update: Found the commit message

    Move to next tame() API.  The flags are now passed as a very simple string,
    which results in tame() code placements being much more recognizeable.
    tame() can be moved to unistd.h and does not need cpp symbols to turn the
    bits on and off.  The resulting API is a bit unexpected, but simplifies the
    mapping to enabling bits in the kernel substantially.

user

2 months ago

[deleted]

valorzard

2 months ago

I know JSON is the standard now, but are there “better” serialization formats out there? Especially since JSON doesn’t know what an integer is in the spec

horsawlarway

2 months ago

I guess it depends on how you define "better".

JSON does a couple things really well, and most other things terribly.

But the things it does well are pretty valuable. So in the "strengths" category I'd put the two following points:

1. JSON is very easy to read and understand as a human

2. JSON stuck to the basics. No comments, no references, no clever tricks, and not much space to let folks try to hammer in cleverness (see - no comments).

Neither of those are all that much related to JSON itself as a format - the semantics are basically an accident of timing around JS syntax from the 2000s.

But it's very, very useful to be able to get the raw text for a network message and know exactly what's getting sent without having to have a whole specialized tool framework to parse and understand the message.

It's also useful to not let the spec get so complex that I never want to do that, even if I could (see: xml).

So with JSON - I can easily read the actual network request and understand it, even with essentially zero additional tooling AND I have a very good chance of literally being able to open a text editor and create a new message with valid syntax without any other tools or references.

Further - this holds true even if I'm not an industry expert with 20 years of experience. Most random people off the street can do it with only a couple minutes of coaching.

Not many other serialization formats can do that.

Imagine taking your 8 year old, sitting them down in front of the computer and legitimately saying "JSON doesn't know what an integer is in the spec"!

It's true... but it's absolutely not the point. For normal people "number" is complex enough. And if you need an int and not a float... you can do that processing just fine after getting a JSON payload if you'd like. It won't be as fast as a specialized format (e.g. protobuf), or as flexible as other formats (e.g. XML), but that's a far distant concern to "Can I hold the hammer".
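
A minimal sketch of doing that int-vs-float processing after decoding, in Python. `require_int` and the payload keys are hypothetical. Note that Python's `json` module happens to decode `3` as `int` and `2.0` as `float`; other parsers may hand back a single number type, which is why the check accepts integral floats too.

```python
import json


def require_int(value: object) -> int:
    """Accept a decoded JSON number only if it is an exact integer."""
    if isinstance(value, bool):
        raise ValueError("booleans are not integers")
    if isinstance(value, int):
        return value
    if isinstance(value, float) and value.is_integer():
        return int(value)
    raise ValueError(f"not an integer: {value!r}")


payload = json.loads('{"count": 3, "ratio": 2.0, "price": 2.5}')
print(require_int(payload["count"]))
print(require_int(payload["ratio"]))
```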

JSON is really easy. "Easy" as a strength is wildly discounted, but man is it a winner when you get it. I also think it's surprisingly hard to do.

RHSeeger

2 months ago

> No comments

There are a large number of people that consider that _not_ a benefit.

horsawlarway

2 months ago

Yeah, and if people would solely use them as comments for humans to read... I'm with you.

But they won't. A big part of the reason comments weren't included in JSON is that people tried to get clever with them.

Directly quoting Crockford:

> I removed comments from JSON because I saw people were using them to hold parsing directives, a practice which would have destroyed interoperability.

And while I'd also love to occasionally throw a comment in a json file, I don't want to have to deal with any of the headaches they would have created in the ecosystem.

And to be fair to Crockford here - it's not like he wasn't aware this was a downside. He even released a tool as a preprocessor for JSON if you wanted to put comments in: https://www.crockford.com/jsmin.html

JSON intentionally chose to stay as simple and compatible as possible, and personally - I think that constraint was the right call.

If I'm writing files I want to throw a lot of comments in... It usually means I should move to something like YAML instead.

Again - JSON is terrible at a lot of things, but really hammered on simple and easy as focus points. If you give devs a place to store data outside the structure of the protocol... they will use it for all sorts of complicated craziness... which devolves to either multiple protocols, or a really complicated protocol.

RHSeeger

2 months ago

I know his reasoning for it, I just disagree with him. People added JSON parsers that allow comments and can _still_ get tricky with them. The only thing the standard not adding them did was make sure we can't rely on them being there. And, for ANY file format that is used for config (and similar) that is supposed to be human readable, being able to add comments is pretty much table stakes imo.

horsawlarway

2 months ago

> The only thing the standard not adding them did was make sure we can't rely on them being there.

I mean... that's basically the entire point of the decision. If you can't rely on them... you can't rely on them to ship metadata.

If you could rely on them to ship metadata... you start seeing parsers diverge wildly in features and scope - to the point that you've really got several different protocols all pretending to be "JSON". You end up with JSON_V1_UTF8_RTL, and JSON_V4_UCS2 and JSON_V3_EXT_UNICODE_LE, etc... All of which will be subtly (or not so subtly) incompatible, and then you're right back at XML.

No one is stopping you from writing config files in a superset of JSON that supports comments (yaml being the most common).

But the HUGE win here is that those formats are actually supersets - not alternative protocols. They all parse plain JSON. They might also happen to do tricky things - but if you give it standard JSON it works a-ok.

---

I think that there's an argument to be made that given how successful JSON is NOW - adding the ability to insert comments might be valuable (See: JSONC, or JSON5)

But I think as far as the initial stages went, not having comments was pretty clearly the right call.

Like - it's just a small step from comments to processing_instructions (ala XML: https://en.wikipedia.org/wiki/Processing_Instruction) and while I don't always agree with Crockford... I'm with him 1000% that the second you let that kind of metadata live in the format... the complexity EXPLODES.

Better to keep the standard format intentionally clean of it, and let people declare their own supersets.

knome

2 months ago

sure, but you'd see folks using them to add metadata and extend json in horrible incompatible ways

user

2 months ago

[deleted]

bluGill

2 months ago

Define better.

As the other poster said, you could use XML which is more powerful, but as a result is a lot more complex. For most tasks I'd prefer JSON because while it is lacking, all the real world parsers I've seen are much easier to work with and I rarely need more complexity. If someone did a JSON++ (I have no doubt many people have but I'm not aware of them!) that added things like integers, without the complexity of XML that might be even better. In the real world if something should be an integer it isn't hard to check that and error out - you need to support parse errors in any data format anyway.

Protobuf is sometimes better for data serialization. It isn't human readable, but you rarely need that and saving data bytes is often useful even today. Protobuf does have the integer type that you are missing, but it has other limitations that might or might not apply to you. (I don't use protobuf enough myself to know what they are.)

SQLite has more than once suggested that its database file is a great serialization format. You get a lot of power here and for complex things a database is often easier to work with than an XML file. There are various NoSQL databases as well that sometimes can work for this.

I've handwritten my own serialization format in the past. The only hard part is designing in enough room to add whatever the future needs are (note that I've never had to read my serialization on a different CPU family; things like little vs big endian, I'm told, can be a pain).

There might be something else I didn't cover... Everything has pros and cons.

bglusman

2 months ago

Protobuf does support JSON encoding[0], which I like as the .proto definition is quite readable, and then you can encode/decode either human readably or efficiently. It's even quite easy to have your consumer support both since the two are pretty easy to tell apart and if you know it's either one or the other, you can just failover trying one to the other, possibly at some small cost... the guide also does point out some significant downsides to relying on the JSON version, but it can be useful in development and/or debugging in some cases, especially if you control both sides sending and receiving and can just toggle to use it when you want temporarily.

[0]https://protobuf.dev/programming-guides/json/

ackfoobar

2 months ago

> It isn't human readable

This is a tooling problem. Wireshark can decode protobuf for you when you're inspecting gRPC traffic.

horsawlarway

2 months ago

Needing that tooling is a format problem.

JSON is bad at everything except being simple and easy. Turns out simple and easy is a real winner.

nothrabannosir

2 months ago

JSON has one glaring flaw: nested json encoding in strings becomes awful to read. I encounter it too often in reality where individual layers use JSON, but want to support arbitrary strings in their API. Encodings which use prefix length don't suffer from this, which ironically even includes most binary formats.
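
A quick Python demonstration of the escaping pile-up. The `payload` and `msg` keys are made up for illustration.

```python
import json

inner = json.dumps({"msg": "hi"})
middle = json.dumps({"payload": inner})   # inner is embedded as a string
outer = json.dumps({"payload": middle})   # and again, one layer up

# Print each layer with its backslash count; the count grows per layer.
for layer in (inner, middle, outer):
    print(layer.count("\\"), layer)

# Decoding has to peel the layers off one at a time.
decoded = json.loads(json.loads(json.loads(outer)["payload"])["payload"])
assert decoded == {"msg": "hi"}
```

A length-prefixed encoding would carry `inner` verbatim at every level, since nothing inside the payload needs escaping.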

bluGill

2 months ago

Back to my main point though: normally I don't need the complexity that things like nested JSON would be. When you do, though, JSON is a bad format. (Actually I would go so far as to say you never need something that complex. The problems you are trying to solve with nested JSON are still complex enough that you should use a more powerful/complex framework, but better design of your data store would avoid the need for nested JSON.)

bluGill

2 months ago

If you have the correct version available. All too often when debugging problems the person in the field doesn't have the correct tools, or doesn't know how to use them (in this case you may not want to share the proto config with that person...). As such, the fewer tools needed to understand something, the better.

zzo38computer

2 months ago

Different formats are good for different things, but I think DER is much better. No character escaping is necessary, Unicode is not required (although it can be used if you want to), arbitrary binary data can be stored, integers can be arbitrarily big (although implementations might only support integers as big as the specific application requires), you can skip past any block without needing to know how to interpret it, and many other advantages. (However, I had made up a variant with a few additional types, such as: key/value list, BCD string, TRON string, etc. This makes it strictly a superset of the types of data which can be stored in JSON (if the types you use are: sequence, key/value list, real number, null, boolean, and UTF-8 string). I use DER in some of my programs, because I think it is generally much better than JSON. Also, DER is a binary format, although I did make up a text format (called TER) which can be converted to DER (but TER is not really meant for other uses, since it is more complicated to handle).)

wpm

2 months ago

It does, and it predates JSON

https://www.apple.com/DTDs/PropertyList-1.0.dtd

As elegant as anything on json.org

frankfrank13

2 months ago

If you care a lot you can use Protobufs. Downside is now everything has to speak protobuf, and you can't read the payloads in your network tab. Upside is (mostly) smaller payloads and a lot more type safety.

Analemma_

2 months ago

I'm having a vision of XML reading your comment and going "well well well, look who's decided to come crawling back".

zahlman

2 months ago

I can't imagine why. XML is still fundamentally, well, a markup language, not a serialization format designed as such. But the "extensible" part isn't so accurate - attributes aren't extensible. GP complains that JSON doesn't know what an integer is (as distinct from a generic number), but at least it does know more than just strings. And needing to repeat a tag name when closing it just adds useless complexity.

wpm

2 months ago

It’s no more useless than a closing } or ], and since it has the tag name in it, when I’m reading a highly nested object I’m not stuck in my text editor looking at a bunch of }’s at random indentation levels that I have to scroll all the way back up to regain any context for. Tags are text, which is visual structure I can choose to read, or choose to gloss over and use as bulk to shape the data in my head.

bqmjjx0kac

2 months ago

There's CBOR, but it is not nearly as compact as the C in its name implies.

user

2 months ago

[deleted]

TeaVMFan

2 months ago

This is one of my favorite things about the Flavour framework: strongly-typed web service calls:

https://frequal.com/Flavour/book.html#org44d6b49

Your single-page app code calls your backend API using strong types. Your code is clean and the framework handles marshaling and unmarshalling JSON.

beders

2 months ago

if you have control over all the consumers of your API, run whatever makes you fast and keeps you safe.

If you want others to play with your API: JSON here we come.

Ciantic

2 months ago

When Anders Hejlsberg did a lot of those talks to sell TypeScript, he described a lot of JavaScript as "stringly typed", which is very obvious with all the addEventListener("click", ...) and so on depending on certain strings. The term itself is not a compliment; if you describe someone else's API as such, it's not taken well.

nine_k

2 months ago

JS is also "stringly typed" in the sense that you can access an object's properties by plain string names: foo.bar and foo["bar"] are the same thing.

It led to a really nice TypeScript feature, where you can declare something like type FooProp = "bar" | "baz", and the typechecker is smart enough to only allow these literal strings where you use values of FooProp type (e.g. when accessing properties by name, like above). This collapsed the whole crowd of strings, enums, and symbolic constants to just strings, without any loss of type safety, which I find a cognitive win.
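A minimal sketch of that feature, for anyone who hasn't seen it. The `Foo` interface and the `getProp` helper are made up for illustration; only the `FooProp` declaration comes from the comment above.

```typescript
// Hypothetical object shape, only for illustration.
interface Foo {
  bar: number;
  baz: number;
}

// Only these two literal strings are valid values of FooProp.
type FooProp = "bar" | "baz";

// Look a property up by name; the checker knows `key` is a real key of Foo.
function getProp(obj: Foo, key: FooProp): number {
  return obj[key];
}

const foo: Foo = { bar: 1, baz: 2 };

// Dot access and string-key access reach the same property.
const viaDot = foo.bar;
const viaString = foo["bar"];

const viaHelper = getProp(foo, "baz");
// getProp(foo, "qux"); // rejected at compile time: "qux" is not a FooProp
```

The strings are still plain strings at runtime; the restriction exists only in the typechecker.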

dleeftink

2 months ago

OP might like Structurae's Binary Protocol, type safe from door to door[0]. There are a lot more interesting use cases there!

[0]: https://github.com/zandaqo/structurae

cognomano

2 months ago

As usual, insight from 2013: https://wiki.c2.com/?StringlyTyped

GauntletWizard

2 months ago

An API that uses JSON isn't "Stringly Typed". An API that lacks any validation on the JSON you pass to it is. Under their definition, nearly everything is stringly typed if it passes a system boundary, because serialization transforms everything into a string. Sometimes a byte string, sure, but you end up with a transport-neutral single object whose interpretation is understood by metadata, and that's a good thing, because you don't need to waste time interpreting it at every layer it passes through.
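To make the distinction concrete, here is a rough sketch of validating JSON once at the boundary. The `Order` shape and the `isOrder` / `parseOrder` names are hypothetical, and a real service would more likely lean on a schema library than on a hand-written check.

```typescript
// Hypothetical message shape, only for illustration.
interface Order {
  id: number;
  item: string;
}

// Type guard: checks the runtime shape of an untrusted value.
function isOrder(value: unknown): value is Order {
  if (typeof value !== "object" || value === null) return false;
  const record = value as { [key: string]: unknown };
  const id = record["id"];
  const item = record["item"];
  // JSON only has "number", so the integer check has to be done by hand.
  return typeof id === "number" && id % 1 === 0 && typeof item === "string";
}

// Parse at the boundary; code past this point sees a typed Order or null.
function parseOrder(text: string): Order | null {
  let data: unknown;
  try {
    data = JSON.parse(text);
  } catch {
    return null; // not JSON at all
  }
  return isOrder(data) ? data : null;
}

const goodOrder = parseOrder('{"id": 7, "item": "tea"}');
const wrongShape = parseOrder('{"id": "7", "item": "tea"}'); // id is a string
const notJson = parseOrder("not json");
```

Without `isOrder`, a cast like `JSON.parse(text) as Order` would compile just as well and check nothing, which is the "lacks any validation" case.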

The modern advice to "Use a serialization library" is actually encoding several hard pieces of learning into one. There was a time when save files for most games were just memory dumps of large sections of memory. You dumped raw C-objects, including pointers to other objects, directly. You ended up with a tangled mess of references, but it was simple to write code for, cheap to write to disk, cheap to read from disk, and easy to break. Basically every update to a game broke all of the save files, because the most minor of tweaks could change the object layout generated by the compiler. The first change was to put magic strings at the beginning to identify the version, so at least you displayed an error message rather than executing some unexpected part of the save file as code.
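A small sketch of the magic-string idea. The layout here (four ASCII bytes "SAVE", one version byte, then the payload) is an assumption made up for the example, not any particular game's format.

```typescript
// Hypothetical layout: 4 magic bytes "SAVE", 1 version byte, then the payload.
const SAVE_MAGIC = [0x53, 0x41, 0x56, 0x45]; // "SAVE" in ASCII
const SAVE_VERSION = 2;

type LoadResult =
  | { kind: "ok"; payload: number[] }
  | { kind: "error"; message: string };

function writeSave(payload: number[]): Uint8Array {
  return new Uint8Array(SAVE_MAGIC.concat([SAVE_VERSION], payload));
}

function readSave(bytes: Uint8Array): LoadResult {
  if (bytes.length < SAVE_MAGIC.length + 1) {
    return { kind: "error", message: "file too short" };
  }
  // Reject anything that does not start with the magic bytes.
  for (let i = 0; i < SAVE_MAGIC.length; i++) {
    if (bytes[i] !== SAVE_MAGIC[i]) {
      return { kind: "error", message: "not a save file" };
    }
  }
  // Refuse versions this build does not understand instead of guessing.
  const version = bytes[SAVE_MAGIC.length];
  if (version !== SAVE_VERSION) {
    return { kind: "error", message: "unsupported save version " + version };
  }
  const payload: number[] = [];
  for (let i = SAVE_MAGIC.length + 1; i < bytes.length; i++) {
    payload.push(bytes[i]);
  }
  return { kind: "ok", payload: payload };
}

const roundTrip = readSave(writeSave([1, 2, 3]));
const oldSave = readSave(new Uint8Array([0x53, 0x41, 0x56, 0x45, 1, 9]));
const junkFile = readSave(new Uint8Array([0, 1, 2, 3, 4]));
```

The header does nothing for compatibility by itself; it only turns "load garbage" into "show an error", which is the step described above.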

This lesson was learned the hard way as we entered the networked age, where you couldn't trust that incoming messages weren't malicious. And you certainly couldn't trust, with all of the terribly-behaving middleware, that they were well-formed. Writing serialization/deserialization code is not hard, but it's annoyingly rote, and you would need it for dozens upon dozens of classes. So instead we switched to standardized libraries for serialization and deserialization.

Java and Python both had serialization libraries where whole objects could be serialized, along with everything they referenced. This led to massive security holes: it was easy for an object to take a huge chunk of working memory with it, because circular references to root objects allowed it to grab parts of other operations, or even application secrets. Python was worse, as a pickle stream can tell the loader to call arbitrary callables, meaning every load of untrusted data was a potential arbitrary code execution.

Modern serialization libraries have come to a compromise. They serialize data only in primitives. You have to rebuild the tangled web of cross object references yourself. This often sucks, but it's far better than the alternatives we've found.
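Roughly what "rebuild it yourself" looks like in practice. The `Team` / `Player` model and the `toWire` / `fromWire` helpers are invented for this sketch: the object reference is flattened to a numeric id on the way out and looked up again on the way in.

```typescript
// Hypothetical in-memory model: players point at their team object directly.
interface Team { id: number; name: string; }
interface Player { name: string; team: Team; }

// Wire form: primitives and arrays only; the reference becomes an id.
interface WirePlayer { name: string; teamId: number; }
interface WireRoster { teams: Team[]; players: WirePlayer[]; }

function toWire(players: Player[]): string {
  const teams: Team[] = [];
  const seen: { [id: number]: boolean } = {};
  for (const p of players) {
    if (!seen[p.team.id]) {
      seen[p.team.id] = true;
      teams.push({ id: p.team.id, name: p.team.name });
    }
  }
  const wire: WireRoster = {
    teams: teams,
    players: players.map((p) => ({ name: p.name, teamId: p.team.id })),
  };
  return JSON.stringify(wire);
}

function fromWire(text: string): Player[] {
  // Assumes the shape was already validated at the boundary.
  const wire = JSON.parse(text) as WireRoster;
  const byId: { [id: number]: Team } = {};
  for (const t of wire.teams) byId[t.id] = t;
  // Re-link each player to the single Team object for its id.
  return wire.players.map((p) => {
    const team = byId[p.teamId];
    if (team === undefined) throw new Error("unknown team id " + p.teamId);
    return { name: p.name, team: team };
  });
}

const reds: Team = { id: 1, name: "Reds" };
const roster: Player[] = [
  { name: "Ann", team: reds },
  { name: "Bob", team: reds },
];
const restored = fromWire(toWire(roster));
```

After the round trip both players share one rebuilt `Team` object again, but only because `fromWire` did the linking by hand.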

GraphQL is popular for precisely this reason. You can avoid most of the serialization and deserialization steps and query what you want directly, allowing you to access deeply-linked properties of deeply linked objects without the expensive round trips and security barriers being checked and rechecked; but the expressiveness comes at a distinct cost in terms of getting those barriers on the server side really right, because the default-allow permissions make it easy to leak.

lock1

2 months ago

Unfortunately you can't escape stringly-typed code (and other messes) in a language with a structural type system.

teo_zero

2 months ago

An example of "stringly typed" in C is when you have to pass "r+" to fopen().

zzo38computer

2 months ago

Yes, that is another example, and I had thought using numbers would make more sense (and you can use enum or #define to give names to those numbers). It is not only the fopen function in C that does that; I have seen similar things in other C libraries, as well as in other programming languages.
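A sketch of the two styles side by side, in TypeScript rather than C. The `OpenFlag` values and `parseMode` are hypothetical; the mode strings follow the usual fopen meanings, with the "b" variants left out.

```typescript
// Hypothetical named flags, in the spirit of "use enum or #define".
const OpenFlag = {
  Read: 1,
  Write: 2,
  Create: 4,
  Truncate: 8,
  Append: 16,
};

// Stringly typed entry point: maps fopen-style mode strings to flags.
// A typo such as "rw" is only caught when this code runs.
function parseMode(mode: string): number {
  switch (mode) {
    case "r":
      return OpenFlag.Read;
    case "r+":
      return OpenFlag.Read | OpenFlag.Write;
    case "w":
      return OpenFlag.Write | OpenFlag.Create | OpenFlag.Truncate;
    case "w+":
      return OpenFlag.Read | OpenFlag.Write | OpenFlag.Create | OpenFlag.Truncate;
    case "a":
      return OpenFlag.Write | OpenFlag.Create | OpenFlag.Append;
    case "a+":
      return OpenFlag.Read | OpenFlag.Write | OpenFlag.Create | OpenFlag.Append;
    default:
      throw new Error("invalid mode: " + mode);
  }
}

// The same request spelled both ways.
const fromString = parseMode("r+");
const fromFlags = OpenFlag.Read | OpenFlag.Write;
```

With named flags a misspelling like `OpenFlag.Raed` fails to compile, while `parseMode("rw")` compiles fine and throws at runtime.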

user

2 months ago

[deleted]

strongly-typed

2 months ago

my nemesis