Against SQL (2021)

73 points, posted 15 hours ago
by charles_irl

59 Comments

layer8

10 hours ago

This is mostly all true, but there is little incentive for RDBMS vendors to implement and maintain a second query language, in particular a shared cross-vendor one. Databases are the most long-lived and costly-to-migrate dependencies in IT systems, so keeping the SQL-based interface in parallel for a long time would be mandatory. This is compounded by the standardized SQL-centric database driver APIs like ODBC and JDBC. Despite the shortcomings of SQL, there is no real killer feature that would trigger the required concerted change across the industry.

andai

10 hours ago

The way I've heard this phrased is, for potential customers to justify switching to your solution, it can't be 10% better, it needs to be 10x better.

(And on top of that they need to clearly perceive the value of Strange New Thing, and clearly perceive the relative lack of value of the thing they have been emotionally invested in for decades...)

gavinray

10 hours ago

  > This is compounded by the standardized SQL-centric database driver APIs like ODBC and JDBC.

The criticality of JDBC/ODBC as a platform can't be overstated. The JDBC API is the dominant platform for data access libraries. Compare the number of drivers for JDBC, ODBC, Go's database/sql, etc.

Newer platforms like Arrow ADBC/FlightSQL are better-suited to high-volume, OLAP style data queries we're seeing become commonplace today but the ecosystem and adoption haven't caught up.

https://arrow.apache.org/adbc/current/index.html

https://arrow.apache.org/docs/format/FlightSql.html

sema4hacker

10 hours ago

For any language as large and complicated as SQL, it's easy to come up with a long list of design problems. The difficulty is designing something better, and then even more difficult than that is getting people to use it.

jampekka

7 hours ago

Much of the critique is that it's large and complicated because of bad design.

"Because SQL is so inexpressive, incompressible and non-porous it was never able to develop a library ecosystem. Instead, any new functionality that is regularly needed is added to the spec, often with its own custom syntax. So if you develop a new SQL implementation you must also implement the entire ecosystem from scratch too because users can't implement it themselves.

This results in an enormous language."

jaredklewis

2 hours ago

I would say that’s just another trade off though, in that extensibility and portability are invariably in tension.

The article simultaneously complains that the SQL standard is not universally implemented (fair) and that SQL is not easily extensible (also fair). But taken together it seems odd to me in that if you make SQL very extensible, then not only will it vary between databases, it will vary between every single application.

Also, the line between SQL and database feels a little fuzzy to me, but don’t a lot of postgresql extensions effectively add new functionality to SQL?

YZF

11 hours ago

In my day job the question of SQL and its role keeps coming up. Some people want to propagate SQL all the way to clients like web browsers. Perhaps operating over some virtual/abstract data and not the real physical underlying data (that's a whole other layer of complexity). This seems like a bad idea/API in general.

I'm not too familiar with GraphQL but on the surface it seems like another bad idea. Shouldn't you always have some proper API abstraction between your components? My sense for this has been that GraphQL was invented out of the frustration of the frontend team needing to rely on backend teams for adding/changing APIs. But the answer can't be to have no APIs?

All that said there might be some situations where your goal is to query raw/tabular data from the client. If that's your application then APIs that enable that can make sense. But most applications are not that.

EDIT: FWIW I do think SQL is pretty good at the job it is designed to do. Trying to replace it seems hard and with unclear value.

lelanthran

10 hours ago

> All that said there might be some situations where your goal is to query raw/tabular data from the client. If that's your application then APIs that enable that can make sense. But most applications are not that.

IME, the majority of responses sent to the client are tabular data hammered into a JSON tree.

If you generalise all your response to tabular data, that lets you return scalar values (a table of exactly one row and one column), arrays (a table of exactly one row with multiple columns) or actual tables (a table of multiple rows with multiple columns).

The problem comes in when some of the values within those cells are trees themselves, but I suspect that can be solved by having a response contain multiple tables, with pointer-chasing on the client side reconstructing the trees within cells using the other tables in the response.

That would still leave the 1% of responses that actually are trees, though.
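
A minimal Python sketch of the multi-table idea described above (the table shapes and column names are invented for illustration): the server sends flat tables, cells in one table reference rows of a sibling table by id, and the client chases those pointers to rebuild the tree.

```python
# Hypothetical wire format: two flat tables; "item_refs" cells in the first
# table point at rows of the second table by id. The client rebuilds the
# nested JSON structure from them.

def rows_by_id(table):
    """Index a list-of-dicts table by its 'id' column."""
    return {row["id"]: row for row in table}

def reconstruct(orders, items):
    """Inline each order's item references as a nested list."""
    item_index = rows_by_id(items)
    result = []
    for order in orders:
        node = {k: v for k, v in order.items() if k != "item_refs"}
        node["items"] = [item_index[ref] for ref in order["item_refs"]]
        result.append(node)
    return result

# Two tabular "result sets", as they might arrive over the wire.
orders = [{"id": 1, "customer": "ada", "item_refs": [10, 11]}]
items = [{"id": 10, "sku": "A-1"}, {"id": 11, "sku": "B-2"}]

print(reconstruct(orders, items))
# [{'id': 1, 'customer': 'ada', 'items': [{'id': 10, 'sku': 'A-1'}, {'id': 11, 'sku': 'B-2'}]}]
```

As the comment notes, this covers tabular data with tree-valued cells; responses that are genuinely trees all the way down still need something else.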

gavinray

10 hours ago

  > My sense for this has been that GraphQL was invented out of the frustration of the frontend team needing to rely on backend teams for adding/changing APIs.

GraphQL was born out of the frustration of backend teams not DOCUMENTING their API changes.

It's no different ideologically from gRPC, OpenAPI, or OData -- except for the ability to select subsets of fields, which not all of those provide.

Just a type-documented API that the server allows clients to introspect and ask for a listing of operations + schema types.

GQL resolvers are the same code that you'd find behind REST endpoint handlers for "POST /users/1", etc.

chao-

11 hours ago

Instead of a client dealing with a server that only presents unopinionated, overly-broad CRUD endpoints for core entities/resources, GraphQL is a tool through which the client tricks the server into creating a bespoke viewmodel for it.

YZF

11 hours ago

But those endpoints are abstractions. Don't we want control over the surface of the API and our abstractions? If you let the client tell the server what the abstractions are at run-time, haven't you just lost control over that interface?

As I was saying, there might be some situations where that's the right thing, but in general it seems you want to have a well-controlled layer that specifies the contract between these pieces.

chao-

10 hours ago

My post was only intended as a commentary regarding how I approach GraphQL after a few forays into it (current stance: would not default to GraphQL, but not against it either).

I was not intending to dodge your questions, but nor was I trying to comprehensively answer them, because they felt a bit unclear. I will make an attempt, combining snippets within your two posts that seem to be related:

>Shouldn't you always have some proper API abstraction between your components?

>But those endpoints are abstractions. Don't we want control over the surface of the API and our abstractions?

I can't answer this unless I know what concepts/layers you are referring to when you say "abstraction between components". If you mean "between the client and server", then yes, and GraphQL does this by way of the schema, types, and resolvers that the server supports, along with the query language itself. The execution is still occurring on the server, and the server still chooses what to implement and support.

If by "abstraction between components" you mean "URL endpoints and HTTP methods" then no, GraphQL chose to not have the abstraction be defined by the URL endpoint. If you use GraphQL, you do so having accepted that the decision point where resources are named is not at the URL or routing level. That doesn't make it not an abstraction, or not "proper" in some way.

>But the answer can't be to have no APIs?

I don't understand what you mean by "No APIs"? You also mention "control over the surface"...

Is your concern that, because the client can ask the server "Please only respond with this subset of nodes, edges and properties: _______", the server has "no API"? Or it doesn't have "control"? I assure you that you can implement a server with whatever controls you desire. That doesn't mean it will always be easy, or be organized the way you are used to, or have the same performance profile you are used to, but the server can still implement whatever behavior it wants.

>...in general it seems you want to have a well-controlled layer that specifies the contract between these pieces.

I think this wording brings me closer to understanding your main concern.

First, let me repeat: I am not a big GraphQL fan, and am only explaining my understanding after implementing it on both clients and servers. I am not attempting to convince you this is good, only to explain a GraphQL approach to these matters.

The "well-controlled layer" is the edge between nodes, implemented as resolvers. This was the "aha" moment for me in implementing GraphQL the first time: edges are a first-class concept, not just the nodes/entities. If you try using GraphQL in a small project whose domain model has lots of "ifs" and "buts", you will be forced to reach for that layer of control, and get a sense of it. It is simply located in a different place than you are used to.

This "edges are first-class concepts" has an analogue in proper hypermedia REST APIs, but most organizations don't implement REST that way, so except for the five people who fully implement true HATEOAS, it is mostly beside the point.
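
A hand-rolled sketch of the "edges are first-class concepts" point, with all names invented (this is not any real GraphQL library's API): the edge from a user to their team is a resolver function, and that function is where the server keeps control - visibility, filtering, batching - rather than at a URL route.

```python
# Toy in-memory "database". All names here are illustrative.
USERS = {1: {"id": 1, "name": "ada", "team_id": 7}}
TEAMS = {7: {"id": 7, "name": "infra"}}

# The edge user -> team is a resolver, not a foreign-key lookup done by the
# client. This is the "well-controlled layer": the server decides what each
# viewer is allowed to see when the edge is traversed.
def resolve_team(user, viewer):
    team = TEAMS[user["team_id"]]
    if viewer != "admin":
        return {"name": team["name"]}  # hide internal ids from non-admins
    return team

# A minimal "query executor": the client names the fields it wants, and the
# server walks edges via resolvers for any non-scalar field.
def query_user(user_id, fields, viewer):
    user = USERS[user_id]
    out = {}
    for f in fields:
        out[f] = resolve_team(user, viewer) if f == "team" else user[f]
    return out

print(query_user(1, ["name", "team"], viewer="guest"))
# {'name': 'ada', 'team': {'name': 'infra'}}
```

The client chose the field subset, but the server still implemented whatever behavior it wanted on the edge.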

tomnipotent

10 hours ago

GraphQL still has schema constraints, the surface of the API you mentioned.

nesarkvechnep

11 hours ago

I hold a very unpopular opinion of GraphQL: I think it's a great internal querying API. Every web backend project I've worked on tries to implement an API for querying data, and it's usually either fast and inflexible or flexible but slow. GraphQL allows you to strike a balance, flexible and reasonably fast, with ways to optimise further.

RedShift1

11 hours ago

I love GraphQL, it's great. It takes away the ambiguous way of organizing REST APIs (don't we all love the endless discussion about which HTTP status code to use...), and at the top level separates operations into query/mutation/subscription instead of trying to segment everything into HTTP verbs. It takes a bunch of decision layers away, and that means faster development.

nevertoolate

10 hours ago

Question is: do you need that flexibility if you have a backend-for-frontend? Can you design such a flexible API which makes it possible to iterate faster? If not, you just pay, in the best case, a constant overhead or, in the worst case, exponential overhead on each request! If you need to spend time optimizing because you have monitoring for slow queries, or downtime caused by never-terminating queries, then most likely you've already eaten the implementation-speed advantage - if it exists at all in the first place.

tomnipotent

9 hours ago

I always thought it was about developer velocity, in this particular case front-end velocity. With a traditional REST API the front-end team needed to coordinate with the back-end team on specific UX features to determine what needed to be done, which was further exacerbated when APIs needed to be specialized for iPhone vs. Android vs. Web UI.

GraphQL was supposed to help front-end and back-end meet in the middle by letting front-end write specific queries to satisfy specific UX while back-end could still constrain and optimize performance. Front-end could do their work without having to coordinate with back-end, and back-end could focus on more important things than adding fields to some JSON output.

I think it's important to keep this context in mind to appreciate what problem GraphQL is solving.

chao-

9 hours ago

This is my read of the history as well.

This is also the motivation that would lead me to advocate for adopting GraphQL for a product. Moreso than a technical decision, it is an organizational decision regarding resource trade-offs, and where the highest iteration or code churn is expected to be located.

nawgz

10 hours ago

Re: GQL - Explain to me what abstraction layer should exist between the data model and what data is loaded into the client? I’ve never understood why injecting arbitrary complexity on top of the data model is wise.

Perhaps unfettered write access has its problems, and GQL has permissions that handle this issue plenty gracefully, but I don’t see why your data model should be obfuscated from your clients which rely on that data.

YZF

9 hours ago

In my view the abstraction layer should be in the domain of the application.

Let's say your software is HR software and you can add and remove employees. The abstraction is "Add an employee with these details". The data model should be completely independent of the abstraction. I.e. nobody should care how the model is implemented (even if in practice it's maybe some relational model that's more or less standard). Similarly for querying employees. Queries should not be generic, they should be driven by your application use cases, and presumably the underlying implementation and data model is optimized for those as well.

But I get that GQL can be that thing in a more generic, schema-driven way. It still feels like a layer where you can inadvertently create the wrong contract. Especially if, as I think is the case, different teams control the schema and the underlying models/implementation. So what it seems to save teams/developers is having to spell out the exact requirements/implementation details of the API. But don't you want to do that?

How do people end up using GQL in practice? What is the layer below GQL? Is it actually a SQL database?

JaggerFoo

9 hours ago

SQL is great. I've used it to implement knapsack optimization for Daily Fantasy Sports at scale. I use it in Big Data tools and RDBMS. It's pervasive in data tech.

Feel free to innovate and bring forth other RDBMS/Data query languages and tools, perhaps something may succeed and stick as long as SQL has.

Cheers

tqi

11 hours ago

Most of these arguments against seem like personal preferences? For example, I understand it would be convenient to give special treatment to foreign-key joins, but I personally find `fk_join(foo, 'bar_id', bar, 'quux_id', quux)` less easy to understand on its own, without having to look up the underlying table structures to know which tables have which columns (i.e. is quux_id a column in foo or bar?). Not to mention I've never worked anywhere where foreign keys were used consistently, mostly for perf reasons.

andai

10 hours ago

Take a drink every time you see a comment that didn't even open the article ;)

thom

10 hours ago

SQL is one of those wonderful technologies that technologists hate but generates such incomparable value that there's a steady supply of these blog posts, like fumes from the vast ocean of their constantly boiling piss.

throwaway894345

10 hours ago

SQL itself doesn't generate any value, relational databases generate value. SQL is just a frontend for them. Anyway, your snark could be applied to _literally any change_. Are you angry that cars are replacing horses? "Horses generate such incomparable value that there's a steady supply of pro-car posts, like fumes from the vast ocean of their constantly boiling piss". Don't like the cotton gin? "Slave labor generates such incomparable value...". There's not really anything of substance in this kind of comment.

lawrencejgd

8 hours ago

I think a quote applies very well here: "All right, but apart from sanitation, medicine, education, wine, public order, irrigation, roads, the fresh-water system and public health, what have the Romans (SQL) ever done for us?"

woooooo

6 hours ago

SQL as a query description language absolutely brings its own value.

We had a whole period of NoSQL because it's difficult to shard SQL out across distributed databases, followed by people figuring out how to make SQL work on distributed databases, because SQL is really useful and people like it.

artyom

an hour ago

The whole NoSQL period was a particularly sad one. A couple of great things emerged from it (Redis et al.) but for the most part it was all smoke and mirrors.

The worst was the number of people wanting to be the next Ted Codd while being nowhere close to his mathematical background.

woooooo

an hour ago

I'd call it a necessary phase rather than a sad one. People needed scalability fast, key-value stores are natural to shard, so they did that first and then we got better distributed DBs later. Even prime-era Google was on Bigtable for over a decade before Spanner.

thom

10 hours ago

If, after 100 years of cars being available, everybody still rode horses and always got where they're going on time, and car people constantly blogged about it, this would be a great analogy.

3eb7988a1663

10 hours ago

I don't have a choice to use SQL. If I want to speak to a database, there is precisely one language available. Like JavaScript on the web for so long.

qaq

10 hours ago

Thing is, it's good enough and extremely widely used. Given that, there is close to zero chance an alternative will take off.

lelanthran

10 hours ago

> Thing is it's good enough and extremely widely used.

The real problem is not that "it is good enough"; it's that SQL is still better than many of the newer proposals.

I mean, sure, if newcomer tech $BAR was slightly better than existing tech $FOO, then maybe $FOO might be eventually replaced. What we are seeing is that the newcomers are simply not better than the existing $FOO.

blef

10 hours ago

Feels dated when you see how the data ecosystem has lately converged on SQL for everything.

Even though SQL has flaws, maybe a lot of them, it has one upside: it's so easy to onboard people onto it. In the data ecosystem (warehousing etc.) that means we can get way more done faster than before and hire less technical people, which is great.

gavinray

10 hours ago

I work at (what was previously known as) Hasura.

Specifically: the connector bits that deal w/ translating Relational Algebra IR expressed as GraphQL nodes -> SQL engine-specific code.

The author's comments about lack of standardization and portability might not get across just how nightmarishly different SQL dialects are.

I might put together a list of some of the batshit-insane bugs we've run into, even between version upgrades of the same engine.

I really think folks would raise an eyebrow if they understood just how much variance exists between implementations in what might be considered "common" functionality, and the sorts of contortions you have to do to get proper shims/polyfills/emulations.

grebc

5 hours ago

It’s not a small decision to switch database vendors.

Worrying about whether your data query language works across multiple vendors' DBs is not a concern ever considered, imho.

zkmon

11 hours ago

So I guess the author is trying to help a decision maker to make a decision when faced with the question of whether to use SQL or not. But in reality that question would be settled by other factors and contextual reasons rather than by the arguments provided by the author.

For instance, analytics use cases favor SQL stores, as slicing and dicing is better done with row or column stores than with document databases.

Also, Postgres is getting more popular for a lot of use cases, so SQL is here to stay.

geysersam

6 hours ago

> So I guess the author is trying to help a decision maker to make a decision when faced with the question of whether to use SQL or not.

That's not my impression. A decision maker today should typically make the decision to use SQL. I'm pretty sure the author would agree with that.

I think the target audience is language designers and tool builders. The author is urging people to envision and build new better interfaces to interact with relational data.

janpio

13 hours ago

(2021)

j45

11 hours ago

Trying to understand how the year is relevant - still new to folks and still seems relevant.

bruce511

11 hours ago

It's helpful to include the year in the article header because:

Everything may have been true at the time of writing, but details may be obsolete. For example, this article refers to Neo4j. Knowing the article is four years old helps me understand that that comment may not be current.

The landscape can change quickly; the older an article, the more one takes that into account. Given that this article promotes an alternative technique, knowing the article is old allows me to wonder whether any of its suggestions have gelled, and if so to what success.

In this case, since SQL has been around since the 70s, it's not surprising that the complaints are not novel, and are all likely to be true for years to come. SQL has truly enormous inertia on its side though.

j45

9 hours ago

The year can help for sure - is the article's content still relevant and current in your mind?

On one hand, SQL is the most established relational DB language. Not sure what might drastically change about it.

Python and JavaScript are from the '90s and have evolved as languages in their own ways, like SQL and others.

I was asking about the year in an individual comment here to understand what significance the year bore relative to the content of the topic.

pphysch

10 hours ago

I think "SQL" is fine, whatever, I'm used to working with multiple different query and programming languages and dialects. That includes the freedom to define abstractions over SQL that meet my personal needs.

Standard SQL is not helpful, though. If that (failed) experiment was ended, database implementations would have even more freedom to explore superior syntax. Prescriptive language standards are a mistake.

throwaway894345

10 hours ago

I like that SQL is a standard, and it's mostly "fine". Sure, I have to constantly read the man pages because there are half a dozen different ways to do fundamentally similar things, and there are subtle differences between each vendor, and I keep running into silly errors like trailing commas. But it mostly works.

The stuff that is more painful is building any kind of interesting application on top of a database. For example, as far as I know, it's very hard to "type check" a query (to get the "type" returned by a given query). It's also hard to efficiently compose SQL. And as far as I know, there's no standard, bulletproof way to escape SQL ("named parameters" is fine when you need to escape parameters, but most of SQL isn't parameters). There's also no good way to express sum types (a "place" can be a "park" or a "restaurant" or a "library", and each of those have different associated data--I don't need a "has_cycling_trails" boolean column for a restaurant, but I do for a park). There are various workarounds, all deeply unsatisfying.
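
On the sum-type complaint, one of those unsatisfying workarounds can at least be made checkable: a single table with a discriminator column plus per-variant columns, guarded by a CHECK constraint so variant-specific columns are only set for their own variant. A sketch using Python's sqlite3 (the schema is invented; Postgres supports the same style of constraint):

```python
import sqlite3

# Single-table encoding of a sum type: "kind" is the discriminator, and the
# CHECK constraint rejects rows that mix columns from different variants.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE place (
        id INTEGER PRIMARY KEY,
        kind TEXT NOT NULL CHECK (kind IN ('park', 'restaurant')),
        name TEXT NOT NULL,
        has_cycling_trails INTEGER,   -- park only
        cuisine TEXT,                 -- restaurant only
        CHECK (
            (kind = 'park' AND cuisine IS NULL)
            OR (kind = 'restaurant' AND has_cycling_trails IS NULL)
        )
    )
""")
db.execute(
    "INSERT INTO place (kind, name, has_cycling_trails) "
    "VALUES ('park', 'Green Park', 1)"
)

# Mixing variants is rejected by the constraint:
try:
    db.execute(
        "INSERT INTO place (kind, name, cuisine) VALUES ('park', 'Oops', 'thai')"
    )
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

It works, but every query still sees all variants' columns, which is exactly the "deeply unsatisfying" part.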

GuinansEyebrows

11 hours ago

Question for people who actually write app and SQL code: besides convenience, what is the upside of working with JSON in SQL over having your app construct and parse JSON objects, but storing the data in a database using more primitive types? My relatively inexperienced brain tells me it's probably overly complex to store and manipulate JSON objects at the DB level.

taffer

10 hours ago

I use Postgres JSON functions to return nested results. The database itself contains no JSON, just a well-normalised data model. However, the queries return nested JSON in the format required by the application, e.g. for rendering an HTML template or returning JSON to the client in a single round trip. Check out "the old dogs can sort of learn new tricks" in this great article: https://www.scattered-thoughts.net/writing/sql-needed-struct...
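
The same single-round-trip idea can be sketched with SQLite's JSON functions from Python (Postgres would use json_build_object and json_agg instead; the schema here is invented, and a SQLite build with the JSON functions enabled is assumed - the default in recent versions):

```python
import json
import sqlite3

# Normalized tables; no JSON stored anywhere.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO author VALUES (1, 'ada');
    INSERT INTO post VALUES (10, 1, 'On Engines'), (11, 1, 'Notes');
""")

# One query returns the nested shape the application wants. The json() wrapper
# makes the subquery's result nest as a JSON array rather than a string.
(row,) = db.execute("""
    SELECT json_object(
        'name', a.name,
        'posts', json((SELECT json_group_array(p.title)
                       FROM post p WHERE p.author_id = a.id))
    )
    FROM author a
""").fetchone()

print(json.loads(row))
```

The application gets `{"name": ..., "posts": [...]}` directly, with no client-side stitching of a cartesian product.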

gavinray

10 hours ago

  > JSON functions to return nested results. The database itself contains no JSON; just a well-normalised data model. However, the queries return nested JSON in the format required by the application

Entirely valid use case, since the client application is likely going to parse some cartesian product of tabular relationship data into a "normalized" JSON array of objects anyway.

Generally, generating the JSON response for client consumption directly in the DB is faster.

gavinray

10 hours ago

  > what is the upside of working with JSON in SQL over having your app construct and parse JSON objects, but storing the data in a database using more primitive types?

You use map-like structures (JSON/HStore, etc.) for semi-structured user data that you CAN'T define or know a rigid schema for ahead of time.

Think use cases like allowing users to write configuration rules, or lists of custom tag <-> value pairs for whatever, things of that sort.
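
A small sketch of that kind of schemaless column, using sqlite3 from Python (names invented; Postgres would use jsonb or HStore, and a SQLite build with the JSON functions enabled is assumed):

```python
import json
import sqlite3

# A rigid schema for the known fields, plus a JSON column for user-defined
# tag/value pairs we can't know ahead of time.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE device (id INTEGER PRIMARY KEY, name TEXT, tags TEXT)")
db.execute(
    "INSERT INTO device (name, tags) VALUES (?, ?)",
    ("sensor-1", json.dumps({"site": "roof", "rack": "A3"})),
)

# Filter on a user-defined tag with json_extract, without ever having
# declared a "site" column.
row = db.execute(
    "SELECT name FROM device WHERE json_extract(tags, '$.site') = ?",
    ("roof",),
).fetchone()
print(row)
# ('sensor-1',)
```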

YZF

11 hours ago

I've occasionally stored JSON directly in a database. It really depends on what you do with this data. If you do need to query and manipulate the internals of that JSON object then you should extract that data into a proper schema. But sometimes e.g. this is just something the frontend uses and you never (or rarely) have to query the internals, i.e. you treat it as an opaque blob.

j45

11 hours ago

I'm agnostic between relational/non-relational.

SQL isn't for everything.

Neither is starting with NoSQL thinking it might be better, and then proceeding to spend way too many man-years making it a relational database, when learning a bit of SQL would have handled it fine.

jampekka

11 hours ago

The post is not against the relational model. It's against SQL.

> The relational model is great ... but SQL is the only widely-used implementation of the relational model ...

dalmo3

10 hours ago

404.

...Or is that the joke?

procaryote

11 hours ago

TL;DR:

* a list of things they don't like in SQL

* a list of traits they think a replacement should exhibit, by negating the first list

I was kind of hoping for some example of what this much better language should look like.

3eb7988a1663

10 hours ago

I like PRQL [0] - it fixes stupid warts in SQL.

A few top line items:

  - trailing commas not an error
  - queries can be read/written in linear order, starting with from, and ending on select
  - trivial intermediary keywords (eg you define month_total, and then can re-use month_total in a following calculation, no need to duplicate the calculation logic)
  - no need for a separate `having` keyword when `where` can just be a filter on a group

There is nothing too ground-breaking about it. Just streamlines some logic into a more holistic experience.

[0] https://prql-lang.org/

mulmen

6 hours ago

> trivial intermediary keywords (eg you define month_total, and then can re-use month_total in a following calculation, no need to duplicate the calculation logic)

Postgres already has this.

Spivak

10 hours ago

I think the much better language would be the "no language" database. Throw portability to the wind and just have the client ship the query plan directly. The frontend to the database is however you want to expose it in your language of choice. I don't think there's any hope of getting disparate DB vendors to agree on a compatible frontend language. It seems easier to externalize it.

The closest existing database to this ideal is probably FoundationDB although it also externalizes the query planner, which I don't necessarily consider a downside.

cess11

11 hours ago

Maybe I'm holding TFA wrong but to me it seems like they're hinting at wanting a Prolog-as-database that could also become widely used, unlike actual Prolog.

It's not hyper-performant and mega web scale, but the object database and Prolog-like query language that come with Picolisp are quite fun and sometimes rather useful, and have helped me think differently about how to model things in the default SQL database engines.