The article talks about how SQL lacks composability. I would like to know everyo...

bawolff · on Oct 9, 2020

I don't really feel like composability/modularity is all that important in SQL.

I want modularity in programs because they are large, and without proper abstraction, impossible to manage.

SQL queries are generally really short. Rarely more than a few lines (might be different for people doing ad-hoc analysis instead of making a db backed application). I don't need modularity for a program that is only a few lines long.

The lack of really natural integration into modern day programming languages (and data model mismatches) is a much bigger issue imo

nendroid · on Oct 9, 2020

>The lack of really natural integration into modern day programming languages (and data model mismatches) is a much bigger issue imo

The SQL data types consist of a few primitives like ints strings and chars placed in higher order data types that are tables. These types are easily isomorphic to data structures and primitive types in traditional programming languages. There is zero mismatch here, in fact the data structures in application programming languages tend to be much richer than SQL.

See orms for reference. ORMS have their own set of problems but data mismatches are not part of those problems. If the programming language enables the creation of sum types like rust or haskell than there can never be a mismatch as these languages can produce virtually any type.

>SQL queries are generally really short. Rarely more than a few lines (might be different for people doing ad-hoc analysis instead of making a db backed application). I don't need modularity for a program that is only a few lines long.

For complex applications this is not true. In general the classic model for web development is to place as much logic as possible into the SQL query and as little logic as possible into your heavy Python web app. The web app is suppose to serve as something that routes IO the bulk of your code/logic and heavy lifting should be shifted to the database. Simple apps can avoid this but in general complex apps cannot.

Case in point do you put your where clause in the application program than download the entire table? Or do you throw as much logic as possible into the query so the database outputs a result as close as possible to what you need? The later statement is the right answer.

>I don't really feel like composability/modularity is all that important in SQL.

You're not completely wrong. The bigger issue is SQL optimization. Because Databases form the backbone of computation for a standard web app SQL provides a layer of indirection that makes optimization harder. For C++ optimization is built into the syntax itself, you make choices while coding to optimize things. For SQL you open up the black box and look into the query planner to see what your High level code is compiling too. SQL is a really bad interface for doing optimizations but that's a topic for another day. The topic of this post is modularity and he's not wrong... SQL is not a composable language and there's no performance loss in making it more composeable.

bawolff · on Oct 10, 2020

> in application programming languages tend to be much richer than SQL.

Hence why i say why there is a data model mismatch.

E.g. i wouldn't say that assembly and haskell have a compatible data model just because assembly is a sequence of bytes, and haskell is a superset of that.

Not that a data model is solely about the types involved.

> See orms for reference. ORMS have their own set of problems but data mismatches are not part of those problems.

ORMs are notorious for being a leaky abstraction. Largely because the object and relational data model dont entirely match.

> In general the classic model for web development is to place as much logic as possible into the SQL query and as little logic as possible into your heavy Python web app.

When was the last time you wrote a 500 line sql query? A thousand line? My point is that sql queries are short enough, that further abstraction is not really missed. That doesn't mean you should put 0 logic in your query.

> Case in point do you put your where clause in the application program than download the entire table? Or do you throw as much logic as possible into the query so the database outputs a result as close as possible to what you need? The later statement is the right answer.

This is a strawman.

> The bigger issue is SQL optimization. Because Databases form the backbone of computation for a standard web app SQL provides a layer of indirection that makes optimization harder

I disagree. At scale you have data that has mixed cardinality. The indirection allows the db to chose the best algorithm given the size of underlying data at runtime. Sometimes that doesn't work properly, but the vast majority of time it is a significant benefit. Its sort of like how sometimes compilers dont work properly and you need to hand optimize, but in practise that is rare and you wouldn't throw out the compiler because the other 95% of time its better and lower effort than if you had to always do it by hand.

nendroid · on Oct 10, 2020

>Hence why i say why there is a data model mismatch.

There is no data mismatch if the data model is richer, than it can cover it. Unless you're saying SQL is not rich enough to cover the PL.

Well you can implement a database that does that, it just wouldn't be table based data storage anymore. Postgresql supports JSON and enums if you want it but the minute you use these types things become less predictable in terms of performance.

>ORMs are notorious for being a leaky abstraction. Largely because the object and relational data model dont entirely match.

Wrong ORMs are notoriously leaky because the query translation is extra indirection. You need to compile to SQL and the SQL needs to compile to a query plan. This is why ORMS are bad.

As for the data types, Objects easily mimic types in SQL. You basically rarely ever encounter problems with incompatible type systems.

>When was the last time you wrote a 500 line sql query? A thousand line? My point is that sql queries are short enough, that further abstraction is not really missed. That doesn't mean you should put 0 logic in your query.

When did you last write a 500 line function? When did you last write a 100 line function? You shouldn't be doing that if you are. Case in point, the total lines of SQL written in any source code usually well exceed over 500 and that is a case for composability. T

This is no different than regular source code. If you're writing 500 line functions in your source code than you're not even taking advantage of modularity in programming languages outside of SQL so of course for you it doesn't matter. Because composition doesn't matter for you period.

>This is a strawman.

No it's a statement written in the english language. A strawman is something that doesn't EXIST, because men are made out of flesh not straw.

It's sort of similar to the non-existence of what was suppose to be written in response to my statement: an actual counterpoint or an argument. But then how can you write such a thing if it doesn't exist?

>I disagree. At scale you have data that has mixed cardinality. The indirection allows the db to chose the best algorithm given the size of underlying data at runtime. Sometimes that doesn't work properly, but the vast majority of time it is a significant benefit. Its sort of like how sometimes compilers dont work properly and you need to hand optimize, but in practise that is rare and you wouldn't throw out the compiler because the other 95% of time its better and lower effort than if you had to always do it by hand.

Sure but when it comes time to hand optimize SQL is extremely bad and you need to hand optimize SQL all the time. Not necessarily most of the time, but enough times that it's a real problem. That is the point and that is what I said. It's basically the biggest headache with SQL. If you've ever done analytics you'd know this is a huge problem.

It's better to have a language that allows explicit decorators that allow the programmer to choose optimization procedures when needed. Instead in SQL often optimizations come in the form of hacks. Case in point: SELECT * FROM A is worse then SELECT A.column1 FROM A. The later optimization comes in the form of a language hack and not explicit syntax.

Not saying the alternative mentioned in the article would solve this problem nor am I saying a solution exists... but if there's any big problem with SQL today it's hand optimization for sure.

magicalhippo · on Oct 9, 2020

I've just learned SQL by doing. You're not including extensions here like stored procs and such right?

I have a lot of "where my_func(x, y, z) = 1" type where clauses, so seem that would do what you say, no?

nendroid · on Oct 9, 2020

That's a hack. The where clause itself is not decomposable. But if you made every where clause a single function taking in multiple parameters than yes that function is composable but you're taking extra steps to do a non-traditional coding style.

Not even sure if stored procedures are part of the sql standard.... these seem to me to be just specific syntax additions added on by specific databases.

I mean it works so why not. I could code all my SQL this way.

cafard · on Oct 9, 2020

Is this really a SQL feature or a relational feature? The essence of the relational model is that names are known up front: an attribute of one relation is not an attribute of another.

dragonwriter · on Oct 9, 2020

> Is this really a SQL feature or a relational feature?

SQL.

There is no reason you couldn't have a composable implementaiton of relational logic.

> The essence of the relational model is that names are known up front

That every relation consist of tuples each member of which consists of a name, a type, and a value is part of the model, sure. That doesn't impose restrictions on composability.

> an attribute of one relation is not an attribute of another.

An attribute of one relation may well be an attribute of another.

There's no reason you couldn't have a relational language that let you store and reuse clauses.

nendroid · on Oct 9, 2020

"attributes" are "generic" properties of relations.

So for example the union operator can take two different relations that have the same columns but different rows and create a new relation.

In this binary operation that takes two different relations.... the attributes of one relation (the columns) is indeed the attributes of the other relation.

See here: https://www.geeksforgeeks.org/basic-operators-in-relational-...

Look for "union" in the link above.

>Union on two relations R1 and R2 can only be computed if R1 and R2 are union compatible (These two relation should have same number of attributes and corresponding attributes in two relations have same domain)

It's the same thing for unary operations like select. A select expression can operate on different relations providing that the relations have the relevant attributes.

zem · on Oct 10, 2020

mixins are a good first step towards composability in OOP (also doesn't javascript have rebindable methods? i'm not very familiar with the language, but i think it makes it pretty easy to bind a function to a new object as long as it contains the same members that the function refers to)