Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

The article talks about how SQL lacks composability. I would like to know everyones thoughts about this.

This is a huge issue with programming in general not exclusive to SQL. Everyone would like to build programs that are modular and reusable but programming paradigms have been traveling in directions that prevent this from happening. Many people turn to design patterns or microservices to try to deal with this organizational issue but they fail to see that the lower level programming paradigm itself is the precursor to the problem.

In SQL the problem occurs in the statement itself. The WHERE clause or the SELECT clause cannot be reused anywhere else. I can't modularize a where clause and put it in another SQL statement. I have to rewrite the entire clause to reuse it.

In OOP the same issue occurs. In OOP your class tends to contain methods that are not combinators, or in other words methods that modify a free variable. Due to this like the SQL expression, Objects cannot be decomposed either. I cannot reuse a setter in another class or anywhere else outside of the context of the free variable it modifies.

In both cases there comes a time in the future of an application where programmers realize that similar logic could be reused but structural problems are preventing the reuse from happening so they have to implement a hack to get around it.

The issue is that everyone is unaware of this trend at a low level. They are unaware that SQL lacks composability just like how they are unaware that OOP lacks composability as well. But they are aware of this issue at a higher level and they tend to call it "technical debt" or some high level design problem.

Case in point: https://news.ycombinator.com/item?id=24732789

Most commenters above talk about minor syntactical issues and fail to address what is not only IMO the main issue, but the main issue that the article itself is addressing. Likely because they're all unaware of the true nature of the composability issue and just didn't completely understand what the article was saying.

Also note that when I talk about composition in OOP I am not talking about "object composition." These are completely different usages of the word.



I don't really feel like composability/modularity is all that important in SQL.

I want modularity in programs because they are large, and without proper abstraction, impossible to manage.

SQL queries are generally really short. Rarely more than a few lines (might be different for people doing ad-hoc analysis instead of making a db backed application). I don't need modularity for a program that is only a few lines long.

The lack of really natural integration into modern day programming languages (and data model mismatches) is a much bigger issue imo


>The lack of really natural integration into modern day programming languages (and data model mismatches) is a much bigger issue imo

The SQL data types consist of a few primitives like ints strings and chars placed in higher order data types that are tables. These types are easily isomorphic to data structures and primitive types in traditional programming languages. There is zero mismatch here, in fact the data structures in application programming languages tend to be much richer than SQL.

See orms for reference. ORMS have their own set of problems but data mismatches are not part of those problems. If the programming language enables the creation of sum types like rust or haskell than there can never be a mismatch as these languages can produce virtually any type.

>SQL queries are generally really short. Rarely more than a few lines (might be different for people doing ad-hoc analysis instead of making a db backed application). I don't need modularity for a program that is only a few lines long.

For complex applications this is not true. In general the classic model for web development is to place as much logic as possible into the SQL query and as little logic as possible into your heavy Python web app. The web app is suppose to serve as something that routes IO the bulk of your code/logic and heavy lifting should be shifted to the database. Simple apps can avoid this but in general complex apps cannot.

Case in point do you put your where clause in the application program than download the entire table? Or do you throw as much logic as possible into the query so the database outputs a result as close as possible to what you need? The later statement is the right answer.

>I don't really feel like composability/modularity is all that important in SQL.

You're not completely wrong. The bigger issue is SQL optimization. Because Databases form the backbone of computation for a standard web app SQL provides a layer of indirection that makes optimization harder. For C++ optimization is built into the syntax itself, you make choices while coding to optimize things. For SQL you open up the black box and look into the query planner to see what your High level code is compiling too. SQL is a really bad interface for doing optimizations but that's a topic for another day. The topic of this post is modularity and he's not wrong... SQL is not a composable language and there's no performance loss in making it more composeable.


> in application programming languages tend to be much richer than SQL.

Hence why i say why there is a data model mismatch.

E.g. i wouldn't say that assembly and haskell have a compatible data model just because assembly is a sequence of bytes, and haskell is a superset of that.

Not that a data model is solely about the types involved.

> See orms for reference. ORMS have their own set of problems but data mismatches are not part of those problems.

ORMs are notorious for being a leaky abstraction. Largely because the object and relational data model dont entirely match.

> In general the classic model for web development is to place as much logic as possible into the SQL query and as little logic as possible into your heavy Python web app.

When was the last time you wrote a 500 line sql query? A thousand line? My point is that sql queries are short enough, that further abstraction is not really missed. That doesn't mean you should put 0 logic in your query.

> Case in point do you put your where clause in the application program than download the entire table? Or do you throw as much logic as possible into the query so the database outputs a result as close as possible to what you need? The later statement is the right answer.

This is a strawman.

> The bigger issue is SQL optimization. Because Databases form the backbone of computation for a standard web app SQL provides a layer of indirection that makes optimization harder

I disagree. At scale you have data that has mixed cardinality. The indirection allows the db to chose the best algorithm given the size of underlying data at runtime. Sometimes that doesn't work properly, but the vast majority of time it is a significant benefit. Its sort of like how sometimes compilers dont work properly and you need to hand optimize, but in practise that is rare and you wouldn't throw out the compiler because the other 95% of time its better and lower effort than if you had to always do it by hand.


>Hence why i say why there is a data model mismatch.

There is no data mismatch if the data model is richer, than it can cover it. Unless you're saying SQL is not rich enough to cover the PL.

Well you can implement a database that does that, it just wouldn't be table based data storage anymore. Postgresql supports JSON and enums if you want it but the minute you use these types things become less predictable in terms of performance.

>ORMs are notorious for being a leaky abstraction. Largely because the object and relational data model dont entirely match.

Wrong ORMs are notoriously leaky because the query translation is extra indirection. You need to compile to SQL and the SQL needs to compile to a query plan. This is why ORMS are bad.

As for the data types, Objects easily mimic types in SQL. You basically rarely ever encounter problems with incompatible type systems.

>When was the last time you wrote a 500 line sql query? A thousand line? My point is that sql queries are short enough, that further abstraction is not really missed. That doesn't mean you should put 0 logic in your query.

When did you last write a 500 line function? When did you last write a 100 line function? You shouldn't be doing that if you are. Case in point, the total lines of SQL written in any source code usually well exceed over 500 and that is a case for composability. T

This is no different than regular source code. If you're writing 500 line functions in your source code than you're not even taking advantage of modularity in programming languages outside of SQL so of course for you it doesn't matter. Because composition doesn't matter for you period.

>This is a strawman.

No it's a statement written in the english language. A strawman is something that doesn't EXIST, because men are made out of flesh not straw.

It's sort of similar to the non-existence of what was suppose to be written in response to my statement: an actual counterpoint or an argument. But then how can you write such a thing if it doesn't exist?

>I disagree. At scale you have data that has mixed cardinality. The indirection allows the db to chose the best algorithm given the size of underlying data at runtime. Sometimes that doesn't work properly, but the vast majority of time it is a significant benefit. Its sort of like how sometimes compilers dont work properly and you need to hand optimize, but in practise that is rare and you wouldn't throw out the compiler because the other 95% of time its better and lower effort than if you had to always do it by hand.

Sure but when it comes time to hand optimize SQL is extremely bad and you need to hand optimize SQL all the time. Not necessarily most of the time, but enough times that it's a real problem. That is the point and that is what I said. It's basically the biggest headache with SQL. If you've ever done analytics you'd know this is a huge problem.

It's better to have a language that allows explicit decorators that allow the programmer to choose optimization procedures when needed. Instead in SQL often optimizations come in the form of hacks. Case in point: SELECT * FROM A is worse then SELECT A.column1 FROM A. The later optimization comes in the form of a language hack and not explicit syntax.

Not saying the alternative mentioned in the article would solve this problem nor am I saying a solution exists... but if there's any big problem with SQL today it's hand optimization for sure.


I've just learned SQL by doing. You're not including extensions here like stored procs and such right?

I have a lot of "where my_func(x, y, z) = 1" type where clauses, so seem that would do what you say, no?


That's a hack. The where clause itself is not decomposable. But if you made every where clause a single function taking in multiple parameters than yes that function is composable but you're taking extra steps to do a non-traditional coding style.

Not even sure if stored procedures are part of the sql standard.... these seem to me to be just specific syntax additions added on by specific databases.

I mean it works so why not. I could code all my SQL this way.


Is this really a SQL feature or a relational feature? The essence of the relational model is that names are known up front: an attribute of one relation is not an attribute of another.


> Is this really a SQL feature or a relational feature?

SQL.

There is no reason you couldn't have a composable implementaiton of relational logic.

> The essence of the relational model is that names are known up front

That every relation consist of tuples each member of which consists of a name, a type, and a value is part of the model, sure. That doesn't impose restrictions on composability.

> an attribute of one relation is not an attribute of another.

An attribute of one relation may well be an attribute of another.

There's no reason you couldn't have a relational language that let you store and reuse clauses.


"attributes" are "generic" properties of relations.

So for example the union operator can take two different relations that have the same columns but different rows and create a new relation.

In this binary operation that takes two different relations.... the attributes of one relation (the columns) is indeed the attributes of the other relation.

See here: https://www.geeksforgeeks.org/basic-operators-in-relational-...

Look for "union" in the link above.

>Union on two relations R1 and R2 can only be computed if R1 and R2 are union compatible (These two relation should have same number of attributes and corresponding attributes in two relations have same domain)

It's the same thing for unary operations like select. A select expression can operate on different relations providing that the relations have the relevant attributes.


mixins are a good first step towards composability in OOP (also doesn't javascript have rebindable methods? i'm not very familiar with the language, but i think it makes it pretty easy to bind a function to a new object as long as it contains the same members that the function refers to)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: