We're on the fourth major revision of Malloy, a data language aiming to replace analytical SQL. Making something that is simple, complete and understandable is really hard. 'Experimental' is no longer in our description.
In software we express our ideas through tools. In data, those tools think in rectangles. From spreadsheets to the data warehouses, to do any analytical calculation, you must first go through a rectangle. Forcing data through a rectangle shapes the way we solve problems (for example, dimensional fact tables, OLAP Cubes). But really, most data isn’t rectangular
Curious about the Malloy Language? Want to try it out with no install? Check out our new Malloy Fiddle. Malloy Fiddle runs entirely in the browser using #DuckDB and WASM. Malloy Fiddle queries semantic data models. See results as tables, JSON or SQL
> Malloy writes SQL you can't (I'm pretty sure of it).
Here you go. Malloy looks like a fine tool but it's an unnecessary mental burden for people who understand SQL and I think the below is more maintainable long-term.
with
aggregates as (
select
state,
name,
sum(sum(number)) over (partition by state, name) as state_name_births,
sum(sum(number)) over (partition by state) as state_births,
sum(sum(number)) over (partition by name) as name_births,
sum(sum(number)) over () as total_births
from bigquery-public-data.usa_names.usa_1910_current
group by state, name
qualify state_name_births > 1000
),
scoring as (
select
state,
name,
(state_name_births / state_births) / (name_births / total_births) as popularity_score,
rank() over (partition by state order by (state_name_births / state_births) / (name_births / total_births) desc) as rnk
from aggregates
qualify rnk <= 20
)
select
state,
string_agg(name order by rnk) as names
from scoring
group by state
order by state
I wrote a similar one (forgot about partitioning because I need it so rarely, adapted yours for SQL Server to compare), only 1/3 as fast compared to your solution (1.672 seconds vs 0.545), but 1/4th the reads (82,280 vs 292,066) :D
And yeah, I agree. That wasn’t actually that hard and is far more readable if you know SQL.
For the sake of completeness, here’s mine:
WITH CTE AS (SELECT main.Name,
main.State,
main.Births,
(CAST(main.Births AS float) / sb.births_in_state) /
(CAST(nb.births_of_name AS float) / total.all_births) AS popularity,
ROW_NUMBER() OVER (PARTITION BY State
ORDER BY (CAST(main.Births AS float) / sb.births_in_state) /
(CAST(nb.births_of_name AS float) / total.all_births) DESC) AS RankPerState
FROM BirthCount main
CROSS APPLY (SELECT SUM(Births) as all_births FROM BirthCount) total
CROSS APPLY (SELECT SUM(Births) as births_in_state FROM BirthCount WHERE main.State = State) sb
CROSS APPLY (SELECT SUM(Births) as births_of_name FROM BirthCount WHERE main.Name = Name) nb
WHERE main.Births > 1000
GROUP BY main.Name, main.State, main.Births, total.all_births, sb.births_in_state, nb.births_of_name)
SELECT State, STRING_AGG(Name + ' (' + CAST(ROUND(popularity, 3) AS varchar(50)) + ')', ', ')
FROM CTE
WHERE RankPerState <= 10
GROUP BY State
ORDER BY State
edit: Forgot to show the small view I created to make it more readable:
CREATE VIEW BirthCount AS
(
SELECT Name, State, SUM(Number) AS Births
FROM Names
GROUP BY Name, State
)
Your Twitter thread shows the SQL that is equivalent, so it clearly can be done. Just because Malloy produces SQL that is impossible to read or write doesn't mean that the problem is only solvable with impossible-to-read SQL.
That would be like saying that Dreamweaver lets you do things that HTML+CSS don't because "look at how impossible to read the HTML output is!"
Give it a shot. Malloy writes the query in the twitter thread with a single database read. It computes the query, simultaneously at 4 levels of dimensionalization but only touches the disk once. No joins.
My first thought was that the generated SQL looks a lot like Looker, and indeed this is from them.
Nothing in that query is remotely novel or difficult. The style isn't the most readable, but any experienced data engineer has read and written far more heinous queries.
This is obvious because someone had to write the generator for these queries, and so it's not surprising that the techniques that appear in the generated queries are recognizable.
"Malloy writes queries you wouldn't enjoy writing" is more accurate, but there's no evidence it's doing anything a human can't.
SQL + recursive queries (an extension supported by everyone since the late 90s) is Turing complete. Thus every arrangement and selection of data is reachable by a query. Therefore there is nothing Malloy can do that SQL can't (I'm absolutely sure of it).
In any language there are going to be certain possible arrangements which take an exponential number of words to reach. By simple counting arguments, some things are always incompressible. I can contort myself if I so desired to find something easily reachable in Brainfuck that would be tedious to reach in JS.
The only "way out" is for one language to force rote memorization of sophisticated abstractions as part of its vocabulary. Eg you can make a language that is superficially more capable than Brainfuck because it contains all of Brainfuck as its vocabulary + a few extra words that encapsulate everything you need to implement an extensible web server. But that's not really an improvement in expressive ease, you've just swept the difficulty of constructing those expressions into learning the language in the first place, then showed off how easy it is to express the stuff you just defined words to express.
You don't get more utility out of the marginal extra vocabulary than the work you put into defining the marginal extra vocabulary.
I think you’ve completely written off human limits to produce and maintain correct code, while being technically correct.
People swear when they have to extract a symbol from an expression because that pollutes a scope and reduces readability (see “assembly” as an extreme example of it). You can’t expect them to be cool with a ladder of queries to do simplest things. They choose languages for comfort much more than for technical yadda yadda. SQL is only comfortable in comparison to other '80s tech.
I learned SQL in the late 00s. I like SQL. It's not Stockholm Syndrome either, I don't think. SQL isn't stylish, and it isn't beautiful, but it's pretty clear in spite of that. More than that, SQL is a lingua franca for databases, and it's very powerful -- this is a pretty difficult thing to beat, and the only thing that I've seen that can do it is LINQ, which isn't a language so much as an API for constructing query ASTs.
Human language and reasoning ability also runs into the problem that certain ideas really do take a whole book to convey whereas others fit into fifty words (like this post). We just don't notice the limitation on our expressiveness for the same reason fish don't notice they live in water.
I think OP meant "can't" as in "probably wouldn't be able to come up with the queries", not that it's literally impossible. Malloy compiles to SQL, so it's by definition impossible for it to express something SQL cannot.
Showcase the typical way you'd write a complicated SQL query and compare it against Malloy. It's like looking at EF4-6 output and thinking it's impressive and doing superhuman tasks because the generated queries were suboptimal and complicated.
Seems like all the magic is in this mysterious "all" thing, which is perfect for this particular problem... but I can't tell if it's part of a general coherent approach or just a handy gadget that makes Malloy look good on this problem.
1) Nesting builds nested results (like GraphQL). This is particularly hard to do in SQL but is allows very large complex data sets to be returned in a single query.
You can pre-define calculations with 'measure:' or decalre them explicitly in a query.
> where's the windowing
It's missing with a bunch of other things (like union for example). Its coming of course. The goal is that everything represent able in SQL is represent able in Malloy.
> heck does "order by: 2" mean
Same thing as it does in SQL. We try and have reasonable defaults.
The goal here is to be able to create data models that are re-usable and compose-able and verifiable. Yeah, there is a learning curve as there is with anything powerful. There are many things expressible in Malloy that cannot be easily expressed in SQL.
Is it actually equivalent? I know the SQL is compiled from Malloy but I imagine the handwritten version, with comments, would look better in comparison.
I've certainly done plenty of complex sql querying in my time, but I don't think making nice-looking code for solving wordle bears much relation to normal analytical sql workflows.