Hacker Newsnew | past | comments | ask | show | jobs | submit | ltabb's commentslogin

We're on the fourth major revision of Malloy, a data language aiming to replace analytical SQL. Making something that is simple, complete and understandable is really hard. 'Experimental' is no longer in our description.


In software we express our ideas through tools. In data, those tools think in rectangles. From spreadsheets to the data warehouses, to do any analytical calculation, you must first go through a rectangle. Forcing data through a rectangle shapes the way we solve problems (for example, dimensional fact tables, OLAP Cubes). But really, most data isn’t rectangular


Curious about the Malloy Language? Want to try it out with no install? Check out our new Malloy Fiddle. Malloy Fiddle runs entirely in the browser using #DuckDB and WASM. Malloy Fiddle queries semantic data models. See results as tables, JSON or SQL


If you would like to play with Malloy... You can Fiddle using just a web browser. The Malloy Fiddle uses DuckDB and WASM.

  https://twitter.com/lloydtabb/status/1567671348306264064


Try writing this SQL. Malloy writes SQL you can't (I'm pretty sure of it).

  https://twitter.com/lloydtabb/status/1556287859601985536


> Malloy writes SQL you can't (I'm pretty sure of it).

Here you go. Malloy looks like a fine tool but it's an unnecessary mental burden for people who understand SQL and I think the below is more maintainable long-term.

  with
  aggregates as (
    select
        state,
        name,
        sum(sum(number)) over (partition by state, name) as state_name_births,
        sum(sum(number)) over (partition by state) as state_births,
        sum(sum(number)) over (partition by name) as name_births,
        sum(sum(number)) over () as total_births
    from bigquery-public-data.usa_names.usa_1910_current
    group by state, name
    qualify state_name_births > 1000
  ),
  
  scoring as (
    select
      state,
      name,
      (state_name_births / state_births) / (name_births / total_births) as popularity_score,
      rank() over (partition by state order by (state_name_births / state_births) / (name_births / total_births) desc) as rnk
    from aggregates
    qualify rnk <= 20
  )
  
  select
    state,
    string_agg(name order by rnk) as names
  from scoring
  group by state
  order by state


I wrote a similar one (forgot about partitioning because I need it so rarely, adapted yours for SQL Server to compare), only 1/3 as fast compared to your solution (1.672 seconds vs 0.545), but 1/4th the reads (82,280 vs 292,066) :D

And yeah, I agree. That wasn’t actually that hard and is far more readable if you know SQL.

For the sake of completeness, here’s mine:

    WITH CTE AS (SELECT main.Name,
                        main.State,
                        main.Births,
                        (CAST(main.Births AS float) / sb.births_in_state) /
                        (CAST(nb.births_of_name AS float) / total.all_births) AS popularity,
                     ROW_NUMBER() OVER (PARTITION BY State
               ORDER BY (CAST(main.Births AS float) / sb.births_in_state) /
                        (CAST(nb.births_of_name AS float) / total.all_births) DESC) AS RankPerState
                 FROM BirthCount main
                          CROSS APPLY (SELECT SUM(Births) as all_births FROM BirthCount) total
                          CROSS APPLY (SELECT SUM(Births) as births_in_state FROM BirthCount WHERE main.State = State) sb
                          CROSS APPLY (SELECT SUM(Births) as births_of_name FROM BirthCount WHERE main.Name = Name) nb
                 WHERE main.Births > 1000
                 GROUP BY main.Name, main.State, main.Births, total.all_births, sb.births_in_state, nb.births_of_name)
    
    SELECT State, STRING_AGG(Name + ' (' + CAST(ROUND(popularity, 3) AS varchar(50)) + ')', ', ')
    FROM CTE
    WHERE RankPerState <= 10
    GROUP BY State
    ORDER BY State

edit: Forgot to show the small view I created to make it more readable:

    CREATE VIEW BirthCount AS
    (
    SELECT Name, State, SUM(Number) AS Births
    FROM Names
    GROUP BY Name, State
    )


Your Twitter thread shows the SQL that is equivalent, so it clearly can be done. Just because Malloy produces SQL that is impossible to read or write doesn't mean that the problem is only solvable with impossible-to-read SQL.

That would be like saying that Dreamweaver lets you do things that HTML+CSS don't because "look at how impossible to read the HTML output is!"


Give it a shot. Malloy writes the query in the twitter thread with a single database read. It computes the query, simultaneously at 4 levels of dimensionalization but only touches the disk once. No joins.


I'm unclear: are you describing the compilation process or the way the database's query planner handles the output query?


My first thought was that the generated SQL looks a lot like Looker, and indeed this is from them.

Nothing in that query is remotely novel or difficult. The style isn't the most readable, but any experienced data engineer has read and written far more heinous queries.

This is obvious because someone had to write the generator for these queries, and so it's not surprising that the techniques that appear in the generated queries are recognizable.

"Malloy writes queries you wouldn't enjoy writing" is more accurate, but there's no evidence it's doing anything a human can't.


Malloy writes SQL that I wouldn't write, that's for sure. Nothing there is hard to do, let alone impossible, with commonly used SQL constructs.


SQL + recursive queries (an extension supported by everyone since the late 90s) is Turing complete. Thus every arrangement and selection of data is reachable by a query. Therefore there is nothing Malloy can do that SQL can't (I'm absolutely sure of it).


There is also nothing Java can do that Brainfuck cannot. It is very easy to implement an extensible web server in BF, you just need to F your B.


In any language there are going to be certain possible arrangements which take an exponential number of words to reach. By simple counting arguments, some things are always incompressible. I can contort myself if I so desired to find something easily reachable in Brainfuck that would be tedious to reach in JS.

The only "way out" is for one language to force rote memorization of sophisticated abstractions as part of its vocabulary. Eg you can make a language that is superficially more capable than Brainfuck because it contains all of Brainfuck as its vocabulary + a few extra words that encapsulate everything you need to implement an extensible web server. But that's not really an improvement in expressive ease, you've just swept the difficulty of constructing those expressions into learning the language in the first place, then showed off how easy it is to express the stuff you just defined words to express.

You don't get more utility out of the marginal extra vocabulary than the work you put into defining the marginal extra vocabulary.


I think you’ve completely written off human limits to produce and maintain correct code, while being technically correct.

People swear when they have to extract a symbol from an expression because that pollutes a scope and reduces readability (see “assembly” as an extreme example of it). You can’t expect them to be cool with a ladder of queries to do simplest things. They choose languages for comfort much more than for technical yadda yadda. SQL is only comfortable in comparison to other '80s tech.


I learned SQL in the late 00s. I like SQL. It's not Stockholm Syndrome either, I don't think. SQL isn't stylish, and it isn't beautiful, but it's pretty clear in spite of that. More than that, SQL is a lingua franca for databases, and it's very powerful -- this is a pretty difficult thing to beat, and the only thing that I've seen that can do it is LINQ, which isn't a language so much as an API for constructing query ASTs.


Human language and reasoning ability also runs into the problem that certain ideas really do take a whole book to convey whereas others fit into fifty words (like this post). We just don't notice the limitation on our expressiveness for the same reason fish don't notice they live in water.


I think OP meant "can't" as in "probably wouldn't be able to come up with the queries", not that it's literally impossible. Malloy compiles to SQL, so it's by definition impossible for it to express something SQL cannot.


By that definition, in that case, OP "can't" express themselves in plain English.


Pretty sure OP meant emphasis on _you_ can't. As in, things a human can't do.


In practice, it may be much easier to write a malloy query than the equivalent sql query.


Or it may not be.

Showcase the typical way you'd write a complicated SQL query and compare it against Malloy. It's like looking at EF4-6 output and thinking it's impressive and doing superhuman tasks because the generated queries were suboptimal and complicated.


Seems like all the magic is in this mysterious "all" thing, which is perfect for this particular problem... but I can't tell if it's part of a general coherent approach or just a handy gadget that makes Malloy look good on this problem.


We're hoping its more of a 'typescript' but yeah :)


Sorry, the link to the documentation is buried.

https://looker-open-source.github.io/malloy/documentation/la...


Scroll to the bottom of the page. It shows the returned JSON

https://looker-open-source.github.io/malloy/documentation/pa...

Here is the SQL we generate.

https://gist.github.com/lloydtabb/8c144d2dac978dda9bf3ec4d6b...


> What's nest actually doing?

1) Nesting builds nested results (like GraphQL). This is particularly hard to do in SQL but is allows very large complex data sets to be returned in a single query.

https://looker-open-source.github.io/malloy/documentation/la...

For example the dashboard on the page below is a single SQL query:

https://looker-open-source.github.io/malloy/documentation/ex...

> How does aggregate know what function to use?

You can pre-define calculations with 'measure:' or decalre them explicitly in a query.

> where's the windowing

It's missing with a bunch of other things (like union for example). Its coming of course. The goal is that everything represent able in SQL is represent able in Malloy.

> heck does "order by: 2" mean

Same thing as it does in SQL. We try and have reasonable defaults.

https://looker-open-source.github.io/malloy/documentation/la...

> "How is this actually easier?"

The goal here is to be able to create data models that are re-usable and compose-able and verifiable. Yeah, there is a learning curve as there is with anything powerful. There are many things expressible in Malloy that cannot be easily expressed in SQL.


Thanks, that is reasonable feedback. We'll work on updating the documentation.

Here is an example of Malloy solving today's worlde in 50 lines of Malloy

https://looker-open-source.github.io/malloy/documentation/ex...

And here is the equivalent SQL.

https://gist.github.com/lloydtabb/32f46e7ecbb2da1a443d1adbe9...


Is it actually equivalent? I know the SQL is compiled from Malloy but I imagine the handwritten version, with comments, would look better in comparison.


I've certainly done plenty of complex sql querying in my time, but I don't think making nice-looking code for solving wordle bears much relation to normal analytical sql workflows.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: