Hacker News
Blaze: Fast query execution engine for Apache Spark (github.com/blaze-init)
92 points by sbt567 on Oct 20, 2023 | 13 comments


Are there any comparisons with Databricks Spark? When we started experimenting with Spark, we initially used AWS EMR. But the same code ran much faster on Databricks than on EMR, which led us to ditch EMR.


Databricks has kept its Photon[1][2] query engine for Spark closed source thus far. Unless EMR has made equivalent changes to the Spark runtime it uses, Databricks should be much faster. Photon brings to Spark the standard vectorized execution techniques that SQL data warehouses have used for many years.

[1] https://docs.databricks.com/en/clusters/photon.html [2] https://dl.acm.org/doi/10.1145/3514221.3526054


I am a bit hazy on the exact details of how we did it since it's been some time, but we definitely did not use Photon, as it was too expensive.

One of the issues was that we had started experimenting with Delta tables, and EMR was horrible at leveraging them.


It would be great to have a comparison to Dataframes and RDDs as well.


DataFrames are just SQL. There will be no performance difference.

RDDs will be worse, so it shouldn't matter. No vectorization, no column processing, lots of serialization and de-serialization. They're basically always slower than DataFrames barring some strange use case.
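To make the "no vectorization, no column processing" point concrete, here is a minimal pure-Python sketch of the two access patterns. It is only an analogy, not Spark code: the toy data and the function names `total_row_at_a_time` and `total_columnar` are made up for illustration, and it says nothing about actual Spark timings.

```python
from array import array

# Hypothetical toy data: three rows with an id and a price.
rows = [
    {"id": 1, "price": 10.0},
    {"id": 2, "price": 20.0},
    {"id": 3, "price": 30.0},
]


def total_row_at_a_time(rows):
    """RDD-like pattern: visit one generic row object at a time."""
    total = 0.0
    for row in rows:
        # Each step does a field lookup on an untyped row object.
        total += row["price"]
    return total


# Columnar layout: one contiguous, typed buffer per column.
columns = {
    "id": array("q", (r["id"] for r in rows)),
    "price": array("d", (r["price"] for r in rows)),
}


def total_columnar(columns):
    """DataFrame-like pattern: process a whole typed column in one call."""
    return sum(columns["price"])


row_total = total_row_at_a_time(rows)
col_total = total_columnar(columns)
```

Both functions compute the same total; the difference is that the columnar version only touches the one column it needs, stored as a typed buffer, which is the layout that lets a native engine apply batch-oriented execution.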


Got numbers?


Interesting, it looks like this is just the DataFusion engine for Spark. There is a similar project: https://github.com/oap-project/gluten - it brings ClickHouse as an engine to Spark.


Photon, Velox, and now this. Why would people use Spark in the first place, other than for legacy application reasons?


For a split second, I thought Bazel[0] had finally been externally renamed to its true name.

[0] https://en.m.wikipedia.org/wiki/Bazel_(software)


Unfortunate name overlap with an under-loved PyData project: https://blaze.pydata.org


And Google's version of Bazel.


The public version was renamed Bazel because of name conflicts.


Same. I've always liked the blaze project.



