
There's also Blogel [0], a distributed graph processing framework written in C++, which runs significantly faster than its Java counterpart, Apache Giraph [1].

I have started to wonder whether big data developers really care about speed; the advantages of this Java software start to fade when it is compared with its C++ counterparts.

[0] - http://www.cse.cuhk.edu.hk/blogel/

[1] - http://www.cse.cuhk.edu.hk/blogel/papers/blogel.pdf



If you only measure milliseconds, yes.

If you measure project costs, including developer salaries and the number of development days, then no.

This is the main reason trading firms are putting so much pressure on Oracle to improve Java's support for value types and FFI to native code.


With Thrill, I think there are two different skill levels to distinguish:

- Using it to implement things should be fairly easy and doesn't require advanced knowledge of C++. Basically, you plug lambdas that do the processing into the provided operations, similar to Spark, but using C++ syntax. It might take some skill at deciphering compiler errors, but altogether it shouldn't be too different from using Spark with Java/Scala.

- Extending Thrill requires familiarity with modern C++, possibly including advanced template tricks.

Since there isn't a whole lot of advanced functionality available for Thrill (yet), people with the latter skills would most likely be required at the moment. But in a world where the same libraries available for Spark were also available for Thrill or a similar C++ framework, that wouldn't be the case. Note that Thrill is currently quite experimental.

I guess it's a trade-off, but dismissing the potential for 10x runtime gains "because C++" seems too one-sided. That isn't to say that the C++ frameworks don't have a long way to go before they can rival Spark etc. in ease of use and tooling; they do! But at least they point out the inefficiencies in these existing systems and the potential for improving them.



