Judging from your lol, you must surely remember debian aliot comparison pages, o...

kllrnohj · on May 24, 2020

Those comparison pages still exist: https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

Time has not been kind it seems. Not just the gap in performance, but also the gap in memory usage.

tom_mellior · on May 24, 2020

The C and C++ versions of those benchmarks are manually vectorized to death using vector intrinsics. You don't have those in Java, nor in the standard versions of C and C++. So yes, those speedups are real, if you invest a lot of work. But if you don't, there is no magical 5x speedup of C++ over Java.

kllrnohj · on May 24, 2020

And the Java ones don't look anything at all like typical Java, either. But no, the C/C++ ones have not been all manually vectorized to death. The binary-trees one, for example, is a fairly clean C++ implementation, and runs in less than half the time using less than half the memory of the Java version. It looks like 'only' 4 of the C++ ones use any vectorization intrinsics.

> But if you don't, there is no magical 5x speedup of C++ over Java.

Nobody said anything about "magic." AOTs are really good. Value types are really good. Return types consistently being on the stack without requiring escape analysis is really good.

Java's performance is impressive for how crippled it is by the language, but HotSpot is definitely far from magic. It can't recover from the limitations of the language. You're fully paying for the "high level" & simple nature of Java.

tom_mellior · on May 25, 2020

> And the Java ones don't look anything at all like typical Java, either.

Which ones do you mean?

> But no, the C/C++ ones have not been all manually vectorized to death.

True.

> The binary-trees one, for example, is a fairly clean C++ implementation, and runs in less than half the time using less than half the memory of the Java version.

Unfortunately I can't benchmark this myself because it uses some library I've never heard of. The numbers on the Debian site look pretty outdated, Java binary-trees takes about 1800 ms on my machine. And about 1450 ms after letting it warm up.

> AOTs are really good.

So are JITs, but we are both really deep in hand-waving territory here.

> Return types consistently being on the stack without requiring escape analysis is really good.

The fast path of object allocation is bumping a pointer into an allocation buffer and checking it against the buffer's limit. Objects that are short-lived enough that you would want to return them by value in C++ will usually never leave the allocation buffer. So it's not the exact same thing as allocating on the call stack, but it's not far.

kllrnohj · on May 26, 2020

> So it's not the exact same thing as allocating on the call stack, but it's not far.

Unfortunately it is far. Creating an address is quick, yes. But that address consistently won't be in L1/L2. Likely won't even be in L3. And if it's a return value containing more than one allocation that's easily more than one cache miss along with dependent reads.

igouy · on May 26, 2020

> … some library I've never heard of…

http://apr.apache.org/docs/apr/1.5/group__apr__pools.html

> The numbers on the Debian site look pretty outdated…

slow ?

unrealhoang · on May 24, 2020

That is, if you write your program for the benchmark, i.e. writing C in Java by writing procedural code with packed primitives, or juggling and write all magic hacks just to make the code falls into correct JIT path. Otherwise, if you write normal OOP Java, 10x should be the performance gap.

tom_mellior · on May 24, 2020

Are you saying that the Java benchmarks linked above are written using "magic hacks"? None of them are even close to a 10x gap, despite the fact that some of the C++ ones do use magic hacks (vector intrinsics).

erichocean · on May 24, 2020

10x gaps occur whenever you have to interface Java with the real world—disks, memory, CPUs, virtual memory subsystems, networking stacks, etc. to get high-performance.

That's why Cassandra has so much C++ in it, and why ScyllaDB is so much faster still.

It's not the C++ per se is "faster" than Java, it's what C++ lets you easily do that Java doesn't.

Other comments have said—well, Java is more maintainable. That's also highly-dependent on the context. ScyllaDB has a much better developer velocity than Cassandra, too. (Anyone can easily verify this.) I use the Cassandra/ScyllaDB example frequently because they implement the same spec, and do so in a compatible way.

It's also really easy to put C++ on the fast path, by adding Python to the mix. For "business logic"-like situations (supposedly the bread and butter of Java), that's what actual companies do, here on Earth, with people: use Python for the easy stuff.

I like Java, and there are certainly some very high-performance Java projects (LMAX, Aeron) and it's very productive, has great tooling, and tons of libraries. There's nothing wrong with it. You can even layer more productive languages on top of the JVM. Win.

What I have a problem with are claims that "all this C++ code can be replaced with Java" at some hand-wavey minor cost. That's…not true.

People are not stupid, they use C++ today because it can do the job when nothing else really can—Java included.

P.s. It's not even true that Java allows you to "forget about memory management". I don't know why people keep saying that, but it's objectively false. If you care about performance, you have to be aware of memory allocations. The GC is not some magic "make my code run fast" card.

Furthermore, there are so many kinds of resources beyond memory! And the GC is an impediment in many cases to using those kinds of resources effectively. C++ has an extremely good story when it comes to managing every kind of computing resource in a large, maintainable codebase.

tom_mellior · on May 24, 2020

> 10x gaps occur whenever you have to interface Java with the real world—disks, memory, CPUs, virtual memory subsystems, networking stacks, etc. to get high-performance.

As noted before, the CPU- and memory-intensive benchmarks upthread don't even show a factor of 10x. Despite the fact that the C++ is heavily hand-optimized in ways that are not accessible to Java programs, and the benchmarks being very short running, so heavily penalizing Java's JIT compilation. Please come off the 10x horse, it makes you look like you are arguing from prejudice, not from data. Even for hyperbole 10x is way too much.

I have experience working on high-performance compilers for both Java and C++, and I can promise you that Java compilers don't generate so very different code for using the CPU or memory. Yes, Java has some overheads, but it also has some tricks up its sleave.

nicoburns · on May 24, 2020

10x does seem like too much (although no doubt there are some cases where this happens). JavaScript is usually faster than that these days.

dirtydroog · on May 24, 2020

If you give gcc the appropriate 'march' parameter it can use vectorisation automatically if it figures out it can do it.

The main argument here is that native languages let you get the most out of your CPU if you want, whereas with the JVM you're probably stuck unless you use JNI, in which case why not just go straight to native?

tom_mellior · on May 24, 2020

> If you give gcc the appropriate 'march' parameter it can use vectorisation automatically if it figures out it can do it.

Yes, I'm aware of loop vectorization in C compilers. I'm also aware of loop vectorization in Java JIT compilers. Do you think there is anything specific to C or C++ that makes this easier than in Java? There isn't.

> The main argument here is that native languages let you get the most out of your CPU if you want

Most applications, including probably more than 99% of Chromium, are not the kind of code that would benefit from manually written vector intrinsics. Does the "main argument" claim that these 99% would also be 5x or 10x slower if they were written in Java?