J is for JVM: Why the ‘J’ in JRuby? (engineyard.com)
82 points by icey on Nov 30, 2009 | hide | past | favorite | 14 comments


JRuby has been invaluable at scaling out our web scraping. Specifically, we've managed to leverage its concurrency support very effectively.

We're scraping many millions of pages, so a single-threaded scraper wouldn't be practical. Since most of the time, a given thread is waiting on IO, we get our highest throughput with a few dozen scraper threads running at once.
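A minimal sketch of that thread-pool approach in plain Ruby, with hypothetical names; here `fetch_page` just sleeps to stand in for network IO, which is where real scraper threads spend most of their time overlapping:

```ruby
require "thread"

WORKER_COUNT = 8   # the comment above uses "a few dozen"; 8 keeps the sketch quick

# Hypothetical stand-in for an HTTP fetch; the sleep models blocking IO,
# during which other threads make progress.
def fetch_page(url)
  sleep 0.01
  "<html>#{url}</html>"
end

queue   = Queue.new
results = Queue.new

(1..40).each { |i| queue << "http://example.com/page/#{i}" }
WORKER_COUNT.times { queue << :done }   # one poison pill per worker

workers = WORKER_COUNT.times.map do
  Thread.new do
    while (url = queue.pop) != :done
      results << fetch_page(url)
    end
  end
end

workers.each(&:join)
puts results.size   # => 40
```

With IO-bound work like this, wall-clock time is roughly (pages × IO latency) / worker count, which is why a few dozen threads pay off even under a GIL.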

Trying to use multiple threads in MRI is a big mess. For whatever reason, it starts falling down after three or four. Ruby 1.9's fibers might be a good solution, but 1.9 wasn't out of beta when we started this project.

Using EventMachine with MRI might get us the same effect, but changing a relatively large synchronous codebase to work asynchronously is a big task for an uncertain benefit. With JRuby, we simply pointed our (synchronous) Ruby code at an event-driven HTTP client which uses futures to block each thread. Threads are certainly heavier-weight than some concurrency primitives, but with only a few dozen of them the JVM is great at holding all the relevant state.
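The shape of that facade can be sketched in plain Ruby; `EventClient` and `SimpleFuture` are made-up names for illustration, with a single background thread standing in for the event loop:

```ruby
require "thread"

# One-shot future: a Queue holding exactly one value. #get blocks the
# calling thread until the event loop delivers the result.
class SimpleFuture
  def initialize; @box = Queue.new; end
  def set(value); @box << value; end
  def get; @box.pop; end
end

# Stand-in for an event-driven HTTP client: one thread services all
# requests and completes each request's future.
class EventClient
  def initialize
    @requests = Queue.new
    @loop = Thread.new do
      while (req = @requests.pop)
        url, future = req
        future.set("response for #{url}")   # pretend the async IO completed
      end
    end
  end

  # Synchronous-looking facade: enqueue the request, hand back a future.
  def fetch(url)
    future = SimpleFuture.new
    @requests << [url, future]
    future
  end
end

client = EventClient.new
puts client.fetch("http://example.com/").get
# prints "response for http://example.com/"
```

The calling code stays synchronous in style (`fetch(url).get`), while all socket handling is centralized in the event-loop thread.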

We've also been able to leverage tools in java.util.concurrent and use Scala actors with pretty great effect. JRuby (and the JVM) isn't without its faults but when it comes to concurrency it's the Ruby implementation to beat.


Just curious, for your MRI tests, were you using 1.8? You probably already know, but 1.8 has green threads, while 1.9's threads are native.

Someone mentioned in a sibling that 1.9 still has a GIL, thus making concurrency a bit painful. I wonder if it is released when waiting on IO?


Yeah, MRI is Ruby 1.8.

I imagine 1.9 would perform a great deal better. Since we're scraping entirely on single-CPU machines, the GIL probably wouldn't hurt us much. We rewrote the scheduler and our interface to the HTTP client using Scala, however, so now we're pretty wedded to the JVM.

However, if we were to use Ruby 1.9, we would have to spend more time investigating HTTP libraries. From our experience, it's worth it to have worker threads NEVER block directly on IO. We saw a huge increase in reliability by having worker threads (1) send a request to an event-based client, running in its own thread, and (2) block on a future with a set timeout, waiting for the callback. Having individual threads talk directly to the sockets, in either MRI or Java, wasn't a promising approach. If sockets got wedged (and they do, even using very reputable HTTP libraries) the event-based approach keeps on humming, while the blocking approach grinds to a halt.
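The key part of step (2) is the timeout: a request that never completes turns into a timely error instead of a permanently stuck worker. A sketch of a future with a timed `get`, using Ruby's Mutex and ConditionVariable (`TimedFuture` is a hypothetical name, not a real library class):

```ruby
require "thread"

class TimedFuture
  def initialize
    @mutex = Mutex.new
    @cond  = ConditionVariable.new
    @done  = false
  end

  # Called by the event-driven client when the response callback fires.
  def set(value)
    @mutex.synchronize do
      @value = value
      @done  = true
      @cond.broadcast
    end
  end

  # Called by the worker thread: returns the value, or :timeout if the
  # callback never arrives. A production version would re-check @done in
  # a loop to guard against spurious wakeups.
  def get(timeout)
    @mutex.synchronize do
      @cond.wait(@mutex, timeout) unless @done
      @done ? @value : :timeout
    end
  end
end

fast = TimedFuture.new
Thread.new { sleep 0.05; fast.set("ok") }   # callback arrives in time
puts fast.get(1.0).inspect                  # => "ok"

stuck = TimedFuture.new                     # simulated wedged socket: no callback
puts stuck.get(0.1).inspect                 # => :timeout
```

When `get` returns `:timeout`, the worker can log the failure, requeue the URL, and move on; only the event-loop thread ever holds the wedged socket.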

There exist several event-based HTTP clients for Java that all work well (though some significantly better than others). Compare that with Ruby, where we couldn't find anything that mature.

That's not to mention all the concurrency primitives available in Scala, Clojure, and the underrated java.util.concurrent.

Even if Ruby 1.9 looks much more competitive with JRuby in synthetic tests of concurrent performance, in the real world the Java ecosystem features tons of libraries that help you write reliable, predictable code quickly.


MRI is Ruby 1.8.x, and YARV (or KRI) is Ruby 1.9.x. "MRI 1.9" is a misnomer.


The article's title is a bit unfortunate. The article is actually a long and very interesting summary of why the Java Virtual Machine is also a great platform for other languages. The title hints more in the direction of an anecdote.


Some of it seems a bit ill informed and fanboy-ish though.

I'm not a JVM expert but claiming, for example, that "Hotspot [is] available wherever Java is available" is clearly false. Maybe he means JIT when he says Hotspot, but that's not true either.

The Zero and Shark projects are currently trying to bring OpenJDK and then a JIT to the platforms that Sun doesn't directly support (and even then doing it in a hacky way, as doing it for real i.e. rewriting Hotspot would be too much work):

http://icedtea.classpath.org/wiki/ZeroSharkFaq

And I'd love someone knowledgeable to compare the "free upgrades" you get from JVM updates to the same effect you get from GCC or LLVM improvements, Profile Guided Optimisation or updates in underlying libraries and OSes.

In general the summary could be "We build our wacky idiosyncratic language on top of a mature, well-engineered, portable system that was designed and built to do something else". Matz also built his Ruby on top of mature, well-engineered portable systems (Unix etc.) and while Ruby is cool and benefitted from building on that base, it wasn't magic pixie dust (or people wouldn't be so keen on JRuby).


Looking at the graphs: would I trade 2x gain in eventual performance for a 20x gain in startup speed, 3x leaner RAM consumption and great immediate performance? I wasn't sure, but after 2 months with JRuby - yes I would.

My vote goes to Ruby 1.9.4 coupled with speedy and efficient C extensions.


"2x gain in eventual performance"

It only takes 2 seconds before JRuby catches up with 1.9.2. That's not long, so unless your program only runs for a few seconds, the wait is worth it.


The way I see it, languages like Ruby aren't meant for high performance computing: one can pick from a dozen statically typed languages to implement time-critical algorithms. The power and beauty of Ruby is its flexibility, "fluidity", and ability to glue services provided by an operating system and a million OS-native modules. Just look into your /usr/lib.

JRuby lacks that: it only "glues" JARs together and it introduces a shockingly huge startup lag which makes your development feel like you're compiling C++ code between test runs. And even gluing JARs is kind of pointless: Java itself, with its plethora of autocompleting IDEs, is a much better environment to learn and experiment with APIs like Batik or Apache POI.


Perhaps JRuby simply isn't for you. We're using JRuby with Rails, and the tradeoffs are well worth it. Performance is quite good, and the laggy startup is irrelevant to a long-running server process like Rails.

More critically, our development time is significantly reduced by using a highly expressive language like Ruby. Code is more readable, more understandable, and a heck of a lot faster to write. Plus, we can still use all the neat Java libraries like Jetty, Lucene, and JavaMail. Add in things like Java NIO, and it matches our needs nicely.

While JRuby might not suit your particular application, it's doing wonderful, useful work for a lot of people. Horses for courses, as they say.


Ruby may not be the fastest language in the world, but that doesn't mean that we shouldn't strive for better performance. As far as startup times are concerned, have you tried using nailgun?

http://blog.headius.com/2009/05/jruby-nailgun-support-in-130...


One exception would be multi-threaded applications. In my experience, JRuby is the best choice in this domain. Ruby 1.9 still uses a GIL, making JRuby a better choice for concurrent software.


Seconded.

One additional advantage is JRuby's ability to use structures like Clojure's concurrency primitives. AFAIK, there's nothing even close in the world of MRI and YARV.


JRuby's startup speed isn't that much different if you're actually running something fairly heavy, like a web framework; it's not like Rails starts up in 200ms on MRI and 4000ms on JRuby, both have you waiting for a few seconds. With source reloading they're not that different in practice.

Sure, leaner RAM consumption is nice, but so is stable RAM consumption. It's not uncommon to see a Rails app on MRI eventually grow from, say, 70MB to 300MB+ and never shrink, thanks to heap fragmentation and GC confusion, while JRuby stabilizes at maybe 100MB.

Sure, I also have MRI services which sit stable at 8-15MB for 6 months or more and JRuby's not exactly going to help there, and for short running interactive tasks it's certainly not going to be my first choice, but having the choice is very, very good.



