Rodrigo Rosenfeld Rosas

Running Java from MRI Ruby through DRb

Tue, 16 Jul 2013 11:17:00 +0000 (Updated at Thu, 16 Jan 2014 15:00:00 +0000)

Important update: After writing this article I tried to use this approach in my real application and found that it can't really work the way I described: objects referenced only on the DRb client side get garbage collected on the DRb server side, since the server keeps no references to them. I'm keeping this article anyway to explain the idea, in the hope that we can find a way to work around the memory management issue at some point.

Motivation

In a Ruby application I maintain, we're required to export some statistics to XLS (not XLSX), and to do that we had to modify an XLS template.

After searching the web I couldn't find a Ruby library that would do the job, but I knew I could count on the Apache POI Java library.

MRI Ruby doesn't have native support for using Java libraries so we have to either use JRuby or some Inter-Process Communication (IPC) approach (I consider hosting a service over HTTP as another form of IPC).

I've already used JRuby for serving my web application in the past and we had good results, but our application is currently running fine on MRI Ruby 2. I don't want to use JRuby for deployment only to be able to use Java libraries. From time to time we re-run stress tests to measure the throughput of our application under several deployment strategies, including JRuby in threaded mode (vs. the multi-process and multi-threaded approaches with MRI), testing several web servers for each Ruby implementation.

The last time we ran our stress tests, Unicorn was a bit faster at serving our pages than JRuby on Puma, but that wasn't the main reason why we chose Unicorn. We had some issues with connections to PostgreSQL under JRuby at the time and we didn't want to investigate them further, especially since we didn't notice any advantage in the JRuby deployment back then.

Things may have changed since then, but we don't plan to run another battery of stress tests in the short run. I just wanted to find another way of accessing Java libraries that wouldn't tie our application to JRuby in any way. Even when we used to deploy with JRuby, all our code ran on MRI, and we used MRI to run the tests and in development mode, since it's much faster to boot and allows faster testing through some forking techniques (spork, zeus, etc).

I didn't want to add much overhead either by providing some HTTP service. The overhead is not only in the payload but also in the development workflow.

What I really wanted was just a bridge that would allow me to run Java code from MRI Ruby, since I'm more comfortable writing code in Ruby and my tests run faster on MRI than on JRuby.

So, the obvious choice (at least for me), was to try DRb.
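
For readers who haven't used DRb before, here is a minimal sketch of the two behaviours the rest of this article relies on: plain values are marshalled and copied to the client, while objects marked with DRb::DRbUndumped stay on the server and the client only gets a proxy. Point, Cursor and Front are made-up names for illustration only, and the sketch uses fork to get a separate server process, so it assumes MRI on a Unix-like system.

```ruby
require 'drb/drb'

# Hypothetical classes for illustration; not part of the article's service.
Point = Struct.new(:x, :y)   # plain data: marshalled and copied to the client

class Cursor                 # stays on the server; the client gets a proxy
  include DRb::DRbUndumped

  def initialize
    @pos = 0
  end

  def advance
    @pos += 1
  end
end

class Front
  def point
    Point.new(1, 2)
  end

  def cursor
    @cursor ||= Cursor.new   # held by the front object, so it is not collected
  end
end

# Run the server in a child process and report its URI through a pipe.
reader, writer = IO.pipe
server_pid = fork do
  reader.close
  DRb.start_service('druby://localhost:0', Front.new) # port 0: any free port
  writer.puts DRb.uri
  writer.close
  DRb.thread.join
end
writer.close
uri = reader.gets.chomp

front    = DRbObject.new_with_uri(uri)
point    = front.point    # a local copy of the struct
cursor   = front.cursor   # a DRb::DRbObject proxy
cursor.advance
position = cursor.advance # state lives in the server process

Process.kill('TERM', server_pid)
Process.wait(server_pid)
```

Every call on cursor is a network round trip to the server, which is exactly how calls such as workbook.sheet_at end up being executed by JRuby later in this article.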

DRb to the rescue

Even after deciding on DRb, there are several ways to implement the service. The simplest is probably to write the service in JRuby and access only its higher-level interface from the MRI application.

That works but I wanted to avoid this approach for a few reasons:

  • tests would run slower when compared to MRI due to increased boot time for the JVM (main reason)
  • we'd need to switch applications every time we wanted to work on the Java-related code (we don't use an IDE, but still, in Vim, that means ':lcd ../jruby-app')
  • Rails already provides us automatic code reloading out of the box for our main application, while we'd have to constantly reboot the JRuby application after each change or implement some auto-reloading code ourselves

So I wanted to try a more minimal approach: a bridge that simply lets us run arbitrary JRuby code directly from MRI.

Dependency management, Maven and jbundler

Note: for this section, I'm assuming JRuby is being used. With RVM that means "rvm jruby".

Christian Meier did a great job with jbundler, a tool similar to Bundler that uses a Jarfile, instead of a Gemfile, to specify the Maven dependencies.

So, basically, I created a new Gemfile with "bundle init" and added a "gem 'jbundler'" entry to it.

Then I created a Jarfile with this content: "jar 'org.apache.poi:poi'". Run "bundle exec jbundle" and you're ready to go. Running "jbundle console" will provide an IRB session with the Maven libraries available.

To create a script, add a "require 'jbundler'" statement to it; you can then run it with "bundle exec ruby script-name.rb".

The DRb server

So, this is what the JRuby server process looks like:

# java_bridge_service.rb:

POI_SERVICE_URL = "druby://localhost:8787"

require 'jbundler'
require 'drb/drb'
require 'ostruct'

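class JavaBridgeService
  # Evaluates a string of JRuby code in this process and returns the result.
  # _binding is an optional Hash of names the code can refer to.
  def run(code, _binding = nil)
    # Turn a Hash such as {filename: '...'} into a binding whose self answers to those keys
    _binding = OpenStruct.new(_binding).instance_eval {binding} if _binding.is_a? Hash
    result = if _binding
      eval code, _binding
    else
      eval code
    end
    # Return Java objects to the client as remote references rather than marshalled copies
    result.extend DRb::DRbUndumped if result.respond_to? :java_class # like byte[]
    result
  end
end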

puts "listening to #{POI_SERVICE_URL}"
service = DRb.start_service POI_SERVICE_URL, JavaBridgeService.new

Signal.trap('SIGINT'){ service.stop_service }

DRb.thread.join
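
The Hash-to-binding conversion inside run is probably the least obvious line of the service. The snippet below isolates it in plain Ruby (no JRuby or DRb needed) as a small sketch; binding_from is a hypothetical helper name used only here.

```ruby
require 'ostruct'

# Hypothetical helper mirroring the conversion done inside JavaBridgeService#run.
def binding_from(locals)
  # Inside instance_eval, self is the OpenStruct, so the captured binding
  # resolves bare names such as `filename` through the struct's accessors.
  OpenStruct.new(locals).instance_eval { binding }
end

scope = binding_from(filename: '/tmp/template.xls', col: 0)
label = eval('"#{filename} (column #{col + 1})"', scope)
puts label
```

The names are resolved as method calls on the struct rather than as real local variables, so evaluated code can read them, but an assignment like "filename = ..." inside the evaluated string would create a new local instead of updating the struct.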

Security note

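This is all you need to run arbitrary JRuby code from MRI. Since the service passes whatever string it receives to eval, any client that can reach the port can execute arbitrary code in the server process, so I'd strongly recommend running this server only in a sandboxed environment.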
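
As one additional and partial precaution, DRb ships with an ACL class that can reject connections by address. The sketch below only limits who may connect; it does nothing about what eval can do once a client is connected, so it is not a substitute for sandboxing. LOCAL_ONLY is a name made up for this example.

```ruby
require 'drb/drb'
require 'drb/acl'

# With the default DENY_ALLOW order, an address matching an allow rule is
# accepted and everything else falls through to "deny all".
LOCAL_ONLY = ACL.new(%w[
  deny all
  allow 127.0.0.1
  allow ::1
])

# Installs the ACL as the default for servers started afterwards, so this
# has to run before DRb.start_service.
DRb.install_acl(LOCAL_ONLY)
```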

The client code

I won't show the full classes we have for communicating with the server since they are implementation details and people will want to organize it in different ways. Instead I'll provide some scripting code that you may want to run in an IRB session to test the set-up:

require 'drb/drb'

DRb.start_service

service = DRbObject.new_with_uri 'druby://localhost:8787'

[
  'java.io.FileInputStream',
  'java.io.FileOutputStream',
  'java.io.ByteArrayInputStream', # used below to read the exported bytes back
  'java.io.ByteArrayOutputStream',
  'org.apache.poi.hssf.usermodel.HSSFWorkbook',
].each{|java_class| service.run "import #{java_class}"}

workbook = service.run 'HSSFWorkbook.new FileInputStream.new(filename)',
      filename: File.absolute_path('template.xls')

sheet = workbook.sheet_at 0
row = sheet.create_row 0
# row.create_cell(0) will display a warning in the server-side since JRuby can't know if you want to use the
# short or int method signature
cell = service.run 'row.java_send :createCell, [Java::int], col', row: row, col: 0
cell.cell_value = 'test'

# export it to binary data
result = service.run 'ByteArrayOutputStream.new'
workbook.write result

# ruby_data is what you would be passing to send_data in controllers:
ruby_data = service.run('ByteArrayInputStream.new baos.to_byte_array', baos: result).to_io

# or, if you want to export it to some file:
os = service.run 'FileOutputStream.new filename', filename: File.absolute_path('output.xls')
workbook.write os
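
In the session above, workbook, sheet, row and cell exist on the client only as remote references, which is where the garbage collection problem described in the update at the top of this article comes from. One possible direction, untested against JRuby and POI, is to give the server an id converter that holds a strong reference to everything it hands out. RetainingIdConv and ID_CONV below are hypothetical names; DRb only requires that the converter responds to to_id and to_obj.

```ruby
require 'drb/drb'

# Hypothetical id converter: keeps a strong reference to every object the
# server hands out, so it stays alive until it is explicitly released.
class RetainingIdConv
  def initialize
    @held = {}
    @lock = Mutex.new
  end

  # Called by DRb when an object leaves the server as a remote reference.
  def to_id(obj)
    return nil if nil.equal?(obj)
    id = obj.__id__
    @lock.synchronize { @held[id] = obj }
    id
  end

  # Called by DRb when a client invokes a method on a remote reference.
  def to_obj(ref)
    @lock.synchronize do
      @held.fetch(ref) { raise RangeError, "#{ref.inspect} is not a retained object" }
    end
  end

  # Drops the strong reference; without this the table grows forever.
  def release(ref)
    @lock.synchronize { @held.delete(ref) }
  end
end

ID_CONV = RetainingIdConv.new
# Server side: install as the default converter before DRb.start_service.
DRb.install_id_conv(ID_CONV)
```

The trade-off is that memory management becomes manual: the service would need some way for clients to say they are done with an object, for example a front-object method that calls release with the value of the proxy's __drbref.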

Conclusion

By using such a generic Java bridge, we're able to use several good Java libraries directly from MRI code.

Troubleshooting

If you have any issues trying this code (I haven't actually tested the code in this article), please leave a note in the comments and I'll fix the article. Also, if you have any questions, leave a comment and I'll try to help you.

Or just feel free to thank me if this helped you ;)
