Rodrigo Rosenfeld Rosas
Akita's Manga Downloadr Elixir vs Ruby performance revisited
Two weeks ago I read an article from Fabio Akita comparing the performance of his Manga Downloadr implementations in Elixir, Crystal and Ruby.
From a quick glance at its source code it seems the application consisted mostly of downloading multiple pages and another minor part would take care of parsing the HTML and extracting some location paths and attributes for the images. At least, this was the part that was being tested in his benchmark. I found it very odd that the Elixir version would finish in about 15s while the Ruby version would take 27s to complete. After all, this wasn't a CPU bound application but an I/O bound one. I would expect that the same design implemented in any programming language for this kind of application should take about the same time in whatever chosen language. Of course the HTML parser or the HTTP client implementations use on each language could make some difference but the Ruby implementation took almost twice the time taken by the Elixir implementation. I was pretty much confident it had to be a problem with the design rather than a difference in the raw performance among the used languages.
I had to prepare a deploy for the past two weeks which happened last Friday. Then on Friday I decided to take a few hours to understand what the test mode was really all about and rewrote the Ruby application with a proper design for this kind of application taking Ruby's limitations (specially MRI's ones) in mind with focus on performance.
The new implementation can be found here on Github.
Feel free to give it a try and let me know if you can think of any changes that could potentially improve the performance in any significant way. I have a few theories my self, like using a SAX parser rather than performing the full parsing, among a few other improvements I can think of, but I'm not really sure whether the changes would be significant given that most of the time is actually spent on network data transfer using a slow connection (about 10MBbps in my case), if we compare to the time needed to parse those HTMLs.
So, here are the numbers I get with a 10MBps Internet connection and an AMD Phenom II X6 1090T, with 6 cores at 3.2GHz each:
- Elixir: 13.0s (best time, usually ranges from 13.0-16s)
- JRuby: 12.3s (best time, usually ranges from 12.3-16s)
- MRI: 10.9s (best time, usually ranges from 10.9-16s)
As I suspected, they should perform about the same. JRuby needs 1.8s just to boot the JVM ( measured with time jruby --dev -e ''), which means it actually takes about the same as MRI if we don't take the boot time into consideration (which is usually the case when the application is running a long-lived daemon like a web server).
For JRuby threads are used to handle concurrency while in MRI I was forced to use a pool of forked processes to handle HTML parsing and write some simplified Inter-Process Communication (IPC) technique which is suitable for this particular test case but may not apply to others. Writing concurrent code in Ruby could be easier but for MRI it's specially hard once you want to use all cores because I find it much easier to write multi-threaded code than to deal with forked processes and special IPC that is not as trivial to write as using threads that share the same memory. You are free to test the performance of other approaches in MRI, like the threaded one, or always forking rather than using a pool of forked processes, changing the amount of workers both for the downloader as well as for the forked pool (I use 6 processes in the pool that parses the HTML since I have 6 cores in my CPU).
I have always been disappointed by the sad state of real concurrency in MRI due to the GIL. I'd love to have a switch to disable the GIL completely so that I would able to benchmark the different approaches (threads vs forks). Unfortunately, this is not possible in MRI or JRuby because MRI has the GIL and JRuby doesn't handle forking well. Also, Nokogiri does not perform the same in MRI and JRuby, which means there are many other variables involved that running an application using forks in MRI cannot be really compared to run it against JRuby using the multi-threaded approach because the difference in the design is not the only one happening.
When I really need to write some CPU bound code that would benefit from running on all cores I often do it in JRuby since I find it easier to deal with threads rather than spawn processes. Once I had to create an application similar to Akita's Manga Downloader in test mode and I wrote about how JRuby saved my week exactly due to it enabling real concurrency. I really think MRI team should take real concurrency needs more seriously or it might become irrelevant in the languages and frameworks war. Ruby usually gives us options, but we don't really have an option to deal with concurrent code in MRI as the core developers believe forking is just fine. Since Ruby usually strives for its simplicity I find this awkward since it's usually much easier to write multi-threaded code than dealing with spawn processes.
Back to the results of the timing comparison between Elixir and Ruby implementations, of course, I'm not suggesting that Ruby is faster than Elixir. I'm pretty sure the design of the Elixir implementation can be improved as well to get a better time. I'm just demonstrating that for this particular use case of I/O bound applications the raw language performance usually does not make any difference given a proper design. The design is by far the most important feature when working on performance improvements of I/O bound applications. Of course it's also important for CPU bound applications, but what I mean is that the raw performance is often irrelevant for I/O bound applications while the design is essential.
So, what's the point?
There are many features one can use to sell another language but we should really avoid the trap of comparing raw performance because it hardly matter for most of the applications web developers work with, if they are the target audience. I'm pretty sure Elixir has great sell points, just like Rust, Go, Crystal, Mirah and so on. I'd be more interested in learning about the advantages of their eco-systems (tools, people, libraries) and how they allow to write good designed software in a better way. Or how they excel in exceptions handling. Or how easy it is to write concurrent and distributed software with them. Or how robust and fault tolerant they are. Or how they can help getting zero down-times during deploy, or how fast the applications would boot (this is one of the raw performance cases where it can matter). How well documented they are and how amazing are their communities. How one can easily debug and profile applications in these environments or how easily they can test something in a REPL, or write automated tests, manage dependencies. How well autoreloading work in the development mode and so on. There are so many interesting aspects of a language and its surrounding environment that I find it frustrating every time I see someone trying to sell a language by comparing the raw performance as it often does not matter in most cases.
Look, I've worked with fast hard real-time systems (running on Linux with real-time patches such as Xenomai or RTAI) during my master thesis and I know that raw performance is very important for a broad set of applications, like Robotics, image processing, gaming, operating systems and many others. But we have to understand whom we are talking to. If the audience is web development raw performance simply doesn't matter that much. This is not the feature that will determine whether your application will scale to thousands of requests per second. Architecture/design is.
If you are working with embedded systems or hard real time systems it makes sense to use C or some other language that does not rely on garbage collectors (as it's hard to implement a garbage collector with hard timing constraints). But please forget about raw performance for the cases where it doesn't make much difference.
If you know someone who got a degree in Electrical Engineering, like me, and ask them, you'll notice it's pretty common to perform image processing in Matlab, which is an interpreted language and environment to prototype algorithm designs. It's focused on operations involving matrix and they are pretty fast since they are compiled and optimized. Which allows engineers to quickly test different designs without having to write each variation in C. Once they are happy with the design and performance of the algorithm they can go a step further and implement it in C or use one of the Matlab tools that would try to perform this step automatically.
Engineers are very pragmatic. They want to use the best tools for their jobs. That means a scripting language should be preferred over a static one during the design/prototype phase as it allows faster feedback and iterative loop. Sometimes the performance they get with Matlab is simply fast enough for their needs. The same happens with Ruby, Python, JS and many other languages. They could be used for prototypes or they could be enough for the actual application.
Also, one can start with them and once the raw performance becomes a bottleneck they are free to convert that part to a more efficient language and use some sort of integration to delegate the expensive parts to them. If there are many parts of the application that would require such approach to be taken, then it becomes a burden to maintain it and one might consider moving the complete application to another language to reduce the complexity.
However, this is not my experience with web applications in all past years I've been working as a web developer. Rails usually takes about 20ms per request as measured by nginx in production while DNS, network transfer, JS and other related jobs may take a few seconds which means the 20ms spent in the server is simply irrelevant. It could be 0ms and it wouldn't make any difference to the user experience.