How I Updated jekyll/classifier-reborn for Ruby 3
12 Jul 2022
A couple months ago, I discovered that the classifier-reborn gem (a popular Jekyll plugin to group related posts) was essentially incompatible with Ruby 3. I use Jekyll to build some of my own websites (including this one), and I wanted to be able to upgrade to Ruby 3 while continuing to use classifier-reborn. Although I was initially frustrated that classifier-reborn didn’t yet support Ruby 3 out of the box, I realized that I might be able to implement Ruby 3 support myself and contribute back to the project since classifier-reborn is, after all, open source software. The pull requests I submitted turned out to be some of my biggest open-source contributions yet, and I’m proud of the work I did!
Classifier-reborn is a gem that populates Jekyll’s related_posts using latent semantic indexing (LSI). It works well, but performing LSI requires computationally intensive linear algebra, and that’s very slow in pure Ruby. Fortunately, it can be made much faster (by a factor of 100 or more) using a linear algebra implementation in C. Using classifier-reborn with Jekyll in pure Ruby has always been prohibitively slow, but if you installed the GNU Scientific Library and the gsl gem, LSI could be performed in only a couple of seconds for small- to medium-sized blogs.
Unfortunately, the gsl gem has been unmaintained for several years. Someone merged a pull request that makes it compatible with Ruby 3, but no version including those changes has ever been released (nearly a year later), so there’s no way to use the gem with Ruby 3 unless you point to an unreleased commit hash on GitHub. And the absence of a Ruby 3-compatible release of the gsl gem is problematic for classifier-reborn because LSI is unusably slow without the speedup provided by the gem.
Of course, the simple solution is “just keep using Ruby 2”, but that’s an increasingly difficult option. Ruby 2.7 is scheduled to reach its end of life in March 2023. It’s also increasingly difficult to build Ruby 2 on modern systems. Ruby 2 depends on OpenSSL 1.1, which reaches end of life in September 2023. And OpenSSL 1.1 has already been removed from the default Ubuntu repositories for 22.04 LTS. All of this means that you can’t even build Ruby 2 on Ubuntu 22.04 without first finding and installing the OpenSSL 1.1 library. In short, it’s increasingly important that Jekyll and its dependencies work with Ruby 3 if it’s going to be easy to use on the most current operating systems.
I wanted to use Jekyll with classifier-reborn in Ruby 3 so I started looking for a solution. I enjoy working in Ruby and I’m very comfortable in that language because I’ve used it professionally for years. At first, I looked into optimizing the Ruby implementation of LSI in classifier-reborn. But I soon realized this approach would be untenable – the slow part of the code was the part that performs singular value decomposition on a matrix, and the best way to optimize that is to use a fast linear algebra library like GSL. But the gem that interfaces with GSL is unmaintained. So I started looking for alternatives to the unmaintained gsl gem that would be compatible with Ruby 3.
I discovered that the gsl gem depends on the narray gem, but the narray gem is deprecated. It recommends using numo-narray instead. Reading about this gem led me to the numo-linalg gem, which provides a fast linear algebra implementation that’s compatible with Ruby 3! I wondered if it would be possible to modify classifier-reborn to use numo-linalg instead of the gsl gem…
I opened the source code for classifier-reborn and started poking around. My
goal was to find all the usages of the gsl gem and see if they could be replaced
with numo-linalg. Lucky for me, there weren’t too many places to update. I
convinced myself that it was probably possible to replace the SV_decomp call (provided
by the gsl gem) with
Numo::Linalg.svd (provided by numo-linalg) since both
accepted similar input parameters and produced similar outputs.
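To make that comparison a little more concrete, here is a minimal sketch of calling Numo::Linalg.svd on a tiny matrix. It is not code from classifier-reborn: the `singular_values` helper and the `NUMO_AVAILABLE` constant are hypothetical names for illustration, and the sketch assumes the numo-linalg gem and a system LAPACK are installed (it returns nil otherwise).

```ruby
# Sketch only: needs the numo-linalg gem plus a system LAPACK/BLAS.
begin
  require 'numo/linalg'
  NUMO_AVAILABLE = true
rescue LoadError, RuntimeError
  # LoadError means the gem is missing. RuntimeError is rescued as well on the
  # assumption that a missing backend library may surface that way.
  NUMO_AVAILABLE = false
end

# Hypothetical helper (not part of classifier-reborn): returns the singular
# values of a small matrix, or nil when numo-linalg could not be loaded.
def singular_values(rows)
  return nil unless NUMO_AVAILABLE

  matrix = Numo::DFloat.cast(rows)
  # svd returns the singular values first, followed by U and V-transposed.
  s, _u, _vt = Numo::Linalg.svd(matrix)
  s.to_a
end

values = singular_values([[3.0, 0.0], [0.0, 4.0]])
puts values.inspect
```

Note that the ordering of the returned values is one of the details to double check when swapping one SVD implementation for another.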
If I was going to put in the effort to tackle this problem, I wanted to make sure I’d be able to share my work with others. I decided it would be best to start collaborating before writing too much code rather than trying to convince someone to merge code I’d already written, so I asked for some input and support in this GitHub issue (where another developer had already asked about the possibility of using Numo in classifier-reborn). I tried to communicate where I was headed (update CI/tests, implement numo-linalg option, and release a new classifier-reborn version) so there wouldn’t be any surprises. I was happy to receive a quick reply from one of the Jekyll maintainers to let me know that he’d help get my code merged and released!
I wanted to be cautious with my development and I planned to rely heavily on tests to let me know if I broke anything along the way. Unfortunately, classifier-reborn hadn’t been heavily worked on since about 2017, so the CI setup was a little outdated. It was testing against EOL versions of Ruby and still used TravisCI config from before they stopped providing free open source builds. I needed to get CI running again, so my first PR migrated off TravisCI to GitHub Actions. In that PR, my goal was just to get the tests running on current versions of Ruby to validate the changes I planned to make. I followed that up with a second PR to test both the pure Ruby and GSL code paths – previously, you could run tests using GSL locally but they didn’t run in CI. After those two PRs merged, I had some guard rails in place and I was ready to try implementing LSI with numo-linalg.
This PR contains the
code changes that make classifier-reborn work with Numo. I began by adding
code to load the numo-linalg gem if it’s installed. Classifier-reborn already
had a mechanism to try to load the gsl gem before falling back to pure Ruby, so I
followed a similar pattern to try to load numo-linalg before falling back to gsl and
then pure Ruby. With the numo-linalg gem being loaded, I implemented a code
branch to replace the GSL
SV_decomp call with a call to Numo::Linalg.svd.
Once that was done, I relied heavily on the existing tests. Several tests failed
when I first ran the suite with the numo-linalg gem, and the failing tests
highlighted all the areas of the code that needed to be updated. The existing
tests had great coverage, which I was thankful for. I used Ruby’s debugging
tools to help me understand and fix the failing test cases one by one.
Ultimately, the linear algebra operations were pretty well encapsulated, so I
only had to change about a hundred lines of code!
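The "try numo-linalg, then gsl, then pure Ruby" loading order described above can be sketched roughly like this. This is an illustration of the pattern, not the actual classifier-reborn source: `detect_svd_backend` and the backend symbols are names I made up for the example.

```ruby
# Hypothetical helper: try each backend in order of preference and return the
# name of the first one whose library can be required.
def detect_svd_backend(candidates = { numo: 'numo/linalg', gsl: 'gsl' })
  candidates.each do |name, library|
    begin
      require library
      return name
    rescue LoadError
      # Library is not installed; fall through to the next candidate.
      next
    end
  end
  # Nothing native is available, so fall back to the slow pure-Ruby path.
  :ruby
end

backend = detect_svd_backend
puts "Using #{backend} for singular value decomposition"
```

Because a missing gem only raises LoadError, users who install neither native library still get a working (if slow) setup without any configuration.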
Jekyll with classifier-reborn and Numo
My efforts were successful, and classifier-reborn 2.3.0 was released on July 12, 2022 with support for Ruby 3 via the numo-linalg gem! Of course, open source usually isn’t a solo effort, and I owe some thanks to mattr-, who helped get my code merged and released, as well as to everyone who built Jekyll, classifier-reborn, Numo, and LAPACK.
Want to use Jekyll and classifier-reborn with Ruby 3 yourself? Before you can
install and use the numo-linalg gem, you’ll need to install
LAPACK on your system. On Ubuntu, you can do
sudo apt-get install liblapacke-dev libopenblas-dev. Once that’s
done, just add
numo-linalg to your Gemfile. If your Gemfile includes both classifier-reborn and numo-linalg,
classifier-reborn will use Numo to perform singular value decomposition, and
you’ll be able to use Jekyll with
related_posts in Ruby 3!
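For reference, a Gemfile for this setup might look something like the sketch below. The version constraint is my assumption, based on 2.3.0 being the first release with Numo support; adjust it to suit your site.

```ruby
# Gemfile (sketch): the version constraint is an assumption, not a requirement
# stated by the gems themselves.
source "https://rubygems.org"

gem "jekyll"
gem "classifier-reborn", ">= 2.3.0"
gem "numo-linalg"
```

After running bundle install, keep in mind that Jekyll only builds the LSI index when LSI is enabled, for example with the --lsi flag or lsi: true in _config.yml.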