Rodrigo Rosenfeld Rosas

A Review of Code Reloaders for Ruby

Mon, 18 Jul 2016 13:59:00 +0000 (last updated at Mon, 18 Jul 2016 15:15:00 +0000)

When we are writing a service in Ruby, it's very useful to have it automatically pick up the latest changes to the code. Otherwise we'd have to manually restart the server after each change. This would slow down the development flow considerably, especially if the application takes a while before it's ready to process the next request.

I guess most people using Ruby are writing web applications with Rails. Many don't notice that Rails supports auto code reloading out of the box, through ActiveSupport::Dependencies. A few will notice it once they are affected by some corner case where the automatic code reloading doesn't work well.

Another feature provided by Rails is the automatic loading of files when the application follows some conventions, so that the developer is not forced to manually require each file's dependencies. Another benefit is that this behavior is similar to Ruby's autoload feature, whose purpose is to speed up the loading time of applications by not loading files the application won't need. Matz seems to dislike this feature and discouraged its usage 4 years ago. Personally I'd love to see autoload gone as it can cause bugs that are hard to track down. However, loading many files in Ruby is currently slow even if simply reading them from disk would be pretty fast. So, I guess Ruby would have to provide some sort of support for pre-compiled files before deprecating autoload, so that we wouldn't need it for the purpose of speeding up the start-up time.

Since automatic code reloading usually works well enough for Rails applications, most people won't look into code reloaders until they are writing web apps with other frameworks such as Sinatra, Padrino, Roda, pure Rack, or whatever.

This article will review generic automatic code reloaders, including ActiveSupport::Dependencies, but will leave framework-specific ones, like Sinatra::Reloader and Padrino::Reloader, out of scope. I've not checked the Ruby version compatibility of each one, but all of them work on the latest MRI.

Rack::Reloader

Rack::Reloader is bundled with the rack gem. It's very simple but it's only suitable for simple applications in my opinion. It won't unload constants, so if you remove some file or rename some class the old ones will still be available. It works as a Rack middleware.

One can provide the middleware a custom or external back-end, but I'll only discuss the default one, which is bundled with Rack::Reloader, called Rack::Reloader::Stat.

Before each request it traverses $LOADED_FEATURES, skipping .so/bundle files, and calls Kernel.load on each file that has been modified since the last request. Since config.ru is loaded rather than required, it's not listed in $LOADED_FEATURES, so it will never be reloaded. This means that the app's code should live in another file required in config.ru rather than living directly in config.ru. It's worth mentioning because I've been bitten by this more than once while testing Rack::Reloader.

Unlike the Rails approach, any changed file will be reloaded, even if what you modified is some gem's source.
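The traversal described above can be sketched in a few lines. This is not Rack::Reloader's actual code, just a minimal illustration of the idea, assuming a made-up middleware name (StatReloaderSketch) and a made-up module (GreeterSketch) that exists only for the demo:

```ruby
require 'tmpdir'

# Hypothetical middleware: re-loads already-required Ruby files whose mtime changed.
class StatReloaderSketch
  def initialize(app)
    @app = app
    @mtimes = {}
    # Remember the current mtimes so that only later changes trigger a reload.
    ruby_features.each { |path| @mtimes[path] = File.mtime(path) }
  end

  def call(env)
    reload_changed_files
    @app.call(env)
  end

  private

  # Skip native extensions (.so, .bundle, ...) and entries that no longer exist on disk.
  def ruby_features
    $LOADED_FEATURES.select { |path| path.end_with?('.rb') && File.file?(path) }
  end

  def reload_changed_files
    ruby_features.each do |path|
      mtime = File.mtime(path)
      previous = @mtimes[path]
      @mtimes[path] = mtime
      # Files seen for the first time were just loaded, so they are only recorded.
      load path if previous && mtime > previous
    end
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'greeter_sketch.rb')
  File.write(path, "module GreeterSketch; def self.hello; 'v1'; end; end\n")
  require path

  app = StatReloaderSketch.new(->(_env) { [200, {}, [GreeterSketch.hello]] })
  puts app.call({})[2].first # prints "v1"

  File.write(path, "module GreeterSketch; def self.hello; 'v2'; end; end\n")
  File.utime(Time.now, Time.now + 5, path) # make sure the mtime moves forward
  puts app.call({})[2].first # prints "v2"
end
```

Note that the sketch only re-runs the changed file on top of the existing constants; nothing is ever unloaded, which is exactly the limitation mentioned earlier.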

Rack::Reloader issues

I won't discuss performance issues when there are many files loaded, because one could provide another back-end able to track file changes very quickly and because there are more important issues affecting this strategy.

Suppose your application has some code like this:

require 'singleton'

class MyClass
  include Singleton
  attr_reader :my_flag

  def initialize
    @my_flag = false
  end
end

Calling MyClass.instance.my_flag will return false. Now, if you change the code so that @my_flag is assigned to true in "initialize" MyClass.instance.my_flag will still return false.
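The same class of problem can be reproduced without depending on the internals of the Singleton module. The following is a minimal sketch, assuming a hand-rolled memoized accessor in a made-up class (SettingsSketch) and using Kernel.load to play the role of the reloader:

```ruby
require 'tmpdir'

def memoized_instance_demo
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'settings_sketch.rb')
    source = <<~RUBY
      class SettingsSketch
        attr_reader :my_flag

        def self.instance
          @instance ||= new # memoized on the class object, which survives `load`
        end

        def initialize
          @my_flag = %s
        end
      end
    RUBY

    File.write(path, format(source, 'false'))
    load path
    before = SettingsSketch.instance.my_flag

    File.write(path, format(source, 'true'))
    load path # reopens the class and redefines initialize...
    after_reload = SettingsSketch.instance.my_flag # ...but the old object is still cached
    fresh = SettingsSketch.new.my_flag # only newly created objects see the change

    [before, after_reload, fresh]
  end
end

p memoized_instance_demo # [false, false, true]
```

Any object created before the reload keeps the state it was initialized with, since loading a file again only redefines methods; it does not re-run constructors.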

Let's investigate another example where Rack::Reloader strategy won't work:

# assets_processor.rb
class AssetsProcessor
  @@processors = []

  def self.register
    @@processors << self
  end

  def self.process
    @@processors.each(&:do_process)
  end
end

# assets_compiler.rb
require_relative 'assets_processor'

class AssetsCompiler < AssetsProcessor
  register

  def self.do_process
    puts 'compiling assets'
  end
end

# gzip_assets.rb
require_relative 'assets_processor'

class GzipAssets < AssetsProcessor
  register

  def self.do_process
    puts 'gzipping assets'
  end
end

# app.rb
require_relative 'assets_compiler'
require_relative 'gzip_assets'

class App
  def run
    AssetsProcessor.process
  end
end

Running App.new.run will print "compiling assets" and then "gzipping assets". Now, if you change assets_compiler.rb, the file is loaded again and AssetsCompiler gets registered a second time, so "compiling assets" will be printed once more the next time it's called.

This applies to all situations where a given class method is supposed to run only once or where the order in which files are loaded matters. For example, suppose the implementation of AssetsProcessor.register is changed in assets_processor.rb. Since register was already called in its subclasses, the change won't take effect in them, because only assets_processor.rb will be reloaded by Rack::Reloader. Other reloaders discussed here also suffer from these issues, but they provide work-arounds for some of them.
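The duplicate registration is easy to reproduce in isolation. Here's a minimal sketch that mimics what a per-file reloader does by calling Kernel.load a second time on the subclass file only; the class names (ProcessorSketch, CompilerSketch) are made up for the demo:

```ruby
require 'tmpdir'

def duplicate_registration_demo
  Dir.mktmpdir do |dir|
    base = File.join(dir, 'processor_sketch.rb')
    sub  = File.join(dir, 'compiler_sketch.rb')

    File.write(base, <<~RUBY)
      class ProcessorSketch
        @@processors = []

        def self.register
          @@processors << self
        end

        def self.processors
          @@processors
        end
      end
    RUBY

    File.write(sub, <<~RUBY)
      class CompilerSketch < ProcessorSketch
        register
      end
    RUBY

    load base
    load sub
    before = ProcessorSketch.processors.size

    load sub # what a per-file reloader does once compiler_sketch.rb changes
    [before, ProcessorSketch.processors.size]
  end
end

p duplicate_registration_demo # [1, 2]
```

The registry lives in the base class, which was not reloaded, so it keeps the entry from the first load and receives a second one from the reload.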

rerun and shotgun: the reload everything approach

Some reloaders like rerun and shotgun will simply reload everything on each request. They fork at each request before requiring any files, which means those files are never required in the main process. Because they rely on forking, they won't work on JRuby or Windows. This is a safe approach when using MRI on Linux or Mac, though. However, if your application takes a long time to boot, your requests will have high latency in development mode. In that case, if the reason for the slow start-up lies in the framework code and other external libraries rather than in the app-specific code, which is what we want to be reloadable, one can require them before forking to speed things up.

This approach is a safe bet, but unsuitable when running on JRuby or Windows. Also, if loading all of the app-specific code is still slow, one may be interested in looking for faster alternatives. Besides that, this latency will exist in development mode for all requests, even if no files have been changed. If you're working on performance improvements, other approaches will yield better results.
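To make the fork-per-request idea concrete, here's a minimal sketch that doesn't reuse any code from shotgun or rerun. It assumes a platform where fork is available (MRI on Linux or Mac), and the helper (in_fork) and module (ForkedAppSketch) names are made up:

```ruby
require 'tmpdir'

# Runs the block in a forked child and returns its result, marshalled through a pipe.
def in_fork
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.write(Marshal.dump(yield))
    writer.close
    exit!(0) # leave immediately, skipping at_exit handlers inherited from the parent
  end
  writer.close
  result = Marshal.load(reader.read)
  reader.close
  Process.wait(pid)
  result
end

def fork_per_request_demo
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'forked_app_sketch.rb')

    File.write(path, "module ForkedAppSketch; def self.call; 'v1'; end; end\n")
    first = in_fork { load path; ForkedAppSketch.call }

    File.write(path, "module ForkedAppSketch; def self.call; 'v2'; end; end\n")
    second = in_fork { load path; ForkedAppSketch.call }

    # The parent never loaded the file, so there is nothing to unload.
    [first, second, Object.const_defined?(:ForkedAppSketch)]
  end
end

p fork_per_request_demo # ["v1", "v2", false]
```

Anything required in the parent before calling in_fork would be shared by every child, which is how the slow framework code can be kept out of the per-request cost.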

rack-unreloader

Unlike Rack::Reloader, rack-unreloader takes care of unloading constants during reload.

It basically has two modes of operation. One can use "Unreloader.require('dep'){['Dep', ...]}" to require dependencies while also stating which new constants are created, and those will be unloaded during reload. This is the safest approach but it's not transparent. For every required reloadable file we must manually provide a list of constants to be unloaded. On the other hand, this is the fastest possible approach since the reloader doesn't have to try to figure out those constants automatically, as other options mentioned below do. Also, it doesn't override "require", so it's great for those who don't want any monkey patching. Ruby currently does not provide a way to safely discover those constants automatically without monkey patching require, so rack-unreloader is probably the best you can get if you want to avoid monkey patches.

The second mode of operation is to not provide that block, in which case Unreloader will look at changes to $LOADED_FEATURES before and after the call to Unreloader.require to figure out which constants the required file defines. However, without monkey patching "require" this mode can't be reliable, as I'll explain in the sub-section below.

Before getting into it, there's another feature of rack-unreloader that speeds up reloading by only reloading the changed files, unlike the other options I'll explore below in this article. However, reloading just the changed files is not always reliable, as I've discussed in the Rack::Reloader Issues section.

Finally, unlike other libraries, rack-unreloader actually calls "require" rather than "load" and deletes the reloaded files from $LOADED_FEATURES before the request so that calling "require" will actually reload the file.

rack-unreloader Issues
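The require-based trick can be illustrated with plain Ruby. This is only a sketch of the underlying idea, not rack-unreloader's implementation, and DepSketch is a made-up constant:

```ruby
require 'tmpdir'

def require_again_demo
  # Start from a clean slate in case the demo runs more than once.
  Object.send(:remove_const, :DepSketch) if Object.const_defined?(:DepSketch)

  Dir.mktmpdir do |dir|
    path = File.join(dir, 'dep_sketch.rb')
    File.write(path, "module DepSketch; VERSION = 1; end\n")

    features_before = $LOADED_FEATURES.dup
    require path
    added = $LOADED_FEATURES - features_before # entries this require added
    first = DepSketch::VERSION

    File.write(path, "module DepSketch; VERSION = 2; end\n")
    ignored = require(path) # false: already listed, so the file is not run again
    stale = DepSketch::VERSION

    Object.send(:remove_const, :DepSketch) # unload the constant
    added.each { |feature| $LOADED_FEATURES.delete(feature) }
    reloaded = require(path) # true: the file is evaluated again

    [first, ignored, stale, reloaded, DepSketch::VERSION]
  end
end

p require_again_demo # [1, false, 1, true, 2]
```

Because the constant is removed before the file runs again, the new definition starts from scratch instead of reopening the old module.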

It's only reliable if you always provide the constants defined on each Unreloader.require() call. This is also the fastest approach. It may be a bit boring to write code like this. Also, even in this mode, it's only reliable if your application works fine regardless of the order each file is reloaded (I've shown an example in the Rack::Reloader Issues section demonstrating how this approach is not reliable if this is not the case).

Let's explore why the automatic approach is not reliable:

# t.rb:
require 'json'

module T
  def self.call(json)
    JSON.parse(json)
  end
end

# app.rb:
require 'rack/unreloader'
require 'fileutils'

Unreloader = Rack::Unreloader.new{ T }
Unreloader.require('./t.rb') # {'T'} # providing the block wouldn't trigger the error
Unreloader.call '{}'
FileUtils.touch 't.rb' # force file to be reloaded
sleep 1 # there's a default cooltime delay of 1s before next reload
Unreloader.call '{}' # NameError: uninitialized constant T::JSON

Since rack-unreloader does not override "require", it can't track which files define which constants in a reliable way. So, it thinks 't.rb' is responsible for defining JSON and will then unload JSON (which has some C extensions that cannot be unloaded). This also affects JRuby if the file imports some Java package, among other similar cases. So, if you want to work with the automatic approach with rack-unreloader, you'd have to require all those dependencies before running Unreloader.call. This is very error-prone, which is why I think it's mostly useful if you always provide the list of constants expected to be defined by the required dependency.

However, rack-unreloader provides a few options like "record_dependency", "subclasses" and "record_split_class" to make it easier to specify the explicit dependencies between files so that the right files are reloaded. But that means the application author must have a good understanding of how auto-reloading works and how their dependencies work, and must also fully specify the dependencies. It can be a lot of work, but it may be worth it in case reloading all reloadable files takes a lot of time. If you're looking for the fastest possible reloader, then rack-unreloader may well be your best option.
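The misattribution can be shown without rack-unreloader at all. The sketch below assumes a naive detector that snapshots the top-level constants before and after a require; the helper name and the two modules (HelperSketch standing in for JSON, TSketch for T) are made up:

```ruby
require 'tmpdir'

# Hypothetical helper: requires the file and returns the top-level constants that appeared meanwhile.
def constants_defined_while_requiring(path)
  before = Object.constants
  require path
  Object.constants - before
end

def misattribution_demo
  # Start from a clean slate in case the demo runs more than once.
  %i[HelperSketch TSketch].each do |name|
    Object.send(:remove_const, name) if Object.const_defined?(name)
  end

  Dir.mktmpdir do |dir|
    File.write(File.join(dir, 'helper_sketch.rb'), "module HelperSketch; end\n")
    File.write(File.join(dir, 't_sketch.rb'), <<~RUBY)
      require_relative 'helper_sketch'
      module TSketch; end
    RUBY

    constants_defined_while_requiring(File.join(dir, 't_sketch.rb')).sort
  end
end

p misattribution_demo # [:HelperSketch, :TSketch]
```

From the outside, the nested require is invisible, so both constants look like they belong to t_sketch.rb and both would be unloaded when that file changes.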

ActiveSupport::Dependencies

Now we're talking about the reloader behind Rails, which is great and battle tested and one of my favorites. Some people don't realize it's pretty simple to use it outside Rails, so let me demonstrate how it can be used since it seems it's not widely documented.

Usage

require 'active_support' # this must be required before any other AS module as per documentation
require 'active_support/dependencies'
ActiveSupport::Dependencies.mechanism = :load # or :require in production environment
ActiveSupport::Dependencies.autoload_paths = [__dir__]

require_dependency 'app' # optional if app.rb defines App, since it also supports autoloading
puts App::VERSION
# change version number and then:
ActiveSupport::Dependencies.clear
require_dependency 'app'
puts App::VERSION

Or, in the context of a Rack app:

require 'active_support'
require 'active_support/dependencies'

if ENV['RACK_ENV'] == 'development'
  ActiveSupport::Dependencies.mechanism = :load
  ActiveSupport::Dependencies.autoload_paths = [__dir__]

  run ->(env){
    ActiveSupport::Dependencies.clear
    App.call env
  }
else
  ActiveSupport::Dependencies.mechanism = :require
  require_relative 'app'
  run App
end

How it works

ActiveSupport::Dependencies has a quite complex implementation and I don't really have a solid understanding of it so please let me know about my mistakes in the comments section so that I can fix them.

Basically it will load dependencies in the autoload_paths or require them, depending on the configured mechanism. It keeps track of which constants are added by overriding "require". This way it knows that JSON was actually defined by "require 'json'" when that is called during "require_dependency 't'", and would detect that T was the new constant defined by 't.rb' and the one that should be unloaded upon ActiveSupport::Dependencies.clear. Also, it doesn't reload only the individual changed files but unloads all reloadable files on "clear". This is less likely to cause problems, as I've explained in the previous section. It's also possible to configure it to use an efficient file watcher, like the one implemented by the 'listen' gem, which uses an evented approach based on system calls provided by the OS. This way, one can skip the "clear" call if the loaded reloadable files have not been changed, speeding up the request even in development mode.

ActiveSupport::Dependencies supports a hooks system that allows others to observe when some files are loaded and take some action. This is especially useful for Rails engines when you want to run some code only after some dependency has been loaded, for example.

ActiveSupport::Dependencies is not only a code reloader; it also implements an automatic code loader by overriding Object's const_missing to try to require the code that would define the missing constant by following some conventions. For example, the first time one attempts to use ApplicationController, since it's not defined, it will look in the search paths for an 'application_controller.rb' file and load it. That means the start-up time can be improved since we only load code we actually use. However, this could lead to some issues that would make the application behave differently in production due to side effects caused by the order in which some files would be loaded. But Rails applications have been built around this strategy for several years and it seems such caveats have only affected a few people. Those cases can usually be worked around through "require_dependency".

If your code doesn't follow the naming convention it will have to use "require_dependency". This way, if ApplicationController is defined in controllers/application.rb, you'd use "require_dependency 'controllers/application'" before using it.
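The naming convention behind this kind of autoloading can be sketched with const_missing alone. This is not how ActiveSupport::Dependencies is implemented; it's a minimal illustration that assumes a made-up namespace (AutoloadSketch), so that Object itself is not patched, and a made-up search_paths setting:

```ruby
require 'tmpdir'

# Hypothetical namespace, so that the sketch does not patch Object itself.
module AutoloadSketch
  class << self
    attr_accessor :search_paths

    # Called by Ruby when AutoloadSketch::Something is not defined yet.
    def const_missing(name)
      file_name = "#{underscore(name.to_s)}.rb"
      path = search_paths.map { |dir| File.join(dir, file_name) }.find { |f| File.file?(f) }
      return super unless path # no matching file: raise the usual NameError

      load path
      const_defined?(name, false) ? const_get(name, false) : super
    end

    private

    # ApplicationController -> application_controller
    def underscore(camel_cased)
      camel_cased.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
    end
  end

  self.search_paths = []
end

def autoload_sketch_demo
  Dir.mktmpdir do |dir|
    File.write(File.join(dir, 'application_controller.rb'), <<~RUBY)
      module AutoloadSketch
        class ApplicationController; end
      end
    RUBY

    AutoloadSketch.search_paths = [dir]
    AutoloadSketch::ApplicationController.name # triggers const_missing on first use
  end
end

puts autoload_sketch_demo # prints "AutoloadSketch::ApplicationController"
```

A class defined in a file whose name doesn't match the underscored constant name would never be found this way, which is the situation where an explicit call such as "require_dependency" becomes necessary.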

Why I don't like autoload

Personally I don't like autoloading in general and always prefer explicit dependencies in all my Ruby files, so even in my Rails apps I don't rely on autoloading for my own classes. The same applies to Ruby's built-in "autoload" feature. I've already been bitten by an autoload-related bug when trying to use ActionView's number helpers by requiring the specific file I was interested in. Here's a simpler use case demonstrating the issue with "autoload":

# test.rb
autoload :A, 'a'
require 'a/b'

# a.rb
require 'a/b'

# a/b.rb
module A
  module B
  end
end

# ruby -I . test.rb
# causes "...b.rb:1:in `<top (required)>': uninitialized constant A (NameError)"

It's not obvious what's happening here since the message doesn't point to the real problem, and it gets even harder to understand in a real, complex code base. Requiring 'a/b' before requiring 'a' will cause a circular dependency issue. When "module A" is seen inside "a/b.rb", it doesn't exist yet and the "autoload :A, 'a'" tells Ruby it should require 'a' in that case. So, this is what it does, but 'a.rb' will require 'a/b.rb', which we were trying to load in the first place. There are other similar problems caused by autoload and that's why I don't use it myself despite the potential of loading the application faster. Ideally Ruby should provide support for some sort of pre-compiled (or pre-parsed) files, which would help big applications speed up code loading, since the bottleneck is not the disk I/O but the Ruby parsing itself.

ActiveSupport::Dependencies Caveats

ActiveSupport::Dependencies is a pretty decent reloader and I guess most people are just fine with it and its known caveats. However, there are some people, like me, who are pickier.

Before I get into the picky parts, let's explore the limitations one has to keep in mind when using a reloader that relies on running the code of some file multiple times. The only really safe strategy I can think of for handling auto-reloading is to completely restart the application or to use the fork/exec approach. They have their own caveats, like being slower than the alternatives, so it's always about trade-offs when it comes to auto-reloaders. Running some code more than once can lead to unexpected results since not all actions can be rolled back.

For example, if you include some module in ::Object, this can't be undone. And even if we could work around it, we'd have to detect such cases automatically, which would perform so badly that it would probably be better to simply restart everything. This applies to monkey patching, to creating constants in namespaces which are not reloadable (like defining JSON::CustomExtension) and to similar situations. So, when we are dealing with automatic reloaders we should keep that in mind and understand that reloading will never be perfect unless we actually restart the full application (or use fork/exec). ActiveSupport::Dependencies provides some options, such as autoload_once_paths, so that such code isn't executed more than once, but if you have to change such code then you'll be forced to restart the full application.
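Here's a minimal sketch of why an inclusion can't be rolled back by unloading a constant. HostSketch stands in for a class that is never reloaded (such as Object) and ExtensionSketch for a reloadable module; both names are made up:

```ruby
module ExtensionSketch
  def shout
    'old behaviour'
  end
end

class HostSketch # stands in for a class that is never reloaded, such as Object
  include ExtensionSketch
end

# "Unloading" only removes the name; the module object stays in HostSketch's ancestors.
Object.send(:remove_const, :ExtensionSketch)

module ExtensionSketch # reloading the file creates a brand-new module object
  def shout
    'new behaviour'
  end
end

puts HostSketch.new.shout # prints "old behaviour"
puts HostSketch.ancestors.include?(ExtensionSketch) # prints "false"
```

The host class still dispatches to the old module, and including the new one on top would merely stack a second module in the ancestors chain rather than replace the first.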

Also, any file actually required rather than loaded (either with require or require_relative) won't be auto-reloaded, which forces the author to always use require_dependency to load files that are supposed to be reloadable.

Here's what I dislike about it:

  • ActiveSupport::Dependencies is part of ActiveSupport and relies on some monkey patches to core classes. I try to avoid monkey patching core classes at all costs so I don't like AS in general due to its monkey patching approach;
  • Autoloading is not opt-in as far as I know, so I can't opt out, and I'd rather not use it;
  • Since some Ruby sources will make use of "require_dependency" and since some Rails related gems may rely on the automatic autoloading feature provided by ActiveSupport::Dependencies it forces applications to override "require" and use ActiveSupport::Dependencies even in production mode;
  • If your application doesn't rely on ActiveSupport then this reloader will add some overhead to the download phase of Bundler.

Conclusion

Among the options covered in this article, ActiveSupport::Dependencies is my favorite one although I would consider rerun or shotgun when running on MRI and Linux if the application starts quickly and I wouldn't have to work on performance improvements (in that case, it's useful to have the behavior of performing like in production when no files have been changed).

Basically, if your application is fast to load then it may make sense to start with rerun or shotgun since they are the only real safe bets I can think of.

However, I took a few measurements in my application and decided it was worth creating a new transparent reloader that would also fix some of the caveats I see in ActiveSupport::Dependencies. I wrote a new article about auto_reloader.

If you know about other automatic code reloaders for Ruby I'd love to know about them. Please let me know in the comments section. Also let me know if you think I misunderstood how any of those mentioned in this article actually works.
