Rodrigo Rosenfeld Rosas

Creating web app monoliths that boot instantly with Ruby

2024-10-25T20:35:00+00:00

What if I told you that it’s not only possible, but actually pretty simple, to write a huge monolith web app that boots in under a second using Ruby?

Also, it doesn’t matter how big it grows, it will still boot in under a second. So, what’s the catch? How can it work?

Well, the key thing to boot quickly, whichever stack you pick, is to lazily load as much as you can. If you architecture your web app to load everything lazily, then it will boot at no time.

This is a long article, feel free to jump ahead to the last section if you’re only curious about the solution.

Why does boot time matter?

You may have been wondering: “so, what’s the point of lazily loading everything? It’s still going to spend time loading slow code when serving a request!”. And you’re right! But it doesn’t mean that instant boot isn’t valuable at all.

If you’re adopting canary deployment, for example, it doesn’t really matter too much if your deployment takes 20 minutes or an hour to happen. Unless you need to deploy an urgent fix (security patch or severe bug), of course!

However, during the development of the application, the boot time matters a lot! I’ve worked in apps that would take several seconds to boot before I could start using the app and add features or fix bugs, that’s a terrible experience, but it’s not that much of an issue if it only happens once in a while and code reloading happens pretty fast.

Automated tests benefit a lot from fast boot

The main benefit of having quick boot though is the ability of running individual automated tests very quickly.

Imagine yourself hunting some hard-to-debug flaky test. You’re probably going to run it many many times until you can figure out what’s causing it to fail sometimes. This is how the process usually goes:

If you’re lucky, you’re able to reproduce by specifying the same seed reported by CI, so you run the test with the seed and confirm it’s failing locally too;
You make some changes to the code or to the test, in an attempt to detect the reason why it’s failing;
You run the test again;
Repeat steps 2 and 3 several times until you find the culprit and fix the flaky test.

If your test takes 30 secs just to boot your app before you test it, that means fixing the flaky test will take 10 minutes at least only running the tests if you repeat steps 2 and 3 20 times. Add to this the frustration of having to wait over 30 secs before you know if your last attempted change resulted in any progress.

What if your app boots instantly and only 3 secs are required to load all dependencies for the particular request you’re exercising in your test? Suddenly, if you repeat the steps 20 times it will only take 1 minute running your test, compared to the 10 minutes from the previous scenario. And you only need to wait for 3 secs before you know if your change resulted in any progress. And it feels even better in tests where loading all required dependencies takes less than a second.

That’s the main advantage of lazily loading your code!

Rake tasks also benefit from fast boot

If some of your Rake tasks rely on booting your app first, then running that task will take over 30 secs, if this is what it takes to boot your app. Even if your task doesn’t require all dependencies loaded during the boot process.

As an example, if you’re working on a Rails app and run the routes task, it will require the app to boot. It’s not exactly a Rake task but the idea is the same: you’re running a task that relies on your app being booted. If your app takes 15 secs to boot, bin/rails routes will also take 15 secs to run.

Automatic setup has drawbacks

I’ve been interviewing Ruby candidates in the past few weeks and I often ask them what they like in Rails and almost every candidate tells me that “things just work in Rails”. No configuration required. No manual setup. This is perceived by most people as a great advantage in Rails over the alternatives.

After all, it’s a great feeling to start working on some app you’ve never seen before and you know exactly where to look when you’re searching for controllers, models, routes, settings and tests, right?

Conventions are great, indeed, I agree. But sometimes people will abuse from automatic setup (in my opinion, of course).

Let’s take the web-console gem, bundled with Rails by default, as an example. Once you require the gem, it will install a bunch of hooks, by calling Rails::Railtie.initialize with blocks that will add a middleware to the app, among other initialization tasks. Rails engines assume the existence of a singleton Rails application (Rails.application). During the boot process, this singleton app will call those hooks registered by the required Rails engines, such as web-console.

For the sake of easiness, lots of Rails engines follow this pattern for automatic setup. So, instead of providing users with instructions on how to add the provided middlewares to the app, those engines will automatically add the middleware for you. All you have to do is bundle add web-console and you’re done. No need to perform any extra steps, super easy!

But such a feature doesn’t come for free. First, Rails will load all gems from Gemfile in the default and environment groups by default. The default group is the one containing all gems declared in the top-level group. You can create as many groups in Gemfile as you want and Rails will automatically load the default and “test” groups by default when you load the application in the test environment.

Rails calls Bundler.require(*Rails.groups) in config/application.rb, which loads all project’s gems by default for the selected environment. So, unless you create a separate group for gems you want to lazily load, you must explicitly add require: false to the gem declaration.

By the way, by no means I’m suggesting there’s a problem with the web-console railtie itself. After all, it’s only loaded in the development environment by default, and you can even configure it to mount in an arbitrary path if you dislike the default one (“/__web_console”). I’m more worried about the available pattern, which can be abused by Rails engines.

Automatic setup makes lazy loading nearly impossible sometimes

If your app is designed in a similar way as Rails engines work, and dependencies rely on the ability of hooking into the boot process, then you can’t lazily load that dependency if it has to be required during the boot phase.

The top-level app should have minimal dependencies

Rack allows applications to mount other Rack applications on top of it. That allows us to build a monolith consisting of many Rack apps, mounted on the top of the main app.

This is a good step in the direction of achieving instant boot, however, it’s not an enough condition. Unless you’re able to lazily load those apps, you still must load them with their dependencies, and you won’t achieve instant boot.

If you add a dependency to the top-level route definitions, it will add to the boot time. Unless you architect your app in a way that allows you to lazily load your dependencies, you can’t guarantee that it will boot instantly even if it gets huge with hundreds or thousands of dependencies.

For example, you can’t lazily load Devise in a Rails application (try it if you don’t believe me). So, even if you want to test some model, you have to pay the cost of loading Devise and all its dependencies because in Rails the models also require the application to be initialized (in a typical setup).

If you know how to design a Rails app to always initialize in under 2s, no matter how big it gets, please let me know in the comments. For the remainder of this article, I’m assuming it’s not currently possible with Rails, so I’m going to provide examples with alternative libraries.

Hands-on: let’s build a web application with instant boot

How fast is Ruby?

If our goal is to boot our app within a second, we must first check whether Ruby can run boot itself that fast:

/usr/bin/time ruby -e ''
        0.27 real         0.07 user         0.04 sys

Awesome! We have 730 ms left to use in our own code. Let’s add a few dependencies and see how much is left:

BOOTSNAP_CACHE_DIR=tmp/cache /usr/bin/time ruby -r bundler/setup -r bootsnap/setup -e ''
        0.59 real         0.24 user         0.16 sys

That’s fine, we still have 410 ms left for our code. Can we load Rails within 410 ms?

BOOTSNAP_CACHE_DIR=tmp/cache /usr/bin/time ruby -r bundler/setup -r bootsnap/setup \
    -r rails -r action_controller/railtie \
    -e 'class MyApp < Rails::Application; config.api_only = true; end; MyApp.initialize!'
        1.27 real         0.51 user         0.50 sys

Nope, if we really want to boot within a second, we can’t use Rails. Less than 2 secs is still good enough, however there are many more serious reasons why we can’t guarantee fast booting with big Rails apps, which I’ll explore in-depth in another article.

Bare Rack app example

In the beginning of the article I said it was not only possible, but simple to write a web app with instant boot in Ruby. Let’s start demonstrating how it can be achieved with a pure Rack app:

1	# config.ru
2
3	require "rack/builder"
4	app = Rack::Builder.new do
5	map "/up" do
6	run lambda { \|env\| [ 200, { 'content-type' => 'application/json' }, [ '{ "status": "ok" }' ] ] }
7	end
8
9	map "/heavy_app" do
10	run lambda { \|env\|
11	require_relative "config/environment"
12	Rails.application.call env
13	}
14	end
15	# as many map blocks here as big your monolith grows
16	end

As you can see, the idea is to split the huge app into many smaller apps. If you exercise the “/heavy_app” endpoint in some test, of course it will take longer to complete if the app responding to that endpoint is a heavy one. However, if you’re running some test to check the “/up” endpoint, then it will complete in under a second.

This is extremely useful when working on individual tests. Ideally we should be only loading what the test is exercising. That way, we avoid having to wait for over 10 seconds every time we need our app to boot even if we only require a few dependencies to test what we want.

You might be concerned that this could lead to uncaught bugs in production because we may have forgotten to require something and the app could break in production if some requests were made in a different order. Or maybe you’d just prefer to eagerly load everything in production, like Rails does by default.

While it’s simple to modify the app above to support that feature (and eager load the app when running all tests in CI), I’m assuming you can do that by yourself, and I’ll present yet another solution using a Roda app, adding auto-reloading and eager loading support.

A Roda app example

Roda is a library created by Jeremy Evans, the maintainer of the (not so) popular Sequel gem. I’ve written about it already years ago, so in this article I’m going to focus on the solution itself.

I know many of you enjoy automatic code loading, so I’ll add the zeitwerk gem to provide both auto-reloading and auto-loading features to this app, the same way it works in a standard Rails app. Personally I’m not a fan of autoloading, so I use the auto_reloader gem instead (authored by me, by the way), but it’s a matter of preference. I’ve written about it before too. It doesn’t really matter for the purpose of building a huge monolith with instant boot.

We’re also using a very useful Roda plugin called multi_run in this example.

Let’s start by adding all dependencies:

mkdir my-huge-app
cd my-huge-app
git init
bundle init
bundle add puma roda zeitwerk rack-test cucumber ostruct logger

In case you’re curious, some of the dependencies rely on ostruct and logger but don’t explicitly depend on them, and they are no longer bundled with newer Ruby releases.

Then we’re going to create an useful boot file that we can load in our tests too:

1	# boot.rb
2
3	require "bundler/setup"
4	ENV["BOOTSNAP_CACHE_DIR"] \|\|= "tmp/cache"
5	require "bootsnap/setup"
6	require "zeitwerk"
7
8	APP_ENV = ENV["RACK_ENV"] \|\| "development"
9	loader = Zeitwerk::Loader.new
10	loader.push_dir File.expand_path("app", __dir__)
11	loader.enable_reloading if APP_ENV == "development"
12	loader.setup
13
14	loader.eager_load if APP_ENV == "production" \|\| ENV["CI"]
15
16	AppLoader = loader

I’m keeping this example very simple, but in a real app you’d be interested in using a proper Settings class, in which you’d configure the wanted behavior by environment, such as enabling auto-reloading and eager-loading, for example.

Now, the top-level app:

1	# app/apps/main.rb
2
3	require "roda"
4
5	module Apps
6	class Main < Roda
7	plugin :json
8	plugin :multi_run
9
10	route do \|r\|
11	r.get "up" do
12	{ status: "ok", env: APP_ENV }
13	end
14
15	run_app "some_path", ->{ Apps::SomeApp }
16	run_app "another_path", -> { Apps::AnotherApp }
17
18	# or something like:
19	Dir[File.expand_path("config/subapps/*.rb", APP_ROOT)].each{ \|subapp\| load subapp }
20	end
21
22	def self.run_app(path, app)
23	if APP_ENV == "production" \|\| ENV["CI"]
24	request.run path, app[]
25	else
26	request.run path, ->(env) {
27	app[].freeze.app.call env
28	}
29	end
30	end
31	end
32	end

And finally the web server entry point:

1	# config.ru
2
3	require_relative 'boot'
4
5	if APP_ENV == "development"
6	run lambda { \|env\|
7	AppLoader.reload
8	Apps::Main.freeze.app.call env
9	}
10	else
11	run Apps::Main.freeze.app
12	end

As you can see, it’s a really simple setup that allows you to boot a huge app in under a second. Go ahead and add tons of gems to Gemfile and notice how the app will still boot instantly. Add as many apps to it, add some "sleep 10" statements to them, and you should still notice the immediate boot.

Testing the app

It’s now time to see one of the main benefits of this approach by testing it.

For the sake of simplicity, I’m using test/unit to test it, but you should get the same results with RSpec.

test/test_runner.rb:

1	require_relative "test_helper"
2
3	exit Test::Unit::AutoRunner.run(true, __dir__)

test/test_helper.rb:

1	require_relative "../boot"
2	require "test/unit"
3	require "rack/test"

test/integration/health_test.rb:

1	require_relative "../test_helper" # allows for `ruby test/integration/health_test.rb`
2
3	class HealthTest < Test::Unit::TestCase
4	include Rack::Test::Methods
5
6	def app
7	Apps::Main.freeze.app
8	end
9
10	def test_response_is_ok
11	get "/up"
12	assert last_response.ok?
13	end
14	end

You don’t need require_relative "../test_helper" in your tests if you’re just using ruby test/test_runner.rb, but it simplifies running those test files individually by simply calling ruby test/integration/health_test.rb instead of ruby test/run_test.rb --location test/integration/health_test.rb.

The test runs instantly:

/usr/bin/time ruby test/run_test.rb
Loaded suite test
Started
Finished in 0.008055 seconds.
-------------------------------------------------------------------------------------
1 tests, 1 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
100% passed
-------------------------------------------------------------------------------------
124.15 tests/s, 124.15 assertions/s
        0.77 real         0.34 user         0.22 sys

Wow! That was fast! Not only does the application boot in less than a second, but we can also test it within a second.

How about Cucumber?

Testing straightforward requests with test/unit completes within a second, but can we also achieve subsecond testing with Cucumber? Well, let’s give it a try, shall we?

First we run:

bundle binstubs --all
bin/cucumber --init

Let’s edit features/support/env.rb:

1	require_relative '../../test/test_helper'
2
3	module CucumberApp
4	def app
5	@app \|\|= Apps::Main.freeze.app
6	end
7	end
8
9	World(Test::Unit::Assertions)
10	World(Rack::Test::Methods)
11	World(CucumberApp)

Let’s create a simple test (features/health.feature):

1	Feature: Health check
2
3	Scenario: Health check monitoring
4	Given a monitoring service checks for the application health state
5	Then it responds with the current status

And the step definitions for this test (features/step_definitions/health.rb):

1	Given('a monitoring service checks for the application health state') do
2	get "/up"
3	end
4
5	Then('it responds with the current status') do
6	assert last_response.ok?
7	end

Let’s run the test:

/usr/bin/time bin/cucumber
Using the default profile...
Feature: Health check

  Scenario: Health check monitoring                                    # features/health.feature:3
    Given a monitoring service checks for the application health state # features/step_definitions/health.rb:3
    Then it responds with the current status                           # features/step_definitions/health.rb:7

1 scenario (1 passed)
2 steps (2 passed)
0m0.013s
        1.07 real         0.53 user         0.32 sys

Wow, that was really close! Maybe it’s time to ask my company for a hardware upgrade.

Summary

In this article we’ve seen that with some simple architecture we can achieve instant boot while designing a Ruby web application, no matter how big it gets, as long as we lazily load our code. We’ve also seen how it allows us to run individual simple tests very quickly.

We have also briefly discussed how such an approach would be a challenge to implement in a Rails app. In my next article I’ll explore what we can do to improve the boot time of Rails applications by applying the same idea of lazy loading as much as we can.

Why is your Rails app boot slow?

2024-10-25T20:35:00+00:00

In my previous article we explored how we can build huge web app monoliths with Ruby that can complete simple request tests in under a second. In this article we’ll focus on Rails apps. We’ll identify what causes a big app to take a long time to boot and run simple tests and what we can do to improve the situation.

How fast can Rails boot?

Let’s start by investigating how fast we can expect a Rails app to boot. Let’s check against a minimal Rails app first:

1	require 'bundler/inline'
2
3	ENV["BOOTSNAP_CACHE_DIR"] \|\|= "tmp/cache"
4
5	gemfile do
6	gem 'bootsnap', require: 'bootsnap/setup'
7	gem 'minitest', require: 'minitest/autorun'
8	gem 'ostruct', require: false
9	gem 'rack-test', require: 'rack/test'
10	gem 'railties', require: 'rails'
11	gem 'actionpack', require: 'action_controller/railtie'
12	end
13
14	class MinimalApp < Rails::Application
15	config.root = __dir__
16	config.eager_load = false
17	config.hosts << "example.org"
18	end
19
20	MinimalApp.initialize!
21
22	MinimalApp.routes.draw do
23	get 'up' => 'rails/health#show'
24	end
25
26	describe "/up" do
27	include Rack::Test::Methods
28	def app = MinimalApp
29
30	it "responds with the server status" do
31	get "/up"
32	assert last_response.ok?
33	end
34	end

Let’s run this test:

/usr/bin/time ruby minimal_test.rb
Run options: --seed 43022

# Running:

.

Finished in 0.066011s, 15.1490 runs/s, 15.1490 assertions/s.

1 runs, 1 assertions, 0 failures, 0 errors, 0 skips
        1.32 real         0.57 user         0.49 sys

We can’t get subsecond testing with Rails using hardware commonly used in a development environment, like we did with Rack and Roda in my previous article, but we can get pretty close with a minimal Rails app. How about a full Rails app?

rails new full_rails_app
cd full_rails_app

Add this test file to the project:

1	# test/system/health_test.rb
2
3	require "application_system_test_case"
4
5	class HealthCheckTest < ApplicationSystemTestCase
6	driven_by(:rack_test)
7
8	test "returns the server status" do
9	visit "/up"
10	assert_selector "body", style: "background-color: green"
11	end
12	end

Let’s test it:

/usr/bin/time bin/rails test:all
Running 1 tests in a single process (parallelization threshold is 50)
Run options: --seed 16351

# Running:

.

Finished in 0.141745s, 7.0549 runs/s, 7.0549 assertions/s.
1 runs, 1 assertions, 0 failures, 0 errors, 0 skips
        2.48 real         1.13 user         1.02 sys

Not that bad. This is what we can expect from a fresh full Rails app. If we can keep our tests running that fast, that should be good enough.

How about Spring?

If you’re reading this article, chances are that you are from the days where Spring was added by default when you ran “rails new app”. You might be asking: what about Spring?

Let’s add spring and measure again.

echo 'gem "spring", group: :development' >> Gemfile
bundle
bundle exec spring binstub

/usr/bin/time bin/spring rails test test/system
Running via Spring preloader in process 63159
Running 1 tests in a single process (parallelization threshold is 50)
Run options: --seed 14564

# Running:

.

Finished in 0.166576s, 6.0033 runs/s, 6.0033 assertions/s.
1 runs, 1 assertions, 0 failures, 0 errors, 0 skips
        0.97 real         0.18 user         0.11 sys

Yay, there we go, subsecond testing with Rails!

Why can’t we use “bin/spring rails test:all” (or test:system)? Spring supports specific Rails commands such as “test”, “console” and “runner”, but not all commands by default. Running “test:system”, “test:all”, “routes” and other commands will not go through the Spring server.

For those that don’t know what Spring is, it preloads the Rails app, initializes it (boot) and waits for commands. This is the Spring server. Then, the Spring client will send the commands to the server, which will then fork and run the command after the application has already been initialized. That’s why it can complete tests in under a second, since the app was already initialized when the command was run. It also watches some files such as the initializers and will restart the server when those files change.

Why isn’t Spring bundled with Rails by default anymore? I don’t really know, but Spring is actually a hack. You have to extend it to support more commands, such as “rspec” and “cucumber”, by installing additional gems, or write your own if you want to add support for “test:system” and “test:all”, for example.

There are over a hundred open issues in the Spring repository to this date, which indicates that such a hack is not as robust as one might think. Having said that, if your application loads a lot of code (gems, initializers, etc) during its initialization, adding Spring to your app can significantly improve your test boot time with minimal effort. But if you’re able to keep your boot fast, that’s certainly better since all commands will benefit from it, not just those supported by Spring, and you avoid all Spring pitfalls.

Anti-patterns often used in Rails apps

Let’s talk about some common patterns found in old and big Rails apps that explain why even very simple tests can take several seconds to complete. Those are most likely the reason why a Rails app takes so long to boot.

Auto-required gems

In Rails, every gem added to the default and environment groups in Gemfile will be automatically required during the application initialization. This happens through this line in config/application.rb:

1	Bundler.require(*Rails.groups)

“Rails.groups” defaults to [:default, "development"] when running Rails in the development environment, for example.

The more gems you add to the Gemfile default groups, the longer it will take for the application to initialize. The application will boot faster if we add “require: false” to the gem declaration, or if we create a separate group such as:

1	# Gemfile
2	# ...
3	group :lazily_loaded_gems
4	gem "graphql"
5	gem "sidekiq"
6	# ...
7	end

For the production environment, and for the test environment when ENV["CI"] is present, we could add the :lazily_loaded_gems group to the default Rails groups. For example, you can replace the Bundler.require line in config/application.rb with:

1	# config/application.rb
2	# ...
3	bundler_groups = Rails.groups
4	bundler_groups << :lazily_loaded_gems if Rails.env.production? \|\| ENV["CI"].present?
5
6	Bundler.require(*bundler_groups)

The downside is that this will force us to explicitly require those gems when our code depends on them, or we’ll get errors in the development and test environments.

Rails initializers

Here’s what the documentation has to say about initializers:

https://guides.rubyonrails.org/configuring.html#using-initializer-files

After loading the framework and any gems in your application, Rails turns to loading initializers. An initializer is any Ruby file stored under config/initializers in your application. You can use initializers to hold configuration settings that should be made after all of the frameworks and gems are loaded, such as options to configure settings for these parts.

Lots of gems will then suggest you to add some initializers to your project. The Devise generator will add this initializer.

The “mongoid:config” generator will add another initializer. The carrier_wave gem will suggest another initializer to configure the gem. Same happens for many other gems such as airbrake, simple_form, draper, geocoder, kaminari, money and many more popular gems. Add to this the project’s own internal gems. Now, a very simple test to check the “/up” health-check endpoint ends up loading lots of irrelevant code to the test. As a result, such a single test would take a long time to complete.

One should carefully decide whether it’s worth adding a new initializer to the application. Adding an initializer is very simple and allows us to quickly move on, but at the cost of slowing down the app’s boot a little bit more.

Just like with the issue of gems getting required by default, we can opt out by avoiding creating such initializers whenever possible. And the same drawback applies to this case, as we’re supposed to initialize those gems whenever our code depends on them.

For example, suppose we have the following initializer in our app:

1	# config/initializers/airbrake.rb
2
3	Airbrake.configure do
4	# ...
5	end

We could instead replace it with:

1	# we want to enable Airbrake during the application
2	# initialization in the production environment:
3	require "setup_airbrake" if Rails.env.production?

Then, we create “lib/setup_airbrake.rb”:

1	require "airbrake"
2	Airbrake.configure do
3	# ...
4	end

Suppose we have some custom filter configured for Airbrake and we want to test it. Then, in our test we would do something like:

1	# spec/params_sanitizer_airbrake_filter_spec.rb
2	require "rails_helper"
3	require "setup_airbrake"
4
5	RSpec.describe ParamsSanitizerAirbrakeFilter do
6	# ...
7	end

In many cases we can completely get rid of lots of initializers. For example, there’s no need to initialize geocoder if we’re not using it. Just require “setup_geocoder” once you depend on it somewhere.

There are other cases where they can’t be easily avoided though. For example, some gems must be loaded before we define the app’s routes, which happens during the initialization. It doesn’t matter if our test isn’t exercising a GraphQL controller, or some controller protected by Devise, we would still have to load both gems so that we can define the app’s routes. Unfortunately I’m not aware of any tricks we could apply to lazily load such code.

How about simple_form? It’s not required by the routes. What if you’re testing an api-only controller? Is it required to load simple_form? Well, an option would be to prepend ApplicationController with:

1	require "setup_simple_form"

This way, if you’re testing some controller inheriting from APIController instead, then it won’t load simple_form.

Surely this approach requires discipline but it also enables your individual tests to complete much faster since they no longer must require all of the application’s dependencies if the code you’re testing does not depend on them.

RSpec support files

This is similar to the Rails initializer pattern. Up to rspec-rais 3.0.2, released 10 years ago, we had the following line in spec/rails_helper.rb enabled by default:

1	Dir[Rails.root.join("spec/support/*/.rb")\].each { \|f\| require f }

If your app is that old, chances are good that you have such a line still enabled in your project and it leads to the same sort of issues as the Rails initializers described above.

These days such a line is commented by default in the generated file. See the relevant snippet:

1	# The following line is provided for convenience purposes. It has the downside
2	# of increasing the boot-up time by auto-requiring all files in the support
3	# directory. Alternatively, in the individual `*_spec.rb` files, manually
4	# require only the support files necessary.
5	#
6	# Rails.root.glob('spec/support/*/.rb').sort_by(&:to_s).each { \|f\| require f }

Some people will ignore the warning and uncomment that line to these days and will experience the same issue with boot time performance when running individual tests.

Another possibility would be to create a similar tool to the spring gem. That tool would load all support files and start a server waiting for a command to run some tests. Then it would fork the process and load the requested tests much faster. It should restart the server whenever the support files change. I don’t think such a tool exists yet, but the effort to create one would be similar to creating the spring gem. Not trivial, but not that much complicated either. It’s certainly doable.

Alternatively, the following simple script would speed up running your tests multiple times. Despite the name, it doesn’t really watch for file changes like guard, although such a feature could be added to the script, but I wanted to keep it simple since this article is already long enough. It’s important to change config.enable_reloading in config/environments/test.rb to something like:

1	config.enable_reloading = ENV["WATCHING_SPECS"] \|\| defined?(Spring)

Then, such script should work fine in most cases:

1	#!/usr/bin/env ruby -I spec
2
3	require "bundler/setup"
4	ENV["BOOTSNAP_CACHE_DIR"] = File.expand_path "../tmp/cache", __dir__
5	require "bootsnap/setup"
6	require "rspec/core"
7
8	ENV["WATCHING_SPECS"] = "true"
9
10	require "rails_helper"
11
12	ActiveRecord::Base.connection_pool.disconnect
13
14	ARGV << "spec" if ARGV.empty?
15	last_specs = ARGV
16
17	while true
18	Process.fork do
19	RSpec::Core::Runner.run last_specs
20	end
21	Process.wait
22
23	puts "Which tests to run? Press ENTER to run the same previous tests: #{last_specs}.\n" +
24	"Enter 'exit' or Ctrl+D to stop. 'reset' to run the original tests: #{ARGV}."
25	specs = STDIN.gets&.chomp&.split(" ")
26	break if specs.nil? \|\| specs == [ "exit" ]
27	specs = last_specs if specs.empty?
28	specs = ARGV if specs == [ "reset" ]
29	specs = specs.map do \|spec\|
30	next spec if spec.start_with?('-')
31	spec.start_with?("spec") ? spec : "spec/#{spec}"
32	end
33
34	last_specs = specs
35
36	RSpec.configuration.start_time = Time.now
37
38	Rails.application.reloader.reload!
39	end

Ten years ago we had spork which took a similar approach to Spring, towards improving test boot time, but that project seems to be dead since then. The rspec-rails gem also used to provide a script/spec_server script by that time with similar goals. As we can see, the Rails boot time has been a concern for over a decade. Hopefully it will get fixed once and for all some day.

Factories with provided classes

With FactoryBot one can specify which class to use for a particular factory:

1	FactoryBot.define do
2	factory :fat_model do
3	name { "MyText" }
4
5	factory :fat_model_subclass, class: FatModelSubclass do
6	end
7	end
8	end

This causes FatModelSubclass to be loaded with all its dependencies (FatModel) when loading the factory’s definitions, even if we’re not using that factory. There are better ways to ensure the class will be lazily loaded, once the factory is actually used:

Example 1 — pass class as a string to be lazily loaded when needed:

1	factory :fat_model_subclass, class: "FatModelSubclass" do
2	end

This is documented by the way:

You can pass a constant as well, if the constant is available (note that this can cause test performance problems in large Rails applications, since referring to the constant will cause it to be eagerly loaded).

Example 2 — use initialize_with:

1	FactoryBot.define do
2	factory :fat_model do
3	name { "MyText" }
4	transient do
5	klass { FatModel }
6	end
7
8	initialize_with { klass.new }
9
10	# factory :fat_model_subclass, class: "FatModelSubclass" do
11	factory :fat_model_subclass do
12	klass { FatModelSubclass }
13	end
14	end
15	end

The second approach allows some code editors to go to the model definition by using some shortcut such as Ctrl/Cmd + Click.

Rails limitations

Routes

Sometimes we just want to test some models, but some dependency would rely on Rails somehow (a secret, setting or environment, for example) and we end up requiring “rails_helper” even though we’re not making any requests. But we still must initialize the Rails app, and it happens to load the routes, which can take a significant time in large apps.

This has been fixed in Rails 8 (not released yet to the date of publication of this article) but in the meanwhile, we can use the routes_lazy_routes gem, by Akira Matsuda:

In Gemfile, we add:

1	group :development, :test do
2	gem 'routes_lazy_routes'
3	end

With that change alone, running “bin/rails environment” goes from 2.5s to 1.9s.

Even though it speeds up some Rails commands, it will still load the routes when we’re testing some model after requiring rails_helper, simply because that gem has this in its initializer:

1	ActiveSupport.on_load :action_dispatch_integration_test, run_once: true do
2	RoutesLazyRoutes.eager_load!
3	end

Surely, it could be adapted for this use case, but to be honest, it’s better to wait for Rails 8 release date.

Mountable Engines

Once you run the “graphql:install” generator, it will add the “graphiql-rails” gem to the development group. It’s a mountable Rails Engine. I don’t know how we could possibly lazily load mountable engines in a Rails application. If your app depends on many mountable engines, they will end up adding to the boot time, since you must load them during the app’s initialization. If you know how to lazily load them, please let me know in the comments.

Profiling

When we’re investigating what’s causing some code to be slow, it’s very important to profile the relevant code. If we want to know why boot is taking a long time, we want to profile at least:

require “bundler/setup” — There’s usually little we can do about it unless you’re intending to work directly on the Bundler’s source-code;
Bundler.require(…) — We can decide which gems should be loaded when this method is called by making changes to either Gemfile or to the code calling Bundler.require.
Rails initializers — There are two simple ways one can measure this. One of them would be to create a single initializer that would load the initializers from another path and measure how long they take to complete. The other one is installing the gem bumbler which I’ll discuss briefly in the next section. Or we can simply profile the next items in this list, which should include the time spent on initializers:
require “config/environment”
Rails.application.initialize!

Bumbler

Bumbler is a tool that allows us to quickly inspect how much time is spent on initializers and required gems:

bundle exec bumbler --initializers # display initializers
# specify a minimal threshold with -t 20 to display only initializers taking over 20ms to load.
bundle exec bumbler --initializers -t 20
# display all loaded gems:
bundle exec bumbler --all

It doesn’t provide as many details as the flamegraphs generated by Stackprof, but they can quickly provide you with some hints on what’s going on with your app’s boot.

Stackprof

I prefer to profile code using flamegraphs, so my suggestion is to add the stackprof gem to Gemfile and instrument the relevant code like this:

1	require 'stackprof'
2	require 'json'
3
4	GC.disable
5	profile = StackProf.run(raw: true) do
6	Rails.application.initialize!
7	end
8
9	File.write "profile-application-initialize.json", JSON.unparse(profile)

Then we can use another tool to view the flamegraphs:

npm -g speedscope
speedscope profile-application-initialize.json

The stackprof’s README mentions another way to generate and visualize the flamegraphs, but I couldn’t make it work in my environment, while speedscope did the trick for me.

If you decided to profile your app’s boot process in Linux or Mac OS, you probably noticed that loading the timezone datasource takes over 100 ms in the Rails boot. To save that time I’d recommend you to add the tzinfo-data gem to Gemfile. Rails by default adds this to Gemfile:

1	# Windows does not include zoneinfo files, so bundle the tzinfo-data gem
2	gem "tzinfo-data", platforms: %i[ windows jruby ]

Using this gem as the datasource will load much faster in Linux/Mac compared to loading the datasource from the system files, so I’d recommend adding this gem to Gemfile for all platforms if you want to speed up your boot time as much as possible. If you still wants to use the system files as the datasource for the production environment, just add this line for this environment:

1	TZInfo::DataSource.set :zoneinfo

Conclusion

Just like mentioned in my previous article, the key for fast booting is lazily loading code. Loading code takes time, so the more code you can avoid loading during the app initialization, the faster it will boot. Avoid initializers as much as you can, require gems only once you need them and you should benefit from being able to run individual tests pretty fast.

Adding the spring gem to the project could also save you a few seconds running your tests. A “bin/watch-specs” script was provided to get a similar experience focused on speeding up the tests boot time.

Finally, I created a repository with all changes discussed in this article to help you explore what it means in practice. Check it out and try it by yourself. It may be easier to follow the project by inspecting the git logs and see the changes one by one.

If you have any other suggestions to improve the boot performance, please let us know in the comments. I’d love to hear about them.

Good luck improving your app’s boot performance.

LightBlog: a file-based blog app

2023-03-01T13:14:00+00:00

When Heroku discontinued the free plan, I decided to host my site elsewhere. It was an old Rails app, and since I moved from Rails to Roda a few years ago for the apps I maintain, I decided to take the opportunity to rewrite the blog in Roda.

While doing so, I thought it could be a good idea to make the bulk of it an open-source project, and that’s how LightBlog was born. This site is powered by LightBlog with a few changes to support multiple languages (namely English and Portuguese) and automatic code-reloading (using my auto_reloader gem), plus a few changes to the default views.

It’s basically a Roda application using the multi_run plugin to serve two LightBlog apps, one for the English articles and another one for the Portuguese articles with a slightly different configurations passed to LightBlog.create_app.

How it works?

LightBlog (and my previous Rails-based site) is inspired by Toto, a git-powered, minimalist blog engine.

Just like it happens in Toto, articles are written in Markdown and stored directly on disk (usually in a Git repository) and contain some metadata associated to it in the form of a header written in YAML, containing arbitrary information besides the article title and published / updated at dates.

The articles history is basically the git history of the articles repository. When using LightBlog to serve your articles, I’d strongly recommend you to keep your articles in a separate repository.

This approach provides many benefits:

The application is safer

There’s no database, so no risk of SQL injection and other related vulnerabilities. If your blog is attacked by some hacker, the biggest risk would be providing the markdown source of your articles to the attacker. Basically, if you keep a back-up of your articles' repository, you’re safe.

It’s simpler to setup

No need to create and tune any databases.

It’s very flexible / portable

You can easily move from LightBlog to something else in the future if you keep your articles separated from the application. Just create another application that can read your data (your markdown-based article files).

Deploying is easy and the process is very lightweight

LightBlog requires very few resources. This site is hosted in a single e2-micro Compute Engine instance at GCP at no cost at all (it’s within the free tier conditions). The e2-micro engine provides very little CPU and RAM (1GB only) and yet it’s able to serve a blog with LightBlog very easily.

Features

Out-of-the-box, LightBlog provides:

tagging support: group your articles by tags;
atom feeds (for all articles and by tag too);
optional comments provided by Disqus (just provide the Disqus forum/id in the options);
optional integration with Google Analytics (just provide the GA id in the options);
optional automatic reloading of the articles;
a rake task to generate a new empty article;
a rake task to copy the default views (which can be overridden);
a light_blog command to help you creating a new simple Rack application using LightBlog;
internacionalization/localization support (integration with the i18n gem);

Getting Started

1	gem install light_blog
2	light_blog new myblog
3	cd myblog
4	bin/rake article:new_article
5	# type in the article title
6	bin/puma -p 4000

Then simply navigate to https://localhost:4000 to view your articles.

Give it a try and let me know in the comments what you think about it.

Thank you, Heroku!

2022-11-07T21:40:00+00:00

For 13 years, I’ve hosted my articles on Heroku.

It was a great experience. Not only Heroku was kind enough to offer free hosting for Ruby apps all that time, but publishing a new article was only a “git push” operation. I haven’t experienced a single problem in all those years.

I think I talk on behalf of the Ruby Community when I say Heroku provided a very valuable resource for Rubyists for all those years and it’s still a great platform for existing and new projects, not only for Rubyists but for many other supported languages as well.

On Nov 22th, 2022, however, Heroku is no longer providing a free plan, which is certainly understandable.

When I got communicated about this change, I also realized it would be a good idea to actually use my own domain. Also, it has been some years since I last used Rails and my site was still running in an older version of Rails, so I decided it would be a good idea to rewrite it in Roda, which is what I’ve been using for the past 4 years at least.

Finally, I decided to test whether the e2-micro Google Cloud Platform compute engine type would be enough for running my site. It’s very light, so I thought it would worth a try, since I can currently run it for free using the Free Tier of Google Cloud Platform.

That’s how I moved the site to https://rosenfeld.page.

In the process of rewriting my site, I thought it would be an opportunity to also open-source it.

My site is currently split in 3 projects: a gem providing the main features, a Roda app that uses this gem, and my articles in markdown format, which are saved in yet another repository.

Right now all 3 repositories are private, but I intend to open-source the gem as soon as I finish the remaining features I’d like to add besides adding some documentation. Then, I also intend to open-source the Roda app that uses that gem at some point, although I don’t think most people would be interested on this one. For the articles repository, I’m not planning to open-source it for now since I think no one would actually benefit from it.

If you find any issues on the new site, please report on the comments section.

By the way, I’m not intending to import the Disqus discussions on my articles at Heroku. The old site is still available until Nov 29th, so feel free to check any discussions there while it’s still possible.

I also intend to remove some old articles soon, in the cases I think they would no longer be relevant nowadays.

Once again, thank you very much, Heroku, for all those years hosting my site. I really appreciate it! <3

Why proxying Bugsnag (or similar service) might be a good idea?

2018-03-01T19:45:00+00:00

Bugsnag is a great error monitoring service that takes care of reporting and filtering/notifying exceptions in several kind of applications. I used to use my own error reporting tool in the app I currently maintain but as I’m currently evaluating creating a new application, I started to evaluate Bugsnag to save me some time. But I stumbled upon an issue I didn’t have to deal with my custom error reporting tool.

When reporting errors, it’s a good idea to attach as much meaningful data as they could be quite helpful when trying to understand some errors, specially when they aren’t easily reproducible. Such data include user information which I’d prefer not to expose to the front-end, including the user id.

I was initially worried about exposing the API key to the front-end, which someone could use to report errors to my account, but then I figured out I was being too paranoid and that proxying the request wouldn’t prevent users from reporting errors to my account, unless I’d implement some sort of rate limit protection or disabling errors reporting for non authenticated users (after all, I’d be able to track authenticated users acting that way and take some action against them).

However, hiding from the front-end user data meant to be used only internally is important to me. That’s why I decided to take a few hours to proxy browsers errors through the back-end. Here’s how it was implemented using the official bugsnag-js npm package and the bugsnag Ruby gem.

In the JavaScript code, there’s something like showed below. I used XMLHttpRequest rather than fetch in order to support IE11 since the polyfills are lazy loaded as required in our application and fetch may not be available when Bugsnag is initialized in the client:

1	import bugsnag from 'bugsnag-js';
2	const bugsnagClient = bugsnag({
3	apiKey: '000000000000000000000000', // the actual api key will be inserted in the back-end
4	beforeSend: report => {
5	const original = report.toJSON(), event = {};
6	let v;
7	for (let k in original) if ((v = original[k]) !== undefined) event[k] = v;
8	report.ignore();
9
10	const csrf = (document.querySelector('meta[name=_csrf]') \|\| {}).content;
11	const xhr = new XMLHttpRequest();
12	xhr.open('POST', '/errors/bugsnag-js/notify?_csrf=' + csrf);
13	xhr.setRequestHeader('Content-type', 'application/json');
14	xhr.send(JSON.stringify(event));
15	}
16	});

The back-end is a Ruby application built on top of the Roda toolkit. It uses the multi_run plugin, splitting the main applications into multiple apps (which can be seen as powerful controllers if it helps understanding how it works). These are the relevant parts of the back-end:

lib/setup_bugsnag.rb:

1	# frozen-string-literal: true
2
3	require 'app_settings'
4	require_relative '../app_root'
5
6	if api_key = AppSettings.bugsnag_api_key
7	require 'bugsnag'
8
9	Bugsnag.configure do \|config\|
10	config.api_key = AppSettings.bugsnag_api_key
11	config.project_root = APP_ROOT
12	config.delivery_method = :synchronous
13	config.logger = AppSettings.loggers
14	end
15	end

app/apps/errors_app.rb:

1	# frozen-string-literal: true
2
3	require 'json'
4	require_relative 'base_app'
5	require 'bugsnag_setup'
6
7	module Apps
8	class ErrorsApp < BaseApp
9	private
10
11	def process(r)
12	super
13	r.post('bugsnag-js/notify'){ notify_bugsnag }
14	end
15
16	def notify_bugsnag
17	api_key = settings.bugsnag_api_key
18	head :ok unless api_key && settings.store_front_end_errors
19	event = JSON.parse request.body.read
20	user_data = auth_session.to_h
21	user_data['id'] = user_data['profile_id']
22	event['user'] = user_data
23	event['apiKey'] = api_key
24	event['appVersion'] = settings.app_version
25	payload = { apiKey: api_key, notifier: {
26	name: 'Bugsnag JavaScript', version: '4.3.0', url: 'https://github.com/bugsnag/bugsnag-js'
27	}, events: [event] }
28	configuration = Bugsnag.configuration
29	options = {
30	headers: {
31	'Bugsnag-Api-Key' => api_key,
32	'Bugsnag-Payload-Version' => event['payloadVersion'],
33	}
34	}
35	Bugsnag::Delivery[configuration.delivery_method].
36	deliver(configuration.endpoint, JSON.unparse(payload), configuration, options)
37
38	'OK' # optional response body, could be empty as well, we don't check the response
39	end
40	end
41	end

That’s it, some extra code, but it allows me to send useful information to Bugsnag while not requiring us to expose them to the front-end application. Hopefully next time I need something like that it will help to have it written down here ;)

The missing bit in the React community: a common interface

2018-01-25T17:52:00+00:00

I use Ruby for server-side programming, so I’ll illustrate the issue in the Ruby community but it basically applies to all server-side languages. Even JavaScript I’d guess, although I haven’t used JavaScript for server-side programming yet.

When it’s time to deploy our Ruby web application, we’re free to choose a web server from multiple options without requiring any changes to the application code most of the time. That’s possible because all of them support the Rack specifications, which acts like an interface between Ruby apps and Rack web servers. When we choose a process-based server such as Unicorn, we can benefit from several advantages over thread-based ones, such as Puma, but the opposite is just as true, since a thread-based approach also has benefits over a process-based one. Other web servers are more suited to applications requiring long-live connections and would take yet another approach to connections handling.

The simple fact that we can easily switch the web server without changing our code to test the impact of different in our application is awesome, but Rack will also make it easier for new Ruby frameworks to be built and inter-operate with each other pretty easily. For example, you can mount Rodauth, which is a Roda authentication app, in a Rails or Sinatra app in a very straightforward way.

It’s not really news that competition is awesome for consumers, and we, software developers, are consumers of libraries and frameworks, so we really enjoy competition of frameworks and libraries, right?

So, how could that be related to React in any way? After all, one might argue that React is competing with Angular, jQuery and others, right? Well, in that sense, competition still exists in the JavaScript framework/library market, but that’s just not enough. I’ll explain.

The Components problem

The way we build and use components are the biggest issue we currently experience in the JavaScript world. For example, the Material UI library doesn’t compete with ng-bootstrap, for example. How could them? The components interface are completely different in React apps when compared to Angular apps. In that sense, we could consider React and Angular as two different languages. One wouldn’t expect to be able to use the Ruby’s Rack gem in the Go language, right?

So, yes, it leads to duplicate efforts to build great common components such as date pickers, sliders and so on. We have jQuery UI for jQuery, ng-bootstrap for Angular and Material UI for React (among many more options for sure). But it kind of makes sense to me that components implementation could be quite different because jQuery, Angular and React take completely different approaches. One might argue that we could concentrate the efforts on pure JavaScript solutions and build wrappers around it for each major library, such as React, Angular or jQuery. But I don’t really think it would be that simple, so I’m not trying to suggest something like that.

When one creates a new programming language, it will take quite some time before it gets widely adopted. One of the reasons is that there’s no ecosystem around that language initially. When Elixir was created, for example, there was no web framework available for it, of course. Ruby not only has a lot of choices for web frameworks but also offer several libraries for all sort of things you might need. Or Java. Or JavaScript, C++, whatever. Well stablished languages will take advantage over new ones exactly because of this existing libraries, making it hard for new languages to get traction. Unfortunately, I don’t think there’s something we could do to make it easier for new languages to be created and take advantage of existing libraries from other languages. But, fortunately, the React issue is much simpler to fix, if the community wants to.

What is different in React?

When React was born, it was really a game changer.

I’m not saying everyone should be leaving jQuery or Angular or whatever to jump to the React boat. There’s no one-size-fits-all solution for creating applications. Angular, jQuery and React take completely different approaches when creating applications. Angular 1 has tried two-way data binding but gave up on it since version 2. Both Knockout.js and Vue.js still support two-way data binding. There are plenty of people that enjoy that feature. And for some use cases, it’s certainly quicker to build some use cases with it when compared to newer Angular or React.

However, it’s clear to me that React got way more traction than its competitors in the past few years. And there are plenty of reasons for that. I really think React did a great job in teaching us a new way to think about applications. It’s not a secret that programming is mostly complicated because of state management, right? I still prefer OO programming over functional one, but I do agree with the most used argument from functional programming fans: managing state is hard. They claim functional programming is easier because it’s easy to understand and test pure functions. It’s the same sort of argument used by supporters of micro-services that will tell us that it’s much easier to write and test micro apps than a big monolithic one.

And their arguments are not wrong. Just incomplete from my point of view. Because they hide the fact that they have moved the complexity to the integration part. By the way, it’s perfectly possible to create modular monolithic applications whose parts can be tested and released independently while keeping the integration much simpler than with micro-services. I’m not saying one should never adopt the micro-services approach either. There’s no silver bullet. The complexity will always exist somewhere and our job is to see what makes more sense for our project.

Anyway, I’ll not get into this discussion in this article, but just want to highlight that the most complicated part when creating applications is state management. Truth be told, trying to keep your model and view in sync using jQuery has always been a nightmare, right? That’s why we see several alternatives to jQuery for many reasons, like Knockout.js, Angular and many others. The main difference between them and jQuery is that they will let us manage the state outside of the DOM and will make sure that the DOM will change accordingly to the app’s state. That’s a big win over jQuery or any other DOM-based library.

So, why did React get much more traction than the alternatives? In my opinion, React is much simpler than the alternatives, while remaining flexible and fast. For example, Knockout.js, initial versions of Ember (I’m not following the current development, so I can’t talk about recent versions) and Angular would all try to extend the HTML in sophisticated ways. They have to parse either the HTML template itself or some special tag properties and evaluate some special constructions. Knockout, for instance, would evaluate “data-bind” attributes, which resembles very closely a JavaScript object declaration. Both Ember and Angular would also offer their own control flow extensions instead of using plain JavaScript, because they preferred declarative (logic-less) templates. Maybe that description is not fully accurate as I never worked with Angular or Ember, but this is what I remember from the articles I’ve read back in those days.

React took a completely different approach, making the JavaScript developer life much simpler in several ways. At first, I (and many others) were scared by the JSX thing, when it seemed an heresy to embed HTML in the JavaScript, bringing memories from PHP and ASP to the mind. However, I (and many others) realized that it actually made sense. After all, I often ended up doing that myself in several cases using other methods. It might seem scaring at first, but quickly we get used to it and it makes sense. We’ll even mix CSS and JS these days (also known as CSS in JS).

But JSX wasn’t the main reason why people adopted React. I guess there are many people who actually adopted React despite the JSX thing, rather than because of it. There are two key ideas that set React apart and were responsible for its success in my opinion. And I’m not talking about documentation or component-driven programming because the alternatives also offered those.

One of them is the realization that keeping the model and view in sync is the major issue to be fixed by library authors. Approaches taken by Knockout, Angular and Ember seemed too complicated. Trying to figure out what has to be changed in the view when the application state changed was really tricky. Then, React decided to try a much simpler approach. What if we just rebuilt the entire app from scratch after any changes to the application? Well, of course the alternative frameworks could also implement that brilliant idea that simplifies a lot the implementation. Except that it would be painfully slow to do that.

So, that’s the second idea, which allowed the first one to succeed, was the key for the revolution we’ve seen in the JavaScript scenario after React was born. The realization that JavaScript is pretty fast nowadays, as long as we can avoid the DOM, which is the slow part. And that’s how the concept of virtual DOM became popular and today we have tons of alternative virtual DOM implementations. The realization that comparing an in-memory DOM was really fast allowed a not too complicated algorithm to update the DOM in a very efficient way to reflect the state of the virtual DOM. Of course, there are much more optimizations applied by React and similar alternatives, but the implementation of the virtual DOM diff algorithm allowed developers to quickly understand how to build React apps. It was much simpler to reason about than the alternatives, in my opinion.

And to make things even better, they adopted modern JavaScript, which allowed us to write Object Oriented programming in JavaScript without having to resort to CoffeeScript and alternative transpiled languages. When we add bundle builders as sophisticated as Webpack, things get quickly unbeatable. Now we’re able to write modular conflict-less apps, using OO component-based programming, with an easy syntax to mix HTML in JavaScript. We can even even import CSS from JS or apply code splitting with Webpack. Or use some of the CSS in JS alternatives, such as JSS. But, again, the biggest gain is that we no longer need to touch the DOM directly, without having to learn a new template syntax. We just use plain JavaScript.

React is awesome, I know, so what is the issue after all?

The issue is that React is not just an implementation. It’s a powerful idea and mind set. The concepts are so simple, that we now have many alternative to React which are mostly compatible with it. There are Dio.js, Inferno.js, Preact and NervJS to name some well known alternatives. Each of them could be competing with each other but unfortunately it’s not that simple.

Why is that? Because you can’t simply use some UI library designed to work with React in any of the alternatives. For one to be able to do that, they would have to use something like Webpack aliases, so that whenever the code imports ‘react’, ‘react-dom’ or ‘create-react-class’ they would be actually importing some compatibility layer around the alternatives. Something like ‘inferno-compat’. What if we wanted to mix apps which are lazily loaded but developed by separate teams? Maybe one team is using React for a reason, while another team is using Inferno or Dio.js for another reason. Then the webpack rules get way more complicated to manage.

What if we could set up a common interface supported by all react-like implementations. Something like Ruby’s Rack but for React components? UI libraries are basically React components. Basically they rely on JSX (not all of them, though) and React.Component. Both interfaces are very well known. JSX is already independent from React, but it still needs to know which pragma to use when parsing JSX. When using Babel, one can easily add a plugin that will include additional imports automatically so that you shouldn’t be forced to “import React” in order to use JSX. But it would be great if we didn’t have to resort to such things when targeting interoperability.

For React like programming I believe there’s a fixed set of methods that should be enough for most apps. Functions such as like createElement (for JSX support, also known as “h” by some implementations), render, Component, createPortal and findDOMNode. What if we could create a meta package to provide us such method? Then all React-like alternatives could compete with each other by providing the implementation for such functions.

For example, let’s suppose we create a new “react-like” package. We could set up which library to use like this:

1	import ReactLike from 'react-like';
2	import dio from 'dio.js';
3	ReactLike.assign(dio); // or assign({createElement: dio.createElement, createPortal: dio.createPortal ...})

Then, instead of “import React from ‘react’”, component libraries, such as Material UI, could use it like:

1	import React from 'react-like';
2
3	export default Button extends React.Component {
4	render() {
5	return <button className="my-special-class">{ this.props.children }</button>;
6	}
7	}

It would be even better if those libraries used yet another abstraction with a fallback to ‘react-like’, so that we would be able to use Inferno for some library and Dio.js for another one, for example.

Currently it’s not an easy for us to pick up one of the great React alternatives out there because almost all component libraries seem to assume React is being used. Wouldn’t it be awesome if we could provide a common interface to be used by components libraries and promote the competition among React alternatives?

Introducing sequel_tools: Rake integration over Sequel migrations and related tasks

2017-12-18T18:55:00+00:00

The importance of the little details (skip this section unless you enjoy rants)

Seriously, this section is big and not important at all, feel free to completely skip it right now if you’re short in time or don’t enjoy rants.

This is a rant explaining how ActiveRecord migrations completely defined my career in the past years.

I became curious about programming and computers when I was kid. I remember reading a huge C++ book when I was about 10 years old. I had learned Clipper just a bit before and I recall creating a Bingo game with Clipper, just because I wanted to play Bingo in those machines but I couldn’t :) While learning Clipper I also had my first experience learning SQL and client-server design. My dad subscribed me to a few computer courses by that time, such as “DOS/dBase III Plus”, Clipper + SQL and a few years later Delphi + Advanced SQL. I learned C and C++ from books and when services like Geocities and similar were showing up and the Internet was becoming supported in lots of homes I also became interested in learning HTML to build my own sites, the new hotness for that time. Since I also wanted to serve dynamic content, I decided to learn Perl since it was possible to find some free hosting services supporting Perl, and that was the first interpreted language I learned and I was really fascinated by it by that time.

For a long while I used Perl exclusively for server-side web programming since it was the only option I could find in free hosting services, but while in Electrical Engineering college, I barely did any web programming, and my programming tasks (extra classes) were mostly related to desktop programming (Delphi / C++) and embedded and hard real-time systems using a mix of C and C++ during my master thesis in Mobile Robotics. By that time I had a solid understanding of C and C++, good times, I don’t find myself proficient with them anymore these days. That was a time where I would read and know the entire specs from W3C or HTML 4.01 and CSS. Today it’s simply unfeasible to completely follow all related specs and I’m glad we have competition in the browser’s marketing since it’s really hard to follow up with all changes happening every day.

Once I finished my master thesis and had to find a job, I looked mostly for programming jobs, since I considered myself good in programming, there were lots of interesting opportunities out there while it was really hard to find companies in Brazil working on electronic devices development or Robotics and I never actually enjoyed the other part of Electrical Engineering such as machines, power or electrical installations. I only enjoyed the micro-electronics and embedded devices creation and one should consider themselves very lucky if they can work in such area in Brazil, and I didn’t want to count on luck, so I decided to focus on the programming career instead. I remember my first curriculum was sent to Opera Software, my preferred browser, to apply to a C++ developer position, by that time, but after tons of interviews they didn’t call me, so I’m not currently living in Norway these days ;)

After working for 3 months in a new parking system using Delphi (despite asking for using C++ instead) the contract was finished, the product was already working in one of the malls in my city, and I had to look for another job. They actually extended the offer to keep working with them, but at the same time I found another opportunity and this time I would have to get back to web programming. That was in 2007. Several years later and I couldn’t really remember much of Perl and a lot had happened to web programming in the past years and I didn’t follow that progress.

After a few stressful days trying to learn about every major web programming framework (specially while trying to read about J2EE), I came to the conclusion that I would finally choose one of TurboGears, Django or Rails. I didn’t know Java, Python or Ruby by that time, so the language didn’t take an important role while choosing the framework. I was more interested in learning about how the frameworks would make my life easier. At that time I had to maintain an existing ASP application but at some point I would have to create a new application and I could choose whatever I wanted and definitely I didn’t enjoy ASP.

Since that application had to be displayed in Portuguese, I was considering the Python frameworks more than the Ruby one, as Rails didn’t support internationalization by that time (i18n support was added to Rails 2 if I recall correctly) and even supporting UTF-8 wasn’t straightforward with Ruby 1.8. Iconv and $KCODE were something you’d often hear about in the Ruby community by that time. There were tons of posts dedicated to encoding in Ruby by that time.

But there was that one Rails feature that made me change my mind and choose Rails over TurboGears or Django, which were supposed to work well with encodings and had announced internationalization support. And it was the approach used to evolve databases, which was the right strategy to use from my previous experiences, while I was pretty scared by the model-centered approaches used by TurboGears and Django to handle the database evolution.

By that time I had already plenty of experience working with RDBMS, specially Firebird, and having to deal with versioning the database and supporting multiple environments. That took me a lot of effort every time I started a new project because I basically had to implement the ActiveRecord migrations features every time and I knew that was very time consuming, so I was glad I wouldn’t have to roll my own solution if I used Rails, as ActiveRecord migrations were clearly more than enough for my needs and they worked pretty well. So, despite the issues with encoding and lack of internationalization support, I decided to pick Rails due to the ActiveRecord migrations.

And even though I don’t use ActiveRecord for several years, I’ve been still using its migrations tools since 2007, more recently through my wrapper around it called active_record_migrations.

While I don’t appreciate ActiveRecord as an ORM solution, I like its migrations tooling very much and they haven’t changed much since I used them with Rails 1. The most significant changes since then were support for time-stamped migrations, the reversible block and finally, many years later, proper support for foreign keys (I struggled to add foreign keys using plain SQL for many years).

When I first read about Sequel I was fascinated by it. ActiveRecord wasn’t built around Arel yet by that time, so all those lazy evaluations in Sequel were very appealing to me. But around 2009 I took another job opportunity and this time I would work with Grails and Java rather than Rails, so I missed many recent changes to Rails for a while. In 2011 I changed my job again, but still had to support a Grails application, but I was free to do whatever I liked to the project and since there were quite a lot of Grails bugs that were never fixed and I couldn’t find work-arounds for, I decided to slowly migrate the Grails app to Rails. By that time, Arel had been integrated to ActiveRecord, so it would finally support lazy evaluation as well, so I decided to try to stick with Rails defaults, but a week later I realized that there were still many more reasons why Sequel was far superior to ActiveRecord and decided to replace ActiveRecord with Sequel and never looked back. Best decision ever.

See, I’m a database guy. I work with the database, not against it. I don’t feel the need to abstract the database because I’d prefer to use Ruby over SQL. I was able to appreciate not only SQL but several other powerful tools provided by good database vendors, such as triggers, CTE, stored procedures, constraints, transactions, functions, foreign keys and definitely I didn’t want to avoid the database features at all. ActiveRecord seems to try to focus on hiding the database from the application, by trying to abstract as much as possible so that you feel you’re just working with objects. That’s probably the main reason why I loved Sequel. Sequel embraced the database, it didn’t fight the database. It would try to make it as easy as possible to use whatever vendor-specific feature I wanted to, without getting in my way. That’s why I don’t see Sequel as an ORM, but as a tool that allows me to write the SQL I want with a level of control and logic that would be pretty hard to achieve by building SQL queries through concatenation techniques and manual typecasting of params and result sets.

I can always have a clear idea on the SQL generated by Sequel and it’s way more readable than if I had to write the SQL by hand myself.

When I first learned about Sequel, Jeremy Evans was already its maintainer, but it seems Sequel was first created by Sharon Rosner. Recently I read this article, where this quote came to my attention:

I’m the original author of Sequel [1], an ORM for Ruby. Lately I’ve been finding that ORM’s actually get in the way of accomplishing stuff. I think there’s a case to be made for less abstraction in programming in general, and access to data stores is a major part of that.

For an in-production system I’ve been maintaining for the last 10 years, I’ve recently ripped out the ORM code, replacing it with raw SQL queries, and a bit of DRY glue code. Results: less code, better performing queries, and less dependencies.

Sharon Rosner, Sequel original author

Good that it’s working well for him, but I really find it weird to see that he would consider Sequel a traditional ORM. To me, Sequel allows me to write more maintainable queries, so I consider it more of a query builder than an ORM. If I had to build all SQL by hand and typecast params and result sets by hand, I think the result would be much worse, not better.

So, nowadays, I’m considering creating a brand new application after several years, and I’m frustrated that it takes a really long time to bootstrap a production-ready new application with the state-of-the-art features. I started working on such sample project to serve as a start point. The idea is to add features such as automated deployment, including blue-green (canary) strategies for zero downtime, using Roda as the Ruby framework, Webpack to bundle static resources, support a lightweight alternative to React, such as Dio.js or Inferno.js, supporting multiple environments, flexible configurations, client-side routing, proper security measures (CSRF, CSP headers), a proper authentication system, such as Rodauth, proper images uploading (think of Shrine), distributed logging (think of fluentd) with proper details, reliable background jobs, server-side and client-side testing, support for lazy code loading for both client-side and server-side, autoreloading of Ruby code in the server-side, analytics, APM, client-side performance tricks such as link preloading, performance tracking for both server-side and client-side code, errors tracking for both server-side and client-side code, integrated with sourcemaps and notifications from monitoring services, CDN support, full-text search through ElasticSearch or Solr, caching storage such as Redis, Docker based infra-structure, backup, high-availability of databases, and many many more features that are supposed to be found in production-ready applications. As you can see, it’s really frustrating to create a new application from scratch these days, as it seems any new product could easily take an year to reach a solid production-ready level. And, of course, support for database migrations.

The last thing I would want to worry about while working on this huge project is to waste time with a simple task, such as managing the database state through some migrations and related tools. Specially as ActiveRecord migrations have been providing that for so long and it works pretty well. However, this time I really wanted to ditch the dependency on railties for this new project, and active_record_migrations relies on railties for simplicity, so that it can take advantage of the Rails generators and just be a very simple wrapper around ActiveRecord migrations. But since AR itself won’t be used in this project, I decided to spend several hours (about two full days), replicating the most important tools from ActiveRecord to Sequel. And this is how sequel_tools was born this week.

I find it interesting how such a little detail, like Rails bundling a proper database migrations tooling, influenced a lot of my career, since I only learned Ruby because of Rails in the first place and I only chose Rails because of ActiveRecord migrations :) If I was working with Python I wouldn’t have learned Ruby most likely and wouldn’t work in my current job, and wouldn’t have created many gems such as:

active_record_migrations;
auto_reloader;
rails-web-console;
sequel-devise (no longer maintained by me);
rspec_nested_transactions;
rails_compatible_cookies_utils;
rack_web_console;
global_hotkeys_manager;
rack_toolkit;
simple_mail_builder;
and now sequel_tools, among others.

I’ve also been using Ruby for some other projects such as cert-generator, a Rack application that can be launched from a Docker container that allows development suited auto-signed root CA and HTTPS certificates in such a way supported by modern browsers. I’ve written about it in my previous article.

Or I wouldn’t have contributed to some Ruby projects such as Rails, orm_adapter-sequel, Redmine, Gitorious (now dead), Unicorn, RSpec-rails, RSpec, Capistrano, Sequel, js-routes, jbundler, database_cleaner, Devise, ChiliProject, RVM, rails-i18n, rb-readline and acl9. Most of them were minor contributions or documentation updates, but anyway… :)

Not to mention many bugs reported to MRI, JRuby and Ruby projects that have been fixed since then. And, before I forget, some features have been added to Ruby after Matz approved some of my requests. For example, the soon to be released Ruby 2.5 is introducing ERB#result_with_hash (see issue #8631.

Or my request to remove the ‘useless’ ‘contatenation’ syntax that was approved by Matz about 5 years ago, and I still hope someone would implement it at some point :)

I wonder what would be my current situation if ActiveRecord migrations weren’t bundled with Rails in 2007 :) On the other side, maybe I could have become rich working with Python? ;)

Introducing sequel_tools

If you’re a Sequel user, you probably spent a while searching for Rake integration around Sequel migrations and realized it was more time than you’d wished. I’ve been in the same situation, but it was so frustrating to me, because I wasn’t able to find all tasks I want to have at disposal, that I’d often just forget about using Sequel migrations to stick with ActiveRecord migrations. Not because I like the AR migrations DSL better (I don’t by the way), but because all tooling is already there, ready to be used through some simple rake commands.

sequel_tools is my effort in trying to come up with some de facto solution for integrating Sequel migrations and related tooling and Rake, and see if the Sequel community could concentrate the efforts on building together a solid foundation for Sequel migrations. I hope others would sympathize and contribute to the goal, so that we wouldn’t have to waste time thinking about migrations again in the future when using Sequel.

Here are some of the supported actions, which can be easily integrated to Rake, but are implemented in such a way that other interfaces, such as command lines or Thor, should be also made easy to build:

create the database;
drop the database;
migrate (optionally to a given version, or latest if not informed);
generate a migration file (time-stamp based only);
status (which migrations are applied but missing locally and which are not yet applied to the database);
version (show current version / last applied migration);
rollback last applied migration which is present in the migrations path;
run a given migration up block if it hasn’t been applied yet;
run a given migration down block if it hasn’t been applied yet;
redo: runs a given migration down and up, which is useful when writing some complex migrations;
dump schema to schema.sql (configurable, can happen automatically upon migration - implemented just for PostgreSQL for now, by calling pg_dump, but should be easy to extend to support other databases: PRs are welcomed or additional gems);
load from schema;
support for seeds.rb;
reset by re-running all migrations over a new database and running the seeds if available;
setup by loading the saved schema dump in a new database and running the seeds if available;
execute a sql console through the “shell” action;
execute an irb console through the “irb” action. This works like calling “bundle exec sequel connection_uri”. The connection is stored in the DB constant in the irb session.

I decided not to support the Integer based migrations at this point as I can’t see any drawbacks of time-stamp based migrations that would be addressed by the Integer strategy while there are many problems with the Integer strategy even if there’s a single developer working in the project. I’m open to discuss this with anyone that thinks that could convince me otherwise that supporting Integer based migrations would add something to the table. It’s just that it’s more code to maintain and test and I’m not willing to do that unless there is indeed some advantage over using time-stamp based migrations.

The project also allows missing migration files, since I find it useful specially when reviewing multiple branches, dealing with independent migrations.

I don’t think it’s a good idea to work with a Ruby format for storing the current schema, as a lot of things are specific to the database vendor. I never used the Ruby vendor-independent format in all those years, but if you think you’d value such a feature in case you just use the basics when designing the tables and want your project to support multiple database vendors, then go ahead and either send a Pull Request to make it configurable, or create an additional gem to add that feature and I can link to it in the documentation.

I’d love to get some feedback regarding what the Sequel community would think about it. I’d love for us to get to some consensus on what should be the de facto solution for managing Sequel migrations in a somewhat feature-complete fashion and would love to get the community help on making such de facto solution happen to the best interest of we, Sequel happy (and sometimes frustrated by the lack of proper tooling around migrations - no more) users ;)

Please take a look at how the code looks like and I hope you find it easy to extend to your own needs. Any suggestions and feedback are very welcome, specially now that the project is new and we can change a lot before it gets a stable API.

May I count with your help? ;)

Testing HTTPS in a Linux development environment with self-signed certificates

2017-12-03T11:45:00+00:00

Note: if you only care about getting the certificates, jump to the end of the article and you’ll find a button to just do that. This way you don’t even need Linux to generate them.

For a long time I’ve been testing my application locally using a certificate issued by Let’s encrypt, which I must renew every few months for domains such as dev.mydomain.com. Recently, I’ve been considering creating a new app and I don’t have a domain for it yet.

So I decided to take some time to learn how to create self-signed certificates in such a way that browsers such as Chrome and Firefox would accept it without any disclaimer with no extra step.

It took me about 2 hours to be able achieve this task, so I decided to write it down so that it would save me time in the future when I need to repeat this process.

I’ll use the myapp.example.com domain for my new app, since the example.com domain is reserved.

The first step is add that domain in /etc/hosts:

1	127.0.0.1 localhost myapp.example.com

Recent browsers will require the subject alternate names extension, so the script will generate that extension using a template like this:

1	[SAN]
2	subjectAltName = @alternate_names
3
4	[ alternate_names ]
5
6	DNS.1 = myapp.example.com
7	IP.1 = 127.0.0.1
8	IP.2 = 192.168.0.10

Replace the second IP with your own fixed IP if you have one just in case you need to access it from another computer in the network, like some VM, for example. Edit the script below to change the template. You’ll need to add the root CA certificate we’ll generate soon to those other computers in the network in order to do so, as I’ll explain in the last steps in this article. Just remove IP.2 if you don’t care about it.

Then create this script to help generating the certificates in ~/.ssl/generate-certificates:

1	#!/bin/bash
2
3	FQDN=${1:-myapp.example.com}
4
5	# Create our very own Root Certificate Authority
6
7	[ -f my-root-ca.key.pem ] \|\| \
8	openssl genrsa -out my-root-ca.key.pem 2048
9
10	# Self-sign our Root Certificate Authority
11
12	[ -f my-root-ca.crt.pem ] \|\| \
13	openssl req -x509 -new -nodes -key my-root-ca.key.pem -days 9131 \
14	-out my-root-ca.crt.pem \
15	-subj "/C=US/ST=Utah/L=Provo/O=ACME Signing Authority Inc/CN=example.net"
16
17	# Create Certificate for this domain
18
19	[ -f ${FQDN}.privkey.pem ] \|\| \
20	openssl genrsa -out ${FQDN}.privkey.pem 2048
21
22	# Create the extfile including the SAN extension
23
24	cat > extfile <<EOF
25	[SAN]
26	subjectAltName = @alternate_names
27
28	[ alternate_names ]
29
30	DNS.1 = ${FQDN}
31	IP.1 = 127.0.0.1
32	IP.2 = 192.168.0.10
33	EOF
34
35	# Create the CSR
36
37	[ -f ${FQDN}.csr.pem ] \|\| \
38	openssl req -new -key ${FQDN}.privkey.pem -out ${FQDN}.csr.pem \
39	-subj "/C=US/ST=Utah/L=Provo/O=ACME Service/CN=${FQDN}" \
40	-reqexts SAN -extensions SAN \
41	-config <(cat /etc/ssl/openssl.cnf extfile)
42
43	# Sign the request from Server with your Root CA
44
45	[ -f ${FQDN}.cert.pem ] \|\| \
46	openssl x509 -req -in ${FQDN}.csr.pem \
47	-CA my-root-ca.crt.pem \
48	-CAkey my-root-ca.key.pem \
49	-CAcreateserial \
50	-out ${FQDN}.cert.pem \
51	-days 9131 \
52	-extensions SAN \
53	-extfile extfile
54
55	# Update this machine to accept our own root CA as a valid one:
56
57	sudo cp my-root-ca.crt.pem /usr/local/share/ca-certificates/my-root-ca.crt
58	sudo update-ca-certificates
59
60	cat <<EOF
61	Here's a sample nginx config file:
62
63	server {
64	listen 80;
65	listen 443 ssl;
66
67	ssl_certificate ${PWD}/${FQDN}.cert.pem;
68	ssl_certificate_key ${PWD}/${FQDN}.privkey.pem;
69
70	root /var/www/html;
71
72	index index.html index.htm index.nginx-debian.html;
73
74	server_name ${FQDN};
75
76	location / {
77	# First attempt to serve request as file, then
78	# as directory, then fall back to displaying a 404.
79	try_files $uri $uri/ =404;
80	}
81	}
82	EOF
83
84	grep -q ${FQDN} /etc/hosts \|\| echo "Remember to add ${FQDN} to /etc/hosts"

Then run it:

1	cd ~/.ssl
2	chmod +x generate-certificates
3	./generate-certificates # will generate the certificates for myapp.example.com
4
5	# to generate for another app:
6	./generate-certificates otherapp.example.com

The script will output a sample nginx file demonstrating how to use the certificate and will remind you about adding the entry to /etc/hosts if it detects the domain is not present already.

That’s it. Even curl should work out-of-the-box, just like browsers such as Chrome and Firefox:

1	curl -I https://myapp.example.com

If you need to install the root certificate in other computers in the network (or VMs), it’s located in ~/.ssl/my-root-ca.crt.pem. If the other computers are running Linux:

1	# The .crt extension is important
2	sudo cp my-root-ca.crt.pem /usr/local/share/ca-certificates/my-root-ca.crt
3	sudo update-ca-certificates

I didn’t research about how to install them in other OS, so please let me know in the comments if you know and I’ll update the article explaining the instructions for setting up VM guests of other operating systems.

I’ve also created a Docker container with a simple Ruby Rack application to generate those certs. The code is simple and is available at Github.

It’s also published to Docker Hub.

You can give it a try here:

I hope you’ll find it useful as much as I do ;)

Upgrading PostgreSQL from 9.6 to 10 with minimal downtime using pglogical

2017-11-10T15:00:00+00:00

Once PostgreSQL 10 was released I wanted to upgrade our 9.6 cluster to the newest version. However, it would require a lot of coordination effort to get a maintenance window to perform the migration the way I was used to: put the application in maintenance mode, get a new dump and restore it to the new cluster and switch off the maintenance mode.

That means the application wouldn’t be available for an hour or so, maybe more. After reading once more about pglogical, I decided to finally give it a try, which allowed me to switch from 9.6 to 10 in just a few seconds.

How it works - a higher level view

pglogical implements logical replication, which allows replicating databases among different versions, which is not possible with the binary replication mechanism provided by PostgreSQL itself. Well, PG 10 added some support to logical replication, but since we want to replicate from 9.6, we’d need to resort to some external extension.

A required condition from pglogical is that all tables being replicated must have a primary key. It doesn’t need to be a single column, but a primary key must exist. Superuser access must also be provided for both databases for the replication agents. DDL replication is not supported. Truncate cascades are not replicated. Nothing fancy, after all. It should allow us to replicate most databases.

You should pay special attention to the primary key requirement though, specially if you’re using the ActiveRecord Ruby gem to manage the database migrations in older databases as the schema_migrations table didn’t have a primary key in the earlier days. If that’s your case:

1	alter table schema_migrations add primary key (version);

The idea is to install a PostgreSQL package with support for the pglogical extension, then create the new PG 10 cluster and restore the schema only in the new cluster. The current cluster should be stopped and restarted using the pglogical-enabled installed PostgreSQL. The clusters should be reachable to it other through TCP/IP. You’ll need to tell the provider (the 9.6 database being upgraded) the IP and port for the subscriber (the new PG 10 database) and vice-versa. The pglogical extension is created in both databases, postgresql.conf and pg_hba.conf are changed to enable logical replication and both databases are restarted. Finally, some pglogical statements are issued to create the provider, subscriber and subscription, which starts the replication. Once the replication is finished, you may change the port in the new cluster to match the old one, stop the old cluster and restart the new one. Finally it would be a good idea to restart the applications as well, specially if you’re using some custom types such as row types, as they will most likely have different OIDs and if you have registered those row types it won’t work as expected until you reboot the application. This would be the case if you’re using DB.register_row_type using the Sequel Ruby gem, for example.

The final switch can happen in as quickly as a few seconds, which means minimal downtime.

How it works - hands on

We use Docker to run PostgreSQL in our servers (besides the apps), so this article also uses it to demonstrate how the process works, but it should be easy to apply the instructions to other kind of set-ups. The advantage of Docker as demonstration tool is that these procedures should be easy to replicate as is and it also takes care of creating and running the databases as well.

We assume the PostgreSQL client is installed in the host too for this article.

Prepare the images and start-up script

Create the following Dockerfiles in sub-directories pg96 and pg10 (look at the instructions inside the Dockerfiles in order to replicate in your own environment if you’re not running PostgreSQL in a Docker container):

1	# pg96/Dockerfile
2	FROM postgres:9.6
3
4	RUN apt-get update && apt-get install -y wget gnupg
5	RUN echo "deb [arch=amd64] http://packages.2ndquadrant.com/pglogical/apt/ jessie-2ndquadrant main" > /etc/apt/sources.list.d/2ndquadrant.list \
6	&& wget --quiet -O - http://packages.2ndquadrant.com/pglogical/apt/AA7A6805.asc \| apt-key add - \
7	&& apt-get update \
8	&& apt-get install -y postgresql-9.6-pglogical
9
10	RUN echo "host replication postgres 172.18.0.0/16 trust" >> /usr/share/postgresql/9.6/pg_hba.conf.sample
11	RUN echo "host replication postgres ::1/128 trust" >> /usr/share/postgresql/9.6/pg_hba.conf.sample
12	RUN echo "shared_preload_libraries = 'pglogical'" >> /usr/share/postgresql/postgresql.conf.sample
13	RUN echo "wal_level = 'logical'" >> /usr/share/postgresql/postgresql.conf.sample
14	RUN echo "max_wal_senders = 20" >> /usr/share/postgresql/postgresql.conf.sample
15	RUN echo "max_replication_slots = 20" >> /usr/share/postgresql/postgresql.conf.sample

1	# pg10/Dockerfile
2	FROM postgres:10
3
4	RUN rm /etc/apt/trusted.gpg && apt-get update && apt-get install -y wget
5	RUN echo "deb [arch=amd64] http://packages.2ndquadrant.com/pglogical/apt/ stretch-2ndquadrant main" > /etc/apt/sources.list.d/2ndquadrant.list \
6	&& wget --quiet -O - http://packages.2ndquadrant.com/pglogical/apt/AA7A6805.asc \| apt-key add - \
7	&& apt-get update \
8	&& apt-get install -y postgresql-10-pglogical
9
10	RUN echo "host replication postgres 172.18.0.0/16 trust" >> /usr/share/postgresql/10/pg_hba.conf.sample
11	RUN echo "host replication postgres ::1/128 trust" >> /usr/share/postgresql/10/pg_hba.conf.sample
12	RUN echo "shared_preload_libraries = 'pglogical'" >> /usr/share/postgresql/postgresql.conf.sample
13	RUN echo "wal_level = 'logical'" >> /usr/share/postgresql/postgresql.conf.sample
14	RUN echo "max_wal_senders = 20" >> /usr/share/postgresql/postgresql.conf.sample
15	RUN echo "max_replication_slots = 20" >> /usr/share/postgresql/postgresql.conf.sample

Let’s assume both servers will run in the same machine with IP 10.0.1.10. The 9.6 instance is running on port 5432 and the new cluster will be running initially (before the switch) in port 5433.

1	cd pg96 && docker build . -t postgresql-pglogical:9.6 && cd -
2	cd pg10 && docker build . -t postgresql-pglogical:10 && cd -

This is not a tutorial on Docker, but if you’re actually using Docker, it would be a good idea to push those images to your private registry.

The first step is to stop the old 9.6 cluster and start the pglogical enabled cluster with the old data (taking a backup before is always a good idea by the way). Suppose your cluster data is located at “/var/lib/postgresql/9.6/main/” and that your config files are located at “/etc/postgresql/9.6/main/”. If “/etc/postgresql/9.6” and “/var/lib/postgresql/9.6” do not exist, don’t worry, the script will create a new cluster for you (in case you want to try with new dbs, first, which is a good idea by the way, and map some temp directories).

Create the following script at “/sbin/pg-scripts/start-pg” and make it executable. It will run the database from the container.

1	#!/bin/bash
2	version=$1
3	net=$2
4	setup_db(){
5	pg_createcluster $version main -o listen_addresses='*' -o wal_level=logical \
6	-o max_wal_senders=10 -o max_worker_processes=10 -o max_replication_slots=10 \
7	-o hot_standby=on -o max_wal_senders=10 -o shared_preload_libraries=pglogical -- -A trust
8	pghba=/etc/postgresql/$version/main/pg_hba.conf
9	echo -e "host\tall\tappuser\t$net\ttrust" >> $pghba
10	echo -e "host\treplication\tappuser\t$net\ttrust" >> $pghba
11	echo -e "host\tall\tpostgres\t172.17.0.0/24\ttrust" >> $pghba
12	echo -e "host\treplication\tpostgres\t172.17.0.0/24\ttrust" >> $pghba
13	pg_ctlcluster $version main start
14	psql -U postgres -c '\du' postgres\|grep -q appuser \|\| createuser -U postgres -l -s appuser
15	pg_ctlcluster $version main stop
16	}
17	[ -d /var/lib/postgresql/$version/main ] \|\| setup_db
18	exec pg_ctlcluster --foreground $version main start

This script will take care of creating a new cluster if one doesn’t already exist. Although not really required for the replication to work, it also takes care of creating a new “appuser” database superuser authenticated with “trust” for simplicity sake. It might be useful if you decide to use this script for spawning new databases for testing purposes. Adapt the script to suite your needs in that case, changing the user name or the authentication methods.

Run the containers

Let’s run the 9.6 cluster in port 5432 (feel free to run it in another port and use a temporary directory in the mappings if you just want to give it a try):

1	docker run --rm -v /sbin/pg-scripts:/pg-scripts -v /var/lib/postgresql:/var/lib/postgresql \
2	-v /etc/postgresql:/etc/postgresql -p 5432:5432 postgres-pglogical:9.6 \
3	/pg-scripts/start-pg 9.6 10.0.1.0/24
4	# since we're running in the foreground with the --rm option, run this in another terminal:
5	docker run --rm -v /sbin/pg-scripts:/pg-scripts -v /var/lib/postgresql:/var/lib/postgresql \
6	-v /etc/postgresql:/etc/postgresql -p 5433:5432 postgres-pglogical:10 \
7	/pg-scripts/start-pg 10 10.0.1.0/24

The first argument to start-pg is the PG version and the second and last argument is the net used to create pg_hba.conf if it doesn’t exist, to allow “appuser” to connect from using the “trust” authentication method.

If you’re curious about how to run a Docker container as a systemd service, let me know in the comments section below and I may complement this article once I find some time, but it’s not hard. There are plenty of documents explaining that in the internet, but our own service unit file is a bit different from what I’ve seen in most tutorials, as it tries to check that the port is indeed accepting connections when starting the service and it doesn’t pull the image from the registry if it is available locally already.

Edit PostgreSQL configuration

Once you make sure the old cluster is running file with the postgresql-pglogical container, it’s time to update your postgresql.conf file and restart the container. Use the following configuration as a start-point for both 9.6 and 10 clusters:

1	wal_level = logical
2	max_worker_processes = 10
3	max_replication_slots = 10
4	max_wal_senders = 10
5	shared_preload_libraries = 'pglogical'

For pg_hba.conf, include the following lines (change the network settings if you’re not using Docker, or if you’re running the containers in another net than the default one):

1	host all postgres 172.17.0.0/24 trust
2	host replication postgres 172.17.0.0/24 trust

Restart the servers and we should be ready for starting the replication.

Replicating the database

Set up the provider

In the PG 9.6 database:

1	# take a dump from the schema that we'll use to restore in PG 10
2	pg_dump -Fc -s -h 10.0.1.10 -p 5432 -U appuser mydb > mydb-schema.dump
3	psql -h 10.0.1.10 -p 5432 -c 'create extension pglogical;' -U appuser mydb
4	psql -h 10.0.1.10 -p 5432 -c "select pglogical.create_node(node_name := 'provider', dsn := 'host=10.0.1.10 port=5432 dbname=mydb');" -U appuser mydb
5	psql -h 10.0.1.10 -p 5432 -c "select pglogical.replication_set_add_all_tables('default', ARRAY['public']);" -U appuser mydb
6
7	# I couldn't get sequences replication to work, so I'll suggest another method just before switching the database
8	# psql -h 10.0.1.10 -p 5432 -c "select pglogical.replication_set_add_all_sequences('default', ARRAY['public']);" -U appuser mydb

This mark all tables and sequences from the public schema to be replicated.

Set up the subscriber and subscription

In the PG 10 database:

1	# create and restore the schema of the database
2	createdb -U appuser -h 10.0.1.10 -p 5433 mydb
3	pg_restore -s -h 10.0.1.10 -p 5433 -U appuser -d mydb mydb-schema.dump
4	# install the pglogical extension and setup the subscriber and subscription
5	psql -h 10.0.1.10 -p 5433 -c 'create extension pglogical;' -U appuser mydb
6	psql -h 10.0.1.10 -p 5433 -c "select pglogical.create_node(node_name := 'subscriber', dsn := 'host=10.0.1.10 port=5433 dbname=mydb');" -U appuser mydb
7	psql -h 10.0.1.10 -p 5433 -c "select pglogical.create_subscription(subscription_name := 'subscription', provider_dsn := 'host=10.0.1.10 port=5432 dbname=mydb');" -U appuser mydb

From now on you can follow the status of the replication with

1	select pglogical.show_subscription_status('subscription');

Once the initialization is over and the databases are synced and replicating (this may take quite a while depending on your database size) you may start the switch.

Replicating the sequence values

At this point the replication database is almost all set. I couldn’t figure out how to replicate the sequence values, so, if you’re using serial integer primary key columns relying on sequences, then you’ll also want to set proper values to the sequences otherwise you won’t be able to insert new records while relying on the serial sequence next value. Here’s how you can do that. Just to be sure, it’s inserting a 5000 gap so that you have enough time to stop the old server after gererating the set-value statements in case your database is very write intensive. You should probably review that gap value depending on how quickly your database might grow up between running those scripts and stopping the server.

1	psql -h 10.0.1.10 -p 5432 -U appuser -c "select string_agg('select ''select setval(''''' \|\| relname \|\| ''''', '' \|\| last_value + 5000 \|\| '')'' from ' \|\| relname, ' union ' order by relname) from pg_class where relkind ='S';" -t -q -o set-sequences-values-generator.sql mydb
2	psql -h 10.0.1.10 -p 5432 -U appuser -t -q -f set-sequences-values-generator.sql -o set-sequences-values.sql mydb
3	# set the new sequence values in the new database (port 5433 in this example):
4	psql -h 10.0.1.10 -p 5433 -U appuser -f set-sequences-values.sql mydb

Final switch steps

Then, basically, you should change the port for the PG10 cluster and set it to 5432 (or whatever was the port the old cluster was using). Then stop the 9.6 cluster (Ctrl+C in the example above) and restart the new cluster. Finally, it’s a good idea to also restart the apps using the database, just in case they are relying on some custom types whose conversion rules would depend on the row type OID.

This assumes your apps are able to gracefully handle disconnections for the connections in the pool by using some connection validation before issuing any SQL statements. Otherwise, it’s probably a good idea to restart the apps whenever you restart the database after tweaking “postgresql.conf” and “pg_hba.conf”.

Clean-up

Once everything is running fine with the new database, you might want to clean things up. If that’s the case:

1	select pglogical.drop_subscription('subscription');
2	select pglogical.drop_node('subscriber');
3	drop extension pglogical;

I hope that helps you getting your database upgraded with minimal downtime.

Explicit request params binding in Ruby web apps (or "convenience can be inconvenient")

2017-10-13T19:50:00+00:00

The Ruby ecosystem is famous for providing convenient ways of doing things. Very often security concerns are traded for more convenience. That makes me feel out of place because I’m always struggling to change the default route since I’m not interested in trading security with convenience when I have to make a choice.

Since it’s Friday 13, let’s talk a bit about my fears ;)

I remember that several of the security issues that were disclosed in the past few years in the Ruby community only existed in the first place because of this idea that we should try to deliver features the most convenient way. Like allowing YAML to dump/load Ruby objects, for example, when people were used to use it to serialize/deserialize. Thankfully it seems JSON is more popular these days even if more limited - you can’t serialize times or dates, for example, as allowed in YAML.

Here are some episodes I can remember of regarding how convenience was the reason behind many vulnerabilities:

Remote code execution due to convenience methods added to XML and YAML, 2013;
DoS caused by Rack conveniently converting params to hashes automatically: 1, 2
Params injection caused by Rack conveniently converting params to arrays automatically
Remote code execution due to render conveniently accepting multiple arguments formats: 1, 2
XSS vulnerability due to adding convenient JSON encoding features - For several years I only rely on the ‘json’ stdlib to parse and encode JSON using ::JSON.parse/unparse and don’t use any sort of .to_json.
More vulnerabilities in name of convenience
many more examples, but you got my point hopefully.

I remember that for a long while I was used to always explicitly convert params to the expected format, like params[:name].to_s and that alone was enough to protect my application from many of the disclosed vulnerabilities. But my application was still vulnerable to the first mentioned in the list above and the worst part is that we never ever used XML or YAML in our controllers but we were affected by that bug in the name of convenience (for others, not us).

Why is this a major issue with Ruby web applications?

Any other web framework providing seamless params binding depending on how the params keys are formatted are vulnerable for the same reasons but most (all?) people doing web development with Ruby these days will rely on Rack::Request somehow. And it will automatically convert your params to array if they are formatted like ?a[]=1&a[]=2 or hashes if they are formatted like ?a[x]=1&a[y]=2. This is built-in and you can’t change this behavior for your specific application. I mean, you could replace Rack::Utils.default_query_parser and implement parse_nested_query as parse_query for your own custom parser but then that would apply to other Rack apps mounted in your app (think of Sidekiq web, for example) and you don’t know whether or not they’re relying on such conveniences.

How to improve things

I’ve been bothered by the inconvenience of having to add .to_s to all string params (in name of providing more convenience, which is ironic anyway) for many reasons, and wanted a more convenient way of accessing params safely for years. As you can see, what is convenient to some can be inconvenient to others. But that would require a manual inspection in all controllers to review all cases where a param is fetched from the request. I wasn’t that much bothered after all, so I thought it wouldn’t worth the effort for such a big app.

Recently I noticed Rack recently deprecated Rack::Request#[] and I used it a lot as not only it was more convenient calling request[‘name’] instead of request.params[‘name’] but most examples in Roda’s README used that convenient #[] method (the examples were updated after it was deprecated). Since eventually I’d have to fix all usage of such method, and once they were used all over the places in our Roda apps (think of controllers - we use the multi_run plugin), I decided to finally take a step further and fix the old problem as well.

Fetching params through an specialized safer class

Since I realized that it wouldn’t be possible to make Rack parse queries in a more simpler way, I decided to build a solution that would wrap around Rack parsed params. For a Roda app, like ours, writing a Roda plugin for that makes perfect sense, so this is what I did:

1	# apps/plugins/safe_request_params.rb
2	require 'rack/request'
3	require 'json'
4
5	module AppPlugins
6	module SafeRequestParams
7	class Params
8	attr_reader :files, :arrays, :hashes
9
10	def initialize(env: nil, request: nil)
11	request \|\|= Rack::Request.new(env)
12	@params = {}
13	@files = {}
14	@arrays = {}
15	@hashes = {}
16	request.params.each do \|name, value\|
17	case value
18	when String then @params[name] = value
19	when Array then @arrays[name] = value
20	when Hash
21	if value.key? :tempfile
22	@files[name] = UploadedFile.new value
23	else
24	@hashes[name] = value
25	end
26	end # ignore if none of the above
27	end
28	end
29
30	# a hash representing all string values and their names
31	# pass the keys you're interested at optionally as an array
32	def to_h(keys = nil)
33	return @params unless keys
34	keys.each_with_object({}) do \|k, r\|
35	k = to_s k
36	next unless key? k
37	r[k] = self[k]
38	end
39	end
40
41	# has a string value for that key name?
42	def key?(name)
43	@params.key?(to_s name)
44	end
45
46	def file?(name)
47	@files.key?(to_s name)
48	end
49
50	# WARNING: be extra careful to verify the array is in the expected format
51	def array(name)
52	@arrays[to_s name]
53	end
54
55	# has an array value with that key name?
56	def array?(name)
57	@arrays.key?(to_s name)
58	end
59
60	# WARNING: be extra careful to verify the hash is in the expected format
61	def hash_value(name)
62	@hashes[to_s name]
63	end
64
65	# has a hash value with that key name?
66	def hash?(name)
67	@hashes.key?(to_s name)
68	end
69
70	# returns either a string or nil
71	def [](name, nil_if_empty: true, strip: true)
72	value = @params[to_s name]
73	value = value&.strip if strip
74	return value unless nil_if_empty
75	value&.empty? ? nil : value
76	end
77
78	def file(name)
79	@files[to_s name]
80	end
81
82	# raises if it can't convert with Integer(value, 10)
83	def int(name, nil_if_empty: true, strip: true)
84	return nil unless value = self[name, nil_if_empty: nil_if_empty, strip: strip]
85	to_int value
86	end
87
88	# converts a comma separated list of numbers to an array of Integer
89	# raises if it can't convert with Integer(value, 10)
90	def intlist(name, nil_if_empty: true, strip: nil)
91	return nil unless value = self[name, nil_if_empty: nil_if_empty, strip: strip]
92	value.split(',').map{\|v\| to_int v }
93	end
94
95	# converts an array of strings to an array of Integer. The query string is formatted like:
96	# ids[]=1&ids[]=2&...
97	def intarray(name)
98	return nil unless value = array(name)
99	value.map{\|v\| to_int v }
100	end
101
102	# WARNING: be extra careful to verify the parsed JSON is in the expected format
103	# raises if JSON is invalid
104	def json(name, nil_if_empty: true)
105	return nil unless value = self[name, nil_if_empty: nil_if_empty]
106	JSON.parse value
107	end
108
109	private
110
111	def to_s(name)
112	Symbol === name ? name.to_s : name
113	end
114
115	def to_int(value)
116	Integer(value, 10)
117	end
118
119	class UploadedFile
120	ATTRS = [ :tempfile, :filename, :name, :type, :head ]
121	attr_reader *ATTRS
122	def initialize(file)
123	@file = file
124	@tempfile, @filename, @name, @type, @head = file.values_at *ATTRS
125	end
126
127	def to_h
128	@file
129	end
130	end
131	end
132
133	module InstanceMethods
134	def params
135	env['app.params'] \|\|= Params.new(request: request)
136	end
137	end
138	end
139	end
140
141	Roda::RodaPlugins.register_plugin :app_safe_request_params, AppPlugins::SafeRequestParams

Here’s how it’s used in apps (controllers):

1	require_relative 'base'
2	module Apps
3	class MyApp < Base
4	def process(r) # r is an alias to self.request
5	r.post('save'){ save }
6	end
7
8	private
9
10	def save
11	assert params[:name] === params['name']
12	# Suppose a file is passed as the "file_param"
13	assert params['file_param'].nil?
14	refute params.file('file_param').tempfile.nil?
15	p params.files.map(&:filename)
16	p params.json(:json_param)['name']
17	p [ params.int(:age), params.intlist(:ids) ]
18	assert params['age'] == '36'
19	assert params.int(:age) == 36
20
21	# we don't currently use this in our application, but in case we wanted to take advantage
22	# of the convenient query parsing that will automatically convert params to hashes or arrays:
23	children = params.array 'children'
24	assert params['children'].nil?
25	user = params.hash_value :user
26	name = user['name'].to_s
27
28	# some convenient behavior we appreciate in our application:
29	assert request.params['child_name'] == ' '
30	assert params['child_name'].nil? # we call strip on the values and convert to nil if empty
31	end
32	end

An idea for those wanting to expand the safeness of the Params class above to the unsafe methods (json, array, hash_value) one could implement it in such a way that any hashes would be wrapped in a Params instance. However they should probably consider more specialized solutions in those cases, such as dry-validation or surrealist.

Final notes

In web frameworks developed in static languages this isn’t often a common reason for vulnerability because it’s harder to implement solutions like the one adopted by Rack as one would have to use some generic type such as Object for mappings params keys to their values, which is usually avoided in typed languages. Also, method signatures are often more explicit which prevents an specially crafted param to be interpreted as being of a different type than expected by methods. This is even more true in languages that don’t support method overloading, such as Java.

That’s one of the reasons I like the idea of introducing optional typing to Ruby, as I once proposed. I do like the flexibility of Ruby and that’s one of the reasons why I often preferred script languages over static ones for general purpose programming (I used to do Perl programming in my initial days when developing to the web).

But if Ruby was flexible enough to also allow me to specify optional typing, like Groovy does, it would be even better in my opinion. Until there, even though I’m not an security expert by any means, I feel like the recent changes on how our app fetch params from the request should significantly reduce the possibility of introducing bugs caused by params injection in general.

After all, security is already a quite complex topic to me and I don’t even want to have to think about what would be the impact of doing something like MyModel.where(username: params[‘username’]) and have to think what could possibly go wrong if someone would inject some special array or hash in the username param. Security is already hard to get it right. No need to make it even harder by providing automatic params binding through the same method out of the box in the name of convenience.

The day I reached the 1600 columns limit in PostgreSQL

2017-09-26T11:15:00+00:00

WARNING: skip the TLDR section if you like some drama.

TLDR: PostgreSQL doesn’t reclaim space when dropping a column. If you use some script that will add temporary columns and run it many times at some point it will reach the 1600 max columns per table limit.

It was a Friday afternoon (it’s always on Friday, right?) and we were close to start a long awaited migration process and after several tests everything seemed to be working just fine, until someone told me they were no longer able to continue testing as the servers wouldn’t allow them to port deals anymore. After a quick inspection in the logs I noticed the message saying we had reached the 1600 columns per table limit in PostgreSQL.

If you never got into this situation (and if you haven’t read the TLDR) you might be wondering: “how the hell would someone get 1600 columns in a single table?!”. Right? I was just as impressed, although I already suspected what could be happening, since I knew the script would create temporary columns to store the previous reference ids when inserting new records, even though they were dropped by the end of the transaction.

If that didn’t happen to you, you might think I was the first to face this issue but you’d be wrong. A quick search in the web for the 1600 columns limit and you’ll find many more cases of people unexpectedly reaching this limit without actually having that many columns in the table. I wasn’t the first one and won’t be the last one to face this issue but, luckily for you who are reading this article, you won’t be the next person to reach that limit ;)

Why using a temporary column?

Yes, now I agree it’s not a good idea after all, but let me try to explain why I did it in the first place.

In case you’re not aware, you can only use columns from the table being inserted in the “returning” clause of some “insert-into-select-returning” statement. But I wanted to keep a mapping between the newly inserted ids and the previous ones, from the “select” clause of the insert-into statement. So my first idea was to simply add a temporary “previous_id” column to the table and use it to store the old id so that I could map them.

Let me give some concrete example, with tables and queries so that it gets clearer for those of you who might be confused by the above explanation. We have documents, that can have many references associated to it and each reference can have multiple citations. The actual model is as much complicated as irrelevant to the problem, so let me simplify it to make my point.

Suppose we want to duplicate a document and its references and citations. We could have the following tables:

doc_refs(id, doc_id, category_id)
citations(id, ref_id, citation)

In my first implementation the strategy was to add a temporary previous_id to doc_refs and then the script would do something like:

1	insert into doc_refs(previous_id, doc_id, category_id) select id, doc_id, 30 from
2	doc_refs where category_id = 20;

This way it would be possible to know the mapping between the copied and pasted references so that the script could duplicate the citations using that mapping.

This script would have to run thousands of times to port all deals so, since I learned about the columns limit and how dropping a column wouldn’t really reclaim space in PostgreSQL, I’d need another strategy to get the mapping without resorting to some temporary column. I’d also have to figure out how to reclaim that space at some point in case I’d need to add some additional column for good at some point in the future, but I’ll discuss that part in another section below.

A better solution to the mapping problem

In case you reached those limits for the same reason as me, I’ll tell you how I modified the script to use a temporary mapping table instead of a temporary column. Our tables use a serial (integer with a generator) column. The process is just a little bit more complicated then using the temporary column:

1	create temp table refs_mapping as
2	select id, nextval('doc_refs_id_seq') from doc_refs where category_id = 20;

With that table it’s just a matter of inserting the records using this table to get the mapping between the ids. Not that hard after all, and the solution is free from the columns limit issue :)

How to reclaim back the space from dropped columns?

Once the script to port deals was fixed and running I decided to take some action to reclaim the space used by the dropped columns so that I could create new columns later in that table if I had to.

After searching the web some would tell that a full vacuum freeze would take care of rewriting the table, which would then reclaim the space. It didn’t work in my tests. It seems the easiest would be to create a dump and restore it in a new database but in our case that would mean some downtime which I wanted to avoid. Maybe it would be possible to use this strategy with some master-slave replication setup with no downtime, but I decided to try another strategy, which was simpler in our case.

Our clients only need read access to those tables, while the input is done by an internal team, which is much easier for us to manage downtime if needed.

So I decided to lock the table for write access while the script would recreate the table and then I’d replace the old one with the new one. It took only a handful seconds to complete the operation (the table had about 3 million records). The script looked something like this:

1	begin;
2	lock doc_refs in exclusive mode;
3	lock citations in exclusive mode;
4	create table new_refs (
5	id integer not null primary key default nextval('doc_refs_id_seq'),
6	doc_id integer not null references documents(id),
7	category_id integer not null references categories(id) on delete cascade
8	);
9	create index on new_refs(doc_id, category_id);
10	create index on new_refs(category_id);
11
12	insert into new_refs select * from doc_refs;
13
14	alter table citations drop constraint fk_citations_reference;
15	alter table doc_refs rename to old_refs;
16	alter table new_refs rename to doc_refs;
17	alter table citations add constraint fk_citations_reference
18	foreign key (ref_id) references doc_refs(id) on delete cascade;
19	alter sequence doc_refs_id_seq owned by doc_refs.id;
20	commit;
21
22	-- clean-up after that:
23
24	drop table references_old;

Fortunately that table was only referenced by one table, so it wasn’t that complicate as if that had happened to some other tables in our database. With a simple script like that we were able to rewrite the table with no downtime and the write access was locked for about 20 or 30 seconds only, while the read access wasn’t affected at all. I hope that could be an useful trick in case you found this article because you got yourself in a similar situation :)

If you have other suggestions on how to handle the mentioned issues I’d love to hear from you. I’m always curious about possible solutions, after all, who knows when it will be the next time I’d have to think out of the box? ;) Please let me know in the comments below. Thanks :)

Adopting React.js seems risky for long-term projects

2017-06-17T09:10:00+00:00

Important Update

Feel free to completely skip this article as it’s no longer relevant. I was confused by this part of the React documentation:

It is important to remember that the reconciliation algorithm is an implementation detail. React could rerender the whole app on every action; the end result would be the same.

It turns out “rerender”, as explained in the ticket I created on the React project, means calling render in all components, it doesn’t mean it could unmount and remount all components. If it remounted everything as I interpreted initially, it wouldn’t be possible to integrate to any third-party library, which was my main concern.

That gives me enough confidence to adopt React or some of its alternative lightweight implementations. I’m keeping the old content just in case you’re curious about it…

Old content

I’ve been working with long-term Single Page Applications (SPA) since 2009. When you know an application has to be maintained for many years you have to approach technology adoption very carefully. React.js introduced a very interesting approach based on virtual DOM and reconciliation algorithms, which seems to work great, but should it be considered safe to adopt React.js these days?

At a quick glance, the answer seems to be an obvious yes, right? React.js is used by Facebook, one of the largest company in the world, and maintained by its team with open-source contributions. It was largely adopted by many companies and there are some newsletters dedicated to React.js related technologies. There are even quite some compatible implementations such as Preact.js, Inferno.js, react-lite as well as other similar solutions such as Dio.js, MithrilJS and Maquette. All of them taking advantage of the virtual DOM concept. That means that even if React took a different route, or if Facebook moves to something else and stopped its maintenance, it should be easy to move to some of its alternatives provided we use some basic set of features that should be enough for most applications.

I was really excited by the VDOM moment and all those related technologies and I understand how they would help me to improve our current code base by not having to worry about manually managing the DOM, which gets more bug prone as you have to update an existing DOM. We adopted Knockout.js some years ago for parts of the application and it gave me about the same sense of making the code easier to maintain. However embedding HTML in JavaScript components with JSX feels much simpler to me than creating Knockout.js components (or Angular components, as they have a higher hype these days). Also, we are very concerned about the initial load time and it seems like VDOM based solutions can perform the initial rendering much quicker than MVVM alternatives such as Knockout.js and Angular.js.

My excitement quickly turned into fear after further reading the React official documentation, which is great, by the way.

Third-party components support

When you have to maintain a long-term large code base, one of your main concerns will be interoperability with third-party components. You can certainly find many articles and videos showing how easy it is for React to use third-party components. Almost all of them will mention returning false from the shouldComponentUpdate hook, or they will suggest an empty container.

It turns out it currently works pretty well, but is this really supported by React? I found React’s official documentation to be quite confusing regarding third-party components as it’s not consistent. Here’s why I’m concerned the current approach to integrate with stateful third-party components may no longer apply with future versions of React.js and I couldn’t find any official recommendation that would be more future proof.

I’ve submitted an issue a few days ago with my concerns, but got no response so far. Let me reproduce the issue content here.

So, here’s what the documentation says:

https://facebook.github.io/react/docs/integrating-with-other-libraries.html

To prevent React from touching the DOM after mounting, we will return an empty
from the render() method. The element has no properties or children, so React has no reason to update it, leaving the jQuery plugin free to manage that part of the DOM

So, it suggests using the mount/unmount hooks in order to initialize and destroy the third-party components, however this is not enough to guarantee that the integration will succeed. I’ll get more into that later.

https://facebook.github.io/react/docs/reconciliation.html

It is important to remember that the reconciliation algorithm is an implementation detail. React could rerender the whole app on every action; the end result would be the same.

https://facebook.github.io/react/docs/react-component.html#shouldcomponentupdate

Currently, if shouldComponentUpdate() returns false, then componentWillUpdate(), render(), and componentDidUpdate() will not be invoked. Note that in the future React may treat shouldComponentUpdate() as a hint rather than a strict directive, and returning false may still result in a re-rendering of the component.

Can you see the problem with that? If I can’t really rely on the reconciliation algorithm to not touch the elements React is not supposed to manage, then I have no guarantees that it would be possible to integrate React with stateful third-party components in the future.

Suppose I want to integrate with a very lightweight multi-options autocomplete component that only provides 3 public APIs, a constructor, a desctructor and some onChange hook. It’s an stateful component but we don’t have direct access to its state so that we can restore it after destroying and recreating it. It opens a menu with several items containing a checkbox and the item label. As you click on the items, checking its checkbox, onChange would be triggered, which we could use to change the state of some ancestor component managed by React.

While responding to the state change event, if React simply decides to re-render the ancestor component, without respecting shouldComponentUpdate, or if the reconciliation algorithm is not smart enough to only perform the required changes, it means it would probably call componentWillUnmount in the autocomplete component wrapper, which would only be able to destroy that component. Then, after componentDidMount we would only be able to initialize the component again, but we would have lost all of its state, like the scroll position and currently selected item and so on. In other words, that means React wouldn’t be able to play nice with stateful third-party components. In order to have such a guarantee, we need to have more guarantees from React itself.

The reconciliation algorithm shouldn’t be just an implementation detail without any guarantees. shouldComponentUpdate shouldn’t be considered just a hint. Otherwise, how are we supposed to wrap third-library components in a reliable way?

Even though I’m pretty excited about VDOM based view components I’m not willing to give up on existing third-party JavaScript components that require direct access to the DOM. React expects all of your components to act like pure functions in the sense they should be able to restore its current state by re-rendering it at any given time, even if they completely removed the mount node’s contents. That basically means most UI JS components would simply break when wrapped by a React component since they don’t provide such a complete API that would allow us to completely restore its current state.

At this point I’m not really sure I’m ready to give up on third-party components in order to adopt React. But it gets worse. What if I decide I want to move to another software stack some years from now. It’s important that I can draw the boundaries so that I can move one small component at a time to the new stack. But if React is not happy with setting such strict boundaries then in that case I’d have to move all at once, which doesn’t really work for huge code bases.

If you know of any VDOM based library that provides hard boundaries and a precise diff algorithm, please let me know in the comments below. I’m very interested in using VDOM, but interoperability is so much important for me to give up from it. It allows one to incrementally change its software stack, by mixing different stacks, replacing one component at a time for a while, without having to rewrite the whole application which is not really feasible in most cases.

Ruby on Rails: the Bad and Good parts

2017-05-04T20:00:00+00:00

In my previous article, I had a hard time trying to explain why I wanted to replace Rails with something else in the first place. This article is my attempt to write more specifically about what I dislike in Rails for the purpose of the single page application we maintain.

In summary, in the previous article I explained that I preferred to work with more focused and independent libraries, while Rails prefers to adopt a somewhat integrated and highly coupled solution, which is a fine approach too. There are trade-offs involved with either approach and I won’t get into the details for this article. As I said in my previous article this is mostly about developer’s personal taste and mindset, so by no means I ever wanted to bash on Rails. Quite the opposite. Rails served me pretty well for a long time and I could live with it for many more years, so getting it out of our stack wasn’t an urgent matter by any means.

For the purpose of this article, I won’t discuss the Good and Bad of Ruby, since it was mainly written to explain why choosing another Ruby framework instead of Rails.

In case you didn’t read the previous article, the kind of application I work with is a single page application, so keep this in mind when trying to understand my motivations for replacing Rails.

Unused Rails features

So, here are some features provided by Rails which I didn’t use when I took the decision to remove Rails from our stack:

ActiveRecord (used Sequel instead);
Turbolinks (it doesn’t make much sense for the kind of SPA we build);
YAML configuration files (we use regular Ruby files for configuration);
minitest or test/unit (used RSpec instead);
fixtures (used factories instead);
Devise (we have a very particular authentication strategy and authentication frameworks wouldn’t add much to the table);
we have just a handful views and forms rendered by Rails (most are generated with JS);
REST architecture (we deal with very specific requests rather than generic ones over common resources, which translates to specialized queries that run very quickly without having to resort to complicated caching strategies for most cases in order to get fast responses);
responds_to (most requests will simply respond with JSON);
Sprockets, also known as the Rails Assets Pipeline (not sure if this holds true after Rails 5.1 added integration to Webpack);
generators (I don’t use them for a long time because they aren’t really needed and it’s pretty quick and easy to add new controllers, models, mailers or tests manually);

So, for a long while I have been wondering how exactly Rails was helping us to build and maintain our application. The application was already very decoupled from Rails and its code didn’t rely on ActiveSupport core extensions either. We tried to keep our controllers thin, although there’s still quite some work to do before we get there.

On the other side, there were a few times I had trouble trying to debug some weird problems after upgrading Rails and it was I nightmare when I had to dig into Rails' source code and I wasted a lot of time in the process, so I did have a compelling reason to not stick with Rails. There were other parts I disliked in Rails, which I describe in the next section.

The Bad Parts

can’t upgrade individual parts, it’s all or nothing. If you’re using ActiveRecord, for example you’re forced to upgrade all Rails parts if you want to upgrade ActiveRecord to get support for some feature. Or the opposite: you might want to upgrade just the framework to get ActionCable support for example, but then you’d have to fix all deprecated usage from your ActiveRecord usage in the process;
hard to follow code base, when debugging edge cases, which makes it hard to estimate tasks involving debugging weird issues that happened after upgrading Rails for example;
buggy streaming support through ActionController::Live (had to work around them many times after upgrading Rails). Try to read its source to understand how it works and you’ll understand when I say its implementation is quite complicated;
occasional dead-locks, specially when ActionController::Live was used. That’s why those few actions were the first one I moved out of Rails;
ActiveSupport::Dependencies: implicit autoloading and their problems. You must require full action_view even if you only need action_view/helpers/number_helper for example;
monkey patches to Ruby core classes and methods pollution (it’s my opinion that libraries shouldn’t freely patch core Ruby classes except for very exceptional cases such as code instrumenting, implementing a transparent auto-reloading tool and so on, and should be avoided whenever possible);
automatic/transparent params binding (security concerns, I often wrote code such as param[:text].to_s because I didn’t want to get a hash or an array when accessing some param because they were injected by some malicious request taking advantage of Rails automatic params binding rules);
slow to boot when compared to other Ruby frameworks (more of a development issue), spring is not perfect and shouldn’t be required in the first place;
increased test load time, which is quite noticeable when running individual tests;
the API documentation is incomplete. The guides are great though, but often I wasted a lot of time trying to look for the documentation of some parts of the API;
lack of full understanding of the boot process and requests cycle;
I won’t get into the many details why I don’t like ActiveRecord because I don’t use it for several years and it’s not a requirement to use Rails, but if you’re curious I wrote an article comparing it to Sequel long ago. My main annoyance with ActiveRecord is related to its pooling implementation and its ability to checkout a connection from the pool outside of a block that would ensure it’s checked in again into the pool;

The Good Parts

Rails is still great as an entrance framework for beginners (and some experts as well). Here are the good parts:

handles static resources (assets in Rails terminology) bundling and integrates with Webpack out of the box;
good safe default HTTP headers;
CSRF protection by default;
SQL injection protection in bundled ActiveRecord by default;
optimizations to traditional web pages through Turbolinks;
bin/console and great in-site debugging with the web-console gem bundled by default in development mode;
separate configuration per environment (development/production/test) with good defaults;
e-mail integration;
jobs integration;
integrated database migrations;
great automatic code reloading capabilities in the development environment (as long as you stick with Rails conventions and don’t specify your dependencies manually);
fast to boot (when comparing to frameworks in other languages, such as Java);
awesome guides and huge community to ask your questions and get an answer very quickly;
great community and available gems for all kind of tasks;
very much audited by security experts and any discovered issues are quickly fixed and new releases are made available with responsible disclosure;
Github issues are usually quickly fixed;
Rails source code has an extensive test coverage;
provide tons of generators, including test, models, controllers, for those who appreciate them;
provides great performance-related data in the application’s logs (time spent rendering views and partials and in the database);
highly configurable;
internationalization support;
helpful view helpers such as number and currency formatting;
a big team of active maintainers and contributors;
easy websockets API through ActionCable;
flexible routing;
bundles with test runners solutions for both Ruby-land tests and full-feature tests through Capybara (it still lacks an integrated bundled JavaScript test runner though);
there are probably many more great features I can’t remember out of my head because I didn’t use myself such as RESTful resources and so on;
conventions such as paths organizations help a lot teams with lots of developers and frequent turnovers, and when hiring new members in general, or when handing the project to someone else and the like. By knowing Rails conventions, when joining an existing Rails application for the first time the newcomer will know exactly where to find controllers, models, views, workers, assets, mailers, tests and so on. It’s also very likely they will be used with many gems commonly used altogether with Rails.

So, Rails is not only a framework but a set of good practices (among a set of questionable practices that will vary accordingly to each one’s taste) bundled together as well. It’s not the only solution trying to provide a solid ground for web developers though. Another similar solution with similar goals seems to be Hanami for example, although Rails seems to be more mature to me. For example, I find code reloading to be a fundamental part of developing web applications and Hanami doesn’t seem to provide a very solid solution that would work across different Ruby implementation such as JRuby for example, accordingly to these docs.

But overall, I still find Rails to be one of the best available frameworks for developing web applications. It’s just that for my personal tastes and mindset I’m more aligned to something like Roda than to something like Rails but one should understand the motivations behind one’s decisions in order to figure out by themselves which solution works best for their own taste rather than expecting some article to tell you what is the Right Solution ™.

Feeling alone in the Ruby community and replacing Rails with Roda

2017-05-13T10:10:00+00:00

Background - the application size

Feel free to skip to the next section if you don’t care about it.

I recently finished moving a 5 years old Rails application to a custom stack on top of Roda from Jeremy Evans, also the maintainer of the awesome Sequel ORM. The application is actually older than that and I’ve been working on it for 6 years. It used to be a Grails application that was moved from SVN to Git about 7 years ago but I never had access to the SVN repository so I don’t really know how old this application is. It was completely migrated from Grails to Rails in 2013. And these days I replaced Rails with Roda but this time it was painless and only took a few weeks.

I have some experience with replacing the technology of an existing application without interrupting the regular development flow and deployment procedures and the only times I really had to interrupt the services for a little while was the day I replaced MySql with PostgreSQL and the day I moved the servers from collocation to Google Cloud Platform.

I may write about what steps I usually follow when changing the stack (I replaced Sprockets with Webpack a few years ago among, Devise with a custom solution, among many examples) in another article. But the reason I’m describing this scenario for this article’s purpose is only so that you have some raw idea about this project size, specially if you consider it had 0 tests when I joined the company as the sole developer and had to understand a messy Grails application with tons of JS embedded in GSP pages with functions comprising hundreds of lines with many many logical branches inside. Years later and there are still tons of tests lacking, specially in the front-end code and much more to improve. To give you a better idea, we currently have about 5k lines of Ruby test code, and 20k lines of other custom (not generated) Ruby code plus 5k lines of database migrations code. Besides that we have about 11k lines of CoffeeScript code, 6k lines of JS code and 2.5k lines of CoffeeScript tests code. I’m not including any external libraries in those stats. You have probably noticed already how poor is the test coverage currently, specially in the front-end. At this point I expect you to have some raw idea on this project size. It’s not a small project.

Why replacing Rails in the first place?

Understanding this section is definitely the answer on why I feel alone in the Ruby community.

More background about Rails and the Ruby community

Again, feel free to skip this subsection.

When I was working on my Master thesis (Robotics, Electrical Engineering) I stopped working with web development for a while and focused on embedded C programming, C++ hard real-time systems and the like. After I finished the Master thesis my first job was back to Delphi programming. Only in 2007 I moved my job back to web development, several years later and I only had experience with Perl so far. After a lot of research I decided for Rails and Ruby, although I have also seriously considered TurboGears and Django by that time, both using the Python language. I wasn’t worried by the language by that time as I didn’t know either Ruby or Python and they seemed similar one to the other. Ultimately I chose Rails because of how it handled database migrations.

In 2007, when looking at the alternatives, Rails was very appealing. There were conventions that would save me a lot of work when starting to work with web development again, there were generators to help me getting started, great documentation, it bundled a database migrations framework so that I wouldn’t have to recreate myself, simple to understand error stack-traces, good defaults for the production environment (such as proper 500 and 404 pages), great auto-reloading of code in the development environment, great logging, awesome testing tools and integrated to generators, quick boot, custom routes, convention over configuration and so on.

Last but not least, a very rich ecosystem with smart people working on great gems and learning Ruby together and they were all amazing by its meta-programming capabilities, the possibility of changing core classes through monkey patches and so on. And since it’s possible, we should use it in all places we can, right? Specific-domain-languages (SDL) were used by all popular gems by that time. And there wasn’t much fragmentation like in the Java community. Basically almost anyone writing web applications in Ruby were writing Rails apps and following its conventions. That allowed the community to grow fast, with several Rails plugins and projects assuming the application was running Rails. Most of us have only known Ruby because of Rails, including myself. This is already enough reason to thank DHH. Rails definitely raised the bar for other web frameworks.

As the ecosystem matured, we saw the rise of Rack and more people using what they called micro-frameworks such as the popular Sinatra, Merb among others. Rails improved internationalization support in version 2, merged with Merb in version 3, got Sprockets in version 4 and so on. The assets pipeline were really a thing when they were introduced in Rails by that time. It was probably the latest really big change introduced by Rails that really inspired the general web development scenario.

In the meantime Ruby has also evolved a lot, providing better unicode support, adding a new Hash syntax, garbage collecting symbols, improving performance and getting new great tools such as Bundler. RubyGems got a better API, the Rails guides got much better and they have a superb documentation on securing web applications that is accessible to any web developer and not only Rails ones. We have also seen lots of books and courses teaching the Rails way, as well as many dedicated blogs, videos, conferences and so on. I don’t remember watching such a fast growing in any other community until JavaScript got a lot of traction recently, motivated not only by single page applications which are becoming more and more common, but also by the creation of Node.js.

Many more languages have been created or re-discovered recently including Go, Elixir, Haskell, Scala, Rust and many many more. But up to this day, despite the existing of symbols and a poor threading model in MRI and lack of proper support for threaded applications in stdlib, Ruby is still my preferred general purpose language. That includes web applications. What about Rails?

Enough is enough! What’s wrong with Rails?

If you guessed performance was the reason, you guessed wrong. For some reason I don’t quite understand, developers seem to be obsessed by performance even in scenarios where it doesn’t matter. I never faced server-side performance issues with Rails. Accordingly to NewRelic most requests would be served by less than 20ms in the server-side. Even if we could cut those 20ms it wouldn’t make any difference at all. So, what’s wrong after all?

There’s nothing wrong with Rails in a fundamental way. It’s a matter of taste in my case I guess because it’s really hard to find an objective way to explain why I wasn’t fully satisfied with Rails. You should probably understand that this article is not about bashing on Rails in any way. It’s a personal point of view on why I feel like a strange and why it’s not a great feeling. [Update: after writing this article, I spent some time trying to list the parts I dislike in Rails and wrote a dedicated article about it, which you can read here if you’re curious]

To help you understand where I come from, I have never followed the “Rails Way” if there’s such a thing. I used jQuery when Prototype was the default library, I used RSpec when test/unit was the default one, I used factories when Rails teached fixtures, I used Sequel rather than the bundled ActiveRecord, but instead of Sequel’s migrations I used ActiveRecord’s migration through the active_record_migrations gem. Some years ago I replaced Sprockets with Webpack (which fortunately Rails just embraced in Rails 5.1 release, while I wasn’t using Rails anymore when it was released). After some frustration trying to get Devise to work well with Sequel I decided to replace Devise with a custom solution (previously I had to customize Devise a lot to make it support our non-traditional integration for dealing with sign-ins and custom password hashing inherited by the time it was written in Grails).

Since we’re talking about a single page application, almost all of the requests were JSON ones. We didn’t embrace REST, or respond_to, we had very few server-side views and often had to dig into Rails or Devise source code to try to understand why something wasn’t working as we expected them to. That included several problems we had with streamed responses (which Rails calls Live Streaming for some reason I don’t quite follow, although I suspect that’s because they introduced some optimizations to start sending the view’s header sooner and called it streaming support, so they needed another name when they introduced ActionController::Live) after each major Rails upgrade. I used to spend a lot of time trying to understand Rails internal source whenever I had to debug such problems. It was pretty confusing to me. The same happened with Devise.

At some point I started to ask myself what Rails was adding to the table. And it got worse. When I first met Rails it booted in no time. It got slower to boot at each new release and then they introduced complex solutions such as spring to try to fix this slowness. For a long time they used (and still use to this day) Ruby’s autoload feature to lazily evaluate code as it’s needed in order to decrease the boot time. Matz don’t like autoload and I don’t like it either, but this article is already long enough to discuss this subject too.

Something I never particularly enjoyed in Rails was all that magic related to auto-loading. I always preferred explicit and simple code over sophisticated code that auto-wires things. As you can guess, even though I loved how Rails booted quickly and how auto-reloading just worked with Rails (except when it didn’t - more on that later) I really wanted to specify all my dependencies explicitly in each file. But I couldn’t just use require or auto-reloading would stop working. I had to use ActiveSupport’s require_dependency and I hated it because it wasn’t just regular Ruby code.

I also didn’t like the fact that Rails enforced all monkey patches to Ruby core classes made by ActiveSupport extensions, introducing methods such as blank?, present?, presence, try, starts_with?, ends_with? and so on. That’s related to the fact I enjoy explicit dependencies as I think it’s much easier to follow a code with explicit dependencies.

So, one of my main motivations to get rid of Rails was to get rid of ActiveSupport, since Rails depends on ActiveSupport, including its monkey patches and auto-loading implementation. Replacing Rails with Roda alone didn’t allow me to get rid of ActiveSupport just yet as I’ll explain later in this article, but it was an important first move. What follows is the kind of frustration with the Ruby community in the sense of how very popular Ruby gems are written with about the same mentality of those from Rails core. Such gems include ~~the very popular mail gem as well as~~ FactoryGirl, for example. ~~Even Sidekiq will patch Ruby core classes.~~ I’ll talk more about this later, but let me introduce Roda first.

[Update: after writing this article both the mail and sidekiq gems have worked to remove their monkey patches and I’d like to congratulate them for the effort and give them “Thank you so much!”]

Why Roda?

From time to time I considered replacing Rails with something else but I always gave up for a reason or another. Sometimes I realized I liked Sprockets and the other framework didn’t provide an alternative to the Rails Assets Pipeline. Another time I realized that auto-reloading didn’t work great with the other framework. Other times I didn’t like the way code was organized with the other framework. When I read Jeremy’s announcement for Roda, it was just the right time with the right framework for me.

I greatly appreciate Jeremy from a long time since getting introduced to Sequel. He’s a lovely person, who provides awesome and kind support and he’s a great library designer. Sequel is simply the best ORM I’ve seen so far. Also, I find it quite simple to follow Sequel’s code base and after looking into Roda’s source it’s pretty much trivial to follow and understand. It’s basically one simple source file that handles routing and plugins support and basically everything else is provided by plugins you can opt-in/out and each plugin, being small and self contained, is pretty simple to understand and if you don’t agree with how it’s implemented just implement that part your own.

After having a glance over the core Roda plugins one stood out particularly: multi_run. For what I want, this plugin would give me great organization, similar to Rails controllers, with the advantage that they could have their own middleware stacks, they could be mounted anywhere, including in a separate app, they were easy to test separately as if they were a single app if desired but more importantly: it allowed me to easily lazy load the application code, which allowed the application to boot instantly with Puma, without the need of autoload and other trickery. Here’s an example:

1	require 'roda'
2	module Apps
3	class MainApp < Roda
4	plugin :multi_run
5	# you'll probably want other plugins, such as :error_handler and :not_found,
6	# or maybe error_email
7
8	def self.register_app(path, &app_block)
9	->(env) do
10	require_relative path
11	app_block[].call env
12	end
13	end
14
15	run 'sessions', register_app('sessions_app'){ SessionsApp }
16	run 'static', register_app('static_app'){ StaticApp }
17	run 'users', register_app('users_app'){ UsersApp }
18	# and so on
19	end
20	end

Even if you decide to load the main application when testing particular apps, the overhead would be negligible, since it would only load the tested app basically. And if you are afraid of using lazy loading in the production environment because you want to deliver a warmed app, it’s quite easy to change register_app:

1	require 'roda'
2	module Apps
3	class MainApp < Roda
4	plugin :multi_run
5	plugin :environments
6
7	def self.register_app(path, &app_block)
8	if production?
9	require_relative path
10	app_block[]
11	else
12	->(env) do
13	require_relative path
14	app_block[].call env
15	end
16	end
17	end
18
19	run 'sessions', register_app('sessions_app'){ SessionsApp }
20	# and so on
21	end
22	end

This is not just a theory, this is how I implemented in our application and it boots in less than a second. Just about the same as the simplest Rack app. Of course, I hadn’t really measured this in any scientific way, it’s a simple in-head count when running bundle exec puma, where most of the time is spent on Bundler and requiring Roda (about 0.6s with my gemset). No need for spring, autoload or any complicated code to make it fast. It just works and it’s just Ruby, by using explicit lazy loading rather than an automatic system.

So, I really wanted to try this approach and I had a plan where I would run both Roda and Rails stacks altogether for a while, by running the Rails app as the fallback app when the Roda stack wouldn’t match the route. I could even use the path_rewriter plugin to migrate a single action at a time to the Roda stack if I wanted to.

There was just one remaining issue I had to figure out how to solve before I started moving the app to the Roda stack: automatic code reloading. I decided to ask in the ruby-roda mail group how Roda handled code reloading and Jeremy said it was out of Roda’s responsibility and that I could choose any code reloader I wanted and pointed to some documentation listing some of them, including one of his own. I spent quite some time researching about them and still preferred the one provided by ActiveSupport::Dependencies but since I wanted to get rid of ActiveSupport and autoloading in the first place there was no point in keep using it. If you’re curious about this research, I wrote about it here. If you’re curious on why I dislike Ruby’s autoload feature, you’ll find the explanation in that article.

After some discussion around automatic code reloading in Ruby with Jeremy I suggested him an approach I think would work pretty well and transparently although it would require to patch both require and require_relative in development mode. Jeremy wasn’t much interested on it because of those monkey patches, but I was still confident it would be a better option than the others I had evaluated so far. I decided to give it a try and that’s how AutoReloader was born.

With the autoreloading issue solved, it was all set to start porting the app slowly to the Roda stack, and the process was pretty much a breeze. If you want to have some basic idea on Rails overhead, the full Ruby specs suite were about 2s faster with the same (converted) tests after getting rid of the last Rails bits. It used to take 10s to run 380 examples and thousands of assertions, and after getting rid of Rails it took 8s with an extra example. Upgrading Bundler saved me another half a second so currently it takes 7.6s to finish (about half a second for bundle exec, 1.5s to load accordingly to RSpec report and 5.6s to run).

But getting rid of Rails was just the first step in this lonely journal.

Rails is out, what’s next?

Getting rid of Rails wasn’t enough to get rid of ActiveSupport. We have a LocaleUtils class we use to format numbers among other utilities based on the user’s locale. It used to include ActionView::Helpers::NumberHelper, and by that time I learned the hard way that I couldn’t simply require 'action_view/helpers/number_helper' because I’d have problems related to ActiveSupport’s autoloading mechanism, so I had to fully require action_view. Anyway, since ActionView depends on ActiveSupport I wanted to get rid of it as well. As usual, after lots of wasted time searching for Ruby number formatting gems I decided to implement the formatting myself and a few hours later I got rid of ActionView.

But ActiveSupport was still there as a great warrior! This time it was dependency of… guess what? Yep, FactoryGirl! Oh, man :( After some research on alternative factory implementations I found Fabrication to be dependency free. An hour later I ported our factories to Fabrication and finally got rid of ActiveSupport! Yay, no more monkey patches to core Ruby classes! Right?

Well, not exactly… :( The monkey patch culture is deeply rooted in Ruby’s community. ~~Some very popular gems add monkey patches, such as the mail gem, or sidekiq.~~ While reading the mail gem source I found it very confusing, so I decided to replace it with something simpler. We use exim4 to forward e-mails to Amazon SES, so Ruby’s basic NET/SMTP support is enough for delivering e-mails to Exim, all I needed was a MIME mail formatter in order to send simple TEXT + HTML multi-part mail to users. After some more research I decided to implement it myself and this is how simple_mail_builder was born.

~~At some point I might decide to create my own simple jobs processor just to get rid of Sidekiq’s monkey patches, but~~ my point is that I have this feeling of being a lonely warrior fighting a lost battle because of my expectations mismatch with what the Ruby community overall consider acceptable practices such as modifying Ruby core classes in libraries. I agree it’s okay for instrumenting code, such as NewRelic, to patch other’s code, but for other use cases I don’t really agree with such approach.

In one hand I really love the Ruby language, except for some few caveats, but there’s a huge mismatch with the Ruby community way of writing Ruby code, and this is a big thing. I don’t really know what’s the situation in other language communities, so I guess I might be a lonely warrior in any other language I opted for instead of Ruby, but Ruby is the only language I really appreciate so far among those I’ve worked with.

I guess I should just stop dreaming about the ideal Ruby community and give up on trying to get a monkey-patch free web application…

At least, I can now easily and happily debug anything that happens to the application without having to spend a lot of time digging into Rails or Devise’s source code, which used to take me a lot of time. Everything’s clean water. I have tons of flexibility to do what I want in no time with the new stack. The application boots pretty quickly and I’ll never run into edge cases involving ActiveSupport::Dependencies auto-reloading again. Or issues involving ActionController::Live. Or Devise issues when using Sequel as the ORM.

Ultimately I feel like I got full control over the application and that’s simply priceless! It’s an awesome feeling of freedom I never experienced before. Instead of focusing on the lonely warrior fighting a lost battle bad feeling, I’ll try concentrate on those great benefits from now on.

Using RSpec Nested Transactions to speed up tests touching the database

2016-08-08T13:05:00+00:00

TLDR: This article proposes savepoints to implement nested transactions, which are supported by PostgreSQL, Oracle, Microsoft SQL Server, MySQL (with InnoDB but I think some statements would automatically cause an implicit commit, so I’m not sure it works well with MySQL) and other vendors, but not by some vendors or engines. So, if using savepoints or nested transactions are not possible with your database most likely this article won’t be useful to you. Also, not all ORM provide support for savepoints in their API. I know Sequel and ActiveRecord do. It also provides a link on how to achieve the same goal with Minitest.

I’ve been feeling lonely about my take on tests for a long time. I’ve read many articles on tests in the past years and most of them, not only in the Ruby community, seem to give us the same advices. Good advices by the way. I understand the reasoning about them but I also understand they come with trade-offs and this is where I feel kind of lonely. All articles I’ve read and some people that have worked with me have tried to convince me that I’m just plain wrong.

I never cared much about this but I never wrote about it either as I thought no one would be interested in learning about some techniques I’ve been using for quite some years to speed up my tests. Because it seems everything would simply tell me I’d go to hell for writing tests this way.

A few weeks ago I read this article from Travis Hunter which reminded me of an old TO-DO. More importantly, it made me realize I wasn’t that lonely in thinking the way I do about tests.

“Bullshit! I came here because the titles said my tests would be faster, I’m not interested in your long stories!”. Sure, feel free to completely skip the next section and go straight to the fun section.

Background

I graduated in Electrical Engineering after 5 years in the college. Then more two years working on my master thesis on hard real-time systems towards mobile robotics. I think there are two things which engineers in general get used to after a few years in the college. Almost everything involves trade-offs and one of the most important jobs of an engineering is to identify them and choose the one they consider to have the best cost benefit. The other one is related to the first one in knowing that some tools will better fit a set of goals. I mean, I know this is also understood by CS and similar graduated people, but I have this feeling it’s not as strong in general in those areas as I observe in some (electrical/mechanical/civil) engineers.

When I started using RSpec and Object Daddy (many of you may only know Factory Girl these days), a popular factory tool by that time, I noticed my suite would take almost a minute for just a few examples touching the database. That would certainly slow me down as I would have to add many more tests.

But I felt really bad when I complained about that once in the RSpec mailing list and David Chemlinsky mentioned about taking 54s to run a couple of hundred examples when actually I had only 54 examples in my suite by that time.

And it felt even worse when I contributed once to Gitorious and noticed that over a thousand examples would finish in just a few seconds, even though lots of them didn’t touch the database. Marius Mathiesen and Christian Johansen are very skilled developers and they were the main Gitorious maintainers by that time. Christian is the author of the popular Sinon.js, one of the authors of the great Buster.js and author of the Test-Driven JavaScript Development book.

For that particular application, I had to create a lot of records in order to create the record I needed to test. And I was recreating them on every single test requiring such record, through Object Daddy but I suspect the result would be about the same with FactoryGirl or any other factory tool.

When I realized that creating lots of records in the database was that expensive, I stopped following the traditional advises for writing tests and only worried about what I really cared for which remains basically the same to these days.

These are my test goals:

ensure my application works (the main goal by far);
avoid regressions (linked to the previous one);
the suite should run as fast as possible (just a few seconds if possible);
it should give me enough confidence to allow me to completely change the implementations during any refactoring without completely breaking the tests; To me that means avoid mocking or stubbing objects and performing HTTP requests against a real server for testing things like cookie-based sessions and a few other scenarios (rack_toolkit allows me to create such tests while still being fast).

These are not my test goals at all:

writing specs in such a way they would serve as a documentation. I really don’t care how the output looks like when I run a single test file with RSpec. That’s also the reason why I never used Cucumber. Worrying about this adds more complexity and I don’t think they are useful for documentation purposes anyway;
each example should have a single expectation. I simply don’t see much value on this and very often this has the potential of slowing down the test suite;
tests should be independent from each other and ideally we should run them in random order. I understand the reasoning behind this and I actually find it useful and see value in it. But if I see trade-offs I’d trade test-independence by speed. Fortunately this is not required by my tests touching the database using the technique I demonstrate in the next section, but it may speed up some request tests.

I even wrote my own JavaScript test runner because I needed one that allowed me to run my tests in the specified order, supported IE6 (by that time) and beforeAll and I couldn’t find any by that time. My application used to register some live events on document and would never unregister them because it was not necessary, so my test suite would only be allowed to initialize it once. Also, recreating a tree on every test would take a lot of time, so I wanted to run a set of tests that would work on the same tree based on the result of previous tests.

I was okay with that trade as long my tests would run fast, but JavaScript test runners authors wouldn’t agree, so I created OOJSpec for my needs. I never advertised it because I don’t consider it to be feature complete yet, although it suites my current needs. It doesn’t currently support running a single test because I need to think in some way to declare a test’s dependencies (in other tests) so that those dependent tests would also be run before the requested one. Also, maintaining a test runner is not trivial and since it’s currently hard for me to find time to review patches I preferred not to announce it. Since I can run individual test files, it’s working fine for my needs, so I don’t currently have much motivation to further improve it.

A fast approach to speed up tests touching the database

A common case while testing some scenarios is that one wants to write a set of tests that exercise about the same set of records. Most people nowadays are using either one of the two common approaches:

creating the records either manually (through the ORM usually) or through factories;
loading fixtures (which are usually faster than creating them using factories);

Loading specific fixtures before each context wouldn’t be significantly faster than using a factory when using a competent factory and ORM implementations, so some will simply use DatabaseCleaner with the truncate strategy to delete all data before the suite starts and loading the fixtures to the database. After that usually each example would run inside a transaction that would be rolled back which is usually much faster than truncating and reloading the fixtures.

I don’t particularly like fixtures because I find them to make tests more complicated to write and understand. But I would certainly consider them if they would make my tests significantly faster. Also, nothing prevents us from using the same fixtures approach with factories as we could also use the factories to populate the initial data before the suite starts, but the real problem is that writing tests would still be more complicated in my opinion.

So, I prefer to think about solutions that allows tests to remain fast even when using factories. Obviously that means that we should find some way to avoid recreating the same records for a given group since the only way to speed up a suite that takes a lot of time creating records in the database is to reduce the amount of time spent in the database creating those records.

There are other kind of optimizations that would be interesting to try but that it’s probably complicated to implement as it would probably require a change in FactoryGirl API to allow such optimizations. For example, rather than sending one statement at a time to the database I guess it would be faster to send all of them at once. However I’m not sure it would be that much faster if you are using a connection pool (usually a single connection in the test environment) that keeps the connection open and you’re using a local database.

So, let’s talk about the low-hang fruits which are also the best ones in this case. How can we reuse a set of records among a set of examples while still allowing them to be independent from each other?

The idea is to use nested transactions to achieve that goal. You begin a transaction during the suite start (or some context involving database statements) and then the suite will create a savepoint before a set/group of examples (a context in RSpec language) and rollback to that savepoint after the context finished.

Managing such savepoint names can be complex to implement on your own but if you are going this route anyway because your ORM doesn’t provide an easy API to handle nested transactions then you may not be interested in the rspec_nested_transactions gem I’ll present in the next section.

However with Sequel this is as easy as:

1	# The :auto_savepoint option will automatically add the "savepoint: true" option to inner
2	# transaction calls.
3	DB.transaction(auto_savepoint: true, savepoint: true, rollback: :always){ run_example }

With ActiveRecord the API works like this (thanks Tiago Amaro, for showing me the API):

1	ActiveRecord::Base.transaction(requires_new: true) do
2	run[]
3	raise ActiveRecord::Rollback
4	end

This will detect whether a transaction is already in place and use savepoints if it is or will issue a BEGIN to start the transaction. It will manage the savepoint names automatically for you and will even rollback it automatically when using the “rollback: :always” option. Very handy indeed. But in order to achieve this Sequel doesn’t provide methods such as “start_transaction” and “end_transaction”.

Why is this a problem? Sequel does the right thing by always requiring a block to be passed to the “transaction” method but RSpec does not support “around(:all)”. However Myron Marston posted a few years ago how to implement it using fibers and Sean Walbran created a real gem based on that article. You’d probably be interested in combining this with the well known strategy of wrapping each example in a nested transaction themselves.

If you feel confident that you will always remember to use “around(:all)” with a “DB.transaction(savepoint: true, rollback: :always){}” block whenever you want to create such a common set of records to be used inside a group of examples then the rspec_around_all gem may be all you need to implement that strategy.

Not only I find this bug prone (I could forget about the transaction block) I also bother to repeat this pattern every time I want to create a set of shared records.

There’s a caveat though. If your application creates transactions itself it should be aware of savepoints too (this is accomplished automatically when using Sequel provided you use the :auto_savepoint option in the outmost transaction) even if BEGIN-COMMIT is enough out of the tests, so that it works as expected in combination with this technique. If you are using ActiveRecord, that means using “requires_new: true”.

If you are using Sequel or ActiveRecord and PostgreSQL, Oracle, MSSQL, MySQL (with InnoDB) or any other vendor supporting nested transactions and have full control over the transaction calls, implementing this technique can speed up your suite a lot with regards to the tests touching the database. And rspec_nested_transactions will make it even easier to implement.

Let the fun begin: introducing rspec_nested_transactions

I’ve released today rspec_nested_transactions which allows one to run all (inner) examples and contexts inside a transaction (usually a database transaction) with a single configuration:

1	require 'rspec_nested_transactions'
2
3	RSpec.configure do \|c\|
4	c.nested_transaction do \|example_or_group, run\|
5	(run[]; next) unless example_or_group.metadata[:db] # or delete this line if you don't care
6	# with Sequel, assuming the database is stored in DB:
7	DB.transaction(auto_savepoint: true, savepoint: true, rollback: :always, &run)
8
9	# with ActiveRecord (Oracle, MSSQL, MySql[InnoDB], PostgreSQL):
10	ActiveRecord::Base.transaction(requires_new: true) do
11	run[]
12	raise ActiveRecord::Rollback
13	end
14	end
15	end

That’s it. I’ve been using a fork of rspec_around_all (branch config_around) since 2013 and it has always served me great since then and I never had to change it since then, so I guess it’s quite stable. However for a long time I considered moving it to a separate gem and remove the parts I didn’t actually use (like “around(:all)”). I always post-poned it but Travis' article reminded me about it and I thought that maybe others might be interested on this approach as well.

So, I improved the specs, cleaned up the code using recent Ruby features (>= 2.0 [prepend]) and released the new gem. Since the specs use the “<<~” heredoc it will only run on Ruby >= 2.3 but I guess it should work with all Ruby >= 2.0 (or even 1.9 I guess if you implement Module.prepend).

What about Minitest?

Jeremy Evans, the Ruby Hero who happens to be the maintainer of Sequel and creator of Roda, was kind enough to provide a link on how to achieve the save with Minitest) in the comments below. No need for Fibers in that case. Go check that out if you’re working with Minitest.

Final notes

Currently our application runs 364 examples (RSpec doesn’t report the expectations count, but I suspect it could be around a thousand) in 7.8s while many of them will touch the database. Also, when I started this Rails application I decided to give ActiveRecord another try since it had also included support for a lazy API when Arel was introduced, which I was already used to with Sequel. A week or two later I decided to move to Sequel after finding AR API quite limiting for the application’s needs. At that time I noticed that the tests finished considerably faster after switching from ActiveRecord to Sequel, so I guess Sequel has a lower overhead when compared to ActiveRecord and switching to Sequel could possibly help speeding up your test suite as well.

That’s it, I hope some of you would see value in this approach. If you have other suggestions (besides running the examples in parallel) to speed up a test suite, I’m always interested in speeding up our suite. We have a ton of code both in server-side and client-side and only part of them is currently tested and I’m always looking towards improving the test coverage which means potentially we could implement over 500 more tests (for both server-side and client-side) while I still want the test suite to complete in just a few seconds. I think the most hard/critical parts are currently covered in the server-side and it will be easier to test other parts once I’m moving the application to Roda (the client-side needs much more work to make it easier to test some critical parts). I would be really happy if both server and client-side suites would finish in within a second ;) (currently the client-side suite takes about 11s to complete - 204 tests / 438 assertions).

Introducing RackToolkit: a fast server and DSL designed to test Rack apps

2016-07-27T18:43:00+00:00

I started to experiment with writing big Ruby web applications as a set of smaller and fast Rack applications connected by a router using Roda’s multi_run plugin.

Such design allows the application to boot super fast in the development environment (and in the production environment too unless you prefer to eager load your code in production). Here’s how the design looks like (I’ve written about AutoReloader in another article):

1	# config.ru
2	if ENV['RACK_ENV'] == 'development'
3	require 'auto_reloader'
4	AutoReloader.activate reloadable_paths: [ 'apps', 'lib', 'models' ]
5	run ->(env) do
6	AutoReloader.reload! do
7	ActiveSupport::Dependencies.clear # avoid some issues
8	require_relative 'apps/main'
9	Apps::Main.call env
10	end
11	end
12	else
13	require_relative 'apps/main'
14	run Apps::Main
15	end
16
17	# apps/main.rb
18	require 'roda'
19	module Apps
20	class Main < Roda
21	plugin :multi_run
22	# other plugins and middlewares are added, such as :error_handler, :not_found, :environments
23	# and a logger middleware. They take some space, so I'm skipping them.
24
25	def self.register_app(path, &app_block)
26	# if you want to eager load files in production you'd change this method a bit
27	->(env) do
28	require_relative path
29	app_block[].call env
30	end
31	end
32
33	run 'sessions', register_app('session'){ Session }
34	run 'admin', register_app('admin') { Admin }
35	# other apps
36	end
37	end
38
39	# apps/base.rb
40	require 'roda'
41	module Apps
42	class Base < Roda
43	# add common plugins for rendering, CSRF protection, middlewares
44	# like ETag, authentication and so on. Most apps would inherit from this.
45	route{\|r\| process r }
46	private
47	def process(r)
48	protect_from_csrf # added by some CSRF plugin
49	end
50	end
51	end
52
53	# apps/admin.rb
54	require_relative 'base'
55	module Apps
56	class Admin < Base
57	private
58	def process(r)
59	super # protects from forgery and so on
60	r.get('/'){ "TODO Admin interface" }
61	# ...
62	end
63	end
64	end

Then I want to be able to test those applications separately and for some of them I would only get confidence if I tested against a real server since I would want them to handle with cookies or streaming and checking for some HTTP headers injected by the real server and so on. And I wanted to be able to write such tests that could run as quickly as possible.

I started experimenting with Puma and noticed it can start a new server really fast (like 1ms in my development environment). I didn’t want to add many dependencies so I decided to create some simple DSL over ‘net/http’ stdlib since its API is not much friendly. The only dependencies so far are http-cookie and Puma (WEBrick does not support full hijack support and it doesn’t provide a simple API to serve Rack apps either and it’s much slower to boot). Handling cookies correctly to keep the user session is not trivial so I decided to introduce the http-cookie dependency to manage a cookie jar.

That’s how rack_toolkit was born.

Usage

This way I can start the server before the test suite starts, change the Rack app served by the server dynamically, and stop it when the suite finishes (or you can simply start and stop it for each example since it boots really fast). Here’s a spec_helper.rb you could use if you are using RSpec:

1	# spec/spec_helper.rb
2	require 'rack_toolkit'
3	RSpec.configure do \|c\|
4	c.add_setting :server
5	c.add_setting :skip_reset_before_example
6
7	c.before(:suite) do
8	c.server = RackToolkit::Server.new start: true
9	c.skip_reset_before_example = false
10	end
11
12	c.after(:suite) do
13	c.server.stop
14	end
15
16	c.before(:context){ @server = c.server }
17	c.before(:example) do
18	@server = c.server
19	@server.reset_session! unless c.skip_reset_before_example
20	end
21	end

Testing the Admin app should be easy now:

1	# spec/apps/admin_spec.rb
2	require_relative '../../apps/admin'
3	RSpec.describe Admin do
4	before(:all){ @server.app = Admin }
5	it 'shows an expected main page' do
6	@server.get '/'
7	expect(@server.last_response.body).to eq 'TODO Admin interface'
8	end
9	end

Please take a look at the project’s README for more examples and supported API. RackToolkit allows you to get the current_path, referer, manages cookies sessions, provides a DSL for get, post and post_data on top of ‘net/http’ from stdlib, allows overriding the environment variables sent to the Rack app, simulating an https request as if the app was behind some proxy like Nginx, supports “virtual hosts”, default domain, performing requests to external Internet urls and many other options.

Future development

It currently doesn’t provide a DSL for quickly access elements from the response body, filling in forms and submitting them, but I plan to work on this once I need it. It won’t ever support JavaScript though unless it would be possible at some point to do so without slowing it down significantly. If you want to work on such DSL, please let me know.

Performance

The test suite currently runs 33 requests and finishes in ~50ms (skipping the external request example). It’s that fast.

Feedback

Looking forward your suggestions to improve it. Your feedback is very welcomed.

AutoReloader: a transparent automatic code reloader for Ruby

2016-07-18T14:35:00+00:00

I’ve been writing some Roda apps recently. Roda doesn’t come with any automatic code reloader, like Rails does. Its README lists quite a few code reloaders that could be used with Roda but while converting a JRuby on Rails small application to Roda I noticed I didn’t really like any of the options. I’ve written a review about the available options if you’re curious.

I could simply use ActiveSupport::Dependencies since I knew it was easy to set up and worked mostly fine but one of the reasons I’m thinking about leaving Rails is the autoloading behavior of ActiveSupport::Dependencies and the monkey patches to Ruby core classes added by ActiveSupport as a whole. So, I decided to create auto_reloader which provides the following features:

just like Rack::Reloader it works transparently. Just use “require” and “require_relative”. To automatically track constants definitions one has to override them anyway and I can’t think of any reliable way to track top-level constants automatically without overriding those methods. However those methods are only overridden when AutoReloader is activated, which doesn’t happen in the production environment as it does with ActiveSupport::Dependencies. Those are the only monkey patches happening in development mode;
differently from Rack::Reloader, it will detect new top-level constants defined after a request and unload them upon reloading, preventing several issues caused by not doing that;
no monkey patches to core Ruby classes in production mode;
it can use the ‘listen’ gem as a file watcher to speed up the request when no reloadable files have been changed, in which case the application would respond almost as fast as in production environments, which is important when we are working on performance optimizations. It will use ‘listen’ by default when available but it can be opted out and it won’t make much difference unless, maybe, if some request would load many reloadable files;
it’s also possible to force reloading even if no loaded files have been changed. This could be useful if such files would load some non-Ruby configuration files and they have changed but the README provides another alternative to better handle those cases by using Listen to watch them and call AutoReloader.force_next_reload;
it doesn’t provide autoloading like ActiveSupport::Dependencies does;
it’s possible to configure a minimal delay time between two code reloading procedures;
it unloads all reloadable files rather than only the changed files as I believe this is a safer approach and the one also used by ActiveSupport::Dependencies;
reloadable files are those found in one of the reloadable_paths option provided to AutoReloader;
not specific to Rack application, but could be used with any Ruby application.

What AutoReloader does not implement:

autoloading of files on missing constants. Use Ruby’s “autoload” for that if you want;
it doesn’t provide a hook system to notify when some file is loaded like ActiveSupport::Dependencies does;
it doesn’t provide an option to specify load-once files. An option would be to place them in different directories and do not include them in the reloadable_paths option;
it doesn’t reload on changes to files other than the loaded ones, like JSON or YAML configuration files, but it’s easy to set up them as explained in the project’s README.

Usage with a Rack application

1	# app.rb
2	App = -> { [ '200', { 'Content-Type' => 'text/plain' }, [ 'Sample output' ] ] }
3
4	# config.ru
5	if ENV['RACK_ENV'] != 'development'
6	require_relative 'app'
7	run App
8	else
9	require 'auto_reloader'
10	# won't reload before 1s elapsed since last reload by default. It can be overridden
11	# in the reload! call below
12	AutoReloader.activate reloadable_paths: [ '.' ]
13	run -> (env) {
14	AutoReloader.reload! do
15	require_relative 'app'
16	App.call env
17	end
18	}
19	end

If you also want it to reload if the “app.json” configuration file has changed:

1	# app.rb
2	require 'json'
3	config = JSON.parse File.read 'config/app.json'
4	App = -> { [ '200', { 'Content-Type' => 'text/plain' }, [ config['output'] ] ] }
5
6	# append this to config.ru
7	require 'listen' # add the 'listen' gem to your Gemfile
8	app_config = File.expand_path 'config/app.json'
9	Listen.to(File.expand_path 'config') do \|added, modified, removed\|
10	AutoReloader.force_next_reload if (added + modified + removed).include?(app_config)
11	end

If you decided to give it a try and found any bugs please let me know.

A Review of Code Reloaders for Ruby

2016-07-18T15:15:00+00:00

When we are writing a service in Ruby, it’s super useful to have the ability to automatically change its behavior to conform the latest changes to the code. Otherwise we’d have to manually restart the server after each change. This would slow down a lot the development flow, specially if the application takes a while before it’s ready to process next request.

I guess most people using Ruby are writing web applications with Rails. Many don’t notice that Rails supports auto code reloading out of the box, through ActiveSupport::Dependencies. A few will notice it once they are affected by some corner case where the automatic code reloading doesn’t work well.

Another feature provided by Rails is the ability of automatic loading files if the application follows some conventions, so that the developer is not forced to manually require some code’s dependencies. Another benefit is that this behavior is similar to Ruby’s autoload feature, which purpose is to speed up the loading time of applications by avoiding to load files the application won’t need. Matz seems to dislike this feature and discouraged its usage 4 years ago. Personally I’d love to see autoload gone as it can cause bugs that are hard to track. However, loading many files in Ruby is currently slow even if simply loading them from disk would be pretty fast. So, I guess Ruby would have to provide some sort of pre-compiled files support before deprecating autoload so that we wouldn’t need it for the purpose of speeding up the start-up time.

Since automatic code reloading usually works well enough for Rails applications, most people won’t research about code reloaders until they are writing web apps with other frameworks such as Sinatra, Padrino, Roda, pure Rack, whatever.

This article will review generic automatic code reloaders, including ActiveSupport::Dependencies, but leaving specific ones out of the scope, like Sinatra::Reloader and Padrino::Reloader. I’ve not checked Ruby version compatibility of each one, but all of them work on latest MRI.

Rack::Reloader

Rack::Reloader is bundled with the rack gem. It’s very simple but it’s only suitable for simple applications in my opinion. It won’t unload constants, so if you remove some file or rename some class the old ones will still be available. It works as a Rack middleware.

One can provide the middleware a custom or external back-end, but I’ll only discuss the default one, which is bundled with Rack::Reloader, called Rack::Reloader::Stat.

Before each request it traverse $LOADED_FEATURES, skipping .so/bundle files and call Kernel.load on each file that has been modified since the last request. Since config.ru is loaded rather than required it’s not listed in $LOADED_FEATURES so it will be never reloaded. This means that the app’s code should live in another file required in config.ru rather than living directly in config.ru. It worth mentioning that because I’ve been bitten by this more than once while testing Rack::Reloader.

Differently from the Rails approach, any changed file will be reloaded even if you modify some gem’s source.

Rack::Reloader issues

I won’t discuss performance issues when there are many files loaded because one could provide another back-end able to track files changes very quickly and because there are more important issues affecting this strategy.

Suppose your application has some code like this:

1	require 'singleton'
2	class MyClass
3	include Singleton
4	attr_reader :my_flag
5	def initialize
6	@my_flag = false
7	end
8	end

Calling MyClass.instance.my_flag will return false. Now, if you change the code so that @my_flag is assigned to true in “initialize” MyClass.instance.my_flag will still return false.

Let’s investigate another example where Rack::Reloader strategy won’t work:

1	# assets_processor.rb
2	class AssetsProcessor
3	@@processors = []
4	def self.register
5	@@processors << self
6	end
7
8	def self.process
9	@@processors.each :&do_process
10	end
11	end
12
13	# assets_compiler.rb
14	require_relative 'assets_processor'
15	class AssetsCompiler < AssetsProcessor
16	register
17
18	def self.do_process
19	puts 'compiling assets'
20	end
21	end
22
23	# gzip_assets.rb
24	require_relative 'assets_processor'
25	class GzipAssets < AssetsProcessor
26	register
27
28	def self.do_process
29	puts 'gzipping assets'
30	end
31	end
32
33	# app.rb
34	require_relative 'assets_compiler'
35	require_relative 'gzip_assets'
36	class App
37	def run
38	AssetsProcessor.process
39	end
40	end

Running App.new.run will print “compiling assets” and then “gzipping assets”. Now, if you change assets_compiler.rb, it will also print “compiling assets” once more the next time it’s called.

This applies to all situations where a given class method is supposed to be run only once or when the order of files load matter. For example, suppose AssetsProcessor.register implementation is changed in assets_processor.rb. Since register was already called in its subclasses that means the change won’t take effect in them since only assets_processor.rb will be reloaded by Rack::Reloader. Other reloaders discussed here also suffer with this issue but they provide some work-arounds for some of them.

rerun and shotgun: the reload everything approach

Some reloaders like rerun and shotgun will simply reload everything on each request. They fork at each request before requiring any files, which means those files are never required in the main process. Due to forking it won’t work on JRuby or Windows. This is a safe approach when using MRI on Linux or Mac though. However, if your application takes a long time to boot then your requests would have a big latency during the development mode. In that case, if the reason for the slow start-up lies in the framework code and other external libraries rather than the app specific code, which we want to be reloadable, one can require them before forking to speed it up.

This approach is a safe bet, but unsuitable when running on JRuby or Windows. Also if loading all app’s specific code is still slow, one may be interested in looking for faster alternatives. Besides that, this latency will exist in development mode for all requests even if no files have been changed. If you’re working on performance improvements other approaches will yield to better results.

rack-unreloader

rack-unreloader takes care of unloading constants during reload, differently from Rack::Reloader.

It has basically two modes of operation. One can use “Unreloader.require(‘dep’){[‘Dep’, …]}” to require dependencies while also providing which new constants are created and those will be unloaded during reload. This is the safest approach but it’s not transparent. For every required reloadable file we must manually provide a list of constants to be unloaded. On the other side this is the fastest possible approach since the reloader doesn’t have to try to figure out those constants automatically, like other options that will be mentioned below do. Also, it doesn’t override “require”, so it’s great for those that don’t want any monkey patching. Ruby currently does not provide a way to safely discover those constants automatically without monkey patching require, so rack-unreloader is probably the best you can get if you want to avoid monkey patches.

The second mode of operation is to not provide that block and Unreloader will look at changes to $LOADED_FEATURES before and after the call of Unreloader.require to figure out which constants the required file define. However, without monkey patching “require” this mode can’t be reliable, as I’ll explain in the sub-section below.

Before getting into it, there’s another feature of rack-unreloader that speed up reloading by only reloading the changed files, differently from other options I’ll explore below in this article. However, reloading just changed files is not always reliable as I’ve discussed in the Rack::Reloader Issues section.

Finally, differently from other libraries, rack-unreloader actually calls “require” rather than “load” and deletes the reloaded files from $LOADED_FEATURES before the request so that calling “require” will actually reload the file.

rack-unlreloader Issues

It’s only reliable if you always provide the constants defined on each Unreloader.require() call. This is also the fastest approach. It may be a bit boring to write code like this. Also, even in this mode, it’s only reliable if your application works fine regardless of the order each file is reloaded (I’ve shown an example in the Rack::Reloader Issues section demonstrating how this approach is not reliable if this is not the case).

Let’s explore why the automatic approach is not reliable:

1	# t.rb:
2	require 'json'
3	module T
4	def self.call(json)
5	JSON.parse(json)
6	end
7	end
8
9	# app.rb:
10	require 'rack/unreloader'
11	require 'fileutils'
12	Unreloader = Rack::Unreloader.new{ T }
13	Unreloader.require('./t.rb') # {'T'} # providing the block wouldn't trigger the error
14	Unreloader.call '{}'
15	FileUtils.touch 't.rb' # force file to be reloaded
16	sleep 1 # there's a default cooltime delay of 1s before next reload
17	Unreloader.call '{}' # NameError: unitialized constant T::JSON

Since rack-unreloader does not override “require” it can’t track which files define which constants in a reliable way. So, it thinks ’t.rb' is responsible for defining JSON and will then unload JSON (which has some C extensions which cannot be unloaded). This also affects JRuby if the file imports some Java package among other similar cases. So, if you want to work with the automatic approach with rack-unreloader you’d have to require all those dependencies before running Unreloader.call. This is very error-prone, that’s why I think it’s mostly useful if you always provide the list of constants expected to be defined by the required dependency.

However rack-unreloader provides a few options like “record_dependency”, “subclasses” and “record_split_class” to make it easier to specify the explicit dependencies between files so that the right files are reloaded. But that means the application author must have a good understanding on how auto-reloading works, how their dependencies work and will also require them to fully specify the dependencies. It can be a lot of work but it may worth in the case reloading all reloadable files can take a lot of time. If you’re looking for the fastest possible reloader than rack-unreloader may well be your best option.

ActiveSupport::Dependencies

Now we’re talking about the reloader behind Rails, which is great and battle tested and one of my favorites. Some people don’t realize it’s pretty simple to use it outside Rails, so let me demonstrate how it can be used since it seems it’s not widely documented.

Usage

1	require 'active_support' # this must be required before any other AS module as per documentation
2	require 'active_support/dependencies'
3	ActiveSupport::Dependencies.mechanism = :load # or :require in production environment
4	ActiveSupport::Dependencies.autoload_paths = [__dir__]
5
6	require_dependency 'app' # optional if app.rb defines App, since it also supports autoloading
7	puts App::VERSION
8	# change version number and then:
9	ActiveSupport::Dependencies.clear
10	require_dependency 'app'
11	puts App::VERSION

Or, in the context of a Rack app:

1	require 'active_support'
2	require 'active_support/dependencies'
3	if ENV['RACK_ENV'] == 'development'
4	ActiveSupport::Dependencies.mechanism = :load
5	ActiveSupport::Dependencies.autoload_paths = [__dir__]
6
7	run ->(env){
8	ActiveSupport::Dependencies.clear
9	App.call env
10	}
11	else
12	ActiveSupport::Dependencies.mechanism = :require
13	require_relative 'app'
14	run App
15	end

How it works

ActiveSupport::Dependencies has a quite complex implementation and I don’t really have a solid understanding of it so please let me know about my mistakes in the comments section so that I can fix them.

Basically it will load dependencies in the autoload_paths or require them depending on the informed mechanism. It keeps track of which constants are added by overriding “require”. This way it knows that JSON was actually defined by “require ‘json’” if it’s called by “require_dependency ’t'” and would detect that T was the new constant defined by ’t.rb' and the one that should be unloaded upon ActiveSupport::Dependencies.clear. Also, it doesn’t reload individual changed files only but unloads all reloadable files on “clear”. This is less likely to cause problems as I’ve explained in previous section. It’s also possible to configure it to use an efficient file watcher, like the one implemented by the ‘listen’ gem, which uses an evented approach using OS provided system calls. This way, one can skip the “clear” call if the loaded reloadable files have not been changed by speeding up the request even in development mode.

ActiveSupport::Dependencies supports a hooks system that allow others to observe when some files are loaded and take some action. This is specially useful for Rails engines when you want to run some code only after some dependency has been loaded for example.

ActiveSupport::Dependencies is not only a code reloader but it also implements an auto code loader by overriding Object’s const_missing to automatically try to require code that would define that constant by following some conventions. For example, in the first time one attempts to use ApplicationController, since it’s not defined, it will look in the search paths for an ‘application_controller.rb’ file and load it. That means the start-up time can be improved since we only load code we actually use. However this could lead to some issues that would make the application behave differently in production due to side effects caused by the order some files would be loaded. But Rails applications have been built around this strategy for several years and it seems such caveats have only affected a few people. Those cases can usually be worked around through “require_dependency”.

If your code doesn’t follow the naming convention it will have to use “require_dependency”. This way, if ApplicationController is defined in controllers/application.rb, you’d use “require_dependency ‘controllers/application’” before using it.

Why I don’t like autoload

Personally I don’t like autoloading in general and always prefer explicit dependencies in all my Ruby files, so even in my Rails apps I don’t rely on autoloading for my own classes. The same applies for Ruby’s built-in “autoload” feature. I’ve been bitten already by an autoload related bug when trying to use ActionView’s number helpers by requiring the specific file I was interested in. Here’s a simpler use case demonstrating the issue with “autoload”:

1	# test.rb
2	autoload :A, 'a'
3	require 'a/b'
4
5	# a.rb
6	require 'a/b'
7
8	# a/b.rb
9	module A
10	module B
11	end
12	end
13
14	# ruby -I . test.rb
15	# causes "...b.rb:1:in `': uninitialized constant A (NameError)"

It’s not quite clear what’s happening here since the message isn’t very clear about the real problem and it gets even more complicated to understand in a real complex code base. Requiring ‘a/b’ before requiring ‘a’ will cause a circular dependency issue. When “module A” is seen inside “a/b.rb”, it doesn’t exist yet and the “autoload :A, ‘a’” tells Ruby it should require ‘a’ in that case. So, this is what it does, but ‘a.rb’ will require ‘a/b.rb’ which we were trying to load in the first place. There are other similar problems that are caused by autoload and that’s why I don’t use it myself despite the potential of loading the application faster. Ideally Ruby should provide support for some sort of pre-compiled (or pre-parsed) files which would be useful for big applications to speed up code loading since the disk I/O is not the bottleneck but the Ruby parsing itself.

ActiveSupport::Dependencies Caveats

ActiveSupport::Dependencies is a pretty decent reloader and I guess most people are just fine with it and its known caveats. However there are some people, like me, which are more picky.

Before I get into the picky parts, let’s explore the limitations one has to have in mind when using a reloader that relies on running some file code multiple times. The only really safe strategy I can think of for handling auto-reloading is to completely restart the application or to use the fork/exec approach. They have their own caveat, like being slower than the alternatives, so it’s always about trade-offs when it comes to auto-reloaders. Running some code more than once can lead to unexpected results since not all actions can be rolled back.

For example, if you include some module to ::Object, this can’t be undone. And even if we could work around it, we’d have to detect such automatically which would perform so badly that it would be probably better to simply restart everything. This applies to monkey patching, to creating some constants in namespaces which are not reloadable (like defining JSON::CustomExtension) and similar situations. So, when we are dealing with automatic reloaders we should keep that in mind and understand that reloading will never be perfect unless we actually restart the full application (or use fork/exec). ActiveSupport::Dependencies provides some options as autoload_once_paths so that such code wouldn’t be executed more than once but if you have to change such code then you’ll be forced to restart the full application.

Also, any file actually required rather than loaded (either with require or require_relative) won’t be auto-reloaded, which forces the author to always use require_dependency to load files that are supposed to be reloadable.

Here’s what I dislike about it:

ActiveSupport::Dependencies is part of ActiveSupport and relies on some monkey patches to core classes. I try to avoid monkey patching core classes at all costs so I don’t like AS in general due to its monkey patching approach;
Autoloading is not opt-in as far as I know, so I can opt out and I’d rather prefer to not using it;
Since some Ruby sources will make use of “require_dependency” and since some Rails related gems may rely on the automatic autoloading feature provided by ActiveSupport::Dependencies it forces applications to override “require” and use ActiveSupport::Dependencies even in production mode;
If your application doesn’t rely on ActiveSupport then this reloader will add some overhead to the download phase of Bundler.

Conclusion

Among the options covered in this article, ActiveSupport::Dependencies is my favorite one although I would consider rerun or shotgun when running on MRI and Linux if the application starts quickly and I wouldn’t have to work on performance improvements (in that case, it’s useful to have the behavior of performing like in production when no files have been changed).

Basically, if your application is fast to load then it may make sense to start with rerun or shotgun since they are the only real safe bets I can think of.

However, I performed a few metrics in my application and decided it worth creating a new transparent reloader that would also fix some of the caveats I see in ActiveSupport::Dependencies. I wrote a new article about auto_reloader.

If you know about other automatic code reloaders for Ruby I’d love to know about them. Please let me know in the comments section. Also let me know if you think I misunderstood how any of those mentioned in this article actually works.

The sad state of streaming in Ruby web applications

2016-07-04T21:40:00+00:00

This article is basically a copy of this project’s README. You may read it there if you prefer. It’s a sample application demonstrating the current streaming state with Devise or Warden.

Devise is an authentication library built on top of Warden, providing a seamless integration with Rails apps. This application was created following the steps described in Devise’s Getting Started section. Take a look at the individual commits and their messages if you want to check each step.

Warden is a Rack’s middleware and authentication is handled using a “throw/catch(:warden)” approach. This works fine with Rails until streaming is enabled with ActionController::Live.

José Valim pointed out that the problem is ActionController::Live’s fault. This is because the Live module changes the “process” method so that it runs inside a spawn thread, so that it can return to finish processing the remaining middlewares in the stack. Nothing is sent to the connection before leaving that method due to the Rack issue I’ll describe next. But the “process” method will also handle all filters (before/around/after action hooks). Usually the authentication happens in a before action filter and if the user is not authentication Devise will “throw :warden” but since this is running in a spawn thread, the Warden middleware doesn’t have the chance to catch this symbol and handle it properly.

The Rack issue

I find it amusing that after so many years of web development with Ruby, Rack doesn’t seem to have evolved much to better handling streamed responses, including SSE and why not websockets. The basic blocks are basically the same as when Rack was first created in a successful attempt to add a standard API web servers and frameworks could agree and build on top of it. This is a great achievement but Rack should evolve to better handle streamed responses.

Aaron Patterson has tried to work on another API for Rack that would improve support for streaming but it seems it would break middlewares, and currently it seems the metal is dead. Sounds like HTTP 2.0 multiplexing requires yet more changes, so maybe we’ll get proper support in Rack 3.0, which should be backward compatible and keep supporting existing middlewares, by providing alternative APIs, but that seems like it could take years to get there. He has also written about the issues with Rack API over 5 years ago.

Currently, the way Rack applications handle streaming is by implementing an object that responds to each that will yield a chunk at a time until the stream is finished, which is usually implemented by providing the user an API similar to a proper stream object as properly implemented in other languages. A few years ago an alternative system has been suggested, which became known as the hijacking API. The Phusion team covered it when it was introduced but I think the “partial hijacking” section is no longer valid.

Rack was designed on top of a middleware stack which means any response will only start after all middlewares have been called and returned (except if hijacking is used), since middlewares don’t have access to the socket stream. That’s why Rails had to resort to using threads to handle streamed/chunked responses. But it can offer other alternative implementations that would be more friendly to how Warden and Devise work as demonstrated in this application, which I’ll discuss in the next section.

Before talking about Rails current options, I’d like to stress a bit more the problem with Rack without hijacking, and consequently how it affects web development in Ruby in a negative way, when compared to how this is done in most other languages.

If we compare to how streaming is handled in Grails (and most JVM based frameworks) , or most of the main web frameworks in other languages, it couldn’t be any simpler. Each request thread (or process) has access to a “response” object that accepts a “write” call that goes directly to the socket’s output (or after a “flush” call).

There’s no need to flag a controller as capable of streaming. They are just regular controllers. The request thread or process does not have to spawn another thread to handle streaming, so there’s nothing special with such controllers.

It would be awesome if Ruby web applications had the option to use a more flexible API, more friendly to streamed responses, including SSE and websockets. Hijacking currently seems to be considered a second-class citizen since they are usually ignored by major web frameworks like Rails itself.

The Rails case (or how to work around the current state in Rack apps)

So, with Rails one doesn’t flag an action as one requiring streaming support. They have to flag the full controller. In theory all other actions not taking advantage of the streaming API should work just like regular controllers not flagged with ActionController::Live.

The obvious question is then, “so, why isn’t Live always included?”. After all, the Rails users wouldn’t have to worry about enabling streaming, it would be simply enabled by default for when you want it. One might think that it would be related to performance concerns but I suspect that the main problem is that this is not issues free.

Some middleware assume that the inner middlewares have finished (some of them actually depend on them to be finished) so that they can modify the original response or headers. This kind of post-processing middlewares do not work well with streamed responses.

This includes caching middlewares (handling ETag or last-modified headers), monitoring middlewares injecting some HTML (like NewRelic does automatically by default for example) and many other. Those middlewares will block the stack until the response is fully finished which breaks the desired streamed output. Some of them will check some conditions and skip this blocking behavior under certain circumstances but some will still cause some hard to debug issues or they may be even conceptually broken.

There are also some middlewares that expect the controller’s action code to run in the same thread due to the implementation details surrounding them. For example, if a sandboxed database environment is implemented as a middleware that runs the following layer inside a transaction block that will be rolled back, and if the connection is automatically fetched using the current thread id as the access key, then spawning a new thread would run in a different connection and out of the middleware’s transaction, breaking the sandboxed environment. I think ActiveRecord fetches the connection from thread locals and since ActionController::Live will copy those locals to the new spawned thread it probably works, but I’m just warning that spawning threads may break several middlewares in unexpected ways.

This includes the behavior of Warden communication. So, enabling Live in all Rails controllers would have the immediate effect of breaking most current Rails applications as Devise is the de facto authentication standard for Rails apps. Warden assumes the code handling authentication checks is running in the same thread. It could certainly offer another strategy to inform about failed authentication, but this is not how it currently works.

Even though José Valim said there’s nothing they could do because it’s Live’s fault, this is not completely true. I guess he meant that it would be too much work to make it work. After all, we can’t simply put the fault on Live since the fault actually lies in Rack itself, so streaming is fundamentally broken.

Devise could certainly subclass Warden::Manager and use this subclass as its middleware and overwrite “call” to add some object to env, for example, that would listen to reported failures and they could replace “throw :warden” in its own code with a more higher level API that would communicate to warden properly. But I agree this is a mess and probably doesn’t worth, specially because it couldn’t be called exactly Warden compatible. Another option could be to change Warden itself so that it doesn’t expect the authentication checks to happen in the same thread. Or it could replace the “throw-catch” approach with a “raise/rescue” one, which should work out of the box to how Rails currently handles it. It shouldn’t be hard for Devise itself to wrap Warden and use Exceptions rather than throw-catch, but again, I’m not sure if this is really worthy.

So, let’s explore other options, which adds other API options to Rails itself.

A suggestion to add a new API to Rails

The Warden case is a big issue since Devise is very popular among Rails apps and shouldn’t be ignored. Usually the authentication is performed in filters rather than in the action itself. Introducing a new API would give the user the chance of performing authentication in the main request thread before spawning the streamed thread. This works even if the authentication check is done directly in the action rather than in the filters. The API would work something like:

1	def my_action
2	# optionally call authenticate_user! here, if not using filters
3	streamed do \|stream\|
4	3.times{stream.write "chunk"; sleep 1}
5	end
6	end

This way, the thread would only be spawned after the authentication check is finished. Or “streamed” could use “env[‘rack.hijack’]” when available instead of spawning a new thread.

Use Rack hijacking

Another alternative might be to support streaming only for web servers supporting Rack hijacking. This way, the stream API could work seamless, without requiring “ActionController::Live” to be included. When “response.stream” is used, it would use “env[‘rack.hijack_io’]” if available or either buffer the responses and send them at once or raise some error, based on some configuration accordingly to the user’s preferences, as sometimes streaming is not only an optimization but a requirement that shouldn’t be silently ignored. The same behavior would apply when HTTP 1.0 is used for example.

Or another module such as “ActionController::LiveHijacking” could be created so that Rails users would have that option for a while until Rails thinks this approach is stable enough to be enabled by default.

Conclusion

I’d like to propose two discussions around this issue. One would be a better solution for Rack applications to get to talk directly to the response ( or discussing an strategy for making Rack hijacking a first-class citizen and probably call it something better than hijack). And the other solution would be for Rails to improve support for streaming applications by better handling cases like the Warden/Devise issue. I’ve copied this text with some minor changes to my site so that it could be discussed in the Disqus' comments section or we could discuss it in the issues section of this sample project or in the rails-core mailing list, your call.

A sample Ruby script to achieve fast incremental back-up on btrfs partition

2016-07-04T16:32:00+00:00

For some years I have been using rsnapshot to back up our databases and documents using an incremental approach. We create a new back-up every hour and retain the last 24 hours backup, one back-up per day for the past 7 days and one back-up per week for the past 4 weeks.

Rsnapshot is great. It uses hard-links to achieve incremental back-up, saving up a lot of space. It’s a combination of “cp -al” and rsync. But we were facing a problem related to free inodes count on our ext4 partition. By the way, NewRelic does not monitor the free inodes count (df -i) so I found this problem the hard way, after the back-up stopped working due to lack of free inodes.

I’ve created a custom check in our own monitoring system to alert about low free inodes available and then I tried to tweak some ext4 settings to avoid this problem again in the new partition. We have 26GB spread on 2.6 million of individually gzipped documents (they are served directly by nginx) which will create almost 100 million hard-links in that back-up partition. There are hardlinks around the original documents as well as part of a smart strategy to save space when the same document is used in multiple transactions (they are not changed). Otherwise they would take some extra Gigabytes.

Recently, my custom monitoring system sent me an alert that 75% of the inodes were used while about only 30% of disk space was being actually used. So, I decided to investigate a bit more about other filesystems which dealt with inodes dynamically.

The btrfs filesystem

That’s how I found btrfs, a modern file-system which not only does not have a limit on inodes but, as I’ll describe, has some very interesting features for dealing with incremental back-up in a faster and better way than rsnapshot.

Initially I wasn’t thinking about replacing rsnapshot, but after reading about support for subvolumes and snapshots in btrfs I changed my mind and decided to replace rsnapshot with a custom script. I’ve tried to adapt rsnapshot for several hours to make the workflow I wanted work without success though. Here’s an issue related to btrfs support.

Before I talk about how btrfs helps our back-up system, let me explain a few issues I had with rsnapshot.

Rsnapshot issues

I’ve been living with some issues with rsnapshot in the past years. I want the full back-up procedure to take less than an hour so that we would be able to run it every hour. I had to tweak its settings a few times in order to get the script to finish in less than an hour but in the past days it was taking already almost 40 minutes to complete. A while back, before the tweaks, I had to change the interval to back-up every two hours.

One of the slow parts of rsnapshot is removing the last back-up snapshot when rotating. It doesn’t matter if you use “rm -rf” or whatever other method. Removing a big tree of files is slow. An alternative would be to move the latest snapshot to the first one (hourly.0), since this would save the “rm -rf” time and also the “cp -al” time, skipping to the rsync phase. But I wasn’t able to figure out how to make that happens with rsnapshot.

Also, some of the procedures could be done in parallel to speed up the process but rsnapshot doesn’t provide direct support to specify this and it’s hard to write proper shell script to manage those cases.

The goal

After reading about btrfs I figured out that the back-up procedure could be made much faster and be simplified. Then I created a Ruby script, which I’ll show in the next section, and integrated it in our automation tools in one day. I’ve replaced rsnapshot with it in our back-up server, with the new script and it’s running pretty well for the last two days taking about 8 minutes to complete the procedure on each run.

So, let me explain the strategy I wanted to implement to help you understanding the script.

As I said, btrfs supports subvolumes. Btrfs implements copy-on-write (CoW), so basically, this allows to both create and delete snapshots from subvolumes instantly (constant time). That means we replace the slow “rm -rf hourly.23” with the instantaneous “btrfs subvolume delete hourly.23” and “cp -al …” with the instantaneous “btrfs subvolume snapshot …”.

In order for a regular user to delete subvolumes with btrfs, the user_subvol_rm_allowed fs option must be used. Also, deleting a subvolume doesn’t work if there are other subvolumes inside it, so they must be removed first. There’s no switch or tool in the btrfs-progs package that allows you to delete them recursively. This is important to understand the script.

Our back-up procedure consists of getting a recent dump of two production PostgreSQL databases (the main database and the one used by Redmine) and syncing two directories containing files (the main application files and the files uploaded to Redmine).

The idea is to get them inside a static path as the first step. The main reason for that is that if something goes wrong in the process after syncing the documents (the slowest part), for example, we wouldn’t lose the transferred files the next time we try to run the script. So, basically here’s how I implemented it (there’s a simpler strategy I’ll explain next):

/var/backups/latest [regular directory]
/var/backups/latest/postgres [subvolume] - the main db dump is stored here
/var/backups/latest/tickets-db [subvolume] - the tickets db dump is stored here
/var/backups/latest/docmanager [subvolume] - the 2.6 million documents are rsynced here
/var/backups/latest/tickets-files [subvolume] - Redmine files go here

After the procedure is finished to get them in the latest state it creates a tmp directory and create a snapshot for each subvolume inside tmp and once everything works fine the back-ups are rotated and tmp is moved to hourly.0. Removing hourly.23 in the rotation phase has to remove the inner subvolumes first.

After implementing this (it was an iterative process) I realized it could be simplified to use a simpler infra-structure. “latest” would be a subvolume and everything inside it regular files and directories. Than the “tmp” directory wouldn’t be used and after rotating a snapshot of “latest” would be used to create “hourly.0”. I didn’t update the script yet because I’m not sure if it worths changing, since the current layout is more modular, which is useful in case I want to take some snapshot of just part of the back-up for some reason. So the sample back-up script in the next section will use my current tested approach, which is the situation described first above.

The main database has over 500MB in PostgreSQL custom format, and it’s much faster to rsync it than using scp. Initially those databases were not stored in the “latest” diretory and I used “scp” to copy them directly to the “tmp” directory, but I changed the strategy to save some time and bandwidth.

The script should exit with a message and non zero exit status code when something fails so that I would be notified if anything goes wrong by Cron (by setting the MAILTO=my@email.com in the beggining of the crontab file). It shouldn’t affect the existing valid snapshots either in that case.

It shouldn’t run in case the previous procedure hasn’t finish, so there’s a simple lock mechanism preventing that from happen in case it takes over an hour to complete. The second attempt will fail and I should get an e-mail telling me that happened.

It should also have a dry-run mode (which I call test mode) that will output the commands without running it, which is useful while designing the back-up steps. It should also allow for commands to run concurrently so it uses some indentation to show the order the commands are run.

Finally, it will report in the logs the issued commands and their status (finished or failed) as well as any commands output (STDOUT or STDERR) and the time each command took as well as the total time in the end of the procedure.

Finally, now that you understand what the script is supposed to do, here’s the actual implementation.

The script

1	#!/usr/bin/env ruby
2
3	require 'open3'
4	require 'thread'
5	require 'logger'
6	require 'time'
7
8	class Backup
9	def run(args)
10	@start_time = Time.now
11	@backup_root_path = File.expand_path '/var/backups'
12	#@backup_root_path = File.expand_path '~/backups'
13	@log_path = "#{@backup_root_path}/backup.log"
14	@tmp_path = "#{@backup_root_path}/tmp"
15
16	@exiting = false
17	Thread.current[:indenting_level] = 0
18
19	setup_logger
20
21	lock_or_exit
22
23	log 'Starting back-up procedure'
24
25	parse_args args.clone
26
27	run_scripts if @action == 'hourly'
28
29	rotate
30	unlock
31	report_completed
32	end
33
34	private
35
36	def setup_logger
37	File.write @log_path, '' unless File.exist? @log_path
38	logfile = File.open(@log_path, File::WRONLY \| File::APPEND)
39	logfile.sync = true
40	@logger = Logger.new logfile
41	@logger.level = Logger::INFO
42	@logger.datetime_format = '%Y-%m-%d %H:%M:%S'
43	@logger_mutex = Mutex.new
44	end
45
46	def lock_or_exit
47	if File.exist?(pidfile) && run_command("kill -0 #{pid = File.read pidfile}")
48	abort "There's another backup in progress. Pid: #{pid} (from #{pidfile})."
49	end
50	File.write pidfile, Process.pid
51	end
52
53	def unlock
54	File.unlink pidfile
55	end
56
57	def pidfile
58	@pidfile \|\|= "#{@backup_root_path}/backup.pid"
59	end
60
61	def run_command!(cmd, sucess_in_test_mode = true, abort_on_stderr: false)
62	run_command cmd, sucess_in_test_mode, abort_on_stderr: abort_on_stderr, abort_on_error: true
63	end
64
65	def run_command(cmd, sucess_in_test_mode = true, abort_on_stderr: false, abort_on_error: false)
66	indented_cmd = ' ' * indenting_level + cmd
67	Thread.current[:indenting_level] += 1
68	if @test_mode
69	@logger_mutex.synchronize{ puts indented_cmd}
70	return sucess_in_test_mode
71	end
72	start = Time.now
73	log "started: '#{indented_cmd}'"
74	stdout, stderr, status = Open3.capture3 cmd
75	stdout = stdout.chomp
76	stderr = stderr.chomp
77	success = status == 0
78	log stdout unless stdout.empty?
79	log stderr, :warn unless stderr.empty?
80	if (!success && abort_on_error) \|\| (abort_on_stderr && !stderr.empty?)
81	die "'#{cmd}' failed to run with exit status #{status}, aborting."
82	end
83	log "finished: '#{indented_cmd}' (#{success ? 'successful' : "failed with #{status}"}) " +
84	"[#{human_duration Time.now - start}]"
85	success
86	end
87
88	def indenting_level
89	Thread.current[:indenting_level]
90	end
91
92	def log(msg, level = :info)
93	return if @test_mode
94	@logger_mutex.synchronize{ @logger.send level, msg }
95	end
96
97	VALID_OPTIONS = ['hourly', 'daily', 'weekly'].freeze
98	def parse_args(args)
99	args.shift if @test_mode = (args.first == 'test')
100	unless args.size == 1 && VALID_OPTIONS.include?(@action = args.first)
101	abort "Usage: 'backup [test] action', where action can be hourly, daily or weekly.
102	If test is specified the commands won't run but will be shown."
103	end
104	end
105
106	def die(message)
107	log message, :fatal
108	was_exiting = @exiting
109	@exiting = true
110	delete_tmp_path_if_exists unless was_exiting
111	unlock
112	abort message
113	end
114
115	def create_tmp_path
116	delete_tmp_path_if_exists
117	create_subvolume @tmp_path
118	end
119
120	def create_subvolume(path, skip_if_exists = false)
121	return if skip_if_exists && File.exist?(path)
122	run_script %Q{btrfs subvolume create "#{path}"}
123	end
124
125	def delete_tmp_path_if_exists
126	delete_subvolume_if_exists @tmp_path, delete_children: true
127	end
128
129	def delete_subvolume_if_exists(path, delete_children: false)
130	return unless File.exist?(path)
131	Dir["#{path}/*"].each{\|s\| delete_subvolume_if_exists s } if delete_children
132	run_script %Q{btrfs subvolume delete -c "#{path}"}
133	end
134
135	def run_script(script)
136	run_command! script
137	end
138
139	def run_scripts(scripts = all_scripts)
140	case scripts
141	when Par
142	il = indenting_level
143	last_il = il
144	scripts.map do \|s\|
145	Thread.start do
146	Thread.current[:indenting_level] = il
147	run_scripts s
148	last_il = [Thread.current[:indenting_level], last_il].max
149	end
150	end.each &:join
151	Thread.current[:indenting_level] = last_il
152	when Array
153	scripts.each{\|s\| run_scripts s }
154	when String
155	run_script scripts
156	when Proc
157	scripts[]
158	else
159	die "Invalid script (#{scripts.class}): #{scripts}"
160	end
161	end
162
163	Par = Class.new Array
164	def all_scripts
165	[
166	Par[->{create_tmp_path}, "mkdir -p #{@backup_root_path}/latest", dump_main_db_on_d1,
167	dump_tickets_db_on_d1],
168	Par[local_docs_sync, local_tickets_files_sync, local_main_db_sync, local_tickets_db_sync],
169	Par[main_docs_script, tickets_files_script, main_db_script, tickets_db_script],
170	]
171	end
172
173	def dump_main_db_on_d1
174	%q{ssh backup@backup-server.com "pg_dump -Fc -f /tmp/main_db.dump } +
175	%q{main_db_production"}
176	end
177
178	def dump_tickets_db_on_d1
179	%q{ssh backup@backup-server.com "pg_dump -Fc -f /tmp/tickets.dump redmine_production"}
180	end
181
182	def local_docs_sync
183	[
184	->{ create_subvolume local_docmanager, true },
185	"rsync -azHq --delete-excluded --delete --exclude doc --inplace " +
186	"backup@backup-server.com:/var/main-documents/production/docmanager/ " +
187	"#{local_docmanager}/",
188	]
189	end
190
191	def local_docmanager
192	@local_docmanager \|\|= "#{@backup_root_path}/latest/docmanager"
193	end
194
195	def local_tickets_files_sync
196	[
197	->{ create_subvolume local_tickets_files, true },
198	"rsync -azq --delete --inplace backup@backup-server.com:/var/redmine/files/ " +
199	"#{local_tickets_files}/",
200	]
201	end
202
203	def local_tickets_files
204	@local_tickets_files \|\|= "#{@backup_root_path}/latest/tickets-files"
205	end
206
207	def local_main_db_sync
208	[
209	->{ create_subvolume local_main_db, true },
210	"rsync -azq --inplace backup@backup-server.com:/tmp/main_db.dump " +
211	"#{local_main_db}/main_db.dump",
212	]
213	end
214
215	def local_main_db
216	@local_main_db \|\|= "#{@backup_root_path}/latest/postgres"
217	end
218
219	def local_tickets_db_sync
220	[
221	->{ create_subvolume local_tickets_db, true },
222	"rsync -azq --inplace backup@backup-server.com:/tmp/tickets.dump " +
223	"#{local_tickets_db}/tickets.dump",
224	]
225	end
226
227	def local_tickets_db
228	@local_tickets_db \|\|= "#{@backup_root_path}/latest/tickets-db"
229	end
230
231	def main_docs_script
232	create_snapshot_cmd local_docmanager, "#{@tmp_path}/docmanager"
233	end
234
235	def create_snapshot_cmd(from, to)
236	"btrfs subvolume snapshot #{from} #{to}"
237	end
238
239	def main_db_script
240	create_snapshot_cmd local_main_db, "#{@tmp_path}/postgres"
241	end
242
243	def tickets_db_script
244	create_snapshot_cmd local_tickets_db, "#{@tmp_path}/tickets-db"
245	end
246
247	def tickets_files_script
248	create_snapshot_cmd local_tickets_files, "#{@tmp_path}/tickets-files"
249	end
250
251	LAST_DIR_PER_TYPE = {
252	'hourly' => 23, 'daily' => 6, 'weekly' => 3
253	}.freeze
254	def rotate
255	last = LAST_DIR_PER_TYPE[@action]
256	path = ->(n, action = @action){ "#{@backup_root_path}/#{action}.#{n}" }
257	delete_subvolume_if_exists path[last], delete_children: true
258	n = last
259	while (n -= 1) >= 0
260	run_script "mv #{path[n]} #{path[n+1]}" if File.exist?(path[n])
261	end
262	dest = path[0]
263	case @action
264	when 'hourly'
265	run_script "mv #{@tmp_path} #{dest}"
266	when 'daily', 'weekly'
267	die 'last hourly back-up does not exist' unless File.exist?(hourly0 = path[0, 'hourly'])
268	create_tmp_path
269	Dir["#{hourly0}/*"].each do \|subvolume\|
270	run_script create_snapshot_cmd subvolume, "#{@tmp_path}/#{File.basename subvolume}"
271	end
272	run_script "mv #{@tmp_path} #{dest}"
273	end
274	end
275
276	def report_completed
277	log "Backup finished in #{human_duration Time.now - @start_time}"
278	end
279
280	def human_duration(total_time_sec)
281	n = total_time_sec.round
282	parts = []
283	[60, 60, 24].each{\|d\| n, r = n.divmod d; parts << r; break if n.zero?}
284	parts << n unless n.zero?
285	pairs = parts.reverse.zip(%w(d h m s)[-parts.size..-1])
286	pairs.pop if pairs.size > 2 # do not report seconds when irrelevant
287	pairs.flatten.join
288	end
289	end
290
291	Backup.new.run(ARGV) if File.expand_path($PROGRAM_NAME) == File.expand_path(__FILE__)

So, this is what I get running the test mode:

1	$ ruby backup.rb test hourly
2	btrfs subvolume create "/home/rodrigo/backups/tmp"
3	mkdir -p /home/rodrigo/backups/latest
4	ssh backup@backup-server.com "pg_dump -Fc -f /tmp/main_db.dump main_db_production"
5	ssh backup@backup-server.com "pg_dump -Fc -f /tmp/tickets.dump redmine_production"
6	btrfs subvolume create "/home/rodrigo/backups/latest/docmanager"
7	btrfs subvolume create "/home/rodrigo/backups/latest/tickets-files"
8	btrfs subvolume create "/home/rodrigo/backups/latest/postgres"
9	btrfs subvolume create "/home/rodrigo/backups/latest/tickets-db"
10	rsync -azHq --delete-excluded --delete --exclude doc --inplace backup@backup-server.com:/var/main-documents/production/docmanager/ /home/rodrigo/backups/latest/docmanager/
11	rsync -azq --delete --inplace backup@backup-server.com:/var/redmine/files/ /home/rodrigo/backups/latest/tickets-files/
12	rsync -azq --inplace backup@backup-server.com:/tmp/main_db.dump /home/rodrigo/backups/latest/postgres/main_db.dump
13	rsync -azq --inplace backup@backup-server.com:/tmp/tickets.dump /home/rodrigo/backups/latest/tickets-db/tickets.dump
14	btrfs subvolume snapshot /home/rodrigo/backups/latest/tickets-db /home/rodrigo/backups/tmp/tickets-db
15	btrfs subvolume snapshot /home/rodrigo/backups/latest/tickets-files /home/rodrigo/backups/tmp/tickets-files
16	btrfs subvolume snapshot /home/rodrigo/backups/latest/postgres /home/rodrigo/backups/tmp/postgres
17	btrfs subvolume snapshot /home/rodrigo/backups/latest/docmanager /home/rodrigo/backups/tmp/docmanager
18	mv /home/rodrigo/backups/tmp /home/rodrigo/backups/hourly.0

The “all_scripts” method is the one you should adapt for your needs.

Final notes

I hope that script will help you serving as a base for your own back-up script in Ruby in case I was able to convince you to give this strategy a try. Unless you are already using some robust back-up solution such as Bacula or other advanced systems, this strategy is very simple to implement, takes little space and allows for fast incremental backups and might interest you.

Please let me know if you have any questions in the comments section or if you’d suggest any improvements over it. Or if you think you’ve found a bug I’d love to hear about it.

Good luck dealing with your back-ups. :)

Akita's Manga Downloadr Elixir vs Ruby performance revisited

2016-06-20T11:40:00+00:00

Two weeks ago I read an article from Fabio Akita comparing the performance of his Manga Downloadr implementations in Elixir, Crystal and Ruby.

From a quick glance at its source code it seems the application consisted mostly of downloading multiple pages and another minor part would take care of parsing the HTML and extracting some location paths and attributes for the images. At least, this was the part that was being tested in his benchmark. I found it very odd that the Elixir version would finish in about 15s while the Ruby version would take 27s to complete. After all, this wasn’t a CPU bound application but an I/O bound one. I would expect that the same design implemented in any programming language for this kind of application should take about the same time in whatever chosen language. Of course the HTML parser or the HTTP client implementations used on each language could make some difference but the Ruby implementation took almost twice the time taken by the Elixir implementation. I was pretty much confident it had to be a problem with the design rather than a difference in the raw performance among the used languages.

I had to prepare a deploy for the past two weeks which happened last Friday. Then on Friday I decided to take a few hours to understand what the test mode was really all about and rewrote the Ruby application with a proper design for this kind of application taking Ruby’s limitations (specially MRI’s ones) in mind with focus on performance.

The new implementation can be found here on Github.

Feel free to give it a try and let me know if you can think of any changes that could potentially improve the performance in any significant way. I have a few theories my self, like using a SAX parser rather than performing the full parsing, among a few other improvements I can think of, but I’m not really sure whether the changes would be significant given that most of the time is actually spent on network data transfer using a slow connection (about 10MBbps in my case), if we compare to the time needed to parse those HTMLs.

The numbers

So, here are the numbers I get with a 10MBps Internet connection and an AMD Phenom II X6 1090T, with 6 cores at 3.2GHz each:

Elixir: 13.0s (best time, usually ranges from 13.0-16s)
JRuby: 12.3s (best time, usually ranges from 12.3-16s)
MRI: 10.9s (best time, usually ranges from 10.9-16s)

As I suspected, they should perform about the same. JRuby needs 1.8s just to boot the JVM ( measured with time jruby –dev -e ‘’), which means it actually takes about the same as MRI if we don’t take the boot time into consideration (which is usually the case when the application is running a long-lived daemon like a web server).

For JRuby threads are used to handle concurrency while in MRI I was forced to use a pool of forked processes to handle HTML parsing and write some simplified Inter-Process Communication (IPC) technique which is suitable for this particular test case but may not apply to others. Writing concurrent code in Ruby could be easier but for MRI it’s specially hard once you want to use all cores because I find it much easier to write multi-threaded code than to deal with forked processes and special IPC that is not as trivial to write as using threads that share the same memory. You are free to test the performance of other approaches in MRI, like the threaded one, or always forking rather than using a pool of forked processes, changing the amount of workers both for the downloader as well as for the forked pool (I use 6 processes in the pool that parses the HTML since I have 6 cores in my CPU).

I have always been disappointed by the sad state of real concurrency in MRI due to the GIL. I’d love to have a switch to disable the GIL completely so that I would able to benchmark the different approaches (threads vs forks). Unfortunately, this is not possible in MRI or JRuby because MRI has the GIL and JRuby doesn’t handle forking well. Also, Nokogiri does not perform the same in MRI and JRuby, which means there are many other variables involved that running an application using forks in MRI cannot be really compared to run it against JRuby using the multi-threaded approach because the difference in the design is not the only one happening.

When I really need to write some CPU bound code that would benefit from running on all cores I often do it in JRuby since I find it easier to deal with threads rather than spawn processes. Once I had to create an application similar to Akita’s Manga Downloader in test mode and I wrote about how JRuby saved my week exactly due to it enabling real concurrency. I really think MRI team should take real concurrency needs more seriously or it might become irrelevant in the languages and frameworks war. Ruby usually gives us options, but we don’t really have an option to deal with concurrent code in MRI as the core developers believe forking is just fine. Since Ruby usually strives for its simplicity I find this awkward since it’s usually much easier to write multi-threaded code than dealing with spawn processes.

Back to the results of the timing comparison between Elixir and Ruby implementations, of course, I’m not suggesting that Ruby is faster than Elixir. I’m pretty sure the design of the Elixir implementation can be improved as well to get a better time. I’m just demonstrating that for this particular use case of I/O bound applications the raw language performance usually does not make any difference given a proper design. The design is by far the most important feature when working on performance improvements of I/O bound applications. Of course it’s also important for CPU bound applications, but what I mean is that the raw performance is often irrelevant for I/O bound applications while the design is essential.

So, what’s the point?

There are many features one can use to sell another language but we should really avoid the trap of comparing raw performance because it hardly matter for most of the applications web developers work with, if they are the target audience. I’m pretty sure Elixir has great sell points, just like Rust, Go, Crystal, Mirah and so on. I’d be more interested in learning about the advantages of their eco-systems (tools, people, libraries) and how they allow to write good designed software in a better way. Or how they excel in exceptions handling. Or how easy it is to write concurrent and distributed software with them. Or how robust and fault tolerant they are. Or how they can help getting zero down-times during deploy, or how fast the applications would boot (this is one of the raw performance cases where it can matter). How well documented they are and how amazing are their communities. How one can easily debug and profile applications in these environments or how easily they can test something in a REPL, or write automated tests, manage dependencies. How well autoreloading work in the development mode and so on. There are so many interesting aspects of a language and its surrounding environment that I find it frustrating every time I see someone trying to sell a language by comparing the raw performance as it often does not matter in most cases.

Look, I’ve worked with fast hard real-time systems (running on Linux with real-time patches such as Xenomai or RTAI) during my master thesis and I know that raw performance is very important for a broad set of applications, like Robotics, image processing, gaming, operating systems and many others. But we have to understand whom we are talking to. If the audience is web development raw performance simply doesn’t matter that much. This is not the feature that will determine whether your application will scale to thousands of requests per second. Architecture/design is.

If you are working with embedded systems or hard real time systems it makes sense to use C or some other language that does not rely on garbage collectors (as it’s hard to implement a garbage collector with hard timing constraints). But please forget about raw performance for the cases where it doesn’t make much difference.

If you know someone who got a degree in Electrical Engineering, like me, and ask them, you’ll notice it’s pretty common to perform image processing in Matlab, which is an interpreted language and environment to prototype algorithm designs. It’s focused on operations involving matrix and they are pretty fast since they are compiled and optimized. Which allows engineers to quickly test different designs without having to write each variation in C. Once they are happy with the design and performance of the algorithm they can go a step further and implement it in C or use one of the Matlab tools that would try to perform this step automatically.

Engineers are very pragmatic. They want to use the best tools for their jobs. That means a scripting language should be preferred over a static one during the design/prototype phase as it allows faster feedback and iterative loop. Sometimes the performance they get with Matlab is simply fast enough for their needs. The same happens with Ruby, Python, JS and many other languages. They could be used for prototypes or they could be enough for the actual application.

Also, one can start with them and once the raw performance becomes a bottleneck they are free to convert that part to a more efficient language and use some sort of integration to delegate the expensive parts to them. If there are many parts of the application that would require such approach to be taken, then it becomes a burden to maintain it and one might consider moving the complete application to another language to reduce the complexity.

However, this is not my experience with web applications in all past years I’ve been working as a web developer. Rails usually takes about 20ms per request as measured by nginx in production while DNS, network transfer, JS and other related jobs may take a few seconds which means the 20ms spent in the server is simply irrelevant. It could be 0ms and it wouldn’t make any difference to the user experience.

Getting an SPA to load the fastest possible way (and how Webpack can help you)

2016-02-29T11:08:00+00:00

This article assumes you completely understand all performance trade-offs related to each available technique to load scripts and how to modularize them. I’d highly recommend you to read another article I wrote just to explain them here.

Motivation

Feel free to skip this section if you are not interested in the background.

I’ve been using a single JS and a single CSS for my application for a long time.

I’ve optimized the code a lot, by lazily running some parts and performing all best practices with regards to how to load the resources, minifying them, gzipped them, caching them and so on and, still, every week about 10% of the users won’t meet the SLA that says the page should load in within 5s. Some users would load the page under a second even when the resources were not cached.

To be honest, it’s not really defined under which conditions a user should be able to load the application in under 5s, so I use the worst scenario to measure this time. After the page is fully loaded I send the data from the resources timing API to the back-end so that I can extract some statistics later, since NewRelic is too limiting for this kind of information. Here’s how it works in our application. The user logs in another application which will provide a link to ours, containing an authentication token, which we’ll parse, verify and redirect to the root address. I use the time provided by the resources timing API, which will include this redirect.

It should be noticed that any actions in the server-side take about 10-20ms, accordingly to the nginx logging (for the actions related to page loading - opening a transaction or searching the database might take 1s in the server-side for example, depending on the criteria). This means most of the time is spent outside the server and are influenced by latency, network bandwidth between the client and server, CDN, presence of cached resources and so on. Of course, running the JS code itself already contributes to the total time, but this part was already highly optimized before switching away from Sprockets. Half of the accesses were able to run all JS loading code in up to 637ms. 90% up to 1.3s. 3% loaded between 2 and 2.2s. That means that for the slowest client all network operations should complete in about 2.8s, including DNS lookup, redirect and bytes transfer. I can’t make those browser run faster and I can’t save more than 20ms in the server-side, so my best option is to reduce the amount of data that should be transferred from the server to the client, as I don’t have much control over our collocation service provider (Cogent - NY), or the client Internet provider or our CDN provider (CloudFront).

But I can choose which libraries to use and which code to include in the initial page loading. When working with performance improvements the first step is always measuring. I created an application to provide me the analytics I needed to understand the page loading performance so that I could confirm that I should be now focusing on the download size. To give you an idea, the fastest access to our application in the last week was 692ms, from an user accessing from London. The resources were already in cache in this request, the main document loaded in 244ms and the JS code ran in 301ms, using IE10. No redirect happened for this request.

Here’s another sample for a fast page load including redirect and non cached resources. Some user from NY loaded the full application in 1.09s. 304ms were spent on redirect, 34ms to load the main document 107ms to load the CSS and 129ms to load the JS (JS and CSS are loaded in parallel). It took 479ms for IE11 to process the scripts in this requests.

Now, let’s take a look in a request which took 8.8s to load to understand why it took so long. This request used 6s to load the same JS from the same location (NY) while the redirect took 1.9s. The CSS took 4.3s to load. And this is not a mobile browser, but IE11, and it’s a fast computer as the scripts took only 453ms to run. When I take a closer look at the other requests taking over 5s, I can confirm the bad network performance is the main reason for this.

If I want to make them load under 5s I must reduce the amount of data they are downloading. After noticing that I realized sprockets was in my way for this last bit of performance improvement. I had already cut a lot of vendored code which were big and I only used a small part of them, so it was time I had to cut out part of the application code. Well, actually the plan was to post-pone its loading to when they were needed, for example, after the user made some action like clicking some button or link. In other words, I was looking for code splitting and I’d had to implement it on my own if I were to keep using my current stack (Sprockets by that time, or the Rails Assets Pipeline) but I decided to switch to another better tool as I also wanted source-maps support and other features I couldn’t get with Sprockets.

Source maps are very important to us because we report any JS errors to our servers including backtraces for future analysis and having the source-maps available makes it much easier to figure out the exact place an exception happened.

Goals

In the context of big single page applications, the ideal resources build tool should be able to:

support code modularization (understands AMD, CommonJS, allows easy shimming and features to integrate with basically any third-party library without having to modify their sources);
concatenate sources in bundles, which should be optimized to avoid missing all cache upon frequent deploys;
support code splitting (lazy code loading) - Sprockets and many other tools do not support this, which would require each developer to roll their own solution) to not force the user to download more code than what is required for the initial page rendering;
minify JS and CSS for production environments;
provide a fast watch mode for development mode;
provide source maps;
allow CSS to be embedded in JS bundles as well as allowing a separate CSS file (more on that in the following sections);
support CSS and JS preprocessors/compilers, like Babel, CoffeeScript, SASS, templating languages and so on;
support filenames containing content-based hashes to support permanent caching;
provide great integration with NPM and bower packages;
fast build time for the production-ready configuration to speed up deploys though the usage of persistent caching (on disk, Redis or memcached, for example);

Webpack was the only solution I was able to find which supported all of the items above except for the last one. Sprockets and other solutions are able to handle persistent cache to speed up the final build and consequently the deploy process. Unfortunately the deploy will be a bit slow with webpack, but at least the application should be highly optimized for performance.

If you are aware of other tools that allow the same techniques discussed in this article to be implemented, please let me know in the comments, if possible with examples on how to reproduce the set-up presented in this article.

The webpack set-up

This article is already very long, so I don’t intend it to become a webpack tutorial. Webpack has an extensive documentation about most of what you’ll need and I’ll try to cover here the parts which are not covered by the documentation and the tricks I had to implement to make it meet the goals I stated above.

The first step is to create some webpack.config.js configuration file and to install webpack (which also means installing npm and node.js). I decided to create a new directory under app-root/app/resources and perform these commands there:

1	sudo apt-get install nodejs npm
2	# I had to create a symlink in /usr/bin too on Ubuntu/Debian to avoid some problems with some
3	# npm packages. Feel free to install node.js and npm from other means if you prefer
4	cd /usr/bin && sudo ln -s nodejs node
5	mkdir -p app/resources
6	cd app/resources
7	# you should use --save when installing packages so that they are added to package.json
8	# automatically. I also use npm shwrinkwrap to generate a npm-shrinkwrap.json file which
9	# is similar to Gemfile.lock for the bundler Ruby gem
10	npm init
11	npm install webpack --save
12	npm install bower --save
13	bower install jquery-ui --save
14	# there are many other dependencies, please check the package.json sample below for more
15	# required dependencies

The build resources would be generated in app-root/public/assets and the test files under app-root/public/assets/specs. It looks for resources in app/resources/src/js, app/resources/node_modules, app/resources/bower_components, app/assets/javascripts, app/assets/stylesheets, app/assets/images and a few other paths.

webpack.config.js:

1
2	var webpack = require('webpack');
3	var glob = require('glob');
4	var merge = require('merge');
5	var fs = require('fs');
6	var path = require('path');
7	// the AssetsPlugin generates the webpack-assets.json, used by the backend application
8	// to find the generated files per entry name
9	var AssetsPlugin = require('assets-webpack-plugin');
10
11	var PROD = JSON.parse(process.env.PROD \|\| '0');
12	var BUILD_DIR = path.resolve('../../public/assets');
13
14	var mainConfig = {
15	context: __dirname + '/src'
16	,output: {
17	publicPath: '/assets/'
18	, path: BUILD_DIR
19	, filename: '[name]-[chunkhash].min.js'
20	}
21	,resolveLoader: {
22	alias: { 'ko-loader': __dirname + '/loaders/ko-loader' }
23	, fallback: __dirname + '/node_modules'
24	}
25	,module: {
26	loaders: [
27	{ test: /\.coffee$/, loader: 'coffee' }
28	, { test: /\.(png\|gif\|jpg)$/, loader: 'file'}
29	// it's possible to specify that some files should be embedded depending on their size
30	//, { test: /\.png$/, loader: 'url?limit=5000'}
31	, { test: /\.eco$/, loader: 'eco-loader' }
32	, { test: /knockout-latest\.debug\.js$/, loader: 'ko-loader' }
33	, { test: /jquery-ujs/, loader: 'imports?jQuery=jquery'}
34	]
35	}
36	, devtool: PROD ? 'source-map' : 'cheap-source-map'
37	, plugins: [ new AssetsPlugin() ]
38	, cache: true // speed up watch mode (in-memory caching only)
39	, noParse: [ 'jquery'
40	, 'jquery-ui'
41	]
42	, resolve: {
43	root: [
44	path.resolve('./src/js')
45	, path.resolve('../assets/javascripts')
46	, path.resolve('../assets/stylesheets')
47	, path.resolve('../assets/images')
48	, path.resolve('../../vendor/assets/javascripts')
49	, path.resolve('../../vendor/assets/stylesheets')
50	, path.resolve('../../vendor/assets/images')
51	, path.resolve('./node_modules')
52	, path.resolve('./bower_components')
53	]
54	, entry: { 'app/client': ['client.js']
55	, 'app/internal': ['internal.js']
56	// other bundles go here... Since internal.js requires client.js and it's also a bundle
57	// entry, webpack will complain unless we put the dependency as an array (internal details)
58	}
59	, alias: {
60	// this is required because we are using jQuery UI from bower for the time being
61	// since the latest stable version is not published to npm and also because the new beta,
62	// which is published to npm introduces lots of incompatibilities with the previous version
63	'jquery.ui.widget$': 'jquery-ui/ui/widget.js'
64	}
65	};
66
67	// we save the current loaders for use with our themes bundles, as we'll add additional
68	// loaders to the main config for handling CSS and CSS is handled differently for each config
69	var baseLoaders = mainConfig.module.loaders.slice()
70
71	var themesConfig = merge.recursive(true, mainConfig);
72
73	// this configuration exists to generate the initial CSS file, which should be minimal, just
74	// enough to load the "Loading page..." initial layout as well as the theme specific rules
75	// for the main config we embed the CSS rules in the JS bundle and add the style tags
76	// dynamically to the DOM because the initial CSS will block the page rendering and we want
77	// to display the "loading..." information as soon as possible.
78
79	themesConfig.entry = { 'app/theme-default': './css/themes/default.js'
80	, 'app/theme-uk': './css/themes/uk.js'
81	};
82
83	var ExtractTextPlugin = require('extract-text-webpack-plugin');
84	themesConfig.plugins.push(new ExtractTextPlugin('[name]-[chunkhash].css'));
85
86	var cssExtractorLoader = path.resolve('./loaders/non-cacheable-extract-text-webpack-loader.js') +
87	'?' + JSON.stringify({omit: 1, extract: true, remove: true }) + '!style!css';
88
89	themesConfig.module.loaders.push(
90	{ test: /\.scss$/,
91	// code splitting and source-maps don't work well together when using relative paths
92	// in a background url for example. That's why source-maps are not enabled for SASS
93	loader: cssExtractorLoader + '!sass'
94	}
95	, { test: /\.css$/, loader: cssExtractorLoader }
96	);
97
98	mainConfig.module.loaders.push(
99	{ test: /\.scss$/, loaders: ['style', 'css', 'sass'] }
100	, { test: /\.css$/, loaders: ['style', 'css'] }
101	);
102
103	module.exports = [ mainConfig, themesConfig ]
104
105	if (!PROD) { // process the specs bundles - webpack must be restarted if a new spec file is created
106	var specs = glob.sync('../../spec/javascripts-src/*/_spec.js*');
107	var entries = {};
108	specs.forEach(function(s) {
109	var entry = s.replace(/.javascripts-src\/(.)\.js.*/, '$1');
110	entries[entry] = path.resolve(s);
111	});
112	var specsConfig = merge.recursive(true, mainConfig, {
113	output: { path: path.resolve('../../public/assets/specs')
114	, publicPath: '/assets/specs/'
115	, filename: '[chunkhash]-[name].min.js'
116	}
117	});
118	specsConfig.entry = entries;
119	specsConfig.resolve.root.push(path.resolve('../../spec/javascripts-src'));
120	module.exports.push(specsConfig);
121	};
122
123	mainConfig.entry.vendor = ['jquery'
124	, 'jquery-ujs'
125	, 'knockout'
126	// those jquery-ui-*.js were created to include the required CSS as well since the jquery-ui
127	// integration from the bower package is not perfect
128	, 'jquery-ui-autocomplete.js'
129	, 'jquery-ui-button.js'
130	, 'jquery-ui-datepicker.js'
131	, 'jquery-ui-dialog.js'
132	, 'jquery-ui-resizable.js'
133	, 'jquery-ui-selectmenu.js'
134	, 'jquery-ui-slider.js'
135	, 'jquery-ui-sortable.js'
136	, 'lodash/intersection.js'
137	, 'lodash/isEqual.js'
138	, 'lodash/sortedUniq.js'
139	, 'lodash/find.js'
140	, './js/vendors-loaded.js' // the application code won't run until window.VENDORS_LOADED is true
141	// which is set by vendors-loaded.js. This was implemented so that those bundles could be
142	// downloaded asynchronously
143	];
144
145	mainConfig.plugins.push(new webpack.optimize.CommonsChunkPlugin({ name: 'vendor'
146	, filename: 'vendor-[chunkhash].min.js'
147	, minChunks: Infinity
148	}));
149
150	// prepare entries for lazy loading without losing the source-maps feature
151	// we replace webpackJsonp calls with webpackJsonx and implement the latter in an inline
152	// script in the document so that it waits for the vendor script to finish loading
153	// before running the webpackJsonp with the received arguments. Webpack doesn't support
154	// async loading of the commons and entry bundles out of the box unfortunately, so this is a hack
155	mainConfig.plugins.push(function() {
156	this.plugin('after-compile', function(compilation, callback){
157	for (var file in compilation.assets) if (/\.js$/.test(file) && !(/^vendor/.test(file))) {
158	if (/^(\d+\.)/.test(file)) continue;
159	var children = compilation.assets[file].children;
160	if (!children) continue;
161	// console.log('preparing ' + file + ' for async loading.');
162	var source = children[0];
163	source._value = source._value.replace(/^webpackJsonp/, 'webpackJsonx');
164	}
165	callback();
166	});
167	});
168
169	mainConfig.plugins.push(function() {
170	// clean up old generated files since they are not overwritten due to the hash in the filename
171	this.plugin('after-compile', function(compilation, callback) {
172	for (var file in compilation.assets) {
173	var filename = compilation.outputOptions.path + '/' + file;
174	var regex = /-[0-9a-f]*.(((\.min)?\.js\|\.css)(\.map)?)$/;
175	if (regex.test(filename)) {
176	var files = glob.sync(filename.replace(regex, '-*$1'));
177	files.forEach(function(fn) { if (fn !== filename) fs.unlinkSync(fn); });
178	};
179	}
180	callback();
181	});
182	});
183
184	if (PROD) [mainConfig, themesConfig].forEach(function(config) {
185	config.plugins.push(new webpack.optimize.UglifyJsPlugin({ minimize: true
186	, compress: { warnings: false } }));
187	});
188

loaders/ko-loader.js:

1	// Allow KO to work with jQuery without requiring jQuery to be exported to window
2	module.exports = function(source) {
3	this.cacheable();
4	return source.replace('jQueryInstance = window["jQuery"]', 'jQueryInstance = require("jquery")');
5	};

loaders/non-cacheable-extract-text-webpack-loader.js (required due to a webpack bug):

1	var ExtractTextLoader = require("extract-text-webpack-plugin/loader");
2
3	// we're going to patch the extract text loader at runtime, forcing it to stop caching
4	// the caching causes bug #49, which leads to "contains no content" bugs. This is
5	// risky with new version of ExtractTextPlugin, as it has to know a lot about the implementation.
6
7	module.exports = function(source) {
8	this.cacheable = false;
9	return ExtractTextLoader.call(this, source);
10	}
11
12	module.exports.pitch = function(request) {
13	this.cacheable = false;
14	return ExtractTextLoader.pitch.call(this, request);
15	}

Here’s how jquery-ui-autocomplete.js looks like (the others are similar):

1	require('jquery-ui/ui/autocomplete.js');
2	require('jquery-ui/themes/base/core.css');
3	require('jquery-ui/themes/base/theme.css');
4	require('jquery-ui/themes/base/menu.css');
5	require('jquery-ui/themes/base/autocomplete.css');

jQuery UI was installed from bower and lives in bower_components/jquery-ui.

Here’s how my package.json looks like:

1	{
2	"name": "sample-webpack",
3	"version": "0.0.1",
4	"dependencies": {
5	"assets-webpack-plugin": "^3.2.0",
6	"bower": "^1.7.7",
7	"bundle-loader": "^0.5.4",
8	"coffee-loader": "^0.7.2",
9	"coffee-script": "^1.10.0",
10	"css-loader": "^0.14.5",
11	"eco-loader": "^0.1.0",
12	"es5-shim": "^4.4.1",
13	"exports-loader": "^0.6.2",
14	"expose-loader": "^0.7.1",
15	"extract-text-webpack-plugin": "^1.0.1",
16	"file-loader": "^0.8.5",
17	"glob": "^7.0.0",
18	"imports-loader": "^0.6.5",
19	"jquery": "^1.12.0",
20	"jquery-deparam": "^0.5.2",
21	"jquery-ujs": "^1.1.0-1",
22	"knockout": "^3.4.0",
23	"lodash": "^4.3.0",
24	"merge": "^1.2.0",
25	"node-sass": "^3.4.2",
26	"raw-loader": "^0.5.1",
27	"sass-loader": "^3.1.2",
28	"script-loader": "^0.6.1",
29	"sinon": "^1.17.3",
30	"style-loader": "^0.13.0",
31	"url-loader": "^0.5.7",
32	"webpack": "^1.12.12",
33	"webpack-bundle-size-analyzer": "^2.0.1",
34	"webpack-dev-server": "^1.14.1",
35	"webpack-sources": "^0.1.0"
36	},
37	"scripts": {
38	"start": "webpack-dev-server -d --colors"
39	}
40	}

I told you. It took me about a week to perform this migration ;)

But believe on me. It worths.

Just run “node_packages/.bin/webpack -w” to enable the watch mode. I’d recommend adding “node_packages/.bin” to PATH in .bashrc so that you can simply run webpack, bower without specifying the full path. For the production build, simply run “PROD=1 webpack”.

Vim users should set backupcopy to yes (default is auto) otherwise the watch mode won’t detect all file changes as sometimes Vim would move the back-up and create a new copy which is not detected by the watch mode. See more details here.

If you are experiencing other issues with the watch mode, please check the Troubleshooting section of Webpack documentation.

Back-end integration

If you’re interested in integrating to Rails, you can stop reading here and jump to the Rails integration section of this article. Or if you’d like to get a concrete example. Otherwise, here are the general rules for integrating to your backend.

Webpack will generate a webpack-assets.json file due to the assets-webpack-plugin, which allows us to get the generated bundle full name with the chunk hash included so that we can use it to pass to the script src attribute. The configuration above would generate 3 bundles. One for common libraries, other for clients and another for internal users (containing some additional features not available to client users).

Here’s some incomplete JavaScript code demonstrating how it works:

1	APP_ROOT = '/fill/in/here';
2
3	WEBPACK_MAPPING = APP_ROOT + '/app/resources/webpack-assets.json';
4
5	var mapping = JSON.parse(require('fs').readSync(WEBPACK_MAPPING));
6	var vendorPath = mapping['vendor']['js'];
7	var clientPath = mapping['app/client']['js'];
8	var defaultThemePath = mapping['app/theme-default']['css'];

Then, it’s used like this in the page:

1	rel="stylesheet" href="<%= themePath " %>" />
2
3
13
14
17
18
20

Specifying dependencies in the code

Webpack has good documentation on how it detects the code dependencies so I won’t get into the details but will only demonstrate two common usages. One for a regular require, which will concatenate the code and another for code splitting usage.

Take this code for example:

1	var $ = require('jquery');
2	var app = require('app.js');
3	app.load();
4	$(document).on('click', '#glossary', function() {
5	require.ensure(['glossary.js.coffee'],
6	function() {
7	require(['glossary.js.coffee'], function(glossary){ glossary.load() })
8	},
9	'glossary'
10	);
11	});

The require.ensure call is not really required but it allows you to give the lazy chunk a name which is useful if you want to add other files to the same chunk in other parts of the code.

In that example, jquery will go to the vendors bundle, app.js will go into the app bundle and glossary.js (and any other files added to that chunk) will be lazily loaded by the application. You can even preload it after initializing the application so that the click happens faster when the user click on the #glossary element.

Some numbers

Well, after all this text you must be wondering whether it really worths, so let me show you some numbers for my application.

Before those changes, there was a single JS file which was 864 KB (286 KB gzipped). If we consider the case where the user took 6s to load this file, I think it’s fair to emulate throttling for Regular 3G (750 kb/s 100 ms RTT) in the Chrome dev tool. I’ve also enabled the film-strip feature. After disabling cache, the first initial rendering (for the “loading…” state) happened at 1.16s while the application was fully loaded at 5.26s. It also took 594 ms to load the 74.2 KB CSS file (17.7KB gzipped).

Now, after enabling code splitting, and reducing the initial CSS, here are the numbers. Now the initial “loading…” state was rendered at 499ms and the page was fully loaded at 4.7s. The CSS file is now 7.2 KB (2.4 KB gzipped) and the JS files are 498 KB (169 KB) gzipped for vendor and 259 KB (77.8 KB gzipped) for the app bundle. Unfortunately I couldn’t cut much more application code in my case and most of the code is from vendored libraries, but I think there’s still room to improve now that webpack is in place. So, whether it worths or not for you to go through all these changes will depend on the percentage of your code which is required for the initial full page rendering, and on the frequency you deploy (I deploy very often, so just the ability of creating a commons bundle is good enough to justify this set-up).

Just for the sake of completeness, I’ll also show you the numbers with cache enabled and with throttling disabled.

With cache enabled, the initial render happened at 496ms and the page was fully loaded by 1.35s in Regular 3G throttling mode for the webpack version. If I disable throttling, with a 10 Mbps Internet connection and accessing the NY servers from Brazil I get 354ms for the initial rendering and 1.22s for the full load. If I disable the cache and throttling I get 445ms and 2.03s.

For the sprockets version, the initial render happened at 846ms and the page was fully loaded by 1.74s in Regular 3G throttling mode. If I disable throttling I get 553ms for the initial rendering and 1.48s for the full load. If I disable the cache and throttling I get 740ms and 2.80s.

Actually, those numbers are both for webpack, as I am no longer able to test the sprockets version. But I’m calling it sprockets anyway because the first approach should be feasible with sprockets. But after moving to webpack I was able to more easily extract only the parts we use from jQuery UI and replace underscore with lodash to use only the parts we need and I’ve also got rid of some other big libraries in the process. Before those changes the app bundle was 1.2MB minified (376KB gzipped), so I was able to reduce the amount of transferred data to about 65% of what it used to be, but it wouldn’t be fare to compare those numbers because in theory it should be possible to achieve a lot of this reduction without dropping sprockets.

But in our case, we were able to improve the page loading speed after moving to webpack even before applying code splitting due to the flexibility it provides us which I find easier to take advantage of when compared to how we used the assets from sprockets.

And now we’re able to use the source-maps for both debugging in the production environment but specially to understand the stack-traces when JS exceptions are thrown.

If you have any questions please write them in the comments or send me an e-mail and I’ll try to help if I can.

Scripts loading trade-offs: a performance analysis

2016-02-29T11:00:00+00:00

This has been written to serve as some background for two other articles focused on SPA performance:

I’ve been developing Single Page Applications (SPA) since 2009 and I can tell you something for sure. Developing web applications is hard! If you are a full-stack developer like me you have to learn about relational databases, caching technology (Redis, Memcached), a server-side language and framework, sometimes other kind of databases, full-text search (Solr, ElasticSearch), server configuration and automation tools (Chef/Puppet/Ansible), deploy tools (Capistrano), continuous integration, automatic test coverage, network infrastructure, http proxy configuration, load balancers, back-up and monitoring services, just to name a few.

But even if we leave all these technologies out and only focus on front-end development, then, it’s still hard! JavaScript is not a great language and I certainly do not like the language at all but we don’t really have any affordable options since it’s all web browsers understand and if you want your application to load fast you must use JavaScript and you must learn it and learn it well.

Code modularization in JavaScript

But particularly the lack of some sort of require/import mechanism built into the language is the worst part of the language by far and the reason why people spend so many time just to figure out some way to implementing modularization as the code gets big and this will often happen soon when implementing an SPA.

On the other side, the require mechanism when applied to a client-server architecture where the code is stored in the server-side (which is how browsers work) is much trickier than it is for most languages which assume the code is locally available. In such architecture, if you want your code to load as fast as possible you should be worried about transferring only the required bits as you need them. Requiring code on demand is possible in many languages, like Ruby, but in JavaScript it is even more tricky because JavaScript doesn’t allow threaded code (workers only popped up very recently) and works by processing events, one at a time, the so called async programming.

This means a require in JavaScript should also work asynchronously (Node.js is a different beast as it allows some code to work synchronously by blocking the code execution until the operation of the function is finished while any I/O operation in the browser is implemented asynchonously). I just don’t think this is an excuse for JavaScript not providing such mechanism out of the box, but this is not an article to say bad things about JavaScript. There are already tons of those out there, I’m just explaining why modularization is a complex subject in JavaScript and front-end development.

Solutions to JS modularization

There are many attempts to implement code modularization in JavaScript. I won’t get into the details since there are many articles covering only this subject. If you are curious you can search about CommonJS, Require.js, AMD and JavaScript modularization in general. I’m just going to review the solutions from a higher level perspective, and talk about their trade-offs as it’s important to understand them in order to explain how to load applications fast.

Sequence of script tags

When JavaScript was first introduced in Netscape people would simply add each module to the page by adding a script tag in the header for each module. This will block the page rendering until the scripts are downloaded and a user navigating to the site will see a blank page until all script sources are downloaded and executed. When you have big scripts and bad network bandwidth (which is specially true for mobile devices running on 2G, 3G and even 4G) it leads to a really bad user experience.

The main advantage of this approach is that it’s easy to set up and understand, since the scripts are executed in the specified order of the script tags. If your links and buttons depend on the scripts to work properly (which is usually the case) then, by putting the scripts in the document head you wouldn’t have to worry about that. This is the simplest solution to develop. But it’s also the one that will perform worst.

Even if you decide to put your scripts in the end of the page, it’s still a problem if you want your page to load really fast. That’s because it will delay the DOMContentLoaded and Load DOM events and if part of your code is listening on those events it means they will have to wait until all scripts are downloaded and executed. If your code doesn’t depend on those events and if your page is fully functional even before the scripts are downloaded (links and buttons work as expected) then it might be a good strategy for your case, if you target browsers supporting HTTP 2 since it allows you to have great control over per module caching so that if your users visit your application very often and you only change a few files in a new deploy then those users would only have to download the changed files with proper caching headers in place.

But most browsers will limit the amount of concurrent resources download, which means that if your application depend on many script tags they won’t be all downloaded in parallel, which can introduce some additional time to the application loading.

Another drawback for the approach of putting the scripts in the end of the document body is that their download will only start after the document download is mostly completed. This is not a big deal if your document is small, but if takes 1 second just to finish downloading your main document it means your scripts will only start to be downloaded 1s after the user requests your application to be loaded, which means your application may take an extra second to load than it should.

Async scripts

An alternative to putting the script tags at the end of the body is to keep them in the head but flag them as async scripts (or defer too if you target older IE which do not support the async attribute - even though defer and async behave differently defer is still better than the default script blocking behavior). The main advantage over scripts in the bottom is that the scripts will start downloading very soon without blocking the page rendering or the DOM load events (defer works a bit differently than async with regards to those events).

However, your scripts must be async safe for that to work. For example, you can’t load jquery and jquery-ui from CDN in two async script tags because if jquery-ui is loaded before jquery it will fail to run as it assumes jQuery is loaded already.

This strategy is usually used in combination with scripts custom bundles. It could be a single bundle, which is easier to implement or if multiple bundles are created they should be prepared to wait for their dependencies to be loaded before running its code.

Dynamically created script tags (Injected Scripts)

Script tags created dynamically do not block and could be used to implement some async require, which is the strategy adopted by Require.js and similar frameworks. Taking care of implementing this strategy correctly while still supporting old browsers is not an easy task and that’s why there are many frameworks providing this feature and why I think it’s a big failure of JavaScript to not provide such feature out of the box.

There are some different strategies for using technique though. One might simply add all required scripts dynamically to ensure they won’t block page rendering (although I think async scripts are cleaner in this case) or they could be used to dynamically load code on demand, which I will refer as code splitting in this article from now on.

At first it may sound like a good idea to load just the code your application needs so far, as they are needed because it reduces the amount of bytes transferred, but it also increases the number of requests, and more importantly, it shifts the moment when that code download starts.

If you concatenate all this code and put it in an async script on head it will load the application faster, otherwise even if you started the download of all dependencies in parallel, it would be equivalent to putting the script tags in the end of the body which means your application will load 1s late when compared to when it should finish load in the optimal case (see the last comment in the “Sequences of script tags” section).

But it’s more tempting to load each module when they are needed when using this strategy, which makes things even worse. If you need module A, which depends on B, which depends on C the browser will have to finish downloading A to figure out it should also ask to download B and only after B is finished loading the request to C would start. It may not be always obvious that A depends on both B and C so that you could require A, B and C at the same time when we are talking about real code. That’s why Require.js offers a bundling tool to deliver an optimized JS to production environments.

Creating script tags inside scripts has a performance issue, though, which is explained in depth here. Since the scripts could interact with CSSOM, it means it will block until all previous CSS resources have finished downloading, introducing an unnecessary latency. Async script tags are preprocessed by the browsers and their download will start immediately (just like regular script tags with the src attribute, the difference being that async tags won’t block the DOM). That’s why we should prefer async script tags over dynamically created scripts for the initial application loading process (loading code on demand is a separate case).

Scripts bundling - single bundle

This is considered a best practice by many currently and several tools adopt this strategy, including Sprockets, the resources build tool integrated with Ruby on Rails default stack.

How the bundles are built will depend on the bundler tool. Sprockets require the resources to specify their dependencies as special comments in the top of each resource (JS or CSS). Other tools use the AMD or CommonJS require syntax for example to specify the dependencies and will parse the JS to find them, which is more complex than the strategy used by Sprockets, but on the other side allow more powerful features, like code splitting (more on that when I’ll talk about webpack). There’s also another technique of specifying the dependencies outside the resources themselves, which is used by the Grails resources plugin for example, or by some build tools similar to Make.

Which strategy is better will also depend on personal taste. Particularly I prefer to specify the dependencies directly in the code rather than in a separate file, like it happens with Grails resources plugin. But when code splitting is desirable it’s not just a matter of taste. Implementing code splitting while using Sprockets would require a huge amount of effort for example. That’s why I think Sprockets doesn’t suite big SPA and the reason why it should be replaced with a better tool.

Such bundling tools are usually able to perform other preprocessing before generating the final optimized resource, including minifying them with uglifyjs to reduce the download size and compiling from other languages to JS and CSS (after all, as I said, many people dislike those languages and fortunately there are better alternatives out there when you can use preprocessors and transpilers).

By having a single JS file to download and run you reduce the amount of concurrent requests to your server and you can even serve them through a CDN to improve it even more as the limit of concurrent connections work in a per domain basis (even though it may not be best to use a CDN if HTTP 2 is enabled and under some conditions).

For a first not cached request this is probably the strategy with best results if we consider the bundle contains only the required code for the initial page loading, which is hardly the case.

So, here are some drawbacks for this approach. Usually all code is bundled in a single file, creating big files which take a while to finish downloading, even if it only happens once until the next deploy. And it gets worse if you are able to deploy very often. If you deploy every day then the user will often request a request which is not cached. And I wouldn’t say this is an unrealistic scenario for many healthy products.

This might be a good enough solution if your bundle is small or if you deploy once in a month or each 6 months and most of your user access are cached ones, but if you are targeting a great experience for first time users, you should look for a better alternative.

Script bundling - multiple bundles

Even if you deploy often, it’s likely that your vendored libraries don’t change that often. So it may make sense to pack your vendored libraries in a separate bundle so that it would be cached most of the times even after new deploys. Since you should be loading the vendors and application bundles asynchronously you must add some simple code to ensure the application code would only run after the vendors bundle has finished loading.

This will usually add just a little overhead for the first user access when compared to the single bundle but on the other hand it will often speed up other page loads after a new version is deployed while the vendors bundle hasn’t changed.

If your application bundle only contains code for the initial page rendering and implements lazy code loading as the user takes action (code splitting) this gets even better.

In the remaining sections I’ll show how webpack enables such strategy to be implemented and will compare it to Sprockets since I have switched from Sprockets to Webpack and should be able to highlight the weak and strong points of each.

Server-side vs client-side template rendering

Feel free to skip this subsection if you don’t care about this subject.

Some respected developers often state the clients should get a fully rendered HTML partial from the server and simply add it to some container or replace its content trying to convince us that this is the best and fastest approach. To give you one example, David, the creator of Rails, writes about the reasons why he thinks this is the best approach:

Benefit #1 is “Reuse templates without sacrificing performance”. While I agree with the reuse part in the case the content should also be rendered in the server-side and then updated with JS, I wouldn’t blindly trust the “without sacrificing performance” part. Reuse may not be a problem for many SPA, including the one I maintain, so we should evaluate whether there’s any performance difference for both approaches in a per case basis and which one is actually faster.

It’s important to understand the full concepts to get the full picture so that you can pick the right choice. First, I’d like to point out that I don’t agree with David’s terminology: “unless you’re doing a single-page JavaScript app where even the first response is done with JSON/client-side generation”. SPA should mean an application that won’t leave the initial page and use XHR to update the view. Both approaches apply to SPA in my opinion.

Then, you have to understand what is the specific case David is recommending you to render in the server-side and which I would agree. If your application is able to render an initial view, which is useful and functional even before your JS code has finished loading, then I’d also recommend you to render it in the server-side. But please notice that even this approach won’t always be the fastest. It will be the fastest when the static resources are not in cache. But if they are cached the application can load much faster if the rendering is performed in the client-side depending on the template and data. So, it will depend on the kind of access you are optimizing to: cached resources or first user access.

You’ll notice I’m inviting you to think about the reasons behind each statement because they often suppose something which is not always true, so you should understand to see whether it applies to your case or not. Much more often there are trade-offs in all choices and that’s the reason I try to provide you context around every statement I do in this article.

In that same article we can extract another example of such statement which is not always true:

“While the JavaScript with the embedded HTML template might result in a response that’s marginally larger than the same response in JSON (although that’s usually negligible when you compress with gzip)”. This is not always true. If you are working with big templates where just a small percent of it depend on dynamic data, transferring that data with JSON will often be much faster. Or if you are transferring some big table where the cells content (the dynamic part) represents only about 30% of the total HTML, chances are that it will be much faster to transfer the data as JSON.

I’d also like to notice that if your application depend on your resources to be loaded to behave properly (so that links work, menu, tabs, and so on), then I can’t see any great advantages on rendering the initial template in the server-side since you wouldn’t be able to display it to the user anyway because it wouldn’t be functional until the code is fully loaded. In that case (which is the case for the SPA’s I have worked with since 2009) I’d suggest to create a minimal document with a basic layout (footer, header, …) which is fully functional without JS and some message “Loading application… please wait” until the code is fully loaded even if that message would be displayed just for 1 or 2 seconds… With the techniques suggested in this article, you would be able to provide such “Loading application…” state to the user in within half a second, much faster than a big full HTML document leading the user to think the application is very responsive even in mobile devices, even if it will require a few extra seconds to finish loading the application.

Overall I have been noticing that the actual reason why most people prefer to render in the server-side is because they don’t like JS or feel more comfortable with their back-end language and tools. I don’t enjoy programming in JS either, but it shouldn’t matter if the goal is to provide the best user experience. I had to learn JS and learn it well. I’ve spent a lot of time to learn a lot about JS and browser performance and much more even though I don’t enjoy the language nor I do enjoy IE8, but I have to learn about it because our application sadly still has to support it. So here is my advice for those of you that avoid JS at all costs just because you don’t like it. Get over it.

On the other hand, there are some developers which are exactly the opposite. They prefer working with JS so much that they will also run the back-end on Node.js. There are some cases where the “Rails Way” (or DHH way if you prefer) is the right one. For example, if your application is publicly available rather than only for authenticated users, you’d probably want it to be indexed by search engines, like Google. Even though Google engine can now understand JS, I’d still recommend you to render those pages in the server-side if possible. Also, in those cases it’s very likely a user would like to bookmark some specific page or send the link to someone and this works more like a traditional web site than a real application. This is exactly what Turbolinks was designed for. If Turbolinks code is not loaded yet the application should keep working as expected but switching to another page may take longer than when Turbolinks code is loaded. That’s the kind of application I would recommend adopting DHH’s suggestion. If that’s your case, I’m afraid you won’t be much interested in the content of this article as this article is focused on real applications rather than optimizations over traditional web sites, which is what Turbolinks does.

XHR requests and caching

One of the arguments for the server-rendering approach is that they can be cached. But XHR requests can be cached too. But they require some additional work since caching is usually disabled by default by libraries like jQuery, for good reasons of course.

The main problem with allowing cache in XHR requests is that the browser will leave it to the code to handle caching, which can be not always possible and will often require quite some code to handle it properly. I enable caching of XHR requests in the application I maintain and it worths in our case, but the sad news is that it’s only useful if you make some request at least twice as the first request can’t be retrieved from cache unless you enable localStorage and add some extra code… This article is already too long so I won’t explain the details, but if you are curious and want to see some code, just leave a comment and I may consider writing another article just to explain how this works in practice.

When you perform a regular request to the server, the browser will send the etags or if-modified-since headers when it has a cached copy and if the server responds with 304 (Not Modified) it will load that cached response transparently to the user. But for XHR requests your code would have to handle the 304 status but it won’t get a copy of the cached content from the browser, so it’s not that useful. It’s only useful if you have stored a the response of some previous request to the same address so that you could use that response when handling a 304 status response. It’s sad that the browser doesn’t provide a better mechanism for conditional caching of XHR requests or even handle them transparently.

So, for the initial XHR requests, they have a point for rendering in the server-side to take advantage of conditional caching tags but as you can see in the next sections, such XHR requests for the initial page loading should be avoided anyway and it’s possible to cache the initial data in separate script tags loaded async (assuming the initial data is cacheable, or part of it). Keep reading.

Initial client-side rendering performance considerations

If you decide to render your templates in the client-side, you must consider how to make it so without sacrificing performance. Suppose your application relies on some JSON to render the initial page. It’s usual for the application to perform some AJAX requests upon the application load to finish loading the page, and you should avoid this technique if you want your application to load the fastest possible way.

The reason is that the AJAX request will only happen after your application code is downloaded and executed, which means it will add some overhead while that data could be downloaded in parallel or embedded in the main document. Let’s discuss each case.

Embedding all data required for the initial loading in the document body

It’s possible to avoid those extra AJAX requests upon the initial load by embedding all data you need in script tags in the end of your document body, and it should be fine if your data is small and shouldn’t prevent your main page from being cacheable.

If your main document would be cacheable otherwise, or if your data is big enough to require some considerable extra time to finish loading the main document, which would delay some DOM load events, then this technique may not be your best bet.

I don’t recommend permanent caching (even if for an specific time span) for an SPA main document. In case it has some bug and need to be fixed urgently, a permanent cached copy will prevent that for some users. But it doesn’t mean the main document can’t be cached. Your application may use Etags or if-modified-since headers.

Suppose the main document could benefit from such caching while your extra data would invalidate such caching due to its dynamic nature. In that case, you should consider whether embedding it in the end of the document body would still be a good idea. As you can see in the next subsection, it’s not the only alternative.

On the other side, if your data is big but a great part of it is cacheable, than it’s also a good idea to extract the cacheable part and load it separately so that you could take advantage of some caching to speed up the next application loadings.

Using separate async scripts to load initial data

The alternative to embed the initial data in the application document is to load that data in async script tags in the header. This way, the data would start downloading very soon, in parallel with the other required data. In that case, you should either wrap the JSON data in a function call (JSON-P like solution) or add some custom code to store that data in some global variable (window.initialData for instance) or whatever makes sense to your application (attaching data to your body element or anything you could imagine).

When combined to code splitting, where multiple scripts are loaded concurrently, I’d recommend the JSON-P style to avoid some time-based polling with setTimeout to check until all pieces have been downloaded and evaluated. Here’s how the document head could look like:

1	rel="stylesheet" href="app.css" />
2
6
24
25
26
27
28

As usual, any static builds should contain some content hash in the filename so that it could be permanently cached and your initial-data request(s) should use other cache headers like etags and if-modified-since when possible.

Each script could call onContentLoaded passing an id for that script (‘vendors’, ‘app’, ‘initial-data’) and an optional handler to be called whenever some resource is loaded (or just once if the once parameter is true). The handler gets the appSettings instance which can be used to check which resources have been loaded already for deciding when to take action. This way no polling should be required.

Security concerns

When loading user-sensitive data in the initial-data scripts one should be concerned about security to not allow cross-site script attacks to steal user’s data. I think it should be enough to check for the Referer HTTP header and compare it to a white-list of domains allowed to load that script. If you want to use a CDN for these requests you should set up your CDN to forward the Referer header in that case. It’s always a good idea to check with your security team if you have one. If you think the proposed solution here is not good enough or if you have other suggestions, please comment or send me an e-mail. I’d love your feedback.

Improving SPA loading time with webpack (and why Sprockets is in your way)

2016-02-27T11:10:00+00:00

This should be seen as a 3-parts series and it was previously published as all those articles bundled together but the article became too long. I’ve published the server-side framework agnostic part here and that part itself require some background on how scripts can be loaded and the trade-offs for each approach here.

This article will focus on how to implement those techniques in Rails and how it compares to Sprockets, the de facto solution, or the Rails Assets Pipeline.

Switching away from Sprockets

As I mention in the other related article, I realized that for our already optimized application (as far as Sprockets allows it to be) to take one step further we’d have to introduce code splitting and only load the code required for the initial state initially.

I could have implemented code splitting on my own, since Sprockets doesn’t support it out of the box, but by that time I was already feeling Sprockets was in my way for a long time for many other reasons, like lack of source-maps or ES6/Babel support and bad integration with the npm packages system and the Node.js community as a whole. About a month after I realized I should be replacing it with a better build system and started to study webpack I read the rails-assets.org team announcing they would stop supporting that effort by the end of 2017, which confirmed I was in the right direction as their team came to the same conclusion as I that it wasn’t the best approach for integrating with the JS ecosystem and also because I would no longer be able to count on rails-assets.org for the bower integration after 2018 (and that integration was never perfect anyway).

I’d love to be able to tell you how to migrate from Sprockets to Webpack in baby steps but after thinking about it for a long time I couldn’t figure out some way to do it gradually. It took me about a week to finish the migration of several sources and libraries to webpack and fortunately I had a calm week after our last deploy that would allow me to make this change happen. Before that I have invested another week or two investigating about webpack and other alternatives to be sure this was the right direction for me to take. If your application is big and has lots of modules be warned that the transition to webpack is not a trivial one. But it’s not hard either, but you need some time available to perform it and no other development should take place during the transition to avoid many conflicts which would take even more time to resolve.

However, I can recommend that the first step would be to make your libraries available through webpack, so that you can get used to it at the same time you can get rid of the rails-assets.org gems by replacing them with npm or bower packages since this can be done in parallel with other activities and with baby steps. At least, this is what I did and it took me about 2 days to move away from rails-assets.org gems to webpack managed libraries.

Webpack drawbacks when compared to Sprockets

There are basically 3 points where Sprockets is better than the Webpack approach:

1 - Sprockets supports persistent caching when compiling assets, which allows faster deploy times when you just change a few assets; 2 - Requests to the document will block until all changed assets compilation has finished. Even though the watch mode of webpack is pretty fast (assuming uglify is not enabled in development mode), it may take 2 or 3 seconds to update the bundles after some file is changed. If you try to refresh a page just after making the change, it’s possible it won’t load the latest changes, while Sprockets will block the request until the generated assets are updated, which is nicer than checking the console to see if the compilation has finished; 3 - Any errors in the assets are better displayed when loading the document due to the great integration sprockets has with Rails;

On the other side, Sprockets has so many drawbacks that I won’t list all of them here to not repeat myself. Just read the remaining of this article and the other mentioned ones. Just to name a few: lack of support for code splitting, source-maps, ES6/Babel, NPM/Bower integration (with regards to evaluating requires). Integration with several client-side test frameworks can also be made much easier with webpack, by specifying all dependencies in a separate webpack configuration without having to export anything to the global context… It also allows your front-end code to be managed independently, without any dependencies on Rails which may be desired for some teams where the front-end team would prefer to work independently from the back-end team.

Having said that, by no means I regret moving from Sprockets to Webpack. After the first week I created this Rails app to replace a Grails app I inherited, I decided to switch from ActiveRecord to Sequel. I was already a Sequel fan but Arel had just arrived to AR by that time and I decided to give it a try but gave up after one week. Replacing AR with Sequel was the best decision I took for this project and I think moving from Sprockets to Webpack will prove to be the second best choice I’ve made for this project.

Integration Webpack with Rails

Follow the instructions described in this other generic article about Webpack and then proceed with these instructions.

I do that by adding some methods to application_helper.rb:

1	require 'json'
2
3	WEBPACK_MAPPING = "#{Rails.root}/app/resources/webpack-assets.json"
4
5	module ApplicationHelper
6
7	def webpack_resource_js_path(resource_name)
8	webpack_resource_path resource_name, 'js'
9	end
10
11	def webpack_resource_css_path(resource_name)
12	webpack_resource_path resource_name, 'css'
13	end
14
15	def webpack_stylesheet_link_tag(resource_name)
16	stylesheet_link_tag webpack_resource_css_path(resource_name)
17	end
18
19	private
20
21	def webpack_resource_path(resource_name, type)
22	webpack_mapping[resource_name][type]
23	end
24
25	def webpack_mapping
26	@webpack_mapping \|\|= JSON.parse File.read WEBPACK_MAPPING
27	end
28	end

Then, it’s used like this in the page:

1	<%= webpack_stylesheet_link_tag "app/theme-#{@theme}" %>
2	<%= render partial: '/common/webpack_boot' %>
3	<%= javascript_include_tag webpack_resource_js_path('vendor'),
4	defer: 'defer', async: 'async', crossorigin: 'anonymous' %>
5	<% script = webpack_resource_js_path(current_user.internal? ? 'app/internal' : 'app/client') %>
6	<%= javascript_include_tag script, defer: 'defer', async: 'async', crossorigin: 'anonymous' %>

/common/_webpack_boot.html.erb:

1
11

I’ve also enhanced the assets:precompile task so that you don’t have to change your deploy scripts:

lib/tasks/webpack.rake:

1	namespace :webpack do
2	webpack_deps = ['resources:sprites', 'js:routes', 'webpack:generate_settings_js',
3	'webpack:install']
4
5	desc 'build webpack resources'
6	task build: webpack_deps do
7	puts 'building webpack resources...'
8	system('cd app/resources && PROD=1 node_modules/.bin/webpack --bail > /dev/null 2>&1') or
9	raise 'webpack build failed'
10	puts 'resources successfully built'
11	end
12
13	desc 'webpack watch'
14	task watch: webpack_deps do
15	system 'cd app/resources && node_modules/.bin/webpack -w'
16	end
17
18	task :install do
19	system 'cd app/resources && npm install >/dev/null 2>&1 && node_modules/.bin/bower install >/dev/null 2>&1' or
20	puts 'webpack install failed'
21	end
22
23	task :generate_settings_js do
24	require 'erb'
25	require 'fileutils'
26	FileUtils.mkdir_p 'app/resources/src/js/app'
27	File.write 'app/resources/src/js/app/settings.js',
28	ERB.new(File.read 'app/assets/javascripts/app/settings.js.erb').result(binding)
29	end
30	end
31
32	Rake::Task['assets:precompile'].enhance ['webpack:build']
33

I’ve also moved the sprites generation from compass to a custom script I created:

lib/tasks/sprites.rake:

1	namespace :resources do
2	desc 'generate theme sprites'
3	task :sprites do
4	`front-end/generate-sprites.rb`
5	end
6
7	# TODO: Fix the need for this in Capistrano
8	task :generate_fake_manifest do
9	`touch public/assets/manifest.txt`
10	end
11	end
12
13	Rake::Task['assets:precompile'].enhance ['resources:sprites', 'resources:generate_fake_manifest']

front-end/generate-sprites.rb:

1	#!/usr/bin/env ruby
2
3	require_relative 'sprite_generator'
4
5	THEMES = ['uk', 'default']
6
7	THEMES.each{\|t\| SpriteGenerator.generate t }

sprite_generator.rb:

1	require 'fileutils'
2
3	class SpriteGenerator
4	def self.generate(theme)
5	new(theme).generate
6	end
7
8	def initialize(theme)
9	@theme = theme
10	end
11
12	def generate
13	create_sprite
14	compute_size_and_offset
15	FileUtils.rm_rf css_output_path
16	FileUtils.mkdir_p css_output_path
17	generate_css
18	end
19
20	private
21
22	def create_sprite
23	FileUtils.rm_rf output_path
24	FileUtils.mkdir_p output_path
25	`convert -background transparent -append #{theme_path}/*.png #{output_path}/#{sprite_filename}`
26	end
27
28	def theme_path
29	@theme_path \|\|= "front-end/resources/images/#{@theme}/theme"
30	end
31
32	def output_path
33	@output_path \|\|= "public/assets/#{@theme}"
34	end
35
36	def sprite_filename
37	@sprite_filename \|\|= "theme-#{checksum}.png"
38	end
39
40	def checksum
41	@checksum \|\|= `cat #{theme_path}/.png\|md5sum`.match(/(.?)\s/)[1]
42	end
43
44	def compute_size_and_offset
45	dimensions = `identify -format "%wx%h,%t\\n" #{theme_path}/*.png`
46	@image_props = []
47	offset = 0
48	dimensions.split("\n").each do \|d\|
49	m = d.match /(\d+)x(\d+),(.*)/
50	w, h, name = m[1..-1]
51	@image_props << (prop = [w.to_i, h = h.to_i, name, offset])
52	@sort_ascending = prop if name == 'sort-ascending' # special behavior
53	@sort_desc = prop if name == 'sort-descending' # special behavior
54	offset += h
55	end
56	end
57
58	def css_output_path
59	@css_output_path \|\|= "app/assets/stylesheets/themes/#{@theme}"
60	end
61
62	def generate_css
63	sp = @sort_ascending
64	common_rules = [
65	@image_props.map{\|(w, h, name, offset)\| ".theme-#{name}"}.join(', '),
66	', a.sort.ascending:after, a.sort.descending:after {',
67	" background-image: url(/assets/#{@theme}/#{sprite_filename});",
68	' background-repeat: no-repeat;',
69	' display: inline-block;',
70	' border: 0;',
71	' background-color: transparent;',
72	'}',
73	@image_props.map{\|(w, h, name, offset)\| "button.theme-#{name}"}.join(', '),
74	'{',
75	" cursor: pointer;",
76	' outline: none;',
77	'}',
78	@image_props.map{\|(w, h, name, offset)\| ".theme-#{name}.disabled"}.join(', '),
79	'{',
80	" -webkit-filter: grayscale(100%);",
81	' filter: grayscale(100%);',
82	'}',
83	].join "\n"
84	content = @image_props.map do \|(w, h, name, offset)\|
85	[
86	".theme-#{name} {",
87	" height: #{h}px;",
88	" width: #{w}px;",
89	" background-position: 0 -#{offset}px;",
90	"}",
91	].join "\n"
92	end.join("\n")
93	File.write "#{css_output_path}/theme.css", "#{common_rules}\n\n#{content}"
94	end
95	end

Final notes

You can find some numbers on how this set up improved the loading time of our application in the generic webpack article “Some numbers” section.

Even though it may require a lot of effort to migrate from Sprockets to Webpack, there are tons of advantages of doing so, including performance improvements for loading your application faster and additional features support, like source-maps, much easier integration with NPM and bower packages, support for more compilers/transpilers and ability to move your front-end code to a separate project. And it’s also a much more easily customizable solution, allowing you to easily change the build configuration by using regular JavaScript in the Node.js environment.

If you want to take your loading time performance to the next level, then I’d say moving out from Sprockets is a must and webpack is the only solution I was able to find in my research that will allow you to do that.

Preventing NewRelic RUM metrics for certain clients with Rails apps

2014-08-07T12:15:00+00:00

We use the awesome Sensu monitoring framework to make sure our application works as expected. Some of our checks use a headless browser (PhantomJS) to explore parts of the application, like exporting search results to Excel or making sure no error is thrown from JS in our Single Page Application. We also use NewRelic and Pingdom to get some other metrics.

But since PhantomJS acts like a real browser, our checks will have influence over the RUM metrics we get from NewRelic, but we’re not really interested in such metrics. We want the metrics from real users, not our monitoring system.

My initial plan was to check if I could filter some IP’s from RUM metrics and asked NewRelic support about this possibility, for which they said it’s not supported yet, unless you want to filter specific controllers or actions.

Since some monitoring scripts have to go through real actions, this was not an option for us. So I decided to take a look at the newrelic_rpm gem and could come with a solution that I’ve confirmed is working fine for us.

Since we have a single page application, I simply add the before-action filter to the main action, but you may adapt it to use in your ApplicationController if you will. This is what I did:

1	class MainController < ApplicationController
2	before_action :ignore_monitoring, only: :index if defined? ::NewRelic
3
4	def index
5	# ...
6	end
7
8	private
9
10	def ignore_monitoring
11	return unless params[:monitoring]
12	::NewRelic::Agent::TransactionState.tl_get.current_transaction.ignore_enduser!
13	rescue => e
14	logger.error "Error in ignore_monitoring filter: #{e.message}\n#{e.backtrace.join "\n"}"
15	end
16	end

The rescue clause is there in case the implementation of newrelic_rpm changes and we don’t notice it. We decided to send a “monitoring=true” param to our requests performed by our monitoring scripts. This way we don’t have to worry about managing and updating a list of monitoring servers and figure out how to update that list in our application without incurring in any down-time.

But in case you want to deal with this somehow, you might be interested in testing “request.remote_ip” or “request.env[‘HTTP_X_FORWARDED_FOR’]”. Just make sure you add something like this to your nginx config file (or a similar trick for your proxy server if you’re using one):

1	location ... {
2	proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
3	}

Sequel is awesome and much better than ActiveRecord

2014-05-30T09:26:00+00:00

I’ve been using Sequel in production since 2012, April and I still think this is the best decision I’ve made so far for the whole project lifetime.

I had played with it sometimes in the past years, when Arel hasn’t been added to ActiveRecord yet and I found it amazing on how it supported lazy queries. Then I spent a few years working with Java, Groovy and Grails when I changed my job in 2009, but kept reading about Ruby (and Rails) news until I found out that AR has added support for lazy queries through Arel, when Rails 3 was released. Then I assumed AR would be a better fit than Sequel since it’s already integrated with Rails and lots of great plug-ins would support it better.

I was plain wrong! In 2011 I changed my job again to work on another Grails application. After finding a bug with no fix or workaround available I decided to create a Rails application to forward the affected requests to. So, in April of 2012 I started to create my Rails app and its models using ActiveRecord. A week later I moved all models from ActiveRecord to Sequel and have been happy since then.

Writing some queries with ActiveRecord was still a pain while Sequel made it was a joy to work with. The following sections will go to each topic I find Sequel is an improvement over AR.

Database pooling implementation

These days I decided to recreate a few models with ActiveRecord so that we could use an admin interface with the activeadmin gem, since it doesn’t support Sequel. After a few requests to the admin interface it stopped responding with timeout errors.

Then I decided to write some code to test my suspicions and run it in the console:

1	pool_size = ActiveRecord::Base.connection_pool.size
2	(pool_size + 1).times{ Thread.start{AR::Field.count}.join }

This yielded an timeout error in the last run. This didn’t happen with my Sequel models:

1	pool_size = Sequel::Model.db.pool.size
2	(pool_size + 1).times.map{ Thread.start{Field.count} }.each &:join

Notice that I don’t even need the join call inside the block for it to work since the count call is so much faster than the timeout settings.

The curious thing is that I didn’t get any timeout errors when using activeadmin with a regular Rails application, so I investigated what was so special on it that I could access the admin interface as many time I wanted and it wouldn’t ever timeout.

I knew the main difference between my application and a regular Rails application is that I only required active_record, while Rails will require active_record/railtie. So I decided to take a look at its content and found this:

1	config.app_middleware.insert_after "::ActionDispatch::Callbacks",
2	"ActiveRecord::ConnectionAdapters::ConnectionManagement"

So I found that AR was tricking here delegating the pool management to the web layer by always clearing active connections from the pool after the request was processed in that middle-ware:

1	ActiveRecord::Base.clear_active_connections! unless testing

Despite the name clear_active_connections! it seems to actually only close and checkin back to the pool the single current connection, whose id is stored in a thread local variable, from my understanding after taking a glance over AR pool management source code. That means that if the request main thread spawns a new thread any connection checked out in the new thread won’t be automatically collected by Rails and your application would start to throw timeout exceptions when waiting for a connection to be available in the pool, for no obvious reason, unless you understand how the connection pool works in AR and how it’s integrated in Rails. Here’s an example:

1	class MainController
2	def index
3	Thread.start{ Post.count }
4	head :ok
5	end
6	end

Try running this controller using a single server process 6 times (assuming the pool size is the default of 5 connections). This should fail:

1	ab -n 6 -c 1 http://localhost:3000/main/index

That means the user is responsible for closing the connection, checking it in back to the pool before the thread is terminated. This wouldn’t be a concern if Post was a Sequel model.

Then I recalled this article from Aaron Patterson.

Update note: it seems this specific case will be fixed in ActiveRecord 4.2 due to the automatic connection check-in upon dead threads strategy implemented in pull request #14360.

Ability to join the same table multiple times with different aliases

The main reason I left AR for Sequel was the need for joining the same table multiple times with different aliases for each joined table. Take a look at this snippet from this sample project:

1	module Sq
2	class Template < Sequel::Model
3	one_to_many :fields
4
5	def mapped_template_ids
6	FieldMapping.as(:m).
7	join(Field.named(:f), id: :field_id, template_id: id).
8	join(Field.named(:mf), id: :m__mapped_field_id).
9	distinct.select_map(:mf__template_id)
10	end
11	end
12	end

I still don’t know how to write such query using AR. If you do, please comment on how to do so without resorting to plain SQL or Arel, which is considered an internal implementation detail of AR for which the API could change anytime even for a patch release.

as and named are not part of Sequel::Model, but implemented as a plug-in. See next section.

Built-in plugin support for models

Although it’s not a strong reason to move to Sequel, since it’s easily implemented with regular Ruby modules in AR, it’s nice to have such a built-in API for extending models:

1	module Sequel::Plugins::AliasSupport
2	module ClassMethods
3	def as(alias_name)
4	from named alias_name
5	end
6
7	def named(alias_name)
8	Sequel.as table_name, alias_name
9	end
10	end
11	end
12	Sequel::Model.plugin :alias_support

Support for composite primary keys

Sequel does support composite primary keys, which are specially useful for join tables, while ActiveRecord requires a unique column as the primary key.

No need to monkey patch it

It seems lots of people don’t find AR’s API good enough because they keep monkey patching it all the time. I really try very hard to avoid any dependency on a library that relies on monkey patching something, specially AR, since it’s always changing its internal implementation.

So, with all major and minor Rails release we often find gems that stopped working due to such internal changes. For example, activeadmin stopped working with Rails 4.1.0.beta1 release even if the public AR public API remained the same.

It takes so much time to work on code that relies on monkey patching AR, that Ernie Miller, after several years trying to provide improvements over AR gave up.

Not surprisingly, one of the gems he used to maintain, polyamorous, was the reason why activeadmin stopped working with latest Rails release.

I never felt the need for monkey patching Sequel’s classes.

Documentation

Sequel’s documentation is awesome! That was the first thing I noticed when I moved from AR to Sequel. Arel is considered internal implementation detail and AR users are not supposed to rely on Arel’s API, which makes AR’s API much more limited besides being badly documented.

Support

Sequel’s mailing list has awesome support from Jeremy Evans, the gem maintainer. As for AR, there’s no dedicated list for it and one has to subscribe to a Rails related list to discuss AR stuff.

Separation of concerns

I like to keep the concerns separately and I can’t think about why an ORM solution should be attached to a web framework implementation. If Rails has great features in a new release with regards to action handling, I shouldn’t be forced to upgrade the ORM library at the same time I upgrade Rails.

Also, if a security fix affects AR only, why should a new Rails version be released?

Often AR will introduce incompatibilities in new versions, while I haven’t seen this happening with Sequel yet for the features I use. Also, I’m free to upgrade either Rails or Sequel any time.

Of course, this doesn’t apply to ORM solutions only, but it’s also valid for mailing handling but this is another topic, so I’ll focus on Sequel vs AR comparison only.

Sequel can also be useful without models

Sometimes it doesn’t make sense to create a model for each table. Sequel’s database object allows you to easily access any table directly while still supporting all dataset methods like you’d do with Sequel models:

1	DB = Sequel::Model.db # or Sequel.connect 'postgres://localhost/my_database'
2	mapped_template_ids = DB[:field_mappings___m]
3	join(:fields___f, id: :m__field_id, template_id: 1).
4	join(:fields___mf, id: :m__mapped_field_id).
5	where(f__deleted: false, mf__deleted: false).
6	distinct.select_map(:mf__template_id)

Philosophy

AR’s philosophy is to delegate constraints to the application model’s layer, while Sequel prefers to implement all constraints in the database level, when possible/viable. I’ve always agreed that we should enforce all constraints in the database level. But this isn’t common among most AR users. AR migrations doesn’t make it easier to create a foreign key properly using its DSL, for example and treat them as second-class citizen, as opposed to Sequel’s philosophy.

The only RDBMS database solution I currently use is PostgreSQL and I really want to use several features that are only supported by PostgreSQL. Sequel’s PG adapter allows me to use those features if I want to, even knowing that it won’t work for other database vendors.

This includes recursive transactions through save-points, options to drop temp table on commit and so on.

Another example: AR 4.1.0.beta1 introduced support for enums, in a database independent way.

I’d much prefer to use PostgreSQL’s enum type for things like that, which comes with database-side built-in validations/constraints.

Also, although you can manage association cascades in the application-side using this plugin with Sequel, usually you’d be advised to perform such cascade operations in the database level when creating the foreign keys, for instance. Also, when a database trigger better takes care of an after/before hook than an application’s code, you should not be afraid of getting advantage of those.

Faster testing when using factories

With PostgreSQL feature of using save-points in transactions, I can set-up RSpec to allow transactional before/after(:all) blocks in addition to the before/after(:each) ones.

This allows me to save quite some time when I can create several database records in a context which will then be shared among several examples, instead of recreating them every-time.

RSpec’s support for this is not good (like having a let global variant over the context) but it’s not hard to get this set-up working in a good enough way, speeding up my test suite a lot.

And it’s pretty easy to use Sequel’s core support for nested transactions so that I can be sure that the database state will be always consistent before each example is run.

Migrations

I strongly believe a database’s schema change should be handled by a separate project, instead of inside an application using the database. More applications may use the same database at some point and it makes sense that managing your database should be handled by a separate application.

I still don’t have a favorite migrations solutions as each of them have their pros and drawbacks. I’m still using AR’s migration for historical reasons, as I used the standalone_migrations gem in a separate project even when my application was written only in Grails and the Rails app didn’t exist yet. Since standalone_migrations only supports AR 3.x branch, and I was interested in some features from AR 4, I created another gem, called active_record_migrations to be able to use AR 4 migrations support in stand-alone mode.

DSL

I much prefer the Sequel’s DSL for writing the migrations as it supports more things in an easier way than AR’S migrations. Also, I’m allowed to use any dataset methods from an migration, instead of having to write everything not supported by the DSL as plain SQL queries.

On the other side, AR, since version 4, allows us to have an reversible block inside a change method which can be quite useful.

Tooling

AR provides a good migration generator, which lacks on Sequel and can be very helpful when creating new migrations.

Performance

I didn’t create any specific performance tests to compare both ORM solutions but I do remember that my specs run much faster when I migrated from AR to Sequel and I’ve also heard from other people that Sequel is faster for most use cases, in MRI at least.

Query DSL

I really like to have control over the generated SQL and a good ORM solution for me is one that will allow me to have better control over it. That’s why I don’t like the Hibernate’s HQL language.

The database should be your friend and if it supports some functions or syntax that would help you why not use them?

Sequel allows me to use nearly all features available through its DSL from my database vendor of choice: PostgreSQL. It also provides me easy access and documentation to use all kind of stuff I can do with plain SQL like “ilike” expressions, sub-queries, nested transactions, import data from file, recursive queries, Common Table Expressions (WITH queries) and so on.

Why not using straight SQL instead of some ORM when cross-database vendors is not an issue?

First, I’d like to say that most of Sequel DSL actually supports multiple database vendors.

But I only find that useful if you’re writing some kind of plug-in or library that should not depend on a single database vendor. But that’s not the case for general use applications.

Once you opt for some database vendor in your application, you shouldn’t have to worry about supporting other database vendors.

So, someone might ask why using any ORM solution if you’re fine with writing plain SQL?

There are many reasons for that. First, most plug-ins expect some Ruby interface to deal with, instead of SQL. This is the case with FactoryGirl, Devise and so on. But this is not the main reason.

An ORM provides lots of goodies, like an easy-to-use API to create and update records, automatic typecasting, creating transactions and much more. But even this is not the main reason for me to prefer an ORM over plain SQL.

The main reason for me is the ability to easily compose a query in some way that is easy to read and maintain, specially when parts of the query depend on the user requesting it or some controller’s param. It’s great that you can change some query on the fly, like this:

1	fields_dataset = Field.where(template_id: params[:id])
2	fields_dataset = fields_dataset.exclude(invisible: true) unless current_user.admin?
3	# ...

Sequel’s drawbacks

When a generic query is performed, Sequel will convert any returned rows as hashes with the column names as keys converted to symbols. This may be a problem if you generate the queries dynamically and alias them based on some table’s id that depend on the user input. If you have enough ids being queried, Sequel may create lots of symbols that will never be garbage collected.

The lack of migration generators built-in for Sequel migrations makes the creation of new migrations a less than ideal task. You may create some custom rake task to aid with migration creations and it shouldn’t be complicated but having that support built into the Sequel core would certainly help.

The main drawback of Sequel is certainly lack of native support of other great gems like Devise, ActiveAdmin and Rails itself. Quite some useful Rails plug-ins will only integrate with ActiveRecord.

Overall feeling

Most of my server-side tasks involve querying data from an RDMBS database and serving JSON representations to the client-side API. So, an ORM solution is a key library for me.

And I couldn’t be happier with all goodness I get from Sequel, which gets out of my way when querying the database in contrast with ActiveRecord, when I used to spend a lot of time trying to figure out whether some kind of query was possible at all.

Thanks, Jeremy Evans, for maintaining such a great library and being so responsive in the mailing list! I really appreciate your efforts, documentation and Sequel itself.

Also, thank you for kindly reviewing this article, providing insightful improvements over it.

Finally, if you’re interested on getting started with Sequel in a Rails application, I’ve published another article on the subject on April, 2012.

Server-side or Client-side focus?

2013-12-11T13:23:00+00:00

David wrote a great article on the subject, suggesting that keeping all views generated in the server-side is the way to go for most applications.

If you haven’t read it yet, please do so, as this article was written to approach a few topics that I think are missing in that article.

Web applications can be written in many ways. In the early days JavaScript played a marginal role in web applications, performing some simple form validations and the like but currently more and more applications are making heavy use of JavaScript and lots of them are built as Single Page Applications (SPA).

I’ve been working on SPA’s since 2009, while David’s applications are mostly built around the original concept of web applications, by delegating as much as possible to the server-side.

Both approaches are valid ones, but as the complexity of dynamic browser behavior increases, I do believe that moving the UI to the client-side is usually better than generating them in the server.

So, if that’s the case for your application, please keep reading.

Server-side template and JavaScript generation

In David’s article, he refers to this approach as Server-generated JavaScript Responses (SJR), even if most of the response is actually HTML, not JavaScript, but I’ll keep his SJR terminology in this article to make it easier for me to refer to it.

Here are some spotted advantages:

Sharing templates between server-side and client-side

By rendering some HTML and JS in the server-side, one may reuse partial templates.

This is indeed a valid argument, but it doesn’t apply if:

All your views are generated in the client-side; (which he mentioned in the article)
You use some template language that can be shared both in the server-side and client-side instead of ERB.

Less computational power needed on the client

On the other hand devices are becoming faster and faster and it’s cheaper to move the processing to the client-side when scaling your infrastructure. In the long-term I don’t think it worths insisting on server-side processing because of this reason.

Some time ago I was trying to reduce the load time for our application and I implemented server-side generation of a big table in Rails and cached it in the server. It was obvious to me that it would save me almost 200ms (the time it was taking me to render that same template in the client-side - it should save even more for slower browsers/computers/devices).

The previous approach was to embed the JSON (also cached in the server-side) in the HTML (to avoid another request/latency) and render the template in the client-side. I was surprised that the page actually took a bit longer to load after the change. Sorry, I can’t explain the reason, but I gave up on the idea. And yes, I have always been served gzipped content (both HTML, assets and JSON).

It’s supposed to be faster

But it will actually depend on lots of things. For example, if some of the templates don’t need server-side data to be rendered, the network latency will be enough to make the server-side approach for rendering the template slower.

If you’re appending a template that never changes you could cache it in the client-side and avoid the round trip to the server every time you need the template, but if it depends on other data that you already have available in the client-side, the server-side template rendering approach won’t be faster.

Also, if you have to deal with several data state that is not stored in the database, trying to keep (or pass) all that state to the server-side will lead to insane maintenance.

Views can be cached

But this is valid for JSON as well, which David didn’t mention in his article, so it can’t be viewed as a benefit of SJR over client-side rendering.

Also, one can always embed the templates in the generated (minified) application assets and they will be cached naturally by your browsers for all later requests. So, if the template generates lots of HTML, it will be certainly much faster to transfer only the data instead of the full template even when serving the template using gzipped content. The reason why David claims it doesn’t make much difference is probably because his generated HTML are small enough.

Easy-to-follow execution flow

This is really a matter of taste and I find it much easier to debug the template generation in Chrome Developer’s Tool and to follow the flow in the client-side code, so I won’t comment on this.

Also, it seems that the author suggests that a major benefit of this “simplified” flow, is that you don’t have to worry about testing it because it just uses a standard mechanism that’s already tested as part of the framework and that couldn’t go wrong. Which leads me to:

Faster initial rendering

This is indeed a very valid point. Once you serve your HTML the browser will already display it before running all JS code which improves the page load time perception from the application user point of view.

On the other hand, if the user is too fast on clicking on some element with attached behavior (although not yet in the very beginning of the page load), the user experience may not be very good and the user might perceive that lack of behavior as an application bug.

Also, with Rails, if the server-side page takes quite some time to be rendered, by default the browser won’t show anything until the render action finishes because Rails doesn’t render streamed responses by default.

On the other hand, if you send a minimal page, you’re able to inform the user that the page is loading very quickly, while you wait for a JSON response with the actual data, for instance.

It means that depending on how you design your site, the user might have a better experience with the asynchronous approach, but that indeed is not trivial to implement in a good way.

In our application we use the jQuery-layout plugin to render our panes otherwise the application would look badly so it doesn’t help if we start rendering some HTML soon…

So this is very application-specific.

The main reason I believe client-side template rendering is more interesting: Testing

When building SPA’s, lots of your code remain in the client-side, and tools you use to test server-side code, like RSpec and the like, are no longer well suited for testing browser behavior.

Trying to test all your client-side logic as Capybara tests are simply too slow to be a valid approach.

On the other side, testing in the browser is super fast. Much faster than testing the server-side code usually. You may simply mock all your requests to the server-side and test both parts of your application quickly in isolation from each other.

I’ve written rails-sandbox-assets a while ago to allow you to serve all your Rails assets in an easy way and used it as a base for running lots of runners, like Jasmine, Buster.js, Mocha/Chai and my own oojspec. They can even live together.

This way, if all my views are generated in the client-side, it’s pretty easy to recreate my client-side application from the specs without using any HTML fixtures, by simply requiring my client-side code using the Rails Asset Pipeline in my specs and running them after mocking jQuery.ajax to return data the way Rails would do.

Before David wrote that article, we discussed about this topic by e-mail and in my last e-mail I suggested him to talk about how they test their code using this approach. Since he didn’t follow my suggestion (although he followed other suggestions, like coming with a new and less confusing name than RJS) I’m assuming he doesn’t actually have a good response yet to how to test this and I’m assuming they don’t have enough client-side code to worry about this.

But if someone is considering his suggestion of using the SJR approach and has lots of behavior in the client-side, please take some time to think on how you’re gonna handle testing using that route.

Conclusion

By no means the intent of this article is to tell you that you shouldn’t write your application using SJR. It can be indeed a valid approach depending on how you’re designing your application.

The specific design I would recommend against SJR, is for SPA’s, as I’m writing solely those kind of application since 2009 and I can’t think of SJR being used with great benefit in such applications.

Running Java from MRI Ruby through DRb

2014-01-16T15:00:00+00:00

Important update: After I wrote this article I tried to put it to work in my real application and noticed that it can’t really work the way I described due to issues with objects referenced only in the DRb client side being garbage collected in the DRb server side since no references are kept for them in the server-side. I’m keeping this article anyway to explain the idea in the hope we could find a way to work around the memory management issue at some point.

Motivation

In a Ruby application I maintain, we have the requirement of exporting some statistics to XLS (not XLSX) and we had to modify a XLS template for doing that.

After searching the web I couldn’t find a Ruby library that would do the job, but I knew I could count on the Apache POI java library.

MRI Ruby doesn’t have native support for using Java libraries so we have to either use JRuby or some Inter-Process Communication (IPC) approach (I consider hosting a service over HTTP as another form of IPC).

I’ve already used JRuby for serving my web application in the past and we had some good result, but our application is currently running fine on MRI Ruby 2. I don’t want to use JRuby for deployment only to enable me to use Java libraries. Sometimes we’ll re-run some stress tests to test the throughput of our application using several deployment strategies, including using JRuby instead of MRI, in threaded mode (vs the multi-process and multi-threaded approaches with MRI), testing several web servers for each Ruby implementation.

Last time we run our stress tests, Unicorn was a bit faster to serve our pages when compared to using JRuby on Puma, but that wasn’t the main reason why we chose Unicorn. We had some issues with some connections to PostgreSQL with JRuby by that time and we didn’t want to investigate it further, specially when we didn’t notice any advantages in the JRuby deployment for that time.

Things may have changed today but we don’t plan to run another battery of stress tests in the short-run… I just wanted to find another way of having access to Java libraries that wouldn’t attach our application to JRuby in any way. Even when we used to deploy with JRuby, all our code ran in MRI and we used MRI to actually run the tests and also in development mode since it’s much faster to boot and allow faster testing through some forking techniques (spork, zeus, etc).

I didn’t want to add much overhead either, by providing some HTTP service. The overhead is not only in the payload but also in the development work-flow.

What I really wanted was just a bridge that would allow me to run Java code from MRI Ruby, since I’m more comfortable with writing code with Ruby and my tests run faster on MRI rather than JRuby.

So, the obvious choice (at least for me), was to try DRb.

DRb to the rescue

Even after deciding for DRb, you may implement the service with multiple approaches. The simplest one is probably to write the service in JRuby and only access the higher-level interface from the MRI application.

That works but I wanted to avoid this approach for some reasons:

tests would run slower when compared to MRI due to increased boot time for the JVM (main reason)
we’d need to switch applications every time we wanted to work on the Java-related code (we don’t use an IDE, but still, in Vim, that means ‘:lcd ../jruby-app’)
Rails already provides us automatic code reloading out-of-the box for our main application, while we’d have to be constantly rebooting the JRuby application after each change or implement some auto-reloading code ourselves

So, I wanted to test another minimal approach that would only allow us to perform any generic JRuby programming directly from MRI.

Dependencies management, Maven and jbundler

Note: for this section, I’m assuming JRuby is being used. With RVM that means “rvm jruby”.

Christian Meier did a great job with jbundler, a tool similar to Bundler, that will use a Jarfile instead of the Gemfile to specify the Maven dependencies.

So, basically, I created a new Gemfile with bundle init and added a gem ‘jbundler’ entry to it.

Then I created a Jarfile with this content: jar ‘org.apache.poi:poi’. Run bundle exec jbundle and you’re ready to go. Running jbundle console will provide an IRB session with the Maven libraries available.

To create a script, you add a require ‘jbundler’ statement and you can now run it with bundle exec ruby script-name.rb.

The DRb server

So, this is how the JRuby server process looks like:

1	# java_bridge_service.rb:
2
3	POI_SERVICE_URL = "druby://localhost:8787"
4
5	require 'jbundler'
6	require 'drb/drb'
7	require 'ostruct'
8
9	class JavaBridgeService
10	def run(code, _binding = nil)
11	_binding = OpenStruct.new(_binding).instance_eval {binding} if _binding.is_a? Hash
12	result = if _binding
13	eval code, _binding
14	else
15	eval code
16	end
17	result.extend DRb::DRbUndumped if result.respond_to? :java_class # like byte[]
18	result
19	end
20
21	end
22
23	puts "listening to #{POI_SERVICE_URL}"
24	service = DRb.start_service POI_SERVICE_URL, JavaBridgeService.new
25
26	Signal.trap('SIGINT'){ service.stop_service }
27
28	DRb.thread.join

Security note

This is all you need to run arbitrary Ruby code from MRI. Since this makes use of eval, I’d strongly recommend you use this server in a sandbox environment.

The client code

I won’t show the full classes we have for communicating with the server since they are implementation details and people will want to organize it in different ways. Instead I’ll provide some scripting code that you may want to run in an IRB session to test the set-up:

1
2	require 'drb/drb'
3
4	DRb.start_service
5
6	service = DRbObject.new_with_uri 'druby://localhost:8787'
7
8	[
9	'java.io.FileInputStream',
10	'java.io.FileOutputStream',
11	'java.io.ByteArrayOutputStream',
12	'org.apache.poi.hssf.usermodel.HSSFWorkbook',
13	].each{\|java_class\| service.run "import #{java_class}"}
14
15	workbook = service.run 'HSSFWorkbook.new FileInputStream.new(filename)',
16	filename: File.absolute_path('template.xls')
17
18	sheet = workbook.sheet_at 0
19	row = sheet.create_row 0
20	# row.create_cell(0) will display a warning in the server-side since JRuby can't know if you want to use the
21	# short or int method signature
22	cell = service.run 'row.java_send :createCell, [Java::int], col', row: row, col: 0
23	cell.cell_value = 'test'
24
25	# export it to binary data
26	result = service.run 'ByteArrayOutputStream.new'
27	workbook.write result
28
29	# ruby_data is what you would be passing to send_data in controllers:
30	ruby_data = service.run('ByteArrayInputStream.new baos.to_byte_array', baos: result).to_io
31
32	# or, if you want to export it to some file:
33	os = service.run 'FileOutputStream.new filename', filename: File.absolute_path('output.xls')
34	workbook.write os
35

Conclusion

By using such a generic Java bridge, we’re able to use several good Java libraries directly from MRI code.

Troubleshooting

If you’re having any issues with trying that code (I haven’t actually tested the code in this article), please leave a note in the comments and I’ll fix the article. Also, if you have any questions, create a comment and I’ll try to help you.

Or just feel free to thank me if this helped you ;)

Rails: the Good and the Bad

2013-02-17T00:15:00+00:00

A while ago I wrote an article explaining why I don’t like Grails. By that time I was doing Grails development daily for almost 2 years. Some statements there are no longer true and Grails has really improved a lot since 2.0.0. I still don’t like Grails for many more reasons I didn’t find time (or interest) on writing about.

Since almost 2 years ago I was back to Rails programming and the application I currently maintain is a mix of Grails, Rails and Java Spring working together. I feel it is now time to reflect about what I like and what I don’t in Rails.

What kind of web application I’m talking about?

I’ve been working solely on single-page-applications since 2009. All opinions reflected here apply to such kind of application, although some of them will apply to any web application. This is also what I consider the current tendency for web applications, like Twitter, Facebook, Google+, GMail and most applications I’ve seen out there.

When designing such applications one doesn’t use make heavy use of server-side views (ERB, GSP, JSP, you name) but usually render your views in the client-side, although some will prefer to render partial content generated in the server. In the applications I’ve written in those 4 years in different companies and products I’ve been mostly rendering the views in the client-side so also keep that in mind when reading my review.

Basically I only render a single page in the server-side and have plenty of JavaScript (or CoffeeScript) files that are referenced by this page, usually concatenated in a few JavaScript files for production usage.

How does Rails help me on getting my job done?

The Asset Pipeline

I’d say the feature I most like in Rails is undoubtedly the Rails Asset Pipeline. It is an assets processor that uses sprockets and some conventions to help us to declare our assets dependencies and split them in several files and mix different related languages, that will basically compile to JavaScript and CSS. Examples of languages supported out of the box are CoffeeScript and SCSS, that are better versions (in my opinion of course) than JavaScript and CSS.

This tools take out most of the pain I have with JavaScript. The main reason I hate JavaScript is the lack of an import (or require) statement to make it easier to write modular code. This is changing in ES6 but it will take a while before all target browsers support such statement. With the Asset Pipeline I don’t have to worry about it because I may use such “require” statements in comments that are processed by the Asset Pipeline without having to resort to bad techniques like AMD (my opinion, of course).

The Asset Pipeline is also well integrated with the routing system.

Automatic code reloading during development

Booting a Rails application may take a few seconds, so you can’t just load the entire application on each request as you used to do in the CGI era. It would slow down the development a lot. Being able to automatically reload your code so that you have a faster development experience is a great tool provided by Rails. It is far from simple to implement it properly and people often overlook this feature because it always worked great for most people. Creating an automatic-reloading framework for other languages can be even harder. Try to take a look at what some Java reloading frameworks are doing if you don’t believe.

Control over routes

This is supported by most frameworks nowadays but I always wanted this feature when I used to create web sites in Perl long ago. But not all frameworks will make it easy for you to get a “site map” and see all your application routes at once.

Dependency Management

Rails is the main reason why the genius Yehuda Katz decided to create Bundler, the best software dependency management software I know about. Bundler is independent from Rails but I’d say Rails has the credits for inspiring Yehuda to create Bundler but I may be wrong, of course. Ruby had RubyGems for a long while but it suffered from the same problems as Maven.

Without a tool like Bundler you have two options. Always specify the exact version of the libraries you depend on (like Maven users often do) or be prepared to face several issues that may arise from different gem versions that are resolved in different times cause by loose version requirements as it used to be the case with RubyGems users.

Bundler stores a snapshot of the current resolved gems in a file called Gemfile.lock so that it is possible to replicate the entire gem versions under production or other developer’s computer without having to specify exact version matches in your dependency file (Gemfile).

Great testing tools availability

I don’t write integration tests in Grails because it is too slow to boot up the entire framework when I only want to test my domain classes (models in Rails terminology). Writing integration tests in Rails is certainly slower than writing unit tests but it is feasible to write them because Rails boots in a few seconds in the application I maintain. So it is okay to write some integration tests in Rails. I used to use Capybara to write tests for views/controllers interaction but I ended up giving up on this approach preferring to write JavaScript specs to test my front-end code in a much faster way and simply mock jQuery.ajax using my own testing frameworks, oojspec and oojs.

For simple integration tests that only touch the database I don’t need to even load the entire Rails application, which is much faster. I find this flexibility really awesome and makes test writing a much pleasant task.

Other tools that help writing tests in Rails apps are RSpec and FactoryGirl among many others. Most of them can be used outside of Rails scope, but when comparing Rails to non-Ruby web frameworks, it is great to point out how writing web applications with Rails will make automatic testing an easier task than with other languages.

The Rails guides and community

The Rails guides are really fantastic and cover most of the common tasks you need when programming a web applications with Rails. Also, anyone is free to commit any changes to the guides through the public repository docrails and that seems to work great. I’ve even suggested this approach to the Grails core developers a while ago and it also seems this is working great for them as well as their documentation improved a lot since then.

Besides the guides there is plenty of resources about Rails on-line. Many of them are free. There are books (both print and e-books, paid or free), tutorials and several articles covering many topics of web programming in the context of a Rails application. There are even books focused on testing applications, like The RSpec Book, by David Chelimsky. I haven’t found any books focused on testing for Grails or Groovy applications for instance. And I only know about one book focused on JavaScript testing, by Christian Johansen, the author of Buster.js, Sinon.js and one of the maintainers of the Gitorious project.

Rails has a solid community behind it. There are several Rails committers applying many patches everyday and the framework seems to be stronger than ever. You’ll find many useful gems for most tasks you’d think of. They’re usually well integrated to Rails and you may have a hard time if you decide to use another Ruby web framework.

Most of the gems are hosted on GitHub, which is part of the Rails culture I’d say. That helps a lot to contribute back to those gems by adding new features or fixing bugs. And although pull requests are usually merged pretty fast, you don’t even have to wait for it to be merged. You can just instruct Bundler to get that gem from your own fork on GitHub and that is amazing (I wasn’t kidding when I said Bundler is the best software management tool I’m aware of).

Security

Despite all critical security holes found on Rails and other Ruby libraries/gems that popped out recently, Rails takes security very seriously. Once security issues are found they’re promptly fixed and publicly communicated so that users can upgrade their Rails applications. I’m not used to see this attitude in most other frameworks/ libraries I’ve worked with.

Rails also employs some security enhancements to web applications out-of-the-box by default, like CSRF protection and provides a really great security guide that everyone should read, even non-Rails developers.

How Rails gets on my way?

Even though Rails is currently my favorite web framework, it is not perfect. As a matter of fact there are actually many things I don’t like in Rails and this is what this section is all about and also the main motivation for writing this article. The same can be told about Ruby, which is my preferred language, but also has its drawbacks. Not exactly Ruby the language, but the MRI implementation. I’ll get in details in the proper section.

Monolithic design

Rails is not only a web framework and this is really bad from my point of view.

Rails release strategy is to keep the version of all its major components the same one. So, when Rails 3.2.12 is released it will also release ActiveRecord 3.2.12, ActiveSupport 3.2.12, ActionPack 3.2.12, etc. Even if it is a single security fix on ActiveRecord all components will have their version increased. This will also force you to upgrade your ORM if you decide to upgrade your web framework.

ActiveSupport should be maintained in a separate repository for instance as it is completely independent from Rails. The same should be true for ActiveRecord.

The ActiveRecord case

The ORM is a critical part of a web application built on top of a RDBMS. It doesn’t make any sense to me to assume it is part of a web framework. It is not. Its concerns are totally orthogonal (or at least they should be). So, what happens if you want to upgrade your web framework to make use of a new feature like streaming support? What if the newest ActiveRecord bundled with the latest Rails release has incompatible changes in its API? Why should you be forced to upgrade ActiveRecord when you’re only interested in upgrading Rails, the web framework?

Or, what if you love ActiveRecord but are not developing web applications or you’re using another web framework? Why would you have to contribute to Rails repository when you want to contribute to ActiveRecord? Or why don’t you have a separate discussion list for ActiveRecord? A separate site and API documentation?

I solved this problem myself a while ago by replacing ActiveRecord by Sequel and disabling AR completely in my application. Luckily enough I find Sequel has a much better API and solid understanding about how RDBMS are supposed to be used and knows how to take advantage of their features, like transactions, triggers and many others. Sequel will actually advise you to prefer triggers over before/after/around callbacks in your code for many tasks. This is in line with my own feelings about how RDBMS should be used.

Also, for a long while ActiveRecord didn’t support lazy interfaces. Since I’ve stumbled over Sequel several years ago I really loved its API and always used it instead of AR for some of my Ruby scripts, that weren’t related to Rails apps. But for my Rails applications I always tried to avoid adding more dependencies because most gems will just assume you’re using ActiveRecord.

But I couldn’t be more wrong. Since I decided to move over to Sequel I never regretted my decision. It is probably one of the best decisions I’ve made in the last few years. I’m pretty happy with Sequel and its mailing list support. The documentation is great and I have great control over the generated queries, which is very important to me as I often need complex queries in my applications. ActiveRecord is simply too way limited.

And even if Arel could help me to write such queries it is badly documented and is considered a private interface, which means I shouldn’t be relying on its API when using ActiveRecord because theorically AR could change its internal implementation anytime. And the public API provided by AR is simply too poor for the kind of usage I need.

Migrating to Sequel brought other benefits as well. Now the ORM and the web framework can be independently upgraded. For instance, recently there was a security issue found in ActiveRecord which triggered a whole Rails release which I didn’t have to upgrade because it didn’t affect Sequel.

Also, I requested a feature in Sequel a while ago and it got implemented and merged in master a day or two after my request. I tested it on my application by just instructing Bundler to use the version on master. Then I found a concurrency issue with the new feature that affected our deployment on JRuby. In the same day I reported the issue it got fixed on master and I could promptly use it without having to change any other bit of my application.

Jeremy Evans is also very kind when replying to questions in Sequel’s mailing list and will provide great insightful advices once you explain what you’re trying to achieve in your application. He is also very knowledgeable with regards to relational databases. Sequel is really carefully thought and cares a lot about databases, concurrency and many more details. I couldn’t recommend it better to anyone that cares about RDBMS.

Lack of a solid database understanding from the main designer

When I first read about Rails, in 2007, my only previous experience with databases was with Firebird when people used to use Delphi a lot in Brazil. I really loved Firebird but I knew I would have to find something else because Firebird wasn’t often used in web applications and I wanted to use something that was well supported by the community. I also wanted a free database so the options were basically either MySQL or PostgreSQL. I wasn’t really much interested on what database to use since I believed all RDBMS would be essentially the same and I haven’t experienced any issues with Firebird. “It all boils down to SQL” I used to think. So I’ve just made a small research in the web and I found lots of people complaining about MySQL and no one complaining about PostgreSQL. I wasn’t really interested in knowing what people were talking about MySQL and simply decided to go with PostgreSQL at the time since I had to choose one.

A few years later I moved to another company that also happened to use PostgreSQL. Then I used it for 2 more years (4 in total). When I moved my job again, this time the application used a MySQL database. “No problems” I thought as I still believe it all boils down to SQL in the end. Man, I was completely wrong!

After a few days working with MySQL, I noticed too many bugs and bad design decisions that I decided after an year to finally migrate the database to PostgreSQL.

But with so many good conventions that you get when you decide to use Rails, the documentation initially used to use MySQL in the examples. Since lots of people really didn’t have a strong opinion about which database vendor to choose from. That lead the community that was being formed to adopt MySQL in mass initially.

Fortunately it seems the community understands now that PostgreSQL is a much better database but I’d still prefer Rails to recommend towards PostgreSQL in the Getting Started guides.

An example of how bad Rails opinions are over RDBMS is that ActiveRecord doesn’t even support foreign keys, one of the key concepts in RDBMS, in their migrations DSL. That means that the portable Ruby format of the current database schema is not able to restore foreign keys. Hibernate, the de-facto ORM solution for Java-based applications, does support foreign keys. It will even create the foreign keys for you if you declare a belongs-to relationship in your domain classes (models) and ask Hibernate to generate the migration SQL.

If your application needs to support multiple database vendors, I’d recommend you to forget about schema.rb and simply run all migrations whenever you want to create a new database (like a test db, for instance). If you only have to care about a single DB vendor, like me, then just change the AR schema_format to use :sql instead of :ruby. If you don’t care about foreign keys, you’re just plain wrong.

I believe David Heinemeier Hansson is really a smart guy despite what some people might say. I just think he hasn’t focused much on databases before creating Rails or he wouldn’t use MySQL. But there are many other right decisions behind Rails and I find it really impressive the boom DHH has brought to web development frameworks. People often say he is arrogant between other adjectives. I don’t agree. He has a strong opinion about many subjects. So have I and many others. This shouldn’t be seen as impoliteness or arrogance.

Some arrogant core members

People have similar opinion about Linus Torvalds when he is right to the point in his phrases and opinions. He also has strong opinions and a sense of humor that many don’t understand. I just feel people get often easily offended for no good reason these days, which is unfortunate. I have to be extra careful when writing to some lists in the Internet that seems to be even more affected than the usual ones. I have received often really aggressive responses in a few mailing lists for stating my opinions in direct ways that people often consider a rude behavior when I call it a honest and direct opinion. I’m trying to avoid those opinions in some list so that people don’t get mad with me.

I really don’t know those people and I don’t have anything against them. Believe me or not, I’m a good person and have tons of friends and I meet with them very often and they don’t get offended when I’m direct to the point or when I state my strong opinions even when they don’t agree with me. With my closest friends (and even some not that close) I would refer this as the expression “after all, I’m not a girl” in a tone of joke but I can’t tell such things in the Internet or people will criticize me to dead. “You sexist! What do you have against girls?” Nothing at all, it is just an expression often used with humor in my city at least… I love my wife and my daughter is about to born and I’m pretty excited about that. I just think people take some phrases or expressions too seriously.

If you ever have the chance to talk to my friends they will tell you I’m not the kind of guy seeking conflicts but they will tell you that I have lots of strong opinions and that I’m pretty honest and direct about them. They just don’t find it rude but healthy. And I expect the same from them.

It is just sad when I find some angry response from Rails core members in the mailing list for no good reason. If I call some Rails behavior stupid that take it on personal and will threaten stopping helping me because they take my opinion as a personal attack as if I was calling them stupid people. I don’t personally know any of them. How could I find any of them stupid? They are probably much smarter than me but that doesn’t mean I can’t have my own opinions about some decisions behind Rails and find some of them stupid, which doesn’t mean others can disagree with me and think that my way of thinking is stupid. I won’t take it as a personal attack. I swear.

On the other way, I find some of their attitudes really bad. For instance, if you ask for change some behavior in Rails or any of its components some will reply: “send a pull request and we can discuss it. Otherwise we won’t take time to just discuss the ideas with words. Show us code”. I don’t usually see this behavior in most other communities I’ve participated. That basically means: “we don’t care that you spend your valuable time in a code that wouldn’t ever be merged to our project because we don’t agree with the base ideas”. There are many things that can be discussed without code. Asking someone to invest their time writing some code that will be later rejected when it could be rejected before is quite offending in my point of view.

By the way, that is the reason I don’t spend much time in complex patches to Rails. I’ve done that once long ago and I didn’t get feedback from core developers after a while even after spending a considerate amount of time in the patch and adapting many requested changes to it even though I didn’t agree with the changes. So I’d say that my user experience for many libraries is just great but that is not usually the case with the Rails core mailing list. Some of those core developers really believe they’re God gifts to the world which makes it hard to argument with them in several aspects. And if you state your strong opinion about some subject you may be seen as rude and they won’t want to talk to you anymore…

Of course different people will have different experiences but I believe Rails is not the friendlier web framework in my particular case. The Ruby-core list is a totally different beast and I can’t remember any bad experience I had when talking to Matz, Kosaki, Shugo, Nobu and many others. I also had a great experience in the JRuby mailing list, with Charles Nutter and many others. I’ve also talked about the great experience with Jeremy Evans in the Sequel mailing list. I just don’t understand why the Rails core team doesn’t seem to tolerate me. I don’t have any personal issues with any of them. But I don’t usually have a great experience there either so I avoid writing to that list sometimes.

Even after publishing my article with my strong (bad) opinions about Grails I don’t remember any bad experience when talking to them in their list. And I know they read my article as it became somewhat popular in the Grails community and I got even some replies from some of the Grails maintainers themselves.

The Rails API documentation

I remember that one of strong features of Rails 1 was the great API documentation. During the rewrite of Rails 3 lots of great documentation was deleted in the process and either got lost or was moved to the Rails guides.

Currently I just stop trying to find any documentation by looking at the API documentation site. I used to do that a lot in the Rails 1 era. So sad the current state is really bad to the point that I find it almost unusable preferring to find the answers to what I’m looking for on StackOverflow, asking on mailing lists, digging into the Rails source code or by other means. If I’m lucky, the information I’m looking for is documented in the guides, but otherwise I’ll have to spend some time searching for it.

YAML used instead of plain Ruby to store settings

Rails provides us 3 environments by default: development, production and test. But in all projects I’ve worked with I always had a staging environment as well. Currently our deployment strategy involves even more environments. Very soon we realized that it wasn’t easy to manage all those environments by having to tweak so many configuration files: config/database.yml, config/mongo.yml, config/environments/(development|test|production).rb and many other kept popping up. Also, when you run tasks like “rake assets:precompile” it will use the production environment by default while it would use development by default for most tasks.

Every time we needed to create a new environment it was too much work for us to manage. So we ended up by dropping all those YAML files and simple symlink config/settings.rb to config/settings/environment_name.rb. We also symlinked config/environments/*.rb to all point to the same file. We would also manage the different settings in config/settings.rb. So we have staging.rb, production.rb, test.rb, development.rb and a few others under config/settings. We simply symlink the one of interest in config/settings.rb, which is ignored by Git.

The only exception is that test.rb is always used when running tests. That worked out much better for us and it is much easier for us to create a new environment and have all settings, like Redis, Mongo, PostgresSQL, integration URLs and many more settings grouped in a single file symlinked as settings.rb. Pretty simple to figure out what needs to be changed as well as base our settings on top of another existing environment.

For instance, staging.rb would require production.rb and overwrite a few settings. This is a much improved way of handling multiple environments than the standard way most Rails applications implement, by maintaining sparse YAML files among some DSLs written in Ruby (like Devise and others).

I believe the Grails approach of allowing external overrides Groovy files to better configure the application in a per environment basis a better convention to follow than the one suggested by Rails. What is the advantage of YAML(.erb) files over plain Ruby configuration files?

Deployment / scalability

One of the main drawbacks of Rails in my opinion is that it waited too long to start thinking seriously about threaded deployment. Threads were often successfully used by many web frameworks in many languages but for some reason it has been neglected in the Ruby/Rails community.

I believe there are two major reasons for that. The Ruby community usually focus on MRI as the Ruby implementation of choice and MRI has a global interpreter lock that prevents multiple threads running Ruby code to be executed in parallel. So, unless your application is IO intensive you wouldn’t get much benefits from using a threaded approach. I blame MRI for this as they don’t really seem to be bothered by GIL. I mean, they would probably accept a patch to fix the issue but they’re not willing to tackle the issue themselves as they believe forking is just as good solution. And this leads to the next reason, but before that I’d just like to notice that JRuby always performed great in multi-thread environments and that I think Rails took too long before taking this approach more seriously and consider JRuby as a viable deployment environment for the threaded approach. Threads are in my opinion the proper way of handling concurrency in most cases and I really think that should be the default one as in most other web frameworks in other languages.

Now to the next reason why people usually prefer multi-process over multi-thread deployment in the Ruby community. I’ve asked once on the MRI mailing list what was the status of threads support in MRI. Some core committers told me that they wouldn’t invest time on getting rid of the GIL mainly because they feel forking was a better fit most of the times. It avoided some concurrency issues one might experience when using threads. They also argued that they didn’t want Ruby programmers to have to worry about thread-safety, locks, etc. I don’t really understand why people are so afraid of threads and why they think they’re so hard to use in a safe way. I’ve worked with threaded applications for many years and I didn’t have this bad experience several developers complain about.

I really miss proper threading support in MRI because a threaded deployment strategy allows much better memory usage under high load than the multi-process approach and it is much easier to scale. That is also the reason why I think it should be the default. It would avoid the situation where people have to worry about deployment strategies too early in the process. They think about load balancers, proxy, etc. when a single threaded instance would be enough for a long time before your application starts having throughput issues. But if you deploy a single process using a single-thread approach, you’ll very soon realize it doesn’t scale even to your few users. That’s why I believe Rails should promote threaded deployment by default since it is easier to start with.

But the MRI limitation makes this decision hard to make. Specially because the development experience is usually much better on MRI than it is on JRuby. Tests will start running much faster on MRI and some tools that will speed up it even more won’t work well on JRuby, like Spork and similar gems.

So, I can’t really recommend any solution to this deployment problem with Rails. Currently we’re using Unicorn (multi-process) + MRI to deploy our application but I really believe this isn’t the optimal solution to web deployment and I’d really love to see this situation improved in the next years.

Apart from the deployment issues I always missed streaming support in Rails but I haven’t created a section about it in this article because Rails master already seems to support it and Rails 4 will probably be released soon.

The MRI shortcomings

When it comes down to the MRI implementation itself, the lack of a good thread support isn’t the only thing that annoys me.

Symbols vs Strings confusion

I can’t really understand the motivation for symbols to exist in Ruby. They cause more harm than good. I’ve discussed my opinions already a lot here if you’re curious about it.

To make things worse, if the harm and confusion caused by symbols with no apparent benefits wasn’t a reason good enough to get rid of them, attackers are often trying to find new ways to create symbols in web applications. The reason for that is that symbols are not garbage collected. If you employ the threaded strategy when deploying your application and an attacker could get your application to create more symbols your application would crash at some point due to memory leak since symbols are never garbage collected, although it might change at some point.

Autoloading

Autoload is a Ruby feature that allows some files to be lazy loaded, thus improving the start-up time to boot Rails in development mode for instance. I’m curious to know if the lazy approach really makes such a big difference when comparing to just require/load all files. And if it does, couldn’t this load time be improved somehow?

The problem with autoload is that it can create bugs that are hard to track and I indeed have been bitten by a bug caused by autoload. Here is an example of how it can be triggered:

1	#./test.rb:
2	autoload :A, 'a'
3	require 'a/b'
4
5	#./lib/a.rb:
6	require 'a/b'
7
8	#./lib/a/b.rb:
9	module A
10	module B
11	end
12	end
13
14	#ruby -I lib test.rb

Design opinions

I really prefer code that makes its dependencies very explicit. Some languages, like Java and most static ones, will force this to happen. But that is not the case in Ruby.

Rails prefers to follow the Don’t-Repeat-Yourself principle instead of being always explicit about each file dependencies. That makes it impossible for a developer to use a small part of some Rails component because they are designed in such a way that you have to require the entire component and not just part of it even if that file is pretty independent from everything else.

Recently I wanted to use some code in ActionView::Helpers::NumberHelper in my own class ParseFormatUtils. Even though my unit tests worked fine when doing that, my application would fail due to circular dependencies issues caused by autoload and the way the Rails code is designed.

In my applications it is always very clear what each class is responsible for. Rails controllers will only be concerned about the web layer and most of the logic will be coded in a separate class or module and tested independently. That makes testing (both manual and automated) much easier and faster and also makes it easier for the project developers to understand and follow the code.

I’m really sad that Rails doesn’t share my point of view with regards to that and thinks DRY principle is more important than being explicit about all dependencies in each file.

Final notes

Even though there are several aspects of Rails I dislike I couldn’t actually suggest a better framework for a web developer. If I weren’t using Rails I’d probably be using some other Ruby web framework and create some kind of Asset Pipeline and automatic reload mechanism but I don’t really think it would worth the benefits.

All Rails issues are manageable in my opinion. I think other frameworks I’ve worked with are not manageable. The have some fundamental flaws that prevent me from actually considering them if the choice is mine to make.

I’ve reported some serious bugs to Grails JIRA almost an year ago for instance with test cases included and they haven’t been fixed yet. This is something to be really worried about. All Rails issues are easily manageable in my opinion.

I may not deploy my application they way I’d prefer but Unicorn is currently fitting our application needs well enough. I can’t require just ‘action_view/helpers/number_helper’ but requiring full ‘action_view’ instead isn’t that bad either.

I’d just like to state that even though I don’t consider Rails/Ruby to be perfect, they’re still my choice when it comes down to general web development.

Client-side code testing with oojspec

2012-07-31T13:00:00+00:00

Introduction

I’ve been working solely on single-page web applications for the last 3 years. The client-side code I write is something about 70% of my total code and this percentage has been increasing over the time. While there are excelent tools to work with for testing back-end code in Ruby (RSpec, Capybara, FactoryGirl) I still miss a great framework for writing tests for my client-side code. At least that used to be the case.

We currently have tons of great alternatives for writing client-side code: Knockout.js, Angular.js, Ember.js, Serenade.js and a thousand more. They’re awesome for helping us to build single-page applications despite JavaScript being such an horrible language that is only now considering modular programming in ES6, but this will take some years before we can rely on its support :(

Even some languages, like the awesome CoffeeScript, were born to try to make JavaScript code writing more pleasant, although they’re still unable to provide something like a require/import statement. After all, they still need to compile to JavaScript :( Fortunately there are some assets pre-processor tools available to help us writing more modular code, like the Rails Asset Pipeline that will allow me to write “require"s as comments in my source headers and that has greatly reduced the pain that is working with JavaScript for me.

But when it comes to integration tests for my client-side code I’ve never felt great with regards to current available testing frameworks for JavaScript. I’ve been using Jasmine for a long time but I always missed a beforeAll/afterAll feature. A lot! Mocha/Chai bundle seems great, but unfortunately they require a JavaScript feature that is not present in older Internet Explorer, which I still must support in my products :( Finally, Buster.js is a great modular framework but it is just not suitable for the way I write integration tests because of their random execution order.

Konacha is a great gem that took the right approach on providing some conventions to tests organization being well integrated to the Rails Asset Pipeline. But it used Mocha/Chai… So I created a while ago the rails-sandbox-assets gem with the same goal of Konacha of introducing some conventions to test organization and integrating to the Rails Asset Pipeline. But differently from Konacha, it is framework-agnostic. In fact, I’ve written adapters for all mentioned testing frameworks in this article:

And recently my own testing framework built on top of Buster.js reporter and assertions:

oojspec

All those Ruby gems integrate to the Rails Asset Pipeline and all you have to do is creating your tests/specs in specific locations and they will be all automatically loaded by the test runner. Just like it happens with Konacha, this test runner server will only serve the application assets (JavaScript, CSS, images) and won’t touch any controllers, models or any other Ruby code.

It is even possible to integrate the Rails Asset Pipeline to non-Rails application, as I’ve done with this Grails application as a proof-of-concept. See oojs_assets_enabler for a minimal Rails application that can be integrated to any other server framework to enable you to use the power of the assets pre-processor and testing tools with your non-Rails application.

If you don’t like the idea of using the Rails Asset Pipeline (because you’re averse to Rails or Ruby names), even if it won’t require from you any Ruby knowledge, you can still use oojspec standalone. I’ve created some jsfiddle’s examples in oojspec README demonstrating how to do that (or do you think that JsFiddle has included support for Rails as well?! ;) ).

Enough with small talking!

Getting started

Take a look at the reporter first, to see how it looks like.

Yes, I know it is failing. This is on purpose so that you can see the stack-traces and how failures and errors look like.

Setting-up the runner

Rails applications

The oojspec gem will already provide you an HTML runner that will include all your tests/specs located under test/javascripts/oojspec/_test.js[.coffee] and spec/javascripts/oojspec/_spec.js[.coffee] at your taste. Just include the “oojspec” dependency to your Gemfile and run “bundle”.

Stylesheets in [test|spec]/stylesheets/oojspec/*_[test|spec].css are also automatically included in the HTML runner. You can just import the required CSS files from them.

Rails Asset Pipeline-enabled applications

If you want to take full advantage of the Rails Asset Pipeline, try to disassociate the “Rails” name from it first. It has nothing to do with Rails at all. You don’t have to learn Ruby or Rails for taking advantage of it. Although, if you’re using Rails you’ll be able to integrate your dynamic routes to your assets. But even if you aren’t you can get pre-compilation and minifying tasks, automatic CoffeeScript compiling and, specially, the ability of specifying dependencies between your sources by using special comments in your source headers:

1	// bowling_spec.js
2	// this will require bowling.js or bowling.js.coffee:
3	//= require bowling
4
5	describe("Bowling", function(){
6	// ...
7	});

Please let me know if you’d like a more in-depth article on how to take full advantage of the Rails Asset Pipeline with your non-Rails application.

All you have to do is to follow the short instructions here. This example has showed how to integrate with Grails but basically all you have to do is to adapt it to add this to your project.

No Rails integration at all

Okay, so you don’t see value in the Rails Asset Pipeline or you’re using your own tools for pre-processing your assets. Then you’ll have to write an HTML runner yourself, which is also pretty simple. Here is a working example in JsFiddle on how to do it.

1
2
3
4	href="http://oojspec.herokuapp.com/" />
5	http-equiv="content-type" content="text/html; charset=utf-8">
6
7	</span>oojspec Test Runner<span style="color: #116329">
8
9	href="/assets/oojspec.css" media="screen" rel="stylesheet" type="text/css" />
10
11
12
16
17
18
19
22
23

Feel free to download oojspec.css and oojspec.js for faster local development first.

Describing your code

Now that we have our runner set up, it is time to describe our code by writing some tests/specs.

You can do it with:

1	oojspec.describe("Some description", function(){
2	this.example("Basic stuff work :P", function(){
3	this.assert(true);
4	});
5	});

When using the oojspec gem, by default it will expose the “describe” function to the global (window) namespace, although this can be disabled by adding the following line to your application.rb:

1	config.sandbox_assets.options[:skip_oojspec_expose] = true

Also when using CoffeeScript to write your specs (even if your code is written in JavaScript), that example becomes more succinct. Also, I’m using the exported “describe” this time:

1	describe "Some description", ->
2	@example "Basic stuff work :P", -> @assert true

If you prefer to keep with JavaScript, but don’t want to type “this.” all the time, you can use an alternative idiom:

1	oojspec.describe("Some description", function(s){
2	s.example("Basic stuff work :P", function(s){
3	s.assert(true);
4	});
5	});

From within a description block, the following DSL keywords are available:

example/it/specify: all of them are aliases for declaring an example.
describe/context: aliases for declaring a nested context/description.
before/after/beforeAll/afterAll: hooks for code that should run before each, after each, before all and after all examples of that description respectively.
pending/xit: alias for declaring pending examples (or descriptions) whose block isn’t executed.

Writing your examples

From within an example, you can use any assertion supported by the referee library. All of them are well documented here. You can mix both assertions and expectations in your examples. And you can even write your own custom assertions/expectations.

1	oojspec.assertions.add("isVisible", {
2	assert: function(actual) {
3	return $(actual).is(':visible');
4	},
5	assertMessage: "Expected ${0} to be visible.",
6	refuteMessage: "Expected ${0} not to be visible.",
7	expectation: "toBeVisible"
8	});

Asynchronous examples

Sometimes you need to wait for certain conditions after taking some actions and those will most probably happen in an async fashion. So, for letting you focus in the specs instead of having to write polling functions yourself, oojspec borrows the waitsFor/runs approach from Jasmine.

1	describe("Some description", function(s){
2	s.example("Operation was successful", function(s){
3	$('button#create').click();
4	s.waitsFor("dialog to pop up", function(){
5	return $('#show-message-dialog:visible').length > 0;
6	});
7	s.runs(function(s){
8	s.expect('#show-message-dialog').toBeVisible();
9	})
10	});
11	});

You can use multiple waitsFor and runs blocks in the same example at your will.

Mocks

Sometimes mocks are really useful. Specially for creating fake HTTP servers for responding to your application AJAX requests. But since they’re orthogonal to test runners, no mocking library is included in oojspec. But I’d recommend you using the excellent Sinon.js mocking and stubing library. If you’re using the Rails Asset Pipeline, this is just a matter of including the sinon-rails gem to your Gemfile and requiring it in your spec:

1	//= require sinon

Sinon.js has a fake AJAX server built-in but if you always use jQuery for your AJAX requests you might find my gem fake-ajax-server somewhat easier to use.

Object-oriented testing

Specially when writing integration tests for my client-side code, I find it easier to describe a group of behaviors like sequential examples that are depending on a given order. In those cases I find it useful to share some state between them and taking an object-oriented approach would take care of this.

Suppose you have some class that you instantiate on your application load that will take care of registering some jQuery live events which are never unregistered because it is not needed by your application. So, you’re unable to instantiate such a class several times in “before” hooks because you’d be registering the same events several times. In that case, you can instantiate it in a “beforeAll” hook once in your suite.

But then it will be impossible to get back to the original state. But I don’t see this as a major issue. Suppose you have to test a dynamic tree, using the excellent jqTree library. You can start with an empty tree and add a test for including a new item to the tree. Then you add another test for including a sub item to the item created in your prior test. Then you add a test for moving it so that it becomes a sibling of the first item. Then you add a test for deleting the first item and make sure only the last one is kept. I don’t really mind if all those tests written for a “Tree Management” context are not independent from each other. I find it easier to write those tests in this sequential order than trying to make them independent.

This is the main point where I find the other testing frameworks to be too limiting for me or they don’t target the same browsers as I do.

When writing non-oo tests with oojspec, “this” will refer to an object containing only the available DSL for that context. This same DSL object is also sent as the first arguments to the blocks used by example, context, runs, etc.

On the other hand, when writing OO tests, you are in charge of specifying what will “this” refer to.

By default, OO tests are “non-bare”, which means that the DSL will be merged with your “this” object. This allows you to write “this.example” as before. But you can opt for using a “bare” approach in which case you’ll handle the DSL through the first argument of the block.

You can provide the description directly in the passed object or as the first argument as before. It is only required that your object responds to runSpecs() as the entry point.

Here are some examples:

1	// non-bare approach, with the description in the object itself:
2	describe({
3	description: 'Plain Object binding',
4	dialog: {dialog: true},
5	runSpecs: function(){ this.example('an example', this.sampleExample); },
6	sampleExample: function(){ this.assert(this.dialog.dialog); }
7	});
8
9	// traditional description syntax and a bare approach:
10	describe('Bare description', {
11	bare: true,
12	dialog: {dialog: true},
13	runSpecs: function(s){ s.example('an example', this.sampleExample); },
14	sampleExample: function(s){ s.assert(this.dialog.dialog); }
15	});

In case you prefer CoffeeScript, like me, you can find the “class” syntax somewhat easier to work with. oojspec will instantiate a class in case it detects it is a class (its prototype responds to runSpecs instead of the object itself). It even uses the constructor’s name if a description is not provided.

1	describe class # you can use an anonymous class as well
2	@description: 'Bare class'
3	@bare: true
4
5	runSpecs: (dsl)->
6	@dialog = dialog: true
7	dsl.example 'an example', @anExample
8	dsl.context 'in some context', @aContext
9	dsl.describe NonBareClass
10
11	anExample: (s)-> s.expect(@dialog).toEqual dialog: true
12
13	# this.runs is not available from an example when using a bare approach
14	aContext: (s)-> s.example 'another example', (s)-> s.refute @runs
15
16	class NonBareClass # description will be "NonBareClass"
17	runSpecs: ->
18	@dialog = dialog: true
19	@example 'an example', @anExample
20	@context 'in some context', @aContext
21
22	anExample: -> @expect(@dialog).toEqual dialog: true
23
24	# this.describe is never available from within an example
25	aContext: -> @example 'another example', -> @refute @describe

Real examples

This article is already long enough. I’ll try to find out some time in the future to focus in some real use case to demonstrate how I write integration tests for my single-page applications using some real application as an example.

Feedback

I’d really love to hear your feedback about oojspec. Please let me know what you think about it by e-mail, GitHub, comments in this page or Twitter (rrrosenfeld). If you think you’ve found some bug, please report it on GitHub issues.

Client-side Object Oriented Programming and Testing

2012-07-31T13:30:00+00:00

Introduction

Despite the fact that I don’t like the JavaScript language, we can’t just avoid it.

Client-side programming allows for better user experience and less network traffic and is required for lots of web applications. I’ve been doing client-side code most of my time since 2009 and it takes more and more of my time. I don’t think this is gonna change.

Although not perfect, CoffeeScript took a lot of the pain of writing JavaScript code for me, although it still doesn’t provide any import/require feature as it has to compile to JavaScript anyway. So, all examples in this article will be written in CoffeeScript, but feel free to write your own tests and code in JavaScript if you prefer.

Since we have a lot of our logic now in the client-side, it is time to take it much more seriously. That means we must write specs (unit and integration ones) for our client-side code as well. That has been a pain for me for a while, but I took some time to release some code to help us with this task, and this is mostly what I’ll be talking about in this article. Specially on client-side code integration testing.

Although my released gems depend on Rails Asset Pipeline support, this article should also guide you on how to easily write your specs for whatever server-side framework you’ve chosen. I’ll provide an example on how to do that for a Grails application, but you could apply the instructions for whatever other framework you want.

Design decisions

Feel free to skip this entire section.

Why the Rails Asset Pipeline?

I should state that I’m passionated about Ruby and that Rails is currently my web framework of choice, so be warned that this is probably a biased opinion.

The biggest mistake in the design of the JavaScript language in my opinion was the lack of a require/import statement, which won’t allow us to easily split our applications into modules. This was fixed for server-side JS applications by Node.js, but is still an issue for client-side code (that running in web browsers).

ES.Next is going to add modules support for JavaScript but it can take quite a while before 99% of your client users will be using a browser that supports those modules.

Currently I know two alternatives for dealing with dependency management in JavaScript:

AMD, with implementations like RequireJS or LabJS, but I find this approach to be too complicated to be practical and I’d rather avoid it;
Concatenation by using some pre-processor tool that can process the dependencies.

The Rails Asset Pipeline falls in this second category, just like the Grails Resources plugin. But the Resources plugin will require you to set up your dependencies in a separate file, while in the Rails Asset Pipeline you set up your dependencies as comments in your asset (JavaScript and stylesheets) headers. I much prefer this approach as it reminds me of regular require/import features existing in most programming languages. Also, differently from the Rails Asset Pipeline, the Grails Resources plugin won’t support CoffeeScript out-of-the-box.

Also, the Rails Asset Pipeline is well documented and easily extended by the use of plugins (or Ruby gems if you prefer).

My application is not written in Rails!

I’m sorry about you, but this is not a reason for not reading this article. You can still take advantage of the techniques and tools I describe here in whatever framework you’re using. Just keep reading on.

Why oojspec?

Please read this article for the reasoning behind it. In short, oojspec is designed with integration tests in mind and an OO approach.

Why object-oriented JavaScript?

I really like OO programming and being able to easily share states. This allows me to write maintainable code and specs in a modular way.

Why CoffeeScript?

I find code written in CS more concise and easier to read. It supports comprehensions, destructuring assignment, splats, string interpolation, array range syntax, “class” and “extend” keywords, “@attribute” as a shortcut to “this.attribute”, easy function bindings through “=>”, and easier “for-in” and “for-of” constructions among several other great language additions.

On the other side I don’t like very much that “==” is translated to “===” and that “elvis?” has a different meaning inside functions and a few other issues I can’t remember right now.

But all in all, CS is a much better language than JS in my opinion. Even if you don’t want to write CoffeeScript for your production code, you should consider using it at least for your specs. But feel free to use JS for your specs too if you really dislike CS.

So, with CS and the Rails Asset Pipeline which will provide a require-like mechanism, client-side programming is no longer a pain to me. Well, that and the bundled helper tools for helping me out in the testing task, which I’ll explore more in-depth in this article.

Why splitting a spec in multiple files?

After writing some specs you can end up with a huge file when writing an integration testing for an application. There will be lots of “describes”/contexts and I’d rather see them split in multiple files for better organization and maintainability. But this is just a suggestion, feel free to use regular “class” constructions in CoffeeScript and put everything in a single file if you prefer.

What about full integration tests?

The integration tests I’ll be talking about in this article will use a mocked fake server that will simulate replying to AJAX requests. This will only work for requests using jQuery.ajax (or getJSON/post) which is stubbed by the excelent SinonJS written by my friend Christian Johansen from Gitorious fame.

This will allow the techniques presented in this article to be used with whatever web framework you can think of. Another advantage is that it will run pretty fast by mocking the server-side responses.

Having said that, if you really want to write full integration tests, like with Capybara, this should be pretty easy to achieve if your application is written in Rails. It is just a matter of mounting the spec runner in some route like ‘/oojspec’ for your test environment. Please leave some comment if you want some detailed instructions on how to do that, but be aware that you won’t be able to write Ruby code from your JavaScript specs, like filling some initial data in the database through some beforeEach calls… You’d need to add some extra test-only routes for helping you with that.

Enough with the small talk! Get me right into the subject!

Okay, okay, calm down :)

Installing instructions

Non-Rails applications

You’ll need a minimal Rails application in some of your application sub-directory.

Here are the instructions for doing so (You’ll need Ruby 1.9 and RubyGems installed):

gem install bundler; gem install rake;
Oojs Assets Enabler - just clone it to some first-level subdirectory;
Run “bundle” from this subdirectory;
Optionally symlink the Rakefile to your root directory;
Run “rake oojs:spec_helper” to create a sample spec_helper.js.coffee;
Run “rake oojs:spec – –name=shopping_cart” to create a sample spec;
Run “rake oojs:serve” to start the server;
navigate to http://localhost:5000 to see your specs passing.

Rails applications

Add the ‘oojs’ gem to your Gemfile and run “bundle”;
rails g coffee:assets shopping_cart; # or js:assets if you prefer
rails g oojs:asset_helper;
rails g oojs:asset shopping_cart;
rake sandbox_assets:serve;
navigate to http://localhost:5000 and see your specs passing.

Organizing your tests/specs

The specs go to “spec/javascripts/*_spec.js(.coffee)”. They usually “=require spec_helpers” in the first line.

You’re encouraged to split your spec class in several files. Just see the example specs created by the bundled generators.

If you run the spec_helper generator and then run “rails g oojs:asset shopping_cart” (or “rake oojs:spec – –name=shopping_cart” for non Rails applications), these files will be created:

spec/javascripts/spec_helper.js.coffee:

1	# =require application
2	# =require modules
3	# =require jquery
4	# =require oojspec_helpers
5	# #require jquery.ba-bbq # uncomment for enabling $.deparam()
6	#
7	# Put your common spec code here.
8	# Then put "# =require spec_helper" in your specs headers.b

You’ll need to remove the first “# =require application” line if your application doesn’t have an application.js(.coffee) file in the assets path. All other dependencies are provided by the oojs gem.

spec/javascripts/shopping_cart_spec.js.coffee:

1	# =require spec_helper
2	# =require_tree ./shopping_cart
3
4	oojspec.describe 'ShoppingCart', new specs.ShoppingCartSpec

spec/javascripts/shopping_cart/main.js.coffee:

1	extendClass 'specs.ShoppingCartSpec', (spec)->
2	initialize: ->
3	@createFakeServer()
4	@extend this, new specs.oojspec.AjaxHelpers(@fakeServer)
5
6	runSpecs: ->
7	@beforeAll -> @fakeServer.start()
8	@afterAll -> @fakeServer.stop()
9	@before -> @fakeServer.ignoreAllRequests()
10
11	@it 'passes', ->
12	@expect(@fakeServer).toBeDefined()

Feel free to add as many files you want inside the spec/javascripts/shopping_cart/ directory.

spec/javascripts/shopping_cart_spec/fake_server.js.coffee:

1	# =require fake_ajax_server
2
3	createProducts = -> [
4	{id: 1, name: 'One'}
5	{id: 2, name: 'Two'}
6	]
7
8	extendClass 'specs.ShoppingCartSpec', ->
9	createFakeServer: ->
10	@fakeServer = new FakeAjaxServer (url, settings)->
11	if settings then settings.url = url else settings = url
12	handled = false
13	switch settings.dataType
14	when 'json' then switch settings.type
15	when 'get' then switch settings.url
16	when '/products' then handled = true; settings.success createProducts()
17	# when 'post' then switch settings.url
18	# when ...
19	# when undefined then switch settings.type
20	# when 'get' then switch settings.url
21	# when ...
22	# when 'post' then switch settings.url
23	# when ...
24	return if handled
25	console.log arguments
26	throw "Unexpected AJAX call: #{settings.url}"

AJAX calls

Whenever your application issue an AJAX request, and that is handled by your fake server, you’ll need to decide what to do in your specs. For example, if you click a button and wants to wait for an ajaxRequest to complete, and then process the request, do something like:

1	@it 'asks for products when clicking on Products button', ->
2	$('#products-button').click()
3	@waitsForAjaxRequest()
4	@runs ->
5	@nextRequest '/products', 'get', 'json' # won't pass if such a request wasn't issued
6	@expect($('ul#products li:contains(One)')).toExist()

Take a look at ajax_spec_helpers.js.coffee for a list of useful available helpers.

Also take a look at oojspec-jquery.js.coffee for a list of additional matchers for usage with jQuery objects.

Conclusion

There is a lot more to discuss but this article has already taken me a lot of time. I’m intending to write another article creating a test suite for an existent sample application to further demonstrate its capabilities.

Feel free to leave any questions or suggestions in the comments so that we can improve those techniques even more.

Happy client-side coding :)

Getting started with Sequel in Rails

2013-12-20T10:25:00+00:00

Why Sequel?

In short, I feel it is better designed than ActiveRecord and makes some non-trivial queries much easier to implement and read. Detailed information can be found here.

How to use Sequel models?

I didn’t create any generator or gem for my application. It is just pretty simple to setup your environment.

Add “gem ‘sequel’” to your Gemfile
Create an initializer, like config/initializers/setup-sequel.rb (see example below)
Create your models (see example below)

1	# config/initializers/setup-sequel.rb
2	c = ActiveRecord::Base.configurations[Rails.env]
3	c['adapter'] = 'postgres' if c['adapter'] == 'postgresql'
4	c['user'] = c.delete 'username'
5	c['logger'] = [Rails.logger, Logger.new("log/#{Rails.env}_db.log")]
6	c['logger'] << Logger.new(STDOUT) if Rails.env.development?
7	DB = Sequel::Model.db = Sequel.connect c
8	Sequel::Model.db.sql_log_level = Rails.application.config.log_level \|\| :info
9
10	if ARGV.any?{\|p\| p =~ /(--sandbox\|-s)/}
11	# do everything inside a transaction when using rails c --sandbox (or -s)
12	DB.pool.after_connect = proc do \|conn\|
13	DB.send(:add_transaction, conn, {})
14	DB.send(:begin_transaction, conn, {})
15	end
16	end
17
18	# Sequel::Model.plugin :active_model
19	# Sequel::Model.plugin :validation_helpers

You can enable the available plugins directly in the initializer or in a per-class basis.

If you’re using FactoryGirl, it requires the model classes to respond to ‘save!’, so you can add this to your initializer:

1	module Sequel::Plugins::FactoryGirlSupport
2	module InstanceMethods
3	def save!
4	save_changes raise_on_save_failure: true
5	end
6	end
7	end
8	Sequel::Model.plugin Sequel::Plugins::FactoryGirlSupport # or plugin :factory_girl_support

Finally, create your models:

1	# app/models/user.rb
2	class User < Sequel::Model
3	# do whatever you want here
4	end

If you’re used to ActiveRecord you can take a look at Sequel for ActiveRecord Users.

Devise

If you want to use your Sequel model as a Devise authentication class, please take a look at the sequel-devise gem.

In short, just append “gem ‘sequel-devise’” to your Gemfile (you’ll also need the ‘devise’ gem if you’re starting from scratch).

Then, enable your User class to be compatible with Devise. If you want to keep your current User class while you’re giving this a try, just put it in another namespace, as in the example below:

1	# app/models/sq/user.rb
2	module SQ
3	class User < Sequel::Model
4	plugin :devise
5	devise :database_authenticatable
6	end
7	end

Finally, in your routes, if you’re using this namespaced User class, you’ll need to adapt your devise_for statement to something like:

1	# config/routes.rb
2	devise_for :users, class_name: 'SQ::User'

RSpec

For running your examples inside database transactions, you can add this to your spec_helper.rb:

1	# setup transactional factory for Sequel
2	config.around(:each) do \|example\|
3	DB.transaction do
4	example.run
5	raise Sequel::Error::Rollback
6	end
7	end

Have fun

Feel free to leave any questions in the comments or to report any bugs to the sequel-devise gem.

If you’re like me, you’ll enjoy Sequel way better than ActiveRecord.

MySql localhost behavior is totally insane

2012-03-26T22:40:00+00:00

I always have this issue so I found it would be better to document it for myself.

I never really liked MySql, preferring PostgreSQL instead for multiple reasons, but that is not what this article is about. While I don’t migrate the current database to PostgreSQL (everything is already set up, just waiting for permission for doing so), I’ll probably have this issue many more times.

I have some port redirects to my application production database servers. And I always try to access using this command line (or through some library API, it doesn’t make any difference):

1	mysql -h localhost -P 3307 -u my_user -p my_database_name

The problem is that I succeed on doing that, but I’m actually using my local database using sockets. WTF?!

Since I have the same users and passwords in my local database, it doesn’t complain, but it completely ignores the -P (–port) argument and I think I’m accessing the right database. There are two fixes for that. The most simple one and that will also work in my software configurations is to use 127.0.0.1 instead of localhost.

For the command line, you can use also add the –protocol=tcp argument:

1	mysql --protocol=tcp -h localhost -P 3307 -u my_user -p my_database_name

Now I don’t understand why the protocol isn’t set to TCP automatically when you specify a TCP port in the program arguments! This is just dumb! And worse than that is to find out that the state for this bug is “Not a bug”. I really hate MySql.

How NokoGiri and JRuby saved my week

2012-03-04T12:30:00+00:00

I’d like to share some experiences I had this week trying to parse some HTML with Groovy.

Then, I’ll explain how it was better done with JRuby and it was also finished much faster too.

This week I had to extract some references from some HTML documents and store them to the database.

This is the spec of what I wanted to implement in MiniTest specs written in Ruby:

1	# encoding: utf-8
2	require 'minitest/autorun'
3	require_relative '../lib/references_extractor'
4
5	describe ReferencesExtractor do
6	def example
7	%Q{
8
9
10
11	some text
12
13
14
15	First paragraph.
16	Second paragraph.
17
18	Another paragraph.
19
20
21	}
22	end
23
24	it "extract references from example" do
25	return
26	extractor = ReferencesExtractor.new example
27	{
28	['1'] => {'1' => "some text First paragraph. Second paragraph. Another paragraph."},
29	['1211', '1212', '11'] => {'121' => "First paragraph. Second paragraph."},
30	['1211', '1212', '122'] => {'12' => "First paragraph. Second paragraph. Another paragraph."},
31	['12', '1212'] => {'12' => "First paragraph. Second paragraph. Another paragraph."},
32	['1212', '122'] => {'1212' => "Second paragraph.", '122' => "Another paragraph."},
33	}.each {\|cids, expected\| extractor.get_references_texts(cids).must_equal(expected) }
34	end
35	end

I had a similar test written using JUnit, with a small change to make it more easy to implement but I’ll discuss it later on in this article. Let me just explain this situation better.

Don’t ask me what “cid” means as I wasn’t the one to name this attribute, but I guess it is “c…” id, although I have no clue what is “c…” all about. It was already called this way when I started working on this project and I’m the sole developer of this project right now after lots of other developers having worked on it before me.

Part of the application I maintain has to deal with documents obtained from Edgar filings. Then a processing is made to each HTML tag so that they’re given sequential unique numbers in the “cid” attribute. Someone will then be able to review the documents and highlight certain parts of it by clicking on the elements in the page. So the database has a reference to a document and a cid list, like “1000,1029,1030” will all elements that should be highlighted. This was stored exactly this way as a string in a database column.

But some weeks ago I was requested to export the contents of some highlighted references to an Excel spreadsheet and this is somewhat more complex than it looks like. With jQuery, it would be equivalent to “$(‘[cid=12]’).text()”.

For performance reasons in the search interface I had to import all references from over 3,000 documents to the database. For the new references, I’ll do the processing with jQuery and send it already formatted to the server, but I need to do the initial import and doing the batch processing in the client-side would be painfully slow for this case.

But getting the correct output in the server-side is not that simple. For example, for those documents, there is no CSS involved, making it simpler to deal with. So “

some t

” should be stored as “some t ex t” while “

some text” should be stored as “some text”. Since this requires a deeper understanding of HTML semantics, I decided to simplify it while dealing with Groovy and assume all elements as being block-level elements while parsing the fixed HTML as XML.

The Groovy solution

Doing that in Groovy took me a full week specially due to lack of documentation of XmlParser and XmlSlurper Groovy classes.

First, I had no clue which one to choose. As they had a similar interface I decided to start with XmlParser, and then change to XmlSlurper when it was finished to compare the performance between them.

I couldn’t find any methods for searching for some XPATH or CSS expression. When you write “new XmlParser().parseText(xmlContent)”, you get a Node.

XmlParser is not an HTML parser, so the XML content should be well formed, then you need to use some library like NekoHTML or TagSoup. Then you would use it like “new XmlParser(new Parser()).parseText(xmlContent)” That’s ok, but if you want to play with it and don’t know Groovy enough for dealing with Gradle and Maven dependencies, just use a valid XML as an example.

Since I couldn’t find a search-like method for Node, I had to look for node ‘[cid=12]’ with something like this:

1	xmlContent = ' some text as an example . '
2	root = new XmlParser().parseText(xmlContent)
3	node = root.depthFirst().find { it.@cid == '12' }

Calling “node.text()” would yield to ‘some text.’ and calling “node.children()” would yield to [‘some text’, spanNode, ‘.’], which means it ignores white spaces, so it is of no usage to me.

So, I tried XmlSlurper. In this case, node.text() yields to ‘ some text as an example .’. Great for this example, but when applied to node with cid 12 in the MiniTest example above, it would yield to ‘First paragraph.Second paragraph.Another paragraph.’ ignoring all white spaces, so I couldn’t use this.

But after searching a lot, I figured out that there was a class that would convert some node back to XML including all original white spaces, so it should be possible. Then I tried to get the text by myself.

“node.children()” returned [spanNodeChildInstance], ignoring the text nodes, so I was out of luck and had to dig into its source code. Finally after some hours digging the source-code I found what I was looking for: “node[0].children()” returning [‘ some text ’, spanNode, ‘.’].

It took a while before I could get this to work, but I wasn’t finished with it. I would have to navigate the XML tree for getting the final processed text. Look at the MiniTest example again and you’ll see that I needed to get node with cid 12 as equivalent to the cid list [1211, 1212, 122].

So, one of the features I needed is to look for the first node ancestral having a cid, so that I could try it to see if it was a possible node. It happens that it was not that simple as while traversing the parents maybe I couldn’t find any parent node with a cid. So, how could I check that I’ve reached the root node?

With XmlSlurper, when you call rootNode.parent() you’ll get rootNode. So, I tried something like this:

1	parent = node.parent()
2	while (!parent.@cid && parent != parent.parent()) parent = parent.parent()

But the problem is that the comparison is made by string, so I have no real way to see if I have reached the parent. So, my solution was to check for “node.name() != ‘html’” in this case. This is really a bad API design. Maybe root.parent() could return null. Also, I should be able to compare a node instead of its text.

After several days, in the end of last Thursday I could get a “working” version of a similar JUnit test passing with an implementation in Groovy. But as I wasn’t using really an HTML parser, but an XML one, it means that I couldn’t process white-spaces correctly for in-line blocks.

NokoGiri

Then, on Friday morning I was curious how I could parse HTML with Ruby, as I never did it before. That was when I got my first smile that morning when I read this from Aaron Patterson documentation of NokoGiri:

XML is like violence - if it doesn’t solve your problems, you are not using enough of it.

The smile got even bigger when I tried this:

1	require 'nokogiri'
2	Nokogiri::HTML(' Some Text.').text == 'Some Text.' # true

The smile has shrunk a bit when I realized that I would get the same result if I replaced the inline “b” block element with a “div”. But that is ok, it was already good enough.

Other than the “text” method being more useful than the one used by XmlSlurper (new-lines are treated differently), navigating the XML tree is also much easier with NokoGiri. But I still couldn’t find a good way of finding out if some node was a root one, as calling “root.parent” would raise an exception. Fortunately, as NokoGiri supports XPATH, I didn’t need to do this manual traversing and this wasn’t an issue to my specific needs.

But there was a remaining issue. It performed very badly when compared to the Groovy version, about 4 times slower. Looking at my CPU usage statistics it was obvious to me that it wasn’t using all my CPU power, as in the Groovy version. It didn’t matter how much threads I used with CRuby, each processor wouldn’t be over 20% of the available capacity.

JRuby to the rescue

It is a shame that the Java API actually has a better API than Ruby for dealing with a pool of threads. It is called the Executors framework. As I couldn’t find something like this in the Ruby standard library, I tried a Ruby gem called Concur.

I didn’t investigate if the performance issues were caused by Concur implementation or the CRuby one, but I decided to give JRuby or Rubinius a try. As I already had JRuby available, I tried it first and as the results were about the same as the Groovy version, I didn’t bother to check Rubinius.

With JRuby I could use the Java Executors framework just like in Groovy and I could see all my 6 cores above 90% all the time my 10 threads have been working for importing over 3,000 documents. Unfortunately my actual servers are much slower than my computer and it took more than 4 hours in the staging server when it took about an hour and a half in my computer. The CRuby version would probably take more than 4 hours in my computer, which means it could take almost a full day in the staging and production servers.

Conclusion

I must explain that I haven’t tried using Ruby first because I would be able to take advantage of my models being already mapped by the Grails application, so I wouldn’t have to deal with database set-up and would be allowed to have all my code in a single language. Of course, if I knew beforehand all the pain that it would be coding this in Groovy, I would have already done this in Ruby from the beginning. And the Ruby version was a bit better than my previous attempt with Groovy with regards to some corner cases including new-lines processing.

I’m very grateful for Aaron tendelove Paterson and Charles Nutter for their awesome work on Ruby, NokoGiri and JRuby. Thanks to them I could get my work done very fast in an elegant way, saving my week of frustration with Groovy.

Should we move forward or remain backward compatible?

2012-03-04T12:30:00+00:00

This is just an article’s title, not really a question with a right answer.

It is not always possible to both move forward and remain compatible with legacy code.

Usually, when a project starts there is no legacy code and every change is welcomed. Later on, when the project grows and the user’s code base gets bigger, some people will start complaining about incompatible changes because they’ll have to spend some time changing their code base when they decide to upgrade to a newer version.

When this time comes, the project has to make a decision. It should either keep moving forward and fixing badly designed API when they realize there is a better way of doing things or they should accept that an API change can be very painful for their framework/library users and decide to keep on with the bad API. Java definitely opted for the latter.

The Rails case

In the last weeks, I’ve been reading some articles complaining about Rails changing its API in incompatible ways too fast.

They’re not alone and I’ve seen complaints about this from several other people. In the other side I’m constantly refactoring my own code base and I appreciate Rails doing the same. In the case of libraries and framewoks, when we’re refactoring code, sometimes we come to the conclusion that some API should be better written even if it breaks old software. And I’m also not alone in thinking this way.

Unfortunately, I couldn’t find an employer to pay me to work with Rails as much as I do as a Grails/Groovy/Java developer for the last 3 years. And that is really a pain with regards to API, stability and user experience. I don’t remember complaining about anything in Ruby or Rails that I really missed since internationalization support was added to Rails in version 2.

The Groovy / Java case

This section has grown too fast, so I decided to split it in another article entitled How NokoGiri and JRuby saved my week.

You don’t have to read the entire article if you’re not curious enough, but the Groovy XML parsers API was so badly designed and documented that I could finish the logic with Ruby and NokoGiri in about 2 hours (with tests and setup included) while I spent the entire week trying to do the same in Groovy.

And the result in Ruby would take about the same time for the import to complete. I had to dig into Groovy’s source code due to lack of documentation and do lots of experiments to understand how things worked.

You can fix documentation issues without changing the API, but you can’t fix design issues with Groovy parsers without changing its API. So, is it worth keeping the API just for being backward-compatible and make XML parsing a pain to work with in Groovy?

Then what?

There is not a better approach to take when you decide for remaining backward compatible or keep forward. So, each project will adopt some philosophy and you need to know its philosophy before adopting it or not.

If you prefer API stability over consistency and easy of use, you should choose something like Java, C++, Perl, PHP or Grails. You shouldn’t be really considering Rails.

In the other hand, if you like to be on the edge, then Rails is exactly the way to go.

Which one to choose will basically depend on these questions:

Do you have a good test coverage of your code base?
Do you have to respond really fast to changes?
Will your code hardly change after it is finished?

If you answered “yes” to 3, than you should consider a framework that will avoid very hard to break its API, since no one will constantly maintaining your application to keep up with all the framework upgrades with fixed security issues, for example.

In the other hand, if you have answered “yes” to 1 and 2, using a fast pace changing framework like Rails shouldn’t be an issue. In my case, I don’t write tests for my views as they’re usually very simple and doesn’t contain logic. So, when Rails changed some rules about when to use “<%= … %>” or “<% … %>”, I had to manually look at all of my views to fix them. And I had to do that twice between Rails 2 and Rails 3.1, for example because they did change this behavior back and forward and this is the kind of unnecessary change in my opinion.

Other changes I had to manually check because I don’t test my views is due the change of the output of ERB tags being escaped by default. But that is a good change and I’m pretty sure I forgot to manually escape some of them before the upgrade. So, my application was probably safer against attacks after the upgrade, so this is a good move even so it took a while for me to finish the upgrade. There was no easy path for this change.

But other than that, it was just a matter of making the test suite pass after the upgrade, and if you valuate code refactoring as much as I do, you’ll be writing tests for all code that could possibly break in some refactoring.

And this was a hard issue I have with Grails. I find it too time demanding to write tests for Grails applications and it was really a pain before Grails 2 was released. It is still not good, but I can already write most of my unit tests in Grails without much problem.

So, I would suggest you to answer the above questions first before choosing what web framework to adopt. It is not right to get a fast moving framework because its API is better designed and then later in the future ask their maintainers to stop changing because now you have a working application.

You should know how they work beforehand and accept this when you opt of it.

How do Rails and Grails differ?

2012-01-21T14:45:00+00:00

A while ago I’ve written on why I prefer Rails over Grails, so be aware that this is another biased article.

That old article is already outdated since Grails 2 was released, and I was asked to update that article. That was my original idea, but then the comments wouldn’t make sense anymore, so I decided to write another take on Rails and Grails comparison. But this is a completely entire new article and not just an update to the old one.

Misconceptions first

Java is rock solid, while Ruby is not

I never understood this statement although I’ve been constantly told this for a long time.

Both languages were first released in 1995, more than 15 years ago, so why wouldn’t Ruby be considered as solid as Java?

Dynamic languages aren’t reliable

I have no idea why some people think that getting some program to compile is any indication that it should work.

Certainly those people don’t include Kent Beck and Erich Gamma or they wouldn’t have developed JUnit back in 1994, even before Java 1.0 being publicly released by Sun Microsystems.

So, as far as you understand that you need automated tests in whatever language you choose, it shouldn’t matter if the language is a static or a dynamic one.

Java written code runs much faster than those written in Ruby

How much? No one answers me that question. They think this way: “Java programs are compiled, so they must run faster then sofware written in any interpreted language”. People should really be worried about how fast they need their application to be before choosing their framework. If they can’t measure, they can’t compare performance, this is pretty obvious.

If you need a web application, you should be able to benchmark for your actual scenario before choosing a language and web framework. Also, if your application is very JavaScript intensive, it shouldn’t really matter the performance of the server side for many applications.

A typical web application will fetch data from some kind of database, do some parameters bindings and generate some HTML, XML or JSON result. This usually happens really fast on any language or web framework, so you shouldn’t be really concerned about language performance for web applications. Most performance improvements will be a result of some design change rather than a language change.

So, it is more likely that the framework design is more important than the language itself. If some language allows programmers to easily write better designed code, it is more likely that a framework written in such language will perform better. You should really be concerned on how fast you can develop your solution with the chosen framework/language. And I really don’t believe anyone can be as productive in Java as in any other dynamic and less verbose language.

Grails is the only Rails-like framework alternative for the JVM

Haven’t you ever heard that you can run Rails in the JVM through JRuby, a Ruby interpreter written in Java? The Rails test suite goes green on JRuby as well.

Too much talk, bro, go straight! So, what are the differences?

Reuse of software vs monolithic

Grails is built on top of the well known Spring framework and Hibernate, and integrates to Maven and Ivy.

Rails was originally considered a monolithic full-stack framework, with very few dependencies on external libraries. This has changed a lot since the Rails 3 refactoring, but somehow people still see Rails as a monolithic framework.

Integration level

While both Rails and Grails will reuse external libraries, Rails seems to be more well integrated to them than Grails.

This is very noticeable in the case of Hibernate integration on GORM, the Grails Object-Relational Mapper (ORM).

Rails uses by default the ActiveRecord library as their ORM solution, that implements the Active Record pattern in Ruby.

Hibernate, in the other side, adopted the Data Mapper / Unit of Work (Session) pattern.

I won’t cover the differences, merits and shortcomings of those patterns as it is out of the scope for this article and there is plenty of information around the web about them. I’d just like to state that you can opt for the DataMapper library in Ruby if you prefer this pattern.

The important thing here is to point that Grais will try to hide the Hibernate Session for newcomers to Grails and make some developers believe it implements the Active Record pattern, since the Data Mapper pattern add complexity for simple applications. The documentation will only cover Hibernate Sessions after explaining about Domain Modelling. This topic is so important for avoiding issues with Grails that it should be the first one as it can lead to several unexpected results.

If you’re planning to to use Grails, don’t do that before reading the entire documentation for GORM and this series of 3 articles about GORM Gotchas. This will save a lot of your time in the future.

GORM has bad defaults for newcomers and you’ll be surprised by when data is persisted and why you can’t call save() directly in some GORM instance in a background thread. That is usually the situation where you learn about the Hibernate Session if you haven’t read the entire documentation before.

On the other hand I haven’t found a single “gotcha” for the ActiveRecord gem, used by Rails as the default ORM implementation. Also all libraries used by Rails are very well integrated.

Object-Relational Mapping

Automated Testing

Bugs

Community Ecosystem

Framework source code

What did I learn about Code Writing?

2012-01-08T23:57:00+00:00

I’ve being coding for about 2 decades now. And still I don’t find it to be an exact science, as some would like to suppose. Otherwise, they wouldn’t ask you for time estimates on feature requests or try to use tools like MS Project to manage a software project, as if Gantt charts could be useful for this kind of project.

Of course, I can completely understand the reasons for the ones willing to bring such project management tools to the software world. Good luck to them! But I won’t talk about this subject in this article as it is too big. I would just like to state that software can be better understood when compared to sciences like Music or general Arts.

Both require lots of experience, personal feelings and are hard to estimate conclusion times since it is almost always something completely new. Although there are some recipes for certain kinds of music or movies, but then they are no longer art.

Some time ago I was asked to estimate how long it would take for me to implement a search system over some HTML documents taken from EDGAR filings. I’m pretty sure that this wouldn’t be something new for some of you who have already had experience with search engines before, but that wasn’t my case definitely. So, I knew I should research about tools like Lucene for search indexing, but I have never worked with them before. So how could I estimate this?

As I started following the tutorials, I thought the main problem was solved in the first 2 days, but I couldn’t predict that I would spend so much time reading about the configuration files for Solr, and how search params could be adjusted. There is a lot of stuff to know about and configure for your needs.

Particularly, one of the curiosities I’ve noticed is that even if my configuration was set to enable AND-like search for all typed terms, if it happens for a user to prepend some word with a plus (“+”) or minus (“-”), then non-prepended words would become optional. I had enabled the DisMax mode, by the way.

The challenge

So, I’d like to talk specifically about this specific challenge as it is a good example for demonstrating some techniques I’ve learned last year after reading Clean Code. Although being very Java-oriented, this book has a few simple rules that can be applied to every language and be really effective. Just like in Music and Movie Making, Software Writing is also a science in which there are lots of resources to learn from and that can be used in a systematic way. Learning those tools and techniques will help developers to deliver more in less time.

Developers should invest time on well-written code because they’ll spend most of their time reading code. So, it makes sense to invest time and money on tools that will make it easier to browse some code as well as investing some time polishing their code so that they become more readable too.

Before talking about those simple rules, I’d like to show you how I might write this code in my early times. Don’t waste your time trying to understand this code. Then, I’ll show you the code that I’ve actually written in a couple of hours, exactly as I have estimated before, since it didn’t have any external dependencies. So, basically, this is the trend:

Transform terms like ‘some +required -not-allowed “any phrase” id:(10 or 20 or 30)’ into ‘+some +required -not-allowed +“any phrase” +id:(10 or 20 or 30)’.

Pretty simple, right? But even software like this can be bug-prone. So, here is a poor implementation (in Groovy, as I’m a Grails programmer in my current job). Don’t try to really understand it (more on this later), just take a look at the code (dis)organization. I didn’t even try to compile it.

How not to code

1	class SolrService {
2	...
3	private String processQuery(String query) {
4	query = query.replaceAll('#', '')
5	def expressions = [], matches
6	while (matches = query =~ /$[^\(]*?$/) {
7	matches.each { match ->
8	expressions << match
9	query = query.replace(match, "#{${expressions.size()}}".toString())
10	}
11	}
12	(query =~ /\".*?\"/).each { match ->
13	expressions << match
14	query = query.replace(match, "#{${expressions.size()}}".toString())
15	}
16	query = query.split(' ').findAll{it}.collect { word ->
17	word[0] in ['-', '+'] ? word : "+${word}"
18	}.join(' ')
19	def s = expressions.size()
20	expressions.reverse().eachWithIndex { expression, i ->
21	query = query.replace("#{${s - i}}", expression)
22	}
23	}
24
25	def search(query) {
26	query = processQuery(query)
27	...
28	return solrServer.request(new SolrQuery(query))
29	}
30	}

Ok, I’ll agree that for this specific case, the code may be not that bad, but although processQuery is not that big, you’ll need some time for figuring it out what is happened if you’re required to modify this method.

Also, looking at it, could you be sure it will work for all cases? Or could you tell me what is the reason for some specific line? What is this code protected from? How comfortable would you be if you were to modify this code? How would you write automated tests for processQuery?

Also, as the logic gets more complex, coding this way could led to some messy code like the one I’ve just taken from a file in the project that integrates Hibernate to Grails:

1	// grails-core/grails-hibernate/src/main/groovy/grails/orm/HibernateCriteriaBuilder.java
2	// ...
3	@SuppressWarnings("rawtypes")
4	@Override
5	public Object invokeMethod(String name, Object obj) {
6	Object[] args = obj.getClass().isArray() ? (Object[])obj : new Object[]{obj};
7
8	if (paginationEnabledList && SET_RESULT_TRANSFORMER_CALL.equals(name) && args.length == 1 &&
9	args[0] instanceof ResultTransformer) {
10	resultTransformer = (ResultTransformer) args[0];
11	return null;
12	}
13
14	if (isCriteriaConstructionMethod(name, args)) {
15	if (criteria != null) {
16	throwRuntimeException(new IllegalArgumentException("call to [" + name + "] not supported here"));
17	}
18
19	if (name.equals(GET_CALL)) {
20	uniqueResult = true;
21	}
22	else if (name.equals(SCROLL_CALL)) {
23	scroll = true;
24	}
25	else if (name.equals(COUNT_CALL)) {
26	count = true;
27	}
28	else if (name.equals(LIST_DISTINCT_CALL)) {
29	resultTransformer = CriteriaSpecification.DISTINCT_ROOT_ENTITY;
30	}
31
32	createCriteriaInstance();
33
34	// Check for pagination params
35	if (name.equals(LIST_CALL) && args.length == 2) {
36	paginationEnabledList = true;
37	orderEntries = new ArrayList<Order>();
38	invokeClosureNode(args[1]);
39	}
40	else {
41	invokeClosureNode(args[0]);
42	}
43
44	if (resultTransformer != null) {
45	criteria.setResultTransformer(resultTransformer);
46	}
47	Object result;
48	if (!uniqueResult) {
49	if (scroll) {
50	result = criteria.scroll();
51	}
52	else if (count) {
53	criteria.setProjection(Projections.rowCount());
54	result = criteria.uniqueResult();
55	}
56	else if (paginationEnabledList) {
57	// Calculate how many results there are in total. This has been
58	// moved to before the 'list()' invocation to avoid any "ORDER
59	// BY" clause added by 'populateArgumentsForCriteria()', otherwise
60	// an exception is thrown for non-string sort fields (GRAILS-2690).
61	criteria.setFirstResult(0);
62	criteria.setMaxResults(Integer.MAX_VALUE);
63
64	// Restore the previous projection, add settings for the pagination parameters,
65	// and then execute the query.
66	if (projectionList != null && projectionList.getLength() > 0) {
67	criteria.setProjection(projectionList);
68	} else {
69	criteria.setProjection(null);
70	}
71	for (Order orderEntry : orderEntries) {
72	criteria.addOrder(orderEntry);
73	}
74	if (resultTransformer == null) {
75	criteria.setResultTransformer(CriteriaSpecification.ROOT_ENTITY);
76	}
77	else if (paginationEnabledList) {
78	// relevant to GRAILS-5692
79	criteria.setResultTransformer(resultTransformer);
80	}
81	// GRAILS-7324 look if we already have association to sort by
82	Map argMap = (Map)args[0];
83	final String sort = (String) argMap.get(GrailsHibernateUtil.ARGUMENT_SORT);
84	if (sort != null) {
85	boolean ignoreCase = true;
86	Object caseArg = argMap.get(GrailsHibernateUtil.ARGUMENT_IGNORE_CASE);
87	if (caseArg instanceof Boolean) {
88	ignoreCase = (Boolean) caseArg;
89	}
90	final String orderParam = (String) argMap.get(GrailsHibernateUtil.ARGUMENT_ORDER);
91	final String order = GrailsHibernateUtil.ORDER_DESC.equalsIgnoreCase(orderParam) ?
92	GrailsHibernateUtil.ORDER_DESC : GrailsHibernateUtil.ORDER_ASC;
93	int lastPropertyPos = sort.lastIndexOf('.');
94	String associationForOrdering = lastPropertyPos >= 0 ? sort.substring(0, lastPropertyPos) : null;
95	if (associationForOrdering != null && aliasMap.containsKey(associationForOrdering)) {
96	addOrder(criteria, aliasMap.get(associationForOrdering) + "." + sort.substring(lastPropertyPos + 1),
97	order, ignoreCase);
98	// remove sort from arguments map to exclude from default processing.
99	@SuppressWarnings("unchecked") Map argMap2 = new HashMap(argMap);
100	argMap2.remove(GrailsHibernateUtil.ARGUMENT_SORT);
101	argMap = argMap2;
102	}
103	}
104	GrailsHibernateUtil.populateArgumentsForCriteria(grailsApplication, targetClass, criteria, argMap);
105	GrailsHibernateTemplate ght = new GrailsHibernateTemplate(sessionFactory, grailsApplication);
106	PagedResultList pagedRes = new PagedResultList(ght, criteria);
107	result = pagedRes;
108	}
109	else {
110	result = criteria.list();
111	}
112	}
113	else {
114	result = GrailsHibernateUtil.unwrapIfProxy(criteria.uniqueResult());
115	}
116	if (!participate) {
117	hibernateSession.close();
118	}
119	return result;
120	}
121
122	if (criteria == null) createCriteriaInstance();
123
124	MetaMethod metaMethod = getMetaClass().getMetaMethod(name, args);
125	if (metaMethod != null) {
126	return metaMethod.invoke(this, args);
127	}
128
129	metaMethod = criteriaMetaClass.getMetaMethod(name, args);
130	if (metaMethod != null) {
131	return metaMethod.invoke(criteria, args);
132	}
133	metaMethod = criteriaMetaClass.getMetaMethod(GrailsClassUtils.getSetterName(name), args);
134	if (metaMethod != null) {
135	return metaMethod.invoke(criteria, args);
136	}
137
138	if (isAssociationQueryMethod(args) \|\| isAssociationQueryWithJoinSpecificationMethod(args)) {
139	final boolean hasMoreThanOneArg = args.length > 1;
140	Object callable = hasMoreThanOneArg ? args[1] : args[0];
141	int joinType = hasMoreThanOneArg ? (Integer)args[0] : CriteriaSpecification.INNER_JOIN;
142
143	if (name.equals(AND) \|\| name.equals(OR) \|\| name.equals(NOT)) {
144	if (criteria == null) {
145	throwRuntimeException(new IllegalArgumentException("call to [" + name + "] not supported here"));
146	}
147
148	logicalExpressionStack.add(new LogicalExpression(name));
149	invokeClosureNode(callable);
150
151	LogicalExpression logicalExpression = logicalExpressionStack.remove(logicalExpressionStack.size()-1);
152	addToCriteria(logicalExpression.toCriterion());
153
154	return name;
155	}
156
157	if (name.equals(PROJECTIONS) && args.length == 1 && (args[0] instanceof Closure)) {
158	if (criteria == null) {
159	throwRuntimeException(new IllegalArgumentException("call to [" + name + "] not supported here"));
160	}
161
162	projectionList = Projections.projectionList();
163	invokeClosureNode(callable);
164
165	if (projectionList != null && projectionList.getLength() > 0) {
166	criteria.setProjection(projectionList);
167	}
168
169	return name;
170	}
171
172	final PropertyDescriptor pd = BeanUtils.getPropertyDescriptor(targetClass, name);
173	if (pd != null && pd.getReadMethod() != null) {
174	ClassMetadata meta = sessionFactory.getClassMetadata(targetClass);
175	Type type = meta.getPropertyType(name);
176	if (type.isAssociationType()) {
177	String otherSideEntityName =
178	((AssociationType) type).getAssociatedEntityName((SessionFactoryImplementor) sessionFactory);
179	Class oldTargetClass = targetClass;
180	targetClass = sessionFactory.getClassMetadata(otherSideEntityName).getMappedClass(EntityMode.POJO);
181	if (targetClass.equals(oldTargetClass) && !hasMoreThanOneArg) {
182	joinType = CriteriaSpecification.LEFT_JOIN; // default to left join if joining on the same table
183	}
184	associationStack.add(name);
185	final String associationPath = getAssociationPath();
186	createAliasIfNeccessary(name, associationPath,joinType);
187	// the criteria within an association node are grouped with an implicit AND
188	logicalExpressionStack.add(new LogicalExpression(AND));
189	invokeClosureNode(callable);
190	aliasStack.remove(aliasStack.size() - 1);
191	if (!aliasInstanceStack.isEmpty()) {
192	aliasInstanceStack.remove(aliasInstanceStack.size() - 1);
193	}
194	LogicalExpression logicalExpression = logicalExpressionStack.remove(logicalExpressionStack.size()-1);
195	if (!logicalExpression.args.isEmpty()) {
196	addToCriteria(logicalExpression.toCriterion());
197	}
198	associationStack.remove(associationStack.size()-1);
199	targetClass = oldTargetClass;
200
201	return name;
202	}
203	}
204	}
205	else if (args.length == 1 && args[0] != null) {
206	if (criteria == null) {
207	throwRuntimeException(new IllegalArgumentException("call to [" + name + "] not supported here"));
208	}
209
210	Object value = args[0];
211	Criterion c = null;
212	if (name.equals(ID_EQUALS)) {
213	return eq("id", value);
214	}
215
216	if (name.equals(IS_NULL) \|\|
217	name.equals(IS_NOT_NULL) \|\|
218	name.equals(IS_EMPTY) \|\|
219	name.equals(IS_NOT_EMPTY)) {
220	if (!(value instanceof String)) {
221	throwRuntimeException(new IllegalArgumentException("call to [" + name + "] with value [" +
222	value + "] requires a String value."));
223	}
224	String propertyName = calculatePropertyName((String)value);
225	if (name.equals(IS_NULL)) {
226	c = Restrictions.isNull(propertyName);
227	}
228	else if (name.equals(IS_NOT_NULL)) {
229	c = Restrictions.isNotNull(propertyName);
230	}
231	else if (name.equals(IS_EMPTY)) {
232	c = Restrictions.isEmpty(propertyName);
233	}
234	else if (name.equals(IS_NOT_EMPTY)) {
235	c = Restrictions.isNotEmpty(propertyName);
236	}
237	}
238
239	if (c != null) {
240	return addToCriteria(c);
241	}
242	}
243
244	throw new MissingMethodException(name, getClass(), args);
245	}
246	// ...

I do really hope never to have to understand such code… I’d be curious to find how would such an automated test be written for this invokeMethod, as I couldn’t find the tests in this project.

What is wrong with this code?

Back to the original implementation, what would be wrong with such code?

It takes a lot of time for understand the code (the read cost);
It is hard to test;
Parsing the query is too much responsibility the SolrService class;
If you just need some way of indexing and searching results in stored documents, you shouldn’t be relying in a specific solution, like Solr. If you decide later to change your Search solution from Solr to Elastic Search or using Lucene directly, you’ll need to change a lot of code. It would be better to have a wrapper for something simple like this;
It can become hard to change/maintain/debug;

Even if you try to split processQuery into smaller methods, you would be required to pass some common values over and over again, like query and the expressions array, that would not only be an in-parameter but would be an out-parameter too as it would have to be changed inside some methods… When that happens, it is a hint that the overall code needs a separate class for doing the job. This is one of the simple rules I’ve learned in Clean Code.

The simple rules of Clean Code

The top-bottom code writing approach

While reading the original example, the first thing you’ll see is the processQuery method declared in the SolrService class. What does it do? Why do we need it? Who is using it? Only when we look forward, we’ll be able to detect that it is being used from the search method.

I was always used to write code that way, writing the least dependent methods first and the higher level ones as the latest ones. I guess I thought they should be declared first before they could be mentioned. Maybe that was true for some procedural languages I’ve started with before my first experience with OOP while reading a book about C++.

But in all OO languages I know about, it is ok to declare your methods in any order. Writing them top down makes it easier for another reader to understand your code because he/she will read your high-level instructions first.

Avoid more than 5 lines in a single method

Keeping your methods really small will make it easier to understand them and to write unit tests against them too. They’ll also be less error-prone.

Avoid methods with more than 2 or 3 parameters

Having lots of parameters in methods makes it really complicate to associate what is the meaning of each parameter. Looking at this code, could you understand what is the meaning of the last parameters?

1	request.setAction(ACTION.POST, true, true, 10, false)

You’d certainly have to checkout the API for AbstractUpdateRequest.

Avoid out-parameters

This is a typical example where you’d probably be better served by a separate class.

When you find out some situation where you’d like to return multiple values (I’m not talking about returning a single list, here) and you need some parameter for returning them (and out-parameter), you should reconsider if you’re taking the right path.

Also, you should really try to avoid modifying any parameter as debugging such code can be really frustrating.

Provide good names for your variables, classes and methods

This one is gold. Good names are essential for a good code reading experience. It can save you several hours trying to understand some snippet of code.

Take a look at the signature of the invokeMethod method in the Grails-Hibernate integration example code:

1	Object invokeMethod(String name, Object obj)

Wouldn’t it be easier to understand what it does if the signature was changed to this one?

1	Object invokeMethodWith(String methodName, Object methodArguments)
2
3	// code would look like (just supposing, I'm not sure):
4	criteria.invokeMethodWith("eq", [attributeName, expectedValue])

What does “obj” mean in the actual implementation? It could be anything with such generic description. Investing some time choosing good names for your methods and variables can save a lot of time from others trying to understand what the code does.

The result of applying such simple rules

Just by making use of those simple rules, you’ll be able to:

Easily read your code;
Easily write your unit tests;
Easily modify and evolve your logic;
Have a well-documented code;
Get rid of some otherwise hard-to-find bugs.

Some extra rules

Some rules I’ve being using for my entire life and I’m not sure if they are all documented in the Clean Code book or not. But I’d like to talk a bit about them too.

Don’t use deep-depth blocks for handling validation rules

I’ve seen pseudo-code like this, so many times:

1	declare square_root(number) {
2	if (number >= 0) {
3	do_real_calculations_with(number)
4	}
5	}

Often, there are even more validation rules inside each block and this style gets really hard to read. And, worse than that, it is only protecting the software from crashing or generating an unexpected exception, but it does not properly handle bad inputs (negative numbers).

Also, usually do_real_calculations_with(number) is written as pages of code in a way you won’t be able to see the enclosing brackets of the block in a single page. Take a look again at the Hibernate-Grails integration code to see if you can easily find out where the block beginning at “if (isCriteriaConstructionMethod(name, args)) {” ends.

Even when you don’t have to do anything if the necessary conditions are not met, I’d rather code this way:

1	declare square_root(number) {
2	if (number < 0) return // or raise "Taking the square root of negative numbers is not supported by this implementation"
3	do_real_calculations_with(number)
4	}

This is a real example found in PersistentManagerBase.java from the Tomcat project.

1	protected void processMaxIdleSwaps() {
2
3	if (!getState().isAvailable() \|\| maxIdleSwap < 0)
4	return;
5
6	Session sessions[] = findSessions();
7	long timeNow = System.currentTimeMillis();
8
9	// Swap out all sessions idle longer than maxIdleSwap
10	if (maxIdleSwap >= 0) {
11	for (int i = 0; i < sessions.length; i++) {
12	StandardSession session = (StandardSession) sessions[i];
13	synchronized (session) {
14	if (!session.isValid())
15	continue;
16	int timeIdle = // Truncate, do not round up
17	(int) ((timeNow - session.getThisAccessedTime()) / 1000L);
18	if (timeIdle > maxIdleSwap && timeIdle > minIdleSwap) {
19	if (session.accessCount != null &&
20	session.accessCount.get() > 0) {
21	// Session is currently being accessed - skip it
22	continue;
23	}
24	if (log.isDebugEnabled())
25	log.debug(sm.getString
26	("persistentManager.swapMaxIdle",
27	session.getIdInternal(),
28	Integer.valueOf(timeIdle)));
29	try {
30	swapOut(session);
31	} catch (IOException e) {
32	// This is logged in writeSession()
33	}
34	}
35	}
36	}
37	}
38	}

It is hard to see what bracket is closing which bracket in the end… This could be rewritten as:

1	...
2	if (maxIdleSwap < 0) return;
3	for (int i = 0; i < sessions.length; i++) {
4	...
5	if (timeIdle <= maxIdleSwap \|\| timeIdle < minIdleSwap) continue;
6	if (session.accessCount != null && session.accessCount.get() > 0) continue;
7	...

Simple code should be handled first

The pattern is:

1	if some_condition
2	lots of lines of complex code handling here
3	else
4	simple handling for the case where some_condition is false

Here is a concrete example taken from ActiveRecord::Explain.

1	def logging_query_plan # :nodoc:
2	threshold = auto_explain_threshold_in_seconds
3	current = Thread.current
4	if threshold && current[:available_queries_for_explain].nil?
5	begin
6	queries = current[:available_queries_for_explain] = []
7	start = Time.now
8	result = yield
9	logger.warn(exec_explain(queries)) if Time.now - start > threshold
10	result
11	ensure
12	current[:available_queries_for_explain] = nil
13	end
14	else
15	yield
16	end
17	end

I would rather write such code as:

1	def logging_query_plan # :nodoc:
2	threshold = auto_explain_threshold_in_seconds
3	current = Thread.current
4	return yield unless threshold && current[:available_queries_for_explain].nil?
5	queries = current[:available_queries_for_explain] = []
6	start = Time.now
7	result = yield
8	logger.warn(exec_explain(queries)) if Time.now - start > threshold
9	result
10	ensure
11	current[:available_queries_for_explain] = nil
12	end

Of course, this isn’t exactly the same as the original code in the case yield generates some exception for the “else” code, but I’m sure this could be worked around.

Don’t handle separate exceptions when you don’t need to

I’ve often found this pattern while reading Java code and I believe that is the result of using some Java IDE. The IDE will tell the developer that some exceptions were not handled and will automatically fill the code as:

1	void myMethod() throws MyOwnException {
2	try {
3	someMethod()
4	}
5	catch(FileNotFoundException ex) {
6	throw MyOwnException("File was not found")
7	}
8	catch(WrongPermissionException ex) {
9	throw MyOwnException("You don't have the right permission to write to the file")
10	}
11	catch(CorruptFileException ex) {
12	throw MyOwnException("The file is corrupted")
13	}
14	...
15	}

If you’re only interested in gracefully handle exceptions to give your user a better feedback, why doesn’t you just write this instead:

1	void myMethod() throws MyOwnException {
2	try {
3	someMethod()
4	} catch(Exception ex) {
5	log.error("Couldn't perform XYZ action", ex)
6	throw new MyOwnException("Sorry, couldn't perform XYZ action. Please contact our support team and we'll investigate this issue.")
7	}
8	}

The original challenge actual implementation

And, finally, following those techniques, here is how I actually coded that original challenge and implemented the tests in JUnit:

1	class SearchService {
2	...
3	def search(query) {
4	query = new QueryProcessor(query).processedQuery
5	...
6	new SearchResult(solrServer.request(new SolrQuery(query)))
7	}
8	}

I’ll omit the implementation of SearchResult class, as it is irrelevant to this specific challenge. I just want to point out that I’ve abstracted the search feature in some wrapper classes for not exposing Solr internals.

And here is the real implementation code:

1	package myappname.search
2
3	/* Solr behaves in an uncommon way:
4	Even when configured for making an "AND" search, when a signal (+ or -)
5	is prepended to any word, the ones that are not prepended are considered optionals.
6	We don't want that, so we're prefixing all terms with a "+" unless they're already
7	prefixed.
8	*/
9	class QueryProcessor {
10	private query, expressions = [], words = []
11
12	QueryProcessor(query) { this.query = query }
13
14	def getProcessedQuery() {
15	removeHashesFromQuery()
16	extractParenthesis()
17	extractQuotedText()
18	splitWords()
19	addPlusSignToUnsignedWords()
20	joinProcessedWords()
21	replaceExpressions()
22	query
23	}
24
25	private removeHashesFromQuery() { query = query.replaceAll('#', '') }
26
27	private extractParenthesis() {
28	def matches = query =~ /$[^\(]*?$/
29	if (!matches) return
30	replaceMatches(matches)
31	// keep trying in case of nested parenthesis
32	extractParenthesis()
33	}
34
35	private replaceMatches(matches) {
36	matches.each {
37	expressions << it
38	query = query.replace(it, "#{${expressions.size()}}".toString())
39	}
40	}
41
42	private extractQuotedText() {
43	replaceMatches(query =~ /\".*?\"/)
44	}
45
46	private splitWords() {
47	words = query.split(' ').findAll{it}
48	}
49
50	private addPlusSignToUnsignedWords() {
51	words = words.collect { word ->
52	word[0] in ['-', '+'] ? word : "+${word}"
53	}
54	}
55
56	private joinProcessedWords() { query = words.join(' ') }
57
58	private replaceExpressions() {
59	def s = expressions.size()
60	expressions.reverse().eachWithIndex { expression, i ->
61	query = query.replace("#{${s - i}}", expression)
62	}
63	}
64	}

And the unit tests:

1	package myappname.search
2
3	import org.junit.*
4
5	class QueryProcessorTests {
6	@Test
7	void removeHashesFromQuery() {
8	def p = new QueryProcessor('some#hashes # in # query')
9	p.removeHashesFromQuery()
10	assert p.query == 'somehashes in query'
11	}
12
13	@Test
14	void extractParenthesis() {
15	def p = new QueryProcessor('(abc (cde fgh)) no parenthesis transaction_id:(ijk) (lmn)')
16	p.extractParenthesis()
17	assert p.query == '#{4} no parenthesis transaction_id:#{2} #{3}'
18	assert p.expressions == ['(cde fgh)', '(ijk)', '(lmn)', '(abc #{1})']
19	}
20
21	@Test
22	void extractQuotedText() {
23	def p = new QueryProcessor('some "quoted" text and "some more"')
24	p.extractQuotedText()
25	assert p.query == 'some #{1} text and #{2}'
26	assert p.expressions == ['"quoted"', '"some more"']
27	}
28
29	@Test
30	void splitWords() {
31	def p = new QueryProcessor('some #{1} text and id:#{2} ')
32	p.splitWords()
33	assert p.words == ['some', '#{1}', 'text', 'and', 'id:#{2}']
34	}
35
36	@Test
37	void addPlusSignToUnsignedWords() {
38	def p = new QueryProcessor('some #{1} -text and id:#{2} +text ')
39	p.splitWords()
40	p.addPlusSignToUnsignedWords()
41	assert p.words == ['+some', '+#{1}', '-text', '+and', '+id:#{2}', '+text']
42	}
43
44	@Test
45	void joinProcessedWords() {
46	def p = new QueryProcessor('')
47	p.words = ['+some', '-minus', '+#{1}']
48	p.joinProcessedWords()
49	assert p.query == "+some -minus +#{1}"
50	}
51
52	@Test
53	void replaceExpressions() {
54	def p = new QueryProcessor('+#{1} -minus +transaction_id:#{2}')
55	p.expressions = ['first', '(23 or 98)']
56	p.replaceExpressions()
57	assert p.query == '+first -minus +transaction_id:(23 or 98)'
58	}
59
60	@Test
61	void processedQuery() {
62	def p = new QueryProcessor('coca-cola -pepsi transaction_id:(34 or 76)')
63	assert p.processedQuery == '+coca-cola -pepsi +transaction_id:(34 or 76)'
64	}
65	}

Conclusion

That is it. I’d like you to share your opinions on other techniques I may have not talked about here. Are there any improvements that you think would make this code even easier to understand? I’d really appreciate any other considerations you might have since I’m always very interested in writing Clean Code.

Facebook, Twitter, Google+ - something is still missing

2017-10-02T08:00:00+00:00

What is missing in all those social networking media? Facebook, Orkut, Twitter, Google+?

Twitter

I’ll start with specific issues for Twitter and then I’ll discuss the major general issue none of them have managed to fix yet.

Being able to fit an idea in just a few chars means it can’t be significant. Yet there are lots of people trying to express their political opinions on Twitter as if had any meaningful value.

Twitter could be an useful platform if was basic a set of article titles followed by a link. Something like Reddit, but instead of subscribing to topics (subreddit) one would subscribe to some people’s suggested articles. I could certainly use Twitter if that was how it worked.

The big issue with all of them: lack of proper filtering by tag

It seems all social media didn’t realize yet that people are interested in multiple subjects, but not in all of them.

David Heinemeier Hansson (DHH) seems to be interested in Ruby, Rails, programming and car racing, for example. David Chelimsky seems to be interested in Ruby and Choro (a Brazilian music genre).

Maybe both Davids would be interested in listening to each other’s opinion on Ruby and programming, but I’d suspect Chelimsky wouldn’t be interested in what DHH has to say about car racing as much as DHH is probably not interested in videos from Chelimsky playing the cavaco (a Brazilian instrument typically used in Choro and Samba).

These days everyone has a strong opinion on many political topics, such as liberalism/communism, feminism, left/right, immigration, abortion, religion or whatever trending subject. We’re all specialists in everything and we get angry when our friends expresses themselves with an opposite point of view. Sometimes that’s enough for completely breaking the relationship.

This leads to a really toxic environment, since people are not interested in arguing at all. They already have an strong opinion and they think they will be able to change other’s opinion with their arguments but that never happens in practice. All they get is an hostile environment.

Just like David Chelimsky, I do also love Choro and Samba and several of my friends are related to those genres and we often meet each other to play Choro or Samba. That’s how we met in the first place. Then I connected to them in Facebook and that’s when certain problems arise.

Several of them are big supporters of Lula, Brazilian’s president between 2002 and 2010, while I never supported him. I always found him to be a liar and corrupt and have always expressed this way in Facebook. As a result I lost some of those friends that didn’t tolerate my opinions on politics. On the other side we never had any kind of problems when playing together in a Choro or Samba session.

Social medias should be able to understand how toxic an environment could become if we don’t filter what we’re going to say to other people. Or they seem to get it in the opposite way of how I think things should work.

Facebook and Google+ allows one to group their connections. So you’re able to group them by topic like Ruby, Programming, Choro, Politics and so on. That could possibly fix the issue, but there’s a problem. How can you possibly know who would be actually interested in what you have to say regarding each topic. It could be a wild guess.

It should work the other way around. Whenever publishing something we would tag the subject(s) of the post from a list of tags we maintain. So, Chelimsky would be able to see that DHH provides 2 tags: #racing and #programming. He might choose to subscribe to #programming. By doing that he wouldn’t see in his timeline any posts by DHH related to racing or any other general subject except for programming related ones. DHH on the other side would be able to subscribe to the Ruby tag from Chelimsky and wouldn’t see videos of Choro sessions from Chelimsky in his timeline. Since I’m both interested in Ruby and Choro I might not filter Chelimsky’s posts at all.

Sometimes the filter works the other way around. Rather than willing to filter in specific topics we’d want to be able to filter out some tags. Maybe I’m interested in all activities from my friends except for their opinion on politics. So it would be quite useful if we could “flag” what we’d consider SPAM basically. I know some people who love to post comic/fun posts. I don’t usually get the fun out of it, so if they tagged such posts as #joke I could opt to filter out any post tagged that way.

The lack of such tag subscription/filtering mechanism leads to a very toxic environment with lots of unnecessary anger and a very polluted timeline. The result being a loss of interest in social media as it wastes a lot of our time, while providing very little value. There are so many jokes or political discussions that we often miss what our friends are actually doing.

And the existing social media networks get it very close as they allow people to tag their posts. But they don’t allow us to actually see filtered content only. Once they fix this missing bit I think it will make all the difference.

Testing JavaScript with Node.js, Jasmine and Sinon.js

2011-11-03T13:27:00+00:00

For some years now, I’ve been writing lots of JavaScript. Not that I chose to, but it is the only available language for client-side programming. Well, not really since there are some languages that will compile to JavaScript. So, I chose to work with CoffeeScript lately, since it is far better than JavaScript for my tastes.

All this client-side programming requires testing too. While sometimes testing using real browsers suits better, tools like Selenium are extremely slow if you have tons of JavaScript to test. So, I was looking for a faster alternative that allowed me to test my client-side code.

Before I present the approach I decided to take, I’d like to warn you that there are lots of good alternatives out there. If you want to take a look at how to use the excellent PhantomJS headless webkit browser, you might be interested in this article.

I decided to go with a solution based on Node.js, a fast runtime JavaScript environment built on top of Google’s V8 engine. Even using Node.js, you’ll find out many good alternatives like Zombie.js, which can also be integrated to the excellent integration test framework Capybara through capybara-zombie. It can also be integrated to Jasmine through zombie-jasmine-spike.

Even though there are great options out there, I still chose another approach for no special reason. The interesting thing about Node.js, is that there’s an interesting ecosystem behind it with tools like NPM which is a package manager for Node, similar to apt on Debian, for instance. On Debian, it can be installed with:

1	apt-get install -y node npm

But I would recommend installing just node through apt, and install npm using the instructions here:

1	curl http://npmjs.org/install.sh \| sh

The reason for that is that the search command of the npm command provided by the Debian package was not working for me, running the list command instead. Maybe this happens only in the unstable distribution, but I don’t want to get out of the main subject here.

Since we want to test our client-side script, it is necessary to install some library to emulate the browsers DOM, since Node won’t provide one itself. The jsdom library seems to be the de facto standard one for creating a DOM environment.

I don’t really like to read assertions, prefering expectations instead. If you’re like me, you’ll like the Jasmine.js library for writing your expectations in JavaScript. If you don’t want to write integration tests, chances are that you’ll need to mock your AJAX calls. Sinon.js is an excellent framework that will allow you to do that. And since I avoid JavaScript itself at all cost, I’ll write all my examples using CoffeeScript.

If your web framework, differently from Rails doesn’t support CoffeeScript by default, and still you got an interest on this language, you can use Jitter to watch your CoffeeScript files and convert them to JavaScript on the fly. It will replicate your directory structure, converting all your .coffee files to .js:

1	jitter src/coffee/ web-app/js/

Install all those dependencies with NPM:

1	npm install jitter jasmine-node jsdom

Although you can install jQuery and Sinon.js with ‘npm install jquery sinon’, that won’t make sense, since you’ll want to load them from your DOM environment. So download Sinon.js to your hard-disk to get faster tests.

I don’t practice TDD (or BDD) and I this is a conscious choice. I find it faster to write the implementation first and then write the tests. So, proceeding with this approach, let me show you an example for a “Terms and Conditions” page. Here’s a possible implementation (I’m showing only the client-side part):

1
2
3
4
5	rel="stylesheet" type="text/css" href="css/jquery.ui.css">
6
7
8
9
10
11
12

Showdown is a JS library for converting Markdown to HTML. Here is the show-terms-and-conditions.coffee equivalent in CoffeeScript:

1	$ ->
2	converter = new Attacklab.showdown.converter()
3	lastTermsAndConditions = {}
4	$.get 'termsAndConditions/lastTermsAndConditions', (data) ->
5	lastTermsAndConditions = data
6	$(' ').html(converter.makeHtml(lastTermsAndConditions.termsAndConditions))
7	.dialog
8	width: 800, height: 600, modal: true, buttons:
9	'I agree': onAgreement, 'Log out': onLogout
10
11	onAgreement = ->
12	$.post 'termsAndConditions/agree', id: lastTermsAndConditions.id, =>
13	$(this).dialog('close')
14	window.location = '../' # redirect to home
15
16	onLogout = ->
17	$(this).dialog('close')
18	window.location = '../logout' # sign out

As you can see, this will issue an AJAX request as soon as the page is loaded. So, we need to fake the AJAX call before we run show-terms-and-conditions.js. This can be easily done with this fake-ajax.js, using Sinon.js:

1	sinon.stub($, 'ajax')

If you’re not using jQuery, you can try the “sinon.useFakeXMLHttpRequest()” documented in the “Fake XHR” example in Sinon.js site.

Ok, so here is a possible example of specification for this code in CoffeeScript. Jasmine-sinon can help you to write better expectations, so download it to ‘spec/js/jasmine-sinon.js’.

1	# spec/js/show-terms-and-conditions.spec.coffee:
2
3	require './jasmine-sinon' # wouldn't you love if vanilla JavaScript also supported 'require'?
4	dom = require 'jsdom'
5
6	#f = (fn) -> __dirname + '/../../web-app/js/' + fn # if you prefer to be more explicit
7	f = (fn) -> '../../web-app/js/' + fn
8
9	window = $ = null
10
11	dom.env
12	html: '' # or require('fs').readFileSync("#{__dirname}/spec/fixures/any.html").toString()
13	scripts: ['sinon.js', f('jquery/jquery.min.js'), f('jquery/jquery-ui.min.js'), f('wmd/showdown.js'), 'ajax-faker.js',
14	f('showTermsAndConditions.js')]
15	# src: ["console.log('all scripts were loaded')", "var loaded=true"]
16	done: (errors, _window) ->
17	console.log("errors:", errors) if errors
18	window = _window
19	$ = window.$
20	# jasmine.asyncSpecDone() if window.loaded
21
22	# We must tell Jasmine to wait until the DOM is loaded and the script is run
23	# Jasmine doesn't support a beforeAll, like RSpec
24	beforeEach(-> waitsFor -> $) unless $
25	# another approach: (you should uncomment the line above for it to work)
26	# already_run = false
27	# beforeEach -> already_run \|\|= jasmine.asyncSpecWait() or true
28
29	describe 'showing Terms and Conditions', ->
30
31	it 'should get last Terms and Conditions', ->
32	@after -> $.ajax.restore() # undo the stubbed ajax call introduced by fake-ajax.js after this example.
33	expect($.ajax).toHaveBeenCalledOnce()
34	firstAjaxCallArgs = $.ajax.getCall(0).args[0]
35	expect(firstAjaxCallArgs.url).toEqual 'termsAndConditions/lastTermsAndConditions'
36	firstAjaxCallArgs.success id: 1, termsAndConditions: '# title'
37
38	describe 'after set-up', ->
39	beforeEach -> window.sinon.stub $, 'ajax'
40	afterEach -> $.ajax.restore()
41	afterEach -> $('.ui-dialog').dialog 'open' # it is usually closed at the end of each example
42
43	it 'should convert markdown to HTML', -> expect($('h1').text()).toEqual 'title'
44
45	it 'should close the dialog, send a request to server and redirect to ../ when the terms are accepted', ->
46	$('button:contains(I agree)').click()
47	ajaxRequestArgs = $.ajax.args[0][0]
48	expect(ajaxRequestArgs.url).toEqual 'termsAndConditions/agree'
49	expect(ajaxRequestArgs.data).toEqual id: 1
50
51	ajaxRequestArgs.success()
52	expect(window.location).toEqual '../'
53	expect($('.ui-dialog:visible').length).toEqual 0
54
55	it 'should close the dialog and redirect to ../logout when the terms are not accepted', ->
56	# the page wasn't really redirected in this simulation by the prior example
57	$('button:contains(Log out)').click()
58	expect(window.location).toEqual '../logout'
59	expect($('.ui-dialog:visible').length).toEqual 0

You can run this spec with:

1	jasmine-node --coffee spec/js/

The output should be something like:

1	Started
2	....
3
4	Finished in 0.174 seconds
5	2 tests, 9 assertions, 0 failures

Instead of writing “expect($(‘.ui-dialog:visible’).length).toEqual 0”, BDD would advice you to write “expect($(‘.ui-dialog’)).toBeVisible()” instead. Jasmine allows you to write custom matchers. Take a look at my jQuery matchers for an example.

Unfortunately, due to a bug in jsdom, the expected implementations of toBeVisible and toBeHidden won’t work for my cases, where I usually do that by toggling the hidden CSS class (.hidden {display: none}) of my elements. So, I check for this CSS class on my jQuery matchers.

Anyway, I’m just starting to write tests this way. Maybe there are better ways of writing tests like those.

Finally, if you want, you can also set up some auto-testing environment using a tool such as Guard that will watch your JavaScript (or CoffeeScript) files for changes and call jasmine-node on them. Here is an example Guardfile:

1	guard 'jasmine-node', jasmine_node_bin: File.expand_path("#{ENV['HOME']}/node_modules/jasmine-node/bin/jasmine-node") do
2	watch(%r{^(spec/js/[^\.].+\.spec\.coffee)}) { \|m\| m[1] }
3	watch('spec/js/jasmine-sinon.js'){ 'spec/js/' }
4	end

If you have any tips, please leave a comment.

Enjoy!

Adding parts of a modified file to git stage

2011-08-13T15:35:00+00:00

Have you always wanted to add just part of your modified file to the index stage?

Usually, that happens when you’re working in a feature or bug and then realizes another issue in the file. It could be another bug, interesting feature, documentation, comment or just code formatting.

If you’re like me, you won’t include both modifications into a single commit. Then what to do?

What I used to do when I realized this before actually fixing that bug was calling “git stash”, fix the bug and “git stash pop”. This works well for simple fixes, if you didn’t change your database, so that the application will continue to work after “git stash”.

But what if you have already fixed the code? You could undo the fix, save the file, add it to index, and then redo the fix. Believe me, I’ve done that several times.

But I won’t do it anymore! Don’t worry, I’ll keep my commits separate. It’s just that I found a better way of doing this: “git add -e” (or “git add -p” and choosing the “e” option). Go try it if you don’t know this already. Much easier to try it than to try to explain it! ;) Also “git help add” will explain it better than me. See EDITING PATCHES section.

Why I Prefer Rails over Grails

2011-08-07T21:20:00+00:00

I’ve been willing to write such an article for 2 years now. A recent thread in Grails users mailing list triggered the initiative to finally write it. Actually, I was replying a message but it became too big and I decided to take the chance to write an article on the subject.

Should I use Grails?

That was the thread subject. And the text following is my answer.

I’ve been working with Grails for more than 2 years now. Before that, I learned Rails in 2007 and like it. I didn’t move to Grails because I love Grails though.

I moved because I changed my job and Grails was used in the new job. I’ve changed my job again last month, initially to work with Rails but then, when they found out that I also knew Groovy and Grails, they decided to offer me another Grails opportunity.

So here I am, working with Grails for probably more two years at least I would guess… Since 2007, I never stopped watching Rails or Ruby closely, so I think I’m pretty able to compare both.

Then, I would say that choosing between them will depend on what you want to achieve. If you want to run your application in a Java web container, maybe Grails is the way to go. I’ve never deployed a Rails application with JRuby and Warbler, so I’m just guessing.

If you just want to be able to integrate your web application to your legacy Java code, than both Groovy and JRuby will allow you to do that easily. Differently from Groovy, though, JRuby will allow you to “require” jar’s at run-time easily. But maybe Grails has better integration with Maven. Again, I say maybe because I never tried to do that with the JRuby + Warbler approach besides really simple experiments.

If you just want to write web applications, than you’re in the situation as me and I can help you more on that.

Let me explain to you what are the reasons I prefer Rails myself and what I don’t like in Grails. I invite all Grails community to participate in this discussion and help alleviate the shortcomings perceived by me about Grails.

Testing

I don’t know if that is your case, but I don’t even consider writing a new application without a good test coverage. Unfortunately I was not given the opportunity to do that yet because the companies I worked with didn’t want to give me time for writing the tests.

Unfortunately, this seems to be a common approach in Grails community as most of the plugins I used didn’t have test coverage, so I guess my companies were not alone. In the other side, it is a strong practice of Rubysts to write tests for their code, including most plugins available. Also, the Rails code base itself has a great test coverage. In the other side I’ve experienced some bugs in Grails like runtime dependencies added to BuildConfig.groovy not being included in the war in previous releases which suggests me that its test coverage is not comparable with the Rails' one.

Then, if you search for books written entirely about tests for Rails, you’ll find lots of them:

Also, testing uses to be one of the first chapters in almost Rails book, reflecting the importance that Ruby and Rails users give to automated testing.

Also, there are tons of projects dedicated to some part of test creation for Ruby:

In the other side, I didn’t find a single book specialized in testing Grails applications. I’ve only seen a single small chapter about testing in Grails in some Grails books. Also, there are lots of great articles and tutorials on Rails testing while I can’t find good resources on Grails testing.

Since I prefer specifications over assertions, I started to write some tests for Grails with EasyB. But its documentation and features can’t be compared with the Rspec one. Also, I don’t find so many alternatives in the Groovy world yet. I have some problems with EasyB, but it was the best I could find and that’s what I’ve being using for testing Groovy and Grails code.

Also, while I can write unit tests for Rails that can actually touch the database, this is not possible with Grails. Grails will force me to use mocks in unit tests. But if part of the logic involves direct queries to the database, which is almost always my situation, then I’m forced to use integration tests for all my tests which, added to the slow boot time for Grails applications, make test writing a very slow task. Also, writing an integration test when actually I want to unit test my class just because of a Grails limitation doesn’t seem right for me.

Documentation

Grails documentation is usually sparse with references to Hibernate’s documentation, Spring’s documentation, Shiro’s documentation etc. While I agree that using existent libraries is a good thing, I also like to see a well organized and comprehensive documentation instead of jumping between several sites, each one using a different documentation organization and style. Specially when most of them are crappy for my taste.

In the other side, I usually find great documentation for Rails and its several available plugins with concise information showing how to use them in a glance.

Speed of development

Class automatic reloading

This seems to be changing in Grails 2.0, but for the last 2 years I’ve had enormous trouble writing Grails application because every change I make in my domain classes (which I do often), Grails will restart my application, loosing any session and spending a lot of time in the rebooting process. This really slows down the development time. This also happens to classes under src/ while doesn’t happen to controllers and GSP’s.

Necessary time for booting

Compare the time of booting a fresh Grails application with booting a Rails one. Rails will make the application available barely instantly. This becomes more annoying when Grails will insist in rebooting after changing some classes and while the application gets bigger or when you do lots of processing in Bootstrap. In Rails, this is super fast in development mode because of the Ruby autoload feature that will allow you to lazily evaluate your classes.

Language API and features

Groovy API is based in Java API, which was badly designed in my opinion. Ruby, differently from Java, will have Date, DateTime and Time classes, for instance. Java, in the other side, has java.util.Date and java.sql.TimeStamp, etc. I’ve seen people arguing that it’s because Ruby is much newer, but actually both languages were born in 1995.

The Ruby API is also very well written in my opinion and has also great documentation. Everything fits great in Ruby while Groovy tries to make some methods simpler adding methods to standard Java classes but still it is built on top of Java’s API, which means it couldn’t be as well integrated and well-thought as one that was built specifically considering the language features from the beginning.

With regards to the language itself, I really prefer the Ruby way of monkey-patching (reopening classes) and its way of writing meta-programming. Specially, I love Ruby modules and the concept of mixins (instead of supporting multiple inheritance), while I don’t think there’s something like that in Groovy.

Also, I don’t understand why Groovy created a new syntax (“”“ - triple quotes) for multi-line strings instead of allowing multi-line strings using single quotes just like Ruby. On the other hand, I don’t like the fact that Ruby doesn’t support multi-line comment like most languages (no, don’t tell me that =begin and =end were really intended to be used as multi-line comments).

Dependency management

Ruby had RubyGems for a long time for managing dependencies and easily install gems (libraries, programs). There’s a huge repository of Ruby gems. Java has Maven, but Maven doesn’t allow you to specify “hibernate > 3.6”. You need to be specific.

And then, Maven will try to solve conflicts if you need a dependency that depends in Hibernate 3.6.5 and another one that depends on Hibernate 3.6.6. And Maven will not always be able to solve this dependency well.

In Ruby, suppose one gem depends on “hibernate >= 3.6” and another one depends on “hibernate = 3.6.6”. Then RubyGems will be able to choose hibernate 3.6.6. But what if your application depends on latest gem version? Than you don’t specify the version and it will fetch the last one. Then, say that some time has passed and another developer needs to replicate the dependencies. It wouldn’t be so uncommon that the newest version of one of the dependencies is not compatible anymore with that one used when the application was first developed. For solving this specific problem Rails had a rake task (rake rails:freeze) in its early times that would copy the gems to a vendor folder so that the application could be easily deployed anywhere. But that wasn’t a really good solution and then, some years ago, Yehuda Katz released Bundler, which solved this problem by writing a file that recorded all gem versions used in last “bundle” command which allowed that configuration to be replicated anytime without vendoring all gems.

Bundler is a great tool and all Rails application starting from Rails 3.0 use it for managing dependencies. I don’t know a similar handy project for Groovy.

Mountable applications

The next version of Rails (3.1.0), soon to be released, will allow mounting some applications in certain paths that could interact with the main app. I guess Django supported this for a longer time, but Grails won’t support this feature in 2.0 as far as I know. This is also a great feature.

Memory usage

Unless JRuby is being used, you don’t need to previously allocate memory to your application. The memory will increase as it needs more memory. That means you can run lots of Rails application in the same time in your development environment without being concerned about limiting their memory before running the application. That usually means you have more free available RAM.

Database evolution

My first web applications were written in Perl about 15 years ago or more. While at Electrical Engineering college I didn’t have lots of web development spending most of my developing time with C and C++, working in embedded and real-time systems.

In 2007, I was back to web development and needed to update my knowledge. When I looked for web frameworks, I was evaluating mostly TurboGears, Django and Rails, after discarding MS .NET and Java-based ones. I didn’t know Ruby nor Python at that time so I wasn’t biased against any of them. The argument that I really bought while choosing Rails over the other alternatives was the database evolution approach. If I remember correctly, both TurboGears and Django used the same approach used by Grails. You write your domain classes and then generate the database tables based on these classes attributes. I didn’t like this approach at all because I was really concerned about database evolution. In the other hand, Rails supported database migrations and the model classes attributes didn’t have to be replicated since they would be dynamically fetched from the mapped database table at run-time during Rails initializatin. I really prefer this approach but database migrations only seems to be supported by the Grails framework itself in Grails 2.0, which wasn’t released yet by the time I’m writing this.

For a long time we used to “dbCreate=update” in DataSource.groovy and that is simple not maintainable. I hope Grails 2.0 will teach developers best practices like those used by Rails since always.

Framework API

Regarding the framework API itself, I really prefer the Rails API. There are lots of useful DSLs, that I don’t find in Grails, specially for defining hooks like before_save, after_save, before_validation, etc. You can specify these hooks in many useful ways and calling them multiple times. Also, instead of static variables you have a declarative DSL for defining associations like has_many, belongs_to, etc. I also always found odd that Grails used closures instead of methods for controller’s actions, although this seems to have changed to better in next to be released Grails. Also, I like the fact that Rails generators will create controllers inherited from ApplicationController by default, which means you can add methods to the ApplicationController class if you want to add them to all controllers.

Also, Rails allow me to specify which layout to apply directly in the controller instead of in ERB (GSP equivalent). Also, I don’t need to write boilerplate code in my views like in GSPs.

Vim support

I’m a Vim user and Vim support for Groovy indentation and code highlighting is terrible. In the other side, there’s good support for the Ruby language and the Rails framework.

Concerns about good default

Rails has always been worried for offering good default for web applications. This is specially true with security concerns. All text will be sanitized inside “<%= … %>” blocks unless explicitly said not to do that. In Grails you can do that, but that is not set by default and will only work with the “${…}” style, which can’t be always used as my long experience with Grails has showed. I’m not sure when they’re not allowed through because it never made sense to me… :( But it seems the problem is using this syntax in a nested context like “${[something, "abc: ${2 * someValue}”].join(‘
’)}“ but I don’t remember exactly.

Interactive console and tab-completion

Another time-saving while writing Rails applications is that auto-complete works in the interactive console (irb) and the “delete” key works as expected in Linux, differently from “groovysh”. I’ve also opened an issue in JIRA presenting a patch to Jline to fix this annoyance that was also present with JRuby at that time. JRuby fixed the problem but groovysh still doesn’t behaves correctly with regards to the “delete” key.

The tab-completion will be also available while debugging a Ruby application using the ruby-debugger gem for instance. And I can even debug Ruby applications in Vim, my favorite editor. :)

Hard to debug errors

Errors in GSP’s will display unrelated lines. Also, the stack-trace is so big when errors happen, as usual in Java applications, that a friend called them MonsterExceptions.

Both of them were said to be fixed for Grails 2.0 but I didn’t test it yet.

Rails errors on the other hand are very precise and easy to find the source of the error.

New code - old behavior

I remember that one of the oddest behavior I experienced while first learning Grails was that after fixing some piece of code that bug persisted and some while later it worked. It was the first time in my life as a programmer that I’ve seen such behavior. In Rails, when you change some code, the change will be in effect immediately or it won’t make effect at all until you restart your application depending on what you’re modifying. But since Java didn’t support listening to file-system events asynchronously until the recent Java 7, Java applications use to implement file-change monitoring using the poller method. So, it may take a while before your changes make effect and you’ll never know if the file was already recompiled or not.

Final words

Actually, I was expecting to write a more detailed article some years ago with more concrete examples but that would take some time and that’s the reason why I didn’t write it before. But, as I was replying the message by e-mail, the answer was becoming so big that I decided to write such an article even if it’s not the way I would like it to be. I hope I get some time in the future to polish it. Also, as I get some feedback from Groovy and Grails users and after Grails 2.0 is finally released, I intend to update this article to reflect the changes and any possible mistake that I could have made, as soon as I get some time.

So, sorry for the unpolished article, but that’s what I can currently write. I hope it can be useful anyway. So, good luck in your framework decision, whatever it be!

How to write maintainable code?

2011-08-06T08:37:00+00:00

I have been willing to write such an article for a long time and finally found some inspiration and time for doing it.

Working software does not suffice

No software is finished. Even Vi, which was created in 1976, is not finished. If no one is working in some software anymore it just means it is not being maintained or has been replaced by another one. That means your code will be changed or entirely replaced.

Unless you’re expecting your software to be replaced soon, you should consider writing maintainable code. It’s very important to your code to be readable and maintainable because most of the time developers will spend reading it. So, while knowing well your editor is important, you should consider spending more time refactoring your code to make it more readable than finding new ways of writing efficiently using your editor because a clean source code will save you much more time than any editor key mapping. But also, a good editor/IDE will also help you to refactor your code.

Continuous refactoring

You should really apply the good advices present in all books about Agile Software Developing since it is the only way I know of writing software that actually works in the real world. I’ll not talk about Agile in this article, since it is out of the scope and there are also great books out there about this subject. I’m assuming that the reader is familiar with the subject though for better understanding this article. Here are the software writing guidelines that I’m talking about, although I won’t explain the reasoning behind them, as they are a bit long and all books and articles on the subject will explain them:

Don’t write code for the future
Write automated tests that cover your requirements
Continuous refactoring (will be extended in this topic)
Continuous integration
Continuous delivering

Since you’ll be writing code for today’s usage, sometime you’ll face the situation where you need to write a new feature that shares lots of implementation details of a prior feature. You shouldn’t be copying and pasting code from the prior feature. This seems obvious but if I didn’t often find code written that way I wouldn’t be talking about this. WARNING: whenever you find yourself copying and pasting some code, even for different projects, you should think twice. Most probably you should separate the common part in another method, class or library. Some languages will require some boilerplate code, but make sure you’re copying and pasting only the necessary boilerplate if that’s your case.

The commonest reason why developers don’t rafactor their code is because they’re afraid of breaking some critical production system. This is often related to the lack of a good suite of automated tests. Specially if your application is a critical production system, it should be covered by tests. The more you copy and paste code, the harder it will be to evolve the code base and understand it.

The same bug will also happen in multiple places in the source code and even if you fix it in some part of the code, the bug will show up again on Friday, 5pm, and you’ll have to cancel your weekend planned schedule to work hard to find a hidden bug that was already fixed in other part of the code but you don’t know that because you were not the one that fixed it. And people will be asking you why does it take so long for fixing the application under production in the most critical time where it shouldn’t really fail while presenting it to a big potential client corporation!

Automated test writing

TODO: talk about test simplicity, coverage, documentation tool and careless about TDD or testing after. TODO: talk about test priorizing. TODO: talk about mocks and importance of speed and isolation of concerns

Keep your code minimal

You should really keep your code minimal to be polite with the other developers that will work in your code some time later. Maybe that developer will be you again. Having small methods, classes and files will help reading the code without the need of scrolling the text. Also, some editors like Vim allow you to display multiple source files at the same time. Having small methods will help you to understand the overall code.

Naming

TODO: talk about spending time thinking in good names

Avoid comments

TODO: talk about how comments can be avoided with clean code

Choose a good language if possible

TODO: Compare C++ and Java to dynamic languages like Ruby, Python or Groovy TODO: talk about tradeoffs and performance concerns vs development speed TODO: talk about legacy Java code and JRuby, Groovy, Scala, Clojure and JPython. TODO: also talk about network-based API integration

Adopt great frameworks and libraries

TODO

Upgrade often

TODO

Keep It Super Simple - the KISS principle

TODO: Avoid uncommon solutions and complicated architectures

Avoid proprietary or language-specific solutions

TODO: Give preference to common network based APIs

Understand the Single Responsibility Principle (SRP)

TODO: you can apply or not but it’s important to understand it

Don’t bother too much about the Open/Closed Principle (OCP)

TODO: explain differences between writing end-software and libraries and talk about tests here

Take decisions by yourself (avoid just following well-stablished patterns)

TODO: talk about Java, setters/getters, private/protected/public, interfaces and its abuse

Use dependency-resolving tools

TODO

Use the best VCS tool you can find

TODO: and invest time learning it

Coding style examples

Early interruption pattern (or handle exceptions first)

TODO: return if exceptional_case

Don’t handle exceptions at all if possible

TODO: talk about the try-catch approach and the type of applications (libraries, unsaved data) as well as about tests.

Don’t catch each exception for general algorithm

TODO

Avoid deeply nested constructions

TODO: talk about nested if’s, while’s and alternatives like catch-throw

Sort method caveat

TODO: talk about <=> and how to deal with its lack in some languages. Sort should return -1, 0 or 1. TODO: talk about wrong usage of sort for getting max and min.

Switch-case and handling by hashes (or maps)

TODO

Security concerns

Mass-assignment

TODO

Multi-threading

TODO

Java specifics

TODO: talk about synchronized methods

Performance and Scalability

TODO: talk about language vs architecture, and concerning before due time or without benchmark/profiling.

TODO: talk about simple web APIs and queuing systems for integrating applications in possibly different languages. TODO: Avoid writing language or vendor specific solutions

Memory leak

TODO: It does happen in Java. Talk about unbounded in-memory cache.

The Danger in Software Customization

2011-07-07T23:40:00+00:00

Several years ago, one of the partners of a company I worked for, commented the following with me:

In USA, people tend to adapt their work-flow to the software they use. In Brazil, we always want to adapt the software to our needs.

He was actually complaining about the way we Brazilians behave regarding software adaptation vs software customization. And I mostly agree with him that we Brazilians really do that while there are several situations where it would be much simpler (meaning less cost) if people could just adapt themselves to the way the software already works. I can’t speak for people in USA though, since I don’t really know them enough! :)

Martin Fowler also wrote great articles on the subject.

In this article, I’ll present my thoughts on what this means for those working on this customizations.

Jason Fried and David Heinemeier Hansson, founders of 37signals, also talked about this subject in their (recommended) book Rework:

Build half a product, not a half-assed product

Some ideas are really great and powerful.

But if things go wrong they can become a disaster!

The authors greatly summarized the problem in the topic “Let your customers outgrow you”:

…There’s a customer that’s paying a company a lot of money. The company tries to please that customer in any way possible… Then one day that big customer winds up leaving and the company is left holding the bag - and the bag is a product that’s ideally suited to someone who’s not there anymore. And now it’s a bad fit for everyone else.

They also give great advices like “Say no by default”.

This is really important because it is so easy to say yes as it is ineffective. A good manager is that one that is able to understand a “no” suggestion from developers and convince their client to accept the “no” too.

There are lots of situations where we just want some change because we don’t want to change ourselves. Think in about how much developers still use a centralized version control system like Subversion or CVS just because they don’t want to learn how distributed VCS like Git and Mercurial work. Think how much they lose by choosing to not change their minds.

The decisions around customization should be well thought. Not all customizations are worth. If you have a product that is shared among some clients, all of them having some different requirements, you shouldn’t agree delivering every requested feature.

As a rule of thumb, I would ask myself some simple questions:

Is that change something that could be implemented as some sort of a plug-in?
Is that change really useful, including for the client who is asking for it?
Will the other clients use that feature?
Is it easy to isolate it and add some software switch for enabling/disabling that feature?

If any of these questions can be answered affirmatively, then it probably worths implement the feature. Of course, the effort/cost vs goodies should be measured as well.

In the other side, you should probably say “no” to your clients if the change implies in either:

There will be a substantial change to your system, increasing the risk of failures. This is specially true if the application doesn’t rely on a good test suite or if it is a critical application.
Adding such a feature means that you’ll need several conditionals among all over the code.
Accepting the change means that any future change would become much slower to achieve.
Implementing the change will make it much harder to get some reports from the system and/or will make the report generation or interface navigation time much slower.

I guess there will be a gray shadow between these lines, but the main problem is that most people I worked with in my life simply will never consider saying “no” at all. They say it is just a matter of money vs time needed to accomplish it. But usually they’re not able to really estimate the costs of such a bad decision in the long run.

I wonder if Brazilians will change their mind some day and start to evaluate adapting themselves to some software or process sometimes instead of insisting in customizations. I also wonder if they’ll learn to negotiate better the features instead of just accepting the requirements as they are. I hope so, but I don’t really believe it will happen any time soon…

Installing Gitorious has never been so easy

2011-05-07T20:00:00+00:00

Important update (May 07, 2011)

This article explains how to install Gitorious using RVM Ruby on Nginx + Passenger. I’ve recently created another cookbook for installing Gitorious on a recent Debian using native Ruby as well as Apache2 and using Exim as a smarthost for sending e-mail, which means that Gitorious won’t block waiting the SMTP server for replying (specially if internet or the mail server is down). Also, this new cookbook will install much faster (less than 15 minutes in my PC). Feel free to continue reading this article if you want Gitorious with RVM Ruby. Otherwise, I would recommend the new cookbook.

Note: if you just want to install Gitorious, feel free to jump to Installing Gitorious section.

Installing Gitorious is HARD

Installing Gitorious is one of the most time-consuming servers to set-up I’ve seen. And I’m a Rails developer.

Installing the Rails application itself was not trivial as it should in the first time I set up a Gitorious server, some years ago. The usual “rake gems:install” procedure of the Rails 2 era didn’t work and we had to follow the instructions, often outdated, of which gems we should install manually with all the problems it brings as version compatibility issues. Fortunately, the Ruby community brought Bundler some time ago which changed this painful process completely making it a breeze to manage Ruby project dependencies.

The good news is that Gitorious recently started to use bundler for managing its dependencies. So, installing the Rails application became trivial. But setting the web application itself was never the most time consuming task. Far from it. The Gitorious overall system uses a mixin of technologies including the web server, memcached for dealing with cache, a stomp server, like ActiveMQ, for managing queues of jobs like creating or cloning repositories, Sphinx for searching in the projects and repositories, a MySQL database for storing data, an HTTP server like Apache or Nginx plus Passenger, several custom services for serving the Git protocol with some changes from the original “git daemon”, etc. Additionally, there are lots of configurations for e-mail delivery, SSH and the web application itself.

Some impatient people or with not enough skill often gave up on setting up a private Gitorious server for their companies. Others will successfully set-up after a whole day or two. If you’re feeling adventurous or are just curious about how Gitorious work, try to follow these instructions or read the documentation present on Gitorious mainline repository.

This is no more true - Chef to the rescue

Well, I didn’t mind very much that setting up a Gitorious was a hard task since I just needed to set it up once some years ago and another time when another developers team asked me about an year later. It is working great for us for a long time.

These days I decided that I should finally play a bit with Chef, a configuration management system. Nothing better than a real complex case to learn a new tool. Than I chose Gitorious for learning Chef. After 2 days (about 4hs/day) learning about Chef and writing the Gitorious cookbook, I found that Fletcher Nichol already had written one. I threw away almost all I had already done and continued from his work, since it didn’t work on a fresh Debian 6.0 (Squeeze) system.

The result is a really easy process to install Gitorious on a fresh Debian system.

Installing Gitorious

If something documented here goes wrong, go bug Fletcher Nichol, since he is the original author of the Chef cookbooks ;) Just kidding! He did an awesome work, but feel free to post me a message if you have any issues.

These instructions are known to work on a Debian Squeeze 6.0 Linux distribution with only the base system checked to install on package selection. You can install it with VirtualBox, for instance, using the “netinst” CD image, which can be downloaded here or from some mirror near you. You’ll need at least 1GB of virtual memory (I would suggest 512MB of physical memory plus 512MB of swap for a local testing only environment) and 4GB of hard-disk (maybe 3GB will be enough). I recommend you to use LVM since it is easier to expand your partitions if you need later. You can use a single partition if you prefer. If you are using VirtualBox, you’ll probably like to set the network to bridge mode so that you can connect to your VM from your host machine. When asked for the machine name, if you don’t have a fully qualified domain name (FQDN), you can use “gitorious.local”.

Don’t create a “git” user, since Chef will already create it correctly for you. If you need help creating the bridge network on Debian, check out these instructions.

With the fresh Debian system installed, logged as root:

1	echo 'deb http://apt.opscode.com/ squeeze main' > /etc/apt/sources.list.d/opscode.list
2	wget -qO - http://apt.opscode.com/packages@opscode.com.gpg.key \| apt-key add -
3	apt-get update
4	apt-get install chef git
5	wget -O /etc/chef/solo.rb https://gist.github.com/raw/847256/chef-gitorious-etc-solo.rb
6	mkdir /root/chef-solo
7	wget -O /root/chef-solo/node.json https://gist.github.com/raw/847256/chef-gitorious-node.json

Change your /root/chef-solo/node.json to reflect your Gitorious settings, like SMTP server, etc. The downloaded node.json will download Gitorious from my fork, which includes a tree source view to Gitorious (see Merge Request #2220), as you can see in the picture below. Feel free to remove the “git” parameter from the “gitorious” entry and it will use the vanilla Gitorious repository. Also note that “locale” is set to “pt-BR”. Change it to “en” or choose one of “pt-BR”, “es” or “fr”. I can tell you “en” and “pt-BR” will work but I haven’t tested the other ones. If you change the “web_server” attribute to “apache2”, you’ll be on your own and you’ll probably have to tweak it yourself.

The “run_list” only needs to contain “recipe[gitorious]”. The other ones are optional.

After changing the settings to reflect your preferences proceed with Gitorious automated installation:

1	git clone git://github.com/rosenfeld/cookbooks.git /root/chef-solo/cookbooks
2	cd /root/chef-solo/cookbooks
3	git submodule update --init
4	chef-solo

If you don’t have a FQDN nor a local DNS server, you can add your machine IP to your /etc/hosts on your host system. You need to access Gitorious with the FQDN provided in /root/chef-solo/node.json, which defaults to “gitorious.local”, or the web application won’t work.

Following these instructions, I was able to install Debian in 10 minutes and Gitorious within an extra hour. The actual time will depend mainly on your internet speed as well as your CPU and how many cores you’ll make available for your VM. Maybe it is a good idea to allocate more resources to your VM on the installation process and then you could reduce them. There are several techniques that could speed up this installation process, like doing some tasks in parallel, using multiple cores on the VMs, using Debian Ruby instead of RVM, replacing ActiveMQ with stompserver gem, among other things. But this is beyond the scope of this article. I guess installing gitorious on a real fast server with great band-width could take about 20 minutes using these instructions. But since this process is automated, you can use this installation time to learn how to become productive with Vim :)

Simple, right? Get in touch if you have any issues.

Good luck! ;)

Thanks

Thanks go to many people who made this possible, including:

Fletcher Nichol who developed the original Gitorious, RVM and RVM_Passenger cookbooks;
Opscode and Chef community for making this automated stuff possible;
The Gitorious developers and contributers;
David Heinemeier Hansson (DHH) for creating the fabulous Rails framework and all other rails-core team members as well as its contributors;
Yehuda Katz and all Bundler developers;
Yukihiro Matsumoto (Matz) and all Ruby developers and contributors for the lovely programming language they created and maintain;
Wayne Seguin for the excelent RVM tool and all of its contributors;
The Debian community for the wonderful Linux distribution;
Sun (recently sold to Oracle) for the great VirtualBox for managing virtual machines and all its contributors;
The Phusion company for delivering Passenger, easing the deployment of Rails and Rack applications;
Linus Torvalds, of course, not only for having started the Linux kernel development, but specially for the development of the greatest version control system: Git. Thanks also goes to Junio Hamano and the other Git developers and contributors;

There are probably lots more to thank and it is impressive how much people and work were required before one could easily install Gitorious for her/his company intranet. :)

Achieving Productivity with Vim as IDE

2012-03-20T21:30:00+00:00

Finally I’ve got some time to finish translating my Vim original article in Portuguese (written in September/2010):

I’ve long insisted on trying to use Java written IDEs like Netbeans, RubyMine, Aptana/Eclipse or IntelliJ IDEA for software developing. They are fine except that they use too much system resources and you never know when the next garbage collection will happen (usually in the greatest inspiration moment).

I was so upset with memory usage (my 4GB RAM computer was swapping very often) and garbage collection that I decided to take 3 full days of my last holiday to learn how to get productivity with Vim. The result was good enough and here is the summary of what I could get from Vim and what I could not.

Note 1: if you are already a Vim user, backup your configuration files before trying this setup. Note 2: I would like to thank Michael Durrant and Vim spelling support for helping with translation.

What to expect?

Light speed!
Auto-complete (with current setup, works well for HTML, CSS, XML, leaving the desire for others IDEs for Java)
Snippets
Tabbed editing
Recording Session
Auto-completion of words contained in the document
Support for browsing RDoc (Ruby)
File browser
Fast file opening
View number of rows and “go to line n”;
Switch to the definition of the class / method / tag under the cursor
Spell check (getting spelling support for Netbeans was really hard when I tried to, while it is built-in for Vim 7 and easy to add new dictionaries)

In addition to these features, I’ve got much more ones that I’ve never used on my prior IDEs experience, as shown in this article.

Installation

Here are the install procedures tested on a Debian Unstable Linux distribution that should work almost seamless with Ubuntu too. In Windows, apparently the change is that the configuration directory of Vim is called “vimfiles” instead of “.vim”. If you have any questions about the installation process, just post a comment.

You need to be root (or use sudo in Ubuntu) for installing the required packages.

1	apt-get install exuberant-ctags vim-gtk git
2	cd
3	git clone --recursive git://github.com/rosenfeld/vimfiles.git .vim
4	ln -s .vim/vimrc .vimrc

Some additional notes in case you have any issues with the above steps or are just curious:

If you further install gitk and git-gui, there are shortcuts for launching them from Vim.
Gnome users might prefer installing vim-gnome instead of vim-gtk. Some shortcuts (like Ctrl+S) won’t work on Vim when running in some terminal emulator as Konsole or gnome-terminal because they will capture the shortcuts before Vim can handle them and gVim is recommended instead.
The exuberant-ctags package is required for tag navigation.

Features

Vim has much more features than what I’ll introduce on this article. I would suggest reading other resources on the subject if you have some free time.

Basics

Editing, saving, navigation and quiting

Unlike other editors, Vim has different modes. It starts in Normal mode, in which typed chars are interpreted as commands. Pressing ‘i’ or ‘Insert’, enter Vim in insert mode, from which you can type anything. To exit insert mode, just press ‘Escape’.

Most commands are available through a command line that shows up when a colon (‘:’) is pressed. Some of them are:

‘:q’: Quit without saving. Vim will warn you instead of leaving when any changes are unsaved.
‘:w’: Write buffer to current file (save the file).
‘Ctrl+x, s’: Save current file, while on Insert mode.
‘:x’ or ‘:wq’ or ‘ZZ’: Quit saving changes.
‘:q!’: Quit discarding changes.
‘:qa’: Quit closing all buffers (read “files”, for simplicity sake). Actually, prior commands will only act on current buffer.
‘:e path/to/file’: Open file in the current window (auto-complete is achieved with the TAB key). If a relative path is given, the current Vim directory (‘:pwd’ will show it) is used. This can be changed with the ‘:cd /new/path’ command or ‘:lcd’ for changing the path just for current window (more on windows later).
‘,f path/relative/to/file’: ‘,f’ expands current file path and puts you on the command line
‘:tabe’ and ‘,t’: the same thing bug open file in a tab instead of the current window.
‘Ctrl+PageUp/PageDown’: navigate through tabs (may not work on terminal Vim)
‘:tabnew’: Open an empty buffer on a new tab
‘:e!’: Discard file changes and load last saved content
‘w’: Position the cursor to the beginning of next word
‘e’: Position the cursor to the end of next word
‘b’: Position the cursor backward to the beginning of the word
‘,w’ and ‘,b’: The same considering CamelCase words
‘0’: Position the cursor at start of current line
‘^’: Position the cursor at the first non-blank character of the current line
‘$’: Position the cursor at the end of the line
‘%’: Go to the corresponding pair of ‘[]’, ‘()’ and ‘{}’
‘gg’: Go to the beginning of current buffer (document)
‘G’: Go to the end of buffer
‘45G’: Go to line 45
‘~’: Change the case of the letter under cursor
‘u’: Undo
‘Ctrl-r’: Redo
‘.’: repeat last command
‘J’: join lines
Ctrl+e: scrolls one line down without moving the cursor
Ctrl+y: scrolls one line up without moving the cursor

On insert/editing mode, you can call normal mode commands by pressing Ctrl+O before the command. While on normal mode, it is possible to change to insert mode using some commands:

i: doesn’t change current cursor position
I: position the cursor in the beginning of the line
o: appends a new line below the current line
O: appends a new line above the current line
a: position the cursor one character after current cursor position
A: position the cursor at the end of current lines

Commands for deleting lines, words, blocks, managing surrounds and toggling comments:

dd: delete current line (actually, moves it to Vim internal clipboard)
D: delete until the end of the line
x or Delete: deletes a character under cursor
Backspace: deletes a character backward
dw: delete from current cursor until the end of the word under cursor
diw: delete inner word (the entire word under the cursor)
db: delete until the beginning of the word or a word backward if the cursor is already in the beginning of some word
ds', ds", ds{, ds[, ds(: delete surrounds (‘’, “”, {}, (), [])
dst: delete surrounding tag
di', di", di{, di[, di(: delete content inside the given surround
da', da", da{, da[, da(: delete all content of the given surround, including the surround characters
dit: delete inner tag content
cs: works like ds, but replacing the surround instead of deleting them. For instance, ci"‘ will turn “text” into 'text’. ci"t
will result in
text
…
yss*: apply surround around the entire line. Ex.: yss' will apply an apostrophe around the line, while yss
will surround the line with a div tag.
s*: adds a surround while on visual mode (click and drag with mouse or press ‘v’ to ender visual mode and use the movement commands)
ys: applies surround around the region described by the movement command. Ex.: With cursor under “word” ysiw results in word
C, cw, ciw, cb, ci*, etc: Works like the delete commands but finish the command on insert mode (c stands for change)
gv: Reselect last visual selection
\c: toggle line (or block in visual mode) commenting
ggdG: [d]eletes entire buffer - from beginning [gg] to end [G] of the document

Copy and paste

In normal mode (don’t use ‘:’):

yy: copy (yank) current line to clipboard
p: paste content from clipboard
yyp: duplicate current line
‘:%y’: copy the whole buffer (document) for internal use in Vim, only
‘:%y+’: copy the whole document to system clipboard
‘:%y’: the * register is a clipboard register associated with the middle button on nix systems. This command copies the document to this clipboard area.
“+yy (or Ctrl+X, c): copy current line to the system clipboard (register +)
“yy: copy current line to the middle-click associated clipboard (register )
Ctrl+R,+ (ou Ctrl+X, v) e Ctrl+R * (ou Ctrl+X, b): paste from system clipboard and middle-click clipboard respectively
Ctrl+C: In visual mode, copy selection to system clipboard

In visual mode, ‘y’ copies the selection, while ‘“+y’ / ‘”y’ copy the content to registers + and .

Windows and Tabs

I’ve already commented about basic tabs-related commands. Further commands follow below:

Ctrl+w, s: (Press Ctrl+w, then ’s') - split window horizontally
Ctrl+w, v: (Press Ctrl+w, then ‘v’) - split window vertically
Ctrl+w, c: Close current buffer or tab if has a single window
Ctrl+w, o: Keep Only current window on tab, closing the others
Ctrl+w, w: Alternate to next window
Ctrl+w, arrow key: Alternate to window pointed by the arrow key
Ctrl+w, T: Note the capital T. Move current buffer to a new tab

Quick file open

The ‘’ (Ctrl+X Ctrl+F) shortcut activates the quick open file dialog.

Vim will list files in your current dir (launch ‘:pwd’ command to see what is it and ‘:cd ~/new/path’ to change to a new path). While you type, files are filtered considering the typed expression. For instance, ‘a/c/uc’ will list ‘app/controllers/user_controller.rb’ as an option.

Hit Enter to open the file in the current buffer. Ctrl+t will open it in a new tab. Ctrl+Enter will open in a new window.

Snippets

Snippets are expanded with the TAB key. For instance, div will expand to

The bundled snippets are located in ~/.vim/bundle/snipmate/snippets and ~/.vim/bundle/rosenfeld/snippets.

Feel free to modify them and include new ones on bundle/*/snippets and ~/.vim/snippets.

Editing HTML, XML, ERB, ASP, JSP, PHP, GSP, etc

Shortcuts for working with HTML/XML also work on PHP, ASP, ERB, JSP, etc, once the file type is properly configured like “html.erb”. This can be achieved with command “:set ft=html.erb”, for ERB files, for instance. You can also set these associations automatically according to file extension. See some examples in ~/.vim/filetype.vim.

Some shortcuts for working on HTML have been already discussed. Here are some more shorcuts, for being used on insert mode:

Ctrl+x, /: Closes the last open tag.
Ctrl+x, space: convert word in a tag and put the cursor inside it. Ex.: div results in
|
, where ‘|’ denotes the final cursor position
Ctrl+x, Enter: similar to prior command, but with a line break between the tag start and its end
Ctrl+x, ‘: creates to a comment tag
Ctrl+x, “: comment current line
Ctrl+x, !: open a menu with DOCTYPE choices to choose from to insert on document
Ctrl+x, @: inserts a stylesheet tag
Ctrl+x, #: inserts a meta tag with charset=utf8
Ctrl+x, $: inserts a script tag for the Javascript language

For template files, like ERB, JSP, PHP, etc:

Ctrl+x, =: <%= | %> or the equivalent for the file format
Ctrl+x, -: <% | %> or the equivalent for the file format

For ERB (Ruby), I’ve created the following alternative snippets:

re: <%= | %>
rc: <% | %>

If you use KDE, it’s possible to launch kcolorchooser for returning a hex color into the document (a CSS, for instance), hitting F12. Take a look at ~/.vim/initializers/kcolorchooser-mapping.vim for changing your software of choice.

Spelling check

Commands:

spen: enable spelling check for English
‘:set nospell’: disable spelling check
z= or right-clicking the word: open a menu with spelling correction suggestions to choose one from
Ctrl+x, s: the same while on insert mode
]s: next misspelled word
[s: prior misspelled word
zg: add word under cursor as a Good word. The word is added to a local dictionary, which can be configured with the spellfile variable (“:set spellfile=~/.vim/spell/custom”)
zw: mark word as wrong, commenting it on the spellfile if it already appears there
zG and zW: the same, but doesn’t persist changes, making them valid only in the current Vim session
zug, zuw, zuG e zuW: undo related command
‘:spellr’: repeat the replacement done by z= for all matches with the replaced word in the current window

Indenting

Commands:

==: indents current line
=: in visual mode, indents the selected block
gg=G: go to beginning of the buffer (gg) and indents (=) until the end of buffer (G)
< and >: indents a block (in visual mode) to left or right. Press ‘.’ to repeat last indenting and ‘u’ to undo.

Finding and Replacing

Commands:

F4: replace text in interactive mode
/search_pattern: Find next match. Examples: “/function” or “/\d\{4}-\d\{2}-\d\{2}” to locate some date like “1981-06-13”
?search_pattern: Find match backward.
n: repeat the next ‘/’ or ‘?’ command.
N: same as ‘n’ but in reverse direction.
‘:%s/text/other/’: Replace ‘text’ by ‘other’ in the whole document (some commands accepts ranges and % stands for the whole document range - see ‘:h range’)
‘:s/text/other/’: Replace ‘text’ by ‘other’ in the current line. Actually, any character can be used instead of ‘/’, like ’s.7/11/2010.11/7/2010.'
“:‘<,’>s/text/other/”: Replace ‘text’ by ‘other’ in the last visual selection. ‘< and ’> are the markers for the beginning and ending of the visual selection. Pressing ‘:’ while on visual mode, these markers are automatically inserted in the command line.
&: repeat last substitution command
‘:Rgrep word .rb’: search for ‘word’ recursively in all ‘.rb’ files in the project. The ‘:vimgrep’ command can also be used if the external programs ‘grep’ and ‘find’ aren’t available but the search will be much slower. There are also other differences - take a look at ‘:h vimgrep’. For instance, you can open the file in the matched line by typing ‘:cc 33’ (go to 33th result, numbers are listed with ‘:cl’). Ex.: ‘:vimgrep word */.rb’

Markers

Commands:

ma: mark current position in the ‘a’ register. Any letter can be used as a register name.
‘a: go to register 'a’ mark
‘’ (two simple quotes): go to the position before the latest jump
Ctrl+O, Ctrl+i: go to the prior and next positions

Changes list

Commands: - ‘:changes’: list all changes in the current buffer - g;: go to the last change - g,: go to next change - 4g;: go to the change #4 (numbers are displayed by the ‘:changes’ command)

Navigation among buffers

Commands:

Ctrl+x Ctrl+x: in any mode, opens a window presenting the opened buffers to switch to (press ‘q’ to cancel or ‘Enter’ to choose an option)

File tree navigation

Ctrl+n: alternate file navigation window
\n: the same but expand the tree in the location of the file being currently edited
‘:e.’: replace current window by a file browser starting in the project root, that allows you to choose any file to open in the current window
‘:Ex’: the same but uses the current file path as the start location

File tree shortcuts:

Enter: open the file in a new horizontal split or in the same window if the file is not modified
t: open in a new tab
?: list the other shortcuts

External commands

For running an external command:

‘:! git gui&’: execute ‘git gui’ in background (doesn’t work on Windows, of course)
‘:.! ls .txt’: replaces current line with the output of the command ‘ls .txt’
‘:+! ls .txt’: creates a new line below the current line with the output of the command ‘ls .txt’ (use ‘-’ instead for creating the line above the current line)

Git integration

\g: Starts the git gui for the current project (doesn’t work on Windows currently). Use ‘:lcd ~/project/path’ for changing the project directory in the current window, or ‘:cd’ for changing the path for the hole vim session
\k: Starts gitk in background (doesn’t work on Windows)

See $VIMHOME/bundle/vcscommand/doc/vcscommand.txt for other commands. For instance: - \cd: show the diff for the current file in a new horizontal split - \cr: review the last committed version of the file in a new window

Suppose you want to know what are the differences from your current unsaved changes and the original file: - \cr: split the original version in a new horizontal split. If you want the split to be vertical, you can move the window to the left (Ctrl+w, H) or right (Ctrl+w, L). H and L must be capital. - run ‘:diffthis’ in both windows: see next topic on diff.

You can also take a look at $VIMHOME/bundle/fugitive/doc/fugitive.txt for further git shortcuts, like: - ‘:Gstatus’: show the output of ‘git status’ and allows you to stage or unstage files under cursor pressing ‘-’, or viewing the diff in a vertical window (pressing ’D') or in a horizontal window (pressing ‘dh’). - ‘:Gcommit’, ‘:Gblame’ and ‘:Gmove’ are other self-explanatory examples. Take a look at fugitive documentation for more details.

Viewing files difference

Open at least two windows with the text you want to see the differences and type ‘:diffthis’ on each window. For turning the diff off, type ‘:diffoff’. Use ‘dp’ in one highlighted diff for putting it in the other window or ‘do’ to obtain the difference content from the other window. Use ‘[c’ and ‘]c’ for navigating backwards and forwards to the next start of a change. See ‘:h diff’ for more details.

Getting vim help

‘:h’ or F1: open the Vim main help
‘:h command’: open the command help in a the help window
Ctrl+]: open a link in the help
Ctrl+T: go back to the prior help position

Ruby specifics (Rspec, RDoc, etc)

Commands (won’t work in some terminals, use gVim or MacVim):

Ctrl+s, r: get the RDoc for the word under cursor
Ctrl+s, s: run rspec in the current opened spec
Ctrl+s, x: alternate between spec and model

Rails commands (use tab for auto-complete most commands):

‘:Rview users/list.erb’: open the view
‘:Rcontroller users’ and ‘:Rmodel user’ are similar commands
‘gf’: when pressed over a line such as ‘render “users/list”’ will open ‘users/list.erb’ for instance. When pressed over the ‘ApplicationController’ word, it will take you to ‘application_controller.rb’.
‘:R’: Alternate between the controller action and the view when you follow the conventions.

Debugging:

You need to install the ‘ruby-debug-ide19’ or ‘ruby-debug-ide’ gem for this to work:

‘:Rdebugger bin/ruby_script’ or ‘Rdebugger script/rails server’ for a Rails application
\db: alternate breakpoint
\dn: step over
\ds: step into
\df: step out
\dc: continue
\dv: open variables window
\dm: open breakpoints window
\dt: open backtrace window
\dd: remove all breakpoints
‘:RdbEval User.count’ will evaluate ‘User.count’
‘:RdbCommand where’ will send the ‘where’ command to rdebug
‘:RdbCond user.admin?’ will set the condition ‘user.admin?’ to the breakpoint
‘:RdbCatch Errno::ENOENT’ will catch the file not found exception, jumping to the file line of the exception, allowing you to investigate the stack-trace, variables, etc.
‘:RdbStop’ stops the debugger

Refactoring

Although Vim doesn’t allow you to directly refactor some variable for instance (at least, I don’t know how to do that in Vim), it can help you refactoring your code in many ways, from substitution commands to variable extraction like the example above:

Suppose you want to refactor the code below as follows:

1	if User.find(params[:id]) and current_user.admin?
2	# ...
3	end

to:

1	@user = User.find(params[:id])
2	raise NotFoundException unless @user and current_user.admin?
3	# ...

For extracting the “User.find(params[:id])” to the “@user” variable, you can position the cursor under the “U” and run the commands “c% @user” (change the content “User.find(params[:id])” with “@user”), “Ctrl+o, O” (execute the ‘O’ command while on insert mode [Ctrl+o] - create a new line above the current), “@user = ” (just typing), ‘Ctrl+R “’ (paste the cut content).

With all these explanations, it may seem hard, but take a look at how we can achieve this with so few keystrokes: c% @user O @user = “.

Learning how to use Vim in its full power will allow you to do many tasks quicker than any other editor or IDE in the overall. For instance, RubyMine will allow you to do the same with less keystrokes for this specific case, but for special cases, Vim will still be more useful and not much less productive than RubyMine for this common case. Actually, cutting “User.find(params[:id])” is much faster in Vim (“c%”) than selecting the whole text in RubyMine or any other IDE. The same apply for change the content inside quotes, parenthesis, XML tags, etc among other features.

What doesn’t work?

Unfortunately, I couldn’t find every feature I wanted in Vim yet. Some of them present on regular IDEs include:

Except from Ruby, integrated debugging is probably missing for most languages
Recent tab navigation using Ctrl+Tab like usually work in most IDEs
Seamless integration with the system clipboard, unless using vim in ‘easy’ mode with ‘evim’ or ‘vim -y’ commands
For Java development, traditional IDEs like Netbeans, Eclipse or IntelliJ are more competent with auto-completion and other language features
Jumping to a tag in an existent tab, or open in a new one (currently I could just open in a new one). Maybe we should get used to work with buffers instead of tabs in Vim
Integration with the Rails i18n infrastructure with the default backend. Rubymine has a great integration and a friend of mine has also developed something similar as a Netbeans plugin

More to come

There are still more useful commands like folding and other interesting features that I’ll comment when I have more time available.

I’ve already commented about several commands and I suggest you to start learning those that you use more often, like snippets, simple search and replace, quick file opening, tabs usage and buffer navigation. For those that work with HTML, I would recommend taking a look at the “surround” plug-in, that are specially useful for working with XML/HTML tags.

As a last keynote, this article was written with Vim in the Markdown format. Many of these examples include tags and for escaping them in the document, I’ve used the command ‘:%HTMLSpecialChars’ from the plug-in ‘htmlspecialchars’.

If you can take some time to improve your Vim skills it will save you many coding time during your coder life.

Good advantage and have fun!

Generating PDF with ODF templates in Rails

2010-03-16T21:00:00+00:00

In 2009, I wrote an article for the Rails Magazine Issue #4 - The Future of Rails - where I presented an alternative to PDF generation from ODF templates, which can be generated using a regular text processor such as OpenOffice.org or Microsoft Office (after converting the document to ODF).

You can read the entire article downloading this magazine for free or purchasing it. The application code illustrating this approach was published by the magazine on Github.

Unfortunately, I can’t host a working system providing a live demonstration due to my Heroku account limitations, but it should be easy to follow the instructions in the article on your development or production environment.

Do not hesitate in sending me any questions, through comments on this site or by e-mail, if you prefer.

Site's Debut

2010-03-16T20:40:00+00:00

For my first article, I chose to write about the reasons that resulted in my decision to finally develop my site, as well as talking about its technical structure and why I have taken such approach.

Motivation

I have been considering writing my site for a long while. There were lots of subjects I had interest in writing about, but for several years I really had no free time to do it. There were tons of distractions I had to deal with: graduation, writing my master thesis, looking for jobs, working, marriage, more working. When I had some time at night, I was really exhausted.

During this time, I considered using some ready to deploy system, such as Blogger, Wordpress and others, but I didn’t like the idea of loosing control over by articles. Although customizing Wordpress was an option, it is written in PHP, so, in short: no, thanks. I wanted my site to be exactly as I desired and these tools wouldn’t allow me total flexibility over it and I guessed it would be too hard to migrate all my articles to a new system later if I decided to.

When I found some available time at night, I came to think in building my own site, in my way, with Rails. But I always faced the same problem. I wasn’t inclined to invest money regularly on some hosting provider when I didn’t intended any payback from my site. I never found a free hosting service either that supported Rails and neither was willing to develop my site with other framework.

Recently, lots of blogs that I follow commented about Toto. So many of them that I decided to get a deeper understanding of it and doing some tests. Toto is a blog system written in about 300 lines of Ruby code, in top of Rack.

Toto was the reason that made me decide to finally write my site. These were the main ideas that inspired this site design:

The articles get stored on disk, instead of a database. This allows easy version management of the articles, using my favorite version control system: Git.
Comments are managed by Disqus, a system I had never heard of before and that is just fantastic!
Finally, the main reason was that Toto increased my interest in Heroku. I had already read about it before, but every time I tried to understand what was it about, I didn’t get the idea instantly and, with little time to read all my feeds, I ended up not being interested enough for digging deeper. After reading more about Toto, I understood that Heroku was a service that would allow me to host a Ruby web application with no cost. My sincerely thanks to Toto and Heroku who made this site possible!

Why not Toto?

My first attempt was doing exactly Toto’s recommended procedure. As long as I developed my site with Toto, I faced some obstacles:

The first one was related to code highlighting. This one was easy to solve after following some instructions found in some blogs, explaining how to embed CodeRay in Toto, for instance.
The next challenge was about internationalization. I wanted to write articles both in English and Portuguese. I wanted some support for internationalization and Toto didn’t worry about this, as it was designed to be minimalistic. I don’t blame it, but this was a concern while writing deciding or not to use Toto.
Finally, I wanted to group my articles in directories for organizing the articles by topics such as Ruby/Rails, general programming, operating systems, infrastructure, etc.

Fortunately, Toto is so compact and well written that it is a trivial task to adapt it in a full Rails application and change it to meet my expectations. Rails has I18n native support, so I only needed to implement the topics organization.

Basically, the main ideas implemented on this site were extracted from Toto. I’m very grateful to its creator Alexis Sellier for the inspiration that resulted on this site.

No databases are being used for now. Site statistics are handled by Google Analytics. The images used in this site are hosted at Amazon S3 and Ultraviolet is the installed code highlighter. RDiscount was chosen for parsing Markdown.

Given the opening, I hope you enjoy the next articles.