Rodrigo Rosenfeld Rosas

The importance of the little details (skip this section unless you enjoy rants)

Seriously, this section is big and not at all important; feel free to skip it right now if you're short on time or don't enjoy rants.

This is a rant explaining how ActiveRecord migrations ended up defining my career over the past years.

I became curious about programming and computers when I was a kid. I remember reading a huge C++ book when I was about 10 years old. I had learned Clipper just a bit before, and I recall creating a Bingo game with Clipper just because I wanted to play Bingo on those machines but couldn't :) While learning Clipper I also had my first experience with SQL and client-server design. My dad enrolled me in a few computer courses around that time, such as "DOS/dBase III Plus", Clipper + SQL and, a few years later, Delphi + Advanced SQL. I learned C and C++ from books, and when services like Geocities were showing up and the Internet was reaching lots of homes, I also became interested in learning HTML to build my own sites, the new hotness at the time. Since I also wanted to serve dynamic content, I decided to learn Perl, as it was possible to find free hosting services supporting it. It was the first interpreted language I learned and I was really fascinated by it.

For a long while I used Perl exclusively for server-side web programming, since it was the only option I could find in free hosting services. But while in Electrical Engineering college I barely did any web programming; my programming tasks (extra classes) were mostly related to desktop programming (Delphi / C++), and to embedded and hard real-time systems using a mix of C and C++ during my master's thesis in Mobile Robotics. By that time I had a solid understanding of C and C++. Good times; I don't consider myself proficient with them anymore these days. That was a time when I would read, and know, entire W3C specs such as HTML 4.01 and CSS. Today it's simply unfeasible to completely follow all the related specs, and I'm glad we have competition in the browser market, since it's really hard to keep up with all the changes happening every day.

Once I finished my master's thesis and had to find a job, I looked mostly for programming jobs: I considered myself good at programming and there were lots of interesting opportunities out there, while it was really hard to find companies in Brazil working on electronic device development or Robotics, and I never actually enjoyed the other parts of Electrical Engineering, such as machines, power or electrical installations. I only enjoyed micro-electronics and embedded device creation, and one should consider themselves very lucky to work in such an area in Brazil. I didn't want to count on luck, so I decided to focus on the programming career instead. I remember my first résumé was sent to Opera Software, my preferred browser at the time, to apply for a C++ developer position, but after tons of interviews they didn't hire me, so I'm not living in Norway these days ;)

After working for 3 months on a new parking system using Delphi (despite asking to use C++ instead) the contract ended; the product was already working in one of the malls in my city, and I had to look for another job. They actually extended the offer to keep working with them, but at the same time I found another opportunity, and this time I would have to get back to web programming. That was in 2007. Several years had passed, I couldn't really remember much Perl, and a lot had happened in web programming in the meantime that I hadn't followed.

After a few stressful days trying to learn about every major web programming framework (especially while trying to read about J2EE), I came to the conclusion that I would choose one of TurboGears, Django or Rails. I didn't know Java, Python or Ruby by that time, so the language didn't play an important role in choosing the framework. I was more interested in how the frameworks would make my life easier. At that time I had to maintain an existing ASP application, but at some point I would have to create a new application and could choose whatever I wanted, and I definitely didn't enjoy ASP.

Since that application had to be displayed in Portuguese, I was considering the Python frameworks more than the Ruby one, as Rails didn't support internationalization yet (i18n support was added in Rails 2, if I recall correctly) and even supporting UTF-8 wasn't straightforward with Ruby 1.8. Iconv and $KCODE were things you'd often hear about in the Ruby community back then, and there were tons of posts dedicated to encoding in Ruby.

But there was one Rails feature that made me change my mind and choose Rails over TurboGears and Django, which were supposed to work well with encodings and had announced internationalization support: the approach used to evolve the database. From my previous experiences, migrations were the right strategy, while I was pretty scared by the model-centered approaches TurboGears and Django used to handle database evolution.

By that time I already had plenty of experience working with RDBMS, especially Firebird, including versioning the database and supporting multiple environments. That took a lot of effort every time I started a new project, because I basically had to reimplement the equivalent of ActiveRecord migrations each time, and I knew how time consuming that was. So I was glad I wouldn't have to roll my own solution if I used Rails: ActiveRecord migrations were clearly more than enough for my needs and worked pretty well. Despite the issues with encoding and the lack of internationalization support, I picked Rails because of ActiveRecord migrations.

And even though I haven't used ActiveRecord in several years, I've still been using its migrations tooling since 2007, more recently through my wrapper around it called active_record_migrations.

While I don't appreciate ActiveRecord as an ORM solution, I like its migrations tooling very much, and it hasn't changed much since I used it with Rails 1. The most significant changes since then were support for time-stamped migrations, the reversible block and, finally, many years later, proper support for foreign keys (I struggled to add foreign keys using plain SQL for many years).

When I first read about Sequel I was fascinated by it. ActiveRecord wasn't built around Arel yet, so all those lazy evaluations in Sequel were very appealing to me. But around 2009 I took another job opportunity, this time working with Grails and Java rather than Rails, so I missed many Rails changes for a while. In 2011 I changed jobs again and still had to support a Grails application, but I was free to do whatever I liked with the project. Since there were quite a lot of Grails bugs that were never fixed and that I couldn't work around, I decided to slowly migrate the Grails app to Rails. By then Arel had been integrated into ActiveRecord, which finally supported lazy evaluation as well, so I decided to try to stick with the Rails defaults. A week later I realized there were still many more reasons why Sequel was far superior to ActiveRecord, so I replaced ActiveRecord with Sequel and never looked back. Best decision ever.

See, I'm a database guy. I work with the database, not against it. I don't feel the need to abstract the database away because I'd prefer to use Ruby over SQL. I came to appreciate not only SQL but several other powerful tools provided by good database vendors, such as triggers, CTEs, stored procedures, constraints, transactions, functions and foreign keys, and I definitely didn't want to avoid the database's features. ActiveRecord seems to focus on hiding the database from the application, abstracting as much as possible so that you feel you're just working with objects. That's probably the main reason why I loved Sequel: it embraced the database instead of fighting it. It makes it as easy as possible to use whatever vendor-specific feature I want, without getting in my way. That's why I don't see Sequel as an ORM, but as a tool that lets me write the SQL I want with a level of control and logic that would be pretty hard to achieve by building SQL queries through string concatenation and manual typecasting of params and result sets.

I always have a clear idea of the SQL Sequel generates, and the result is way more readable than if I had written the SQL by hand.
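
To make this less abstract, here's a minimal sketch of the kind of composition I mean, using Sequel's mock adapter so no real database is needed (the posts table and its columns are made up for the example):

require 'date'
require 'sequel'

# The mock adapter emulates PostgreSQL SQL generation without a real database.
DB = Sequel.connect('mock://postgres')

# Datasets are lazy: nothing hits the database until results are needed,
# so queries can be composed step by step.
recent  = DB[:posts].where { created_at > Date.new(2017, 1, 1) }
popular = recent.where { comments_count >= 10 }.order(Sequel.desc(:comments_count))

# And the generated SQL is always one method call away (roughly):
puts popular.sql
# SELECT * FROM "posts" WHERE (("created_at" > '2017-01-01') AND ("comments_count" >= 10)) ORDER BY "comments_count" DESC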

When I first learned about Sequel, Jeremy Evans was already its maintainer, but Sequel was originally created by Sharon Rosner. Recently I read this article, where this quote came to my attention:

I'm the original author of Sequel [1], an ORM for Ruby. Lately I've been finding that ORM's actually get in the way of accomplishing stuff. I think there's a case to be made for less abstraction in programming in general, and access to data stores is a major part of that.

For an in-production system I've been maintaining for the last 10 years, I've recently ripped out the ORM code, replacing it with raw SQL queries, and a bit of DRY glue code. Results: less code, better performing queries, and less dependencies.

  • Sharon Rosner, Sequel's original author

I'm glad it's working well for him, but I find it weird that he would consider Sequel a traditional ORM. To me, Sequel allows me to write more maintainable queries, so I consider it more of a query builder than an ORM. If I had to build all SQL by hand and typecast params and result sets by hand, I think the result would be much worse, not better.

So, nowadays, I'm considering creating a brand new application after several years, and I'm frustrated that it takes a really long time to bootstrap a production-ready new application with state-of-the-art features. I started working on a sample project to serve as a starting point. The idea is to cover features such as: automated deployment, including blue-green/canary strategies for zero downtime; Roda as the Ruby framework; Webpack to bundle static resources; a lightweight alternative to React, such as Dio.js or Inferno.js; multiple environments and flexible configuration; client-side routing; proper security measures (CSRF protection, CSP headers); a proper authentication system, such as Rodauth; proper image uploading (think of Shrine); distributed logging (think of fluentd) with proper details; reliable background jobs; server-side and client-side testing; lazy code loading on both the client and the server; autoreloading of Ruby code on the server; analytics; APM; client-side performance tricks such as link preloading; performance and error tracking for both server-side and client-side code, integrated with sourcemaps and notifications from monitoring services; CDN support; full-text search through ElasticSearch or Solr; caching storage such as Redis; Docker-based infrastructure; backups; high availability of databases; and many, many more features that are supposed to be found in production-ready applications. As you can see, it's really frustrating to create a new application from scratch these days, as it seems any new product could easily take a year to reach a solid production-ready level. And, of course, the list includes support for database migrations.

The last thing I want to worry about while working on such a huge project is wasting time on a simple task like managing the database state through migrations and related tools, especially since ActiveRecord migrations have provided that for so long and work pretty well. However, this time I really wanted to ditch the dependency on railties, and active_record_migrations relies on railties for simplicity, so that it can take advantage of the Rails generators and remain a very simple wrapper around ActiveRecord migrations. Since ActiveRecord itself won't be used in this project, I decided to spend several hours (about two full days) replicating the most important parts of the ActiveRecord migrations tooling for Sequel. And this is how sequel_tools was born this week.

I find it interesting how such a little detail, Rails bundling proper database migrations tooling, influenced so much of my career: I only learned Ruby because of Rails in the first place, and I only chose Rails because of ActiveRecord migrations :) If I had gone with Python, I most likely wouldn't have learned Ruby, wouldn't work in my current job, and wouldn't have created gems such as active_record_migrations, sequel_tools, AutoReloader and simple_mail_builder.

I've also been using Ruby for some other projects, such as cert-generator, a Rack application that can be launched from a Docker container and generates a self-signed root CA and development HTTPS certificates in a way supported by modern browsers. I wrote about it in my previous article.

Nor would I have contributed to Ruby projects such as Rails, orm_adapter-sequel, Redmine, Gitorious (now dead), Unicorn, RSpec-rails, RSpec, Capistrano, Sequel, js-routes, jbundler, database_cleaner, Devise, ChiliProject, RVM, rails-i18n, rb-readline and acl9. Most of those were minor contributions or documentation updates, but anyway... :)

Not to mention many bugs reported against MRI, JRuby and other Ruby projects that have been fixed since then. And, before I forget, some features were added to Ruby after Matz approved some of my requests. For example, the soon-to-be-released Ruby 2.5 introduces ERB#result_with_hash (see issue #8631).

Or my request to remove the 'useless' string literal concatenation syntax, which Matz approved about 5 years ago; I still hope someone will implement it at some point :)

I wonder what my current situation would be if ActiveRecord migrations hadn't been bundled with Rails in 2007 :) On the other hand, maybe I would have become rich working with Python? ;)

Introducing sequel_tools

If you're a Sequel user, you probably spent a while searching for Rake integration around Sequel migrations, and realized it took more time than you'd wished. I've been in the same situation, and it was so frustrating not being able to find all the tasks I wanted at my disposal that I'd often just give up and stick with ActiveRecord migrations. Not because I like the AR migrations DSL better (I don't, by the way), but because all the tooling is already there, ready to be used through some simple rake commands.

sequel_tools is my effort to come up with a de facto solution for integrating Sequel migrations and related tooling with Rake, and to see whether the Sequel community could concentrate its efforts on building a solid foundation for Sequel migrations together. I hope others will sympathize and contribute to the goal, so that we won't have to waste time thinking about migrations again when using Sequel.

Here are some of the supported actions. They can be easily integrated with Rake, but they are implemented in such a way that other interfaces, such as command lines or Thor, should also be easy to build:

  • create the database;
  • drop the database;
  • migrate (optionally to a given version, or latest if not informed);
  • generate a migration file (time-stamp based only);
  • status (which migrations are applied but missing locally and which are not yet applied to the database);
  • version (show current version / last applied migration);
  • roll back the last applied migration present in the migrations path;
  • run a given migration's up block if it hasn't been applied yet;
  • run a given migration's down block if it has already been applied (reverting it);
  • redo: runs a given migration's down and then up blocks, which is useful when writing complex migrations;
  • dump the schema to schema.sql (configurable; can happen automatically upon migration. Implemented just for PostgreSQL for now, by calling pg_dump, but it should be easy to extend to other databases: PRs or additional gems are welcome);
  • load from schema;
  • support for seeds.rb;
  • reset by re-running all migrations over a new database and running the seeds if available;
  • setup by loading the saved schema dump in a new database and running the seeds if available;
  • run a SQL console through the "shell" action;
  • run an irb console through the "irb" action. This works like calling "bundle exec sequel connection_uri"; the connection is stored in the DB constant in the irb session.
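
For reference, the migration files themselves use Sequel's plain migration DSL (this is Sequel's own API, nothing specific to sequel_tools); a generated time-stamped file could look like this (the file name and table are hypothetical):

# db/migrations/20171215120000_create_users.rb
Sequel.migration do
  up do
    create_table(:users) do
      primary_key :id
      String :email, null: false, unique: true
      Time :created_at, null: false
    end
  end

  down do
    drop_table(:users)
  end
end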

I decided not to support Integer-based migrations at this point, as I can't see any drawback of time-stamp based migrations that would be addressed by the Integer strategy, while the Integer strategy has many problems even with a single developer working on the project. I'm open to discussing this with anyone who thinks they can convince me that supporting Integer-based migrations would bring something to the table. It's just that it's more code to maintain and test, and I'm not willing to do that unless there is indeed some advantage over time-stamp based migrations.

The project also allows missing migration files, since I find that useful, especially when reviewing multiple branches with independent migrations.

I don't think it's a good idea to use a Ruby format for storing the current schema, as a lot of it is specific to the database vendor. In all those years I never used the vendor-independent Ruby format; but if you think you'd value such a feature, because you only use the basics when designing tables and want your project to support multiple database vendors, then go ahead and either send a Pull Request to make it configurable or create an additional gem adding that feature, and I can link to it in the documentation.

I'd love to get some feedback on what the Sequel community thinks about it. I'd love for us to reach some consensus on what the de facto solution for managing Sequel migrations in a somewhat feature-complete fashion should be, and to get the community's help in making that solution happen, in the best interest of us happy (and sometimes frustrated by the lack of proper tooling around migrations; no more) Sequel users ;)

Please take a look at what the code looks like; I hope you find it easy to extend to your own needs. Any suggestions and feedback are very welcome, especially now that the project is new and we can change a lot before it gets a stable API.

May I count on your help? ;)

Introducing sequel_tools: Rake integration over Sequel migrations and related tasks (2017-12-18)

The Ruby ecosystem is famous for providing convenient ways of doing things, and very often security is traded for more convenience. That makes me feel out of place, because I'm always struggling against the defaults: I'm not interested in trading security for convenience when I have to make a choice.

Since it's Friday the 13th, let's talk a bit about my fears ;)

I remember that several of the security issues disclosed in the past few years in the Ruby community only existed in the first place because of this idea that we should deliver features in the most convenient way. Like allowing YAML to dump/load arbitrary Ruby objects, for example, back when people were used to using it for serialization. Thankfully JSON seems more popular these days, even if it's more limited: you can't serialize times or dates, for example, as you can in YAML.
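
A quick irb session shows both the convenience and the danger. Note that this reflects the behavior of older Psych versions; recent ones restrict YAML.load to a safe subset by default:

require 'yaml'
require 'ostruct'

# Convenient: arbitrary Ruby objects survive a round-trip...
payload = YAML.dump(OpenStruct.new(admin: true))
# => "--- !ruby/object:OpenStruct\ntable:\n  :admin: true\n" (roughly)

# ...and dangerous: any user-supplied document with a !ruby/... tag gets
# instantiated as well when loaded with the unsafe loader.
YAML.load(payload).admin # => true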

Here are some episodes I remember where convenience was the reason behind vulnerabilities, the most famous being the Rails XML/YAML params parsing vulnerability (CVE-2013-0156), which allowed remote code execution through specially crafted request params.

I remember that for a long while I made a point of always explicitly converting params to the expected format, like params[:name].to_s, and that alone was enough to protect my application from many of the disclosed vulnerabilities. But my application was still vulnerable to the XML/YAML params parsing one mentioned above, and the worst part is that we never ever used XML or YAML in our controllers; we were affected by that bug in the name of convenience (for others, not us).

Why is this a major issue with Ruby web applications?

Any other web framework providing seamless params binding based on how the params keys are formatted is vulnerable for the same reasons, but most (all?) people doing web development with Ruby these days will rely on Rack::Request somehow. And it will automatically convert your params to arrays if they are formatted like ?a[]=1&a[]=2, or to hashes if they are formatted like ?a[x]=1&a[y]=2. This is built-in and you can't change this behavior for your specific application. I mean, you could replace Rack::Utils.default_query_parser with a custom parser that implements parse_nested_query as parse_query, but then that would apply to other Rack apps mounted in your app (think of Sidekiq web, for example) and you don't know whether or not they're relying on such conveniences.
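
To illustrate, here's what Rack's query parsers do out of the box:

require 'rack'

Rack::Utils.parse_nested_query('a[]=1&a[]=2')   # => {"a"=>["1", "2"]}
Rack::Utils.parse_nested_query('a[x]=1&a[y]=2') # => {"a"=>{"x"=>"1", "y"=>"2"}}

# The non-nested parser doesn't build hashes, but it still groups repeated keys:
Rack::Utils.parse_query('a[]=1&a[]=2')          # => {"a[]"=>["1", "2"]}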

How to improve things

I've been bothered for years by the inconvenience of having to add .to_s to all string params (a situation created in the name of providing more convenience, which is ironic anyway), and wanted a more convenient way of accessing params safely. As you can see, what is convenient to some can be inconvenient to others. But fixing it would require manually inspecting all controllers to review every place a param is fetched from the request. I wasn't bothered enough, so I thought it wouldn't be worth the effort for such a big app.

Then I noticed Rack had recently deprecated Rack::Request#[], which I used a lot: not only was calling request['name'] more convenient than request.params['name'], but most examples in Roda's README used that convenient #[] method (the examples were updated after it was deprecated). Since I'd eventually have to fix all usages of that method, and since it was used all over the place in our Roda apps (think of controllers; we use the multi_run plugin), I decided to finally take a step further and fix the old problem as well.

Fetching params through a specialized, safer class

Since I realized it wouldn't be possible to make Rack parse queries in a simpler way, I decided to build a solution that wraps the params Rack has already parsed. For a Roda app like ours, writing a Roda plugin for that makes perfect sense, so this is what I did:

# apps/plugins/safe_request_params.rb
require 'rack/request'
require 'json'

module AppPlugins
  module SafeRequestParams
    class Params
      attr_reader :files, :arrays, :hashes

      def initialize(env: nil, request: nil)
        request ||= Rack::Request.new(env)
        @params = {}
        @files = {}
        @arrays = {}
        @hashes = {}
        request.params.each do |name, value|
          case value
          when String then @params[name] = value
          when Array then @arrays[name] = value
          when Hash
            if value.key? :tempfile
              @files[name] = UploadedFile.new value
            else
              @hashes[name] = value
            end
          end # ignore if none of the above
        end
      end

      # a hash representing all string values and their names;
      # pass the keys you're interested in optionally as an array
      def to_h(keys = nil)
        return @params unless keys
        keys.each_with_object({}) do |k, r|
          k = to_s k
          next unless key? k
          r[k] = self[k]
        end
      end

      # has a string value for that key name?
      def key?(name)
        @params.key?(to_s name)
      end

      def file?(name)
        @files.key?(to_s name)
      end

      # WARNING: be extra careful to verify the array is in the expected format
      def array(name)
        @arrays[to_s name]
      end

      # has an array value with that key name?
      def array?(name)
        @arrays.key?(to_s name)
      end

      # WARNING: be extra careful to verify the hash is in the expected format
      def hash_value(name)
        @hashes[to_s name]
      end

      # has a hash value with that key name?
      def hash?(name)
        @hashes.key?(to_s name)
      end

      # returns either a string or nil
      def [](name, nil_if_empty: true, strip: true)
        value = @params[to_s name]
        value = value&.strip if strip
        return value unless nil_if_empty
        value&.empty? ? nil : value
      end

      def file(name)
        @files[to_s name]
      end

      # raises if it can't convert with Integer(value, 10)
      def int(name, nil_if_empty: true, strip: true)
        return nil unless value = self[name, nil_if_empty: nil_if_empty, strip: strip]
        to_int value
      end

      # converts a comma separated list of numbers to an array of Integer
      # raises if it can't convert with Integer(value, 10)
      def intlist(name, nil_if_empty: true, strip: nil)
        return nil unless value = self[name, nil_if_empty: nil_if_empty, strip: strip]
        value.split(',').map{|v| to_int v }
      end

      # converts an array of strings to an array of Integer. The query string is formatted like:
      # ids[]=1&ids[]=2&...
      def intarray(name)
        return nil unless value = array(name)
        value.map{|v| to_int v }
      end

      # WARNING: be extra careful to verify the parsed JSON is in the expected format
      # raises if JSON is invalid
      def json(name, nil_if_empty: true)
        return nil unless value = self[name, nil_if_empty: nil_if_empty]
        JSON.parse value
      end

      private

      def to_s(name)
        Symbol === name ? name.to_s : name
      end

      def to_int(value)
        Integer(value, 10)
      end

      class UploadedFile
        ATTRS = [ :tempfile, :filename, :name, :type, :head ]
        attr_reader(*ATTRS)

        def initialize(file)
          @file = file
          @tempfile, @filename, @name, @type, @head = file.values_at(*ATTRS)
        end

        def to_h
          @file
        end
      end
    end

    module InstanceMethods
      def params
        env['app.params'] ||= Params.new(request: request)
      end
    end
  end
end

Roda::RodaPlugins.register_plugin :app_safe_request_params, AppPlugins::SafeRequestParams

Here's how it's used in apps (controllers):

require_relative 'base'

module Apps
  class MyApp < Base
    def process(r) # r is an alias to self.request
      r.post('save'){ save }
    end

    private

    def save
      assert params[:name] === params['name']
      # Suppose a file is passed as the "file_param"
      assert params['file_param'].nil?
      refute params.file('file_param').tempfile.nil?
      p params.files.values.map(&:filename)
      p params.json(:json_param)['name']
      p [ params.int(:age), params.intlist(:ids) ]
      assert params['age'] == '36'
      assert params.int(:age) == 36

      # we don't currently use this in our application, but in case we wanted to take advantage
      # of the convenient query parsing that will automatically convert params to hashes or arrays:
      children = params.array 'children'
      assert params['children'].nil?
      user = params.hash_value :user
      name = user['name'].to_s

      # some convenient behavior we appreciate in our application:
      assert request.params['child_name'] == ' '
      assert params['child_name'].nil? # we call strip on the values and convert to nil if empty
    end
  end
end

For those wanting to extend the safety of the Params class above to the unsafe methods (json, array, hash_value): one could implement them so that any returned hash is wrapped in a Params-like instance. However, more specialized solutions, such as dry-validation or surrealist, are probably worth considering for those cases.
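
Here's a hypothetical minimal sketch in that direction (SafeHash is made up and not part of the plugin above); it classifies values the way Params#initialize does and wraps nested hashes on access, so a raw hash never leaks:

class SafeHash
  def initialize(hash)
    @strings = {}
    @hashes = {}
    hash.each do |k, v|
      case v
      when String then @strings[k.to_s] = v
      when Hash then @hashes[k.to_s] = v
      end # other types could be classified as in Params#initialize
    end
  end

  # returns either a string or nil, like Params#[]
  def [](name)
    @strings[name.to_s]
  end

  # returns another SafeHash, so nested access stays safe
  def hash_value(name)
    value = @hashes[name.to_s]
    value && SafeHash.new(value)
  end
end

user = SafeHash.new('name' => 'John', 'address' => { 'city' => 'Vitória' })
user['name']                       # => "John"
user.hash_value('address')['city'] # => "Vitória"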

Final notes

In web frameworks developed in static languages this isn't a common source of vulnerabilities, because it's harder to implement solutions like the one adopted by Rack: one would have to use some generic type, such as Object, for mapping params keys to their values, which is usually avoided in typed languages. Also, method signatures are often more explicit, which prevents a specially crafted param from being interpreted as a different type than the method expects.

That's one of the reasons I like the idea of introducing optional typing to Ruby, as I once proposed. I do like Ruby's flexibility, and that's one of the reasons I often preferred scripting languages over static ones for general purpose programming (I did Perl in my early web development days).

But if Ruby were flexible enough to also allow me to specify optional typing, as Groovy does, it would be even better in my opinion. Until then, even though I'm not a security expert by any means, I feel like the recent changes to how our app fetches params from the request should significantly reduce the possibility of introducing bugs caused by params injection in general.

After all, security is already a quite complex topic to me, and I don't want to have to think about the impact of something like MyModel.where(username: params['username']) and what could possibly go wrong if someone injected some special array or hash in the username param. Security is already hard to get right. No need to make it even harder by providing automatic params binding through the same method, out of the box, in the name of convenience.
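
To make the concern concrete, compare what different param shapes do to the same query (Sequel's mock adapter again; with automatic params binding, the attacker is the one choosing the shape):

require 'sequel'

DB = Sequel.connect('mock://postgres')

DB[:users].where(username: 'alice').sql
# => SELECT * FROM "users" WHERE ("username" = 'alice')

# If ?username[]=alice&username[]=admin is injected, the param becomes an
# array and the query silently turns into an IN lookup:
DB[:users].where(username: %w[alice admin]).sql
# => SELECT * FROM "users" WHERE ("username" IN ('alice', 'admin'))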

Explicit request params binding in Ruby web apps (or "convenience can be inconvenient") (2017-10-13)

In my previous article, I had a hard time trying to explain why I wanted to replace Rails with something else in the first place. This article is my attempt to write more specifically about what I dislike in Rails for the purpose of the single page application we maintain.

In summary, in the previous article I explained that I prefer to work with more focused and independent libraries, while Rails adopts a somewhat integrated and highly coupled solution, which is a fine approach too. There are trade-offs involved in either approach, and I won't get into the details in this article. As I said previously, this is mostly about a developer's personal taste and mindset, so by no means did I ever want to bash Rails. Quite the opposite: Rails served me pretty well for a long time and I could live with it for many more years, so getting it out of our stack wasn't an urgent matter by any means.

For the purposes of this article, I won't discuss the Good and Bad of Ruby itself, since the article was mainly written to explain why I chose another Ruby framework instead of Rails.

In case you didn't read the previous article, the kind of application I work with is a single page application, so keep this in mind when trying to understand my motivations for replacing Rails.

Unused Rails features

So, here are some features provided by Rails which I wasn't using when I made the decision to remove Rails from our stack:

  • ActiveRecord (used Sequel instead);
  • Turbolinks (it doesn't make much sense for the kind of SPA we build);
  • YAML configuration files (we use regular Ruby files for configuration);
  • minitest or test/unit (used RSpec instead);
  • fixtures (used factories instead);
  • Devise (we have a very particular authentication strategy, and authentication frameworks wouldn't bring much to the table);
  • we have just a handful of views and forms rendered by Rails (most are generated with JS);
  • REST architecture (we deal with very specific requests rather than generic ones over common resources, which translates to specialized queries that run very quickly without having to resort to complicated caching strategies for most cases in order to get fast responses);
  • respond_to (most requests simply respond with JSON);
  • Sprockets, also known as the Rails Asset Pipeline (not sure this still holds true after Rails 5.1 added Webpack integration);
  • generators (I haven't used them in a long time, because they aren't really needed and it's pretty quick and easy to add new controllers, models, mailers or tests manually).

So, for a long while I had been wondering how exactly Rails was helping us build and maintain our application. The application was already very decoupled from Rails, and its code didn't rely on ActiveSupport core extensions either. We tried to keep our controllers thin, although there's still quite some work to do before we get there.

On the other hand, there were a few times I had trouble debugging weird problems after upgrading Rails, and it was a nightmare when I had to dig into Rails' source code; I wasted a lot of time in the process. So I did have a compelling reason not to stick with Rails. There were other parts of Rails I disliked, which I describe in the next section.

The Bad Parts

  • can't upgrade individual parts, it's all or nothing. If you're using ActiveRecord, for example, you're forced to upgrade all Rails parts if you want to upgrade ActiveRecord for some feature. Or the opposite: you might want to upgrade just the framework to get ActionCable support, for example, but then you'd have to fix all deprecated ActiveRecord usage in the process;
  • hard to follow code base, when debugging edge cases, which makes it hard to estimate tasks involving debugging weird issues that happened after upgrading Rails for example;
  • buggy streaming support through ActionController::Live (I had to work around the bugs many times after upgrading Rails). Try reading its source to understand how it works and you'll understand why I say its implementation is quite complicated;
  • occasional dead-locks, especially when ActionController::Live was used. That's why those few actions were the first ones I moved out of Rails;
  • ActiveSupport::Dependencies: implicit autoloading and its problems. You must require the full action_view even if you only need action_view/helpers/number_helper, for example;
  • monkey patches to Ruby core classes and methods pollution (it's my opinion that libraries shouldn't freely patch core Ruby classes except for very exceptional cases such as code instrumenting, implementing a transparent auto-reloading tool and so on, and should be avoided whenever possible);
  • automatic/transparent params binding (security concerns: I often wrote code such as params[:text].to_s because I didn't want to get a hash or an array when accessing some param, injected by some malicious request taking advantage of Rails' automatic params binding rules);
  • slow to boot when compared to other Ruby frameworks (more of a development issue); spring is not perfect and shouldn't be needed in the first place;
  • increased test load time, which is quite noticeable when running individual tests;
  • the API documentation is incomplete. The guides are great, but I often wasted a lot of time looking for documentation of some parts of the API;
  • it's hard to gain a full understanding of the boot process and the request cycle;
  • I won't get into the many details of why I don't like ActiveRecord, because I haven't used it in several years and it's not a requirement for using Rails, but if you're curious I wrote an article comparing it to Sequel long ago. My main annoyance with ActiveRecord is its pooling implementation and the ability to check out a connection from the pool outside of a block that would ensure it's checked back in.

The Good Parts

Rails is still great as a first framework for beginners (and some experts as well). Here are the good parts:

  • handles static resources (assets in Rails terminology) bundling and integrates with Webpack out of the box;
  • good safe default HTTP headers;
  • CSRF protection by default;
  • SQL injection protection in bundled ActiveRecord by default;
  • optimizations to traditional web pages through Turbolinks;
  • bin/console and great in-site debugging with the web-console gem bundled by default in development mode;
  • separate configuration per environment (development/production/test) with good defaults;
  • e-mail integration;
  • jobs integration;
  • integrated database migrations;
  • great automatic code reloading capabilities in the development environment (as long as you stick with Rails conventions and don't specify your dependencies manually);
  • fast to boot (when comparing to frameworks in other languages, such as Java);
  • awesome guides and huge community to ask your questions and get an answer very quickly;
  • great community and available gems for all kind of tasks;
  • very much audited by security experts and any discovered issues are quickly fixed and new releases are made available with responsible disclosure;
  • Github issues are usually quickly fixed;
  • Rails source code has an extensive test coverage;
  • provides tons of generators, including for tests, models and controllers, for those who appreciate them;
  • provides great performance-related data in the application's logs (time spent rendering views and partials and in the database);
  • highly configurable;
  • internationalization support;
  • helpful view helpers such as number and currency formatting;
  • a big team of active maintainers and contributors;
  • easy websockets API through ActionCable;
  • flexible routing;
  • bundles test runner solutions for both Ruby-land tests and full-feature tests through Capybara (it still lacks an integrated JavaScript test runner though);
  • there are probably many more great features I can't remember off the top of my head because I didn't use them myself, such as RESTful resources and so on;
  • conventions such as path organization help a lot in teams with many developers and frequent turnover, when hiring new members, or when handing the project over to someone else. By knowing the Rails conventions, a newcomer joining an existing Rails application for the first time will know exactly where to find controllers, models, views, workers, assets, mailers, tests and so on. It's also very likely they will already be familiar with many gems commonly used together with Rails.

So, Rails is not only a framework but also a set of good practices (among a set of questionable practices that will vary according to each one's taste) bundled together. It's not the only solution trying to provide solid ground for web developers, though. Another solution with similar goals seems to be Hanami, for example, although Rails seems more mature to me. For instance, I find code reloading to be a fundamental part of developing web applications, and Hanami doesn't seem to provide a solid solution that works across different Ruby implementations, such as JRuby, according to these docs.

But overall, I still find Rails to be one of the best available frameworks for developing web applications. It's just that, for my personal taste and mindset, I'm more aligned with something like Roda than with something like Rails. One should understand the motivations behind such decisions in order to figure out which solution works best for their own taste, rather than expecting some article to tell them what the Right Solution™ is.

Ruby on Rails: the Bad and Good parts (2017-05-04)

Background - the application size

Feel free to skip to the next section if you don't care about it.

I recently finished moving a 5-year-old Rails application to a custom stack on top of Roda, by Jeremy Evans, also the maintainer of the awesome Sequel ORM. The application is actually older than that, and I've been working on it for 6 years. It used to be a Grails application; it was moved from SVN to Git about 7 years ago, and I never had access to the SVN repository, so I don't really know how old the application is. It was completely migrated from Grails to Rails in 2013. And now I've replaced Rails with Roda, but this time it was painless and only took a few weeks.

I have some experience with replacing the technology of an existing application without interrupting the regular development flow and deployment procedures; the only times I really had to interrupt the services for a little while were the day I replaced MySQL with PostgreSQL and the day I moved the servers from colocation to Google Cloud Platform.

I may write about the steps I usually follow when changing the stack (I replaced Sprockets with Webpack a few years ago, and Devise with a custom solution, among many examples) in another article. The reason I'm describing this scenario here is only so that you have some rough idea of this project's size, especially considering it had 0 tests when I joined the company as the sole developer and had to understand a messy Grails application with tons of JS embedded in GSP pages, with functions comprising hundreds of lines and many, many logical branches. Years later, there are still tons of tests lacking, especially in the front-end code, and much more to improve. To give you a better idea, we currently have about 5k lines of Ruby test code and 20k lines of other custom (not generated) Ruby code, plus 5k lines of database migrations code. Besides that, we have about 11k lines of CoffeeScript code, 6k lines of JS code and 2.5k lines of CoffeeScript test code. I'm not including any external libraries in those stats. You have probably already noticed how poor the test coverage currently is, especially in the front-end. At this point I expect you to have some rough idea of this project's size. It's not a small project.

Why replace Rails in the first place?

Understanding this section is definitely the answer to why I feel alone in the Ruby community.

More background about Rails and the Ruby community

Again, feel free to skip this subsection.

When I was working on my master's thesis (Robotics, Electrical Engineering) I stopped working with web development for a while and focused on embedded C programming, C++ hard real-time systems and the like. After I finished the thesis, my first job was back to Delphi programming. Only in 2007 did I move back to web development, several years later, with experience in Perl only. After a lot of research I decided on Rails and Ruby, although I had also seriously considered TurboGears and Django, both using the Python language. I wasn't worried about the language, as I knew neither Ruby nor Python and they seemed similar to each other. Ultimately I chose Rails because of how it handled database migrations.

In 2007, looking at the alternatives, Rails was very appealing. There were conventions that would save me a lot of work when getting back into web development, generators to help me get started, great documentation, a bundled database migrations framework I wouldn't have to recreate myself, simple to understand error stack-traces, good defaults for the production environment (such as proper 500 and 404 pages), great auto-reloading of code in development, great logging, awesome testing tools integrated with the generators, quick boot, custom routes, convention over configuration and so on.

Last but not least, a very rich ecosystem with smart people working on great gems and learning Ruby together, all amazed by its meta-programming capabilities, the possibility of changing core classes through monkey patches and so on. And since it's possible, we should use it everywhere we can, right? Domain-specific languages (DSLs) were used by all popular gems at that time. And there wasn't much fragmentation like in the Java community: basically almost everyone writing web applications in Ruby was writing Rails apps and following its conventions. That allowed the community to grow fast, with several Rails plugins and projects assuming the application was running Rails. Most of us only came to know Ruby because of Rails, myself included. This alone is reason enough to thank DHH. Rails definitely raised the bar for other web frameworks.

As the ecosystem matured, we saw the rise of Rack and more people using what they called micro-frameworks, such as the popular Sinatra and Merb, among others. Rails improved internationalization support in version 2, merged with Merb in version 3, got Sprockets (the asset pipeline) in version 3.1 and so on. The asset pipeline was really a thing when it was introduced. It was probably the last really big Rails change to inspire the general web development scene.

In the meantime Ruby also evolved a lot, providing better unicode support, adding a new Hash syntax, garbage-collecting symbols, improving performance and gaining great new tools such as Bundler. RubyGems got a better API, the Rails guides got much better, and there's superb documentation on securing web applications that is accessible to any web developer, not only Rails ones. We have also seen lots of books and courses teaching the Rails way, as well as many dedicated blogs, videos, conferences and so on. I don't remember watching such fast growth in any other community until JavaScript recently got a lot of traction, motivated not only by single page applications, which are becoming more and more common, but also by the creation of Node.js.

Many more languages have been created or re-discovered recently, including Go, Elixir, Haskell, Scala, Rust and many others. But to this day, despite the existence of symbols, a poor threading model in MRI and the lack of proper support for threaded applications in the stdlib, Ruby is still my preferred general purpose language. That includes web applications. What about Rails?

Enough is enough! What's wrong with Rails?

If you guessed performance was the reason, you guessed wrong. For some reason I don't quite understand, developers seem obsessed with performance even in scenarios where it doesn't matter. I never faced server-side performance issues with Rails. According to NewRelic, most requests would be served in less than 20ms on the server side. Even if we could cut those 20ms it wouldn't make any difference at all. So, what's wrong after all?

There's nothing wrong with Rails in a fundamental way. It's a matter of taste in my case, I guess, because it's really hard to find an objective way to explain why I wasn't fully satisfied with Rails. You should understand that this article is not about bashing Rails in any way. It's a personal point of view on why I feel like a stranger, and why that's not a great feeling. [Update: after writing this article, I spent some time trying to list the parts I dislike in Rails and wrote a dedicated article about it, which you can read here if you're curious]

To help you understand where I come from: I have never followed the "Rails Way", if there is such a thing. I used jQuery when Prototype was the default library, RSpec when test/unit was the default, factories when Rails taught fixtures, and Sequel rather than the bundled ActiveRecord, though instead of Sequel's migrations I used ActiveRecord's migrations through the active_record_migrations gem. Some years ago I replaced Sprockets with Webpack (which Rails fortunately embraced in the 5.1 release, although I wasn't using Rails anymore by then). After some frustration trying to get Devise to work well with Sequel, I replaced Devise with a custom solution (previously I had to customize Devise a lot to support our non-traditional sign-in integration and the custom password hashing inherited from the Grails days).

Since we're talking about a single page application, almost all requests were JSON ones. We didn't embrace REST or respond_to, we had very few server-side views, and we often had to dig into Rails or Devise source code to understand why something wasn't working as we expected. That included several problems with streamed responses after each major Rails upgrade (Rails calls this Live Streaming for some reason I don't quite follow, although I suspect it's because they had introduced some optimizations to start sending the view's header sooner and called that streaming support, so they needed another name when they introduced ActionController::Live). I used to spend a lot of time trying to understand Rails' internal source whenever I had to debug such problems. It was pretty confusing to me. The same happened with Devise.

At some point I started to ask myself what Rails was bringing to the table. And it got worse. When I first met Rails it booted in no time. It got slower to boot with each new release, and then they introduced complex solutions, such as spring, to try to fix this slowness. For a long time they used (and still use to this day) Ruby's autoload feature to lazily evaluate code as it's needed in order to decrease the boot time. Matz doesn't like autoload and I don't like it either, but this article is already long enough without discussing that subject too.

Something I never particularly enjoyed in Rails was all the magic related to auto-loading. I always preferred explicit and simple code over sophisticated code that auto-wires things. As you can guess, even though I loved how Rails booted quickly and how auto-reloading just worked (except when it didn't; more on that later), I really wanted to specify all my dependencies explicitly in each file. But I couldn't just use require, or auto-reloading would stop working. I had to use ActiveSupport's require_dependency, and I hated it because it wasn't just regular Ruby code.
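
In practice the difference looked like this (the file name here is hypothetical):

# Plain Ruby, but files loaded this way were invisible to Rails auto-reloading:
require 'queries/users_query'

# Reload-aware, but ActiveSupport-specific rather than regular Ruby:
require_dependency 'queries/users_query'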

I also didn't like that Rails forced on me all the monkey patches to Ruby core classes made by the ActiveSupport extensions, introducing methods such as blank?, present?, presence, try, starts_with?, ends_with? and so on. That ties back to my preference for explicit dependencies, as I think code with explicit dependencies is much easier to follow.

So one of my main motivations to get rid of Rails was to get rid of ActiveSupport, since Rails depends on it, including its monkey patches and auto-loading implementation. Replacing Rails with Roda alone didn't let me get rid of ActiveSupport just yet, as I'll explain later in this article, but it was an important first move. What followed was the same kind of frustration with the Ruby community, in the sense that very popular Ruby gems are written with about the same mentality as Rails core. Such gems include the very popular mail gem as well as FactoryGirl, for example. Even Sidekiq patches Ruby core classes. I'll talk more about this later, but let me introduce Roda first.

[Update: after writing this article, both the mail and sidekiq gems worked on removing their monkey patches. I'd like to congratulate them on the effort and say "Thank you so much!"]

Why Roda?

From time to time I considered replacing Rails with something else, but I always gave up for one reason or another. Sometimes I realized I liked Sprockets and the other framework didn't provide an alternative to the Rails Asset Pipeline. Another time I realized auto-reloading didn't work well with the other framework. Other times I didn't like how code was organized with it. When I read Jeremy's announcement of Roda, it was just the right time with the right framework for me.

I have greatly appreciated Jeremy's work for a long time, since I was introduced to Sequel. He's a lovely person who provides awesome, kind support, and he's a great library designer. Sequel is simply the best ORM I've seen so far. I also find it quite simple to follow Sequel's code base, and after looking into Roda's source it's pretty much trivial to follow and understand. It's basically one simple source file that handles routing and plugin support; basically everything else is provided by plugins you can opt in to or out of, and each plugin, being small and self contained, is pretty simple to understand. If you don't agree with how something is implemented, just implement that part on your own.

After having a glance over the core Roda plugins, one stood out in particular: multi_run. For what I wanted, this plugin would give me great organization, similar to Rails controllers, with the advantage that each sub-app could have its own middleware stack, could be mounted anywhere (including in a separate app), and was easy to test separately as if it were a single app. More importantly, it allowed me to easily lazy load the application code, which let the application boot instantly with Puma, without needing autoload or other trickery. Here's an example:

require 'roda'

module Apps
  class MainApp < Roda
    plugin :multi_run
    # you'll probably want other plugins, such as :error_handler and :not_found,
    # or maybe error_email

    def self.register_app(path, &app_block)
      ->(env) do
        require_relative path
        app_block[].call env
      end
    end

    run 'sessions', register_app('sessions_app'){ SessionsApp }
    run 'static', register_app('static_app'){ StaticApp }
    run 'users', register_app('users_app'){ UsersApp }
    # and so on
  end
end

Even if you decide to load the main application when testing particular apps, the overhead is negligible, since it would basically only load the app under test. And if you're afraid of using lazy loading in the production environment because you want to deliver a warmed-up app, it's quite easy to change register_app:

require 'roda'

module Apps
  class MainApp < Roda
    plugin :multi_run
    plugin :environments

    def self.register_app(path, &app_block)
      if production?
        require_relative path
        app_block[]
      else
        ->(env) do
          require_relative path
          app_block[].call env
        end
      end
    end

    run 'sessions', register_app('sessions_app'){ SessionsApp }
    # and so on
  end
end

This is not just theory; this is how I implemented it in our application, and it boots in less than a second, just about the same as the simplest Rack app. Of course, I haven't really measured this in any scientific way; it's a rough count while running bundle exec puma, where most of the time is spent on Bundler and requiring Roda (about 0.6s with my gemset). No need for spring, autoload or any complicated code to make it fast. It just works and it's just Ruby, using explicit lazy loading rather than an automatic system.

So I really wanted to try this approach, and I had a plan where I would run both the Roda and Rails stacks together for a while, running the Rails app as the fallback when the Roda stack didn't match the route. I could even use the path_rewriter plugin to migrate a single action at a time to the Roda stack if I wanted to.
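
A minimal sketch of that plan's wiring, assuming the Rails app still boots from config/environment.rb: Roda responds with 404 for unmatched routes, so Rack::Cascade would fall through to the Rails app:

# config.ru (sketch): try the Roda stack first, fall back to the Rails app
require_relative 'config/environment'
require_relative 'apps/main_app'

run Rack::Cascade.new([Apps::MainApp, Rails.application])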

There was just one remaining issue to solve before I started moving the app to the Roda stack: automatic code reloading. I asked in the ruby-roda mailing list how Roda handled code reloading, and Jeremy said it was outside Roda's responsibility, that I could choose any code reloader I wanted, and pointed to some documentation listing several of them, including one of his own. I spent quite some time researching them and still preferred the one provided by ActiveSupport::Dependencies, but since I wanted to get rid of ActiveSupport and autoloading in the first place, there was no point in continuing to use it. If you're curious about this research, I wrote about it here. If you're curious why I dislike Ruby's autoload feature, you'll find the explanation in that article as well.

After some discussion with Jeremy around automatic code reloading in Ruby, I suggested an approach that I thought would work well and transparently, although it would require patching both require and require_relative in development mode. Jeremy wasn't very interested in it because of those monkey patches, but I was still confident it would be a better option than the others I had evaluated so far. I decided to give it a try, and that's how AutoReloader was born.
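
The rough idea was something along these lines (a simplified sketch, not AutoReloader's actual implementation):

# Development only: intercept require to track which files get loaded, so
# they can later be removed from $LOADED_FEATURES and required again once
# their mtime changes.
module RequireInterceptor
  TRACKED = []

  def require(path)
    loaded = super
    TRACKED << path if loaded
    loaded
  end
end

Kernel.prepend(RequireInterceptor)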

With the auto-reloading issue solved, everything was set to start porting the app slowly to the Roda stack, and the process was pretty much a breeze. If you want some basic idea of Rails' overhead: the full Ruby specs suite ran about 2s faster with the same (converted) tests after getting rid of the last Rails bits. It used to take 10s to run 380 examples and thousands of assertions; after getting rid of Rails it took 8s with an extra example. Upgrading Bundler saved me another half a second, so it currently takes 7.6s to finish (about half a second for bundle exec, 1.5s to load according to the RSpec report and 5.6s to run).

But getting rid of Rails was just the first step in this lonely journey.

Rails is out, what's next?

Getting rid of Rails wasn't enough to get rid of ActiveSupport. We have a LocaleUtils class we use to format numbers, among other utilities based on the user's locale. It used to include ActionView::Helpers::NumberHelper, and by that time I had learned the hard way that I couldn't simply require 'action_view/helpers/number_helper' because I'd run into problems related to ActiveSupport's autoloading mechanism, so I had to fully require action_view. Anyway, since ActionView depends on ActiveSupport, I wanted to get rid of it as well. As usual, after lots of wasted time searching for Ruby number formatting gems, I decided to implement the formatting myself, and a few hours later I had gotten rid of ActionView.
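For illustration, locale-aware number formatting really can fit in a few lines of plain Ruby. This is just a minimal sketch, not our actual LocaleUtils code; the locale table and method names are made up:

module LocaleUtils
  SEPARATORS = {
    'en'    => { thousands: ',', decimal: '.' },
    'pt-BR' => { thousands: '.', decimal: ',' }
  }.freeze

  def self.format_number(value, locale: 'en', precision: 2)
    sep = SEPARATORS.fetch(locale)
    int, frac = format("%.#{precision}f", value).split('.')
    # insert the thousands separator before each group of three digits:
    int = int.gsub(/(\d)(?=(\d{3})+\z)/) { "#{$1}#{sep[:thousands]}" }
    frac ? "#{int}#{sep[:decimal]}#{frac}" : int
  end
end

LocaleUtils.format_number(1234567.891, locale: 'pt-BR') # => "1.234.567,89"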

But ActiveSupport was still there, standing like a great warrior! This time it was a dependency of... guess what? Yep, FactoryGirl! Oh, man :( After some research on alternative factory implementations, I found Fabrication to be dependency-free. An hour later I had ported our factories to Fabrication and finally gotten rid of ActiveSupport! Yay, no more monkey patches to core Ruby classes! Right?

Well, not exactly... :( The monkey patch culture is deeply rooted in Ruby's community. Some very popular gems add monkey patches, such as the mail gem, or sidekiq. While reading the mail gem's source I found it very confusing, so I decided to replace it with something simpler. We use exim4 to forward e-mails to Amazon SES, so Ruby's basic Net::SMTP support is enough for delivering e-mails to Exim; all I needed was a MIME mail formatter in order to send simple text + HTML multi-part mail to users. After some more research I decided to implement it myself, and this is how simple_mail_builder was born.
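Building such a multi-part message on top of the stdlib is mostly a matter of gluing MIME headers and boundaries together. Here's a rough sketch of the idea (illustrative only, not the actual simple_mail_builder code; it assumes a local SMTP server such as Exim listening on port 25):

require 'net/smtp'
require 'securerandom'

def build_multipart_mail(from:, to:, subject:, text:, html:)
  boundary = "boundary-#{SecureRandom.hex(16)}"
  <<~MAIL
    From: #{from}
    To: #{to}
    Subject: #{subject}
    MIME-Version: 1.0
    Content-Type: multipart/alternative; boundary="#{boundary}"

    --#{boundary}
    Content-Type: text/plain; charset=UTF-8

    #{text}
    --#{boundary}
    Content-Type: text/html; charset=UTF-8

    #{html}
    --#{boundary}--
  MAIL
end

mail = build_multipart_mail(from: 'app@example.com', to: 'user@example.com',
  subject: 'Hello', text: 'Hi there', html: '<p>Hi there</p>')
Net::SMTP.start('localhost', 25) do |smtp|
  smtp.send_message mail, 'app@example.com', 'user@example.com'
end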

At some point I might decide to create my own simple job processor just to get rid of Sidekiq's monkey patches, but my point is that I have this feeling of being a lonely warrior fighting a lost battle, because my expectations don't match what the Ruby community overall considers acceptable practice, such as modifying Ruby core classes in libraries. I agree it's okay for instrumentation code, such as NewRelic, to patch others' code, but for other use cases I don't really agree with such an approach.

On one hand I really love the Ruby language, except for a few caveats, but there's a huge mismatch with the Ruby community's way of writing Ruby code, and this is a big thing. I don't really know what the situation is in other language communities, so I guess I might be a lonely warrior in any other language I opted for instead of Ruby, but Ruby is the only language I really appreciate so far among those I've worked with.

I guess I should just stop dreaming about the ideal Ruby community and give up on trying to get a monkey-patch free web application...

At least, I can now easily and happily debug anything that happens in the application without having to spend a lot of time digging into Rails' or Devise's source code, which used to take me a lot of time. Everything's crystal clear. I have tons of flexibility to do what I want in no time with the new stack. The application boots pretty quickly and I'll never run into edge cases involving ActiveSupport::Dependencies auto-reloading again. Or issues involving ActionController::Live. Or Devise issues when using Sequel as the ORM.

Ultimately I feel like I've got full control over the application, and that's simply priceless! It's an awesome feeling of freedom I've never experienced before. Instead of focusing on the bad feeling of being a lonely warrior fighting a lost battle, I'll try to concentrate on those great benefits from now on.

ruby-rails/2017_05_01_feeling_alone_in_the_ruby_community_and_replacing_rails_with_roda Feeling alone in the Ruby community and replacing Rails with Roda 2017-05-13T10:10:00+00:00 2017-05-13T10:10:00+00:00

TLDR: This article proposes savepoints to implement nested transactions. Savepoints are supported by PostgreSQL, Oracle, Microsoft SQL Server, MySQL (with InnoDB, though some statements cause an implicit commit, so I'm not sure this technique works well with MySQL) and other vendors, but not by all vendors or engines. So, if using savepoints or nested transactions is not possible with your database, most likely this article won't be useful to you. Also, not all ORMs provide support for savepoints in their API; I know Sequel and ActiveRecord do. The article also provides a link on how to achieve the same goal with Minitest.

I've been feeling lonely about my take on tests for a long time. I've read many articles on tests over the past years and most of them, not only in the Ruby community, seem to give the same advice. Good advice, by the way. I understand the reasoning behind it, but I also understand it comes with trade-offs, and this is where I feel kind of lonely. All the articles I've read, and some people who have worked with me, have tried to convince me that I'm just plain wrong.

I never cared much about this, but I never wrote about it either, as I thought no one would be interested in learning about some techniques I've been using for quite a few years to speed up my tests. It seemed everyone would simply tell me I'd go to hell for writing tests this way.

A few weeks ago I read this article from Travis Hunter, which reminded me of an old TO-DO. More importantly, it made me realize I wasn't that lonely in thinking the way I do about tests.

"Bullshit! I came here because the titles said my tests would be faster, I'm not interested in your long stories!". Sure, feel free to completely skip the next section and go straight to the fun section.

Background

I graduated in Electrical Engineering after 5 years in college, then spent two more years working on my master thesis on hard real-time systems applied to mobile robotics. I think there are two things which engineers in general get used to after a few years in college. First, almost everything involves trade-offs, and one of the most important jobs of an engineer is to identify them and choose the option they consider to have the best cost benefit. The second is related to the first: knowing that some tools will better fit a given set of goals. I know this is also understood by CS and similarly graduated people, but I have the feeling it's not as strong in general in those areas as I observe in some (electrical/mechanical/civil) engineers.

When I started using RSpec and Object Daddy (many of you may only know Factory Girl these days), a popular factory tool at that time, I noticed my suite would take almost a minute for just a few examples touching the database. That would certainly slow me down as I added many more tests.

But I felt really bad when I complained about that once in the RSpec mailing list and David Chelimsky mentioned taking 54s to run a couple of hundred examples, when I had only 54 examples in my suite at that time.

And it felt even worse when I contributed once to Gitorious and noticed that over a thousand examples would finish in just a few seconds, even though lots of them didn't touch the database. Marius Mathiesen and Christian Johansen are very skilled developers and they were the main Gitorious maintainers by that time. Christian is the author of the popular Sinon.js, one of the authors of the great Buster.js and author of the Test-Driven JavaScript Development book.

For that particular application, I had to create a lot of records in order to build the record I needed to test. And I was recreating them in every single test requiring such a record, through Object Daddy, but I suspect the result would be about the same with FactoryGirl or any other factory tool.

When I realized that creating lots of records in the database was that expensive, I stopped following the traditional advice for writing tests and only worried about what I really cared about, which remains basically the same to this day.

These are my test goals:

  • ensure my application works (the main goal by far);
  • avoid regressions (linked to the previous one);
  • the suite should run as fast as possible (just a few seconds if possible);
  • it should give me enough confidence to allow me to completely change the implementation during any refactoring without completely breaking the tests. To me that means avoiding mocking or stubbing objects, and performing HTTP requests against a real server for testing things like cookie-based sessions and a few other scenarios (rack_toolkit allows me to create such tests while still being fast).

These are not my test goals at all:

  • writing specs in such a way that they would serve as documentation. I really don't care what the output looks like when I run a single test file with RSpec. That's also the reason why I never used Cucumber. Worrying about this adds more complexity, and I don't think specs are useful for documentation purposes anyway;
  • each example should have a single expectation. I simply don't see much value in this, and very often it has the potential of slowing down the test suite;
  • tests should be independent from each other and ideally run in random order. I understand the reasoning behind this, I actually find it useful and I see value in it. But when there are trade-offs, I'd trade test independence for speed. Fortunately, that trade is not required for my tests touching the database thanks to the technique I demonstrate in the next section, but it may speed up some request tests.

I even wrote my own JavaScript test runner because I needed one that allowed me to run my tests in a specified order, supported IE6 (at that time) and beforeAll, and I couldn't find any. My application used to register some live events on document and would never unregister them, because that was not necessary, so my test suite would only be allowed to initialize it once. Also, recreating a tree on every test would take a lot of time, so I wanted to run a set of tests that would work on the same tree, based on the results of previous tests.

I was okay with that trade as long as my tests would run fast, but JavaScript test runner authors wouldn't agree, so I created OOJSpec for my needs. I never advertised it because I don't consider it feature-complete yet, although it suits my current needs. It doesn't currently support running a single test, because I'd need to think of some way to declare a test's dependencies (on other tests) so that those dependent tests would also be run before the requested one. Also, maintaining a test runner is not trivial, and since it's currently hard for me to find time to review patches, I preferred not to announce it. Since I can run individual test files, it works fine for my needs, so I don't currently have much motivation to improve it further.

A fast approach to speed up tests touching the database

A common case while testing some scenarios is that one wants to write a set of tests that exercise about the same set of records. Most people nowadays are using either one of the two common approaches:

  • creating the records either manually (through the ORM usually) or through factories;
  • loading fixtures (which are usually faster than creating them using factories);

Loading specific fixtures before each context wouldn't be significantly faster than using a factory, given competent factory and ORM implementations, so some people will simply use DatabaseCleaner with the truncation strategy to delete all data before the suite starts and load the fixtures into the database once. After that, each example usually runs inside a transaction that is rolled back, which is much faster than truncating and reloading the fixtures (a typical setup is sketched below).
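A conventional setup along those lines might look like this (a sketch based on the database_cleaner gem's documented API; load_fixtures is a hypothetical helper standing in for whatever populates your initial data):

RSpec.configure do |config|
  config.before(:suite) do
    DatabaseCleaner.clean_with(:truncation) # wipe everything once
    load_fixtures                           # hypothetical fixture-loading helper
  end

  config.around(:each) do |example|
    DatabaseCleaner.strategy = :transaction
    DatabaseCleaner.cleaning { example.run } # each example inside a rolled-back transaction
  end
end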

I don't particularly like fixtures because I find they make tests more complicated to write and understand. But I would certainly consider them if they made my tests significantly faster. Also, nothing prevents us from using the same approach with factories, as we could use the factories to populate the initial data before the suite starts; the real problem is that writing tests would still be more complicated, in my opinion.

So, I prefer to think about solutions that allow tests to remain fast even when using factories. Obviously that means we should find some way to avoid recreating the same records for a given group, since the only way to speed up a suite that spends a lot of time creating records in the database is to reduce the amount of time spent in the database creating those records.

There are other kinds of optimizations that would be interesting to try but are probably complicated to implement, as they would likely require a change in the FactoryGirl API. For example, rather than sending one statement at a time to the database, I guess it would be faster to send all of them at once. However, I'm not sure it would be that much faster if you are using a connection pool (usually a single connection in the test environment) that keeps the connection open, and a local database.

So, let's talk about the low-hanging fruit, which also happens to be the sweetest in this case. How can we reuse a set of records among a set of examples while still allowing the examples to be independent from each other?

The idea is to use nested transactions to achieve that goal. You begin a transaction when the suite starts (or around some context involving database statements), then the suite creates a savepoint before a group of examples (a context in RSpec language) and rolls back to that savepoint after the context has finished.

Managing such savepoint names can be complex to implement on your own but if you are going this route anyway because your ORM doesn't provide an easy API to handle nested transactions then you may not be interested in the rspec_nested_transactions gem I'll present in the next section.

However with Sequel this is as easy as:

# The :auto_savepoint option will automatically add the "savepoint: true" option to inner
# transaction calls.
DB.transaction(auto_savepoint: true, savepoint: true, rollback: :always){ run_example }

With ActiveRecord the API works like this (thanks Tiago Amaro, for showing me the API):

ActiveRecord::Base.transaction(requires_new: true) do
  run[]
  raise ActiveRecord::Rollback
end

This will detect whether a transaction is already in place and use savepoints if so, or issue a BEGIN to start a new transaction otherwise. Sequel will manage the savepoint names automatically for you and will even roll the transaction back automatically when the "rollback: :always" option is used. Very handy indeed. But in order to achieve this, Sequel doesn't provide methods such as "start_transaction" and "end_transaction"; the "transaction" method always takes a block.

Why is this a problem? Sequel does the right thing by always requiring a block to be passed to the "transaction" method, but RSpec does not support "around(:all)". However, Myron Marston posted a few years ago how to implement it using fibers, and Sean Walbran created a real gem based on that article. You'd probably be interested in combining this with the well-known strategy of wrapping each individual example in a nested transaction as well.
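For the curious, the core of the fiber trick looks roughly like this (a simplified sketch of the idea from Myron's article, not the gem's actual code): a before(:all)/after(:all) pair drives the user's around block, suspending it in the middle while the examples run.

module AroundAll
  def around_all(&block)
    fiber = nil
    before(:all) do
      # run the user's block up to the point where it invokes the proc we hand it
      fiber = Fiber.new { block.call(-> { Fiber.yield }) }
      fiber.resume
    end
    # resume the block so it can finish (e.g. roll the transaction back)
    after(:all) { fiber.resume }
  end
end

# usage inside a describe block:
#   extend AroundAll
#   around_all { |run| DB.transaction(savepoint: true, rollback: :always) { run.call } }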

If you feel confident that you will always remember to use "around(:all)" with a "DB.transaction(savepoint: true, rollback: :always){}" block whenever you want to create such a common set of records to be used by a group of examples, then the rspec_around_all gem may be all you need to implement this strategy.

Not only do I find this bug-prone (I could forget about the transaction block), I also can't be bothered to repeat this pattern every time I want to create a set of shared records.

There's a caveat, though. If your application creates transactions itself, those transactions must be savepoint-aware too, even if a plain BEGIN-COMMIT is enough outside of the tests, so that they work as expected in combination with this technique (with Sequel this is handled automatically, provided you use the :auto_savepoint option in the outermost transaction). If you are using ActiveRecord, that means using "requires_new: true", as illustrated below.
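For example, application code that opens its own transaction will nest correctly under the suite's outer transaction (illustrative snippet; order is a made-up model):

# With Sequel, this nests as a SAVEPOINT instead of reusing the outer
# transaction, automatically, when the outer one was started with :auto_savepoint:
DB.transaction{ order.update(status: 'paid') }

# With ActiveRecord, requires_new: true forces a savepoint when nested:
ActiveRecord::Base.transaction(requires_new: true){ order.update!(status: 'paid') }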

If you are using Sequel or ActiveRecord with PostgreSQL, Oracle, MSSQL, MySQL (with InnoDB) or any other vendor supporting nested transactions, and you have full control over the transaction calls, implementing this technique can speed up the database-touching part of your suite a lot. And rspec_nested_transactions makes it even easier to implement.

Let the fun begin: introducing rspec_nested_transactions

Today I released rspec_nested_transactions, which allows one to run all (inner) examples and contexts inside a transaction (usually a database transaction) with a single configuration:

require 'rspec_nested_transactions'

RSpec.configure do |c|
  c.nested_transaction do |example_or_group, run|
    (run[]; next) unless example_or_group.metadata[:db] # or delete this line if you don't care

    # with Sequel, assuming the database is stored in DB:
    DB.transaction(auto_savepoint: true, savepoint: true, rollback: :always, &run)

    # or, with ActiveRecord (Oracle, MSSQL, MySQL[InnoDB], PostgreSQL):
    # ActiveRecord::Base.transaction(requires_new: true) do
    #   run[]
    #   raise ActiveRecord::Rollback
    # end
  end
end

That's it. I've been using a fork of rspec_around_all (branch config_around) since 2013 and it has served me well; I never had to change it since then, so I guess it's quite stable. However, for a long time I had considered moving it to a separate gem and removing the parts I didn't actually use (like "around(:all)"). I always postponed it, but Travis' article reminded me about it and I thought that maybe others might be interested in this approach as well.

So, I improved the specs, cleaned up the code using recent Ruby features (>= 2.0 [prepend]) and released the new gem. Since the specs use the "<<~" heredoc they will only run on Ruby >= 2.3, but I guess the gem itself should work with any Ruby >= 2.0 (or even 1.9, if you implement Module.prepend).

What about Minitest?

Jeremy Evans, the Ruby Hero who happens to be the maintainer of Sequel and the creator of Roda, was kind enough to provide a link on how to achieve the same with Minitest in the comments below. No need for fibers in that case. Go check that out if you're working with Minitest.

Final notes

Currently our application runs 364 examples (RSpec doesn't report the expectation count, but I suspect it could be around a thousand) in 7.8s, and many of them touch the database. Also, when I started this Rails application I decided to give ActiveRecord another try, since it had gained a lazy API when Arel was introduced, which I was already used to with Sequel. A week or two later I decided to move to Sequel after finding the AR API quite limiting for the application's needs. At that time I noticed that the tests finished considerably faster after switching from ActiveRecord to Sequel, so I guess Sequel has a lower overhead compared to ActiveRecord, and switching to Sequel could possibly help speed up your test suite as well.

That's it, I hope some of you see value in this approach. If you have other suggestions (besides running the examples in parallel) to speed up a test suite, I'm always interested in speeding up ours. We have a ton of code on both the server side and the client side and only part of it is currently tested, and I'm always looking to improve the test coverage, which means we could potentially implement over 500 more tests (for both server side and client side) while I still want the test suite to complete in just a few seconds. I think the most critical parts of the server side are currently covered, and it will be easier to test other parts once I've moved the application to Roda (the client side needs much more work to make some critical parts easier to test). I would be really happy if both the server and client-side suites finished within a second ;) (currently the client-side suite takes about 11s to complete - 204 tests / 438 assertions).

ruby-rails/2016_08_05_using_rspec_nested_transactions_to_speed_up_tests_touching_the_database Using RSpec Nested Transactions to speed up tests touching the database 2016-08-08T13:05:00+00:00 2016-08-08T13:05:00+00:00

I started to experiment with writing big Ruby web applications as a set of small, fast Rack applications connected by a router, using Roda's multi_run plugin.

Such a design allows the application to boot super fast in the development environment (and in the production environment too, unless you prefer to eager load your code in production). Here's how the design looks (I've written about AutoReloader in another article):

# config.ru
if ENV['RACK_ENV'] == 'development'
  require 'auto_reloader'
  AutoReloader.activate reloadable_paths: [ 'apps', 'lib', 'models' ]
  run ->(env) do
    AutoReloader.reload! do
      ActiveSupport::Dependencies.clear # avoid some issues
      require_relative 'apps/main'
      Apps::Main.call env
    end
  end
else
  require_relative 'apps/main'
  run Apps::Main
end

# apps/main.rb
require 'roda'
module Apps
  class Main < Roda
    plugin :multi_run
    # other plugins and middlewares are added, such as :error_handler, :not_found, :environments
    # and a logger middleware. They take some space, so I'm skipping them.

    def self.register_app(path, &app_block)
      # if you want to eager load files in production you'd change this method a bit
      ->(env) do
        require_relative path
        app_block[].call env
      end
    end

    run 'sessions', register_app('session'){ Session }
    run 'admin', register_app('admin') { Admin }
    # other apps
  end
end

# apps/base.rb
require 'roda'
module Apps
  class Base < Roda
    # add common plugins for rendering, CSRF protection, middlewares
    # like ETag, authentication and so on. Most apps would inherit from this.
    route{|r| process r }

    private

    def process(r)
      protect_from_csrf # added by some CSRF plugin
    end
  end
end

# apps/admin.rb
require_relative 'base'
module Apps
  class Admin < Base
    private

    def process(r)
      super # protects from forgery and so on
      r.get('/'){ "TODO Admin interface" }
      # ...
    end
  end
end

I also want to be able to test those applications separately, and for some of them I would only get confidence if I tested against a real server, since I'd want them to handle cookies or streaming, check HTTP headers injected by the real server, and so on. And I wanted to be able to write such tests so that they could run as quickly as possible.

I started experimenting with Puma and noticed it can start a new server really fast (like 1ms in my development environment). I didn't want to add many dependencies, so I decided to create a simple DSL over the 'net/http' stdlib, since its API is not very friendly. The only dependencies so far are http-cookie and Puma (WEBrick does not fully support hijacking, doesn't provide a simple API to serve Rack apps, and is much slower to boot). Handling cookies correctly to keep the user session is not trivial, so I decided to introduce the http-cookie dependency to manage a cookie jar.

That's how rack_toolkit was born.

Usage

This way I can start the server before the test suite starts, change the Rack app served by the server dynamically, and stop it when the suite finishes (or you can simply start and stop it for each example, since it boots really fast). Here's a spec_helper.rb you could use if you are using RSpec:

# spec/spec_helper.rb
require 'rack_toolkit'
RSpec.configure do |c|
  c.add_setting :server
  c.add_setting :skip_reset_before_example

  c.before(:suite) do
    c.server = RackToolkit::Server.new start: true
    c.skip_reset_before_example = false
  end

  c.after(:suite) do
    c.server.stop
  end

  c.before(:context){ @server = c.server }
  c.before(:example) do
    @server = c.server
    @server.reset_session! unless c.skip_reset_before_example
  end
end

Testing the Admin app should be easy now:

# spec/apps/admin_spec.rb
require_relative '../../apps/admin'
RSpec.describe Apps::Admin do
  before(:all){ @server.app = Apps::Admin }

  it 'shows an expected main page' do
    @server.get '/'
    expect(@server.last_response.body).to eq 'TODO Admin interface'
  end
end

Please take a look at the project's README for more examples and the supported API. RackToolkit allows you to get the current_path and referer, manages cookie sessions, provides a DSL for get, post and post_data on top of 'net/http' from the stdlib, allows overriding the environment variables sent to the Rack app, can simulate an https request as if the app was behind some proxy like Nginx, and supports "virtual hosts", a default domain, performing requests to external Internet URLs and many other options.
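As a taste of that DSL, a request test exercising a few of those features might look like this (illustrative; the method names come from the description above, but the exact argument shapes are my assumptions, so check the README for the real signatures):

@server.app = Apps::Admin
@server.get 'https://admin.example.com/' # https and virtual hosts are simulated
expect(@server.last_response.code).to eq '200'
expect(@server.current_path).to eq '/'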

Future development

It currently doesn't provide a DSL for quickly accessing elements from the response body, filling in forms and submitting them, but I plan to work on this once I need it. It won't ever support JavaScript, though, unless it becomes possible at some point to do so without slowing it down significantly. If you want to work on such a DSL, please let me know.

Performance

The test suite currently runs 33 requests and finishes in ~50ms (skipping the external request example). It's that fast.

Feedback

I'm looking forward to your suggestions to improve it. Your feedback is very welcome.

ruby-rails/2016_07_27_introducing_racktoolkit_a_fast_server_and_dsl_designed_to_test_rack_apps Introducing RackToolkit: a fast server and DSL designed to test Rack apps 2016-07-27T18:43:00+00:00 2016-07-27T18:43:00+00:00

I've been writing some Roda apps recently. Roda doesn't come with any automatic code reloader, like Rails does. Its README lists quite a few code reloaders that could be used with Roda, but while converting a small JRuby on Rails application to Roda I noticed I didn't really like any of the options. I've written a review of the available options if you're curious.

I could simply use ActiveSupport::Dependencies, since I knew it was easy to set up and worked mostly fine, but one of the reasons I'm thinking about leaving Rails is the autoloading behavior of ActiveSupport::Dependencies and the monkey patches to Ruby core classes added by ActiveSupport as a whole. So, I decided to create auto_reloader, which provides the following features:

  • just like Rack::Reloader, it works transparently: just use "require" and "require_relative". To automatically track constant definitions one has to override those methods anyway, and I can't think of any reliable way to track top-level constants without doing so. However, those methods are only overridden when AutoReloader is activated, which doesn't happen in the production environment, unlike with ActiveSupport::Dependencies. Those are the only monkey patches applied in development mode;
  • differently from Rack::Reloader, it will detect new top-level constants defined during a request and unload them upon reloading, preventing several issues caused by not doing that;
  • no monkey patches to core Ruby classes in production mode;
  • it can use the 'listen' gem as a file watcher to speed up requests when no reloadable files have been changed, in which case the application responds almost as fast as in the production environment, which is important when you are working on performance optimizations. It will use 'listen' by default when available, but this can be opted out of; it won't make much difference unless, maybe, some request loads many reloadable files;
  • it's also possible to force reloading even if no loaded files have been changed. This could be useful if the loaded files read some non-Ruby configuration files that have changed; the README also explains a better way to handle those cases, by using Listen to watch them and call AutoReloader.force_next_reload;
  • it doesn't provide autoloading like ActiveSupport::Dependencies does;
  • it's possible to configure a minimal delay time between two code reloading procedures;
  • it unloads all reloadable files rather than only the changed ones, as I believe this is a safer approach; it's also the one used by ActiveSupport::Dependencies;
  • reloadable files are those found in one of the reloadable_paths options provided to AutoReloader;
  • it's not specific to Rack applications and can be used with any Ruby application.

What AutoReloader does not implement:

  • autoloading of files on missing constants. Use Ruby's "autoload" for that if you want;
  • it doesn't provide a hook system to notify when some file is loaded, like ActiveSupport::Dependencies does;
  • it doesn't provide an option to specify load-once files. An alternative is to place them in separate directories and not include those in the reloadable_paths option;
  • it doesn't reload on changes to files other than the loaded ones, like JSON or YAML configuration files, but it's easy to set that up, as explained in the project's README.

Usage with a Rack application

# app.rb
App = ->(env) { [ 200, { 'Content-Type' => 'text/plain' }, [ 'Sample output' ] ] }

# config.ru
if ENV['RACK_ENV'] != 'development'
  require_relative 'app'
  run App
else
  require 'auto_reloader'
  # won't reload before 1s has elapsed since the last reload by default. It can be
  # overridden in the reload! call below
  AutoReloader.activate reloadable_paths: [ '.' ]
  run ->(env) {
    AutoReloader.reload! do
      require_relative 'app'
      App.call env
    end
  }
end

If you also want it to reload if the "app.json" configuration file has changed:

# app.rb
require 'json'
config = JSON.parse File.read 'config/app.json'
App = ->(env) { [ 200, { 'Content-Type' => 'text/plain' }, [ config['output'] ] ] }

# append this to config.ru
require 'listen' # add the 'listen' gem to your Gemfile
app_config = File.expand_path 'config/app.json'
listener = Listen.to(File.expand_path 'config') do |modified, added, removed|
  AutoReloader.force_next_reload if (modified + added + removed).include?(app_config)
end
listener.start

If you decide to give it a try and find any bugs, please let me know.

ruby-rails/2016_07_18_autoreloader_a_transparent_automatic_code_reloader_for_ruby AutoReloader: a transparent automatic code reloader for Ruby 2016-07-18T14:35:00+00:00 2016-07-18T14:35:00+00:00

When we are writing a service in Ruby, it's super useful to have the ability to automatically change its behavior to conform to the latest changes to the code. Otherwise we'd have to manually restart the server after each change. This would slow down the development flow a lot, especially if the application takes a while before it's ready to process the next request.

I guess most people using Ruby are writing web applications with Rails. Many don't notice that Rails supports automatic code reloading out of the box, through ActiveSupport::Dependencies. A few will notice it once they are affected by some corner case where the automatic code reloading doesn't work well.

Another feature provided by Rails is the ability to automatically load files if the application follows some conventions, so that the developer is not forced to manually require the code's dependencies. Another benefit is that this behavior is similar to Ruby's autoload feature, whose purpose is to speed up the loading time of applications by avoiding loading files the application won't need. Matz seems to dislike this feature and discouraged its usage 4 years ago. Personally I'd love to see autoload gone, as it can cause bugs that are hard to track down. However, loading many files in Ruby is currently slow, even though simply reading them from disk would be pretty fast. So, I guess Ruby would have to provide some sort of pre-compiled file support before deprecating autoload, so that we wouldn't need it for the purpose of speeding up start-up time.

Since automatic code reloading usually works well enough for Rails applications, most people won't research code reloaders until they are writing web apps with other frameworks such as Sinatra, Padrino, Roda, pure Rack, whatever.

This article will review generic automatic code reloaders, including ActiveSupport::Dependencies, leaving specific ones, like Sinatra::Reloader and Padrino::Reloader, out of scope. I haven't checked the Ruby version compatibility of each one, but all of them work on the latest MRI.

Rack::Reloader

Rack::Reloader is bundled with the rack gem. It's very simple, but in my opinion it's only suitable for simple applications. It won't unload constants, so if you remove some file or rename some class, the old constants will still be available. It works as a Rack middleware.

One can provide the middleware a custom or external back-end, but I'll only discuss the default one, which is bundled with Rack::Reloader, called Rack::Reloader::Stat.

Before each request it traverses $LOADED_FEATURES, skipping .so/.bundle files, and calls Kernel.load on each file that has been modified since the last request. Since config.ru is loaded rather than required, it's not listed in $LOADED_FEATURES, so it will never be reloaded. This means the app's code should live in another file required from config.ru rather than directly in config.ru. It's worth mentioning this because I've been bitten by it more than once while testing Rack::Reloader.
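In other words, a typical setup looks like this (a minimal sketch respecting the constraint above; the app code lives in a required file so it appears in $LOADED_FEATURES and gets reloaded):

# config.ru
require 'rack'
require_relative 'app' # defines App; changes to app.rb will be picked up
use Rack::Reloader, 0  # the second argument is the cooldown in seconds
run App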

Differently from the Rails approach, any changed file will be reloaded even if you modify some gem's source.

Rack::Reloader issues

I won't discuss performance issues when there are many loaded files, because one could provide another back-end able to track file changes very quickly, and because there are more important issues affecting this strategy.

Suppose your application has some code like this:

require 'singleton'
class MyClass
  include Singleton
  attr_reader :my_flag
  def initialize
    @my_flag = false
  end
end

Calling MyClass.instance.my_flag will return false. Now, if you change the code so that @my_flag is assigned true in "initialize", MyClass.instance.my_flag will still return false.

Let's investigate another example where Rack::Reloader's strategy won't work:

# assets_processor.rb
class AssetsProcessor
  @@processors = []
  def self.register
    @@processors << self
  end

  def self.process
    @@processors.each(&:do_process)
  end
end

# assets_compiler.rb
require_relative 'assets_processor'
class AssetsCompiler < AssetsProcessor
  register

  def self.do_process
    puts 'compiling assets'
  end
end

# gzip_assets.rb
require_relative 'assets_processor'
class GzipAssets < AssetsProcessor
  register

  def self.do_process
    puts 'gzipping assets'
  end
end

# app.rb
require_relative 'assets_compiler'
require_relative 'gzip_assets'
class App
  def run
    AssetsProcessor.process
  end
end

Running App.new.run will print "compiling assets" and then "gzipping assets". Now, if you change assets_compiler.rb, reloading it will call register again, so "compiling assets" will be printed one extra time on the next run.

This applies to all situations where a given class method is supposed to run only once, or where the order in which files are loaded matters. For example, suppose the AssetsProcessor.register implementation is changed in assets_processor.rb. Since register was already called in its subclasses, the change won't take effect in them, because only assets_processor.rb will be reloaded by Rack::Reloader. The other reloaders discussed here also suffer from this issue, but they provide work-arounds for some of these cases.

rerun and shotgun: the reload everything approach

Some reloaders, like rerun and shotgun, will simply reload everything on each request. They fork on each request before requiring any files, which means those files are never required in the main process. Due to forking, this won't work on JRuby or Windows. It's a safe approach when using MRI on Linux or Mac, though. However, if your application takes a long time to boot, then your requests will have a big latency in development mode. In that case, if the reason for the slow start-up lies in the framework code and other external libraries rather than in the app-specific code (which we want to be reloadable), one can require them before forking to speed things up.

This approach is a safe bet, but unsuitable when running on JRuby or Windows. Also, if loading all the app-specific code is still slow, one may be interested in looking for faster alternatives. Besides that, this latency will exist in development mode for all requests, even when no files have been changed. If you're working on performance improvements, other approaches will yield better results.

rack-unreloader

rack-unreloader takes care of unloading constants during reload, differently from Rack::Reloader.

It has basically two modes of operation. One can use "Unreloader.require('dep'){['Dep', ...]}" to require dependencies while also declaring which new constants they create; those constants will be unloaded during reload. This is the safest approach, but it's not transparent: for every required reloadable file we must manually provide the list of constants to be unloaded. On the other hand, this is the fastest possible approach, since the reloader doesn't have to figure out those constants automatically, like the other options mentioned below do. Also, it doesn't override "require", so it's great for those who don't want any monkey patching. Ruby currently does not provide a way to safely discover those constants automatically without monkey patching require, so rack-unreloader is probably the best you can get if you want to avoid monkey patches.

The second mode of operation is to not provide that block; Unreloader will then compare $LOADED_FEATURES before and after the Unreloader.require call to figure out which constants the required file defines. However, without monkey patching "require", this mode can't be reliable, as I'll explain in the sub-section below.
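The gist of that automatic detection is something like this (a simplification of the idea, not the gem's actual code, using the t.rb example shown further below):

constants_before = Object.constants
features_before  = $LOADED_FEATURES.dup
require './t.rb'
new_constants = Object.constants - constants_before # T, but also JSON!
new_features  = $LOADED_FEATURES - features_before  # t.rb plus everything json.rb pulled in
# on reload, everything in new_constants gets unloaded, JSON included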

Before getting into that, there's another feature of rack-unreloader that speeds up reloading: it only reloads the changed files, differently from the other options I explore below in this article. However, reloading just the changed files is not always reliable, as I discussed in the Rack::Reloader Issues section.

Finally, differently from other libraries, rack-unreloader actually calls "require" rather than "load", deleting the reloaded files from $LOADED_FEATURES first so that calling "require" will actually reload the file.

rack-unreloader Issues

It's only reliable if you always provide the constants defined on each Unreloader.require() call. This is also the fastest approach, though it may be a bit boring to write code like this. And even in this mode, it's only reliable if your application works fine regardless of the order in which each file is reloaded (I've shown an example in the Rack::Reloader Issues section demonstrating how this approach is unreliable when that's not the case).

Let's explore why the automatic approach is not reliable:

# t.rb:
require 'json'
module T
  def self.call(json)
    JSON.parse(json)
  end
end

# app.rb:
require 'rack/unreloader'
require 'fileutils'
Unreloader = Rack::Unreloader.new{ T }
Unreloader.require('./t.rb') # {'T'} # providing the block wouldn't trigger the error
Unreloader.call '{}'
FileUtils.touch 't.rb' # force the file to be reloaded
sleep 1 # there's a default cooldown delay of 1s before the next reload
Unreloader.call '{}' # NameError: uninitialized constant T::JSON

Since rack-unreloader does not override "require", it can't track which files define which constants in a reliable way. So it thinks 't.rb' is responsible for defining JSON and will then unload JSON (which has some C extensions that cannot be unloaded). This also affects JRuby if the file imports some Java package, among other similar cases. So, if you want to work with the automatic approach in rack-unreloader, you'd have to require all those dependencies before running Unreloader.call. This is very error-prone, which is why I think it's mostly useful when you always provide the list of constants expected to be defined by the required dependency.

However, rack-unreloader provides a few options, like "record_dependency", "subclasses" and "record_split_class", to make it easier to specify the explicit dependencies between files so that the right files are reloaded. But that means the application author must have a good understanding of how auto-reloading works and how their dependencies work, and they will also have to fully specify those dependencies. It can be a lot of work, but it may be worth it when reloading all reloadable files takes a lot of time. If you're looking for the fastest possible reloader, then rack-unreloader may well be your best option.

ActiveSupport::Dependencies

Now we're talking about the reloader behind Rails, which is great, battle tested and one of my favorites. Some people don't realize it's pretty simple to use it outside Rails, so let me demonstrate how, since this doesn't seem to be widely documented.

Usage

require 'active_support' # this must be required before any other AS module as per documentation
require 'active_support/dependencies'
ActiveSupport::Dependencies.mechanism = :load # or :require in the production environment
ActiveSupport::Dependencies.autoload_paths = [__dir__]

require_dependency 'app' # optional if app.rb defines App, since it also supports autoloading
puts App::VERSION
# change the version number and then:
ActiveSupport::Dependencies.clear
require_dependency 'app'
puts App::VERSION

Or, in the context of a Rack app:

require 'active_support'
require 'active_support/dependencies'
if ENV['RACK_ENV'] == 'development'
  ActiveSupport::Dependencies.mechanism = :load
  ActiveSupport::Dependencies.autoload_paths = [__dir__]

  run ->(env){
    ActiveSupport::Dependencies.clear
    App.call env
  }
else
  ActiveSupport::Dependencies.mechanism = :require
  require_relative 'app'
  run App
end

How it works

ActiveSupport::Dependencies has a quite complex implementation and I don't have a solid understanding of it, so please let me know about my mistakes in the comments section so that I can fix them.

Basically, it will load dependencies found in the autoload_paths or require them, depending on the configured mechanism. It keeps track of which constants are added by overriding "require". This way it knows that JSON was actually defined by "require 'json'" even when that's called from "require_dependency 't'", and it would detect that T was the new constant defined by 't.rb' and the one that should be unloaded upon ActiveSupport::Dependencies.clear. Also, it doesn't reload only the individual changed files, but unloads all reloadable files on "clear", which is less likely to cause problems, as I explained in the previous section. It's also possible to configure it to use an efficient file watcher, like the one implemented by the 'listen' gem, which uses an evented approach based on OS-provided system calls. This way, one can skip the "clear" call when no loaded reloadable files have been changed, speeding up requests even in development mode.

ActiveSupport::Dependencies supports a hooks system that allow others to observe when some files are loaded and take some action. This is specially useful for Rails engines when you want to run some code only after some dependency has been loaded for example.

ActiveSupport::Dependencies is not only a code reloader: it also implements automatic code loading by overriding Object's const_missing to automatically require the code that should define that constant, following some conventions. For example, the first time one attempts to use ApplicationController, since it's not defined, it will look in the search paths for an 'application_controller.rb' file and load it. That means the start-up time can be improved, since we only load code we actually use. However, this can lead to issues that make the application behave differently in production, due to side effects caused by the order in which files are loaded. But Rails applications have been built around this strategy for several years and it seems such caveats have only affected a few people. Those cases can usually be worked around with "require_dependency".

If your code doesn't follow the naming convention, you'll have to use "require_dependency". For example, if ApplicationController is defined in controllers/application.rb, you'd use "require_dependency 'controllers/application'" before using it.

Why I don't like autoload

Personally I don't like autoloading in general and always prefer explicit dependencies in all my Ruby files, so even in my Rails apps I don't rely on autoloading for my own classes. The same applies to Ruby's built-in "autoload" feature. I've already been bitten by an autoload-related bug when trying to use ActionView's number helpers by requiring just the specific file I was interested in. Here's a simpler case demonstrating the issue with "autoload":

# test.rb
autoload :A, 'a'
require 'a/b'

# a.rb
require 'a/b'

# a/b.rb
module A
  module B
  end
end

# ruby -I . test.rb
# causes "...b.rb:1:in `<top (required)>': uninitialized constant A (NameError)"

It's not quite clear what's happening here, since the message isn't very clear about the real problem, and it gets even more complicated to understand in a real, complex code base. Requiring 'a/b' before requiring 'a' causes a circular dependency issue. When "module A" is seen inside "a/b.rb", A doesn't exist yet, and the "autoload :A, 'a'" tells Ruby it should require 'a' in that case. So that's what it does, but 'a.rb' will require 'a/b.rb', which is what we were trying to load in the first place. There are other similar problems caused by autoload, and that's why I don't use it myself despite the potential for loading the application faster. Ideally, Ruby should provide support for some sort of pre-compiled (or pre-parsed) files, which would be useful for big applications to speed up code loading, since disk I/O is not the bottleneck, but Ruby parsing itself.

ActiveSupport::Dependencies Caveats

ActiveSupport::Dependencies is a pretty decent reloader and I guess most people are just fine with it and its known caveats. However, there are some people, like me, who are more picky.

Before I get into the picky parts, let's explore the limitations one has to keep in mind when using a reloader that relies on running a file's code multiple times. The only really safe strategies I can think of for handling auto-reloading are to completely restart the application or to use the fork/exec approach. They have their own caveats, like being slower than the alternatives, so it's always about trade-offs when it comes to auto-reloaders. Running some code more than once can lead to unexpected results, since not all actions can be rolled back.

For example, if you include some module in ::Object, this can't be undone. And even if we could work around it, we'd have to detect such changes automatically, which would perform so badly that it would probably be better to simply restart everything. This applies to monkey patching, to creating constants in namespaces which are not reloadable (like defining JSON::CustomExtension) and similar situations. So, when we are dealing with automatic reloaders, we should keep that in mind and understand that reloading will never be perfect unless we actually restart the full application (or use fork/exec). ActiveSupport::Dependencies provides options such as autoload_once_paths so that such code won't be executed more than once, but if you have to change such code then you'll be forced to restart the full application.

Also, any file actually required rather than loaded (either with require or require_relative) won't be auto-reloaded, which forces the author to always use require_dependency to load files that are supposed to be reloadable.

Here's what I dislike about it:

  • ActiveSupport::Dependencies is part of ActiveSupport and relies on some monkey patches to core classes. I try to avoid monkey patching core classes at all costs, so I don't like AS in general due to its monkey patching approach;
  • Autoloading is not opt-in as far as I know, so I can't opt out of it, and I'd rather not use it;
  • Since some Ruby sources will make use of "require_dependency", and since some Rails-related gems may rely on the automatic autoloading feature provided by ActiveSupport::Dependencies, applications are forced to override "require" and use ActiveSupport::Dependencies even in production mode;
  • If your application doesn't otherwise rely on ActiveSupport, then this reloader will add some overhead to the download phase of Bundler.

Conclusion

Among the options covered in this article, ActiveSupport::Dependencies is my favorite, although I would consider rerun or shotgun when running on MRI and Linux if the application started quickly and I didn't have to work on performance improvements (for those, it's useful to have the application behave as in production when no files have been changed).

Basically, if your application is fast to load, then it may make sense to start with rerun or shotgun, since they are the only really safe bets I can think of.

However, I took a few measurements in my application and decided it was worth creating a new transparent reloader that would also fix some of the caveats I see in ActiveSupport::Dependencies. I wrote a new article about auto_reloader.

If you know about other automatic code reloaders for Ruby, I'd love to hear about them. Please let me know in the comments section. Also let me know if you think I misunderstood how any of the reloaders mentioned in this article actually work.

ruby-rails/2016_07_18_a_review_of_code_reloaders_for_ruby A Review of Code Reloaders for Ruby 2016-07-18T15:15:00+00:00 2016-07-18T15:15:00+00:00

This article is basically a copy of this project's README. You may read it there if you prefer. It's a sample application demonstrating the current state of streaming with Devise or Warden.

Devise is an authentication library built on top of Warden, providing a seamless integration with Rails apps. This application was created following the steps described in Devise's Getting Started section. Take a look at the individual commits and their messages if you want to check each step.

Warden is a Rack middleware, and authentication failures are handled using a "throw/catch(:warden)" approach. This works fine with Rails until streaming is enabled with ActionController::Live.

José Valim pointed out that the problem is ActionController::Live's fault. This is because the Live module changes the "process" method so that it runs inside a spawned thread, allowing it to return early and finish processing the remaining middlewares in the stack. Nothing is sent to the connection before leaving that method, due to the Rack issue I'll describe next. But the "process" method also handles all filters (before/around/after action hooks). Usually the authentication happens in a before action filter, and if the user is not authenticated, Devise will "throw :warden"; but since this is running in a spawned thread, the Warden middleware doesn't have the chance to catch the symbol and handle it properly.

The Rack issue

I find it amusing that after so many years of web development in Ruby, Rack doesn't seem to have evolved much toward better handling of streamed responses, including SSE and, why not, WebSockets. The basic building blocks are essentially the same as when Rack was first created, in a successful attempt to define a standard API that web servers and frameworks could agree on and build on top of. That's a great achievement, but Rack should evolve to better handle streamed responses.

Aaron Patterson has tried to work on another API for Rack that would improve support for streaming, but it seems it would break middlewares, and currently it seems the metal is dead. It sounds like HTTP 2.0 multiplexing requires yet more changes, so maybe we'll get proper support in Rack 3.0, which should be backward compatible and keep supporting existing middlewares by providing alternative APIs, but it seems like that could take years to get there. He also wrote about the issues with the Rack API over 5 years ago.

Currently, the way Rack applications handle streaming is by implementing a body object that responds to each and yields one chunk at a time until the stream is finished; frameworks usually wrap this in an API resembling the proper stream objects found in other languages. A few years ago an alternative was introduced, which became known as the hijacking API. The Phusion team covered it when it was introduced, but I think the "partial hijacking" section is no longer valid.
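To make the "responds to each" contract concrete, here's a minimal, hypothetical Rack app streaming three chunks; the server calls each and writes every yielded string to the socket:

# config.ru
class SlowBody
  def each
    3.times do |i|
      yield "chunk #{i}\n" # each yielded string is written to the client
      sleep 1
    end
  end
end

run ->(env) { [200, { 'Content-Type' => 'text/plain' }, SlowBody.new] }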

Rack was designed around a middleware stack, which means any response will only start after all middlewares have been called and returned (unless hijacking is used), since middlewares don't have access to the socket stream. That's why Rails had to resort to threads to handle streamed/chunked responses. But Rails could offer alternative implementations that would be friendlier to how Warden and Devise work, as demonstrated in this application, which I'll discuss in the next section.

Before talking about Rails' current options, I'd like to stress the problem with Rack without hijacking a bit more, and consequently how it affects web development in Ruby in a negative way compared to how this is done in most other languages.

If we compare this to how streaming is handled in Grails (and most JVM-based frameworks), or in most of the main web frameworks in other languages, it couldn't be simpler there. Each request thread (or process) has access to a "response" object that accepts a "write" call which goes directly to the socket's output (immediately or after a "flush" call).

There's no need to flag a controller as capable of streaming. They are just regular controllers. The request thread or process does not have to spawn another thread to handle streaming, so there's nothing special about such controllers.

It would be awesome if Ruby web applications had the option to use a more flexible API, friendlier to streamed responses, including SSE and WebSockets. Hijacking currently seems to be treated as a second-class citizen, since it is usually ignored by major web frameworks like Rails itself.

The Rails case (or how to work around the current state in Rack apps)

So, with Rails one doesn't flag an action as requiring streaming support; one has to flag the full controller. In theory, all other actions not taking advantage of the streaming API should work just like those in regular controllers not flagged with ActionController::Live.

The obvious question is then: "so, why isn't Live always included?". After all, Rails users wouldn't have to worry about enabling streaming; it would simply be enabled by default for when you want it. One might think this is related to performance concerns, but I suspect the main problem is that this is not issue-free.

Some middlewares assume that the inner middlewares have finished (some of them actually depend on their being finished) so that they can modify the original response or headers. This kind of post-processing middleware does not work well with streamed responses.

This includes caching middlewares (handling ETag or Last-Modified headers), monitoring middlewares injecting some HTML (like NewRelic does automatically by default, for example) and many others. Those middlewares will block the stack until the response is fully finished, which breaks the desired streamed output. Some of them check certain conditions and skip this blocking behavior under some circumstances, but some will still cause hard-to-debug issues, or they may even be conceptually broken.

There are also some middlewares that expect the controller's action code to run in the same thread due to their implementation details. For example, if a sandboxed database environment is implemented as a middleware that runs the next layer inside a transaction block that will be rolled back, and if the connection is automatically fetched using the current thread id as the access key, then spawning a new thread would use a different connection, outside of the middleware's transaction, breaking the sandboxed environment. I think ActiveRecord fetches the connection from thread locals, and since ActionController::Live copies those locals to the newly spawned thread it probably works, but I'm just warning that spawning threads may break several middlewares in unexpected ways.

This includes how Warden communicates authentication failures. So, enabling Live in all Rails controllers would have the immediate effect of breaking most current Rails applications, as Devise is the de facto authentication standard for Rails apps. Warden assumes the code handling authentication checks runs in the same thread. It could certainly offer another strategy to report failed authentication, but this is not how it currently works.

Even though José Valim said there's nothing they could do because it's Live's fault, this is not completely true. I guess he meant it would be too much work to make it work. After all, we can't simply put the blame on Live, since the fault actually lies in Rack itself: streaming is fundamentally broken there.

Devise could certainly subclass Warden::Manager, use that subclass as its middleware, and override "call" to add some object to env, for example, that would listen to reported failures; it could then replace "throw :warden" in its own code with a higher-level API that would communicate with Warden properly. But I agree this is a mess and probably isn't worth it, especially because it couldn't be called exactly Warden-compatible. Another option could be to change Warden itself so that it doesn't expect the authentication checks to happen in the same thread. Or it could replace the throw/catch approach with a raise/rescue one, which should work out of the box with how Rails currently handles things. It shouldn't be hard for Devise itself to wrap Warden and use exceptions rather than throw/catch but, again, I'm not sure this is really worth it.
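To see why the throw-based flow can't survive a thread hop, here's a tiny self-contained demonstration (plain Ruby, no Warden involved):

# throw can only be intercepted by a catch on the same thread's stack.
# The catch below never sees the throw; the spawned thread dies with
# UncaughtThrowError (an ArgumentError on older Rubies) instead, which
# join then re-raises in the calling thread.
catch(:warden) do
  Thread.new { throw :warden, :auth_failed }.join
end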

So, let's explore other options, which would add new API options to Rails itself.

A suggestion to add a new API to Rails

The Warden case is a big issue, since Devise is very popular among Rails apps and shouldn't be ignored. Usually authentication is performed in filters rather than in the action itself. Introducing a new API would give the user the chance to perform authentication in the main request thread before spawning the streaming thread. This works even if the authentication check is done directly in the action rather than in filters. The API would work something like:

def my_action
  # optionally call authenticate_user! here, if not using filters
  streamed do |stream|
    3.times { stream.write "chunk"; sleep 1 }
  end
end

This way, the thread would only be spawned after the authentication check has finished. Alternatively, "streamed" could use "env['rack.hijack']" when available instead of spawning a new thread.
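For illustration, here's a minimal sketch of how such a helper could be implemented on top of a Live-style response.stream. The "streamed" name and this whole module are hypothetical:

module Streamed
  # Called from the action body, i.e. after all before filters (and thus
  # any throw :warden) have already run on the request thread.
  def streamed
    # Only now spawn the writer thread; authentication already happened.
    Thread.new do
      begin
        yield response.stream
      ensure
        response.stream.close
      end
    end
  end
end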

Use Rack hijacking

Another alternative might be to support streaming only for web servers supporting Rack hijacking. This way, the stream API could work seamlessly, without requiring "ActionController::Live" to be included. When "response.stream" is used, it would use "env['rack.hijack_io']" if available; otherwise it would either buffer the responses and send them at once, or raise some error, based on a configuration reflecting the user's preferences, as sometimes streaming is not just an optimization but a requirement that shouldn't be silently ignored. The same behavior would apply when HTTP 1.0 is used, for example.
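For reference, this is roughly what a full hijack looks like from a bare Rack app; once you own the socket, the status line, headers and chunked encoding become your responsibility. A sketch, assuming a server that supports hijacking:

run lambda { |env|
  return [501, {'Content-Type' => 'text/plain'}, ['no hijack support']] unless env['rack.hijack']
  env['rack.hijack'].call
  io = env['rack.hijack_io']
  Thread.new do
    begin
      io.write "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nTransfer-Encoding: chunked\r\n\r\n"
      3.times do |i|
        chunk = "chunk #{i}\n"
        io.write "#{chunk.bytesize.to_s(16)}\r\n#{chunk}\r\n" # chunked encoding framing
        sleep 1
      end
      io.write "0\r\n\r\n" # final chunk
    ensure
      io.close
    end
  end
  [-1, {}, []] # a common convention: the response triple is ignored after a full hijack
}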

Or another module such as "ActionController::LiveHijacking" could be created, so that Rails users would have that option for a while, until Rails considers this approach stable enough to be enabled by default.

Conclusion

I'd like to propose two discussions around this issue. One would be about a better solution for Rack applications to talk directly to the response (or about a strategy for making Rack hijacking a first-class citizen, probably with a better name than "hijack"). The other would be for Rails to improve support for streaming applications by better handling cases like the Warden/Devise issue. I've copied this text with some minor changes to my site so that it can be discussed in the Disqus comments section, or we could discuss it in the issues section of this sample project or on the rails-core mailing list, your call.

ruby-rails/2016_07_02_the_sad_state_of_streaming_in_ruby_web_applications The sad state of streaming in Ruby web applications 2016-07-04T21:40:00+00:00 2016-07-04T21:40:00+00:00

Two weeks ago I read an article from Fabio Akita comparing the performance of his Manga Downloadr implementations in Elixir, Crystal and Ruby.

From a quick glance at its source code, it seems the application consisted mostly of downloading multiple pages, with another, minor part taking care of parsing the HTML and extracting some location paths and attributes for the images. At least, this was the part being tested in his benchmark. I found it very odd that the Elixir version would finish in about 15s while the Ruby version would take 27s to complete. After all, this wasn't a CPU-bound application but an I/O-bound one. I would expect the same design, implemented in any programming language, to take about the same time for this kind of application. Of course, the HTML parser or the HTTP client implementations used in each language could make some difference, but the Ruby implementation took almost twice the time taken by the Elixir implementation. I was pretty confident it had to be a problem with the design rather than a difference in raw performance between the languages.

I had been preparing a deploy for the previous two weeks, which happened last Friday. Then on Friday I decided to take a few hours to understand what the test mode was really all about, and rewrote the Ruby application with a proper design for this kind of application, keeping Ruby's limitations (especially MRI's) in mind and focusing on performance.

The new implementation can be found here on Github.

Feel free to give it a try and let me know if you can think of any changes that could potentially improve the performance in any significant way. I have a few theories myself, like using a SAX parser rather than performing the full parsing, among a few other improvements I can think of, but I'm not really sure the changes would be significant, given that most of the time is actually spent on network data transfer over a slow connection (about 10 Mbps in my case) compared to the time needed to parse those HTML pages.

The numbers

So, here are the numbers I get with a 10 Mbps Internet connection and an AMD Phenom II X6 1090T, with 6 cores at 3.2GHz each:

  • Elixir: 13.0s (best time, usually ranges from 13.0-16s)
  • JRuby: 12.3s (best time, usually ranges from 12.3-16s)
  • MRI: 10.9s (best time, usually ranges from 10.9-16s)

As I suspected, they should perform about the same. JRuby needs 1.8s just to boot the JVM (measured with time jruby --dev -e ''), which means it actually takes about the same as MRI if we don't take boot time into consideration (which is usually the case when the application is a long-lived daemon like a web server).

For JRuby, threads are used to handle concurrency, while in MRI I was forced to use a pool of forked processes to handle the HTML parsing and to write a simplified Inter-Process Communication (IPC) scheme, which is suitable for this particular test case but may not apply to others. Writing concurrent code in Ruby could be easier; for MRI it's especially hard once you want to use all cores, because dealing with forked processes and ad hoc IPC is not as trivial as writing multi-threaded code with shared memory. You are free to test the performance of other approaches in MRI, like the threaded one, or always forking rather than using a pool of forked processes, or changing the number of workers both for the downloader and for the forked pool (I use 6 processes in the pool that parses the HTML since I have 6 cores in my CPU).
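The gist of the forked-pool IPC is pretty small. Here's a simplified sketch, where parse_html is a stand-in for the Nokogiri work (the real repository code differs):

# Each worker gets a pair of pipes; jobs and results are serialized with Marshal.
def spawn_worker
  job_read, job_write = IO.pipe
  result_read, result_write = IO.pipe
  fork do
    job_write.close; result_read.close
    begin
      # Read jobs until the parent closes its write end.
      while job = Marshal.load(job_read)
        Marshal.dump(parse_html(job), result_write)
      end
    rescue EOFError
      # parent closed the job pipe: we're done
    end
  end
  job_read.close; result_write.close
  { jobs: job_write, results: result_read }
end

# Usage sketch (pool of 6 workers, one per core):
#   pool = 6.times.map { spawn_worker }
#   Marshal.dump(html, pool[i % 6][:jobs])
#   doc_info = Marshal.load(pool[i % 6][:results])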

I have always been disappointed by the sad state of real concurrency in MRI due to the GIL. I'd love to have a switch to disable the GIL completely so that I would be able to benchmark the different approaches (threads vs forks). Unfortunately, this is not possible: MRI has the GIL and JRuby doesn't handle forking well. Also, Nokogiri does not perform the same in MRI and JRuby, which means there are many other variables involved: running an application using forks in MRI can't really be compared to running it on JRuby with the multi-threaded approach, because the difference in design is not the only difference at play.

When I really need to write some CPU-bound code that would benefit from running on all cores, I often do it in JRuby, since I find it easier to deal with threads than to spawn processes. Once I had to create an application similar to Akita's Manga Downloader in test mode, and I wrote about how JRuby saved my week exactly because it enables real concurrency. I really think the MRI team should take real concurrency needs more seriously, or MRI might become irrelevant in the languages-and-frameworks war. Ruby usually gives us options, but we don't really have an option for dealing with concurrent code in MRI, as the core developers believe forking is just fine. Since Ruby usually strives for simplicity, I find this awkward, as it's usually much easier to write multi-threaded code than to deal with spawned processes.

Back to the results of the timing comparison between the Elixir and Ruby implementations: of course, I'm not suggesting that Ruby is faster than Elixir. I'm pretty sure the design of the Elixir implementation can be improved as well to get a better time. I'm just demonstrating that for this particular use case of I/O-bound applications, raw language performance usually does not make any difference given a proper design. The design is by far the most important factor when working on performance improvements for I/O-bound applications. Of course it's also important for CPU-bound applications, but my point is that raw performance is often irrelevant for I/O-bound applications, while the design is essential.

So, what's the point?

There are many features one can use to sell another language, but we should really avoid the trap of comparing raw performance, because it hardly matters for most of the applications web developers work with, if they are the target audience. I'm pretty sure Elixir has great selling points, just like Rust, Go, Crystal, Mirah and so on. I'd be more interested in learning about the advantages of their ecosystems (tools, people, libraries) and how they help you write well-designed software in a better way. Or how they excel at exception handling. Or how easy it is to write concurrent and distributed software with them. Or how robust and fault-tolerant they are. Or how they can help you achieve zero downtime during deploys, or how fast applications boot (one of the raw-performance cases where it can matter). How well documented they are and how amazing their communities are. How easily one can debug and profile applications in these environments, or test something in a REPL, write automated tests, manage dependencies. How well autoreloading works in development mode, and so on. There are so many interesting aspects of a language and its surrounding environment that I find it frustrating every time I see someone trying to sell a language by comparing raw performance, as it often doesn't matter.

Look, I've worked with fast hard real-time systems (running on Linux with real-time patches such as Xenomai or RTAI) during my master thesis, and I know that raw performance is very important for a broad set of applications, like Robotics, image processing, gaming, operating systems and many others. But we have to understand whom we are talking to. If the audience is web developers, raw performance simply doesn't matter that much. This is not the feature that will determine whether your application will scale to thousands of requests per second. Architecture/design is.

If you are working with embedded systems or hard real time systems it makes sense to use C or some other language that does not rely on garbage collectors (as it's hard to implement a garbage collector with hard timing constraints). But please forget about raw performance for the cases where it doesn't make much difference.

If you know someone who got a degree in Electrical Engineering, like me, and ask them, you'll notice it's pretty common to perform image processing in Matlab, an interpreted language and environment for prototyping algorithm designs. It's focused on matrix operations, which are pretty fast since they are compiled and optimized. This allows engineers to quickly test different designs without having to write each variation in C. Once they are happy with the design and performance of the algorithm, they can go a step further and implement it in C, or use one of the Matlab tools that try to perform this step automatically.

Engineers are very pragmatic. They want to use the best tools for their jobs. That means a scripting language should be preferred over a static one during the design/prototype phase, as it allows faster feedback and a tighter iteration loop. Sometimes the performance they get with Matlab is simply fast enough for their needs. The same happens with Ruby, Python, JS and many other languages. They can be used for prototypes, or they may be enough for the actual application.

Also, one can start with them and, once raw performance becomes a bottleneck, convert that part to a more efficient language and use some form of integration to delegate the expensive parts. If many parts of the application would require this approach, maintaining it becomes a burden, and one might consider moving the complete application to another language to reduce the complexity.

However, this has not been my experience with web applications in all the years I've been working as a web developer. Rails usually takes about 20ms per request, as measured by nginx in production, while DNS, network transfer, JS and other related jobs may take a few seconds, which means the 20ms spent in the server is simply irrelevant. It could be 0ms and it wouldn't make any difference to the user experience.

ruby-rails/2016_06_20_akita_s_manga_downloadr_elixir_vs_ruby_performance_revisited Akita's Manga Downloadr Elixir vs Ruby performance revisited 2016-06-20T11:40:00+00:00 2016-06-20T11:40:00+00:00

We use the awesome Sensu monitoring framework to make sure our application works as expected. Some of our checks use a headless browser (PhantomJS) to explore parts of the application, like exporting search results to Excel or making sure no error is thrown from JS in our Single Page Application. We also use NewRelic and Pingdom to get some other metrics.

But since PhantomJS acts like a real browser, our checks influence the RUM metrics we get from NewRelic, and we're not really interested in such metrics. We want metrics from real users, not from our monitoring system.

My initial plan was to check whether I could filter some IPs out of the RUM metrics, and I asked NewRelic support about this possibility; they said it's not supported yet, unless you want to filter specific controllers or actions.

Since some monitoring scripts have to go through real actions, this was not an option for us. So I decided to take a look at the newrelic_rpm gem and came up with a solution that I've confirmed is working fine for us.

Since we have a single-page application, I simply added the before-action filter to the main action, but you may adapt it for your ApplicationController if you wish. This is what I did:

class MainController < ApplicationController
  before_action :ignore_monitoring, only: :index if defined? ::NewRelic

  def index
    # ...
  end

  private

  def ignore_monitoring
    return unless params[:monitoring]
    ::NewRelic::Agent::TransactionState.tl_get.current_transaction.ignore_enduser!
  rescue => e
    logger.error "Error in ignore_monitoring filter: #{e.message}\n#{e.backtrace.join "\n"}"
  end
end

The rescue clause is there in case the implementation of newrelic_rpm changes and we don't notice it. We decided to send a "monitoring=true" param with the requests performed by our monitoring scripts. This way we don't have to worry about managing and updating a list of monitoring servers and figuring out how to update that list in our application without incurring any downtime.

But in case you want to deal with this somehow, you might be interested in testing "request.remote_ip" or "request.env['HTTP_X_FORWARDED_FOR']". Just make sure you add something like this to your nginx config file (or a similar trick for your proxy server, if you're using one):

location ... {
  proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
ruby-rails/2014_05_16_preventing_newrelic_rum_metrics_for_certain_clients_with_rails_apps Preventing NewRelic RUM metrics for certain clients with Rails apps 2014-08-07T12:15:00+00:00 2014-08-07T12:15:00+00:00

I've been using Sequel in production since April 2012, and I still think it's the best decision I've made so far in the whole project's lifetime.

I had played with it a few times in past years, when Arel hadn't been added to ActiveRecord yet, and I was amazed by how it supported lazy queries. Then I spent a few years working with Java, Groovy and Grails after changing jobs in 2009, but kept reading Ruby (and Rails) news, until I found out that AR had added support for lazy queries through Arel when Rails 3 was released. I then assumed AR would be a better fit than Sequel, since it was already integrated with Rails and lots of great plug-ins would support it better.

I was plain wrong! In 2011 I changed my job again to work on another Grails application. After finding a bug with no fix or workaround available I decided to create a Rails application to forward the affected requests to. So, in April of 2012 I started to create my Rails app and its models using ActiveRecord. A week later I moved all models from ActiveRecord to Sequel and have been happy since then.

Writing some queries with ActiveRecord was still a pain, while Sequel was a joy to work with. The following sections cover each area where I find Sequel an improvement over AR.

Database pooling implementation

These days I decided to recreate a few models with ActiveRecord so that we could use an admin interface with the activeadmin gem, since it doesn't support Sequel. After a few requests to the admin interface, it stopped responding and raised timeout errors.

Then I decided to write some code to test my suspicions and run it in the console:

pool_size = ActiveRecord::Base.connection_pool.size
(pool_size + 1).times { Thread.start { AR::Field.count }.join }

This yielded a timeout error on the last run. The same didn't happen with my Sequel models:

pool_size = Sequel::Model.db.pool.size
(pool_size + 1).times.map { Thread.start { Field.count } }.each &:join

Notice that with Sequel I don't even need to join each thread inside the block for this to work, since the count call is so much faster than the timeout setting.

The curious thing is that I didn't get any timeout errors when using activeadmin with a regular Rails application, so I investigated what was so special about it that I could access the admin interface as many times as I wanted without it ever timing out.

I knew the main difference between my application and a regular Rails application was that I only required active_record, while Rails requires active_record/railtie. So I decided to take a look at its content and found this:

config.app_middleware.insert_after "::ActionDispatch::Callbacks",
  "ActiveRecord::ConnectionAdapters::ConnectionManagement"

So I found that AR was playing a trick here, delegating the pool management to the web layer by always clearing active connections from the pool after the request is processed, in that middleware:

ActiveRecord::Base.clear_active_connections! unless testing

Despite the name clear_active_connections!, it seems to actually only close and check back into the pool the single current connection, whose id is stored in a thread-local variable, from my understanding after glancing over AR's pool management source code. That means that if the request's main thread spawns a new thread, any connection checked out in the new thread won't be automatically collected by Rails, and your application will start to throw timeout exceptions while waiting for a connection to become available in the pool, for no obvious reason, unless you understand how the connection pool works in AR and how it's integrated into Rails. Here's an example:

class MainController < ApplicationController
  def index
    Thread.start { Post.count }
    head :ok
  end
end

Try running this controller using a single server process 6 times (assuming the pool size is the default of 5 connections). This should fail:

ab -n 6 -c 1 http://localhost:3000/main/index

That means the user is responsible for closing the connection and checking it back into the pool before the thread terminates. This wouldn't be a concern if Post were a Sequel model.
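If you do need to spawn a thread with AR, the safe pattern is to let the pool manage the connection yourself, for instance with with_connection (a sketch of the fixed controller above):

class MainController < ApplicationController
  def index
    Thread.start do
      # Checks a connection out and guarantees it's returned to the pool
      # when the block finishes, even though the thread outlives the request.
      ActiveRecord::Base.connection_pool.with_connection do
        Post.count
      end
    end
    head :ok
  end
end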

Then I recalled this article from Aaron Patterson.

Update note: it seems this specific case will be fixed in ActiveRecord 4.2 due to the automatic connection check-in upon dead threads strategy implemented in pull request #14360.

Ability to join the same table multiple times with different aliases

The main reason I left AR for Sequel was the need to join the same table multiple times, with a different alias for each joined table. Take a look at this snippet from this sample project:

module Sq
  class Template < Sequel::Model
    one_to_many :fields

    def mapped_template_ids
      FieldMapping.as(:m).
        join(Field.named(:f), id: :field_id, template_id: id).
        join(Field.named(:mf), id: :m__mapped_field_id).
        distinct.select_map(:mf__template_id)
    end
  end
end

I still don't know how to write such a query using AR. If you do, please comment on how to do so without resorting to plain SQL or Arel, which is considered an internal implementation detail of AR whose API could change anytime, even in a patch release.

as and named are not part of Sequel::Model, but implemented as a plug-in. See next section.

Built-in plugin support for models

Although this is not a strong reason to move to Sequel, since it's easily replicated with regular Ruby modules in AR, it's nice to have a built-in API for extending models:

module Sequel::Plugins::AliasSupport
  module ClassMethods
    def as(alias_name)
      from named alias_name
    end

    def named(alias_name)
      Sequel.as table_name, alias_name
    end
  end
end
Sequel::Model.plugin :alias_support

Support for composite primary keys

Sequel does support composite primary keys, which are especially useful for join tables, while ActiveRecord requires a single unique column as the primary key.

No need to monkey patch it

It seems lots of people don't find AR's API good enough, because they keep monkey patching it all the time. I try very hard to avoid depending on any library that relies on monkey patching something, especially AR, since its internal implementation is always changing.

So, with every major and minor Rails release we often find gems that stopped working due to such internal changes. For example, activeadmin stopped working with the Rails 4.1.0.beta1 release even though AR's public API remained the same.

It takes so much time to work on code that relies on monkey patching AR that Ernie Miller, after several years of trying to provide improvements over AR, gave up.

Not surprisingly, one of the gems he used to maintain, polyamorous, was the reason activeadmin stopped working with the latest Rails release.

I never felt the need for monkey patching Sequel's classes.

Documentation

Sequel's documentation is awesome! That was the first thing I noticed when I moved from AR to Sequel. Arel is considered an internal implementation detail, and AR users are not supposed to rely on Arel's API, which makes AR's API much more limited, besides being badly documented.

Support

Sequel's mailing list has awesome support from Jeremy Evans, the gem's maintainer. As for AR, there's no dedicated list for it, and one has to subscribe to a Rails-related list to discuss AR matters.

Separation of concerns

I like to keep concerns separate, and I can't think of a reason why an ORM solution should be attached to a web framework implementation. If Rails ships great new features for action handling in a new release, I shouldn't be forced to upgrade the ORM library at the same time I upgrade Rails.

Also, if a security fix affects AR only, why should a new Rails version be released?

Often AR introduces incompatibilities in new versions, while I haven't seen this happen with Sequel yet for the features I use. Also, I'm free to upgrade either Rails or Sequel at any time.

Of course, this doesn't apply only to ORM solutions; it's also valid for mail handling, but that's another topic, so I'll focus on the Sequel vs AR comparison only.

Sequel can also be useful without models

Sometimes it doesn't make sense to create a model for each table. Sequel's database object allows you to easily access any table directly while still supporting all dataset methods, just like you'd do with Sequel models:

DB = Sequel::Model.db # or Sequel.connect 'postgres://localhost/my_database'
mapped_template_ids = DB[:field_mappings___m].
  join(:fields___f, id: :m__field_id, template_id: 1).
  join(:fields___mf, id: :m__mapped_field_id).
  where(f__deleted: false, mf__deleted: false).
  distinct.select_map(:mf__template_id)

Philosophy

AR's philosophy is to delegate constraints to the application's model layer, while Sequel prefers to implement all constraints at the database level, when possible/viable. I've always agreed that we should enforce all constraints at the database level, but this isn't common among most AR users. AR migrations don't make it easy to create a foreign key properly using the DSL, for example, treating foreign keys as second-class citizens, as opposed to Sequel's philosophy.

The only RDBMS I currently use is PostgreSQL, and I really want to use several features that only PostgreSQL supports. Sequel's PG adapter allows me to use those features if I want to, even knowing they won't work with other database vendors.

This includes nested transactions through savepoints, options to drop temp tables on commit, and so on.

Another example: AR 4.1.0.beta1 introduced support for enums, in a database-independent way.

I'd much prefer to use PostgreSQL's enum type for things like that, which comes with database-side built-in validations/constraints.
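With Sequel you can simply create the type yourself in a migration (the pg_enum extension also ships helpers for this). A sketch, with made-up table and type names:

Sequel.migration do
  up do
    # a real database-side type: invalid values are rejected by PG itself
    run "CREATE TYPE mood AS ENUM ('happy', 'neutral', 'sad')"
    add_column :users, :mood, :mood
  end
  down do
    drop_column :users, :mood
    run "DROP TYPE mood"
  end
end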

Also, although you can manage association cascades on the application side using this plugin with Sequel, usually you'd be advised to perform such cascade operations at the database level when creating the foreign keys, for instance. And when a database trigger takes care of an after/before hook better than application code would, you shouldn't be afraid of taking advantage of it.

Faster testing when using factories

With PostgreSQL's support for savepoints inside transactions, I can set up RSpec to allow transactional before/after(:all) blocks in addition to the before/after(:each) ones.

This saves me quite some time when I can create several database records in a context that is then shared among several examples, instead of recreating them every time.

RSpec's support for this is not great (there's no let variant scoped to the whole context, for example), but it's not hard to get this set-up working well enough, speeding up my test suite a lot.

And it's pretty easy to use Sequel's core support for nested transactions so that I can be sure that the database state will be always consistent before each example is run.
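A rough sketch of the per-example part of that set-up, assuming a Sequel DB handle (the before/after(:all) wiring is more involved and omitted here):

RSpec.configure do |config|
  config.around(:each) do |example|
    # savepoint: true makes this a nested transaction (a savepoint on PG)
    # when an outer transaction is already open for the context;
    # rollback: :always discards the example's changes either way.
    DB.transaction(savepoint: true, rollback: :always) { example.run }
  end
end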

Migrations

I strongly believe a database's schema changes should be handled by a separate project, instead of living inside one application that uses the database. More applications may use the same database at some point, so it makes sense for database management to be handled by a separate application.

I still don't have a favorite migrations solution, as each has its pros and drawbacks. I'm still using AR's migrations for historical reasons, as I used the standalone_migrations gem in a separate project even when my application was written only in Grails and the Rails app didn't exist yet. Since standalone_migrations only supports the AR 3.x branch, and I was interested in some features from AR 4, I created another gem, called active_record_migrations, to be able to use AR 4 migrations in stand-alone mode.

DSL

I much prefer Sequel's DSL for writing migrations, as it supports more things in an easier way than AR's migrations. Also, I'm allowed to use any dataset methods from a migration, instead of having to write everything not supported by the DSL as plain SQL queries.

On the other side, AR, since version 4, allows us to have a reversible block inside a change method, which can be quite useful.
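For example (table and column names made up):

class ConvertPricesToCents < ActiveRecord::Migration
  def change
    reversible do |dir|
      dir.up   { execute "UPDATE products SET price = price * 100" }
      dir.down { execute "UPDATE products SET price = price / 100" }
    end
    # regular auto-reversible DSL calls can live in the same change method
    rename_column :products, :price, :price_in_cents
  end
end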

Tooling

AR provides a good migration generator, which Sequel lacks, and it can be very helpful when creating new migrations.

Performance

I didn't create any specific performance tests to compare both ORM solutions, but I do remember that my specs ran much faster when I migrated from AR to Sequel, and I've also heard from other people that Sequel is faster for most use cases, on MRI at least.

Query DSL

I really like to have control over the generated SQL, and a good ORM solution, for me, is one that allows me better control over it. That's why I don't like Hibernate's HQL language.

The database should be your friend, and if it supports functions or syntax that would help you, why not use them?

Sequel allows me to use nearly all features available from my database vendor of choice, PostgreSQL, through its DSL. It also provides easy access to (and documentation for) all kinds of things I could do with plain SQL, like "ilike" expressions, sub-queries, nested transactions, importing data from files, recursive queries, Common Table Expressions (WITH queries) and so on.
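A few of those through Sequel's DSL (table and column names are made up):

# ILIKE expression (case-insensitive matching on PG)
DB[:users].where(Sequel.ilike(:name, '%silva%')).all

# sub-query
active_ids = DB[:accounts].where(active: true).select(:id)
DB[:posts].where(account_id: active_ids).all

# Common Table Expression (WITH query)
popular = DB[:posts].where{ hits > 1000 }
DB[:popular_posts].with(:popular_posts, popular).select_map(:id)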

Why not use plain SQL instead of an ORM when supporting multiple database vendors is not an issue?

First, I'd like to say that most of Sequel's DSL actually supports multiple database vendors.

But I only find that useful if you're writing some kind of plug-in or library that should not depend on a single database vendor. That's not the case for typical applications.

Once you opt for some database vendor in your application, you shouldn't have to worry about supporting other database vendors.

So, someone might ask: why use any ORM solution at all if you're fine with writing plain SQL?

There are many reasons. First, most plug-ins expect some Ruby interface to deal with, instead of SQL. This is the case with FactoryGirl, Devise and so on. But this is not the main reason.

An ORM provides lots of goodies, like an easy-to-use API to create and update records, automatic typecasting, transaction handling and much more. But even this is not the main reason I prefer an ORM over plain SQL.

The main reason, for me, is the ability to easily compose a query in a way that is easy to read and maintain, especially when parts of the query depend on the requesting user or on some controller param. It's great that you can change a query on the fly, like this:

fields_dataset = Field.where(template_id: params[:id])
fields_dataset = fields_dataset.exclude(invisible: true) unless current_user.admin?
# ...

Sequel's drawbacks

When a generic query is performed, Sequel returns rows as hashes keyed by the column names converted to symbols. This may be a problem if you generate queries dynamically and create aliases based on some table id that depends on user input. If enough ids are queried, Sequel may create lots of symbols that will never be garbage collected.

The lack of built-in migration generators makes creating new Sequel migrations a less-than-ideal task. You may create a custom rake task to aid with migration creation, and it shouldn't be complicated, but having that support built into Sequel's core would certainly help.

The main drawback of Sequel is certainly the lack of native support from other great gems like Devise, ActiveAdmin and Rails itself. Quite a few useful Rails plug-ins will only integrate with ActiveRecord.

Overall feeling

Most of my server-side tasks involve querying data from an RDBMS and serving JSON representations to the client-side API. So, an ORM solution is a key library for me.

And I couldn't be happier with all the goodness I get from Sequel, which gets out of my way when querying the database, in contrast with ActiveRecord, with which I used to spend a lot of time trying to figure out whether some kind of query was possible at all.

Thanks, Jeremy Evans, for maintaining such a great library and being so responsive in the mailing list! I really appreciate your efforts, documentation and Sequel itself.

Also, thank you for kindly reviewing this article, providing insightful improvements over it.

Finally, if you're interested in getting started with Sequel in a Rails application, I published another article on the subject in April 2012.

ruby-rails/2013_12_18_sequel_is_awesome_and_much_better_than_activerecord Sequel is awesome and much better than ActiveRecord 2014-05-30T09:26:00+00:00 2014-05-30T09:26:00+00:00

Important update: after I wrote this article, I tried to put it to work in my real application and noticed that it can't really work the way I described, due to objects referenced only on the DRb client side being garbage collected on the DRb server side, since no references to them are kept on the server. I'm keeping this article anyway to explain the idea, in the hope that we can find a way to work around the memory management issue at some point.

Motivation

In a Ruby application I maintain, we have a requirement to export some statistics to XLS (not XLSX), and we had to modify an XLS template to do that.

After searching the web I couldn't find a Ruby library that would do the job, but I knew I could count on the Apache POI java library.

MRI Ruby doesn't have native support for using Java libraries, so we have to either use JRuby or some Inter-Process Communication (IPC) approach (I consider hosting a service over HTTP another form of IPC).

I've used JRuby to serve my web application in the past, with good results, but our application is currently running fine on MRI Ruby 2, and I don't want to use JRuby for deployment only to enable the use of Java libraries. Sometimes we re-run stress tests to measure the throughput of our application under several deployment strategies, including JRuby instead of MRI in threaded mode (vs the multi-process and multi-threaded approaches with MRI), testing several web servers for each Ruby implementation.

Last time we ran our stress tests, Unicorn was a bit faster at serving our pages compared to JRuby on Puma, but that wasn't the main reason we chose Unicorn. We had some issues with some connections to PostgreSQL under JRuby at that time, and we didn't want to investigate further, especially when we didn't notice any advantage in the JRuby deployment back then.

Things may have changed today, but we don't plan to run another battery of stress tests in the short run... I just wanted to find another way of accessing Java libraries that wouldn't tie our application to JRuby in any way. Even when we deployed with JRuby, all our code ran on MRI, and we used MRI to run the tests and in development mode, since it boots much faster and allows faster testing through forking techniques (spork, zeus, etc.).

I didn't want to add much overhead either by providing an HTTP service. The overhead is not only in the payload but also in the development workflow.

What I really wanted was just a bridge that would allow me to run Java code from MRI Ruby, since I'm more comfortable writing code in Ruby and my tests run faster on MRI than on JRuby.

So, the obvious choice (at least for me) was to try DRb.

DRb to the rescue

Even after deciding on DRb, you can implement the service with multiple approaches. The simplest is probably to write the service in JRuby and only access the higher-level interface from the MRI application.

That works, but I wanted to avoid this approach for a few reasons:

  • tests would run slower when compared to MRI due to increased boot time for the JVM (main reason)
  • we'd need to switch applications every time we wanted to work on the Java-related code (we don't use an IDE, but still, in Vim, that means ':lcd ../jruby-app')
  • Rails already provides automatic code reloading out of the box for our main application, while we'd have to constantly reboot the JRuby application after each change or implement some auto-reloading code ourselves

So, I wanted to test another, minimal approach that would allow us to perform generic JRuby programming directly from MRI.

Dependencies management, Maven and jbundler

Note: for this section, I'm assuming JRuby is being used. With RVM that means "rvm jruby".

Christian Meier did a great job with jbundler, a tool similar to Bundler that uses a Jarfile instead of a Gemfile to specify Maven dependencies.

So, basically, I created a new Gemfile with bundle init and added a gem 'jbundler' entry to it.

Then I created a Jarfile with this content: jar 'org.apache.poi:poi'. Run bundle exec jbundle and you're ready to go. Running jbundle console will provide an IRB session with the Maven libraries available.

To create a script, you add a require 'jbundler' statement and you can now run it with bundle exec ruby script-name.rb.

The DRb server

So, this is what the JRuby server process looks like:

# java_bridge_service.rb:

POI_SERVICE_URL = "druby://localhost:8787"

require 'jbundler'
require 'drb/drb'
require 'ostruct'

class JavaBridgeService
  def run(code, _binding = nil)
    _binding = OpenStruct.new(_binding).instance_eval { binding } if _binding.is_a? Hash
    result = if _binding
      eval code, _binding
    else
      eval code
    end
    result.extend DRb::DRbUndumped if result.respond_to? :java_class # like byte[]
    result
  end
end

puts "listening to #{POI_SERVICE_URL}"
service = DRb.start_service POI_SERVICE_URL, JavaBridgeService.new

Signal.trap('SIGINT'){ service.stop_service }

DRb.thread.join

Security note

This is all you need for MRI to run arbitrary Ruby code in the JRuby process. Since this makes use of eval, I'd strongly recommend running this server in a sandboxed environment.

The client code

I won't show the full classes we use for communicating with the server, since they are implementation details and people will want to organize this in different ways. Instead, here is some scripting code you may want to run in an IRB session to test the set-up:

require 'drb/drb'

DRb.start_service

service = DRbObject.new_with_uri 'druby://localhost:8787'

[
  'java.io.FileInputStream',
  'java.io.FileOutputStream',
  'java.io.ByteArrayInputStream',
  'java.io.ByteArrayOutputStream',
  'org.apache.poi.hssf.usermodel.HSSFWorkbook',
].each{|java_class| service.run "import #{java_class}"}

workbook = service.run 'HSSFWorkbook.new FileInputStream.new(filename)',
  filename: File.absolute_path('template.xls')

sheet = workbook.sheet_at 0
row = sheet.create_row 0
# row.create_cell(0) would display a warning on the server side, since JRuby can't know
# whether you want the short or the int method signature
cell = service.run 'row.java_send :createCell, [Java::int], col', row: row, col: 0
cell.cell_value = 'test'

# export it to binary data
result = service.run 'ByteArrayOutputStream.new'
workbook.write result

# ruby_data is what you would pass to send_data in controllers:
ruby_data = service.run('ByteArrayInputStream.new baos.to_byte_array', baos: result).to_io

# or, if you want to export it to some file:
os = service.run 'FileOutputStream.new filename', filename: File.absolute_path('output.xls')
workbook.write os

Conclusion

By using such a generic Java bridge, we're able to use several good Java libraries directly from MRI code.

Troubleshooting

If you have any issues when trying this code (I haven't actually tested the code in this article), please leave a note in the comments and I'll fix the article. Also, if you have any questions, leave a comment and I'll try to help you.

Or just feel free to thank me if this helped you ;)

ruby-rails/2013_07_16_running_java_from_mri_ruby_through_drb Running Java from MRI Ruby through DRb 2014-01-16T15:00:00+00:00 2014-01-16T15:00:00+00:00

A while ago I wrote an article explaining why I don't like Grails. By that time I had been doing Grails development daily for almost 2 years. Some statements there are no longer true, and Grails has really improved a lot since 2.0.0. I still don't like Grails, for many more reasons I haven't found the time (or interest) to write about.

Almost 2 years ago I went back to Rails programming, and the application I currently maintain is a mix of Grails, Rails and Java Spring working together. I feel it's now time to reflect on what I like and what I don't like in Rails.

What kind of web application am I talking about?

I've been working solely on single-page applications since 2009. All opinions reflected here apply to that kind of application, although some of them apply to any web application. This is also what I consider the current tendency for web applications, like Twitter, Facebook, Google+, GMail and most applications I've seen out there.

When designing such applications, one doesn't make heavy use of server-side views (ERB, GSP, JSP, you name it) but usually renders the views on the client side, although some will prefer to render partial content generated on the server. In the applications I've written over those 4 years, in different companies and products, I've mostly been rendering the views on the client side, so keep that in mind when reading my review.

Basically, I only render a single page on the server side and have plenty of JavaScript (or CoffeeScript) files referenced by this page, usually concatenated into a few JavaScript files for production use.

How does Rails help me get my job done?

The Asset Pipeline

I'd say the feature I like most in Rails is undoubtedly the Rails Asset Pipeline. It is an asset processor that uses sprockets and some conventions to help us declare our asset dependencies, split them into several files and mix related languages that compile down to JavaScript and CSS. Examples of languages supported out of the box are CoffeeScript and SCSS, which are better versions (in my opinion, of course) of JavaScript and CSS.

This tool takes away most of the pain I have with JavaScript. The main reason I hate JavaScript is the lack of an import (or require) statement to make it easier to write modular code. This is changing in ES6, but it will take a while before all target browsers support such a statement. With the Asset Pipeline I don't have to worry about it, because I can use "require" statements in comments that are processed by the Asset Pipeline, without having to resort to bad techniques like AMD (my opinion, of course).

The Asset Pipeline is also well integrated with the routing system.

Automatic code reloading during development

Booting a Rails application may take a few seconds, so you can't just load the entire application on each request as we used to do in the CGI era; it would slow down development a lot. Automatically reloading your code, for a faster development experience, is a great feature provided by Rails. It's far from simple to implement properly, and people often overlook this feature because it has always worked well for most people. Creating an automatic reloading framework for other languages can be even harder. Take a look at what some Java reloading frameworks are doing if you don't believe me.

Control over routes

This is supported by most frameworks nowadays, but I always wanted this feature when I used to create web sites in Perl long ago. Not all frameworks make it easy to get a "site map" and see all your application's routes at once, though.

Dependency Management

Rails is the main reason why the genius Yehuda Katz decided to create Bundler, the best dependency management software I know of. Bundler is independent from Rails, but I'd say Rails gets the credit for inspiring Yehuda to create Bundler, though I may be wrong, of course. Ruby had RubyGems for a long while, but it suffered from the same problems as Maven.

Without a tool like Bundler you have two options: always specify the exact version of the libraries you depend on (as Maven users often do), or be prepared to face issues arising from different gem versions being resolved at different times, caused by loose version requirements, as used to be the case for RubyGems users.

Bundler stores a snapshot of the currently resolved gems in a file called Gemfile.lock, so that it's possible to replicate the exact gem versions in production or on another developer's computer without having to specify exact version matches in your dependency file (Gemfile).
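In practice that means the Gemfile can keep loose, intention-revealing requirements while the lock records what was actually resolved (versions here are illustrative):

# Gemfile: loose requirements
gem 'rails', '~> 3.2'
gem 'sequel'

# Gemfile.lock (generated by `bundle install`) then pins the resolved
# versions, e.g. rails (3.2.12) and sequel (3.44.0), for every machine.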

Great testing tools availability

I don't write integration tests in Grails, because it's too slow to boot the entire framework when I only want to test my domain classes (models, in Rails terminology). Writing integration tests in Rails is certainly slower than writing unit tests, but it's feasible, because Rails boots in a few seconds in the application I maintain. So it's okay to write some integration tests in Rails. I used to use Capybara to write tests for views/controllers interaction, but I ended up giving up on this approach, preferring to write JavaScript specs to test my front-end code in a much faster way and simply mock jQuery.ajax using my own testing frameworks, oojspec and oojs.

For simple integration tests that only touch the database, I don't even need to load the entire Rails application, which is much faster. I find this flexibility really awesome, and it makes writing tests a much more pleasant task.

Other tools that help with writing tests in Rails apps are RSpec and FactoryGirl, among many others. Most of them can be used outside Rails, but when comparing Rails to non-Ruby web frameworks, it's worth pointing out how writing web applications with Rails makes automated testing easier than in other languages.

The Rails guides and community

The Rails guides are really fantastic and cover most of the common tasks you'll need when programming a web application with Rails. Also, anyone is free to commit changes to the guides through the public docrails repository, and that seems to work great. I even suggested this approach to the Grails core developers a while ago, and it seems to be working great for them as well, as their documentation has improved a lot since then.

Besides the guides, there are plenty of resources about Rails online, many of them free. There are books (both print and e-books, paid or free), tutorials and several articles covering many topics of web programming in the context of a Rails application. There are even books focused on testing applications, like The RSpec Book, by David Chelimsky. I haven't found any books focused on testing for Grails or Groovy applications, for instance, and I only know of one book focused on JavaScript testing, by Christian Johansen, the author of Buster.js and Sinon.js and one of the maintainers of the Gitorious project.

Rails has a solid community behind it. There are several Rails committers applying patches every day, and the framework seems stronger than ever. You'll find many useful gems for most tasks you can think of. They're usually well integrated with Rails, and you may have a hard time if you decide to use another Ruby web framework.

Most of the gems are hosted on GitHub, which is part of the Rails culture, I'd say. That helps a lot when contributing back to those gems by adding new features or fixing bugs. And although pull requests are usually merged pretty fast, you don't even have to wait for a merge: you can just instruct Bundler to get the gem from your own fork on GitHub, which is amazing (I wasn't kidding when I said Bundler is the best dependency management tool I'm aware of).
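Pointing Bundler at a fork is a one-line change in the Gemfile (gem name, user and branch made up):

gem 'some_gem', github: 'youruser/some_gem', branch: 'fix-that-bug'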

Security

Despite all the critical security holes found in Rails and other Ruby libraries/gems recently, Rails takes security very seriously. Once security issues are found, they're promptly fixed and publicly communicated so that users can upgrade their Rails applications. I'm not used to seeing this attitude in most other frameworks/libraries I've worked with.

Rails also applies some security enhancements to web applications out of the box by default, like CSRF protection, and provides a really great security guide that everyone should read, even non-Rails developers.

How does Rails get in my way?

Even though Rails is currently my favorite web framework, it's not perfect. As a matter of fact, there are many things I don't like in Rails, which is what this section is all about, and also the main motivation for writing this article. The same can be said about Ruby, which is my preferred language but also has its drawbacks; not exactly Ruby the language, but the MRI implementation. I'll get into details in the proper section.

Monolithic design

Rails is not only a web framework, and this is really bad from my point of view.

Rails' release strategy is to keep the versions of all its major components in sync. So, when Rails 3.2.12 is released, it also releases ActiveRecord 3.2.12, ActiveSupport 3.2.12, ActionPack 3.2.12, etc. Even for a single security fix in ActiveRecord, all components will have their version bumped. This also forces you to upgrade your ORM if you decide to upgrade your web framework.

ActiveSupport should be maintained in a separate repository, for instance, as it's completely independent from Rails. The same should be true for ActiveRecord.

The ActiveRecord case

The ORM is a critical part of a web application built on top of an RDBMS. It doesn't make any sense to me to treat it as part of a web framework. It isn't. Their concerns are totally orthogonal (or at least they should be). So, what happens if you want to upgrade your web framework to use a new feature like streaming support? What if the newest ActiveRecord bundled with the latest Rails release has incompatible changes in its API? Why should you be forced to upgrade ActiveRecord when you're only interested in upgrading Rails, the web framework?

Or, what if you love ActiveRecord but are not developing web applications, or you're using another web framework? Why should you have to contribute to the Rails repository when you want to contribute to ActiveRecord? Why isn't there a separate discussion list for ActiveRecord? A separate site and API documentation?

I solved this problem myself a while ago by replacing ActiveRecord with Sequel and disabling AR completely in my application. Luckily, I find Sequel has a much better API and a solid understanding of how an RDBMS is supposed to be used, and it knows how to take advantage of RDBMS features, like transactions, triggers and many others. Sequel will actually advise you to prefer triggers over before/after/around callbacks in your code for many tasks. This is in line with my own feelings about how an RDBMS should be used.

Also, for a long while ActiveRecord didn't support lazy interfaces. Ever since I stumbled upon Sequel several years ago, I have really loved its API, and I always used it instead of AR for some of my Ruby scripts that weren't related to Rails apps. But for my Rails applications, I always tried to avoid adding more dependencies, because most gems just assume you're using ActiveRecord.

But I couldn't have been more wrong. Since I decided to move over to Sequel, I have never regretted my decision. It's probably one of the best decisions I've made in the last few years. I'm pretty happy with Sequel and its mailing list support. The documentation is great, and I have great control over the generated queries, which is very important to me, as I often need complex queries in my applications. ActiveRecord is simply way too limited.

And even if Arel could help me write such queries, it's badly documented and considered a private interface, which means I shouldn't rely on its API when using ActiveRecord, because theoretically AR could change its internal implementation anytime. And the public API provided by AR is simply too poor for the kind of usage I need.

Migrating to Sequel brought other benefits as well. Now the ORM and the web framework can be upgraded independently. For instance, recently a security issue found in ActiveRecord triggered a whole Rails release that I didn't have to upgrade to, because it didn't affect Sequel.

Also, I requested a feature in Sequel a while ago, and it was implemented and merged into master a day or two after my request. I tested it in my application by just instructing Bundler to use the version on master. Then I found a concurrency issue with the new feature that affected our deployment on JRuby. The same day I reported the issue, it was fixed on master, and I could promptly use it without having to change any other bit of my application.

Jeremy Evans is also very kind when replying to questions on Sequel's mailing list and provides great, insightful advice once you explain what you're trying to achieve in your application. He is also very knowledgeable about relational databases. Sequel is really carefully thought out and cares a lot about databases, concurrency and many more details. I couldn't recommend it more highly to anyone who cares about their RDBMS.

Lack of a solid database understanding from the main designer

When I first read about Rails, in 2007, my only previous experience with databases was with Firebird, from the time when people used Delphi a lot in Brazil. I really loved Firebird, but I knew I would have to find something else, because Firebird wasn't often used in web applications and I wanted something well supported by the community. I also wanted a free database, so the options were basically MySQL or PostgreSQL. I wasn't really much interested in which database to use, since I believed all RDBMS would be essentially the same and I hadn't experienced any issues with Firebird. "It all boils down to SQL", I used to think. So I just did some research on the web and found lots of people complaining about MySQL and no one complaining about PostgreSQL. I wasn't really interested in knowing what people were saying about MySQL and simply decided to go with PostgreSQL, since I had to choose one.

A few years later I moved to another company that also happened to use PostgreSQL. Then I used it for 2 more years (4 in total). When I changed jobs again, the application used a MySQL database. "No problem", I thought, as I still believed it all boils down to SQL in the end. Man, I was completely wrong!

After a few days working with MySQL, I noticed so many bugs and bad design decisions that, after a year, I decided to finally migrate the database to PostgreSQL.

But among all the good conventions you get when you decide to use Rails, the documentation initially used MySQL in its examples, and lots of people really didn't have a strong opinion about which database vendor to choose. That led the community that was forming to adopt MySQL en masse initially.

Fortunately, it seems the community now understands that PostgreSQL is a much better database, but I'd still prefer Rails to recommend PostgreSQL in the Getting Started guides.

An example of how bad Rails' opinions about RDBMS are is that ActiveRecord doesn't even support foreign keys, one of the key concepts of an RDBMS, in its migrations DSL. That means the portable Ruby format of the current database schema is not able to restore foreign keys. Hibernate, the de facto ORM solution for Java-based applications, does support foreign keys. It will even create the foreign keys for you if you declare a belongs-to relationship in your domain classes (models) and ask Hibernate to generate the migration SQL.

If your application needs to support multiple database vendors, I'd recommend forgetting about schema.rb and simply running all migrations whenever you want to create a new database (like a test db, for instance). If you only have to care about a single DB vendor, like me, then just change the AR schema_format to :sql instead of :ruby. If you don't care about foreign keys, you're just plain wrong.
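That switch is a single line:

# config/application.rb: dump db/structure.sql instead of db/schema.rb
config.active_record.schema_format = :sql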

I believe David Heinemeier Hansson is a really smart guy, despite what some people might say. I just think he hadn't focused much on databases before creating Rails, or he wouldn't have used MySQL. But there are many other right decisions behind Rails, and I find the boom DHH brought to web development frameworks really impressive. People often say he is arrogant, among other adjectives. I don't agree. He has strong opinions about many subjects. So do I, and many others. This shouldn't be seen as impoliteness or arrogance.

Some arrogant core members

People have a similar opinion about Linus Torvalds when he is direct and to the point in his phrasing and opinions. He also has strong opinions and a sense of humor that many don't understand. I just feel people often get easily offended for no good reason these days, which is unfortunate. I have to be extra careful when writing to some lists on the Internet that seem to be even more affected than the usual ones. I have often received really aggressive responses in a few mailing lists for stating my opinions in direct ways; people often consider that rude behavior, while I call it an honest and direct opinion. I'm trying to avoid stating those opinions on some lists so that people don't get mad at me.

I really don't know those people and I don't have anything against them. Believe me or not, I'm a good person, I have tons of friends, I meet with them very often, and they don't get offended when I'm direct to the point or when I state my strong opinions, even when they don't agree with me. With my closest friends (and even some not that close) I would use the expression "after all, I'm not a girl" in a joking tone, but I can't say such things on the Internet or people will criticize me to death. "You sexist! What do you have against girls?" Nothing at all, it is just an expression often used with humor, in my city at least... I love my wife, my daughter is about to be born and I'm pretty excited about that. I just think people take some phrases or expressions too seriously.

If you ever have the chance to talk to my friends, they will tell you I'm not the kind of guy who seeks conflict, but they will also tell you that I have lots of strong opinions and that I'm pretty honest and direct about them. They just don't find that rude, but healthy. And I expect the same from them.

It is just sad when I get an angry response from Rails core members on the mailing list for no good reason. If I call some Rails behavior stupid, some take it personally and threaten to stop helping me, as if I were calling them stupid people. I don't personally know any of them. How could I find any of them stupid? They are probably much smarter than me, but that doesn't mean I can't have my own opinions about some decisions behind Rails and find some of them stupid. And others are free to disagree with me and think that my way of thinking is stupid. I won't take it as a personal attack. I swear.

On the other hand, I find some of their attitudes really bad. For instance, if you ask for some behavior in Rails or any of its components to be changed, some will reply: "send a pull request and we can discuss it. Otherwise we won't take time to just discuss the ideas with words. Show us code". I don't usually see this behavior in most other communities I've participated in. It basically means: "we don't care that you spend your valuable time on code that will never be merged into our project because we don't agree with the base ideas". There are many things that can be discussed without code. Asking someone to invest their time writing code that will later be rejected, when it could have been rejected beforehand, is quite offensive in my point of view.

By the way, that is the reason I don't spend much time on complex patches to Rails. I did that once, long ago, and didn't get feedback from the core developers after a while, even after spending a considerable amount of time on the patch and adapting it to many requested changes I didn't agree with. So I'd say that my user experience with many libraries is just great, but that is not usually the case with the Rails core mailing list. Some of those core developers really believe they're God's gift to the world, which makes it hard to argue with them on several points. And if you state your strong opinion about some subject, you may be seen as rude and they won't want to talk to you anymore...

Of course different people will have different experiences, but in my particular case Rails has not been the friendliest web framework community. The Ruby-core list is a totally different beast, and I can't remember any bad experience talking to Matz, Kosaki, Shugo, Nobu and many others. I also had a great experience on the JRuby mailing list, with Charles Nutter and many others, and I've already mentioned the great experience with Jeremy Evans on the Sequel mailing list. I just don't understand why the Rails core team doesn't seem to tolerate me. I don't have any personal issues with any of them. But I don't usually have a great experience there either, so I sometimes avoid writing to that list.

Even after publishing my article with my strong (bad) opinions about Grails, I don't remember any bad experience talking to them on their list. And I know they read my article, as it became somewhat popular in the Grails community and I even got some replies from the Grails maintainers themselves.

The Rails API documentation

I remember that one of the strong features of Rails 1 was its great API documentation. During the Rails 3 rewrite, lots of great documentation was deleted in the process and either got lost or was moved to the Rails guides.

Nowadays I've simply stopped trying to find documentation on the API documentation site, something I used to do a lot in the Rails 1 era. It is sad that the current state is bad to the point that I find the site almost unusable, preferring to find answers on StackOverflow, by asking on mailing lists, by digging into the Rails source code or by other means. If I'm lucky, the information I'm looking for is documented in the guides, but otherwise I'll have to spend some time searching for it.

YAML used instead of plain Ruby to store settings

Rails provides 3 environments by default: development, production and test. But in all projects I've worked on I always had a staging environment as well, and currently our deployment strategy involves even more environments. Very soon we realized that it wasn't easy to manage all those environments by having to tweak so many configuration files: config/database.yml, config/mongo.yml, config/environments/(development|test|production).rb and many others kept popping up. Also, tasks like "rake assets:precompile" use the production environment by default, while most tasks default to development.

Every time we needed to create a new environment it was too much work to manage. So we ended up dropping all those YAML files and simply symlinking config/settings.rb to config/settings/environment_name.rb. We also symlinked all of config/environments/*.rb to point to the same file, and managed the differences in config/settings.rb. So we have staging.rb, production.rb, test.rb, development.rb and a few others under config/settings, and we simply symlink the one of interest to config/settings.rb, which is ignored by Git.

The only exception is that test.rb is always used when running tests. That worked out much better for us, and it is much easier to create a new environment with all settings, like Redis, Mongo, PostgreSQL, integration URLs and many more, grouped in a single file symlinked as settings.rb. It's pretty simple to figure out what needs to be changed, as well as to base new settings on top of an existing environment.

For instance, staging.rb would require production.rb and overwrite a few settings. This is a much better way of handling multiple environments than the standard approach most Rails applications take, maintaining sparse YAML files alongside some DSLs written in Ruby (like Devise's and others).
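A sketch of the idea (the setting names and values are illustrative; only the file layout and the require-and-override trick come from our actual setup):

# config/settings/production.rb
SETTINGS = {
  redis_url: 'redis://prod-redis:6379/0',
  mongo_url: 'mongodb://prod-mongo/app',
}

# config/settings/staging.rb -- build on production and override what differs
require_relative 'production'
SETTINGS.merge!(redis_url: 'redis://staging-redis:6379/0')

# then pick the active environment with a symlink (ignored by Git):
#   ln -s settings/staging.rb config/settings.rb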

I believe the Grails approach, which allows external Groovy override files to configure the application on a per-environment basis, is a better convention than the one suggested by Rails. What is the advantage of YAML(.erb) files over plain Ruby configuration files?

Deployment / scalability

One of the main drawbacks of Rails, in my opinion, is that it waited too long to start thinking seriously about threaded deployment. Threads have long been used successfully by web frameworks in many languages, but for some reason they have been neglected in the Ruby/Rails community.

I believe there are two major reasons for that. The Ruby community usually focuses on MRI as the Ruby implementation of choice, and MRI has a global interpreter lock that prevents multiple threads running Ruby code from executing in parallel. So, unless your application is IO intensive, you won't get much benefit from a threaded approach. I blame MRI for this, as they don't really seem to be bothered by the GIL. I mean, they would probably accept a patch to fix the issue, but they're not willing to tackle it themselves because they believe forking is just as good a solution. That leads to the next reason, but before that I'd just like to note that JRuby has always performed great in multi-threaded environments, and I think Rails took too long to take this approach seriously and consider JRuby a viable deployment environment for threading. Threads are, in my opinion, the proper way of handling concurrency in most cases, and I really think they should be the default, as in most web frameworks in other languages.

Now to the next reason people usually prefer multi-process over multi-threaded deployment in the Ruby community. I once asked on the MRI mailing list what the status of thread support in MRI was. Some core committers told me they wouldn't invest time in getting rid of the GIL, mainly because they feel forking is a better fit most of the time: it avoids some concurrency issues one might experience when using threads. They also argued that they didn't want Ruby programmers to have to worry about thread-safety, locks, etc. I don't really understand why people are so afraid of threads and why they think they're so hard to use safely. I've worked with threaded applications for many years and never had the bad experience several developers complain about.

I really miss proper threading support in MRI, because a threaded deployment strategy allows much better memory usage under high load than the multi-process approach and is much easier to scale. That is also the reason I think it should be the default: it would avoid the situation where people have to worry about deployment strategies too early, thinking about load balancers, proxies, etc., when a single threaded instance would be enough for a long time before the application starts having throughput issues. But if you deploy a single process using a single-threaded approach, you'll very soon realize it doesn't scale even for your few users. That's why I believe Rails should promote threaded deployment by default: it is easier to start with.

But the MRI limitation makes this decision hard. Especially because the development experience is usually much better on MRI than on JRuby: tests start running much faster on MRI, and some tools that speed things up even more, like Spork and similar gems, don't work well on JRuby.

So, I can't really recommend a solution to this deployment problem with Rails. Currently we're using Unicorn (multi-process) + MRI to deploy our application, but I really believe this isn't the optimal solution for web deployment and I'd love to see the situation improve in the next few years.

Apart from the deployment issues, I've always missed streaming support in Rails, but I haven't created a section about it in this article because Rails master already seems to support it and Rails 4 will probably be released soon.

The MRI shortcomings

When it comes down to the MRI implementation itself, the lack of good thread support isn't the only thing that annoys me.

Symbols vs Strings confusion

I can't really understand the motivation for symbols to exist in Ruby. They cause more harm than good. I've already discussed my opinions at length here, if you're curious.

To make things worse, as if the harm and confusion caused by symbols, with no apparent benefit, weren't reason enough to get rid of them, attackers keep trying to find new ways to make web applications create symbols. The reason is that symbols are never garbage collected (although that might change at some point). If you employ a threaded deployment strategy and an attacker can get your application to keep creating symbols, it will eventually crash due to the memory leak.
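A minimal sketch of why this matters (Symbol.all_symbols is standard Ruby; the attacker-controlled strings are illustrative):

before = Symbol.all_symbols.size
10_000.times { |i| "attacker_controlled_#{i}".to_sym } # e.g. symbolized params
puts Symbol.all_symbols.size - before # ~10,000 symbols that will never be reclaimed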

Autoloading

Autoload is a Ruby feature that allows some files to be lazily loaded, improving, for instance, the start-up time when booting Rails in development mode. I'm curious whether the lazy approach really makes such a big difference compared to simply requiring/loading all files. And if it does, couldn't that load time be improved some other way?

The problem with autoload is that it can create bugs that are hard to track down, and I have indeed been bitten by one. Here is an example of how it can be triggered: as far as I understand it, requiring 'a/b' opens module A, which fires the pending autoload of 'a', and a.rb in turn requires 'a/b' again while that file is still only partially loaded:

# ./test.rb:
autoload :A, 'a'
require 'a/b'

# ./lib/a.rb:
require 'a/b'

# ./lib/a/b.rb:
module A
  module B
  end
end

# Run with: ruby -I lib test.rb

Design opinions

I really prefer code that makes its dependencies very explicit. Some languages, like Java and most static ones, will force this to happen. But that is not the case in Ruby.

Rails prefers to follow the Don't-Repeat-Yourself principle instead of always being explicit about each file's dependencies. That makes it impossible for a developer to use just a small part of some Rails component: they are designed in such a way that you have to require the entire component, even when the file you want is pretty independent from everything else.

Recently I wanted to use some code from ActionView::Helpers::NumberHelper in my own class, ParseFormatUtils. Even though my unit tests passed when doing that, my application would fail due to circular dependency issues caused by autoload and the way the Rails code is designed.
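Roughly what I attempted (ParseFormatUtils is my own class; the comments describe my experience, not a guaranteed behavior):

# This require and include worked fine in my isolated unit tests...
require 'action_view/helpers/number_helper'

class ParseFormatUtils
  include ActionView::Helpers::NumberHelper

  def format_total(value)
    number_to_currency value # e.g. "$1,234.56"
  end
end
# ...but inside the full Rails application the same require blew up
# with circular dependency errors.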

In my applications it is always very clear what each class is responsible for. Rails controllers are only concerned with the web layer, and most of the logic is coded in a separate class or module and tested independently. That makes testing (both manual and automated) much easier and faster, and it also makes it easier for the project's developers to understand and follow the code.

I'm really sad that Rails doesn't share my point of view here and considers the DRY principle more important than being explicit about all dependencies in each file.

Final notes

Even though there are several aspects of Rails I dislike, I couldn't actually suggest a better framework to a web developer. If I weren't using Rails I'd probably be using some other Ruby web framework and creating some kind of asset pipeline and automatic reload mechanism, but I don't really think that would be worth the effort.

All Rails issues are manageable, in my opinion. I think the other frameworks I've worked with are not: they have some fundamental flaws that prevent me from seriously considering them when the choice is mine to make.

For instance, I reported some serious bugs to the Grails JIRA almost a year ago, test cases included, and they haven't been fixed yet. This is something to really worry about.

I may not deploy my application the way I'd prefer, but Unicorn is currently fitting our application's needs well enough. I can't require just 'action_view/helpers/number_helper', but requiring the full 'action_view' instead isn't that bad either.

I'd just like to state that even though I don't consider Rails/Ruby to be perfect, they're still my choice when it comes down to general web development.

ruby-rails/2013_02_16_rails_the_good_and_the_bad Rails: the Good and the Bad 2013-02-17T00:15:00+00:00 2013-02-17T00:15:00+00:00

I'd like to share some experiences I had this week trying to parse some HTML with Groovy.

Then I'll explain how the task was done better, and finished much faster, with JRuby.

This week I had to extract some references from some HTML documents and store them to the database.

This is the spec of what I wanted to implement, written as MiniTest specs in Ruby:

# encoding: utf-8
require 'minitest/autorun'
require_relative '../lib/references_extractor'

describe ReferencesExtractor do
  def example
    %Q{
      <div cid=1>
        <empty cid=11>
        </empty>
        some text
        <div cid=12>
          <div cid=121>
            <empty /><another></another>
            <p cid=1211>First paragraph.</p>
            <p cid=1212>Second paragraph.</p>
          </div>
          <p cid=122>Another pa<b>ra</b>graph.</p>
        </div>
      </div>
    }
  end

  it "extracts references from example" do
    return # the implementation isn't included here, so bail out before the assertions
    extractor = ReferencesExtractor.new example
    {
      ['1'] => {'1' => "some text First paragraph. Second paragraph. Another paragraph."},
      ['1211', '1212', '11'] => {'121' => "First paragraph. Second paragraph."},
      ['1211', '1212', '122'] => {'12' => "First paragraph. Second paragraph. Another paragraph."},
      ['12', '1212'] => {'12' => "First paragraph. Second paragraph. Another paragraph."},
      ['1212', '122'] => {'1212' => "Second paragraph.", '122' => "Another paragraph."},
    }.each {|cids, expected| extractor.get_references_texts(cids).must_equal(expected) }
  end
end

I had a similar test written using JUnit, with a small change to make it easier to implement, but I'll discuss that later in this article. Let me just explain the situation better.

Don't ask me what "cid" means, as I wasn't the one who named this attribute, but I guess it is "c..." id, although I have no clue what "c..." is all about. It was already called this way when I started working on this project, and I'm the sole developer on it right now, after lots of other developers worked on it before me.

Part of the application I maintain has to deal with documents obtained from Edgar filings. Each HTML tag is then processed so that it's given a sequential unique number in its "cid" attribute. Someone can then review the documents and highlight certain parts by clicking on the elements in the page. So the database has a reference to a document and a cid list, like "1000,1029,1030", listing all elements that should be highlighted. This was stored exactly this way, as a string, in a database column.

But some weeks ago I was asked to export the contents of some highlighted references to an Excel spreadsheet, and this is somewhat more complex than it looks. With jQuery, it would be equivalent to "$('[cid=12]').text()".

For performance reasons in the search interface, I had to import all references from over 3,000 documents into the database. For new references, I'll do the processing with jQuery and send the text already formatted to the server, but I need to do the initial import, and doing that batch processing client-side would be painfully slow.

But getting the correct output server-side is not that simple. Fortunately, there is no CSS involved in those documents, which makes things simpler: "<div>some t<div>ex</div>t</div>" should be stored as "some t ex t", while "<div>some t<span>ex</span>t</div>" should be stored as "some text". Since getting this completely right requires a deeper understanding of HTML semantics, I decided to simplify while dealing with Groovy: I assumed all elements were block-level elements and parsed the fixed HTML as XML.

The Groovy solution

Doing that in Groovy took me a full week, especially due to the lack of documentation for Groovy's XmlParser and XmlSlurper classes.

First, I had no clue which one to choose. As they have a similar interface, I decided to start with XmlParser and then switch to XmlSlurper once it was finished, to compare the performance between them.

I couldn't find any methods for searching by XPath or CSS expression. When you write "new XmlParser().parseText(xmlContent)", you get a Node.

XmlParser is not an HTML parser, so the content must be well-formed XML; for real-world HTML you need a library like NekoHTML or TagSoup, used like "new XmlParser(new Parser()).parseText(xmlContent)". That's ok, but if you want to play with it and don't know Groovy well enough to deal with Gradle and Maven dependencies, just use a valid XML document as an example.

Since I couldn't find a search-like method for Node, I had to look for node '[cid=12]' with something like this:

xmlContent = '<div cid="12"> some text <span cid="13"> as an example </span>.</div>'
root = new XmlParser().parseText(xmlContent)
node = root.depthFirst().find { it.@cid == '12' }

Calling "node.text()" would yield to 'some text.' and calling "node.children()" would yield to ['some text', spanNode, '.'], which means it ignores white spaces, so it is of no usage to me.

So I tried XmlSlurper. In this case, node.text() yields ' some text as an example .'. Great for this example, but when applied to the node with cid 12 in the MiniTest example above, it yields 'First paragraph.Second paragraph.Another paragraph.', ignoring all white space between elements, so I couldn't use it either.

But after searching a lot, I figured out that there was a class that could convert a node back to XML including all the original white space, so getting what I wanted should be possible. I then tried to extract the text myself.

"node.children()" returned [spanNodeChildInstance], ignoring the text nodes, so I was out of luck and had to dig into its source code. Finally after some hours digging the source-code I found what I was looking for: "node[0].children()" returning [' some text ', spanNode, '.'].

It took a while to get this working, but I still wasn't finished. I had to navigate the XML tree to produce the final processed text. Look at the MiniTest example again and you'll see that I needed to resolve the cid list [1211, 1212, 122] to the node with cid 12.

One of the features I needed was to look for the first ancestor node having a cid, so that I could check whether it was a candidate node. It turned out not to be that simple: while traversing the parents, I might not find any parent node with a cid at all. So, how could I check that I had reached the root node?

With XmlSlurper, when you call rootNode.parent() you'll get rootNode. So, I tried something like this:

parent = node.parent()
while (!parent.@cid && parent != parent.parent()) parent = parent.parent()

But the problem is that the comparison is made by string, so I had no real way to tell whether I had reached the root. My solution was to check for "node.name() != 'html'" instead. This is really bad API design. Maybe root.parent() could return null. Also, I should be able to compare nodes themselves instead of their text.

After several days, at the end of last Thursday, I got a "working" version of a similar JUnit test passing with an implementation in Groovy. But as I wasn't really using an HTML parser, but an XML one, I couldn't process white space correctly for inline elements.

NokoGiri

Then, on Friday morning, I got curious about how I could parse HTML with Ruby, as I had never done it before. That was when I got my first smile of the morning, reading this in Aaron Patterson's NokoGiri documentation:

XML is like violence - if it doesn’t solve your problems, you are not using enough of it.

The smile got even bigger when I tried this:

require 'nokogiri'
Nokogiri::HTML('<div>Some <span>Te<b>x</b>t</span>.').text == 'Some Text.' # true

The smile shrank a bit when I realized that I would get the same result if I replaced the inline "b" element with a "div". But that's ok, it was already good enough.

Other than the "text" method being more useful than the one used by XmlSlurper (new-lines are treated differently), navigating the XML tree is also much easier with NokoGiri. But I still couldn't find a good way of finding out if some node was a root one, as calling "root.parent" would raise an exception. Fortunately, as NokoGiri supports XPATH, I didn't need to do this manual traversing and this wasn't an issue to my specific needs.

But there was a remaining issue: it performed very badly compared to the Groovy version, about 4 times slower. Looking at my CPU usage statistics, it was obvious that it wasn't using all my CPU power like the Groovy version did. No matter how many threads I used with CRuby, no processor would go over 20% of its available capacity.

JRuby to the rescue

It is a shame that Java actually has a better API than Ruby for dealing with thread pools: the Executors framework. As I couldn't find something like it in the Ruby standard library, I tried a Ruby gem called Concur.

I didn't investigate whether the performance issues were caused by Concur's implementation or by CRuby itself, but I decided to give JRuby or Rubinius a try. As I already had JRuby available, I tried it first, and as the results were about the same as the Groovy version's, I didn't bother checking Rubinius.

With JRuby I could use the Java Executors framework just like in Groovy, and I could see all my 6 cores above 90% the whole time my 10 threads were working on importing the 3,000+ documents. Unfortunately my actual servers are much slower than my computer: the import took more than 4 hours on the staging server versus about an hour and a half on my machine. The CRuby version would probably take more than 4 hours on my computer, which means it could take almost a full day on the staging and production servers.
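For the curious, the JRuby version of the pool looked roughly like this (the documents collection and import_references are stand-ins for my actual code):

require 'java'
java_import java.util.concurrent.Executors
java_import java.util.concurrent.TimeUnit

pool = Executors.new_fixed_thread_pool(10)   # 10 worker threads
documents.each do |doc|                      # the 3,000+ documents to import
  pool.submit { import_references(doc) }     # my own import logic
end
pool.shutdown
pool.await_termination(24, TimeUnit::HOURS)  # wait for all imports to finish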

Conclusion

I should explain that I didn't try Ruby first because, by staying in Groovy, I could take advantage of my models already being mapped by the Grails application; I wouldn't have to deal with database set-up and could keep all my code in a single language. Of course, if I had known beforehand all the pain that coding this in Groovy would be, I would have done it in Ruby from the beginning. And the Ruby version handled some corner cases, including new-line processing, a bit better than my previous attempt in Groovy.

I'm very grateful to Aaron "tenderlove" Patterson and Charles Nutter for their awesome work on Ruby, NokoGiri and JRuby. Thanks to them I could get my work done very fast and in an elegant way, saving my week after the frustration with Groovy.

ruby-rails/2012_03_04_how_nokogiri_and_jruby_saved_my_week How NokoGiri and JRuby saved my week 2012-03-04T12:30:00+00:00 2012-03-04T12:30:00+00:00

This is just an article's title, not really a question with a right answer.

It is not always possible to both move forward and remain compatible with legacy code.

Usually, when a project starts there is no legacy code and every change is welcomed. Later on, when the project grows and its users' code bases get bigger, some people will start complaining about incompatible changes, because they'll have to spend time changing their own code base whenever they decide to upgrade to a newer version.

When this time comes, the project has to make a decision. Either it keeps moving forward, fixing badly designed APIs as better ways of doing things are found, or it accepts that an API change can be very painful for the framework/library users and keeps the bad API. Java definitely opted for the latter.

The Rails case

In the last weeks, I've been reading some articles complaining about Rails changing its API in incompatible ways too fast.

They're not alone; I've seen complaints about this from several other people. On the other side, I'm constantly refactoring my own code base and I appreciate Rails doing the same. In the case of libraries and frameworks, when we refactor code we sometimes conclude that some API would be better written differently, even if that breaks old software. And I'm also not alone in thinking this way.

Unfortunately, I couldn't find an employer to pay me to work with Rails, so I've been working as a Grails/Groovy/Java developer for the last 3 years. And that has really been a pain with regards to API, stability and user experience. I don't remember complaining about anything I really missed in Ruby or Rails since internationalization support was added to Rails in version 2.

The Groovy / Java case

This section has grown too fast, so I decided to split it in another article entitled How NokoGiri and JRuby saved my week.

You don't have to read the entire article if you're not curious enough, but the Groovy XML parser API was so badly designed and documented that I could finish the logic with Ruby and NokoGiri in about 2 hours (tests and setup included), while I had spent the entire week trying to do the same in Groovy.

And the Ruby version would take about the same time for the import to complete. With Groovy, I had to dig into the source code due to the lack of documentation and run lots of experiments to understand how things worked.

You can fix documentation issues without changing the API, but you can't fix the design issues in the Groovy parsers without changing their API. So, is it worth keeping the API just to remain backward-compatible, making XML parsing a pain to work with in Groovy?

Then what?

There is no single better approach when deciding between remaining backward compatible and moving forward. Each project will adopt some philosophy, and you need to know that philosophy before adopting the project.

If you prefer API stability over consistency and ease of use, you should choose something like Java, C++, Perl, PHP or Grails. You shouldn't really be considering Rails.

On the other hand, if you like to be on the edge, then Rails is exactly the way to go.

Which one to choose will basically depend on these questions:

  1. Do you have a good test coverage of your code base?
  2. Do you have to respond really fast to changes?
  3. Will your code hardly change after it is finished?

If you answered "yes" to 3, than you should consider a framework that will avoid very hard to break its API, since no one will constantly maintaining your application to keep up with all the framework upgrades with fixed security issues, for example.

On the other hand, if you answered "yes" to 1 and 2, using a fast-changing framework like Rails shouldn't be an issue. In my case, I don't write tests for my views, as they're usually very simple and don't contain logic. So, when Rails changed the rules about when to use "<%= ... %>" or "<% ... %>", I had to manually look at all of my views to fix them. And I had to do that twice between Rails 2 and Rails 3.1, because they changed this behavior back and forth, which is the kind of unnecessary change I'm talking about.
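For context, the classic example of that change between Rails 2 and Rails 3 is form_for (sketched from memory):

<!-- Rails 2.x: block helpers used <% ... %> -->
<% form_for @post do |f| %>
  <%= f.text_field :title %>
<% end %>

<!-- Rails 3.x: the same helper must be written with <%= ... %> -->
<%= form_for @post do |f| %>
  <%= f.text_field :title %>
<% end %>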

Another change I had to check manually, because I don't test my views, was ERB tag output being escaped by default. But that is a good change, and I'm pretty sure I had forgotten to manually escape some tags before the upgrade. So my application was probably safer against attacks afterwards; a good move even though it took me a while to finish the upgrade. There was no easy path for this change.

But other than that, it was just a matter of making the test suite pass after the upgrade. And if you value code refactoring as much as I do, you'll be writing tests for all code that could possibly break in a refactoring.

And this is a hard issue I have with Grails: I find it too time-demanding to write tests for Grails applications, and it was a real pain before Grails 2 was released. It is still not good, but I can already write most of my unit tests in Grails without much trouble.

So, I suggest answering the questions above before choosing a web framework. It is not right to pick a fast-moving framework because its API is better designed and then, later on, ask its maintainers to stop changing things because now you have a working application.

You should know how they work beforehand and accept it when you opt for one.

ruby-rails/2012_03_04_should_we_move_forward_or_remain_backward_compatible Should we move forward or remain backward compatible? 2012-03-04T12:30:00+00:00 2012-03-04T12:30:00+00:00

In 2009, I wrote an article for the Rails Magazine Issue #4 - The Future of Rails - where I presented an alternative to PDF generation from ODF templates, which can be generated using a regular text processor such as OpenOffice.org or Microsoft Office (after converting the document to ODF).

You can read the entire article by downloading the magazine for free or purchasing it. The application code illustrating this approach was published by the magazine on GitHub.

Unfortunately, I can't host a working system providing a live demonstration due to my Heroku account limitations, but it should be easy to follow the instructions in the article on your development or production environment.

Do not hesitate to send me any questions, through comments on this site or by e-mail if you prefer.

ruby-rails/2010_03_16_generating_pdf_with_odf_templates_in_rails Generating PDF with ODF templates in Rails 2010-03-16T21:00:00+00:00 2010-03-16T21:00:00+00:00