Bugsnag is a great error monitoring service that takes care of reporting, filtering and notifying about exceptions in several kinds of applications. I used to rely on my own error reporting tool in the app I currently maintain, but since I'm evaluating creating a new application, I started evaluating Bugsnag to save me some time. Then I stumbled upon an issue I didn't have to deal with when using my custom error reporting tool.
When reporting errors, it's a good idea to attach as much meaningful data as possible, as it can be quite helpful when trying to understand some errors, especially when they aren't easily reproducible. Such data includes user information which I'd prefer not to expose to the front-end, including the user id.
I was initially worried about exposing the API key to the front-end, which someone could use to report errors to my account, but then I figured out I was being too paranoid: proxying the request wouldn't prevent users from reporting errors to my account either, unless I implemented some sort of rate limiting or disabled error reporting for non-authenticated users (after all, I'd be able to track authenticated users acting that way and take some action against them).
However, hiding user data that is meant to be used only internally from the front-end is important to me. That's why I decided to take a few hours to proxy browser errors through the back-end. Here's how it was implemented using the official bugsnag-js npm package and the bugsnag Ruby gem.
In the JavaScript code, there's something like what is shown below. I used XMLHttpRequest rather than fetch in order to support IE11, since the polyfills are lazily loaded as required in our application and fetch may not be available when Bugsnag is initialized in the client:
import bugsnag from 'bugsnag-js';

const bugsnagClient = bugsnag({
  apiKey: '000000000000000000000000', // the actual api key will be inserted in the back-end
  beforeSend: report => {
    const original = report.toJSON(), event = {};
    let v;
    for (let k in original) if ((v = original[k]) !== undefined) event[k] = v;
    report.ignore();

    const csrf = (document.querySelector('meta[name=_csrf]') || {}).content;
    const xhr = new XMLHttpRequest();
    xhr.open('POST', '/errors/bugsnag-js/notify?_csrf=' + csrf);
    xhr.setRequestHeader('Content-type', 'application/json');
    xhr.send(JSON.stringify(event));
  }
});
The back-end is a Ruby application built on top of the Roda toolkit. It uses the multi_run plugin, splitting the main application into multiple apps (which can be seen as powerful controllers, if that helps in understanding how it works). Below is a minimal sketch of how that dispatching works, followed by the relevant parts of the back-end.
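Here's a minimal illustration of Roda's multi_run dispatching, so you can picture where the errors app fits (the main app class name is illustrative, not our actual code):

require 'roda'
require_relative 'apps/errors_app'

class MainApp < Roda
  plugin :multi_run

  # requests starting with /errors are dispatched to Apps::ErrorsApp,
  # so POST /errors/bugsnag-js/notify ends up in that app
  run 'errors', Apps::ErrorsApp

  route do |r|
    r.multi_run
  end
end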
lib/setup_bugsnag.rb:
# frozen-string-literal: true

require 'app_settings'
require_relative '../app_root'

if api_key = AppSettings.bugsnag_api_key
  require 'bugsnag'

  Bugsnag.configure do |config|
    config.api_key = api_key
    config.project_root = APP_ROOT
    config.delivery_method = :synchronous
    config.logger = AppSettings.loggers
  end
end
app/apps/errors_app.rb:
# frozen-string-literal: true

require 'json'
require_relative 'base_app'
require 'setup_bugsnag'

module Apps
  class ErrorsApp < BaseApp
    private

    def process(r)
      super
      r.post('bugsnag-js/notify'){ notify_bugsnag }
    end

    def notify_bugsnag
      api_key = settings.bugsnag_api_key
      head :ok unless api_key && settings.store_front_end_errors
      event = JSON.parse request.body.read
      user_data = auth_session.to_h
      user_data['id'] = user_data['profile_id']
      event['user'] = user_data
      event['apiKey'] = api_key
      event['appVersion'] = settings.app_version
      payload = { apiKey: api_key, notifier: {
        name: 'Bugsnag JavaScript', version: '4.3.0', url: 'https://github.com/bugsnag/bugsnag-js'
      }, events: [event] }
      configuration = Bugsnag.configuration
      options = {
        headers: {
          'Bugsnag-Api-Key' => api_key,
          'Bugsnag-Payload-Version' => event['payloadVersion'],
        }
      }
      Bugsnag::Delivery[configuration.delivery_method].
        deliver(configuration.endpoint, JSON.unparse(payload), configuration, options)

      'OK' # optional response body, could be empty as well, since we don't check the response
    end
  end
end
That's it: some extra code, but it allows me to send useful information to Bugsnag without having to expose it to the front-end application. Hopefully next time I need something like this it will help to have it written down here ;)
Seriously, this section is big and not important at all, so feel free to completely skip it right now if you're short on time or don't enjoy rants.
This is a rant explaining how ActiveRecord migrations pretty much defined my career over the past years.
I became curious about programming and computers when I was a kid. I remember reading a huge C++ book when I was about 10 years old. I had learned Clipper just a bit before, and I recall creating a Bingo game with Clipper, just because I wanted to play Bingo on those machines but couldn't :) While learning Clipper I also had my first experience learning SQL and client-server design. My dad subscribed me to a few computer courses around that time, such as "DOS/dBase III Plus", Clipper + SQL and, a few years later, Delphi + Advanced SQL. I learned C and C++ from books, and when services like Geocities were showing up and the Internet was reaching lots of homes, I also became interested in learning HTML to build my own sites, the new hotness of that time. Since I also wanted to serve dynamic content, I decided to learn Perl, since it was possible to find some free hosting services supporting it. That was the first interpreted language I learned and I was really fascinated by it back then.
For a long while I used Perl exclusively for server-side web programming, since it was the only option I could find in free hosting services. But while in Electrical Engineering college I barely did any web programming, and my programming tasks (extra classes) were mostly related to desktop programming (Delphi / C++), and to embedded and hard real-time systems using a mix of C and C++ during my master's thesis in Mobile Robotics. By that time I had a solid understanding of C and C++. Good times; I don't find myself proficient with them anymore these days. That was a time when I would read and know entire specs from the W3C, such as HTML 4.01 and CSS. Today it's simply unfeasible to completely follow all the related specs, and I'm glad we have competition in the browser market, since it's really hard to keep up with all the changes happening every day.
Once I finished my master's thesis and had to find a job, I looked mostly for programming jobs: I considered myself good at programming and there were lots of interesting opportunities out there, while it was really hard to find companies in Brazil working on electronic device development or Robotics. I never actually enjoyed the other parts of Electrical Engineering, such as machines, power or electrical installations; I only enjoyed micro-electronics and embedded device creation, and one should consider themselves very lucky to work in such an area in Brazil. I didn't want to count on luck, so I decided to focus on the programming career instead. I remember my first résumé was sent to Opera Software, my preferred browser at the time, to apply for a C++ developer position, but after tons of interviews they didn't call me, which is why I'm not living in Norway these days ;)
After working for 3 months on a new parking system using Delphi (despite my asking to use C++ instead) the contract ended. The product was already working in one of the malls in my city, and I had to look for another job. They actually extended the offer to keep working with them, but at the same time I found another opportunity, and this time I would have to get back to web programming. That was in 2007. Several years had passed, I couldn't really remember much Perl, and a lot had happened to web programming in the meantime that I hadn't followed.
After a few stressful days trying to learn about every major web programming framework (especially while trying to read about J2EE), I came to the conclusion that I would choose one of TurboGears, Django or Rails. I didn't know Java, Python or Ruby at that time, so the language didn't play an important role in choosing the framework. I was more interested in learning how the frameworks would make my life easier. At that time I had to maintain an existing ASP application, but at some point I would have to create a new application and could choose whatever I wanted, and I definitely didn't enjoy ASP.
Since that application had to be displayed in Portuguese, I was considering the Python frameworks more than the Ruby one, as Rails didn't support internationalization at the time (i18n support was added in Rails 2, if I recall correctly) and even supporting UTF-8 wasn't straightforward with Ruby 1.8. Iconv and $KCODE were things you'd often hear about in the Ruby community back then, and there were tons of posts dedicated to encoding in Ruby.
But there was one Rails feature that made me change my mind and choose Rails over TurboGears and Django, which were supposed to work well with encodings and had announced internationalization support: the approach used to evolve databases. From my previous experience that was clearly the right strategy, while I was pretty scared by the model-centered approaches used by TurboGears and Django to handle database evolution.
By that time I already had plenty of experience working with RDBMSs, especially Firebird, and with versioning the database and supporting multiple environments. That took me a lot of effort every time I started a new project, because I basically had to reimplement the ActiveRecord migrations features each time, and I knew that was very time consuming. So I was glad I wouldn't have to roll my own solution if I used Rails, as ActiveRecord migrations were clearly more than enough for my needs and worked pretty well. Despite the issues with encoding and the lack of internationalization support, I decided to pick Rails because of ActiveRecord migrations.
And even though I haven't used ActiveRecord for several years, I've still been using its migration tools since 2007, more recently through my wrapper around them called active_record_migrations.
While I don't appreciate ActiveRecord as an ORM solution, I like its migrations tooling very much, and it hasn't changed much since I used it with Rails 1. The most significant changes since then were support for time-stamped migrations, the reversible block and, finally, many years later, proper support for foreign keys (I struggled to add foreign keys using plain SQL for many years).
When I first read about Sequel I was fascinated by it. ActiveRecord wasn't built around Arel yet at that time, so all those lazy evaluations in Sequel were very appealing to me. But around 2009 I took another job opportunity, this time working with Grails and Java rather than Rails, so I missed many of the changes to Rails for a while. In 2011 I changed jobs again, still having to support a Grails application, but I was free to do whatever I liked with the project. Since there were quite a lot of Grails bugs that were never fixed and that I couldn't find work-arounds for, I decided to slowly migrate the Grails app to Rails. By that time Arel had been integrated into ActiveRecord, so it would finally support lazy evaluation as well, and I decided to try to stick with the Rails defaults. But a week later I realized that there were still many more reasons why Sequel was far superior to ActiveRecord, so I replaced ActiveRecord with Sequel and never looked back. Best decision ever.
See, I'm a database guy. I work with the database, not against it. I don't feel the need to abstract the database away because I'd prefer to use Ruby over SQL. I learned to appreciate not only SQL but several other powerful tools provided by good database vendors, such as triggers, CTEs, stored procedures, constraints, transactions, functions and foreign keys, and I definitely didn't want to avoid those database features. ActiveRecord seems to focus on hiding the database from the application, abstracting as much as possible so that you feel you're just working with objects. That's probably the main reason why I loved Sequel: Sequel embraced the database, it didn't fight it. It tries to make it as easy as possible to use whatever vendor-specific feature I want, without getting in my way. That's why I don't see Sequel as an ORM, but as a tool that allows me to write the SQL I want with a level of control and logic that would be pretty hard to achieve by building SQL queries through string concatenation and manual typecasting of params and result sets.
I always have a clear idea of the SQL generated by Sequel, and the code is way more readable than if I had written the SQL by hand myself.
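To give a rough idea of what I mean, here's a minimal sketch of Sequel's lazy dataset composition (the connection string, table and columns are made up for illustration):

require 'sequel'
require 'date'

DB = Sequel.connect('postgres://localhost/mydb')

paid = DB[:invoices].where(status: 'paid')          # no SQL executed yet
recent = paid.where{ created_at > Date.today - 30 } # still just composing
puts recent.sql                # shows the single SELECT that would be generated
recent.each { |row| p row }    # the query only runs when the dataset is iterated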
When I first learned about Sequel, Jeremy Evans was already its maintainer, but Sequel was originally created by Sharon Rosner. Recently I read this article, where this quote came to my attention:
I’m the original author of Sequel [1], an ORM for Ruby. Lately I’ve been finding that ORM’s actually get in the way of accomplishing stuff. I think there’s a case to be made for less abstraction in programming in general, and access to data stores is a major part of that.
For an in-production system I’ve been maintaining for the last 10 years, I’ve recently ripped out the ORM code, replacing it with raw SQL queries, and a bit of DRY glue code. Results: less code, better performing queries, and less dependencies.
- Sharon Rosner, Sequel original author
It's good that it's working well for him, but I find it weird that he would consider Sequel a traditional ORM. To me, Sequel allows me to write more maintainable queries, so I consider it more of a query builder than an ORM. If I had to build all the SQL by hand and typecast params and result sets myself, I think the result would be much worse, not better.
So, nowadays, I'm considering creating a brand new application after several years, and I'm frustrated by how long it takes to bootstrap a production-ready application with state-of-the-art features. I started working on a sample project to serve as a starting point. The idea is to support features such as: automated deployment, including blue-green (canary) strategies for zero downtime; Roda as the Ruby framework; Webpack to bundle static resources; a lightweight alternative to React, such as Dio.js or Inferno.js; multiple environments; flexible configurations; client-side routing; proper security measures (CSRF, CSP headers); a proper authentication system, such as Rodauth; proper image uploading (think of Shrine); distributed logging (think of fluentd) with proper details; reliable background jobs; server-side and client-side testing; lazy code loading for both the client side and the server side; autoreloading of Ruby code on the server side; analytics; APM; client-side performance tricks such as link preloading; performance tracking for both server-side and client-side code; error tracking for both server-side and client-side code, integrated with sourcemaps and notifications from monitoring services; CDN support; full-text search through ElasticSearch or Solr; caching storage such as Redis; Docker-based infrastructure; backup; high availability of databases; and many, many more features that are supposed to be found in production-ready applications. As you can see, it's really frustrating to create a new application from scratch these days, as it seems any new product could easily take a year to reach a solid production-ready level. And, of course, support for database migrations.
The last thing I want to worry about while working on this huge project is wasting time with a simple task such as managing the database state through migrations and related tools, especially as ActiveRecord migrations have provided that for so long and work pretty well. However, this time I really wanted to ditch the dependency on railties for the new project, and active_record_migrations relies on railties for simplicity, so that it can take advantage of the Rails generators and be a very simple wrapper around ActiveRecord migrations. But since AR itself won't be used in this project, I decided to spend several hours (about two full days) replicating the most important tools from ActiveRecord for Sequel. And this is how sequel_tools was born this week.
I find it interesting how such a little detail, Rails bundling proper database migrations tooling, influenced a lot of my career, since I only learned Ruby because of Rails in the first place and I only chose Rails because of ActiveRecord migrations :) If I had gone with Python I most likely wouldn't have learned Ruby, wouldn't work at my current job, and wouldn't have created many gems such as:
I've also been using Ruby for some other projects, such as cert-generator, a Rack application that can be launched from a Docker container and generates development-suited self-signed root CA and HTTPS certificates in a way supported by modern browsers. I've written about it in my previous article.
Nor would I have contributed to some Ruby projects such as Rails, orm_adapter-sequel, Redmine, Gitorious (now dead), Unicorn, RSpec-rails, RSpec, Capistrano, Sequel, js-routes, jbundler, database_cleaner, Devise, ChiliProject, RVM, rails-i18n, rb-readline and acl9. Most of them were minor contributions or documentation updates, but anyway… :)
Not to mention many bugs reported to MRI, JRuby and other Ruby projects that have been fixed since then. And, before I forget, some features have been added to Ruby after Matz approved some of my requests. For example, the soon to be released Ruby 2.5 is introducing ERB#result_with_hash (see issue #8631).
Or my request to remove the 'useless' 'concatenation' syntax, which was approved by Matz about 5 years ago, and I still hope someone will implement it at some point :)
I wonder what my current situation would be if ActiveRecord migrations hadn't been bundled with Rails in 2007 :) On the other hand, maybe I could have become rich working with Python? ;)
If you're a Sequel user, you probably spent a while searching for Rake integration around Sequel migrations and realized it took more time than you'd wished. I've been in the same situation, but it was so frustrating to me, because I wasn't able to find all the tasks I wanted to have at my disposal, that I'd often just give up on Sequel migrations and stick with ActiveRecord migrations. Not because I like the AR migrations DSL better (I don't, by the way), but because all the tooling is already there, ready to be used through some simple rake commands.
sequel_tools is my effort to come up with a de facto solution for integrating Sequel migrations and related tooling with Rake, and to see if the Sequel community could concentrate its efforts on building a solid foundation for Sequel migrations together. I hope others will sympathize and contribute to the goal, so that we won't have to waste time thinking about migrations again in the future when using Sequel.
Here are some of the supported actions, which can be easily integrated with Rake, but are implemented in such a way that other interfaces, such as command-line tools or Thor, should also be easy to build:
I decided not to support Integer-based migrations at this point, as I can't see any drawbacks of time-stamp based migrations that would be addressed by the Integer strategy, while there are many problems with the Integer strategy even when a single developer works on the project. I'm open to discussing this with anyone who thinks they could convince me that supporting Integer-based migrations would bring something to the table. It's just that it's more code to maintain and test, and I'm not willing to do that unless there is indeed some advantage over time-stamp based migrations.
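For reference, a time-stamp based Sequel migration is just a file whose name starts with a UTC timestamp; here's a minimal sketch (the table and columns are made up for illustration):

# db/migrations/20171201120000_create_posts.rb
Sequel.migration do
  change do
    create_table(:posts) do
      primary_key :id
      String :title, null: false
      DateTime :created_at
    end
  end
end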
The project also allows missing migration files, since I find that useful, especially when reviewing multiple branches with independent migrations.
I don't think it's a good idea to work with a Ruby format for storing the current schema, as a lot of things are specific to the database vendor. I never used the vendor-independent Ruby format in all those years, but if you think you'd value such a feature, in case you just use the basics when designing the tables and want your project to support multiple database vendors, then go ahead and either send a Pull Request to make it configurable or create an additional gem adding that feature, and I can link to it in the documentation.
I'd love to get some feedback on what the Sequel community thinks about it. I'd love for us to reach some consensus on what the de facto solution for managing Sequel migrations in a somewhat feature-complete fashion should be, and would love the community's help in making such a solution happen, in the best interest of us happy (and sometimes frustrated by the lack of proper tooling around migrations, but no more) Sequel users ;)
Please take a look at what the code looks like; I hope you find it easy to extend to your own needs. Any suggestions and feedback are very welcome, especially now that the project is new and we can change a lot before it gets a stable API.
May I count on your help? ;)
Note: if you only care about getting the certificates, jump to the end of the article and you’ll find a button to just do that. This way you don’t even need Linux to generate them.
For a long time I've been testing my application locally using a certificate issued by Let's Encrypt, which I must renew every few months, for domains such as dev.mydomain.com. Recently I've been considering creating a new app, and I don't have a domain for it yet.
So I decided to take some time to learn how to create self-signed certificates in such a way that browsers such as Chrome and Firefox would accept them without any warning and with no extra steps.
It took me about 2 hours to achieve this, so I decided to write it down to save me time in the future when I need to repeat the process.
I’ll use the myapp.example.com domain for my new app, since the example.com domain is reserved.
The first step is to add that domain to /etc/hosts:
127.0.0.1 localhost myapp.example.com
Recent browsers will require the Subject Alternative Name (SAN) extension, so the script will generate that extension using a template like this:
[SAN]
subjectAltName = @alternate_names

[ alternate_names ]

DNS.1 = myapp.example.com
IP.1 = 127.0.0.1
IP.2 = 192.168.0.10
Replace the second IP with your own fixed IP, if you have one, in case you need to access the site from another computer on the network (some VM, for example). Edit the script below to change the template. For that to work, you'll also need to add the root CA certificate we'll generate to those other computers, as I'll explain in the last steps of this article. Just remove IP.2 if you don't care about it.
Then create this script, which helps generate the certificates, in ~/.ssl/generate-certificates:
#!/bin/bash

FQDN=${1:-myapp.example.com}

# Create our very own Root Certificate Authority

[ -f my-root-ca.key.pem ] || \
  openssl genrsa -out my-root-ca.key.pem 2048

# Self-sign our Root Certificate Authority

[ -f my-root-ca.crt.pem ] || \
  openssl req -x509 -new -nodes -key my-root-ca.key.pem -days 9131 \
    -out my-root-ca.crt.pem \
    -subj "/C=US/ST=Utah/L=Provo/O=ACME Signing Authority Inc/CN=example.net"

# Create Certificate for this domain

[ -f ${FQDN}.privkey.pem ] || \
  openssl genrsa -out ${FQDN}.privkey.pem 2048

# Create the extfile including the SAN extension

cat > extfile <<EOF
[SAN]
subjectAltName = @alternate_names

[ alternate_names ]

DNS.1 = ${FQDN}
IP.1 = 127.0.0.1
IP.2 = 192.168.0.10
EOF

# Create the CSR

[ -f ${FQDN}.csr.pem ] || \
  openssl req -new -key ${FQDN}.privkey.pem -out ${FQDN}.csr.pem \
    -subj "/C=US/ST=Utah/L=Provo/O=ACME Service/CN=${FQDN}" \
    -reqexts SAN -extensions SAN \
    -config <(cat /etc/ssl/openssl.cnf extfile)

# Sign the request from Server with your Root CA

[ -f ${FQDN}.cert.pem ] || \
  openssl x509 -req -in ${FQDN}.csr.pem \
    -CA my-root-ca.crt.pem \
    -CAkey my-root-ca.key.pem \
    -CAcreateserial \
    -out ${FQDN}.cert.pem \
    -days 9131 \
    -extensions SAN \
    -extfile extfile

# Update this machine to accept our own root CA as a valid one:

sudo cp my-root-ca.crt.pem /usr/local/share/ca-certificates/my-root-ca.crt
sudo update-ca-certificates

# Note: \$uri is escaped below so the nginx variable is printed literally
# instead of being expanded (to nothing) by the unquoted heredoc.
cat <<EOF
Here's a sample nginx config file:

server {
  listen 80;
  listen 443 ssl;

  ssl_certificate ${PWD}/${FQDN}.cert.pem;
  ssl_certificate_key ${PWD}/${FQDN}.privkey.pem;

  root /var/www/html;

  index index.html index.htm index.nginx-debian.html;

  server_name ${FQDN};

  location / {
    # First attempt to serve request as file, then
    # as directory, then fall back to displaying a 404.
    try_files \$uri \$uri/ =404;
  }
}
EOF

grep -q ${FQDN} /etc/hosts || echo "Remember to add ${FQDN} to /etc/hosts"
Then run it:
cd ~/.ssl
chmod +x generate-certificates
./generate-certificates # will generate the certificates for myapp.example.com

# to generate certificates for another app:
./generate-certificates otherapp.example.com
The script will output a sample nginx config file demonstrating how to use the certificate, and will remind you to add the entry to /etc/hosts if it detects the domain isn't present yet.
That's it. Even curl should work out of the box, just like browsers such as Chrome and Firefox:
curl -I https://myapp.example.com
If you need to install the root certificate on other computers on the network (or VMs), it's located in ~/.ssl/my-root-ca.crt.pem. If the other computers are running Linux:
# The .crt extension is important
sudo cp my-root-ca.crt.pem /usr/local/share/ca-certificates/my-root-ca.crt
sudo update-ca-certificates
I didn't research how to install them on other operating systems, so please let me know in the comments if you know how, and I'll update the article with the instructions for setting up VM guests of other operating systems.
I've also created a Docker container with a simple Ruby Rack application to generate those certs. The code is simple and is available on GitHub.
It’s also published to Docker Hub.
You can give it a try here:
I hope you'll find it as useful as I do ;)
The Ruby ecosystem is famous for providing convenient ways of doing things, and very often security concerns are traded for more convenience. That makes me feel out of place, because I'm always struggling to change the defaults: I'm not interested in trading security for convenience when I have to make a choice.
Since it's Friday the 13th, let's talk a bit about my fears ;)
I remember that several of the security issues disclosed in the Ruby community in the past few years only existed in the first place because of this idea that we should deliver features in the most convenient way. Like allowing YAML to dump/load arbitrary Ruby objects, for example, when people were used to using it for serialization. Thankfully it seems JSON is more popular these days, even if it's more limited: you can't serialize times or dates, for example, as YAML allows.
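As a minimal illustration of why loading YAML from untrusted input is dangerous compared to a restricted parser (results noted in the comments):

require 'yaml'

# YAML.load will happily build Ruby objects (symbols, and with !ruby/object
# tags even instances of arbitrary classes) out of attacker-controlled input:
YAML.load("--- :admin")       # => :admin

# YAML.safe_load only allows basic types unless you explicitly allow more:
YAML.safe_load("--- 42")      # => 42
YAML.safe_load("--- :admin")  # raises Psych::DisallowedClass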
Here are some of the episodes I can remember where convenience was the reason behind vulnerabilities:
- render conveniently accepting multiple argument formats: 1, 2

I remember that for a long while I was used to always explicitly converting params to the expected format, like params[:name].to_s, and that alone was enough to protect my application from many of the disclosed vulnerabilities. But my application was still vulnerable to the first one mentioned in the list above, and the worst part is that we never ever used XML or YAML in our controllers, yet we were affected by that bug in the name of convenience (for others, not for us).
Any other web framework providing seamless params binding based on how the param keys are formatted is vulnerable for the same reasons, and most (all?) people doing web development with Ruby these days rely on Rack::Request somehow. It will automatically convert your params to arrays if they are formatted like ?a[]=1&a[]=2, or to hashes if they are formatted like ?a[x]=1&a[y]=2. This is built-in and you can't change the behavior just for your own application. I mean, you could replace Rack::Utils.default_query_parser and implement parse_nested_query as parse_query in your own custom parser, but then that would apply to other Rack apps mounted in your app (think of Sidekiq's web UI, for example) and you don't know whether or not they're relying on such conveniences.
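To make the binding rules concrete, this is what Rack's nested query parser does with differently formatted keys (results noted in the comments):

require 'rack'

Rack::Utils.parse_nested_query('a=1')           # => {"a" => "1"}
Rack::Utils.parse_nested_query('a[]=1&a[]=2')   # => {"a" => ["1", "2"]}
Rack::Utils.parse_nested_query('a[x]=1&a[y]=2') # => {"a" => {"x" => "1", "y" => "2"}}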
I've long been bothered by the inconvenience of having to add .to_s to all string params (in the name of providing more convenience, which is ironic anyway), and for years I wanted a more convenient way of accessing params safely. As you can see, what is convenient to some can be inconvenient to others. But fixing this would require a manual inspection of all controllers to review every place where a param is fetched from the request. I wasn't bothered enough, so I thought it wouldn't be worth the effort for such a big app.
Recently I noticed Rack deprecated Rack::Request#[], which I used a lot, not only because calling request['name'] was more convenient than request.params['name'], but also because most examples in Roda's README used that convenient #[] method (the examples were updated after it was deprecated). Since eventually I'd have to fix all usages of that method, and since they were spread all over our Roda apps (think of controllers: we use the multi_run plugin), I decided to finally take a step further and fix the old problem as well.
Since I realized it wouldn't be possible to make Rack parse queries in a simpler way, I decided to build a solution that wraps the params parsed by Rack. For a Roda app like ours, writing a Roda plugin makes perfect sense, so this is what I did:
# apps/plugins/safe_request_params.rb
require 'roda'
require 'rack/request'
require 'json'

module AppPlugins
  module SafeRequestParams
    class Params
      attr_reader :files, :arrays, :hashes

      def initialize(env: nil, request: nil)
        request ||= Rack::Request.new(env)
        @params = {}
        @files = {}
        @arrays = {}
        @hashes = {}
        request.params.each do |name, value|
          case value
          when String then @params[name] = value
          when Array then @arrays[name] = value
          when Hash
            if value.key? :tempfile
              @files[name] = UploadedFile.new value
            else
              @hashes[name] = value
            end
          end # ignore if none of the above
        end
      end

      # a hash representing all string values and their names;
      # optionally pass the keys you're interested in as an array
      def to_h(keys = nil)
        return @params unless keys
        keys.each_with_object({}) do |k, r|
          k = to_s k
          next unless key? k
          r[k] = self[k]
        end
      end

      # has a string value for that key name?
      def key?(name)
        @params.key?(to_s name)
      end

      def file?(name)
        @files.key?(to_s name)
      end

      # WARNING: be extra careful to verify the array is in the expected format
      def array(name)
        @arrays[to_s name]
      end

      # has an array value with that key name?
      def array?(name)
        @arrays.key?(to_s name)
      end

      # WARNING: be extra careful to verify the hash is in the expected format
      def hash_value(name)
        @hashes[to_s name]
      end

      # has a hash value with that key name?
      def hash?(name)
        @hashes.key?(to_s name)
      end

      # returns either a string or nil
      def [](name, nil_if_empty: true, strip: true)
        value = @params[to_s name]
        value = value&.strip if strip
        return value unless nil_if_empty
        value&.empty? ? nil : value
      end

      def file(name)
        @files[to_s name]
      end

      # raises if it can't convert with Integer(value, 10)
      def int(name, nil_if_empty: true, strip: true)
        return nil unless value = self[name, nil_if_empty: nil_if_empty, strip: strip]
        to_int value
      end

      # converts a comma separated list of numbers to an array of Integer;
      # raises if it can't convert with Integer(value, 10)
      def intlist(name, nil_if_empty: true, strip: nil)
        return nil unless value = self[name, nil_if_empty: nil_if_empty, strip: strip]
        value.split(',').map{|v| to_int v }
      end

      # converts an array of strings to an array of Integer. The query string is formatted like:
      # ids[]=1&ids[]=2&...
      def intarray(name)
        return nil unless value = array(name)
        value.map{|v| to_int v }
      end

      # WARNING: be extra careful to verify the parsed JSON is in the expected format;
      # raises if the JSON is invalid
      def json(name, nil_if_empty: true)
        return nil unless value = self[name, nil_if_empty: nil_if_empty]
        JSON.parse value
      end

      private

      def to_s(name)
        Symbol === name ? name.to_s : name
      end

      def to_int(value)
        Integer(value, 10)
      end

      class UploadedFile
        ATTRS = [ :tempfile, :filename, :name, :type, :head ]
        attr_reader(*ATTRS)

        def initialize(file)
          @file = file
          @tempfile, @filename, @name, @type, @head = file.values_at(*ATTRS)
        end

        def to_h
          @file
        end
      end
    end

    module InstanceMethods
      def params
        env['app.params'] ||= Params.new(request: request)
      end
    end
  end
end

Roda::RodaPlugins.register_plugin :app_safe_request_params, AppPlugins::SafeRequestParams
Here’s how it’s used in apps (controllers):
require_relative 'base'

module Apps
  class MyApp < Base
    def process(r) # r is an alias to self.request
      r.post('save'){ save }
    end

    private

    def save
      assert params[:name] === params['name']
      # Suppose a file is passed as the "file_param"
      assert params['file_param'].nil?
      refute params.file('file_param').tempfile.nil?
      p params.files.values.map(&:filename)
      p params.json(:json_param)['name']
      p [ params.int(:age), params.intlist(:ids) ]
      assert params['age'] == '36'
      assert params.int(:age) == 36

      # we don't currently use this in our application, but in case we wanted to take advantage
      # of the convenient query parsing that will automatically convert params to hashes or arrays:
      children = params.array 'children'
      assert params['children'].nil?
      user = params.hash_value :user
      name = user['name'].to_s

      # some convenient behavior we appreciate in our application:
      assert request.params['child_name'] == ' '
      assert params['child_name'].nil? # we call strip on the values and convert to nil if empty
    end
  end
end
An idea for those wanting to extend the safety of the Params class above to the unsafe methods (json, array, hash_value): one could implement them in such a way that any hashes would be wrapped in a Params instance. However, more specialized solutions, such as dry-validation or surrealist, should probably be considered in those cases.
In web frameworks developed in statically typed languages this isn't usually a common source of vulnerabilities, because it's harder to implement solutions like the one adopted by Rack: one would have to use some generic type such as Object for mapping param keys to their values, which is usually avoided in typed languages. Also, method signatures are often more explicit, which prevents a specially crafted param from being interpreted as a different type than the method expects.
That's one of the reasons I like the idea of introducing optional typing to Ruby, as I once proposed. I do like the flexibility of Ruby, and that's one of the reasons I often preferred scripting languages over static ones for general purpose programming (I used to do Perl programming in my early days of developing for the web).
But if Ruby were flexible enough to also allow me to specify optional typing, like Groovy does, it would be even better in my opinion. Until then, even though I'm not a security expert by any means, I feel like the recent changes to how our app fetches params from the request should significantly reduce the possibility of introducing bugs caused by params injection in general.
After all, security is already quite a complex topic to me, and I don't want to have to think about the impact of doing something like MyModel.where(username: params['username']) and wonder what could possibly go wrong if someone injected some special array or hash into the username param. Security is already hard to get right. No need to make it even harder by providing automatic params binding through the same method out of the box in the name of convenience.
WARNING: skip the TLDR section if you like some drama.
TLDR: PostgreSQL doesn't reclaim space when dropping a column, and dropped columns still count toward the column limit. If you use a script that adds temporary columns and run it many times, at some point it will hit the 1600-columns-per-table limit.
It was a Friday afternoon (it's always on a Friday, right?) and we were close to starting a long-awaited migration process. After several tests everything seemed to be working just fine, until someone told me they were no longer able to continue testing, as the servers wouldn't allow them to port deals anymore. After a quick inspection of the logs I noticed a message saying we had reached the 1600 columns per table limit in PostgreSQL.
If you never got into this situation (and haven't read the TLDR) you might be wondering: "how the hell would someone get 1600 columns in a single table?!" Right? I was just as impressed, although I already suspected what could be happening, since I knew the script would create temporary columns to store the previous reference ids when inserting new records, even though they were dropped by the end of the transaction.
If that never happened to you, you might think I was the first to face this issue, but you'd be wrong. Do a quick web search for the 1600 columns limit and you'll find many more cases of people unexpectedly reaching it without actually having that many columns in the table. I wasn't the first and won't be the last to face this issue, but, luckily for you who are reading this article, you won't be the next person to reach that limit ;)
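By the way, if you suspect a table is carrying dropped columns around, you can check the catalog directly: dropped columns remain registered in pg_attribute (marked attisdropped) and keep counting toward the limit. For example:

select attname, attisdropped
from pg_attribute
where attrelid = 'doc_refs'::regclass and attnum > 0;
-- dropped columns show up with placeholder names like "........pg.dropped.5........"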
Yes, now I agree it’s not a good idea after all, but let me try to explain why I did it in the first place.
In case you're not aware, you can only reference columns of the table being inserted into in the "returning" clause of an "insert into ... select ... returning" statement. But I wanted to keep a mapping between the newly inserted ids and the previous ones, coming from the "select" part of the statement. So my first idea was to simply add a temporary "previous_id" column to the table and use it to store the old id, so that I could map them.
Let me give a concrete example, with tables and queries, so that it gets clearer for those of you who might be confused by the explanation above. We have documents, each of which can have many references associated with it, and each reference can have multiple citations. The actual model is as complicated as it is irrelevant to the problem, so let me simplify it to make my point.
Suppose we want to duplicate a document and its references and citations. We could have the following tables:
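Something along these lines; this is a simplified sketch reconstructed from the queries below, not our actual schema:

create table documents (
  id serial primary key
  -- other columns omitted
);

create table categories (
  id serial primary key
);

create table doc_refs (
  id serial primary key,
  doc_id integer not null references documents(id),
  category_id integer not null references categories(id)
);

create table citations (
  id serial primary key,
  ref_id integer not null references doc_refs(id)
);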
In my first implementation the strategy was to add a temporary previous_id column to doc_refs, and then the script would do something like:
insert into doc_refs(previous_id, doc_id, category_id)
  select id, doc_id, 30 from doc_refs where category_id = 20;
This way it would be possible to know the mapping between the original and the copied references, so that the script could duplicate the citations using that mapping.
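For contrast, here's a sketch of why the returning clause alone isn't enough: it can only reference columns of the inserted table, so nothing ties each returned id back to the source row it was copied from.

insert into doc_refs(doc_id, category_id)
  select doc_id, 30 from doc_refs where category_id = 20
  returning id; -- the new ids come back, but with no link to the source rows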
This script would have to run thousands of times to port all the deals. So, once I learned about the columns limit and how dropping a column doesn't really reclaim space in PostgreSQL, I needed another strategy to get the mapping without resorting to a temporary column. I'd also have to figure out how to reclaim that space at some point, in case I ever needed to add some column for good in the future, but I'll discuss that part in another section below.
In case you reached those limits for the same reason as me, I'll tell you how I modified the script to use a temporary mapping table instead of a temporary column. Our tables use a serial (integer with a generator) id column. The process is just a little bit more complicated than using the temporary column:
create temp table refs_mapping as
  select id, nextval('doc_refs_id_seq') as new_id
  from doc_refs where category_id = 20;
With that table it's just a matter of inserting the records using the mapping between the old and the new ids. Not that hard after all, and the solution is free from the columns limit issue :)
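The follow-up inserts would look something like this (a sketch, assuming the new_id alias above and the simplified schema from earlier):

-- duplicate the references, forcing the pre-allocated ids:
insert into doc_refs (id, doc_id, category_id)
  select m.new_id, r.doc_id, 30
  from doc_refs r
  join refs_mapping m on m.id = r.id
  where r.category_id = 20;

-- duplicate the citations, pointing them at the new references:
insert into citations (ref_id)
  select m.new_id
  from citations c
  join refs_mapping m on m.id = c.ref_id;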
Once the script to port deals was fixed and running, I decided to take some action to reclaim the space used by the dropped columns, so that I could create new columns in that table later if I had to.
Searching the web, some sources suggested that a full vacuum freeze would take care of rewriting the table, which would then reclaim the space. It didn't work in my tests. It seems the easiest way would be to create a dump and restore it into a new database, but in our case that would mean some downtime, which I wanted to avoid. Maybe it would be possible to use that strategy with some master-slave replication setup with no downtime, but I decided to try another strategy, which was simpler in our case.
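For reference, this is the sort of command I tried; it rewrites the table data, but the dropped-column entries stay in pg_attribute, so they keep counting toward the 1600 limit:

vacuum full freeze doc_refs;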
Our clients only need read access to those tables, while the data input is done by an internal team, which makes it much easier for us to manage downtime if needed.
So I decided to lock the table against writes while the script recreated the table, and then replace the old one with the new one. It took only a handful of seconds to complete the operation (the table had about 3 million records). The script looked something like this:
begin;
lock doc_refs in exclusive mode;
lock citations in exclusive mode;
create table new_refs (
  id integer not null primary key default nextval('doc_refs_id_seq'),
  doc_id integer not null references documents(id),
  category_id integer not null references categories(id) on delete cascade
);
create index on new_refs(doc_id, category_id);
create index on new_refs(category_id);

insert into new_refs select * from doc_refs;

alter table citations drop constraint fk_citations_reference;
alter table doc_refs rename to old_refs;
alter table new_refs rename to doc_refs;
alter table citations add constraint fk_citations_reference
  foreign key (ref_id) references doc_refs(id) on delete cascade;
alter sequence doc_refs_id_seq owned by doc_refs.id;
commit;

-- clean-up after that:

drop table old_refs;
Fortunately that table was only referenced by one other table, so it wasn't as complicated as it would have been with some other tables in our database. With a simple script like that we were able to rewrite the table with no read downtime: write access was locked for about 20 or 30 seconds only, while read access wasn't affected at all. I hope this can be a useful trick in case you found this article because you got yourself into a similar situation :)
If you have other suggestions on how to handle the mentioned issues, I'd love to hear from you. I'm always curious about possible solutions; after all, who knows when I'll next have to think outside the box? ;) Please let me know in the comments below. Thanks :)
In my previous article, I had a hard time trying to explain why I wanted to replace Rails with something else in the first place. This article is my attempt to write more specifically about what I dislike in Rails for the purposes of the single page application we maintain.
In summary, in the previous article I explained that I prefer to work with more focused and independent libraries, while Rails prefers a somewhat integrated and highly coupled solution, which is a fine approach too. There are trade-offs involved with either approach and I won't get into the details in this article. As I said previously, this is mostly about a developer's personal taste and mindset, so by no means did I ever want to bash Rails. Quite the opposite: Rails served me pretty well for a long time and I could have lived with it for many more years, so getting it out of our stack wasn't an urgent matter by any means.
For the purpose of this article, I won't discuss the good and bad parts of Ruby itself, since it was mainly written to explain why I chose another Ruby framework instead of Rails.
In case you didn’t read the previous article, the kind of application I work with is a single page application, so keep this in mind when trying to understand my motivations for replacing Rails.
So, here are some features provided by Rails which I wasn't using when I made the decision to remove Rails from our stack:
So, for a long while I had been wondering how exactly Rails was helping us build and maintain our application. The application was already very decoupled from Rails, and its code didn't rely on ActiveSupport core extensions either. We tried to keep our controllers thin, although there's still quite some work to do before we get there.
On the other hand, there were a few times I had trouble debugging some weird problems after upgrading Rails. It was a nightmare when I had to dig into Rails' source code, and I wasted a lot of time in the process, so I did have a compelling reason not to stick with Rails. There were other parts of Rails I disliked, which I describe in the next section.
- requiring all of action_view even if you only need action_view/helpers/number_helper, for example;
- needing param[:text].to_s because I didn't want to get a hash or an array when accessing some param (because they were injected by some malicious request taking advantage of Rails' automatic params binding rules);

Rails is still great as an entry framework for beginners (and some experts as well). Here are the good parts:
- the web-console gem bundled by default in development mode;

So, Rails is not only a framework but also a set of good practices (along with a set of questionable practices that will vary according to each person's taste) bundled together. It's not the only solution trying to provide solid ground for web developers, though. Another solution with similar goals seems to be Hanami, for example, although Rails seems more mature to me. For instance, I find code reloading to be a fundamental part of developing web applications, and Hanami doesn't seem to provide a very solid solution that works across different Ruby implementations such as JRuby, according to these docs.
But overall, I still find Rails to be one of the best available frameworks for developing web applications. It's just that, for my personal taste and mindset, I'm more aligned with something like Roda than with something like Rails. One should understand the motivations behind one's decisions in order to figure out which solution works best for one's own taste, rather than expecting some article to tell you what the Right Solution™ is.
Feel free to skip to the next section if you don’t care about it.
I recently finished moving a 5-year-old Rails application to a custom stack on top of Roda, by Jeremy Evans, also the maintainer of the awesome Sequel ORM. The application is actually older than that, and I've been working on it for 6 years. It used to be a Grails application, moved from SVN to Git about 7 years ago, but I never had access to the SVN repository, so I don't really know how old the application is. It was completely migrated from Grails to Rails in 2013. And these days I replaced Rails with Roda, but this time it was painless and only took a few weeks.
I have some experience with replacing the technology of an existing application without interrupting the regular development flow and deployment procedures. The only times I really had to interrupt the services for a little while were the day I replaced MySQL with PostgreSQL and the day I moved the servers from colocation to Google Cloud Platform.
I may write about the steps I usually follow when changing the stack (I replaced Sprockets with Webpack a few years ago, and Devise with a custom solution, among many examples) in another article. But the reason I'm describing this scenario here is only so that you have some rough idea of this project's size, especially if you consider it had 0 tests when I joined the company as the sole developer and had to understand a messy Grails application with tons of JS embedded in GSP pages, with functions comprising hundreds of lines and many, many logical branches inside. Years later, there are still tons of tests lacking, especially in the front-end code, and much more to improve.

To give you a better idea: we currently have about 5k lines of Ruby test code and 20k lines of other custom (not generated) Ruby code, plus 5k lines of database migrations code. Besides that, we have about 11k lines of CoffeeScript code, 6k lines of JS code and 2.5k lines of CoffeeScript test code. I'm not including any external libraries in those stats. You have probably noticed already how poor the test coverage currently is, especially in the front-end. At this point I expect you to have some rough idea of this project's size. It's not a small project.
Understanding this section is definitely the answer to why I feel alone in the Ruby community.
Again, feel free to skip this subsection.
When I was working on my Master's thesis (Robotics, Electrical Engineering) I stopped doing web development for a while and focused on embedded C programming, C++ hard real-time systems and the like. After I finished the thesis, my first job was back to Delphi programming. Only in 2007 did I move back to web development, several years later, with only Perl experience up to that point. After a lot of research I decided on Rails and Ruby, although I also seriously considered TurboGears and Django at the time, both using the Python language. I wasn't worried about the language, as I didn't know either Ruby or Python and they seemed similar to each other. Ultimately I chose Rails because of how it handled database migrations.
In 2007, when looking at the alternatives, Rails was very appealing. There were conventions that would save me a lot of work when getting back into web development, generators to help me get started, great documentation, a bundled database migrations framework I wouldn't have to recreate myself, simple to understand error stack traces, good defaults for the production environment (such as proper 500 and 404 pages), great auto-reloading of code in the development environment, great logging, awesome testing tools integrated with the generators, quick boot, custom routes, convention over configuration and so on.
Last but not least, a very rich ecosystem with smart people working on great gems and learning Ruby together, all amazed by its meta-programming capabilities, the possibility of changing core classes through monkey patches and so on. And since it's possible, we should use it everywhere we can, right? Domain-specific languages (DSLs) were used by all popular gems at that time. And there wasn't much fragmentation like in the Java community: basically almost everyone writing web applications in Ruby was writing Rails apps and following its conventions. That allowed the community to grow fast, with several Rails plugins and projects assuming the application was running Rails. Most of us only got to know Ruby because of Rails, including myself. This alone is enough reason to thank DHH. Rails definitely raised the bar for other web frameworks.
As the ecosystem matured, we saw the rise of Rack and more people using what they called micro-frameworks, such as the popular Sinatra and Merb, among others. Rails improved internationalization support in version 2, merged with Merb in version 3, got Sprockets in version 3.1 and so on. The asset pipeline was really a thing when it was introduced in Rails. It was probably the last really big change introduced by Rails that genuinely inspired the general web development scene.
In the meantime Ruby also evolved a lot, providing better Unicode support, adding a new Hash syntax, garbage collecting symbols, improving performance and getting great new tools such as Bundler. RubyGems got a better API, the Rails guides got much better, and there's superb documentation on securing web applications that is accessible to any web developer, not only Rails ones. We have also seen lots of books and courses teaching the Rails way, as well as many dedicated blogs, videos, conferences and so on. I don't remember watching such fast growth in any other community until JavaScript got a lot of traction recently, motivated not only by single page applications becoming more and more common, but also by the creation of Node.js.
Many more languages have been created or re-discovered recently, including Go, Elixir, Haskell, Scala, Rust and many more. But to this day, despite the existence of symbols, a poor threading model in MRI and the lack of proper support for threaded applications in the stdlib, Ruby is still my preferred general purpose language. That includes web applications. What about Rails?
If you guessed performance was the reason, you guessed wrong. For some reason I don't quite understand, developers seem to be obsessed with performance even in scenarios where it doesn't matter. I never faced server-side performance issues with Rails. According to New Relic, most requests were served in less than 20ms on the server side. Even if we could cut those 20ms it wouldn't make any difference at all. So, what's wrong after all?
There's nothing wrong with Rails in a fundamental way. It's a matter of taste in my case, I guess, because it's really hard to find an objective way to explain why I wasn't fully satisfied with Rails. You should probably understand that this article is not about bashing Rails in any way. It's a personal point of view on why I feel like a stranger and why that's not a great feeling. [Update: after writing this article, I spent some time trying to list the parts I dislike in Rails and wrote a dedicated article about it, which you can read here if you're curious]
To help you understand where I come from: I have never followed the "Rails Way", if there is such a thing. I used jQuery when Prototype was the default library, I used RSpec when test/unit was the default, I used factories when Rails taught fixtures, I used Sequel rather than the bundled ActiveRecord, but instead of Sequel's migrations I used ActiveRecord's migrations through the active_record_migrations gem. Some years ago I replaced Sprockets with Webpack (which Rails fortunately embraced in the 5.1 release, though I was no longer using Rails when it came out). After some frustration trying to get Devise to work well with Sequel, I decided to replace Devise with a custom solution (previously I had to customize Devise a lot to make it support our non-traditional integration for dealing with sign-ins, and the custom password hashing inherited from the time the app was written in Grails).
Since we’re talking about a single-page application, almost all of the requests were JSON ones. We didn’t embrace REST or respond_to, we had very few server-side views, and we often had to dig into Rails or Devise source code to try to understand why something wasn’t working as we expected. That included several problems we had with streamed responses after each major Rails upgrade (which Rails calls Live Streaming for some reason I don’t quite follow, although I suspect it’s because they had introduced some optimizations to start sending the view’s header sooner and called that streaming support, so they needed another name when they introduced ActionController::Live). I used to spend a lot of time trying to understand Rails internals whenever I had to debug such problems. It was pretty confusing to me. The same happened with Devise.
At some point I started to ask myself what Rails was bringing to the table. And it got worse. When I first met Rails it booted in no time. It got slower to boot with each new release, and then they introduced complex solutions such as spring to try to fix this slowness. For a long time they used (and still use to this day) Ruby’s autoload feature to lazily evaluate code as it’s needed in order to decrease the boot time. Matz doesn’t like autoload and I don’t like it either, but this article is already long enough to discuss this subject too.
Something I never particularly enjoyed in Rails was all that magic related to auto-loading. I always preferred explicit and simple code over sophisticated code that auto-wires things. As you can guess, even though I loved how quickly Rails booted and how auto-reloading just worked with Rails (except when it didn’t - more on that later), I really wanted to specify all my dependencies explicitly in each file. But I couldn’t just use require, or auto-reloading would stop working. I had to use ActiveSupport’s require_dependency, and I hated it because it wasn’t regular Ruby code. I also didn’t like the fact that Rails forced on everyone all the monkey patches to Ruby core classes made by the ActiveSupport extensions, introducing methods such as blank?, present?, presence, try, starts_with?, ends_with? and so on. That’s related to the fact that I enjoy explicit dependencies, as I think it’s much easier to follow code with explicit dependencies.
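To illustrate the kind of core-class patching I’m talking about, here’s how a few of those extensions behave (these particular ones can also be required piecemeal through the active_support/core_ext files):

require 'active_support/core_ext/object/blank'

# All of these are methods patched into Object, String, NilClass and friends:
''.blank?     # => true
'  '.blank?   # => true
nil.blank?    # => true
'x'.present?  # => true  (the opposite of blank?)
''.presence   # => nil   (returns the receiver when present?, nil otherwise)
'x'.presence  # => "x"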
So, one of my main motivations for getting rid of Rails was getting rid of ActiveSupport, since Rails depends on ActiveSupport, including its monkey patches and auto-loading implementation. Replacing Rails with Roda alone didn’t allow me to get rid of ActiveSupport just yet, as I’ll explain later in this article, but it was an important first move. What followed was a similar kind of frustration with the Ruby community, in the sense that very popular Ruby gems are written with about the same mentality as those from the Rails core. Such gems include the very popular mail gem as well as FactoryGirl, for example. Even Sidekiq patches Ruby core classes. I’ll talk more about this later, but let me introduce Roda first.
[Update: after writing this article both the mail and sidekiq gems have worked to remove their monkey patches; I’d like to congratulate them on the effort and say “Thank you so much!”]
From time to time I considered replacing Rails with something else but I always gave up for one reason or another. Sometimes I realized I liked Sprockets and the other framework didn’t provide an alternative to the Rails asset pipeline. Another time I realized that auto-reloading didn’t work well with the other framework. Other times I didn’t like the way code was organized with the other framework. When I read Jeremy’s announcement of Roda, it was just the right time with the right framework for me.
I have greatly appreciated Jeremy’s work for a long time, since I was introduced to Sequel. He’s a lovely person who provides awesome and kind support, and he’s a great library designer. Sequel is simply the best ORM I’ve seen so far. I also find it quite simple to follow Sequel’s code base, and after looking into Roda’s source it’s pretty much trivial to follow and understand. It’s basically one simple source file that handles routing and plugin support; almost everything else is provided by plugins you can opt in to or out of. Each plugin, being small and self-contained, is pretty simple to understand, and if you don’t agree with how one of them is implemented, you can just implement that part on your own.
After a glance over the core Roda plugins, one stood out in particular: multi_run. For what I wanted, this plugin would give me great organization, similar to Rails controllers, with the advantage that each app could have its own middleware stack, could be mounted anywhere (including in a separate app), and could easily be tested separately as if it were a single app. More importantly, it allowed me to easily lazy-load the application code, which let the application boot instantly with Puma, without the need for autoload and other trickery. Here’s an example:
require 'roda'

module Apps
  class MainApp < Roda
    plugin :multi_run
    # you'll probably want other plugins, such as :error_handler and :not_found,
    # or maybe error_email

    def self.register_app(path, &app_block)
      ->(env) do
        require_relative path
        app_block[].call env
      end
    end

    run 'sessions', register_app('sessions_app'){ SessionsApp }
    run 'static',   register_app('static_app'){ StaticApp }
    run 'users',    register_app('users_app'){ UsersApp }
    # and so on
  end
end
Even if you decide to load the main application when testing particular apps, the overhead would be negligible, since it would basically only load the tested app. And if you are afraid of using lazy loading in the production environment because you want to deliver a warmed-up app, it’s quite easy to change register_app:
require 'roda'

module Apps
  class MainApp < Roda
    plugin :multi_run
    plugin :environments

    def self.register_app(path, &app_block)
      if production?
        require_relative path
        app_block[]
      else
        ->(env) do
          require_relative path
          app_block[].call env
        end
      end
    end

    run 'sessions', register_app('sessions_app'){ SessionsApp }
    # and so on
  end
end
This is not just theory: this is how I implemented it in our application, and it boots in less than a second - just about the same as the simplest Rack app. Of course, I haven’t really measured this in any scientific way; it’s a simple in-head count when running bundle exec puma, where most of the time is spent on Bundler and requiring Roda (about 0.6s with my gemset). No need for spring, autoload or any complicated code to make it fast. It just works, and it’s just Ruby, using explicit lazy loading rather than an automatic system.
So, I really wanted to try this approach, and I had a plan where I would run both the Roda and Rails stacks together for a while, running the Rails app as the fallback app whenever the Roda stack didn’t match the route. I could even use the path_rewriter plugin to migrate a single action at a time to the Roda stack if I wanted to.
There was just one remaining issue I had to figure out before I started moving the app to the Roda stack: automatic code reloading. I asked on the ruby-roda mailing list how Roda handled code reloading, and Jeremy said it was outside Roda’s responsibility, that I could choose any code reloader I wanted, and pointed me to some documentation listing a few of them, including one of his own. I spent quite some time researching them and still preferred the one provided by ActiveSupport::Dependencies, but since I wanted to get rid of ActiveSupport and autoloading in the first place, there was no point in keeping it. If you’re curious about this research, I wrote about it here. If you’re curious about why I dislike Ruby’s autoload feature, you’ll also find the explanation in that article.
After some discussion with Jeremy around automatic code reloading in Ruby, I suggested an approach I thought would work pretty well and transparently, although it would require patching both require and require_relative in development mode. Jeremy wasn’t much interested in it because of those monkey patches, but I was still confident it would be a better option than the others I had evaluated so far. I decided to give it a try, and that’s how AutoReloader was born.
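To give an idea of the approach, here’s a minimal sketch of the underlying idea (not AutoReloader’s actual implementation): the development-mode patch records which top-level constants each file defines, so they can be removed before the file is required again.

# Minimal sketch: track which top-level constants each required file defines.
# AutoReloader itself is more careful (require_relative, reloadable paths,
# nested requires, thread safety and so on).
module RequireTracker
  def self.constants_by_file
    @constants_by_file ||= {}
  end

  def require(path)
    before = Object.constants
    super.tap do |newly_loaded|
      next unless newly_loaded # false means the file was already required
      RequireTracker.constants_by_file[path] = Object.constants - before
    end
  end
end

Object.prepend RequireTracker

# On reload, for each changed file one can remove its constants with
# Object.send(:remove_const, name), delete it from $LOADED_FEATURES and
# require it again.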
With the auto-reloading issue solved, everything was set to slowly port the app to the Roda stack, and the process was pretty much a breeze. If you want some basic idea of the Rails overhead: the full Ruby spec suite was about 2s faster with the same (converted) tests after getting rid of the last Rails bits. It used to take 10s to run 380 examples and thousands of assertions; after getting rid of Rails it took 8s with an extra example. Upgrading Bundler saved me another half a second, so currently it takes 7.6s to finish (about half a second for bundle exec, 1.5s to load according to the RSpec report and 5.6s to run).
But getting rid of Rails was just the first step in this lonely journey. Getting rid of Rails wasn’t enough to get rid of ActiveSupport. We have a LocaleUtils class we use to format numbers, among other utilities based on the user’s locale. It used to include ActionView::Helpers::NumberHelper, and by that time I had learned the hard way that I couldn’t simply require 'action_view/helpers/number_helper', because I’d run into problems related to ActiveSupport’s autoloading mechanism, so I had to require action_view fully. Anyway, since ActionView depends on ActiveSupport, I wanted to get rid of it as well. As usual, after lots of time wasted searching for Ruby number-formatting gems, I decided to implement the formatting myself, and a few hours later I had gotten rid of ActionView.
But ActiveSupport was still there, standing like a great warrior! This time it was a dependency of… guess what? Yep, FactoryGirl! Oh, man :( After some research on alternative factory implementations I found Fabrication to be dependency-free. An hour later I had ported our factories to Fabrication and finally gotten rid of ActiveSupport! Yay, no more monkey patches to core Ruby classes! Right?
Well, not exactly… :( The monkey-patch culture is deeply rooted in the Ruby community. Some very popular gems add monkey patches, such as the mail gem or sidekiq. While reading the mail gem’s source I found it very confusing, so I decided to replace it with something simpler. We use exim4 to forward e-mails to Amazon SES, so Ruby’s basic Net::SMTP support is enough for delivering e-mails to Exim; all I needed was a MIME mail formatter in order to send simple text + HTML multipart mail to users. After some more research I decided to implement it myself, and that’s how simple_mail_builder was born.
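For the curious, the shape of such a message is roughly this (a hand-rolled sketch with made-up addresses; the actual formatting lives in simple_mail_builder):

require 'net/smtp'

# Sketch of a minimal multipart/alternative (text + HTML) message.
# Real code should generate a unique boundary and proper Date/Message-ID headers.
boundary = 'simple-boundary-42'
message = <<~MAIL
  From: app@example.com
  To: user@example.com
  Subject: Hello
  MIME-Version: 1.0
  Content-Type: multipart/alternative; boundary=#{boundary}

  --#{boundary}
  Content-Type: text/plain; charset=UTF-8

  Hello in plain text.
  --#{boundary}
  Content-Type: text/html; charset=UTF-8

  <p>Hello in <strong>HTML</strong>.</p>
  --#{boundary}--
MAIL

# Deliver through the local MTA (Exim in our case), which forwards to SES:
Net::SMTP.start('localhost', 25) do |smtp|
  smtp.send_message message, 'app@example.com', 'user@example.com'
end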
At some point I might decide to create my own simple job processor just to get rid of Sidekiq’s monkey patches, but my point is that I have this feeling of being a lonely warrior fighting a lost battle, because of the mismatch between my expectations and what the Ruby community overall considers acceptable practice, such as modifying Ruby core classes in libraries. I agree it’s okay for instrumentation code, such as New Relic’s, to patch others’ code, but for other use cases I don’t really agree with that approach.
On the one hand I really love the Ruby language, except for a few caveats, but there’s a huge mismatch with the Ruby community’s way of writing Ruby code, and this is a big thing. I don’t really know what the situation is in other language communities, so I guess I might be a lonely warrior in any other language I opted for instead of Ruby, but Ruby is the only language I really appreciate so far among those I’ve worked with.
I guess I should just stop dreaming about the ideal Ruby community and give up on trying to get a monkey-patch-free web application…
At least I can now easily and happily debug anything that happens in the application without having to spend a lot of time digging into Rails or Devise’s source code, which used to take me a lot of time. Everything is crystal clear. I have tons of flexibility to do what I want in no time with the new stack. The application boots pretty quickly, and I’ll never run into edge cases involving ActiveSupport::Dependencies auto-reloading again. Or issues involving ActionController::Live. Or Devise issues when using Sequel as the ORM.
Ultimately I feel like I have full control over the application, and that’s simply priceless! It’s an awesome feeling of freedom I never experienced before. Instead of focusing on the bad feeling of being a lonely warrior fighting a lost battle, I’ll try to concentrate on those great benefits from now on.
TL;DR: This article proposes savepoints to implement nested transactions. They are supported by PostgreSQL, Oracle, Microsoft SQL Server, MySQL (with InnoDB, although some statements cause an implicit commit, so I’m not sure it works well with MySQL) and other vendors, but not by all vendors or engines. So, if using savepoints or nested transactions is not possible with your database, most likely this article won’t be useful to you. Also, not all ORMs provide support for savepoints in their API; I know Sequel and ActiveRecord do. The article also provides a link on how to achieve the same goal with Minitest.
I’ve been feeling lonely about my take on tests for a long time. I’ve read many articles on testing in the past years and most of them, not only in the Ruby community, seem to give the same advice. Good advice, by the way. I understand the reasoning behind it, but I also understand it comes with trade-offs, and this is where I feel kind of lonely. All the articles I’ve read, and some people who have worked with me, have tried to convince me that I’m just plain wrong.
I never cared much about this, but I never wrote about it either, as I thought no one would be interested in learning about some techniques I’ve been using for quite a few years to speed up my tests. It seemed everyone would simply tell me I’d go to hell for writing tests this way.
A few weeks ago I read this article by Travis Hunter, which reminded me of an old TO-DO. More importantly, it made me realize I wasn’t that lonely in thinking the way I do about tests.
“Bullshit! I came here because the title said my tests would be faster, I’m not interested in your long stories!” Sure, feel free to completely skip the next section and go straight to the fun one.
I graduated in Electrical Engineering after 5 years in college, then spent two more years working on my master’s thesis on hard real-time systems applied to mobile robotics. I think there are two things engineers in general get used to after a few years in college. First, almost everything involves trade-offs, and one of the most important jobs of an engineer is to identify them and choose the option they consider to have the best cost-benefit ratio. The second is related to the first: knowing that some tools will fit a given set of goals better than others. I know this is also understood by CS and similarly graduated people, but I have the feeling it’s not as strong in general in those areas as what I observe in some (electrical/mechanical/civil) engineers.
When I started using RSpec and Object Daddy (many of you may only know Factory Girl these days), a popular factory tool at that time, I noticed my suite would take almost a minute for just a few examples touching the database. That would certainly slow me down as I would have to add many more tests.
But I felt really bad when I complained about that once on the RSpec mailing list and David Chelimsky mentioned taking 54s to run a couple of hundred examples, while I had only 54 examples in my suite at that time.
And it felt even worse when I contributed once to Gitorious and noticed that over a thousand examples would finish in just a few seconds, although granted, lots of them didn’t touch the database. Marius Mathiesen and Christian Johansen are very skilled developers and they were the main Gitorious maintainers at that time. Christian is the author of the popular Sinon.js, one of the authors of the great Buster.js and the author of the Test-Driven JavaScript Development book.
For that particular application, I had to create a lot of records in order to build the record I needed to test, and I was recreating them in every single test requiring such a record, through Object Daddy, though I suspect the result would be about the same with FactoryGirl or any other factory tool.
When I realized that creating lots of records in the database was that expensive, I stopped following the traditional advice for writing tests and only worried about what I really cared about, which remains basically the same to this day.
These are my test goals:
These are not my test goals at all:
I even wrote my own JavaScript test runner because I needed one that allowed me to run my tests in a specified order, supported IE6 (at that time) and beforeAll, and I couldn’t find any back then. My application used to register some live events on document and would never unregister them because it wasn’t necessary, so my test suite would only be allowed to initialize it once. Also, recreating a tree on every test would take a lot of time, so I wanted to run a set of tests that would work on the same tree, based on the result of previous tests.
I was okay with that trade-off as long as my tests ran fast, but JavaScript test runner authors wouldn’t agree, so I created OOJSpec for my needs. I never advertised it because I don’t consider it feature-complete yet, although it suits my current needs. It doesn’t currently support running a single test, because I’d need to think of some way to declare a test’s dependencies (on other tests) so that those dependent tests would also be run before the requested one. Also, maintaining a test runner is not trivial, and since it’s currently hard for me to find time to review patches, I preferred not to announce it. Since I can run individual test files, it’s working fine for my needs, so I don’t currently have much motivation to improve it further.
A common case while testing some scenarios is that one wants to write a set of tests that exercise about the same set of records. Most people nowadays are using either one of the two common approaches:
Loading specific fixtures before each context wouldn’t be significantly faster than using a factory, given competent factory and ORM implementations, so some people simply use DatabaseCleaner with the truncation strategy to delete all data before the suite starts and then load the fixtures into the database. After that, each example usually runs inside a transaction that is rolled back at the end, which is much faster than truncating and reloading the fixtures each time.
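For reference, the per-example transaction part typically looks something like this (sketched here with Sequel and RSpec; DatabaseCleaner’s transaction strategy does the equivalent):

RSpec.configure do |config|
  config.around(:each) do |example|
    # Run each example inside a transaction that is always rolled back,
    # so whatever it writes never outlives the example.
    DB.transaction(rollback: :always) { example.run }
  end
end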
I don’t particularly like fixtures, because I find they make tests more complicated to write and understand. But I would certainly consider them if they made my tests significantly faster. Also, nothing prevents us from using the same fixtures approach with factories, since we could use the factories to populate the initial data before the suite starts; the real problem is that writing tests would still be more complicated, in my opinion.
So, I prefer to think about solutions that allow tests to remain fast even when using factories. Obviously that means we should find some way to avoid recreating the same records for a given group, since the only way to speed up a suite that spends a lot of time creating records in the database is to reduce the amount of time spent creating those records.
There are other kinds of optimizations that would be interesting to try, but they are probably complicated to implement, as they would likely require a change in the FactoryGirl API. For example, rather than sending one statement at a time to the database, I guess it would be faster to send all of them at once. However, I’m not sure it would be that much faster if you are using a connection pool (usually a single connection in the test environment) that keeps the connection open and a local database.
So, let’s talk about the low-hanging fruit, which also happens to be the sweetest in this case. How can we reuse a set of records among a set of examples while still keeping them independent from each other?
The idea is to use nested transactions to achieve that goal. You begin a transaction when the suite starts (or at the start of a context involving database statements), and then the suite creates a savepoint before each group of examples (a context, in RSpec language) and rolls back to that savepoint after the context finishes.
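In terms of what actually reaches the database, the sequence looks roughly like this (sketched as raw statements through Sequel’s DB.run; the savepoint names are invented):

DB.run 'BEGIN'                            # suite (or outer context) starts
DB.run 'SAVEPOINT context_1'              # before a context
DB.run 'SAVEPOINT example_1'              # before an example
# ... the example creates and queries records ...
DB.run 'ROLLBACK TO SAVEPOINT example_1'  # the example's records are undone
DB.run 'ROLLBACK TO SAVEPOINT context_1'  # the context's shared records undone
DB.run 'ROLLBACK'                         # suite ends: nothing persisted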
Managing savepoint names like this can be complex to implement on your own, but if you are going this route anyway, because your ORM doesn’t provide an easy API for nested transactions, then you may not be interested in the rspec_nested_transactions gem I’ll present in the next section.
However with Sequel this is as easy as:
# The :auto_savepoint option will automatically add the "savepoint: true"
# option to inner transaction calls.
DB.transaction(auto_savepoint: true, savepoint: true, rollback: :always){ run_example }
With ActiveRecord the API works like this (thanks Tiago Amaro, for showing me the API):
ActiveRecord::Base.transaction(requires_new: true) do
  run[]
  raise ActiveRecord::Rollback
end
This will detect whether a transaction is already in place, using a savepoint if so, or issue a BEGIN to start a new transaction otherwise. The savepoint names are managed automatically for you, and Sequel will even roll back automatically when the “rollback: :always” option is used. Very handy indeed. To achieve this, though, Sequel deliberately doesn’t provide methods such as “start_transaction” and “end_transaction”: the “transaction” method always takes a block.
Why is this a problem? Sequel does the right thing by always requiring a block to be passed to the “transaction” method, but RSpec does not support “around(:all)” out of the box. However, Myron Marston posted a few years ago how to implement it using fibers, and Sean Walbran created a real gem based on that article. You’d probably be interested in combining this with the well-known strategy of wrapping each individual example in a nested transaction as well.
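The fiber trick is worth a quick sketch (simplified from the idea in Myron’s article, not the gem’s actual code): the transaction block runs up to a Fiber.yield in before(:all) and is resumed in after(:all), so a single block can span the whole context.

RSpec.configure do |config|
  config.before(:all) do
    @txn_fiber = Fiber.new do
      # Everything between the two resume calls happens inside this transaction.
      DB.transaction(savepoint: true, rollback: :always) { Fiber.yield }
    end
    @txn_fiber.resume # enter the transaction, pause at Fiber.yield
  end

  config.after(:all) do
    @txn_fiber.resume # leave the block, triggering the rollback
  end
end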
If you feel confident that you will always remember to use “around(:all)” with a “DB.transaction(savepoint: true, rollback: :always){}” block whenever you want to create such a common set of records for a group of examples, then the rspec_around_all gem may be all you need to implement this strategy.
Not only do I find this bug-prone (I could forget the transaction block), it also bothers me to repeat this pattern every time I want to create a set of shared records.
There’s a caveat, though. If your application creates transactions itself, that code must be savepoint-aware too, even if BEGIN-COMMIT is enough outside the tests, so that it works as expected in combination with this technique. With Sequel this is handled automatically, provided you use the :auto_savepoint option in the outermost transaction. If you are using ActiveRecord, it means using “requires_new: true”.
If you are using Sequel or ActiveRecord with PostgreSQL, Oracle, MSSQL, MySQL (with InnoDB) or any other vendor supporting nested transactions, and you have full control over the transaction calls, implementing this technique can speed up the database-touching part of your suite a lot. And rspec_nested_transactions makes it even easier to implement.
Today I’ve released rspec_nested_transactions, which allows one to run all (inner) examples and contexts inside a transaction (usually a database transaction) with a single configuration:
require 'rspec_nested_transactions'

RSpec.configure do |c|
  c.nested_transaction do |example_or_group, run|
    (run[]; next) unless example_or_group.metadata[:db] # or delete this line if you don't care

    # with Sequel, assuming the database is stored in DB:
    DB.transaction(auto_savepoint: true, savepoint: true, rollback: :always, &run)

    # or, with ActiveRecord (Oracle, MSSQL, MySQL[InnoDB], PostgreSQL);
    # use this instead of the Sequel call above, not both:
    # ActiveRecord::Base.transaction(requires_new: true) do
    #   run[]
    #   raise ActiveRecord::Rollback
    # end
  end
end
That’s it. I had been using a fork of rspec_around_all (branch config_around) since 2013; it always served me well and I never had to change it, so I guess it’s quite stable. For a long time I considered moving it to a separate gem and removing the parts I didn’t actually use (like “around(:all)”). I always postponed it, but Travis' article reminded me of it and I thought others might be interested in this approach as well.
So, I improved the specs, cleaned up the code using recent Ruby features (Module#prepend, from Ruby >= 2.0) and released the new gem. Since the specs use the “<<~” heredoc they will only run on Ruby >= 2.3, but I guess the gem itself should work with any Ruby >= 2.0 (or maybe even 1.9 if you polyfill Module#prepend).
Jeremy Evans, the Ruby Hero who happens to be the maintainer of Sequel and the creator of Roda, was kind enough to provide a link on how to achieve the same with Minitest in the comments below. No need for fibers in that case. Go check it out if you’re working with Minitest.
Currently our application runs 364 examples (RSpec doesn’t report the expectation count, but I suspect it could be around a thousand) in 7.8s, and many of them touch the database. Also, when I started this Rails application I decided to give ActiveRecord another try, since it had gained a lazy API when Arel was introduced, which I was already used to with Sequel. A week or two later I moved to Sequel after finding the AR API quite limiting for the application’s needs. At that time I noticed that the tests finished considerably faster after switching from ActiveRecord to Sequel, so I guess Sequel has a lower overhead, and switching to Sequel could possibly help speed up your test suite as well.
That’s it; I hope some of you will see value in this approach. If you have other suggestions (besides running the examples in parallel) to speed up a test suite, I’m always interested in speeding up ours. We have a ton of code on both the server side and the client side and only part of it is currently tested. I’m always looking to improve the test coverage, which means we could potentially add over 500 more tests (for both server and client side) while I still want the test suite to complete in just a few seconds. I think the hardest and most critical parts are currently covered on the server side, and it will be easier to test other parts once I’ve moved the application to Roda (the client side needs much more work to make its critical parts easier to test). I would be really happy if both the server and client-side suites finished within a second ;) (currently the client-side suite takes about 11s to complete - 204 tests / 438 assertions).
I started to experiment with writing big Ruby web applications as a set of smaller and fast Rack applications connected by a router using Roda’s multi_run plugin.
Such a design allows the application to boot super fast in the development environment (and in production too, unless you prefer to eager-load your code in production). Here’s what the design looks like (I’ve written about AutoReloader in another article):
# config.ru
if ENV['RACK_ENV'] == 'development'
  require 'auto_reloader'
  AutoReloader.activate reloadable_paths: [ 'apps', 'lib', 'models' ]
  run ->(env) do
    AutoReloader.reload! do
      ActiveSupport::Dependencies.clear # avoid some issues
      require_relative 'apps/main'
      Apps::Main.call env
    end
  end
else
  require_relative 'apps/main'
  run Apps::Main
end

# apps/main.rb
require 'roda'
module Apps
  class Main < Roda
    plugin :multi_run
    # other plugins and middlewares are added, such as :error_handler, :not_found, :environments
    # and a logger middleware. They take some space, so I'm skipping them.

    def self.register_app(path, &app_block)
      # if you want to eager load files in production you'd change this method a bit
      ->(env) do
        require_relative path
        app_block[].call env
      end
    end

    run 'sessions', register_app('session'){ Session }
    run 'admin',    register_app('admin') { Admin }
    # other apps
  end
end

# apps/base.rb
require 'roda'
module Apps
  class Base < Roda
    # add common plugins for rendering, CSRF protection, middlewares
    # like ETag, authentication and so on. Most apps would inherit from this.
    route{|r| process r }

    private

    def process(r)
      protect_from_csrf # added by some CSRF plugin
    end
  end
end

# apps/admin.rb
require_relative 'base'
module Apps
  class Admin < Base
    private

    def process(r)
      super # protects from forgery and so on
      r.get('/'){ "TODO Admin interface" }
      # ...
    end
  end
end
Then I want to be able to test those applications separately, and for some of them I would only get confidence by testing against a real server, since I would want to check how they handle cookies or streaming, inspect HTTP headers injected by the real server, and so on. And I wanted to be able to write such tests so that they run as quickly as possible.
I started experimenting with Puma and noticed it can start a new server really fast (around 1ms in my development environment). I didn’t want to add many dependencies, so I decided to create a simple DSL over the 'net/http' stdlib, since its API is not very friendly. The only dependencies so far are http-cookie and Puma (WEBrick does not support full hijacking, doesn’t provide a simple API to serve Rack apps either, and is much slower to boot). Handling cookies correctly to keep the user session is not trivial, so I decided to introduce the http-cookie dependency to manage a cookie jar.
That’s how rack_toolkit was born.
This way I can start the server before the test suite starts, change the Rack app served by the server dynamically, and stop it when the suite finishes (or you can simply start and stop it for each example since it boots really fast). Here’s a spec_helper.rb you could use if you are using RSpec:
# spec/spec_helper.rb
require 'rack_toolkit'

RSpec.configure do |c|
  c.add_setting :server
  c.add_setting :skip_reset_before_example

  c.before(:suite) do
    c.server = RackToolkit::Server.new start: true
    c.skip_reset_before_example = false
  end

  c.after(:suite) do
    c.server.stop
  end

  c.before(:context){ @server = c.server }
  c.before(:example) do
    @server = c.server
    @server.reset_session! unless c.skip_reset_before_example
  end
end
Testing the Admin app should be easy now:
# spec/apps/admin_spec.rb
require_relative '../../apps/admin'

RSpec.describe Apps::Admin do
  before(:all){ @server.app = Apps::Admin } # the class is namespaced under Apps

  it 'shows an expected main page' do
    @server.get '/'
    expect(@server.last_response.body).to eq 'TODO Admin interface'
  end
end
Please take a look at the project’s README for more examples and the supported API. RackToolkit lets you get the current_path and referer, manages cookie sessions, provides a DSL for get, post and post_data on top of 'net/http' from the stdlib, allows overriding the environment variables sent to the Rack app, can simulate an https request as if the app were behind a proxy like Nginx, and supports “virtual hosts”, a default domain, performing requests to external Internet URLs and many other options.
It currently doesn’t provide a DSL for quickly accessing elements of the response body, or for filling in and submitting forms, but I plan to work on that once I need it. It won’t ever support JavaScript, though, unless at some point that becomes possible without slowing it down significantly. If you want to work on such a DSL, please let me know.
The test suite currently runs 33 requests and finishes in ~50ms (skipping the external request example). It’s that fast.
I’m looking forward to your suggestions for improving it. Your feedback is very welcome.
I’ve been writing some Roda apps recently. Roda doesn’t come with any automatic code reloader, unlike Rails. Its README lists quite a few code reloaders that can be used with Roda, but while converting a small JRuby on Rails application to Roda I noticed I didn’t really like any of the options. I’ve written a review of the available options if you’re curious.
I could simply have used ActiveSupport::Dependencies, since I knew it was easy to set up and worked mostly fine, but one of the reasons I’m thinking about leaving Rails is the autoloading behavior of ActiveSupport::Dependencies and the monkey patches to Ruby core classes added by ActiveSupport as a whole. So, I decided to create auto_reloader, which provides the following features:
What AutoReloader does not implement:
# app.rb
# note the lambda must take the env argument, since Rack calls App.call(env)
App = ->(env) { [ '200', { 'Content-Type' => 'text/plain' }, [ 'Sample output' ] ] }

# config.ru
if ENV['RACK_ENV'] != 'development'
  require_relative 'app'
  run App
else
  require 'auto_reloader'
  # won't reload before 1s has elapsed since the last reload by default. It can
  # be overridden in the reload! call below
  AutoReloader.activate reloadable_paths: [ '.' ]
  run ->(env) {
    AutoReloader.reload! do
      require_relative 'app'
      App.call env
    end
  }
end
If you also want it to reload if the “app.json” configuration file has changed:
# app.rb
require 'json'
config = JSON.parse File.read 'config/app.json'
App = ->(env) { [ '200', { 'Content-Type' => 'text/plain' }, [ config['output'] ] ] }

# append this to config.ru
require 'listen' # add the 'listen' gem to your Gemfile
app_config = File.expand_path 'config/app.json'
Listen.to(File.expand_path 'config') do |added, modified, removed|
  AutoReloader.force_next_reload if (added + modified + removed).include?(app_config)
end.start # the listener must be started explicitly
If you decide to give it a try and find any bugs, please let me know.
When we are writing a service in Ruby, it’s super useful to have its behavior automatically updated to reflect the latest changes to the code. Otherwise we’d have to manually restart the server after each change, which would slow down the development flow a lot, especially if the application takes a while before it’s ready to process the next request.
I guess most people using Ruby are writing web applications with Rails. Many don’t notice that Rails supports auto code reloading out of the box, through ActiveSupport::Dependencies. A few will notice it once they are affected by some corner case where the automatic code reloading doesn’t work well.
Another feature provided by Rails is the ability to automatically load files if the application follows some conventions, so that the developer is not forced to manually require the code’s dependencies. This behavior is similar to Ruby’s autoload feature, whose purpose is to speed up the loading time of applications by avoiding loading files the application won’t need. Matz seems to dislike this feature and discouraged its usage 4 years ago. Personally I’d love to see autoload gone, as it can cause bugs that are hard to track down. However, loading many files in Ruby is currently slow even though simply reading them from disk would be pretty fast. So, I guess Ruby would have to provide some sort of pre-compiled file support before deprecating autoload, so that we wouldn’t need it for the purpose of speeding up start-up time.
Since automatic code reloading usually works well enough for Rails applications, most people won’t research code reloaders until they are writing web apps with other frameworks such as Sinatra, Padrino, Roda, pure Rack, whatever.
This article will review generic automatic code reloaders, including ActiveSupport::Dependencies, leaving framework-specific ones, like Sinatra::Reloader and Padrino::Reloader, out of scope. I haven’t checked the Ruby version compatibility of each one, but all of them work on the latest MRI.
Rack::Reloader is bundled with the rack gem. It’s very simple but it’s only suitable for simple applications in my opinion. It won’t unload constants, so if you remove some file or rename some class the old ones will still be available. It works as a Rack middleware.
One can provide the middleware a custom or external back-end, but I’ll only discuss the default one, which is bundled with Rack::Reloader, called Rack::Reloader::Stat.
Before each request it traverses $LOADED_FEATURES, skipping .so/.bundle files, and calls Kernel.load on each file that has been modified since the last request. Since config.ru is loaded rather than required, it’s not listed in $LOADED_FEATURES, so it will never be reloaded. This means the app’s code should live in another file required from config.ru rather than directly in config.ru. That’s worth mentioning because I’ve been bitten by it more than once while testing Rack::Reloader.
Unlike the Rails approach, any changed file will be reloaded, even if you modify some gem’s source.
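The strategy is simple enough to sketch in a few lines (a condensed illustration of the Stat approach described above, not Rack’s actual code):

# A naive middleware in the spirit of Rack::Reloader::Stat: before each
# request, re-load any already-required .rb file whose mtime has changed.
class TinyReloader
  def initialize(app)
    @app = app
    @mtimes = {}
  end

  def call(env)
    $LOADED_FEATURES.each do |file|
      next unless file.end_with?('.rb') && File.file?(file)
      mtime = File.mtime(file)
      load file if @mtimes[file] && @mtimes[file] < mtime
      @mtimes[file] = mtime
    end
    @app.call(env)
  end
end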
I won’t discuss performance issues when there are many files loaded because one could provide another back-end able to track files changes very quickly and because there are more important issues affecting this strategy.
Suppose your application has some code like this:
require 'singleton'

class MyClass
  include Singleton
  attr_reader :my_flag

  def initialize
    @my_flag = false
  end
end
Calling MyClass.instance.my_flag will return false. Now, if you change the code so that @my_flag is assigned true in “initialize”, MyClass.instance.my_flag will still return false: the singleton instance was memoized when the file was first loaded, and re-loading the file doesn’t recreate it.
Let’s investigate another example where Rack::Reloader strategy won’t work:
# assets_processor.rb
class AssetsProcessor
  @@processors = []

  def self.register
    @@processors << self
  end

  def self.process
    @@processors.each(&:do_process)
  end
end

# assets_compiler.rb
require_relative 'assets_processor'

class AssetsCompiler < AssetsProcessor
  register

  def self.do_process
    puts 'compiling assets'
  end
end

# gzip_assets.rb
require_relative 'assets_processor'

class GzipAssets < AssetsProcessor
  register

  def self.do_process
    puts 'gzipping assets'
  end
end

# app.rb
require_relative 'assets_compiler'
require_relative 'gzip_assets'

class App
  def run
    AssetsProcessor.process
  end
end
Running App.new.run will print “compiling assets” and then “gzipping assets”. Now, if you change assets_compiler.rb, reloading it will call register again, appending AssetsCompiler to the processors list a second time, so “compiling assets” will be printed twice the next time process is called.
This applies to all situations where a given class method is supposed to run only once, or where the order in which files are loaded matters. For example, suppose the AssetsProcessor.register implementation is changed in assets_processor.rb. Since register was already called in its subclasses, the change won’t take effect in them, because only assets_processor.rb will be reloaded by Rack::Reloader. The other reloaders discussed here also suffer from this issue, though they provide workarounds for some cases.
Some reloaders like rerun and shotgun will simply reload everything on each request. They fork at each request before requiring any files, which means those files are never required in the main process. Due to forking it won’t work on JRuby or Windows. This is a safe approach when using MRI on Linux or Mac though. However, if your application takes a long time to boot then your requests would have a big latency during the development mode. In that case, if the reason for the slow start-up lies in the framework code and other external libraries rather than the app specific code, which we want to be reloadable, one can require them before forking to speed it up.
This approach is a safe bet, but unsuitable when running on JRuby or Windows. Also, if loading all the app’s code is still slow, one may be interested in looking for faster alternatives. Besides that, this latency will exist in development mode for all requests, even if no files have been changed. If you’re working on performance improvements, other approaches will yield better results.
rack-unreloader takes care of unloading constants during reload, unlike Rack::Reloader.
It has basically two modes of operation. One can use “Unreloader.require(‘dep’){[‘Dep’, …]}” to require dependencies while also declaring which new constants they define; those constants will be unloaded during reload. This is the safest approach, but it’s not transparent: for every required reloadable file we must manually provide the list of constants to be unloaded. On the other hand, this is the fastest possible approach, since the reloader doesn’t have to figure out those constants automatically, as the other options mentioned below do. Also, it doesn’t override “require”, so it’s great for those who don’t want any monkey patching. Ruby currently does not provide a way to safely discover those constants automatically without monkey patching require, so rack-unreloader is probably the best you can get if you want to avoid monkey patches.
The second mode of operation is to not provide that block; Unreloader will then compare $LOADED_FEATURES before and after the Unreloader.require call to figure out which constants the required file defines. However, without monkey patching “require” this mode can’t be reliable, as I’ll explain in the sub-section below.
Before getting into that, there’s another feature of rack-unreloader that speeds up reloading: it only reloads the changed files, unlike the other options I’ll explore below. However, reloading just the changed files is not always reliable, as I discussed in the Rack::Reloader Issues section.
Finally, unlike the other libraries, rack-unreloader actually calls “require” rather than “load”, deleting the reloaded files from $LOADED_FEATURES before the request so that calling “require” will actually reload them.
It’s only reliable if you always provide the constants defined by each Unreloader.require() call. This is also the fastest approach, though it may be a bit tedious to write code this way. And even in this mode, it’s only reliable if your application works fine regardless of the order in which files are reloaded (the example in the Rack::Reloader Issues section demonstrates how this can break when that’s not the case).
Let’s explore why the automatic approach is not reliable:
# t.rb:
require 'json'
module T
  def self.call(json)
    JSON.parse(json)
  end
end

# app.rb:
require 'rack/unreloader'
require 'fileutils'
Unreloader = Rack::Unreloader.new{ T }
Unreloader.require('./t.rb') # providing the {['T']} block wouldn't trigger the error
Unreloader.call '{}'
FileUtils.touch 't.rb' # force the file to be reloaded
sleep 1 # there's a default cooldown of 1s before the next reload
Unreloader.call '{}' # NameError: uninitialized constant T::JSON
Since rack-unreloader does not override “require” it can’t track which files define which constants in a reliable way. So, it thinks ’t.rb' is responsible for defining JSON and will then unload JSON (which has some C extensions which cannot be unloaded). This also affects JRuby if the file imports some Java package among other similar cases. So, if you want to work with the automatic approach with rack-unreloader you’d have to require all those dependencies before running Unreloader.call. This is very error-prone, that’s why I think it’s mostly useful if you always provide the list of constants expected to be defined by the required dependency.
However, rack-unreloader provides a few options, like “record_dependency”, “subclasses” and “record_split_class”, to make it easier to specify the explicit dependencies between files so that the right files are reloaded. But that means the application author must have a good understanding of how auto-reloading works and how their dependencies work, and they will also have to fully specify those dependencies. It can be a lot of work, but it may be worth it when reloading all reloadable files takes a lot of time. If you’re looking for the fastest possible reloader, then rack-unreloader may well be your best option.
Now we’re talking about the reloader behind Rails, which is great, battle-tested and one of my favorites. Some people don’t realize it’s pretty simple to use it outside Rails, so let me demonstrate how, since this doesn’t seem to be widely documented.
require 'active_support' # this must be required before any other AS module as per documentation
require 'active_support/dependencies'
ActiveSupport::Dependencies.mechanism = :load # or :require in the production environment
ActiveSupport::Dependencies.autoload_paths = [__dir__]

require_dependency 'app' # optional if app.rb defines App, since it also supports autoloading
puts App::VERSION
# change the version number and then:
ActiveSupport::Dependencies.clear
require_dependency 'app'
puts App::VERSION
Or, in the context of a Rack app:
require 'active_support'
require 'active_support/dependencies'

if ENV['RACK_ENV'] == 'development'
  ActiveSupport::Dependencies.mechanism = :load
  ActiveSupport::Dependencies.autoload_paths = [__dir__]

  run ->(env){
    ActiveSupport::Dependencies.clear
    App.call env
  }
else
  ActiveSupport::Dependencies.mechanism = :require
  require_relative 'app'
  run App
end
ActiveSupport::Dependencies has a quite complex implementation and I don’t really have a solid understanding of it so please let me know about my mistakes in the comments section so that I can fix them.
Basically, it will load dependencies in the autoload_paths or require them, depending on the configured mechanism. It keeps track of which constants are added by overriding “require”. This way it knows that JSON was actually defined by “require ‘json’” even when that’s called from “require_dependency ’t’”, and it would detect that T was the new constant defined by ’t.rb’ and thus the one that should be unloaded upon ActiveSupport::Dependencies.clear. Also, it doesn’t reload only the individual changed files; it unloads all reloadable files on “clear”, which is less likely to cause problems, as I explained in the previous section. It’s also possible to configure it to use an efficient file watcher, like the one implemented by the ‘listen’ gem, which uses an evented approach based on OS-provided system calls. This way, one can skip the “clear” call when none of the loaded reloadable files have changed, speeding up requests even in development mode.
ActiveSupport::Dependencies supports a hooks system that allows others to observe when certain files are loaded and take some action. This is especially useful for Rails engines, when you want to run some code only after some dependency has been loaded, for example.
ActiveSupport::Dependencies is not only a code reloader: it also implements an automatic code loader by overriding Object’s const_missing to try to require the code that would define the missing constant, following some conventions. For example, the first time one attempts to use ApplicationController, since it’s not defined, it will look in the search paths for an ‘application_controller.rb’ file and load it. That means the start-up time can be improved, since we only load code we actually use. However, this can lead to issues that make the application behave differently in production, due to side effects caused by the order in which files are loaded. But Rails applications have been built around this strategy for several years, and it seems such caveats have only affected a few people. Those cases can usually be worked around with “require_dependency”.
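A toy version of the const_missing trick might look like this (greatly simplified; the real implementation handles namespaces, nesting, already-loaded files and much more):

class Object
  def self.const_missing(name)
    # naive CamelCase -> snake_case conversion, e.g. ApplicationController
    # becomes "application_controller"
    file = name.to_s.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
    require file # assumes the file is in the load path
    # fail loudly instead of recursing forever if the file didn't define it:
    raise NameError, "#{file}.rb didn't define #{name}" unless const_defined?(name)
    const_get(name)
  end
end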
If your code doesn’t follow the naming convention, you will have to use “require_dependency”. For example, if ApplicationController is defined in controllers/application.rb, you’d use “require_dependency ‘controllers/application’” before using it.
Personally I don’t like autoloading in general and always prefer explicit dependencies in all my Ruby files, so even in my Rails apps I don’t rely on autoloading for my own classes. The same applies for Ruby’s built-in “autoload” feature. I’ve been bitten already by an autoload related bug when trying to use ActionView’s number helpers by requiring the specific file I was interested in. Here’s a simpler use case demonstrating the issue with “autoload”:
# test.rb
autoload :A, 'a'
require 'a/b'

# a.rb
require 'a/b'

# a/b.rb
module A
  module B
  end
end

# ruby -I . test.rb
# causes "...b.rb:1:in `<top (required)>': uninitialized constant A (NameError)"
It’s not quite clear what’s happening here, since the message isn’t very explicit about the real problem, and it gets even more complicated to understand in a real, complex code base. Requiring ‘a/b’ before requiring ‘a’ causes a circular dependency issue. When “module A” is seen inside “a/b.rb”, A doesn’t exist yet, and the “autoload :A, ‘a’” tells Ruby it should require ‘a’ in that case. So that’s what it does, but ‘a.rb’ will require ‘a/b.rb’, which we were trying to load in the first place. There are other similar problems caused by autoload, and that’s why I don’t use it myself despite the potential for loading the application faster. Ideally Ruby should provide support for some sort of pre-compiled (or pre-parsed) files, which would be useful for big applications to speed up code loading, since the bottleneck is not disk I/O but the Ruby parsing itself.
ActiveSupport::Dependencies is a pretty decent reloader, and I guess most people are just fine with it and its known caveats. However, there are some people, like me, who are pickier.
Before I get into the picky parts, let’s explore the limitations to keep in mind when using a reloader that relies on running some file’s code multiple times. The only really safe strategies I can think of for handling auto-reloading are completely restarting the application or using the fork/exec approach. They have their own caveats, like being slower than the alternatives, so it’s always about trade-offs when it comes to auto-reloaders. Running some code more than once can lead to unexpected results, since not all actions can be rolled back.
For example, if you include some module in ::Object, this can’t be undone. And even if we could work around it, we’d have to detect such changes automatically, which would perform so badly that it would probably be better to simply restart everything. This applies to monkey patching, to creating constants in namespaces that are not reloadable (like defining JSON::CustomExtension) and to similar situations. So, when dealing with automatic reloaders we should keep that in mind and understand that reloading will never be perfect unless we actually restart the full application (or use fork/exec). ActiveSupport::Dependencies provides options such as autoload_once_paths so that such code isn’t executed more than once, but if you have to change that code you’ll be forced to restart the full application.
Also, any file actually required rather than loaded (either with require or require_relative) won’t be auto-reloaded, which forces the author to always use require_dependency to load files that are supposed to be reloadable.
Here’s what I dislike about it:
Among the options covered in this article, ActiveSupport::Dependencies is my favorite, although I would consider rerun or shotgun when running on MRI and Linux if the application starts quickly and I didn’t have to work on performance improvements (when working on performance, it’s useful to have production-like behavior when no files have been changed).
Basically, if your application is fast to load then it may make sense to start with rerun or shotgun since they are the only real safe bets I can think of.
However, I took a few measurements in my application and decided it was worth creating a new transparent reloader that would also fix some of the caveats I see in ActiveSupport::Dependencies. I wrote a new article about auto_reloader.
If you know about other automatic code reloaders for Ruby I’d love to know about them. Please let me know in the comments section. Also let me know if you think I misunderstood how any of those mentioned in this article actually works.
This article is basically a copy of this project’s README. You may read it there if you prefer. It’s a sample application demonstrating the current state of streaming when Devise or Warden is involved.
Devise is an authentication library built on top of Warden, providing a seamless integration with Rails apps. This application was created following the steps described in Devise’s Getting Started section. Take a look at the individual commits and their messages if you want to check each step.
Warden is a Rack middleware, and authentication failures are handled using a “throw/catch(:warden)” approach. This works fine with Rails until streaming is enabled with ActionController::Live.
José Valim pointed out that the problem is ActionController::Live’s fault. The Live module changes the “process” method so that it runs inside a spawned thread, allowing it to return early and finish processing the remaining middlewares in the stack. Nothing is sent to the connection before leaving that method, due to the Rack issue I’ll describe next. But the “process” method also handles all filters (before/around/after action hooks). Usually the authentication happens in a before-action filter, and if the user is not authenticated Devise will “throw :warden”; but since this runs in the spawned thread, the Warden middleware doesn’t get the chance to catch the symbol and handle it properly.
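A condensed sketch of the throw/catch flow may help picture this (simplified; the real Warden middleware does much more):

# The middleware catches :warden thrown anywhere below it in the same thread:
class TinyWarden
  def initialize(app)
    @app = app
  end

  def call(env)
    catch(:warden) do
      return @app.call(env) # normal flow: the inner app returns a response
    end
    # something below threw :warden (authentication failed)
    ['401', { 'Content-Type' => 'text/plain' }, ['Unauthorized']]
  end
end

# Inside the app, a failed authentication check boils down to:
#   throw :warden unless authenticated?
# With ActionController::Live, that throw happens inside the spawned thread,
# where no catch(:warden) block exists on the call stack, hence the breakage.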
I find it amusing that after so many years of web development with Ruby, Rack doesn’t seem to have evolved much to better handle streamed responses, including SSE and, why not, WebSockets. The basic building blocks are essentially the same as when Rack was first created in a successful attempt to define a standard API that web servers and frameworks could agree on and build on top of. That is a great achievement, but Rack should evolve to better handle streamed responses.
Aaron Patterson has tried to work on another API for Rack that would improve support for streaming, but it seems it would break middlewares, and currently it seems the_metal is dead. It sounds like HTTP 2.0 multiplexing requires yet more changes, so maybe we’ll get proper support in Rack 3.0, which should be backward compatible and keep supporting existing middlewares by providing alternative APIs; but it seems like that could take years. He also wrote about the issues with the Rack API over 5 years ago.
Currently, the way Rack applications handle streaming is by implementing a body object that responds to “each” and yields one chunk at a time until the stream is finished; frameworks usually wrap this to give the user an API closer to a proper stream object, as properly implemented in other languages. A few years ago an alternative mechanism was suggested, which became known as the hijacking API. The Phusion team covered it when it was introduced, but I think the “partial hijacking” section is no longer valid.
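Here’s the classic “each-based” streaming shape described above, as a minimal Rack app (an illustrative sketch):

# config.ru: a body object whose #each yields chunks as they become available.
class SlowBody
  def each
    3.times do |i|
      yield "chunk #{i}\n"
      sleep 1 # pretend we're waiting on data
    end
  end
end

run ->(env) { ['200', { 'Content-Type' => 'text/plain' }, SlowBody.new] }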
Rack was designed around a middleware stack, which means any response will only start being sent after all middlewares have been called and returned (unless hijacking is used), since middlewares don’t have access to the socket stream. That’s why Rails had to resort to threads to handle streamed/chunked responses. But Rails could offer alternative implementations that would be more friendly to how Warden and Devise work, as demonstrated in this application, which I’ll discuss in the next section.
Before talking about Rails current options, I’d like to stress a bit more the problem with Rack without hijacking, and consequently how it affects web development in Ruby in a negative way, when compared to how this is done in most other languages.
If we compare it to how streaming is handled in Grails (and most JVM-based frameworks), or in most of the main web frameworks in other languages, it couldn’t be any simpler. Each request thread (or process) has access to a “response” object that accepts a “write” call whose output goes directly to the socket (or does so after a “flush” call).
There’s no need to flag a controller as capable of streaming. They are just regular controllers. The request thread or process does not have to spawn another thread to handle streaming, so there’s nothing special with such controllers.
It would be awesome if Ruby web applications had the option to use a more flexible API, friendlier to streamed responses, including SSE and websockets. Hijacking currently seems to be treated as a second-class citizen, since it is usually ignored by major web frameworks like Rails itself.
So, with Rails one doesn’t flag an action as requiring streaming support; one has to flag the full controller. In theory, all other actions not taking advantage of the streaming API should work just like those in regular controllers not flagged with ActionController::Live.
The obvious question is then: “so, why isn’t Live always included?”. After all, Rails users wouldn’t have to worry about enabling streaming; it would simply be enabled by default for when you want it. One might think it is related to performance concerns, but I suspect the main problem is that this is not issue-free.
Some middlewares assume that the inner middlewares have finished (some actually depend on it) so that they can modify the original response or headers. This kind of post-processing middleware does not work well with streamed responses.
This includes caching middlewares (handling ETag or Last-Modified headers), monitoring middlewares injecting some HTML (like NewRelic does automatically by default, for example) and many others. Those middlewares will block the stack until the response is fully finished, which breaks the desired streamed output. Some of them will check some conditions and skip this blocking behavior under certain circumstances, but some will still cause hard-to-debug issues, or they may even be conceptually broken.
There are also middlewares that expect the controller’s action code to run in the same thread, due to their implementation details. For example, suppose a sandboxed database environment is implemented as a middleware that runs the next layer inside a transaction block that will be rolled back, and the connection is fetched using the current thread id as the access key. Spawning a new thread would then use a different connection, outside the middleware’s transaction, breaking the sandboxed environment. I think ActiveRecord fetches the connection from thread locals, and since ActionController::Live copies those locals to the newly spawned thread it probably works; I’m just warning that spawning threads may break several middlewares in unexpected ways.
This includes the way Warden communicates. Enabling Live in all Rails controllers would have the immediate effect of breaking most current Rails applications, as Devise is the de facto authentication standard for Rails apps and Warden assumes the code handling authentication checks runs in the same thread. It could certainly offer another strategy to report failed authentication, but this is not how it currently works.
Even though José Valim said there’s nothing they could do because it’s Live’s fault, this is not completely true. I guess he meant it would be too much work to make it work. After all, we can’t simply blame Live, since the fault actually lies in Rack itself, where streaming is fundamentally broken.
Devise could certainly subclass Warden::Manager, use that subclass as its middleware and override “call” to add some object to env, for example, that would listen to reported failures; it could then replace “throw :warden” in its own code with a higher-level API that would communicate with Warden properly. But I agree this is a mess and probably isn’t worth it, especially because it couldn’t exactly be called Warden compatible. Another option could be to change Warden itself so that it doesn’t expect the authentication checks to happen in the same thread. Or it could replace the “throw/catch” approach with a “raise/rescue” one, which should work out of the box with how Rails currently handles things. It shouldn’t be hard for Devise itself to wrap Warden and use exceptions rather than throw/catch, but again, I’m not sure this is really worth it.
So, let’s explore other options, which would add new API options to Rails itself.
The Warden case is a big issue, since Devise is very popular among Rails apps and shouldn’t be ignored. Usually the authentication is performed in filters rather than in the action itself. Introducing a new API would give the user the chance to perform authentication in the main request thread before spawning the streaming thread. This works even if the authentication check is done directly in the action rather than in filters. The API would look something like:
def my_action
  # optionally call authenticate_user! here, if not using filters
  streamed do |stream|
    3.times{ stream.write "chunk"; sleep 1 }
  end
end
This way, the thread would only be spawned after the authentication check has finished. Or “streamed” could use “env[‘rack.hijack’]” when available, instead of spawning a new thread.
Another alternative might be to support streaming only for web servers supporting Rack hijacking. This way, the stream API could work seamlessly, without requiring “ActionController::Live” to be included. When “response.stream” is used, it would use “env[‘rack.hijack_io’]” if available; otherwise it would either buffer the responses and send them at once, or raise some error, based on configuration, according to the user’s preferences, since sometimes streaming is not only an optimization but a requirement that shouldn’t be silently ignored. The same behavior would apply when HTTP 1.0 is used, for example.
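For reference, here’s a rough sketch of what full hijacking looks like at the raw Rack level, assuming a web server that supports it; this is the kind of machinery a hijacking-based “response.stream” could build on (the raw HTTP handling is deliberately simplified):

# config.ru: full hijacking takes over the socket, so we become
# responsible for writing the raw HTTP response ourselves
run lambda { |env|
  io = env['rack.hijack'].call # returns the underlying socket
  Thread.new do
    begin
      io.write "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\n"
      3.times{|i| io.write "chunk #{i}\n"; sleep 1 }
    ensure
      io.close
    end
  end
  [200, {}, []] # ignored by the server once the socket is hijacked
}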
Or another module such as “ActionController::LiveHijacking” could be created so that Rails users would have that option for a while until Rails thinks this approach is stable enough to be enabled by default.
I’d like to propose two discussions around this issue. One would be a better solution for Rack applications to talk directly to the response (or a strategy for making Rack hijacking a first-class citizen, probably with a better name than hijack). The other would be for Rails to improve support for streaming by better handling cases like the Warden/Devise issue. I’ve copied this text with some minor changes to my site so that it can be discussed in the Disqus comments section, or we could discuss it in the issues section of this sample project or in the rails-core mailing list, your call.
Two weeks ago I read an article from Fabio Akita comparing the performance of his Manga Downloadr implementations in Elixir, Crystal and Ruby.
From a quick glance at its source code, it seems the application consisted mostly of downloading multiple pages, with another minor part taking care of parsing the HTML and extracting some location paths and attributes for the images. At least, this was the part being tested in his benchmark. I found it very odd that the Elixir version would finish in about 15s while the Ruby version would take 27s to complete. After all, this wasn’t a CPU-bound application but an I/O-bound one. I would expect the same design, implemented for this kind of application, to take about the same time in whatever chosen language. Of course, the HTML parser or the HTTP client implementations used in each language could make some difference, but the Ruby implementation took almost twice the time taken by the Elixir implementation. I was pretty confident it had to be a problem with the design rather than a difference in raw performance between the languages.
I had been preparing a deploy for the past two weeks, which happened last Friday. Then on Friday I decided to take a few hours to understand what the test mode was really all about, and rewrote the Ruby application with a proper design for this kind of application, keeping Ruby’s limitations (especially MRI’s) in mind, with a focus on performance.
The new implementation can be found here on Github.
Feel free to give it a try and let me know if you can think of any changes that could potentially improve the performance in any significant way. I have a few theories myself, like using a SAX parser rather than performing the full parsing, among a few other improvements I can think of, but I’m not really sure whether the changes would be significant, given that most of the time is actually spent on network data transfer over a slow connection (about 10 Mbps in my case) when compared to the time needed to parse those HTMLs.
So, here are the numbers I get with a 10 Mbps Internet connection and an AMD Phenom II X6 1090T, with 6 cores at 3.2GHz each:
As I suspected, they perform about the same. JRuby needs 1.8s just to boot the JVM (measured with time jruby --dev -e ''), which means it actually takes about the same as MRI if we don’t take the boot time into consideration (which is usually the case when the application is a long-lived daemon like a web server).
For JRuby, threads are used to handle concurrency, while in MRI I was forced to use a pool of forked processes to handle the HTML parsing and to write a simplified Inter-Process Communication (IPC) technique, which is suitable for this particular test case but may not apply to others. Writing concurrent code in Ruby could be easier; with MRI it’s especially hard once you want to use all cores, because I find it much easier to write multi-threaded code than to deal with forked processes and custom IPC, which is not as trivial to write as threads sharing the same memory. You are free to test the performance of other approaches in MRI, like the threaded one, or always forking rather than using a pool of forked processes, and to change the number of workers both for the downloader and for the forked pool (I use 6 processes in the pool that parses the HTML since I have 6 cores in my CPU).
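As an illustration of the fork-plus-pipe idea (a simplified sketch of my own, not the downloader’s actual code), the CPU-bound parsing runs in a child process and the result travels back through a pipe via Marshal:

require 'nokogiri' # assuming Nokogiri does the CPU-bound parsing

html = '<html><body><img src="/a.png"><img src="/b.png"></body></html>'

reader, writer = IO.pipe
pid = fork do
  reader.close
  # the parsing runs in the child process, on its own core, free of the GIL
  srcs = Nokogiri::HTML(html).css('img').map{|img| img['src'] }
  Marshal.dump srcs, writer
  writer.close
end
writer.close
p Marshal.load(reader.read) # => ["/a.png", "/b.png"]
Process.wait pid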
I have always been disappointed by the sad state of real concurrency in MRI due to the GIL. I’d love to have a switch to disable the GIL completely so that I would be able to benchmark the different approaches (threads vs forks). Unfortunately, this is not possible: MRI has the GIL, and JRuby doesn’t handle forking well. Also, Nokogiri does not perform the same in MRI and JRuby, which means there are so many other variables involved that running an application using forks in MRI cannot really be compared to running it on JRuby using the multi-threaded approach; the difference in design is not the only difference at play.
When I really need to write some CPU-bound code that would benefit from running on all cores, I often do it in JRuby, since I find it easier to deal with threads than to spawn processes. Once I had to create an application similar to Akita’s Manga Downloader in test mode, and I wrote about how JRuby saved my week exactly because it enables real concurrency. I really think the MRI team should take real concurrency needs more seriously, or MRI might become irrelevant in the languages and frameworks war. Ruby usually gives us options, but we don’t really have an option for dealing with concurrent code in MRI, as the core developers believe forking is just fine. Since Ruby usually strives for simplicity, I find this awkward, as it’s usually much easier to write multi-threaded code than to deal with spawned processes.
Back to the timing comparison between the Elixir and Ruby implementations: of course, I’m not suggesting that Ruby is faster than Elixir. I’m pretty sure the design of the Elixir implementation can be improved as well to get a better time. I’m just demonstrating that for this particular use case of I/O-bound applications, raw language performance usually does not make any difference given a proper design. The design is by far the most important factor when working on performance improvements for I/O-bound applications. It’s also important for CPU-bound applications, of course, but raw performance is often irrelevant for I/O-bound applications, while the design is essential.
There are many features one can use to sell another language, but we should really avoid the trap of comparing raw performance, because it hardly matters for most of the applications web developers work with, if they are the target audience. I’m pretty sure Elixir has great selling points, just like Rust, Go, Crystal, Mirah and so on. I’d be more interested in learning about the advantages of their ecosystems (tools, people, libraries) and how they allow writing well-designed software in a better way. Or how they excel at exception handling. Or how easy it is to write concurrent and distributed software with them. Or how robust and fault-tolerant they are. Or how they can help achieve zero downtime during deploys, or how fast applications boot (one of the raw performance cases that can matter). How well documented they are and how amazing their communities are. How easily one can debug and profile applications in these environments, test something in a REPL, write automated tests, or manage dependencies. How well autoreloading works in development mode, and so on. There are so many interesting aspects of a language and its surrounding environment that I find it frustrating every time I see someone trying to sell a language by comparing raw performance, as it often doesn’t matter.
Look, I’ve worked with fast hard real-time systems (running on Linux with real-time patches such as Xenomai or RTAI) during my master’s thesis, and I know that raw performance is very important for a broad set of applications, like robotics, image processing, gaming, operating systems and many others. But we have to understand whom we are talking to. If the audience is web developers, raw performance simply doesn’t matter that much. This is not the feature that will determine whether your application will scale to thousands of requests per second. Architecture/design is.
If you are working with embedded systems or hard real time systems it makes sense to use C or some other language that does not rely on garbage collectors (as it’s hard to implement a garbage collector with hard timing constraints). But please forget about raw performance for the cases where it doesn’t make much difference.
If you know someone who got a degree in Electrical Engineering, like me, and ask them, you’ll notice it’s pretty common to perform image processing in Matlab, an interpreted language and environment for prototyping algorithm designs. It’s focused on matrix operations, which are pretty fast since they are compiled and optimized. This allows engineers to quickly test different designs without having to write each variation in C. Once they are happy with the design and performance of the algorithm, they can go a step further and implement it in C, or use one of the Matlab tools that try to perform this step automatically.
Engineers are very pragmatic. They want to use the best tools for their jobs. That means a scripting language should be preferred over a static one during the design/prototype phase, as it allows a faster, more iterative feedback loop. Sometimes the performance they get with Matlab is simply fast enough for their needs. The same happens with Ruby, Python, JS and many other languages. They can be used for prototypes, or they may be enough for the actual application.
Also, one can start with them, and once raw performance becomes a bottleneck, convert that part to a more efficient language and use some sort of integration to delegate the expensive parts to it. If many parts of the application would require such an approach, then it becomes a burden to maintain, and one might consider moving the complete application to another language to reduce the complexity.
However, this is not my experience with web applications in all the years I’ve been working as a web developer. Rails usually takes about 20ms per request, as measured by nginx in production, while DNS, network transfer, JS and other related jobs may take a few seconds, which means the 20ms spent in the server is simply irrelevant. It could be 0ms and it wouldn’t make any difference to the user experience.
We use the awesome Sensu monitoring framework to make sure our application works as expected. Some of our checks use a headless browser (PhantomJS) to explore parts of the application, like exporting search results to Excel or making sure no error is thrown from JS in our Single Page Application. We also use NewRelic and Pingdom to get some other metrics.
But since PhantomJS acts like a real browser, our checks will have influence over the RUM metrics we get from NewRelic, but we’re not really interested in such metrics. We want the metrics from real users, not our monitoring system.
My initial plan was to check whether I could filter some IPs from the RUM metrics, and I asked NewRelic support about this possibility; they said it’s not supported yet, unless you want to filter specific controllers or actions.
Since some monitoring scripts have to go through real actions, this was not an option for us. So I decided to take a look at the newrelic_rpm gem and came up with a solution that I’ve confirmed is working fine for us.
Since we have a single-page application, I simply added the before-action filter to the main action, but you may adapt it to use in your ApplicationController if you wish. This is what I did:
class MainController < ApplicationController
  before_action :ignore_monitoring, only: :index if defined? ::NewRelic

  def index
    # ...
  end

  private

  def ignore_monitoring
    return unless params[:monitoring]
    ::NewRelic::Agent::TransactionState.tl_get.current_transaction.ignore_enduser!
  rescue => e
    logger.error "Error in ignore_monitoring filter: #{e.message}\n#{e.backtrace.join "\n"}"
  end
end
The rescue clause is there in case the implementation of newrelic_rpm changes and we don’t notice it. We decided to send a “monitoring=true” param with the requests performed by our monitoring scripts. This way we don’t have to worry about managing and updating a list of monitoring servers and figuring out how to update that list in our application without incurring any downtime.
But in case you want to deal with this somehow, you might be interested in testing “request.remote_ip” or “request.env[‘HTTP_X_FORWARDED_FOR’]”. Just make sure you add something like this to your nginx config file (or a similar trick for your proxy server if you’re using one):
location ... {
  proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
I’ve been using Sequel in production since April 2012, and I still think this is the best decision I’ve made so far for the whole project lifetime.
I had played with it at times in past years, before Arel had been added to ActiveRecord, and I found its support for lazy queries amazing. Then I spent a few years working with Java, Groovy and Grails when I changed jobs in 2009, but kept reading Ruby (and Rails) news, until I found out that AR had added support for lazy queries through Arel when Rails 3 was released. I then assumed AR would be a better fit than Sequel, since it’s already integrated with Rails and lots of great plug-ins would support it better.
I was plain wrong! In 2011 I changed jobs again to work on another Grails application. After finding a bug with no fix or workaround available, I decided to create a Rails application to forward the affected requests to. So, in April 2012 I started to create my Rails app and its models using ActiveRecord. A week later I moved all models from ActiveRecord to Sequel and I have been happy ever since.
Writing some queries with ActiveRecord was still a pain, while Sequel made it a joy. The following sections go through each topic where I find Sequel an improvement over AR.
These days I decided to recreate a few models with ActiveRecord so that we could use an admin interface with the activeadmin gem, since it doesn’t support Sequel. After a few requests to the admin interface, it stopped responding, with timeout errors.
Then I decided to write some code to test my suspicions and run it in the console:
pool_size = ActiveRecord::Base.connection_pool.size
(pool_size + 1).times{ Thread.start{ AR::Field.count }.join }
This yielded a timeout error in the last run. This didn’t happen with my Sequel models:
pool_size = Sequel::Model.db.pool.size
(pool_size + 1).times.map{ Thread.start{ Field.count } }.each &:join
Notice that I don’t even need the join call inside the block for it to work, since the count call is so much faster than the timeout settings.
The curious thing is that I didn’t get any timeout errors when using activeadmin with a regular Rails application, so I investigated what was so special about it that I could access the admin interface as many times as I wanted and it would never time out.
I knew the main difference between my application and a regular Rails application was that I only required active_record, while Rails requires active_record/railtie. So I decided to take a look at its content and found this:
config.app_middleware.insert_after "::ActionDispatch::Callbacks",
  "ActiveRecord::ConnectionAdapters::ConnectionManagement"
So I found that AR was playing a trick here, delegating the pool management to the web layer by always clearing active connections from the pool after the request was processed in that middleware:
ActiveRecord::Base.clear_active_connections! unless testing
Despite the name, clear_active_connections! seems to actually only close, and check back into the pool, the single current connection, whose id is stored in a thread-local variable, from my understanding after glancing over AR’s pool management source code. That means that if the request’s main thread spawns a new thread, any connection checked out in the new thread won’t be automatically collected by Rails, and your application will start to throw timeout exceptions while waiting for a connection to become available in the pool, for no obvious reason, unless you understand how the connection pool works in AR and how it’s integrated into Rails. Here’s an example:
class MainController < ApplicationController
  def index
    Thread.start{ Post.count }
    head :ok
  end
end
Try running this controller using a single server process 6 times (assuming the pool size is the default of 5 connections). This should fail:
ab -n 6 -c 1 http://localhost:3000/main/index
That means the user is responsible for closing the connection, checking it back into the pool before the thread terminates. This wouldn’t be a concern if Post were a Sequel model.
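A minimal sketch of what that responsibility looks like (assuming the pool API of the AR versions discussed here): the spawned thread checks a connection out and back in explicitly:

class MainController < ApplicationController
  def index
    Thread.start do
      # the connection is checked back into the pool when the block finishes,
      # so the pool doesn't leak one connection per spawned thread
      ActiveRecord::Base.connection_pool.with_connection{ Post.count }
    end
    head :ok
  end
end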
Then I recalled this article from Aaron Patterson.
Update note: it seems this specific case will be fixed in ActiveRecord 4.2 due to the automatic connection check-in upon dead threads strategy implemented in pull request #14360.
The main reason I left AR for Sequel was the need for joining the same table multiple times with different aliases for each joined table. Take a look at this snippet from this sample project:
module Sq
  class Template < Sequel::Model
    one_to_many :fields

    def mapped_template_ids
      FieldMapping.as(:m).
        join(Field.named(:f), id: :field_id, template_id: id).
        join(Field.named(:mf), id: :m__mapped_field_id).
        distinct.select_map(:mf__template_id)
    end
  end
end
I still don’t know how to write such a query using AR. If you do, please comment on how to do so without resorting to plain SQL or Arel, which is considered an internal implementation detail of AR whose API could change anytime, even in a patch release.
as and named are not part of Sequel::Model; they are implemented as a plug-in. See the next section.
Although it’s not a strong reason to move to Sequel, since it’s easily implemented with regular Ruby modules in AR, it’s nice to have such a built-in API for extending models:
module Sequel::Plugins::AliasSupport
  module ClassMethods
    def as(alias_name)
      from named alias_name
    end

    def named(alias_name)
      Sequel.as table_name, alias_name
    end
  end
end
Sequel::Model.plugin :alias_support
Sequel does support composite primary keys, which are especially useful for join tables, while ActiveRecord requires a single unique column as the primary key.
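For illustration, a small sketch with a hypothetical join table:

# composite primary key on a hypothetical join table
class TemplateField < Sequel::Model(:templates_fields)
  set_primary_key [:template_id, :field_id]
end

TemplateField[template_id: 1, field_id: 2] # lookup by the composite key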
It seems lots of people don’t find AR’s API good enough, because they keep monkey patching it all the time. I try very hard to avoid depending on a library that relies on monkey patching something, especially AR, since it’s always changing its internal implementation.
So, with every major and minor Rails release we often find gems that stopped working due to such internal changes. For example, activeadmin stopped working with the Rails 4.1.0.beta1 release even though the public AR API remained the same.
Working on code that relies on monkey patching AR takes so much time that Ernie Miller, after several years of trying to provide improvements over AR, gave up.
Not surprisingly, one of the gems he used to maintain, polyamorous, was the reason why activeadmin stopped working with the latest Rails release.
I never felt the need for monkey patching Sequel’s classes.
Sequel’s documentation is awesome! That was the first thing I noticed when I moved from AR to Sequel. Arel is considered an internal implementation detail and AR users are not supposed to rely on Arel’s API, which makes AR’s API much more limited, besides being badly documented.
Sequel’s mailing list has awesome support from Jeremy Evans, the gem maintainer. As for AR, there’s no dedicated list for it and one has to subscribe to a Rails related list to discuss AR stuff.
I like to keep concerns separate, and I can’t think of a reason why an ORM solution should be attached to a web framework implementation. If Rails ships great features in a new release with regards to action handling, I shouldn’t be forced to upgrade the ORM library at the same time I upgrade Rails.
Also, if a security fix affects AR only, why should a new Rails version be released?
Often AR will introduce incompatibilities in new versions, while I haven’t seen this happening with Sequel yet for the features I use. Also, I’m free to upgrade either Rails or Sequel any time.
Of course, this doesn’t apply to ORM solutions only; it’s also valid for mail handling, but that is another topic, so I’ll focus on the Sequel vs AR comparison only.
Sometimes it doesn’t make sense to create a model for each table. Sequel’s database object allows you to easily access any table directly while still supporting all dataset methods like you’d do with Sequel models:
DB = Sequel::Model.db # or Sequel.connect 'postgres://localhost/my_database'
mapped_template_ids = DB[:field_mappings___m].
  join(:fields___f, id: :m__field_id, template_id: 1).
  join(:fields___mf, id: :m__mapped_field_id).
  where(f__deleted: false, mf__deleted: false).
  distinct.select_map(:mf__template_id)
AR’s philosophy is to delegate constraints to the application’s model layer, while Sequel prefers to implement all constraints at the database level, when possible/viable. I’ve always agreed that we should enforce all constraints at the database level, but this isn’t common among most AR users. AR migrations don’t make it easy to create a foreign key properly using the DSL, for example, treating foreign keys as second-class citizens, as opposed to Sequel’s philosophy.
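As a sketch of the difference (table names are hypothetical), Sequel’s migration DSL makes the foreign key part of the table definition:

Sequel.migration do
  change do
    create_table(:fields) do
      primary_key :id
      # the constraint lives in the database, not only in the model layer
      foreign_key :template_id, :templates, null: false, on_delete: :cascade
      String :name, null: false
    end
  end
end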
The only RDBMS database solution I currently use is PostgreSQL and I really want to use several features that are only supported by PostgreSQL. Sequel’s PG adapter allows me to use those features if I want to, even knowing that it won’t work for other database vendors.
This includes recursive transactions through save-points, options to drop temp table on commit and so on.
Another example: AR 4.1.0.beta1 introduced support for enums, in a database independent way.
I’d much prefer to use PostgreSQL’s enum type for things like that, which comes with database-side built-in validations/constraints.
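Here’s a hedged sketch of what that could look like in a Sequel migration (the field_status type and columns are hypothetical), falling back to plain SQL for the type itself:

Sequel.migration do
  up do
    run "CREATE TYPE field_status AS ENUM ('draft', 'published', 'archived')"
    alter_table(:fields){ add_column :status, :field_status, default: 'draft' }
  end
  down do
    alter_table(:fields){ drop_column :status }
    run "DROP TYPE field_status"
  end
end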
Also, although you can manage association cascades on the application side using this plugin with Sequel, you’d usually be advised to perform such cascade operations at the database level when creating the foreign keys, for instance. And when a database trigger takes care of an after/before hook better than application code would, you should not be afraid of taking advantage of it.
With PostgreSQL’s support for save-points in transactions, I can set up RSpec to allow transactional before/after(:all) blocks in addition to the before/after(:each) ones. This allows me to save quite some time, as I can create several database records in a context that is then shared among several examples, instead of recreating them every time. RSpec’s support for this is not great (like having a let global variant over the context), but it’s not hard to get this set-up working well enough, speeding up my test suite a lot.
And it’s pretty easy to use Sequel’s core support for nested transactions so that I can be sure that the database state will be always consistent before each example is run.
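The per-example part of that set-up can be sketched like this (assuming DB is the Sequel database object; the before/after(:all) outer transactions take a bit more plumbing that I’m omitting):

RSpec.configure do |config|
  config.around(:each) do |example|
    # run each example inside a savepoint that is always rolled back, so
    # records created by an outer (per-context) transaction survive across
    # examples while each example stays isolated
    DB.transaction(savepoint: true, rollback: :always){ example.run }
  end
end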
I strongly believe a database’s schema change should be handled by a separate project, instead of inside an application using the database. More applications may use the same database at some point and it makes sense that managing your database should be handled by a separate application.
I still don’t have a favorite migrations solution, as each of them has its pros and drawbacks.
I’m still using AR’s migrations for historical reasons, as I used the standalone_migrations gem in a separate project even when my application was written only in Grails and the Rails app didn’t exist yet. Since standalone_migrations only supports the AR 3.x branch, and I was interested in some features from AR 4, I created another gem, called active_record_migrations, to be able to use AR 4 migrations support in stand-alone mode.
I much prefer Sequel’s DSL for writing migrations, as it supports more things in an easier way than AR’s migrations. Also, I’m allowed to use any dataset methods from a migration, instead of having to write everything not supported by the DSL as plain SQL queries.
On the other hand, AR, since version 4, allows us to have a reversible block inside a change method, which can be quite useful.
AR provides a good migration generator, which Sequel lacks, and it can be very helpful when creating new migrations.
I didn’t create any specific performance tests to compare both ORM solutions, but I do remember that my specs ran much faster when I migrated from AR to Sequel, and I’ve also heard from other people that Sequel is faster for most use cases, in MRI at least.
I really like to have control over the generated SQL and a good ORM solution for me is one that will allow me to have better control over it. That’s why I don’t like the Hibernate’s HQL language.
The database should be your friend and if it supports some functions or syntax that would help you why not use them?
Sequel allows me to use nearly all features available through its DSL from my database vendor of choice: PostgreSQL. It also provides me easy access and documentation to use all kind of stuff I can do with plain SQL like “ilike” expressions, sub-queries, nested transactions, import data from file, recursive queries, Common Table Expressions (WITH queries) and so on.
First, I’d like to say that most of Sequel DSL actually supports multiple database vendors.
But I only find that useful if you’re writing some kind of plug-in or library that should not depend on a single database vendor. But that’s not the case for general use applications.
Once you opt for some database vendor in your application, you shouldn’t have to worry about supporting other database vendors.
So, someone might ask why use any ORM solution at all if you’re fine with writing plain SQL?
There are many reasons for that. First, most plug-ins expect some Ruby interface to deal with, instead of SQL. This is the case with FactoryGirl, Devise and so on. But this is not the main reason.
An ORM provides lots of goodies, like an easy-to-use API to create and update records, automatic typecasting, creating transactions and much more. But even this is not the main reason for me to prefer an ORM over plain SQL.
The main reason for me is the ability to easily compose a query in a way that is easy to read and maintain, especially when parts of the query depend on the user requesting it or on some controller param. It’s great that you can change a query on the fly, like this:
fields_dataset = Field.where(template_id: params[:id])
fields_dataset = fields_dataset.exclude(invisible: true) unless current_user.admin?
# ...
When a generic query is performed, Sequel will return rows as hashes whose keys are the column names converted to symbols. This may be a problem if you generate queries dynamically and alias them based on some table id that depends on user input. If enough ids are queried, Sequel may create lots of symbols that will never be garbage collected.
The lack of built-in migration generators for Sequel migrations makes creating new migrations a less than ideal task. You may create some custom rake task to aid with migration creation, and it shouldn’t be complicated, but having that support built into Sequel core would certainly help.
The main drawback of Sequel is certainly the lack of native support from other great gems like Devise, ActiveAdmin and Rails itself. Quite a few useful Rails plug-ins will only integrate with ActiveRecord.
Most of my server-side tasks involve querying data from an RDBMS database and serving JSON representations to the client-side API. So, an ORM solution is a key library for me.
And I couldn’t be happier with all the goodness I get from Sequel, which gets out of my way when querying the database, in contrast with ActiveRecord, with which I used to spend a lot of time trying to figure out whether some kind of query was possible at all.
Thanks, Jeremy Evans, for maintaining such a great library and being so responsive in the mailing list! I really appreciate your efforts, documentation and Sequel itself.
Also, thank you for kindly reviewing this article, providing insightful improvements over it.
Finally, if you’re interested in getting started with Sequel in a Rails application, I published another article on the subject in April 2012.
Important update: after I wrote this article, I tried to put it to work in my real application and noticed that it can’t really work the way I described, due to objects referenced only on the DRb client side being garbage collected on the DRb server side, since no references to them are kept on the server. I’m keeping this article anyway to explain the idea, in the hope we can find a way to work around the memory management issue at some point.
In a Ruby application I maintain, we have the requirement of exporting some statistics to XLS (not XLSX), and we had to modify an XLS template to do that.
After searching the web I couldn’t find a Ruby library that would do the job, but I knew I could count on the Apache POI java library.
MRI Ruby doesn’t have native support for using Java libraries so we have to either use JRuby or some Inter-Process Communication (IPC) approach (I consider hosting a service over HTTP as another form of IPC).
I’ve already used JRuby to serve my web application in the past, with some good results, but our application is currently running fine on MRI Ruby 2 and I don’t want to use JRuby in deployment only to enable the use of Java libraries. Sometimes we re-run stress tests to measure the throughput of our application with several deployment strategies, including JRuby instead of MRI in threaded mode (vs the multi-process and multi-threaded approaches with MRI), testing several web servers for each Ruby implementation.
The last time we ran our stress tests, Unicorn was a bit faster at serving our pages than JRuby on Puma, but that wasn’t the main reason we chose Unicorn. We had some issues with some connections to PostgreSQL with JRuby at that time and we didn’t want to investigate further, especially when we didn’t notice any advantages in the JRuby deployment back then.
Things may have changed today, but we don’t plan to run another battery of stress tests in the short run… I just wanted to find another way of accessing Java libraries that wouldn’t attach our application to JRuby in any way. Even when we used to deploy with JRuby, all our code ran in MRI, and we used MRI to actually run the tests and also in development mode, since it’s much faster to boot and allows faster testing through some forking techniques (spork, zeus, etc.).
I didn’t want to add much overhead either, by providing some HTTP service. The overhead is not only in the payload but also in the development work-flow.
What I really wanted was just a bridge that would allow me to run Java code from MRI Ruby, since I’m more comfortable writing code in Ruby and my tests run faster on MRI than on JRuby.
So, the obvious choice (at least for me), was to try DRb.
Even after deciding on DRb, you may implement the service with multiple approaches. The simplest one is probably to write the service in JRuby and only access the higher-level interface from the MRI application. That works, but I wanted to avoid this approach for a few reasons.
So, I wanted to test another minimal approach that would only allow us to perform any generic JRuby programming directly from MRI.
Note: for this section, I’m assuming JRuby is being used. With RVM that means “rvm jruby”.
Christian Meier did a great job with jbundler, a tool similar to Bundler, that will use a Jarfile instead of the Gemfile to specify the Maven dependencies.
So, basically, I created a new Gemfile with bundle init and added a gem ‘jbundler’ entry to it.
Then I created a Jarfile with this content: jar ‘org.apache.poi:poi’. Run bundle exec jbundle and you’re ready to go. Running jbundle console will provide an IRB session with the Maven libraries available.
To create a script, you add a require ‘jbundler’ statement and you can now run it with bundle exec ruby script-name.rb.
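Putting those pieces together, the two files look something like this (the RubyGems source line is my assumption):

# Gemfile (created with "bundle init")
source 'https://rubygems.org'
gem 'jbundler'

# Jarfile
jar 'org.apache.poi:poi'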
So, this is what the JRuby server process looks like:
# java_bridge_service.rb:

POI_SERVICE_URL = "druby://localhost:8787"

require 'jbundler'
require 'drb/drb'
require 'ostruct'

class JavaBridgeService
  def run(code, _binding = nil)
    _binding = OpenStruct.new(_binding).instance_eval{ binding } if _binding.is_a? Hash
    result = if _binding
      eval code, _binding
    else
      eval code
    end
    result.extend DRb::DRbUndumped if result.respond_to? :java_class # like byte[]
    result
  end
end

puts "listening to #{POI_SERVICE_URL}"
service = DRb.start_service POI_SERVICE_URL, JavaBridgeService.new

Signal.trap('SIGINT'){ service.stop_service }

DRb.thread.join
This is all you need to run arbitrary Ruby code from MRI. Since this makes use of eval, I’d strongly recommend running this server in a sandboxed environment.
I won’t show the full classes we have for communicating with the server, since they are implementation details and people will want to organize them in different ways. Instead, here is some scripting code that you may want to run in an IRB session to test the set-up:
require 'drb/drb'

DRb.start_service

service = DRbObject.new_with_uri 'druby://localhost:8787'

[
  'java.io.FileInputStream',
  'java.io.FileOutputStream',
  'java.io.ByteArrayInputStream', # needed by the export step below
  'java.io.ByteArrayOutputStream',
  'org.apache.poi.hssf.usermodel.HSSFWorkbook',
].each{|java_class| service.run "import #{java_class}" }

workbook = service.run 'HSSFWorkbook.new FileInputStream.new(filename)',
  filename: File.absolute_path('template.xls')

sheet = workbook.sheet_at 0
row = sheet.create_row 0
# row.create_cell(0) would display a warning on the server side, since JRuby
# can't know whether you want the short or the int method signature
cell = service.run 'row.java_send :createCell, [Java::int], col', row: row, col: 0
cell.cell_value = 'test'

# export it to binary data
result = service.run 'ByteArrayOutputStream.new'
workbook.write result

# ruby_data is what you would pass to send_data in controllers:
ruby_data = service.run('ByteArrayInputStream.new baos.to_byte_array', baos: result).to_io

# or, if you want to export it to some file:
os = service.run 'FileOutputStream.new filename', filename: File.absolute_path('output.xls')
workbook.write os
By using such a generic Java bridge, we’re able to use several good Java libraries directly from MRI code.
If you’re having any issues with trying that code (I haven’t actually tested the code in this article), please leave a note in the comments and I’ll fix the article. Also, if you have any questions, create a comment and I’ll try to help you.
Or just feel free to thank me if this helped you ;)
A while ago I wrote an article explaining why I don’t like Grails. By that time I had been doing Grails development daily for almost 2 years. Some statements there are no longer true, and Grails has really improved a lot since 2.0.0. I still don’t like Grails, for many more reasons I didn’t find the time (or interest) to write about.
Almost 2 years ago I went back to Rails programming, and the application I currently maintain is a mix of Grails, Rails and Java Spring working together. I feel it is now time to reflect on what I like and what I don’t in Rails.
I’ve been working solely on single-page applications since 2009. All opinions reflected here apply to that kind of application, although some of them will apply to any web application. This is also what I consider the current tendency for web applications, like Twitter, Facebook, Google+, GMail and most applications I’ve seen out there.
When designing such applications, one doesn’t make heavy use of server-side views (ERB, GSP, JSP, you name it) but usually renders the views on the client side, although some will prefer to render partial content generated on the server. In the applications I’ve written in those 4 years, in different companies and products, I’ve been mostly rendering the views on the client side, so also keep that in mind when reading my review.
Basically, I only render a single page on the server side and have plenty of JavaScript (or CoffeeScript) files referenced by this page, usually concatenated into a few JavaScript files for production usage.
I’d say the feature I like most in Rails is undoubtedly the Rails Asset Pipeline. It is an assets processor that uses sprockets and some conventions to help us declare our assets’ dependencies, split them across several files and mix different related languages that basically compile to JavaScript and CSS. Examples of languages supported out of the box are CoffeeScript and SCSS, which are better versions (in my opinion, of course) of JavaScript and CSS.
This tool takes away most of the pain I have with JavaScript. The main reason I hate JavaScript is the lack of an import (or require) statement to make it easier to write modular code. This is changing in ES6, but it will take a while before all target browsers support such a statement. With the Asset Pipeline I don’t have to worry about it, because I can use such “require” statements in comments that are processed by the Asset Pipeline, without having to resort to bad techniques like AMD (my opinion, of course).
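Those directives are plain comments at the top of a manifest file; something like this (the paths are hypothetical):

// app/assets/javascripts/application.js
//= require jquery
//= require_tree ./modules
//= require_self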
The Asset Pipeline is also well integrated with the routing system.
Booting a Rails application may take a few seconds, so you can’t just load the entire application on each request as you used to do in the CGI era; it would slow down development a lot. Being able to automatically reload your code for a faster development experience is a great tool provided by Rails. It is far from simple to implement properly, and people often overlook this feature because it has always worked great for most people. Creating an automatic-reloading framework for other languages can be even harder. Take a look at what some Java reloading frameworks are doing if you don’t believe me.
This is supported by most frameworks nowadays, but I always wanted this feature when I used to create web sites in Perl long ago. Still, not all frameworks make it easy to get a “site map” and see all your application routes at once.
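Rails, for instance, gives you that site map with a single command:

rake routes # lists every route with its helper, HTTP verb and controller#action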
Rails is the main reason the genius Yehuda Katz decided to create Bundler, the best dependency management tool I know of. Bundler is independent from Rails, but I’d say Rails deserves the credit for inspiring Yehuda to create it (I may be wrong, of course). Ruby had RubyGems for a long while, but it suffered from the same problems as Maven.
Without a tool like Bundler you have two options: always specify the exact version of the libraries you depend on (like Maven users often do), or be prepared to face the issues that arise when loose version requirements get resolved differently at different times, as used to happen to RubyGems users.
Bundler stores a snapshot of the currently resolved gems in a file called Gemfile.lock, so that it is possible to replicate the entire set of gem versions in production or on another developer’s computer without having to specify exact version matches in your dependency file (Gemfile).
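So a Gemfile can keep loose requirements like the sketch below (the gem choices are just examples of mine), while Gemfile.lock pins the exact versions that were resolved:

source 'https://rubygems.org'
gem 'rails', '~> 3.2' # any 3.2.x satisfies this
gem 'sequel'          # no version constraint at all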
I don’t write integration tests in Grails, because it is too slow to boot the entire framework when I only want to test my domain classes (models, in Rails terminology). Writing integration tests in Rails is certainly slower than writing unit tests, but it is feasible, because Rails boots in a few seconds in the application I maintain. So it is okay to write some integration tests in Rails. I used to use Capybara to write tests for views/controllers interaction, but I ended up giving up on that approach, preferring to write JavaScript specs to test my front-end code in a much faster way, simply mocking jQuery.ajax using my own testing frameworks, oojspec and oojs.
For simple integration tests that only touch the database, I don’t even need to load the entire Rails application, which is much faster. I find this flexibility really awesome; it makes test writing a much more pleasant task.
Other tools that help with writing tests in Rails apps are RSpec and FactoryGirl, among many others. Most of them can be used outside of Rails’ scope, but when comparing Rails to non-Ruby web frameworks, it is worth pointing out that writing web applications with Rails makes automated testing an easier task than in other languages.
The Rails guides are really fantastic and cover most of the common tasks you need when programming a web application with Rails. Also, anyone is free to commit changes to the guides through the public docrails repository, and that seems to work great. I suggested this approach to the Grails core developers a while ago, and it seems it is working great for them as well, as their documentation has improved a lot since then.
Besides the guides, there are plenty of resources about Rails on-line, many of them free. There are books (both print and e-books, paid or free), tutorials and several articles covering many topics of web programming in the context of a Rails application. There are even books focused on testing applications, like The RSpec Book, by David Chelimsky. I haven’t found any books focused on testing for Grails or Groovy applications, for instance. And I only know about one book focused on JavaScript testing, by Christian Johansen, the author of Buster.js and Sinon.js and one of the maintainers of the Gitorious project.
Rails has a solid community behind it. There are several Rails committers applying many patches every day, and the framework seems stronger than ever. You’ll find many useful gems for most tasks you can think of. They’re usually well integrated with Rails, and you may have a hard time if you decide to use another Ruby web framework.
Most of the gems are hosted on GitHub, which is part of the Rails culture, I’d say. That helps a lot when contributing back to those gems, adding new features or fixing bugs. And although pull requests are usually merged pretty fast, you don’t even have to wait for the merge. You can just instruct Bundler to get the gem from your own fork on GitHub, and that is amazing (I wasn’t kidding when I said Bundler is the best software management tool I’m aware of).
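In the Gemfile that looks like this (the user and branch names here are hypothetical):

gem 'polyamorous', github: 'myuser/polyamorous', branch: 'fix-rails-4-1'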
Despite all the critical security holes found in Rails and other Ruby libraries/gems that popped up recently, Rails takes security very seriously. Once security issues are found, they’re promptly fixed and publicly communicated so that users can upgrade their Rails applications. I’m not used to seeing this attitude in most other frameworks/libraries I’ve worked with.
Rails also employs some security enhancements to web applications out-of-the-box by default, like CSRF protection and provides a really great security guide that everyone should read, even non-Rails developers.
Even though Rails is currently my favorite web framework, it is not perfect. As a matter of fact, there are many things I don’t like in Rails, and that is what this section is all about, as well as the main motivation for writing this article. The same can be said about Ruby, which is my preferred language but also has its drawbacks; not exactly Ruby the language, but the MRI implementation. I’ll go into details in the proper section.
Rails is not only a web framework and this is really bad from my point of view.
Rails’ release strategy is to keep the version of all its major components the same. So, when Rails 3.2.12 is released, it will also release ActiveRecord 3.2.12, ActiveSupport 3.2.12, ActionPack 3.2.12, etc. Even if it is a single security fix in ActiveRecord, all components will have their version increased. This also forces you to upgrade your ORM if you decide to upgrade your web framework.
ActiveSupport should be maintained in a separate repository for instance as it is completely independent from Rails. The same should be true for ActiveRecord.
The ORM is a critical part of a web application built on top of an RDBMS. It doesn’t make any sense to me to assume it is part of a web framework. It is not. Their concerns are totally orthogonal (or at least they should be). So, what happens if you want to upgrade your web framework to make use of a new feature like streaming support? What if the newest ActiveRecord bundled with the latest Rails release has incompatible changes in its API? Why should you be forced to upgrade ActiveRecord when you’re only interested in upgrading Rails, the web framework?
Or, what if you love ActiveRecord but are not developing web applications or you’re using another web framework? Why would you have to contribute to Rails repository when you want to contribute to ActiveRecord? Or why don’t you have a separate discussion list for ActiveRecord? A separate site and API documentation?
I solved this problem myself a while ago by replacing ActiveRecord with Sequel and disabling AR completely in my application. Luckily, I find Sequel has a much better API and a solid understanding of how RDBMS are supposed to be used, and it knows how to take advantage of their features, like transactions, triggers and many others. Sequel will actually advise you to prefer triggers over before/after/around callbacks in your code for many tasks. This is in line with my own feelings about how RDBMS should be used.
Also, for a long while ActiveRecord didn’t support lazy interfaces. Since I stumbled upon Sequel several years ago, I have really loved its API and always used it instead of AR for Ruby scripts that weren’t related to Rails apps. But in my Rails applications I always tried to avoid adding more dependencies, because most gems just assume you’re using ActiveRecord.
But I couldn’t be more wrong. Since I decided to move over to Sequel I never regretted my decision. It is probably one of the best decisions I’ve made in the last few years. I’m pretty happy with Sequel and its mailing list support. The documentation is great and I have great control over the generated queries, which is very important to me as I often need complex queries in my applications. ActiveRecord is simply too way limited.
And even if Arel could help me write such queries, it is badly documented and is considered a private interface, which means I shouldn’t rely on its API when using ActiveRecord, because theoretically AR could change its internal implementation anytime. And the public API provided by AR is simply too poor for the kind of usage I need.
Migrating to Sequel brought other benefits as well. Now the ORM and the web framework can be upgraded independently. For instance, recently there was a security issue found in ActiveRecord which triggered a whole Rails release that I didn’t have to upgrade to, because it didn’t affect Sequel.
Also, I requested a feature in Sequel a while ago and it got implemented and merged in master a day or two after my request. I tested it on my application by just instructing Bundler to use the version on master. Then I found a concurrency issue with the new feature that affected our deployment on JRuby. In the same day I reported the issue it got fixed on master and I could promptly use it without having to change any other bit of my application.
Jeremy Evans is also very kind when replying to questions on Sequel’s mailing list and will provide great, insightful advice once you explain what you’re trying to achieve in your application. He is also very knowledgeable with regard to relational databases. Sequel is really carefully thought out and cares a lot about databases, concurrency and many more details. I couldn’t recommend it more to anyone who cares about RDBMS.
When I first read about Rails, in 2007, my only previous experience with databases was with Firebird, from the time people used Delphi a lot in Brazil. I really loved Firebird, but I knew I would have to find something else, because Firebird wasn’t often used in web applications and I wanted something well supported by the community. I also wanted a free database, so the options were basically either MySQL or PostgreSQL. I wasn’t really much interested in which database to use, since I believed all RDBMS would be essentially the same, and I hadn’t experienced any issues with Firebird. “It all boils down to SQL”, I used to think. So I just did some quick research on the web and found lots of people complaining about MySQL and no one complaining about PostgreSQL. I wasn’t really interested in knowing what people were saying about MySQL, and simply decided to go with PostgreSQL at the time, since I had to choose one.
A few years later I moved to another company that also happened to use PostgreSQL. Then I used it for 2 more years (4 in total). When I changed jobs again, this time the application used a MySQL database. “No problem”, I thought, as I still believed it all boils down to SQL in the end. Man, I was completely wrong!
After a few days working with MySQL, I noticed so many bugs and bad design decisions that, after a year, I decided to finally migrate the database to PostgreSQL.
But among the many good conventions you get when you decide to use Rails, the documentation initially used MySQL in its examples, and lots of people really didn’t have a strong opinion about which database vendor to choose. That led the community that was being formed to adopt MySQL en masse initially.
Fortunately the community now seems to understand that PostgreSQL is a much better database, but I’d still prefer that Rails recommend PostgreSQL in the Getting Started guides.
An example of how bad Rails’ opinions about RDBMSs are is that ActiveRecord doesn’t even support foreign keys, one of the key concepts in relational databases, in its migrations DSL. That means the portable Ruby format of the current database schema is not able to restore foreign keys. Hibernate, the de facto ORM solution for Java-based applications, does support foreign keys. It will even create the foreign keys for you if you declare a belongs-to relationship in your domain classes (models) and ask Hibernate to generate the migration SQL.
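For comparison, Sequel’s migration DSL supports foreign keys directly. Here is a minimal sketch (the table names are made up for illustration):

Sequel.migration do
  change do
    create_table(:line_items) do
      primary_key :id
      # creates a real foreign key constraint in the database:
      foreign_key :order_id, :orders, null: false
      String :description
    end
  end
end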
If your application needs to support multiple database vendors, I’d recommend forgetting about schema.rb and simply running all migrations whenever you want to create a new database (a test db, for instance). If you only have to care about a single DB vendor, like me, then just change the AR schema_format to :sql instead of :ruby. If you don’t care about foreign keys, you’re just plain wrong.
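For a single-vendor application, the change is a one-liner. A minimal sketch (the application module name is hypothetical):

# config/application.rb
module MyApp
  class Application < Rails::Application
    # dump db/structure.sql instead of the portable schema.rb,
    # preserving foreign keys and other DB-specific features:
    config.active_record.schema_format = :sql
  end
end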
I believe David Heinemeier Hansson is a really smart guy, despite what some people might say. I just think he hadn’t focused much on databases before creating Rails, or he wouldn’t have used MySQL. But there are many other right decisions behind Rails, and I find the boom DHH brought to web development frameworks really impressive. People often call him arrogant, among other adjectives. I don’t agree. He has strong opinions about many subjects. So do I, and so do many others. This shouldn’t be seen as impoliteness or arrogance.
People have a similar opinion about Linus Torvalds when he is straight to the point in his phrasing and opinions. He also has strong opinions and a sense of humor that many don’t understand. I just feel people often get easily offended for no good reason these days, which is unfortunate. I have to be extra careful when writing to some lists on the Internet that seem even more affected than the usual ones. I have often received really aggressive responses in a few mailing lists for stating my opinions in direct ways; people consider that rude behavior, while I call it an honest and direct opinion. I’m trying to avoid voicing those opinions in some lists so that people don’t get mad at me.
I really don’t know those people and I don’t have anything against them. Believe it or not, I’m a good person; I have tons of friends, I meet with them very often, and they don’t get offended when I’m direct to the point or when I state my strong opinions, even when they don’t agree with me. With my closest friends (and even some not that close) I would use the expression “after all, I’m not a girl” in a tone of joke, but I can’t say such things on the Internet or people will criticize me to death. “You sexist! What do you have against girls?” Nothing at all; it is just an expression often used with humor, in my city at least… I love my wife, my daughter is about to be born, and I’m pretty excited about that. I just think people take some phrases or expressions too seriously.
If you ever have the chance to talk to my friends, they will tell you I’m not the kind of guy who seeks conflict, but they will also tell you that I have lots of strong opinions and that I’m pretty honest and direct about them. They just don’t find it rude, but healthy. And I expect the same from them.
It is just sad when I find angry responses from Rails core members on the mailing list for no good reason. If I call some Rails behavior stupid, they take it personally and threaten to stop helping me, treating my opinion as a personal attack, as if I were calling them stupid people. I don’t personally know any of them; how could I find any of them stupid? They are probably much smarter than me, but that doesn’t mean I can’t have my own opinions about some decisions behind Rails and find some of them stupid. Others are free to disagree with me and think that my way of thinking is stupid; I won’t take it as a personal attack. I swear.
On the other hand, I find some of their attitudes really bad. For instance, if you ask for some behavior in Rails or any of its components to be changed, some will reply: “send a pull request and we can discuss it. Otherwise we won’t take time to just discuss the ideas with words. Show us code”. I don’t usually see this behavior in most other communities I’ve participated in. That basically means: “we don’t care that you’d spend your valuable time on code that would never be merged into our project because we don’t agree with the underlying ideas”. There are many things that can be discussed without code. Asking someone to invest their time writing code that will be rejected later, when it could have been rejected before, is quite offensive in my point of view.
By the way, that is the reason I don’t spend much time on complex patches to Rails. I did that once, long ago, and I got no feedback from core developers after a while, even after spending a considerable amount of time on the patch and adapting it to many requested changes I didn’t agree with. So I’d say that my experience as a user of many libraries is just great, but that is not usually the case with the Rails core mailing list. Some of those core developers really believe they’re God’s gift to the world, which makes it hard to argue with them on several subjects. And if you state your strong opinion about something, you may be seen as rude and they won’t want to talk to you anymore…
Of course different people will have different experiences, but Rails has not been the friendliest web framework in my particular case. The ruby-core list is a totally different beast and I can’t remember any bad experience talking to Matz, Kosaki, Shugo, Nobu and many others. I also had a great experience on the JRuby mailing list, with Charles Nutter and many others. I’ve already mentioned the great experience with Jeremy Evans on the Sequel mailing list. I just don’t understand why the Rails core team doesn’t seem to tolerate me; I don’t have any personal issues with any of them. But I don’t usually have a great experience there, so I sometimes avoid writing to that list.
Even after publishing my article with my strong (bad) opinions about Grails, I don’t remember any bad experience talking to them on their list. And I know they read my article, as it became somewhat popular in the Grails community and I even got replies from some of the Grails maintainers themselves.
I remember that one of the strong features of Rails 1 was its great API documentation. During the Rails 3 rewrite, lots of great documentation was deleted in the process and either got lost or was moved to the Rails guides.
These days I’ve simply stopped trying to find documentation on the API documentation site, something I used to do a lot in the Rails 1 era. Sadly, its current state is so bad that I find it almost unusable, preferring to find answers on StackOverflow, by asking on mailing lists, by digging into the Rails source code or by other means. If I’m lucky, the information I’m looking for is documented in the guides; otherwise I’ll have to spend some time searching for it.
Rails provides three environments by default: development, production and test. But in all projects I’ve worked on I always had a staging environment as well, and currently our deployment strategy involves even more environments. Very soon we realized that it wasn’t easy to manage all those environments by tweaking so many configuration files: config/database.yml, config/mongo.yml, config/environments/(development|test|production).rb and many others kept popping up. Also, tasks like “rake assets:precompile” use the production environment by default, while most tasks default to development.
Every time we needed to create a new environment it was too much work to manage. So we ended up dropping all those YAML files and simply symlinking config/settings.rb to config/settings/environment_name.rb. We also symlinked all of config/environments/*.rb to point to the same file. The different settings live under config/settings: staging.rb, production.rb, test.rb, development.rb and a few others. We simply symlink the one of interest as config/settings.rb, which is ignored by Git.
The only exception is that test.rb is always used when running tests. That worked out much better for us: it is much easier to create a new environment and have all settings (Redis, Mongo, PostgreSQL, integration URLs and many more) grouped in a single file symlinked as settings.rb. It’s also pretty simple to figure out what needs to be changed, and to base one environment’s settings on another existing environment’s.
For instance, staging.rb requires production.rb and overrides a few settings. This is a much better way of handling multiple environments than the standard approach most Rails applications take, maintaining sparse YAML files alongside assorted Ruby DSLs (like Devise’s and others).
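Here is a minimal sketch of what such settings files could look like; all file and setting names are hypothetical, just to illustrate the convention:

# config/settings/production.rb:
SETTINGS = {
  redis_url: 'redis://prod-redis:6379/0',
  mongo_db:  'myapp_production',
}

# config/settings/staging.rb: reuse production and override a few keys:
require_relative 'production'
SETTINGS.merge!(
  redis_url: 'redis://staging-redis:6379/0',
  mongo_db:  'myapp_staging',
)

# config/settings.rb is just a symlink, ignored by Git:
#   ln -s settings/staging.rb config/settings.rb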
I believe the Grails approach, which allows external Groovy files to override configuration on a per-environment basis, is a better convention to follow than the one suggested by Rails. What is the advantage of YAML(.erb) files over plain Ruby configuration files?
One of the main drawbacks of Rails, in my opinion, is that it waited too long to start thinking seriously about threaded deployment. Threads have long been used successfully by web frameworks in many languages, but for some reason they have been neglected in the Ruby/Rails community.
I believe there are two major reasons for that. First, the Ruby community usually focuses on MRI as the Ruby implementation of choice, and MRI has a global interpreter lock (GIL) that prevents multiple threads running Ruby code from executing in parallel. So, unless your application is IO intensive, you won’t get much benefit from a threaded approach. I blame MRI for this, as its developers don’t really seem to be bothered by the GIL. I mean, they would probably accept a patch to fix the issue, but they’re not willing to tackle it themselves because they believe forking is just as good a solution. This leads to the next reason, but before that I’d just like to note that JRuby has always performed great in multi-threaded environments, and I think Rails took too long to take this approach seriously and consider JRuby a viable deployment environment for it. Threads are, in my opinion, the proper way of handling concurrency in most cases, and they should be the default, as in most web frameworks in other languages.
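If you want to see the GIL’s effect yourself, here is a minimal sketch: on MRI, CPU-bound work split across threads takes about as long as running it serially, while on JRuby the threads actually run in parallel:

require 'benchmark'

# some purely CPU-bound work:
work = -> { 10_000_000.times { Math.sqrt(42) } }

serial = Benchmark.realtime { 4.times { work.call } }
threaded = Benchmark.realtime do
  4.times.map { Thread.new { work.call } }.each(&:join)
end

# on MRI both numbers come out about the same; on JRuby the threaded
# version is roughly divided by the number of cores:
puts "serial: #{serial.round(2)}s, threaded: #{threaded.round(2)}s"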
Now to the second reason why people usually prefer multi-process over multi-thread deployment in the Ruby community. I once asked on the MRI mailing list what the status of thread support in MRI was. Some core committers told me that they wouldn’t invest time in getting rid of the GIL, mainly because they felt forking was a better fit most of the time: it avoids some concurrency issues one might experience when using threads. They also argued that they didn’t want Ruby programmers to have to worry about thread-safety, locks, etc. I don’t really understand why people are so afraid of threads and why they think they’re so hard to use safely. I’ve worked with threaded applications for many years and I never had the bad experience several developers complain about.
I really miss proper threading support in MRI, because a threaded deployment strategy allows much better memory usage under high load than the multi-process approach and is much easier to scale. That is also why I think it should be the default: it would avoid the situation where people have to worry about deployment strategies too early in the process, thinking about load balancers, proxies, etc. when a single threaded instance would be enough for a long time before the application starts having throughput issues. If you deploy a single process using a single-threaded approach, you’ll very soon realize it doesn’t scale even to your few users. That’s why I believe Rails should promote threaded deployment by default: it is easier to start with.
But the MRI limitation makes this decision hard to make, especially because the development experience is usually much better on MRI than on JRuby. Tests start running much faster on MRI, and some tools that speed it up even more, like Spork and similar gems, won’t work well on JRuby.
So, I can’t really recommend any solution to this deployment problem with Rails. Currently we’re using Unicorn (multi-process) + MRI to deploy our application, but I really believe this isn’t the optimal solution for web deployment and I’d love to see the situation improve in the next years.
Apart from the deployment issues, I have always missed streaming support in Rails, but I haven’t created a section about it in this article because Rails master already seems to support it and Rails 4 will probably be released soon.
When it comes to the MRI implementation itself, the lack of good thread support isn’t the only thing that annoys me.
I can’t really understand the motivation for symbols to exist in Ruby. They cause more harm than good. I’ve already discussed my opinions at length here, if you’re curious.
To make things worse, as if the harm and confusion caused by symbols, with no apparent benefits, weren’t a good enough reason to get rid of them, attackers keep trying to find new ways to create symbols in web applications. The reason is that symbols are not garbage collected. If you employ the threaded deployment strategy and an attacker can get your application to create new symbols, your application will eventually crash due to the memory leak, since symbols are never collected (although that might change at some point).
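A minimal sketch of the leak (this assumes an MRI without symbol garbage collection, which was the state of affairs at the time of writing): every distinct string interned with to_sym stays in the symbol table for the life of the process:

before = Symbol.all_symbols.size
100_000.times { |i| "attacker_chosen_#{i}".to_sym }
# 100,000 new symbols, none of which will ever be collected:
puts Symbol.all_symbols.size - before

This is why calling to_sym on user-supplied input, such as request parameters, is considered a denial-of-service vector.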
Autoload is a Ruby feature that allows some files to be lazily loaded, improving, for instance, the start-up time when booting Rails in development mode. I’m curious whether the lazy approach really makes such a big difference compared to simply requiring/loading all files. And if it does, couldn’t that load time be improved somehow?
The problem with autoload is that it can create bugs that are hard to track down, and I have indeed been bitten by one. Here is an example of how it can be triggered:
# ./test.rb:
autoload :A, 'a'
require 'a/b'

# ./lib/a.rb:
require 'a/b'

# ./lib/a/b.rb:
module A
  module B
  end
end

# ruby -I lib test.rb
I really prefer code that makes its dependencies very explicit. Some languages, like Java and most static ones, force this to happen. That is not the case in Ruby.
Rails prefers to follow the Don’t-Repeat-Yourself principle instead of always being explicit about each file’s dependencies. That makes it impossible for a developer to use a small part of some Rails component: the components are designed in such a way that you have to require the entire component rather than just part of it, even when that part is pretty independent from everything else.
Recently I wanted to use some code from ActionView::Helpers::NumberHelper in my own class, ParseFormatUtils. Even though my unit tests worked fine, my application would fail due to circular dependency issues caused by autoload and the way the Rails code is designed.
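This is a minimal sketch of the kind of thing I tried; ParseFormatUtils is my class, and the formatting method shown is just an example. It worked in isolated unit tests, but blew up with circular dependency errors inside the full application:

require 'action_view/helpers/number_helper'

class ParseFormatUtils
  include ActionView::Helpers::NumberHelper

  def format_price(value)
    number_to_currency(value) # e.g. "$1,234.50"
  end
end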
In my applications it is always very clear what each class is responsible for. Rails controllers are only concerned with the web layer, and most of the logic is coded in separate classes or modules and tested independently. That makes testing (both manual and automated) much easier and faster, and also makes it easier for the project’s developers to understand and follow the code.
I’m really sad that Rails doesn’t share my point of view here and thinks the DRY principle is more important than being explicit about each file’s dependencies.
Even though there are several aspects of Rails I dislike, I couldn’t actually suggest a better framework to a web developer. If I weren’t using Rails I’d probably be using some other Ruby web framework and re-creating some kind of Asset Pipeline and automatic reload mechanism, but I don’t really think that would be worth the effort.
All Rails issues are manageable in my opinion, while some of the other frameworks I’ve worked with are not: they have fundamental flaws that prevent me from considering them when the choice is mine to make.
For instance, I reported some serious bugs, test cases included, to the Grails JIRA almost a year ago, and they haven’t been fixed yet. This is something to be really worried about.
I may not deploy my application the way I’d prefer, but Unicorn currently fits our application’s needs well enough. I can’t require just ‘action_view/helpers/number_helper’, but requiring the full ‘action_view’ isn’t that bad either.
I’d just like to state that even though I don’t consider Rails/Ruby perfect, they’re still my choice when it comes to general web development.
I’ve been working solely on single-page web applications for the last 3 years. The client-side code I write is about 70% of my total code, and this percentage has been increasing over time. While there are excellent tools for testing back-end code in Ruby (RSpec, Capybara, FactoryGirl), I still missed a great framework for writing tests for my client-side code. At least that used to be the case.
We currently have tons of great alternatives for writing client-side code: Knockout.js, Angular.js, Ember.js, Serenade.js and a thousand more. They’re awesome for helping us build single-page applications, despite JavaScript being such a horrible language that it is only now considering modular programming in ES6, and it will take some years before we can rely on its support :(
Some languages, like the awesome CoffeeScript, were born to make writing JavaScript more pleasant, although they’re still unable to provide something like a require/import statement; after all, they still need to compile to JavaScript :( Fortunately there are some asset pre-processor tools available to help us write more modular code, like the Rails Asset Pipeline, which lets me write “require”s as comments in my source headers and has greatly reduced the pain of working with JavaScript for me.
But when it comes to integration tests for my client-side code, I’ve never felt great about the currently available JavaScript testing frameworks. I’ve been using Jasmine for a long time but I always missed a beforeAll/afterAll feature. A lot! The Mocha/Chai bundle seems great, but unfortunately it requires a JavaScript feature that is not present in older Internet Explorer, which I still must support in my products :( Finally, Buster.js is a great modular framework, but its random execution order makes it unsuitable for the way I write integration tests.
Konacha is a great gem that took the right approach, providing conventions for test organization and integrating well with the Rails Asset Pipeline. But it uses Mocha/Chai… So a while ago I created the rails-sandbox-assets gem with the same goal as Konacha: introducing some conventions for test organization and integrating with the Rails Asset Pipeline. But unlike Konacha, it is framework-agnostic. In fact, I’ve written adapters for all the testing frameworks mentioned in this article.
And recently I wrote my own testing framework, oojspec, built on top of Buster.js’s reporter and assertions.
All those Ruby gems integrate with the Rails Asset Pipeline, and all you have to do is create your tests/specs in specific locations; they will all be automatically loaded by the test runner. Just as with Konacha, this test runner server will only serve the application assets (JavaScript, CSS, images) and won’t touch any controllers, models or any other Ruby code.
It is even possible to integrate the Rails Asset Pipeline with non-Rails applications, as I’ve done with this Grails application as a proof of concept. See oojs_assets_enabler for a minimal Rails application that can be integrated with any other server framework, enabling you to use the power of the asset pre-processor and testing tools in your non-Rails application.
If you don’t like the idea of using the Rails Asset Pipeline (because you’re averse to the Rails or Ruby names), even though it won’t require any Ruby knowledge from you, you can still use oojspec standalone. I’ve created some JsFiddle examples in the oojspec README demonstrating how to do that (or do you think JsFiddle has included support for Rails as well?! ;) ).
Enough small talk!
Take a look at the reporter first, to see what it looks like.
Yes, I know it is failing. This is on purpose, so that you can see the stack-traces and how failures and errors look.
The oojspec gem already provides an HTML runner that will include all your tests/specs located under test/javascripts/oojspec/*_test.js[.coffee] or spec/javascripts/oojspec/*_spec.js[.coffee], as you prefer. Just add the “oojspec” dependency to your Gemfile and run “bundle”.
Stylesheets in [test|spec]/stylesheets/oojspec/*_[test|spec].css are also automatically included in the HTML runner. You can just import the required CSS files from them.
If you want to take full advantage of the Rails Asset Pipeline, try to disassociate the “Rails” name from it first. It has nothing to do with Rails, and you don’t have to learn Ruby or Rails to take advantage of it. If you’re using Rails you’ll be able to integrate your dynamic routes with your assets, but even if you aren’t, you still get pre-compilation and minification tasks, automatic CoffeeScript compilation and, especially, the ability to specify dependencies between your sources using special comments in your source headers:
// bowling_spec.js
// this will require bowling.js or bowling.js.coffee:
//= require bowling

describe("Bowling", function(){
  // ...
});
Please let me know if you’d like a more in-depth article on how to take full advantage of the Rails Asset Pipeline with your non-Rails application.
All you have to do is follow the short instructions here. The example shows how to integrate with Grails, but basically you just have to adapt it to your own project.
Okay, so maybe you don’t see value in the Rails Asset Pipeline or you’re using your own tools for pre-processing your assets. Then you’ll have to write an HTML runner yourself, which is also pretty simple. Here is a working example in JsFiddle of how to do it:
<!doctype html>
<html>
<head>
  <base href="http://oojspec.herokuapp.com/" />
  <meta http-equiv="content-type" content="text/html; charset=utf-8">

  <title>oojspec Test Runner</title>

  <link href="/assets/oojspec.css" media="screen" rel="stylesheet" type="text/css" />
  <script src="/assets/oojspec.js" type="text/javascript"></script>
  <script type="text/javascript">oojspec.exposeAll()</script>
  <!-- put your code and tests/specs here in the right order of dependency:
  <script src="/assets/first_spec.js" type="text/javascript"></script>
  <script src="/assets/second_spec.js" type="text/javascript"></script>
  -->
</head>
<body>

<script type="text/javascript">
  oojspec.autorun()
</script>
</body>
</html>
Feel free to download oojspec.css and oojspec.js first, for faster local development.
Now that we have our runner set up, it is time to describe our code by writing some tests/specs.
You can do it with:
oojspec.describe("Some description", function(){
  this.example("Basic stuff work :P", function(){
    this.assert(true);
  });
});
When using the oojspec gem, the “describe” function is exposed in the global (window) namespace by default, although this can be disabled by adding the following line to your application.rb:
config.sandbox_assets.options[:skip_oojspec_expose] = true
When using CoffeeScript to write your specs (even if your production code is written in JavaScript), the example becomes more succinct. I’m using the exposed “describe” this time:
describe "Some description", ->
  @example "Basic stuff work :P", -> @assert true
If you prefer to stick with JavaScript but don’t want to type “this.” all the time, you can use an alternative idiom:
oojspec.describe("Some description", function(s){
  s.example("Basic stuff work :P", function(s){
    s.assert(true);
  });
});
From within a description block, the following DSL keywords are available: describe, context, example (aliased as it), before, beforeAll, afterAll, waitsFor and runs.
From within an example, you can use any assertion supported by the referee library; all of them are well documented here. You can mix assertions and expectations in your examples, and you can even write your own custom assertions/expectations:
oojspec.assertions.add("isVisible", {
  assert: function(actual) {
    return $(actual).is(':visible');
  },
  assertMessage: "Expected ${0} to be visible.",
  refuteMessage: "Expected ${0} not to be visible.",
  expectation: "toBeVisible"
});
Sometimes you need to wait for certain conditions after taking some action, and those will most probably happen asynchronously. So, to let you focus on the specs instead of having to write polling functions yourself, oojspec borrows the waitsFor/runs approach from Jasmine:
describe("Some description", function(s){
  s.example("Operation was successful", function(s){
    $('button#create').click();
    s.waitsFor("dialog to pop up", function(){
      return $('#show-message-dialog:visible').length > 0;
    });
    s.runs(function(s){
      s.expect('#show-message-dialog').toBeVisible();
    });
  });
});
You can use multiple waitsFor and runs blocks in the same example, as you wish.
Sometimes mocks are really useful, especially for creating fake HTTP servers to respond to your application’s AJAX requests. But since they’re orthogonal to test runners, no mocking library is included in oojspec. I’d recommend the excellent Sinon.js mocking and stubbing library. If you’re using the Rails Asset Pipeline, this is just a matter of adding the sinon-rails gem to your Gemfile and requiring it in your spec:
//= require sinon
Sinon.js has a fake AJAX server built in, but if you always use jQuery for your AJAX requests you might find my gem fake-ajax-server somewhat easier to use.
Especially when writing integration tests for my client-side code, I find it easier to describe a group of behaviors as sequential examples that depend on a given order. In those cases I find it useful to share some state between them, and taking an object-oriented approach takes care of this.
Suppose you have some class that you instantiate on application load, which registers some jQuery live events that are never unregistered because your application doesn’t need that. You’re then unable to instantiate such a class several times in “before” hooks, because you’d be registering the same events several times. In that case, you can instantiate it once in a “beforeAll” hook in your suite.
But then it will be impossible to get back to the original state. I don’t see this as a major issue, though. Suppose you have to test a dynamic tree, using the excellent jqTree library. You can start with an empty tree and add a test for including a new item in the tree. Then you add another test for including a sub-item under the item created in the prior test. Then you add a test for moving it so that it becomes a sibling of the first item. Then you add a test for deleting the first item, making sure only the last one is kept. I don’t really mind that all those tests, written for a “Tree Management” context, are not independent from each other. I find it easier to write them in this sequential order than to try to make them independent.
This is the main point where I find the other testing frameworks too limiting for me, or they don’t target the same browsers I do.
When writing non-OO tests with oojspec, “this” refers to an object containing only the DSL available in that context. This same DSL object is also passed as the first argument to the blocks used by example, context, runs, etc.
On the other hand, when writing OO tests, you are in charge of specifying what “this” will refer to.
By default, OO tests are “non-bare”, which means the DSL will be merged into your “this” object. This allows you to write “this.example” as before. But you can opt for a “bare” approach, in which case you’ll access the DSL through the first argument of each block.
You can provide the description directly in the passed object, or as the first argument as before. The only requirement is that your object responds to runSpecs() as the entry point.
Here are some examples:
// non-bare approach, with the description in the object itself:
describe({
  description: 'Plain Object binding',
  dialog: {dialog: true},
  runSpecs: function(){ this.example('an example', this.sampleExample); },
  sampleExample: function(){ this.assert(this.dialog.dialog); }
});

// traditional description syntax and a bare approach:
describe('Bare description', {
  bare: true,
  dialog: {dialog: true},
  runSpecs: function(s){ s.example('an example', this.sampleExample); },
  sampleExample: function(s){ s.assert(this.dialog.dialog); }
});
If, like me, you prefer CoffeeScript, you may find the “class” syntax somewhat easier to work with. oojspec will instantiate a class when it detects one (its prototype responds to runSpecs instead of the object itself). It even uses the constructor’s name if a description is not provided.
describe class # you can use an anonymous class as well
  @description: 'Bare class'
  @bare: true

  runSpecs: (dsl)->
    @dialog = dialog: true
    dsl.example 'an example', @anExample
    dsl.context 'in some context', @aContext
    dsl.describe NonBareClass

  anExample: (s)-> s.expect(@dialog).toEqual dialog: true

  # this.runs is not available from an example when using a bare approach
  aContext: (s)-> s.example 'another example', (s)-> s.refute @runs

class NonBareClass # description will be "NonBareClass"
  runSpecs: ->
    @dialog = dialog: true
    @example 'an example', @anExample
    @context 'in some context', @aContext

  anExample: -> @expect(@dialog).toEqual dialog: true

  # this.describe is never available from within an example
  aContext: -> @example 'another example', -> @refute @describe
This article is already long enough. I’ll try to find some time in the future to focus on a real use case, demonstrating how I write integration tests for my single-page applications using a real application as an example.
I’d really love to hear your feedback about oojspec. Please let me know what you think by e-mail, GitHub, comments on this page or Twitter (rrrosenfeld). If you think you’ve found a bug, please report it in the GitHub issues.
Despite the fact that I don’t like the JavaScript language, we just can’t avoid it.
Client-side programming allows for a better user experience and less network traffic, and it is required for lots of web applications. I’ve been doing client-side work most of my time since 2009 and it takes up more and more of my time. I don’t think this is going to change.
Although not perfect, CoffeeScript took away a lot of the pain of writing JavaScript for me, although it still doesn’t provide any import/require feature, as it has to compile to JavaScript anyway. So all examples in this article will be written in CoffeeScript, but feel free to write your own tests and code in JavaScript if you prefer.
Since we now have a lot of our logic in the client side, it is time to take it much more seriously. That means we must write specs (unit and integration ones) for our client-side code as well. That has been a pain for me for a while, but I took some time to release some code to help us with this task, and this is mostly what I’ll be talking about in this article: especially integration testing of client-side code.
Although my released gems depend on Rails Asset Pipeline support, this article should also guide you in writing your specs with whatever server-side framework you’ve chosen. I’ll provide an example of how to do that for a Grails application, but you can adapt the instructions to whatever other framework you want.
Feel free to skip this entire section.
I should state that I’m passionate about Ruby and that Rails is currently my web framework of choice, so be warned that this is probably a biased opinion.
The biggest mistake in the design of the JavaScript language, in my opinion, was the lack of a require/import statement, which prevents us from easily splitting our applications into modules. This was fixed for server-side JS applications by Node.js, but it is still an issue for client-side code (the code running in web browsers).
ES.Next is going to add module support to JavaScript, but it may take quite a while before 99% of your users are on a browser that supports those modules.
Currently I know two alternatives for dealing with dependency management in JavaScript: module loaders that resolve dependencies at runtime in the browser, and asset pre-processors that resolve them at build time.
The Rails Asset Pipeline falls in this second category, just like the Grails Resources plugin. But the Resources plugin requires you to set up your dependencies in a separate file, while with the Rails Asset Pipeline you declare your dependencies as comments in your asset (JavaScript and stylesheet) headers. I much prefer this approach, as it reminds me of the regular require/import features of most programming languages. Also, unlike the Rails Asset Pipeline, the Grails Resources plugin doesn’t support CoffeeScript out of the box.
Also, the Rails Asset Pipeline is well documented and easily extended through plugins (or Ruby gems, if you prefer).
I’m sorry for you, but this is not a reason to skip this article. You can still take advantage of the techniques and tools I describe here with whatever framework you’re using. Just keep reading.
Please read this article for the reasoning behind it. In short, oojspec is designed with integration tests and an OO approach in mind.
I really like OO programming and being able to easily share state. This allows me to write maintainable code and specs in a modular way.
I find code written in CS more concise and easier to read. It supports comprehensions, destructuring assignment, splats, string interpolation, array range syntax, the “class” and “extends” keywords, “@attribute” as a shortcut for “this.attribute”, easy function binding through “=>”, and easier “for-in” and “for-of” constructions, among several other great language additions.
On the other side, I don’t much like that “==” is translated to “===”, that the “elvis” operator (“?”) has a different meaning inside functions, and a few other issues I can’t remember right now.
But all in all, CS is a much better language than JS in my opinion. Even if you don’t want to write CoffeeScript for your production code, you should consider using it at least for your specs. But feel free to use JS for your specs too, if you really dislike CS.
So, with CS and the Rails Asset Pipeline providing a require-like mechanism, client-side programming is no longer a pain for me. Well, that and the bundled helper tools for the testing task, which I’ll explore more in-depth in this article.
After writing some specs, you can end up with a huge file when writing integration tests for an application. There will be lots of “describes”/contexts, and I’d rather see them split across multiple files for better organization and maintainability. But this is just a suggestion; feel free to use regular “class” constructions in CoffeeScript and put everything in a single file if you prefer.
The integration tests I’ll be talking about in this article use a fake server that simulates replies to AJAX requests. This only works for requests using jQuery.ajax (or getJSON/post), which is stubbed by the excellent Sinon.js, written by my friend Christian Johansen of Gitorious fame.
This allows the techniques presented in this article to be used with whatever web framework you can think of. Another advantage is that tests run pretty fast, since the server-side responses are mocked.
Having said that, if you really want to write full integration tests, as with Capybara, this should be pretty easy to achieve if your application is written in Rails. It is just a matter of mounting the spec runner at some route like ‘/oojspec’ in your test environment. Please leave a comment if you want detailed instructions on how to do that, but be aware that you won’t be able to run Ruby code from your JavaScript specs, like filling the database with initial data in beforeEach calls… You’d need to add some extra test-only routes to help you with that.
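The idea would be something like the sketch below; the constant for the runner app is hypothetical, since it depends on how your runner is packaged, but mounting any Rack app this way is standard Rails routing:

# config/routes.rb
MyApp::Application.routes.draw do
  # serve the spec runner only in the test environment:
  mount SpecRunner::App => '/oojspec' if Rails.env.test?
end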
Okay, okay, calm down :)
You’ll need a minimal Rails application in one of your application’s sub-directories.
Here are the instructions for doing so (you’ll need Ruby 1.9 and RubyGems installed):
The specs go in “spec/javascripts/*_spec.js(.coffee)”. They usually “=require spec_helper” in the first line.
You’re encouraged to split your spec class into several files. Just see the example specs created by the bundled generators.
If you run the spec_helper generator and then run “rails g oojs:asset shopping_cart” (or “rake oojs:spec -- --name=shopping_cart” for non-Rails applications), these files will be created:
spec/javascripts/spec_helper.js.coffee:
# =require application
# =require modules
# =require jquery
# =require oojspec_helpers
# #require jquery.ba-bbq # uncomment for enabling $.deparam()
#
# Put your common spec code here.
# Then put "# =require spec_helper" in your specs headers.
You’ll need to remove the first “# =require application” line if your application doesn’t have an application.js(.coffee) file in the assets path. All other dependencies are provided by the oojs gem.
spec/javascripts/shopping_cart_spec.js.coffee:
# =require spec_helper
# =require_tree ./shopping_cart

oojspec.describe 'ShoppingCart', new specs.ShoppingCartSpec
spec/javascripts/shopping_cart/main.js.coffee:
extendClass 'specs.ShoppingCartSpec', (spec)->
  initialize: ->
    @createFakeServer()
    @extend this, new specs.oojspec.AjaxHelpers(@fakeServer)

  runSpecs: ->
    @beforeAll -> @fakeServer.start()
    @afterAll -> @fakeServer.stop()
    @before -> @fakeServer.ignoreAllRequests()

    @it 'passes', ->
      @expect(@fakeServer).toBeDefined()
Feel free to add as many files as you want inside the spec/javascripts/shopping_cart/ directory.
spec/javascripts/shopping_cart/fake_server.js.coffee:
# =require fake_ajax_server

createProducts = -> [
  {id: 1, name: 'One'}
  {id: 2, name: 'Two'}
]

extendClass 'specs.ShoppingCartSpec', ->
  createFakeServer: ->
    @fakeServer = new FakeAjaxServer (url, settings)->
      if settings then settings.url = url else settings = url
      handled = false
      switch settings.dataType
        when 'json' then switch settings.type
          when 'get' then switch settings.url
            when '/products' then handled = true; settings.success createProducts()
          # when 'post' then switch settings.url
          #   when ...
        # when undefined then switch settings.type
        #   when 'get' then switch settings.url
        #     when ...
        #   when 'post' then switch settings.url
        #     when ...
      return if handled
      console.log arguments
      throw "Unexpected AJAX call: #{settings.url}"
Whenever your application issues an AJAX request, it will be handled by your fake server and you’ll need to decide what to do with it in your specs. For example, if you click a button and want to wait for an AJAX request to be issued, and then process the request, do something like:
@it 'asks for products when clicking on Products button', ->
  $('#products-button').click()
  @waitsForAjaxRequest()
  @runs ->
    @nextRequest '/products', 'get', 'json' # won't pass if such a request wasn't issued
    @expect($('ul#products li:contains(One)')).toExist()
Take a look at ajax_spec_helpers.js.coffee for a list of the available helpers.
Also take a look at oojspec-jquery.js.coffee for a list of additional matchers for use with jQuery objects.
There is a lot more to discuss, but this article has already taken me a lot of time. I intend to write another article creating a test suite for an existing sample application to further demonstrate oojspec’s capabilities.
Feel free to leave any questions or suggestions in the comments so that we can improve these techniques even more.
Happy client-side coding :)
I’d like to share some experiences I had this week trying to parse HTML with Groovy.
Then I’ll explain how the job was done better, and finished much faster, with JRuby.
This week I had to extract some references from HTML documents and store them in the database.
This is the spec of what I wanted to implement, written as MiniTest specs in Ruby:
# encoding: utf-8
require 'minitest/autorun'
require_relative '../lib/references_extractor'

describe ReferencesExtractor do
  def example
    %Q{
      <div cid=1>
        <empty cid=11>
        </empty>
        some text
        <div cid=12>
          <div cid=121>
            <empty /><another></another>
            <p cid=1211>First paragraph.</p>
            <p cid=1212>Second paragraph.</p>
          </div>
          <p cid=122>Another pa<b>ra</b>graph.</p>
        </div>
      </div>
    }
  end

  it "extracts references from the example" do
    extractor = ReferencesExtractor.new example
    {
      ['1'] => {'1' => "some text First paragraph. Second paragraph. Another paragraph."},
      ['1211', '1212', '11'] => {'121' => "First paragraph. Second paragraph."},
      ['1211', '1212', '122'] => {'12' => "First paragraph. Second paragraph. Another paragraph."},
      ['12', '1212'] => {'12' => "First paragraph. Second paragraph. Another paragraph."},
      ['1212', '122'] => {'1212' => "Second paragraph.", '122' => "Another paragraph."},
    }.each {|cids, expected| extractor.get_references_texts(cids).must_equal(expected) }
  end
end
I had a similar test written using JUnit, with a small change to make it easier to implement, but I’ll discuss that later in this article. Let me just explain the situation better.
Don’t ask me what “cid” means, as I wasn’t the one who named this attribute, but I guess it is “c…” id, although I have no clue what “c…” is all about. It was already called this way when I started working on this project, and I’m the sole developer on it right now, after lots of other developers worked on it before me.
Part of the application I maintain has to deal with documents obtained from Edgar filings. Each HTML tag is processed so that it is given a sequential unique number in its “cid” attribute. Someone can then review the documents and highlight certain parts by clicking on elements in the page. So the database has a reference to a document and a cid list, like “1000,1029,1030”, listing all elements that should be highlighted. This is stored exactly that way, as a string in a database column.
But some weeks ago I was asked to export the contents of some highlighted references to an Excel spreadsheet, and this is somewhat more complex than it looks. With jQuery, it would be equivalent to “$(‘[cid=12]’).text()”.
For performance reasons in the search interface, I had to import all references from over 3,000 documents into the database. For new references I’ll do the processing with jQuery and send the text to the server already formatted, but I needed to do the initial import, and doing that batch processing client-side would be painfully slow.
Getting the correct output server-side is not that simple. Fortunately, for those documents there is no CSS involved, which makes them simpler to deal with. Still, “<div>some t<div>ex</div>t</div>” should be stored as “some t ex t”, while “<div>some t<span>ex</span>t” should be stored as “some text”. Since this requires a deeper understanding of HTML semantics, I decided to simplify while dealing with Groovy: I assumed all elements were block-level and parsed the fixed-up HTML as XML.
Doing that in Groovy took me a full week, especially due to the lack of documentation for the XmlParser and XmlSlurper Groovy classes.
First, I had no clue which one to choose. As they had similar interfaces, I decided to start with XmlParser and then switch to XmlSlurper when I was finished, to compare their performance.
I couldn’t find any method for searching by XPath or CSS expression. When you write “new XmlParser().parseText(xmlContent)”, you get a Node.
XmlParser is not an HTML parser, so the XML content must be well formed; for real-world HTML you need a library like NekoHTML or TagSoup, used like “new XmlParser(new Parser()).parseText(xmlContent)”. That’s ok, but if you want to play with it and don’t know Groovy well enough to deal with Gradle and Maven dependencies, just use a valid XML document as an example.
Since I couldn’t find a search-like method on Node, I had to look for the node ‘[cid=12]’ with something like this:
xmlContent = '<div cid="12"> some text <span cid="13"> as an example </span>.</div>'
root = new XmlParser().parseText(xmlContent)
node = root.depthFirst().find { it.@cid == '12' }
Calling “node.text()” would yield ‘some text.’ and calling “node.children()” would yield [‘some text’, spanNode, ‘.’], which means it ignores white space, so it was of no use to me.
So I tried XmlSlurper. In this case, node.text() yields ‘ some text as an example .’. Great for this example, but when applied to the node with cid 12 in the MiniTest example above, it yields ‘First paragraph.Second paragraph.Another paragraph.’, ignoring all white space, so I couldn’t use this either.
But after searching a lot, I figured out that there was a class that could convert a node back to XML including all the original white space, so getting what I wanted should be possible. I then tried to extract the text myself.
“node.children()” returned [spanNodeChildInstance], ignoring the text nodes, so I was out of luck and had to dig into the source code. Finally, after some hours, I found what I was looking for: “node[0].children()”, which returns [‘ some text ’, spanNode, ‘.’].
It took a while to get this to work, but I wasn’t finished. I still had to navigate the XML tree to produce the final processed text. Look at the MiniTest example again and you’ll see that I needed to resolve the cid list [1211, 1212, 122] to the node with cid 12.
So, one of the features I needed was to look for the first ancestor node having a cid, so that I could check whether it was a candidate node. It turned out not to be that simple: while traversing the parents, I might not find any parent node with a cid at all. So, how could I check that I had reached the root node?
With XmlSlurper, when you call rootNode.parent() you get rootNode back. So I tried something like this:
parent = node.parent()
while (!parent.@cid && parent != parent.parent()) parent = parent.parent()
But the problem is that the comparison is made by string, so I had no real way to see whether I had reached the root. My solution was to check for “node.name() != ‘html’” in this case. This is really bad API design: maybe root.parent() could return null, and I should be able to compare nodes instead of their text.
After several days, at the end of last Thursday, I got a “working” version of a similar JUnit test passing with a Groovy implementation. But as I wasn’t really using an HTML parser, but an XML one, I couldn’t process white space correctly for inline elements.
Then, on Friday morning, I got curious about how I could parse HTML with Ruby, as I had never done it before. That was when I got my first smile that morning, reading this in Aaron Patterson’s Nokogiri documentation:
XML is like violence - if it doesn’t solve your problems, you are not using enough of it.
The smile got even bigger when I tried this:
require 'nokogiri'
Nokogiri::HTML('<div>Some <span>Te<b>x</b>t</span>.').text == 'Some Text.' # true
The smile shrank a bit when I realized that I would get the same result if I replaced the inline “b” element with a “div”. But that’s ok, it was already good enough.
Besides the “text” method being more useful than XmlSlurper’s (new-lines are treated differently), navigating the XML tree is also much easier with Nokogiri. I still couldn’t find a good way of finding out whether some node is the root one, as calling “root.parent” raises an exception. Fortunately, as Nokogiri supports XPath, I didn’t need to do the manual traversal, so this wasn’t an issue for my specific needs.
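For instance, here is a minimal sketch of the same cid lookup from the earlier Groovy example, this time using XPath (it only assumes the nokogiri gem):

require 'nokogiri'

doc = Nokogiri::HTML('<div cid="12"> some text <span cid="13"> as an example </span>.</div>')
node = doc.at_xpath('//*[@cid="12"]') # no manual traversal needed
puts node.text # => " some text  as an example ."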
But there was a remaining issue: it performed very badly compared to the Groovy version, about 4 times slower. Looking at my CPU usage statistics, it was obvious that it wasn’t using all my CPU power, unlike the Groovy version. It didn’t matter how many threads I used with CRuby; no processor core went over 20% of its available capacity.
It is a shame that Java actually has a better API than Ruby for dealing with thread pools: the Executors framework. As I couldn’t find something like it in the Ruby standard library, I tried a Ruby gem called Concur.
I didn’t investigate whether the performance issues were caused by the Concur implementation or by CRuby itself, but I decided to give JRuby or Rubinius a try. As I already had JRuby available, I tried it first, and as the results were about the same as the Groovy version’s, I didn’t bother checking Rubinius.
With JRuby I could use the Java Executors framework just like in Groovy, and I could see all my 6 cores above 90% the whole time my 10 threads were working on importing the 3,000+ documents. Unfortunately my actual servers are much slower than my computer: the import took more than 4 hours on the staging server versus about an hour and a half on my machine. The CRuby version would probably take more than 4 hours on my computer, which means it could take almost a full day on the staging and production servers.
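For reference, this is roughly how a fixed thread pool can be used from JRuby through its Java integration; documents and import_references are hypothetical stand-ins for my actual import code:

require 'java'

java_import java.util.concurrent.Executors
java_import java.util.concurrent.TimeUnit

pool = Executors.new_fixed_thread_pool(10)
documents.each do |doc|
  # JRuby converts the block into a java.lang.Runnable:
  pool.submit { import_references(doc) }
end
pool.shutdown
pool.await_termination(4, TimeUnit::HOURS)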
I should explain that I didn’t try Ruby first because, with Groovy, I could take advantage of my models already being mapped by the Grails application, so I wouldn’t have to deal with database set-up and could keep all my code in a single language. Of course, if I had known beforehand all the pain that coding this in Groovy would be, I would have done it in Ruby from the beginning. And the Ruby version handled some corner cases, including new-line processing, better than my previous Groovy attempt.
I’m very grateful to Aaron “tenderlove” Patterson and Charles Nutter for their awesome work on Ruby, Nokogiri and JRuby. Thanks to them I could get my work done very fast and in an elegant way, after a week of frustration with Groovy.
This is just the article’s title, not really a question with a right answer.
It is not always possible to both move forward and remain compatible with legacy code.
Usually, when a project starts, there is no legacy code and every change is welcome. Later on, when the project grows and its users’ code bases get bigger, some people will start complaining about incompatible changes, because they’ll have to spend time changing their code when they decide to upgrade to a newer version.
When this time comes, the project has to make a decision. Either it keeps moving forward, fixing badly designed APIs whenever it realizes there is a better way of doing things, or it accepts that an API change can be very painful for the framework’s/library’s users and keeps the bad API. Java definitely opted for the latter.
In recent weeks, I’ve been reading some articles complaining about Rails changing its API in incompatible ways too fast.
Their authors are not alone; I’ve seen complaints about this from several other people. On the other side, I’m constantly refactoring my own code base and I appreciate Rails doing the same. For libraries and frameworks, when we refactor code, we sometimes conclude that some API should be redesigned, even if that breaks old software. And I’m also not alone in thinking this way.
Unfortunately, I couldn’t find an employer to pay me to work with Rails, so I’ve worked as a Grails/Groovy/Java developer for the last 3 years. And that has really been a pain with regards to API, stability and user experience. By contrast, I don’t remember complaining about anything I really missed in Ruby or Rails since internationalization support was added in Rails 2.
This section grew too fast, so I decided to split it into a separate article entitled How NokoGiri and JRuby saved my week.
You don’t have to read the entire article if you’re not curious enough, but the Groovy XML parser API was so badly designed and documented that I could finish the logic with Ruby and Nokogiri in about 2 hours (tests and setup included), while I had spent the entire week trying to do the same in Groovy.
And the Ruby version would take about the same time for the import to complete. I had to dig into Groovy’s source code due to the lack of documentation, and do lots of experiments to understand how things worked.
You can fix documentation issues without changing the API, but you can’t fix the design issues in the Groovy parsers without changing theirs. So, is it worth keeping the API just to remain backward compatible, leaving XML parsing a pain to work with in Groovy?
There is no single better approach when deciding between remaining backward compatible and moving forward. Each project will adopt some philosophy, and you need to know that philosophy before deciding whether to adopt the project.
If you prefer API stability over consistency and ease of use, you should choose something like Java, C++, Perl, PHP or Grails. You shouldn’t really be considering Rails.
On the other hand, if you like to be on the edge, then Rails is exactly the way to go.
Which one to choose will basically depend on these questions:

1. Will your application be constantly maintained and kept up to date?
2. Do you keep a good automated test suite for it?
3. Will you deliver the application and then leave it mostly unmaintained?
If you answered “yes” to 3, then you should consider a framework that tries very hard not to break its API, since no one will be constantly maintaining your application to keep up with all the framework upgrades that fix security issues, for example.
On the other hand, if you answered “yes” to 1 and 2, using a fast-changing framework like Rails shouldn’t be an issue. In my case, I don’t write tests for my views, as they’re usually very simple and don’t contain logic. So, when Rails changed the rules about when to use “<%= … %>” or “<% … %>”, I had to manually look at all of my views to fix them. And I had to do that twice between Rails 2 and Rails 3.1, because this behavior changed back and forth, which is the kind of unnecessary change in my opinion.
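For reference, this is the kind of change I mean; form_for is the classic example of a block helper affected by it (a minimal sketch):

<%# Rails 2.x style: block helpers wrote to the output buffer themselves %>
<% form_for @user do |f| %>
  <%= f.text_field :name %>
<% end %>

<%# Rails 3.x style: the same block helper must start with an equals sign %>
<%= form_for @user do |f| %>
  <%= f.text_field :name %>
<% end %>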
The other change I had to check manually, because I don’t test my views, was ERB tag output being escaped by default. But that is a good change, and I’m pretty sure I had forgotten to manually escape some output before the upgrade. My application was probably safer against attacks afterwards, so this was a good move, even though it took me a while to finish the upgrade. There was no easy path for this change.
Other than that, it was just a matter of making the test suite pass after the upgrade. And if you value code refactoring as much as I do, you’ll be writing tests for all code that could possibly break in a refactoring.
And this is a hard issue I have with Grails: I find it too time-consuming to write tests for Grails applications. It was a real pain before Grails 2 was released; it is still not good, but I can already write most of my unit tests in Grails without much trouble.
So, I suggest you answer the above questions before choosing a web framework. It is not right to pick a fast-moving framework because its API is better designed and then, later on, ask its maintainers to stop changing it because now you have a working application.
You should know how the project works beforehand and accept that when you opt for it.
A while ago I wrote about why I prefer Rails over Grails, so be aware that this is another biased article.
That old article is already outdated since Grails 2 was released, and I was asked to update it. That was my original idea, but then the comments wouldn’t make sense anymore, so I decided to write another take on the Rails and Grails comparison. This is an entirely new article, not just an update to the old one.
Take the claim that Java is solid while Ruby is not: I never understood this statement, although I’ve been told it constantly for a long time.
Both languages were first released in 1995, more than 15 years ago, so why wouldn’t Ruby be considered as solid as Java?
I have no idea why some people think that getting some program to compile is any indication that it should work.
Certainly those people don’t include Kent Beck and Erich Gamma, or they wouldn’t have developed JUnit back in Java’s early days.
So, as long as you understand that you need automated tests in whatever language you choose, it shouldn’t matter whether the language is a static or a dynamic one.
“Java is faster”, I’m told. How much faster? No one answers me that question. They think this way: “Java programs are compiled, so they must run faster than software written in any interpreted language”. People should really work out how fast they need their application to be before choosing their framework. If they can’t measure, they can’t compare performance; this is pretty obvious.
If you need a web application, you should be able to benchmark your actual scenario before choosing a language and web framework. Also, if your application is very JavaScript-intensive, the performance of the server side shouldn’t really matter for many applications.
A typical web application will fetch data from some kind of database, do some parameter binding and generate some HTML, XML or JSON result. This usually happens really fast in any language or web framework, so you shouldn’t be too concerned about language performance for web applications. Most performance improvements will come from design changes rather than a language change.
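If you do need numbers, measuring is cheap. Here’s a minimal sketch using Ruby’s standard Benchmark library (the payload is made up) that times a simplified version of the typical request work, serializing a response body:
1 | require 'benchmark' |
2 | require 'json' |
3 | |
4 | # A stand-in for a typical response body. |
5 | payload = { 'id' => 1, 'name' => 'abc', 'tags' => ['x', 'y', 'z'] } |
6 | |
7 | # Time 10,000 serializations; adapt the block to your real scenario. |
8 | puts Benchmark.measure { 10_000.times { payload.to_json } } |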
So it is likely that the framework design matters more than the language itself. If some language allows programmers to easily write better-designed code, a framework written in that language will likely perform better. What you should really be concerned about is how fast you can develop your solution with the chosen framework/language. And I really don’t believe anyone can be as productive in Java as in some less verbose dynamic language.
Haven’t you ever heard that you can run Rails on the JVM through JRuby, a Ruby interpreter written in Java? The Rails test suite runs green on JRuby as well.
Grails is built on top of the well-known Spring framework and Hibernate, and integrates with Maven and Ivy.
Rails was originally considered a monolithic full-stack framework, with very few dependencies on external libraries. This has changed a lot since the Rails 3 refactoring, but somehow people still see Rails as a monolithic framework.
While both Rails and Grails reuse external libraries, Rails seems to be better integrated with them than Grails is.
This is very noticeable in the case of the Hibernate integration in GORM, the Grails Object-Relational Mapper (ORM).
By default, Rails uses the ActiveRecord library as its ORM solution, which implements the Active Record pattern in Ruby.
Hibernate, on the other hand, adopted the Data Mapper / Unit of Work (Session) pattern.
I won’t cover the differences, merits and shortcomings of those patterns, as that is out of scope for this article and there is plenty of information about them around the web. I’d just like to note that you can opt for the DataMapper library in Ruby if you prefer that pattern.
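Just to give a rough feel for the difference, here’s a minimal Ruby sketch (the session API below is invented, mimicking Hibernate): with Active Record the object persists itself, while with Data Mapper / Unit of Work a separate session tracks objects and decides when changes hit the database.
1 | # Active Record style: the model persists itself. |
2 | post = Post.new(:title => 'Hello') |
3 | post.save                # the INSERT happens here |
4 | |
5 | # Data Mapper / Unit of Work style (hypothetical session API): |
6 | post = Post.new(:title => 'Hello') |
7 | session.save(post)       # only registers the object with the unit of work |
8 | session.flush            # the INSERT happens here, on flush |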
The important thing here is to point out that Grails tries to hide the Hibernate Session from newcomers, leading some developers to believe it implements the Active Record pattern, since the Data Mapper pattern adds complexity for simple applications. The documentation only covers Hibernate Sessions after explaining Domain Modelling. This topic is so important for avoiding issues with Grails that it should come first, as it can lead to several unexpected results.
If you’re planning to use Grails, don’t do so before reading the entire GORM documentation and the series of 3 articles about GORM Gotchas. This will save you a lot of time in the future.
GORM has bad defaults for newcomers, and you’ll be surprised by when data actually gets persisted and by why you can’t call save() directly on some GORM instance in a background thread. That is usually the situation in which you learn about the Hibernate Session, if you haven’t read the entire documentation beforehand.
On the other hand, I haven’t found a single “gotcha” for the ActiveRecord gem, used by Rails as the default ORM implementation. Also, all the libraries used by Rails are very well integrated.
I’ve been coding for about two decades now, and I still don’t find it to be an exact science, as some would like to suppose. Otherwise, they wouldn’t ask you for time estimates on feature requests or try to use tools like MS Project to manage a software project, as if Gantt charts were useful for this kind of project.
Of course, I can completely understand the motivations of those willing to bring such project management tools to the software world. Good luck to them! But I won’t talk about this subject in this article, as it is too big. I would just like to state that software can be better understood when compared to disciplines like music or the arts in general.
Both require lots of experience and personal feeling, and it’s hard to estimate completion times, since it is almost always something completely new. There are some recipes for certain kinds of music or movies, but then they are no longer art.
Some time ago I was asked to estimate how long it would take me to implement a search system over some HTML documents taken from EDGAR filings. I’m pretty sure this wouldn’t be anything new for those of you who have had experience with search engines before, but that definitely wasn’t my case. I knew I should research tools like Lucene for search indexing, but I had never worked with them before. So how could I estimate this?
As I started following the tutorials, I thought the main problem was solved in the first two days, but I couldn’t predict that I would spend so much time reading about Solr’s configuration files and how the search params can be adjusted. There is a lot of stuff to learn about and configure for your needs.
One of the curiosities I noticed is that even with my configuration set to an AND-like search over all typed terms, if a user happens to prepend some word with a plus (“+”) or minus (“-”), then the non-prepended words become optional. I had enabled the DisMax mode, by the way.
So, I’d like to talk about this specific challenge, as it is a good example for demonstrating some techniques I learned last year after reading Clean Code. Although it is very Java-oriented, this book has a few simple rules that can be applied to any language and be really effective. Just like in music and movie making, in software writing there are also lots of resources to learn from that can be applied in a systematic way. Learning those tools and techniques will help developers deliver more in less time.
Developers should invest time in well-written code because they’ll spend most of their time reading code. So it makes sense to invest time and money in tools that make it easier to browse code, as well as to spend some time polishing your code so that it becomes more readable too.
Before talking about those simple rules, I’d like to show you how I might have written this code in my early days. Don’t waste your time trying to understand that version. Then I’ll show you the code I actually wrote in a couple of hours, exactly as I had estimated, since it didn’t have any external dependencies. So, basically, this is the task:
Transform terms like ‘some +required -not-allowed “any phrase” id:(10 or 20 or 30)’ into ‘+some +required -not-allowed +“any phrase” +id:(10 or 20 or 30)’.
Pretty simple, right? But even software like this can be bug-prone. So, here is a poor implementation (in Groovy, as I’m a Grails programmer in my current job). Don’t try to really understand it (more on this later); just take a look at the code (dis)organization. I didn’t even try to compile it.
1 | class SolrService { |
2 | ... |
3 | private String processQuery(String query) { |
4 | query = query.replaceAll('#', '') |
5 | def expressions = [], matches |
6 | while (matches = query =~ /\([^\(]*?\)/) { |
7 | matches.each { match -> |
8 | expressions << match |
9 | query = query.replace(match, "#{${expressions.size()}}".toString()) |
10 | } |
11 | } |
12 | (query =~ /\".*?\"/).each { match -> |
13 | expressions << match |
14 | query = query.replace(match, "#{${expressions.size()}}".toString()) |
15 | } |
16 | query = query.split(' ').findAll{it}.collect { word -> |
17 | word[0] in ['-', '+'] ? word : "+${word}" |
18 | }.join(' ') |
19 | def s = expressions.size() |
20 | expressions.reverse().eachWithIndex { expression, i -> |
21 | query = query.replace("#{${s - i}}", expression) |
22 | } |
23 | } |
24 | |
25 | def search(query) { |
26 | query = processQuery(query) |
27 | ... |
28 | return solrServer.request(new SolrQuery(query)) |
29 | } |
30 | } |
Ok, I’ll agree that for this specific case the code may not be that bad, but although processQuery is not that big, you’ll need some time to figure out what is happening if you’re required to modify this method.
Also, looking at it, could you be sure it works for all cases? Could you tell me the reason for some specific line? What is this code protecting against? How comfortable would you be modifying this code? How would you write automated tests for processQuery?
Also, as the logic gets more complex, coding this way can lead to messy code like the following, which I’ve just taken from a file in the project that integrates Hibernate with Grails:
1 | // grails-core/grails-hibernate/src/main/groovy/grails/orm/HibernateCriteriaBuilder.java |
2 | // ... |
3 | @SuppressWarnings("rawtypes") |
4 | @Override |
5 | public Object invokeMethod(String name, Object obj) { |
6 | Object[] args = obj.getClass().isArray() ? (Object[])obj : new Object[]{obj}; |
7 | |
8 | if (paginationEnabledList && SET_RESULT_TRANSFORMER_CALL.equals(name) && args.length == 1 && |
9 | args[0] instanceof ResultTransformer) { |
10 | resultTransformer = (ResultTransformer) args[0]; |
11 | return null; |
12 | } |
13 | |
14 | if (isCriteriaConstructionMethod(name, args)) { |
15 | if (criteria != null) { |
16 | throwRuntimeException(new IllegalArgumentException("call to [" + name + "] not supported here")); |
17 | } |
18 | |
19 | if (name.equals(GET_CALL)) { |
20 | uniqueResult = true; |
21 | } |
22 | else if (name.equals(SCROLL_CALL)) { |
23 | scroll = true; |
24 | } |
25 | else if (name.equals(COUNT_CALL)) { |
26 | count = true; |
27 | } |
28 | else if (name.equals(LIST_DISTINCT_CALL)) { |
29 | resultTransformer = CriteriaSpecification.DISTINCT_ROOT_ENTITY; |
30 | } |
31 | |
32 | createCriteriaInstance(); |
33 | |
34 | // Check for pagination params |
35 | if (name.equals(LIST_CALL) && args.length == 2) { |
36 | paginationEnabledList = true; |
37 | orderEntries = new ArrayList<Order>(); |
38 | invokeClosureNode(args[1]); |
39 | } |
40 | else { |
41 | invokeClosureNode(args[0]); |
42 | } |
43 | |
44 | if (resultTransformer != null) { |
45 | criteria.setResultTransformer(resultTransformer); |
46 | } |
47 | Object result; |
48 | if (!uniqueResult) { |
49 | if (scroll) { |
50 | result = criteria.scroll(); |
51 | } |
52 | else if (count) { |
53 | criteria.setProjection(Projections.rowCount()); |
54 | result = criteria.uniqueResult(); |
55 | } |
56 | else if (paginationEnabledList) { |
57 | // Calculate how many results there are in total. This has been |
58 | // moved to before the 'list()' invocation to avoid any "ORDER |
59 | // BY" clause added by 'populateArgumentsForCriteria()', otherwise |
60 | // an exception is thrown for non-string sort fields (GRAILS-2690). |
61 | criteria.setFirstResult(0); |
62 | criteria.setMaxResults(Integer.MAX_VALUE); |
63 | |
64 | // Restore the previous projection, add settings for the pagination parameters, |
65 | // and then execute the query. |
66 | if (projectionList != null && projectionList.getLength() > 0) { |
67 | criteria.setProjection(projectionList); |
68 | } else { |
69 | criteria.setProjection(null); |
70 | } |
71 | for (Order orderEntry : orderEntries) { |
72 | criteria.addOrder(orderEntry); |
73 | } |
74 | if (resultTransformer == null) { |
75 | criteria.setResultTransformer(CriteriaSpecification.ROOT_ENTITY); |
76 | } |
77 | else if (paginationEnabledList) { |
78 | // relevant to GRAILS-5692 |
79 | criteria.setResultTransformer(resultTransformer); |
80 | } |
81 | // GRAILS-7324 look if we already have association to sort by |
82 | Map argMap = (Map)args[0]; |
83 | final String sort = (String) argMap.get(GrailsHibernateUtil.ARGUMENT_SORT); |
84 | if (sort != null) { |
85 | boolean ignoreCase = true; |
86 | Object caseArg = argMap.get(GrailsHibernateUtil.ARGUMENT_IGNORE_CASE); |
87 | if (caseArg instanceof Boolean) { |
88 | ignoreCase = (Boolean) caseArg; |
89 | } |
90 | final String orderParam = (String) argMap.get(GrailsHibernateUtil.ARGUMENT_ORDER); |
91 | final String order = GrailsHibernateUtil.ORDER_DESC.equalsIgnoreCase(orderParam) ? |
92 | GrailsHibernateUtil.ORDER_DESC : GrailsHibernateUtil.ORDER_ASC; |
93 | int lastPropertyPos = sort.lastIndexOf('.'); |
94 | String associationForOrdering = lastPropertyPos >= 0 ? sort.substring(0, lastPropertyPos) : null; |
95 | if (associationForOrdering != null && aliasMap.containsKey(associationForOrdering)) { |
96 | addOrder(criteria, aliasMap.get(associationForOrdering) + "." + sort.substring(lastPropertyPos + 1), |
97 | order, ignoreCase); |
98 | // remove sort from arguments map to exclude from default processing. |
99 | @SuppressWarnings("unchecked") Map argMap2 = new HashMap(argMap); |
100 | argMap2.remove(GrailsHibernateUtil.ARGUMENT_SORT); |
101 | argMap = argMap2; |
102 | } |
103 | } |
104 | GrailsHibernateUtil.populateArgumentsForCriteria(grailsApplication, targetClass, criteria, argMap); |
105 | GrailsHibernateTemplate ght = new GrailsHibernateTemplate(sessionFactory, grailsApplication); |
106 | PagedResultList pagedRes = new PagedResultList(ght, criteria); |
107 | result = pagedRes; |
108 | } |
109 | else { |
110 | result = criteria.list(); |
111 | } |
112 | } |
113 | else { |
114 | result = GrailsHibernateUtil.unwrapIfProxy(criteria.uniqueResult()); |
115 | } |
116 | if (!participate) { |
117 | hibernateSession.close(); |
118 | } |
119 | return result; |
120 | } |
121 | |
122 | if (criteria == null) createCriteriaInstance(); |
123 | |
124 | MetaMethod metaMethod = getMetaClass().getMetaMethod(name, args); |
125 | if (metaMethod != null) { |
126 | return metaMethod.invoke(this, args); |
127 | } |
128 | |
129 | metaMethod = criteriaMetaClass.getMetaMethod(name, args); |
130 | if (metaMethod != null) { |
131 | return metaMethod.invoke(criteria, args); |
132 | } |
133 | metaMethod = criteriaMetaClass.getMetaMethod(GrailsClassUtils.getSetterName(name), args); |
134 | if (metaMethod != null) { |
135 | return metaMethod.invoke(criteria, args); |
136 | } |
137 | |
138 | if (isAssociationQueryMethod(args) || isAssociationQueryWithJoinSpecificationMethod(args)) { |
139 | final boolean hasMoreThanOneArg = args.length > 1; |
140 | Object callable = hasMoreThanOneArg ? args[1] : args[0]; |
141 | int joinType = hasMoreThanOneArg ? (Integer)args[0] : CriteriaSpecification.INNER_JOIN; |
142 | |
143 | if (name.equals(AND) || name.equals(OR) || name.equals(NOT)) { |
144 | if (criteria == null) { |
145 | throwRuntimeException(new IllegalArgumentException("call to [" + name + "] not supported here")); |
146 | } |
147 | |
148 | logicalExpressionStack.add(new LogicalExpression(name)); |
149 | invokeClosureNode(callable); |
150 | |
151 | LogicalExpression logicalExpression = logicalExpressionStack.remove(logicalExpressionStack.size()-1); |
152 | addToCriteria(logicalExpression.toCriterion()); |
153 | |
154 | return name; |
155 | } |
156 | |
157 | if (name.equals(PROJECTIONS) && args.length == 1 && (args[0] instanceof Closure)) { |
158 | if (criteria == null) { |
159 | throwRuntimeException(new IllegalArgumentException("call to [" + name + "] not supported here")); |
160 | } |
161 | |
162 | projectionList = Projections.projectionList(); |
163 | invokeClosureNode(callable); |
164 | |
165 | if (projectionList != null && projectionList.getLength() > 0) { |
166 | criteria.setProjection(projectionList); |
167 | } |
168 | |
169 | return name; |
170 | } |
171 | |
172 | final PropertyDescriptor pd = BeanUtils.getPropertyDescriptor(targetClass, name); |
173 | if (pd != null && pd.getReadMethod() != null) { |
174 | ClassMetadata meta = sessionFactory.getClassMetadata(targetClass); |
175 | Type type = meta.getPropertyType(name); |
176 | if (type.isAssociationType()) { |
177 | String otherSideEntityName = |
178 | ((AssociationType) type).getAssociatedEntityName((SessionFactoryImplementor) sessionFactory); |
179 | Class oldTargetClass = targetClass; |
180 | targetClass = sessionFactory.getClassMetadata(otherSideEntityName).getMappedClass(EntityMode.POJO); |
181 | if (targetClass.equals(oldTargetClass) && !hasMoreThanOneArg) { |
182 | joinType = CriteriaSpecification.LEFT_JOIN; // default to left join if joining on the same table |
183 | } |
184 | associationStack.add(name); |
185 | final String associationPath = getAssociationPath(); |
186 | createAliasIfNeccessary(name, associationPath,joinType); |
187 | // the criteria within an association node are grouped with an implicit AND |
188 | logicalExpressionStack.add(new LogicalExpression(AND)); |
189 | invokeClosureNode(callable); |
190 | aliasStack.remove(aliasStack.size() - 1); |
191 | if (!aliasInstanceStack.isEmpty()) { |
192 | aliasInstanceStack.remove(aliasInstanceStack.size() - 1); |
193 | } |
194 | LogicalExpression logicalExpression = logicalExpressionStack.remove(logicalExpressionStack.size()-1); |
195 | if (!logicalExpression.args.isEmpty()) { |
196 | addToCriteria(logicalExpression.toCriterion()); |
197 | } |
198 | associationStack.remove(associationStack.size()-1); |
199 | targetClass = oldTargetClass; |
200 | |
201 | return name; |
202 | } |
203 | } |
204 | } |
205 | else if (args.length == 1 && args[0] != null) { |
206 | if (criteria == null) { |
207 | throwRuntimeException(new IllegalArgumentException("call to [" + name + "] not supported here")); |
208 | } |
209 | |
210 | Object value = args[0]; |
211 | Criterion c = null; |
212 | if (name.equals(ID_EQUALS)) { |
213 | return eq("id", value); |
214 | } |
215 | |
216 | if (name.equals(IS_NULL) || |
217 | name.equals(IS_NOT_NULL) || |
218 | name.equals(IS_EMPTY) || |
219 | name.equals(IS_NOT_EMPTY)) { |
220 | if (!(value instanceof String)) { |
221 | throwRuntimeException(new IllegalArgumentException("call to [" + name + "] with value [" + |
222 | value + "] requires a String value.")); |
223 | } |
224 | String propertyName = calculatePropertyName((String)value); |
225 | if (name.equals(IS_NULL)) { |
226 | c = Restrictions.isNull(propertyName); |
227 | } |
228 | else if (name.equals(IS_NOT_NULL)) { |
229 | c = Restrictions.isNotNull(propertyName); |
230 | } |
231 | else if (name.equals(IS_EMPTY)) { |
232 | c = Restrictions.isEmpty(propertyName); |
233 | } |
234 | else if (name.equals(IS_NOT_EMPTY)) { |
235 | c = Restrictions.isNotEmpty(propertyName); |
236 | } |
237 | } |
238 | |
239 | if (c != null) { |
240 | return addToCriteria(c); |
241 | } |
242 | } |
243 | |
244 | throw new MissingMethodException(name, getClass(), args); |
245 | } |
246 | // ... |
I really hope I never have to understand such code… I’d be curious to see how an automated test for this invokeMethod would be written, as I couldn’t find the tests in this project.
Back to the original implementation: what would be wrong with such code?
Even if you try to split processQuery into smaller methods, you would be required to pass some common values over and over again, like query and the expressions array, which would not only be an in-parameter but an out-parameter too, as it would have to be changed inside some methods… When that happens, it is a hint that the code needs a separate class for doing the job. This is one of the simple rules I learned from Clean Code.
While reading the original example, the first thing you see is the processQuery method declared in the SolrService class. What does it do? Why do we need it? Who is using it? Only when we read further do we find out that it is used by the search method.
I was always used to writing code that way, declaring the least dependent methods first and the higher-level ones last. I guess I thought methods had to be declared before they could be mentioned. Maybe that was true for some procedural language I started with, before my first experience with OOP while reading a book about C++.
But in all OO languages I know of, it is ok to declare your methods in any order. Writing them top-down makes it easier for another reader to understand your code, because they will read your high-level instructions first.
Keeping your methods really small makes them easier to understand and to write unit tests against. They’ll also be less error-prone.
Having lots of parameters in a method makes it really complicated to remember what each parameter means. Looking at this code, could you tell the meaning of the last parameters?
1 | request.setAction(ACTION.POST, true, true, 10, false) |
You’d certainly have to check the API documentation for AbstractUpdateRequest.
This is a typical example where you’d probably be better served by a separate class.
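A cheap first step in Ruby, before reaching for a full class, is a hash of named options so that each value is labeled at the call site. A sketch with invented names mirroring the call above:
1 | class UpdateRequest |
2 |   # Invented signature: an action plus labeled options instead of bare booleans. |
3 |   def set_action(action, options = {}) |
4 |     @action  = action |
5 |     @options = options |
6 |   end |
7 | end |
8 | |
9 | request = UpdateRequest.new |
10 | request.set_action(:post, |
11 |   :wait_flush    => true, |
12 |   :wait_searcher => true, |
13 |   :max_time      => 10, |
14 |   :soft_commit   => false) |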
When you find yourself wanting to return multiple values (I’m not talking about returning a single list here) and you need some parameter for returning them (an out-parameter), you should reconsider whether you’re taking the right path.
Also, you should really try to avoid modifying any parameter, as debugging such code can be really frustrating.
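Here’s a tiny Ruby sketch of the smell and the fix (method names are made up): the first version silently mutates the caller’s array through an out-parameter, while the second takes input and returns output, leaving its argument untouched:
1 | # Smell: the caller's array is mutated in place (an out-parameter). |
2 | def add_plus_signs!(words) |
3 |   words.map! { |w| w.start_with?('+', '-') ? w : "+#{w}" } |
4 | end |
5 | |
6 | # Better: return a new value and touch nothing else. |
7 | def add_plus_signs(words) |
8 |   words.map { |w| w.start_with?('+', '-') ? w : "+#{w}" } |
9 | end |
10 | |
11 | add_plus_signs(['some', '-minus', '+plus'])  # => ["+some", "-minus", "+plus"] |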
This one is gold. Good names are essential for a good code-reading experience; they can save you several hours trying to understand some snippet of code.
Take a look at the signature of the invokeMethod method in the Grails-Hibernate integration example code:
1 | Object invokeMethod(String name, Object obj) |
Wouldn’t it be easier to understand what it does if the signature was changed to this one?
1 | Object invokeMethodWith(String methodName, Object methodArguments) |
2 | |
3 | // code would look like (just supposing, I'm not sure): |
4 | criteria.invokeMethodWith("eq", [attributeName, expectedValue]) |
What does “obj” mean in the actual implementation? It could be anything with such a generic name. Investing some time in choosing good names for your methods and variables can save a lot of time for others trying to understand what the code does.
Just by making use of those simple rules, you’ll be able to write code that is much easier to read, test and maintain.
There are also some rules I’ve been using my entire life, and I’m not sure whether they are all documented in the Clean Code book. But I’d like to talk a bit about them too.
I’ve seen pseudo-code like this so many times:
1 | declare square_root(number) { |
2 | if (number >= 0) { |
3 | do_real_calculations_with(number) |
4 | } |
5 | } |
Often there are even more validation rules inside each block, and this style gets really hard to read. Worse than that, it only protects the software from crashing or raising an unexpected exception; it does not properly handle bad inputs (negative numbers).
Also, do_real_calculations_with(number) is usually pages of code, written in a way that prevents you from seeing the enclosing brackets of the block on a single page. Take another look at the Hibernate-Grails integration code and see if you can easily find where the block beginning at “if (isCriteriaConstructionMethod(name, args)) {” ends.
Even when you don’t have to do anything if the necessary conditions are not met, I’d rather code this way:
1 | declare square_root(number) { |
2 | if (number < 0) return // or raise "Taking the square root of negative numbers is not supported by this implementation" |
3 | do_real_calculations_with(number) |
4 | } |
This is a real example, found in PersistentManagerBase.java from the Tomcat project:
1 | protected void processMaxIdleSwaps() { |
2 | |
3 | if (!getState().isAvailable() || maxIdleSwap < 0) |
4 | return; |
5 | |
6 | Session sessions[] = findSessions(); |
7 | long timeNow = System.currentTimeMillis(); |
8 | |
9 | // Swap out all sessions idle longer than maxIdleSwap |
10 | if (maxIdleSwap >= 0) { |
11 | for (int i = 0; i < sessions.length; i++) { |
12 | StandardSession session = (StandardSession) sessions[i]; |
13 | synchronized (session) { |
14 | if (!session.isValid()) |
15 | continue; |
16 | int timeIdle = // Truncate, do not round up |
17 | (int) ((timeNow - session.getThisAccessedTime()) / 1000L); |
18 | if (timeIdle > maxIdleSwap && timeIdle > minIdleSwap) { |
19 | if (session.accessCount != null && |
20 | session.accessCount.get() > 0) { |
21 | // Session is currently being accessed - skip it |
22 | continue; |
23 | } |
24 | if (log.isDebugEnabled()) |
25 | log.debug(sm.getString |
26 | ("persistentManager.swapMaxIdle", |
27 | session.getIdInternal(), |
28 | Integer.valueOf(timeIdle))); |
29 | try { |
30 | swapOut(session); |
31 | } catch (IOException e) { |
32 | // This is logged in writeSession() |
33 | } |
34 | } |
35 | } |
36 | } |
37 | } |
38 | } |
It is hard to see which bracket closes which block in the end… This could be rewritten as:
1 | ... |
2 | if (maxIdleSwap < 0) return; |
3 | for (int i = 0; i < sessions.length; i++) { |
4 | ... |
5 | if (timeIdle <= maxIdleSwap || timeIdle <= minIdleSwap) continue; |
6 | if (session.accessCount != null && session.accessCount.get() > 0) continue; |
7 | ... |
The pattern is:
1 | if some_condition |
2 | lots of lines of complex code handling here |
3 | else |
4 | simple handling for the case where some_condition is false |
Here is a concrete example, taken from ActiveRecord::Explain:
1 | def logging_query_plan # :nodoc: |
2 | threshold = auto_explain_threshold_in_seconds |
3 | current = Thread.current |
4 | if threshold && current[:available_queries_for_explain].nil? |
5 | begin |
6 | queries = current[:available_queries_for_explain] = [] |
7 | start = Time.now |
8 | result = yield |
9 | logger.warn(exec_explain(queries)) if Time.now - start > threshold |
10 | result |
11 | ensure |
12 | current[:available_queries_for_explain] = nil |
13 | end |
14 | else |
15 | yield |
16 | end |
17 | end |
I would rather write such code as:
1 | def logging_query_plan # :nodoc: |
2 | threshold = auto_explain_threshold_in_seconds |
3 | current = Thread.current |
4 | return yield unless threshold && current[:available_queries_for_explain].nil? |
5 | queries = current[:available_queries_for_explain] = [] |
6 | start = Time.now |
7 | result = yield |
8 | logger.warn(exec_explain(queries)) if Time.now - start > threshold |
9 | result |
10 | ensure |
11 | current[:available_queries_for_explain] = nil |
12 | end |
Of course, this isn’t exactly the same as the original code when yield raises an exception in the “else” path (and the ensure block now also runs on the early-return path), but I’m sure this could be worked around.
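One possible workaround, as a sketch (not tested against the actual ActiveRecord code): since the local variable queries is only assigned after the early return, the ensure block can guard on it, so the thread-local is cleared only when this call actually set it:
1 | def logging_query_plan # :nodoc: |
2 |   threshold = auto_explain_threshold_in_seconds |
3 |   current = Thread.current |
4 |   return yield unless threshold && current[:available_queries_for_explain].nil? |
5 |   queries = current[:available_queries_for_explain] = [] |
6 |   start = Time.now |
7 |   result = yield |
8 |   logger.warn(exec_explain(queries)) if Time.now - start > threshold |
9 |   result |
10 | ensure |
11 |   # queries is nil when we bailed out early, even if yield raised an exception |
12 |   current[:available_queries_for_explain] = nil if queries |
13 | end |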
I’ve often found the following pattern while reading Java code, and I believe it is the result of using some Java IDE. The IDE tells the developer that some exceptions were not handled and automatically fills in the code like this:
1 | void myMethod() throws MyOwnException { |
2 |   try { |
3 |     someMethod(); |
4 |   } |
5 |   catch(FileNotFoundException ex) { |
6 |     throw new MyOwnException("File was not found"); |
7 |   } |
8 |   catch(WrongPermissionException ex) { |
9 |     throw new MyOwnException("You don't have the right permission to write to the file"); |
10 |   } |
11 |   catch(CorruptFileException ex) { |
12 |     throw new MyOwnException("The file is corrupted"); |
13 |   } |
14 |   ... |
15 | } |
If you’re only interested in handling exceptions gracefully to give your user better feedback, why don’t you just write this instead:
1 | void myMethod() throws MyOwnException { |
2 |   try { |
3 |     someMethod(); |
4 |   } catch(Exception ex) { |
5 |     log.error("Couldn't perform XYZ action", ex); |
6 |     throw new MyOwnException("Sorry, couldn't perform XYZ action. Please contact our support team and we'll investigate this issue."); |
7 |   } |
8 | } |
And finally, following those techniques, here is how I actually coded the original challenge and implemented the tests with JUnit:
1 | class SearchService { |
2 | ... |
3 | def search(query) { |
4 | query = new QueryProcessor(query).processedQuery |
5 | ... |
6 | new SearchResult(solrServer.request(new SolrQuery(query))) |
7 | } |
8 | } |
I’ll omit the implementation of the SearchResult class, as it is irrelevant to this specific challenge. I just want to point out that I’ve abstracted the search feature behind some wrapper classes so as not to expose Solr internals.
And here is the actual implementation:
1 | package myappname.search |
2 | |
3 | /* Solr behaves in an uncommon way: |
4 | Even when configured for making an "AND" search, when a signal (+ or -) |
5 | is prepended to any word, the ones that are not prepended are considered optional. |
6 | We don't want that, so we're prefixing all terms with a "+" unless they're already |
7 | prefixed. |
8 | */ |
9 | class QueryProcessor { |
10 | private query, expressions = [], words = [] |
11 | |
12 | QueryProcessor(query) { this.query = query } |
13 | |
14 | def getProcessedQuery() { |
15 | removeHashesFromQuery() |
16 | extractParenthesis() |
17 | extractQuotedText() |
18 | splitWords() |
19 | addPlusSignToUnsignedWords() |
20 | joinProcessedWords() |
21 | replaceExpressions() |
22 | query |
23 | } |
24 | |
25 | private removeHashesFromQuery() { query = query.replaceAll('#', '') } |
26 | |
27 | private extractParenthesis() { |
28 | def matches = query =~ /\([^\(]*?\)/ |
29 | if (!matches) return |
30 | replaceMatches(matches) |
31 | // keep trying in case of nested parenthesis |
32 | extractParenthesis() |
33 | } |
34 | |
35 | private replaceMatches(matches) { |
36 | matches.each { |
37 | expressions << it |
38 | query = query.replace(it, "#{${expressions.size()}}".toString()) |
39 | } |
40 | } |
41 | |
42 | private extractQuotedText() { |
43 | replaceMatches(query =~ /\".*?\"/) |
44 | } |
45 | |
46 | private splitWords() { |
47 | words = query.split(' ').findAll{it} |
48 | } |
49 | |
50 | private addPlusSignToUnsignedWords() { |
51 | words = words.collect { word -> |
52 | word[0] in ['-', '+'] ? word : "+${word}" |
53 | } |
54 | } |
55 | |
56 | private joinProcessedWords() { query = words.join(' ') } |
57 | |
58 | private replaceExpressions() { |
59 | def s = expressions.size() |
60 | expressions.reverse().eachWithIndex { expression, i -> |
61 | query = query.replace("#{${s - i}}", expression) |
62 | } |
63 | } |
64 | } |
And the unit tests:
1 | package myappname.search |
2 | |
3 | import org.junit.* |
4 | |
5 | class QueryProcessorTests { |
6 | @Test |
7 | void removeHashesFromQuery() { |
8 | def p = new QueryProcessor('some#hashes # in # query') |
9 | p.removeHashesFromQuery() |
10 | assert p.query == 'somehashes in query' |
11 | } |
12 | |
13 | @Test |
14 | void extractParenthesis() { |
15 | def p = new QueryProcessor('(abc (cde fgh)) no parenthesis transaction_id:(ijk) (lmn)') |
16 | p.extractParenthesis() |
17 | assert p.query == '#{4} no parenthesis transaction_id:#{2} #{3}' |
18 | assert p.expressions == ['(cde fgh)', '(ijk)', '(lmn)', '(abc #{1})'] |
19 | } |
20 | |
21 | @Test |
22 | void extractQuotedText() { |
23 | def p = new QueryProcessor('some "quoted" text and "some more"') |
24 | p.extractQuotedText() |
25 | assert p.query == 'some #{1} text and #{2}' |
26 | assert p.expressions == ['"quoted"', '"some more"'] |
27 | } |
28 | |
29 | @Test |
30 | void splitWords() { |
31 | def p = new QueryProcessor('some #{1} text and id:#{2} ') |
32 | p.splitWords() |
33 | assert p.words == ['some', '#{1}', 'text', 'and', 'id:#{2}'] |
34 | } |
35 | |
36 | @Test |
37 | void addPlusSignToUnsignedWords() { |
38 | def p = new QueryProcessor('some #{1} -text and id:#{2} +text ') |
39 | p.splitWords() |
40 | p.addPlusSignToUnsignedWords() |
41 | assert p.words == ['+some', '+#{1}', '-text', '+and', '+id:#{2}', '+text'] |
42 | } |
43 | |
44 | @Test |
45 | void joinProcessedWords() { |
46 | def p = new QueryProcessor('') |
47 | p.words = ['+some', '-minus', '+#{1}'] |
48 | p.joinProcessedWords() |
49 | assert p.query == "+some -minus +#{1}" |
50 | } |
51 | |
52 | @Test |
53 | void replaceExpressions() { |
54 | def p = new QueryProcessor('+#{1} -minus +transaction_id:#{2}') |
55 | p.expressions = ['first', '(23 or 98)'] |
56 | p.replaceExpressions() |
57 | assert p.query == '+first -minus +transaction_id:(23 or 98)' |
58 | } |
59 | |
60 | @Test |
61 | void processedQuery() { |
62 | def p = new QueryProcessor('coca-cola -pepsi transaction_id:(34 or 76)') |
63 | assert p.processedQuery == '+coca-cola -pepsi +transaction_id:(34 or 76)' |
64 | } |
65 | } |
That’s it. I’d like you to share your opinions on other techniques I may not have covered here. Are there any improvements that you think would make this code even easier to understand? I’d really appreciate any other considerations, since I’m always very interested in writing clean code.
For some years now, I’ve been writing lots of JavaScript. Not that I chose to; it is simply the only language available for client-side programming. Well, not quite, since there are some languages that compile to JavaScript. So I’ve chosen to work with CoffeeScript lately, as it suits my taste far better than JavaScript.
All this client-side programming requires testing too. While testing in real browsers is sometimes a better fit, tools like Selenium are extremely slow when you have tons of JavaScript to test. So I was looking for a faster alternative that allowed me to test my client-side code.
Before I present the approach I decided to take, I’d like to warn you that there are lots of good alternatives out there. If you want to take a look at how to use the excellent PhantomJS headless WebKit browser, you might be interested in this article.
I decided to go with a solution based on Node.js, a fast JavaScript runtime environment built on top of Google’s V8 engine. Even using Node.js, you’ll find many good alternatives, like Zombie.js, which can be integrated with the excellent integration test framework Capybara through capybara-zombie, and with Jasmine through zombie-jasmine-spike.
Even though there are great options out there, I still chose another approach, for no special reason. The interesting thing about Node.js is that there’s a whole ecosystem behind it, with tools like npm, a package manager for Node similar to apt on Debian, for instance. On Debian, it can be installed with:
1 | apt-get install -y node npm |
But I would recommend installing just node through apt, and installing npm using the instructions here:
1 | curl http://npmjs.org/install.sh | sh |
The reason is that the search command of the npm binary provided by the Debian package was not working for me; it ran the list command instead. Maybe this happens only in the unstable distribution, but I don’t want to get off the main subject here.
Since we want to test our client-side script, it is necessary to install some library to emulate the browser’s DOM, since Node doesn’t provide one itself. The jsdom library seems to be the de facto standard for creating a DOM environment.
I don’t really like reading assertions, preferring expectations instead. If you’re like me, you’ll like the Jasmine.js library for writing your expectations in JavaScript. If you don’t want to write integration tests, chances are you’ll need to mock your AJAX calls; Sinon.js is an excellent framework that allows you to do that. And since I avoid JavaScript itself at all costs, I’ll write all my examples in CoffeeScript.
If your web framework, unlike Rails, doesn’t support CoffeeScript by default and you’re still interested in the language, you can use Jitter to watch your CoffeeScript files and convert them to JavaScript on the fly. It will replicate your directory structure, converting all your .coffee files to .js:
1 | jitter src/coffee/ web-app/js/ |
Install all those dependencies with NPM:
1 | npm install jitter jasmine-node jsdom |
Although you can install jQuery and Sinon.js with “npm install jquery sinon”, that doesn’t make sense here, since you’ll want to load them from your DOM environment. So download Sinon.js to your hard disk to get faster tests.
I don’t practice TDD (or BDD), and this is a conscious choice. I find it faster to write the implementation first and then write the tests. So, proceeding with this approach, let me show you an example for a “Terms and Conditions” page. Here’s a possible implementation (I’m showing only the client-side part):
1 | <!DOCTYPE html> |
2 | <html> |
3 | <head> |
4 | <script type="text/javascript" src="js/jquery.min.js"></script> |
5 | <link rel="stylesheet" type="text/css" href="css/jquery.ui.css"> |
6 | <script type="text/javascript" src="js/jquery-ui.min.js"></script> |
7 | <script type="text/javascript" src="js/wmd/showdown.js"></script> |
8 | <script type="text/javascript" src="js/show-terms-and-conditions.js"></script> |
9 | </head> |
10 | <body> |
11 | </body> |
12 | </html> |
Showdown is a JS library for converting Markdown to HTML. Here is show-terms-and-conditions.coffee, the CoffeeScript equivalent:
1 | $ -> |
2 | converter = new Attacklab.showdown.converter() |
3 | lastTermsAndConditions = {} |
4 | $.get 'termsAndConditions/lastTermsAndConditions', (data) -> |
5 | lastTermsAndConditions = data |
6 | $('<div/>').html(converter.makeHtml(lastTermsAndConditions.termsAndConditions)) |
7 | .dialog |
8 | width: 800, height: 600, modal: true, buttons: |
9 | 'I agree': onAgreement, 'Log out': onLogout |
10 | |
11 | onAgreement = -> |
12 | $.post 'termsAndConditions/agree', id: lastTermsAndConditions.id, => |
13 | $(this).dialog('close') |
14 | window.location = '../' # redirect to home |
15 | |
16 | onLogout = -> |
17 | $(this).dialog('close') |
18 | window.location = '../logout' # sign out |
As you can see, this will issue an AJAX request as soon as the page is loaded. So we need to fake the AJAX call before show-terms-and-conditions.js runs. This can easily be done with the following fake-ajax.js, using Sinon.js:
1 | sinon.stub($, 'ajax') |
If you’re not using jQuery, you can try “sinon.useFakeXMLHttpRequest()”, documented in the “Fake XHR” example on the Sinon.js site.
Ok, so here is a possible specification for this code in CoffeeScript. jasmine-sinon can help you write better expectations, so download it to ‘spec/js/jasmine-sinon.js’.
1 | # spec/js/show-terms-and-conditions.spec.coffee: |
2 | |
3 | require './jasmine-sinon' # wouldn't you love if vanilla JavaScript also supported 'require'? |
4 | dom = require 'jsdom' |
5 | |
6 | #f = (fn) -> __dirname + '/../../web-app/js/' + fn # if you prefer to be more explicit |
7 | f = (fn) -> '../../web-app/js/' + fn |
8 | |
9 | window = $ = null |
10 | |
11 | dom.env |
12 | html: '<body></body>' # or require('fs').readFileSync("#{__dirname}/spec/fixtures/any.html").toString() |
13 | scripts: ['sinon.js', f('jquery/jquery.min.js'), f('jquery/jquery-ui.min.js'), f('wmd/showdown.js'), 'fake-ajax.js', |
14 | f('show-terms-and-conditions.js')] |
15 | # src: ["console.log('all scripts were loaded')", "var loaded=true"] |
16 | done: (errors, _window) -> |
17 | console.log("errors:", errors) if errors |
18 | window = _window |
19 | $ = window.$ |
20 | # jasmine.asyncSpecDone() if window.loaded |
21 | |
22 | # We must tell Jasmine to wait until the DOM is loaded and the script is run |
23 | # Jasmine doesn't support a beforeAll, like RSpec |
24 | beforeEach(-> waitsFor -> $) unless $ |
25 | # another approach: (you should uncomment the line above for it to work) |
26 | # already_run = false |
27 | # beforeEach -> already_run ||= jasmine.asyncSpecWait() or true |
28 | |
29 | describe 'showing Terms and Conditions', -> |
30 | |
31 | it 'should get last Terms and Conditions', -> |
32 | @after -> $.ajax.restore() # undo the stubbed ajax call introduced by fake-ajax.js after this example. |
33 | expect($.ajax).toHaveBeenCalledOnce() |
34 | firstAjaxCallArgs = $.ajax.getCall(0).args[0] |
35 | expect(firstAjaxCallArgs.url).toEqual 'termsAndConditions/lastTermsAndConditions' |
36 | firstAjaxCallArgs.success id: 1, termsAndConditions: '# title' |
37 | |
38 | describe 'after set-up', -> |
39 | beforeEach -> window.sinon.stub $, 'ajax' |
40 | afterEach -> $.ajax.restore() |
41 | afterEach -> $('.ui-dialog').dialog 'open' # it is usually closed at the end of each example |
42 | |
43 | it 'should convert markdown to HTML', -> expect($('h1').text()).toEqual 'title' |
44 | |
45 | it 'should close the dialog, send a request to server and redirect to ../ when the terms are accepted', -> |
46 | $('button:contains(I agree)').click() |
47 | ajaxRequestArgs = $.ajax.args[0][0] |
48 | expect(ajaxRequestArgs.url).toEqual 'termsAndConditions/agree' |
49 | expect(ajaxRequestArgs.data).toEqual id: 1 |
50 | |
51 | ajaxRequestArgs.success() |
52 | expect(window.location).toEqual '../' |
53 | expect($('.ui-dialog:visible').length).toEqual 0 |
54 | |
55 | it 'should close the dialog and redirect to ../logout when the terms are not accepted', -> |
56 | # the page wasn't really redirected in this simulation by the prior example |
57 | $('button:contains(Log out)').click() |
58 | expect(window.location).toEqual '../logout' |
59 | expect($('.ui-dialog:visible').length).toEqual 0 |
You can run this spec with:
1 | jasmine-node --coffee spec/js/ |
The output should be something like:
1 | Started |
2 | .... |
3 | |
4 | Finished in 0.174 seconds |
5 | 2 tests, 9 assertions, 0 failures |
Instead of writing “expect($(‘.ui-dialog:visible’).length).toEqual 0”, BDD would advise you to write “expect($(‘.ui-dialog’)).toBeVisible()” instead. Jasmine allows you to write custom matchers; take a look at my jQuery matchers for an example.
Unfortunately, due to a bug in jsdom, the expected implementations of toBeVisible and toBeHidden won’t work for my cases, where I usually toggle a hidden CSS class (.hidden {display: none}) on my elements. So I check for this CSS class in my jQuery matchers.
Anyway, I’m just starting to write tests this way. Maybe there are better ways of writing tests like these.
Finally, if you want, you can also set up an auto-testing environment using a tool such as Guard, which will watch your JavaScript (or CoffeeScript) files for changes and run jasmine-node on them. Here is an example Guardfile:
1 | guard 'jasmine-node', jasmine_node_bin: File.expand_path("#{ENV['HOME']}/node_modules/jasmine-node/bin/jasmine-node") do |
2 | watch(%r{^(spec/js/[^\.].+\.spec\.coffee)}) { |m| m[1] } |
3 | watch('spec/js/jasmine-sinon.js'){ 'spec/js/' } |
4 | end |
If you have any tips, please leave a comment.
Enjoy!
Have you ever wanted to stage just part of a modified file in the index?
Usually that happens when you’re working on a feature or bug and then notice another issue in the same file. It could be another bug, an interesting feature, documentation, a comment or just code formatting.
If you’re like me, you won’t include both modifications in a single commit. Then what to do?
What I used to do, when I noticed this before actually fixing the bug, was to call “git stash”, fix the bug and “git stash pop”. This works well for simple fixes, as long as you haven’t changed your database, so that the application continues to work after “git stash”.
But what if you have already fixed the code? You could undo the fix, save the file, add it to the index, and then redo the fix. Believe me, I’ve done that several times.
But I won’t do it anymore! Don’t worry, I’ll keep my commits separate. It’s just that I found a better way of doing this: “git add -e” (or “git add -p” and choosing the “e” option). Go try it if you don’t know it already. It’s much easier to try it than to explain it! ;) Also, “git help add” will explain it better than me; see the EDITING PATCHES section.
I’ve been meaning to write this article for 2 years now. A recent thread in the Grails users mailing list finally triggered the initiative to write it. Actually, I was replying to a message, but the reply became too big and I decided to take the chance to write an article on the subject.
That was the subject of the thread, and the text that follows is my answer.
I’ve been working with Grails for more than 2 years now. Before that, I learned Rails in 2007 and liked it. I didn’t move to Grails because I love Grails, though.
I moved because I changed jobs and Grails was what the new job used. I changed jobs again last month, initially to work with Rails, but when they found out that I also knew Groovy and Grails, they decided to offer me another Grails opportunity.
So here I am, probably working with Grails for at least two more years, I would guess… Since 2007 I have never stopped watching Rails and Ruby closely, so I think I’m pretty able to compare both.
I would say that choosing between them depends on what you want to achieve. If you want to run your application in a Java web container, maybe Grails is the way to go. I’ve never deployed a Rails application with JRuby and Warbler, so I’m just guessing.
If you just want to integrate your web application with your legacy Java code, then both Groovy and JRuby will let you do that easily. Unlike Groovy, though, JRuby will let you “require” jars at run-time easily. But maybe Grails has better integration with Maven; again, I say maybe because I never tried to do that with the JRuby + Warbler approach beyond really simple experiments.
If you just want to write web applications, then you’re in the same situation as me, and I can help you more with that.
Let me explain the reasons why I prefer Rails myself and what I don’t like in Grails. I invite the whole Grails community to participate in this discussion and help alleviate the shortcomings I perceive in Grails.
I don’t know if that is your case, but I don’t even consider writing a new application without good test coverage. Unfortunately, I haven’t been given the opportunity to do that yet, because the companies I worked for didn’t want to give me time to write tests.
Unfortunately, this seems to be a common approach in the Grails community, as most of the plugins I used didn’t have test coverage, so I guess my companies were not alone. On the other hand, writing tests for their code, including most available plugins, is a strong practice among Rubyists. Also, the Rails code base itself has great test coverage, while I’ve experienced bugs in Grails, like runtime dependencies added to BuildConfig.groovy not being included in the war in previous releases, which suggests to me that its test coverage is not comparable to Rails’.
If you search for books written entirely about testing for Rails, you’ll find lots of them.
Testing is also usually one of the first chapters in almost every Rails book, reflecting the importance that Ruby and Rails users give to automated testing.
There are also tons of projects dedicated to some part of test creation for Ruby.
On the other hand, I couldn’t find a single book specialized in testing Grails applications; I’ve only seen a single small chapter about testing in some Grails books. Also, there are lots of great articles and tutorials on Rails testing, while I can’t find good resources on Grails testing.
Since I prefer specifications over assertions, I started to write some tests for Grails with EasyB. But its documentation and features can’t be compared with RSpec’s, and I don’t find many alternatives in the Groovy world yet. I have some problems with EasyB, but it was the best I could find, and that’s what I’ve been using for testing Groovy and Grails code.
Also, while I can write unit tests for Rails that actually touch the database, this is not possible with Grails: Grails forces me to use mocks in unit tests. But if part of the logic involves direct queries to the database, which is almost always my situation, then I’m forced to use integration tests for all my tests, which, added to the slow boot time of Grails applications, makes test writing a very slow task. Besides, writing an integration test when I actually want to unit test my class, just because of a Grails limitation, doesn’t seem right to me.
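For contrast, here’s roughly what such a test looks like in a Rails application (the model and scope are hypothetical): a plain unit test exercising a real query against the test database, no mocks involved:
1 | require 'test_helper' |
2 | |
3 | class OrderTest < ActiveSupport::TestCase |
4 |   test 'overdue scope runs a real query against the test database' do |
5 |     order = Order.create!(:due_on => 2.days.ago) |
6 |     assert Order.overdue.include?(order) |
7 |   end |
8 | end |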
Grails documentation is usually sparse, with references to Hibernate’s documentation, Spring’s documentation, Shiro’s documentation and so on. While I agree that using existing libraries is a good thing, I also like to see well-organized and comprehensive documentation instead of jumping between several sites, each one using a different documentation organization and style, especially when most of them are crappy for my taste.
On the other hand, I usually find great documentation for Rails and its many available plugins, with concise information showing at a glance how to use them.
This seems to be changing in Grails 2.0, but for the last 2 years I’ve had enormous trouble writing Grails applications, because with every change I make to my domain classes (which I do often), Grails restarts my application, losing any session and spending a lot of time in the rebooting process. This really slows down development. The same happens with classes under src/, while it doesn’t happen with controllers and GSPs.
Compare the time it takes to boot a fresh Grails application with booting a Rails one: Rails makes the application available almost instantly. This becomes more annoying as Grails insists on rebooting after changing some classes, as the application gets bigger, or when you do lots of processing in Bootstrap. In Rails, this is super fast in development mode because of Ruby’s autoload feature, which allows your classes to be loaded lazily.
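Ruby’s lazy loading can be as simple as Kernel#autoload; here’s a sketch of the idea (file and class names made up). Rails’ development-mode reloading is more elaborate than this, but the principle of not paying the loading cost at boot is the same:
1 | # Nothing is loaded at boot; we only register where the constant lives. |
2 | autoload :ReportGenerator, 'report_generator' |
3 | |
4 | # The require happens here, on the first reference to the constant. |
5 | ReportGenerator.new.run |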
Groovy’s API is based on the Java API, which was badly designed in my opinion. Ruby, unlike Java, has Date, DateTime and Time classes, for instance, while Java has java.util.Date, java.sql.Timestamp, etc. I’ve seen people arguing that this is because Ruby is much newer, but actually both languages were born in 1995.
The Ruby API is also very well written, in my opinion, and has great documentation. Everything fits together nicely in Ruby, while Groovy tries to simplify things by adding methods to standard Java classes; still, it is built on top of Java’s API, which means it can’t be as well integrated and well thought out as an API built specifically around the language’s features from the beginning.
With regards to the language itself, I really prefer the Ruby way of monkey-patching (reopening classes) and its style of meta-programming. In particular, I love Ruby modules and the concept of mixins (instead of supporting multiple inheritance), and I don’t think there’s anything like that in Groovy.
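A quick Ruby sketch of both features (class and module names invented): reopening a core class to add a method, and sharing behavior through a module mixin instead of multiple inheritance:
1 | # Reopening (monkey-patching) an existing class: |
2 | class String |
3 |   def shout |
4 |     upcase + '!' |
5 |   end |
6 | end |
7 | |
8 | # A mixin: shared behavior without multiple inheritance. |
9 | module Auditable |
10 |   def audit(message) |
11 |     (@audit_log ||= []) << message |
12 |   end |
13 | end |
14 | |
15 | class Invoice |
16 |   include Auditable |
17 | end |
18 | |
19 | 'hello'.shout             # => "HELLO!" |
20 | Invoice.new.audit('created') |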
Also, I don’t understand why Groovy created a new syntax (""", triple quotes) for multi-line strings instead of allowing multi-line strings with single quotes, just like Ruby. On the other hand, I don’t like the fact that Ruby doesn’t support multi-line comments like most languages (no, don’t tell me that =begin and =end were really intended to be used as multi-line comments).
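For reference, in Ruby an ordinary quoted string already spans lines, with no dedicated syntax needed:
1 | sql = 'SELECT * |
2 |          FROM invoices |
3 |         WHERE paid = false' |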
Ruby has had RubyGems for a long time for managing dependencies and easily installing gems (libraries, programs), and there’s a huge repository of Ruby gems. Java has Maven, but Maven doesn’t allow you to specify “hibernate > 3.6”; you need to be specific.
And then Maven will try to solve conflicts if you have one dependency that depends on Hibernate 3.6.5 and another that depends on Hibernate 3.6.6. Maven will not always be able to resolve this well.
In Ruby, suppose one gem depends on “hibernate >= 3.6” and another depends on “hibernate = 3.6.6”: RubyGems will be able to choose Hibernate 3.6.6. But what if your application depends on the latest gem version? Then you don’t specify the version and it will fetch the latest one. Now say some time has passed and another developer needs to replicate the dependencies. It wouldn’t be uncommon for the newest version of one of the dependencies to no longer be compatible with the one used when the application was first developed. To solve this specific problem, Rails had a rake task (rake rails:freeze) in its early days that would copy the gems to a vendor folder so that the application could be easily deployed anywhere. But that wasn’t a really good solution, and some years ago Yehuda Katz released Bundler, which solved the problem by writing a file that records all the gem versions used in the last “bundle” command, allowing that configuration to be replicated anytime without vendoring all gems.
Bundler is a great tool, and all Rails applications starting from Rails 3.0 use it for managing dependencies. I don’t know of a similarly handy project for Groovy.
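A minimal Gemfile sketch (versions are illustrative): you declare loose constraints once, run “bundle install”, and the exact resolved versions are recorded in Gemfile.lock so another developer or server replicates them precisely:
1 | source 'http://rubygems.org' |
2 | |
3 | gem 'rails', '~> 3.0'      # any 3.x release, starting at 3.0 |
4 | gem 'nokogiri', '>= 1.4'   # at least 1.4, newest available otherwise |
5 | gem 'pg'                   # unconstrained: latest on first install, locked afterwards |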
The next version of Rails (3.1.0), soon to be released, will allow mounting applications at given paths so they can interact with the main app. I guess Django has supported this for a longer time, but Grails won’t support this feature even in 2.0, as far as I know. This is also a great feature.
Unless JRuby is being used, you don’t need to allocate memory for your application up front; memory usage will grow as the application needs it. That means you can run lots of Rails applications at the same time in your development environment without having to cap their memory before running them, which usually means more free RAM.
My first web applications were written in Perl, about 15 years ago or more. During my Electrical Engineering degree I didn’t do much web development, spending most of my programming time with C and C++, working on embedded and real-time systems.
In 2007, I was back to web development and needed to update my knowledge. When I looked at web frameworks, I evaluated mostly TurboGears, Django and Rails, after discarding MS .NET and the Java-based ones. I didn’t know Ruby or Python at that time, so I wasn’t biased towards any of them. The argument that really sold me on Rails over the other alternatives was the database evolution approach. If I remember correctly, both TurboGears and Django used the same approach used by Grails: you write your domain classes and then generate the database tables based on those classes’ attributes. I didn’t like this approach at all because I was really concerned about database evolution. Rails, on the other hand, supported database migrations, and model class attributes didn’t have to be replicated since they are dynamically fetched from the mapped database table at run-time during Rails initialization. I really prefer this approach, and database migrations only seem to be supported by the Grails framework itself in Grails 2.0, which hasn’t been released yet at the time I’m writing this.
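A migration is just a small Ruby class describing one step of schema evolution (the table and column here are made up), and the model never redeclares its columns:
1 | class AddDiscountToInvoices < ActiveRecord::Migration |
2 |   def self.up |
3 |     add_column :invoices, :discount, :decimal, :precision => 8, :scale => 2 |
4 |   end |
5 | |
6 |   def self.down |
7 |     remove_column :invoices, :discount |
8 |   end |
9 | end |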
For a long time we used “dbCreate=update” in DataSource.groovy, and that is simply not maintainable. I hope Grails 2.0 will teach developers best practices like the ones Rails has encouraged from the start.
Regarding the framework API itself, I really prefer the Rails API. There are lots of useful DSLs that I don’t find in Grails, especially for defining hooks like before_save, after_save, before_validation, etc. You can declare these hooks in many useful ways and even multiple times. Also, instead of static variables, you have a declarative DSL for defining associations like has_many, belongs_to, etc. I also always found it odd that Grails used closures instead of methods for controllers’ actions, although this seems to have changed for the better in the next Grails release. And I like the fact that the Rails generators create controllers inheriting from ApplicationController by default, which means you can add methods to the ApplicationController class if you want them available in all controllers.
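A small sketch of that declarative style (model, association and callback names all invented):
1 | class Order < ActiveRecord::Base |
2 |   belongs_to :customer |
3 |   has_many :items |
4 | |
5 |   before_validation :normalize_reference |
6 |   after_save :notify_billing |
7 | |
8 |   private |
9 | |
10 |   def normalize_reference |
11 |     self.reference = reference.to_s.strip.upcase |
12 |   end |
13 | |
14 |   def notify_billing |
15 |     BillingNotifier.order_saved(self)  # BillingNotifier is a made-up collaborator |
16 |   end |
17 | end |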
Rails also lets me specify which layout to apply directly in the controller instead of in ERB (the GSP equivalent). And I don’t need to write boilerplate code in my views like I do in GSPs.
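For example (the controller and layout names are illustrative):

# app/controllers/admin/reports_controller.rb -- illustrative only
class Admin::ReportsController < ApplicationController
  layout 'admin'  # renders this controller's actions inside app/views/layouts/admin.html.erb
end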
I’m a Vim user, and Vim’s support for Groovy indentation and syntax highlighting is terrible. On the other hand, there’s good support for both the Ruby language and the Rails framework.
Rails has always cared about offering good defaults for web applications, and this is especially true for security. All text inside “<%= … %>” blocks is sanitized (since Rails 3) unless you explicitly say otherwise. You can get that in Grails too, but it is not the default and it only works with the “${…}” style, which can’t always be used, as my long experience with Grails has shown. I’m not sure exactly when it isn’t allowed, because it never made sense to me… :( It seems the problem is using the syntax in a nested context like “${[something, "abc: ${2 * someValue}"].join('<br/>')}”, but I don’t remember exactly.
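In Rails 3 ERB, for instance, escaping is opt-out rather than opt-in (the variables below are hypothetical):

<%# app/views/comments/show.html.erb -- illustrative snippet %>
<%= @comment.body %>        <%# escaped by default: "<script>" becomes "&lt;script&gt;" %>
<%= raw @trusted_html %>    <%# explicit opt-out for content you already trust %>
<%= sanitize @user_bio %>   <%# or keep some markup while stripping dangerous tags %>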
Another time-saver when writing Rails applications is that auto-completion works in the interactive console (irb), and the Delete key works as expected on Linux, unlike in groovysh. I even opened a JIRA issue with a patch for JLine to fix this annoyance, which was also present in JRuby at the time. JRuby fixed the problem, but groovysh still doesn’t behave correctly with regard to the Delete key.
Tab-completion is also available while debugging a Ruby application, using the ruby-debug gem for instance. And I can even debug Ruby applications from Vim, my favorite editor. :)
Errors in GSPs point to unrelated lines. And when errors happen, the stack traces are so big, as is usual in Java applications, that a friend of mine called them MonsterExceptions.
Both issues are said to be fixed in Grails 2.0, but I haven’t tested that yet.
Rails errors, on the other hand, are very precise and make it easy to find the source of the problem.
I remember that one of the oddest behaviors I experienced when first learning Grails was that, after fixing some piece of code, the bug persisted, and then a while later it worked. It was the first time in my life as a programmer that I had seen such behavior. In Rails, when you change some code, the change either takes effect immediately or it won’t take effect at all until you restart the application, depending on what you’re modifying. But since Java didn’t support listening to file-system events asynchronously until the recent Java 7, Java applications usually implement file-change monitoring by polling. So it may take a while before your changes take effect, and you never know whether the file has been recompiled yet or not.
Actually, I had been planning to write a more detailed article for some years, with more concrete examples, but that would take some time, which is why I hadn’t written it before. But as I was replying to the message by e-mail, the answer was getting so long that I decided to write this article even if it’s not in the shape I would like it to be. I hope to find some time in the future to polish it. Also, as I get feedback from Groovy and Grails users, and after Grails 2.0 is finally released, I intend to update this article to reflect the changes and any mistakes I may have made.
So, sorry for the unpolished article, but that’s what I can write for now. I hope it’s useful anyway. Good luck with your framework decision, whatever it may be!
I have been meaning to write an article like this for a long time and finally found some inspiration and time to do it.
No software is ever finished. Even vi, created in 1976, is not finished. If no one is working on a piece of software anymore, it just means it is unmaintained or has been replaced by something else. Either way, your code will be changed or entirely replaced.
Unless you expect your software to be replaced soon, you should write maintainable code. It’s very important for your code to be readable and maintainable, because reading it is where developers spend most of their time. So while knowing your editor well matters, you should spend more time refactoring your code to make it readable than finding new ways to type efficiently, because clean source code will save you far more time than any editor key mapping. That said, a good editor/IDE will also help you refactor.
You should really follow the good advice found in the books on Agile software development, since it is the only way I know of writing software that actually works in the real world. I won’t discuss Agile itself in this article, as it is out of scope and there are great books on the subject; I am assuming the reader is familiar with it. Here are the software-writing guidelines I’m talking about, although I won’t explain the reasoning behind them, as that would take a while and the books and articles on the subject already do:
Since you’ll be writing code for today’s needs, at some point you’ll face the situation where a new feature shares lots of implementation details with a prior one. You should not copy and paste code from the prior feature. This seems obvious, but if I didn’t find code written that way so often, I wouldn’t be mentioning it. WARNING: whenever you find yourself copying and pasting code, even across different projects, think twice. Most probably you should extract the common part into another method, class or library, as sketched below. Some languages do require boilerplate; if that’s your case, make sure you’re copying and pasting only the necessary boilerplate.
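A trivial sketch of the idea (all names are made up): the shared CSV-building steps live in one method instead of being duplicated in each exporter.

# Naive CSV building, for illustration only (no quoting or escaping)
def build_csv(header, rows)
  ([header] + rows).map { |columns| columns.join(',') }.join("\n")
end

# Both exporters reuse the shared method instead of duplicating it:
def export_users_csv(users)
  build_csv(%w[Name Email], users.map { |u| [u.name, u.email] })
end

def export_orders_csv(orders)
  build_csv(%w[Number Total], orders.map { |o| [o.number, o.total] })
end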
The most common reason developers don’t refactor their code is fear of breaking a critical production system, which is often related to the lack of a good suite of automated tests. Especially if your application is a critical production system, it should be covered by tests. The more you copy and paste code, the harder the code base becomes to evolve and understand.
The same bug will also show up in multiple places in the source code, and even if you fix it in one part, it will show up again on a Friday at 5pm, and you’ll have to cancel your weekend plans to hunt for a hidden bug that was already fixed elsewhere in the code, though you don’t know that because you weren’t the one who fixed it. And people will be asking why it takes so long to fix the production application at the most critical moment, when it really shouldn’t fail: right while it’s being presented to a big potential client!
TODO: talk about test simplicity, coverage, documentation tools, and not caring whether tests are written before (TDD) or after. TODO: talk about test prioritizing. TODO: talk about mocks and the importance of speed and isolation of concerns
You should keep your code minimal to be polite to the other developers who will work on it some time later; maybe that developer will be you. Having small methods, classes and files helps you read the code without scrolling, and some editors like Vim let you display multiple source files at the same time. Small methods help you understand the overall code.
TODO: talk about spending time thinking in good names
TODO: talk about how comments can be avoided with clean code
TODO: Compare C++ and Java to dynamic languages like Ruby, Python or Groovy. TODO: talk about tradeoffs and performance concerns vs development speed. TODO: talk about legacy Java code and JRuby, Groovy, Scala, Clojure and Jython. TODO: also talk about network-based API integration
TODO
TODO
TODO: Avoid uncommon solutions and complicated architectures
TODO: Give preference to common network based APIs
TODO: you can apply or not but it’s important to understand it
TODO: explain differences between writing end-software and libraries and talk about tests here
TODO: talk about Java, setters/getters, private/protected/public, interfaces and its abuse
TODO
TODO: and invest time learning it
TODO: return if exceptional_case
TODO: talk about the try-catch approach and the type of applications (libraries, unsaved data) as well as about tests.
TODO
TODO: talk about nested if’s, while’s and alternatives like catch-throw
TODO: talk about <=> and how to deal with its lack in some languages. Sort should return -1, 0 or 1. TODO: talk about wrong usage of sort for getting max and min.
TODO
TODO
TODO
TODO: talk about synchronized methods
TODO: talk about language vs architecture, and concerning before due time or without benchmark/profiling.
TODO: talk about simple web APIs and queuing systems for integrating applications in possibly different languages. TODO: Avoid writing language or vendor specific solutions
TODO: It does happen in Java. Talk about unbounded in-memory cache.
In 2009, I wrote an article for Rails Magazine Issue #4 - The Future of Rails - in which I presented an alternative for generating PDFs from ODF templates, which can be created with a regular word processor such as OpenOffice.org or Microsoft Office (after converting the document to ODF).
You can read the entire article by downloading the magazine for free or purchasing it. The application code illustrating this approach was published by the magazine on GitHub.
Unfortunately, I can’t host a working system providing a live demonstration due to my Heroku account limitations, but it should be easy to follow the article’s instructions in your own development or production environment.
Don’t hesitate to send me any questions, through comments on this site or by e-mail if you prefer.