Principle of Least Surprise

I’ve been programming for a long time now. One of the benefits that comes with experience is that I’ve made a lot of mistakes and tend to have a good idea what not to do. Some mistakes, however, I find myself making time and time again. Those mistakes typically come from a violation of the principle of least surprise.

First, to set the stage, I’m working on a a bulk data import process. Users can import data for several different objects by uploading a csv file. Many of these files are quite large, so we want to process them in the background. Due to the number of ways that users can break a spreadsheet upload (as I’ll talk about shortly) I want to make sure I save the uploaded files and have a process for repeating the import to allow me to debug any problems that arise. This lead to the following structure:

  • A number of Importer classes that know how to validate and import a row of a csv.
  • A DataImport class that holds the uploaded file, enough information to create an instance of Importer, and a results object.
  • A background job (using resque) that will run the import.

The DataImport uses serialization extensively. For example, the results object is serialized so that we can display information including the number of valid and invalid rows, any errors that occur and also the data for errored rows. A list of options needed to initialize the Importer object are also stored serialized on the DataImport.

This has worked fine in production, but there were some problems in development mode. While I was working on the import, I would frequently have background imports fail because the wrong number of parameters were passed to the Importer object. When I looked into it, this was caused by ActiveRecord failing to unserialize my attributes.

Somewhat surprisingly, in some cases, ActiveRecord was returning the yaml string for my options attribute instead of an unserialized object. Instead of raising an exception, ActiveRecord just returned a string value. This isn’t at all what I expected to happen. Because I assumed that I would be notified of errors, I wasn’t checking whether the returned value was a YAML string. I just passed the object on into the program where it would fail in an obscure way much later on.

Once I realized what the problem was, I was able to read the source code to see what was happening. Doing that I learned that ActiveRecord swallows a number of generated exceptions. I was able to debug and figure out that the issue was caused by classes in the serialized object that weren’t loaded by Rails. To fix it, I just needed to use require_dependency to make sure all of the serialized classes were loaded. With that done, my serialization and unserialization now work as expected.

Of course, this wouldn’t have cost me so much time if Rails didn’t silently ignore failures in unserialization. If a developer asks to serialize an attribute, and that attribute can’t be unserialized, returning the serialized version is a clear violation of the principal of least surprise. In this case, raising an exception is a much better decision.

Update

In the open source tradition, this bug annoyed me enough to submit a fix. Due to this pull request the issue should be fixed in Rails 4.0.