Future Objects in Ruby

As asynchronous processing becomes more popular, new techniques are needed to simplify our code. I’ve found a variant of the future object to be a useful technique in environments where processing might be moved from inline to a batch environment.

While people define future objects in different ways, I typically define a future as an object that is returned immediately from a method call but whose value isn’t available until later. I’ll be describing one type of future. For another, you can take a look at this post by John Pignata.

To set the stage, let me show an example of code where I used future objects. This is a slightly abstracted and simplified example from a real client application. In this application, we dealt with somewhat large sets of objects (movies, in this example) that needed to be filtered and scored on some criteria. For our implementation, we used Redis sorted sets because of the speed with which Redis performs set manipulations.

In a typical case, we might do something like:

  • Take a filtered set of movie IDs
  • From that list, remove movies that a user owns or has rated
  • From that list, alter the weightings based upon some data we know about the user
  • Alter the resulting weightings again based upon the number of times a movie has been shown to the user
  • Take the 10 highest-rated movie IDs

In pseudocode, this might look like:

def recommend_movies_for_user_from_base_set(user, base_set)
  base_set.without(owned_or_rated_movie_set_for_user(user)).
    adjust_weight_by(preference_set_for_user(user)).
    adjust_weight_by(recommendation_views_set_for_user(user)).
    top_n_results(number_to_show)
end

This code worked well and performed wonderfully until the day we decided to create a page that showed recommendations for ten different genres at once. That code looked like:

def recommendations_for_user_in_genres(user, genres)
  genres.map do |genre|
    recommend_movies_for_user_from_base_set(user, base_set_for_genre(genre))
  end
end

Suddenly, we found our performance was unacceptable. When we profiled our application, it turned out that the Redis set processing wasn’t the bottleneck. The bottleneck was the network. To generate recommendations for 10 genres, we made 40 round trips to Redis (four calls per genre), and the accumulated latency added up to a significant delay.
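To make that arithmetic concrete, here is a small sketch that only counts round trips. CountingClient is a hypothetical stand-in, not part of redis-rb, and it assumes that each of the four chained set operations costs exactly one round trip when run inline.

```ruby
# Hypothetical stand-in for a Redis client: it does no network I/O and
# only counts how many round trips the calling code would make.
class CountingClient
  attr_reader :round_trips

  def initialize
    @round_trips = 0
    @pipelining  = false
  end

  # Outside a pipeline, every command is its own round trip.
  def command(_name)
    @round_trips += 1 unless @pipelining
    nil
  end

  # Inside a pipeline, all queued commands share a single round trip.
  def pipelined
    @pipelining = true
    yield
  ensure
    @pipelining = false
    @round_trips += 1
  end
end

# The four Redis-backed steps of the recommendation method (names are
# illustrative labels only).
STEPS = %i[without adjust_by_preference adjust_by_views top_n].freeze

def recommend_for_genres(client, genre_count)
  genre_count.times { STEPS.each { |step| client.command(step) } }
end

inline = CountingClient.new
recommend_for_genres(inline, 10)

batched = CountingClient.new
batched.pipelined { recommend_for_genres(batched, 10) }

puts "inline: #{inline.round_trips}, pipelined: #{batched.round_trips}"
# prints "inline: 40, pipelined: 1"
```

The set operations themselves are unchanged in both runs; only the number of times the code waits on the network differs.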

The fix seemed simple. The Redis client has a pipelined method that allows you to queue up a bunch of commands and send them to Redis in a single batch. With that in mind, our simple controller method looked like:

def index
  ids = nil # declared here so the assignment in the block is visible afterwards
  redis.pipelined do
    ids = recommendations_for_user_in_genres(current_user, all_genres)
  end
  @recommendations = recommendations_for_ids(ids)
end

This seemed really simple, but unfortunately it didn’t work. All of our calls to Redis inside the pipeline block were returning nil. Instead, all of the values were returned from the call to pipelined as an array.

This was a mess. To get the results of our calls, we would need to know how many calls to Redis we made and which array values to extract. It meant that we would need to write result parsing code that knew quite a bit about the internals of how we implemented without or adjust_weight_by. In the end, we did it because we needed the performance increase. Things worked okay until we found an edge case in our implementation of without that required us to make two Redis calls in place of one. Suddenly, an internal change in our set code required a change to our result parsing code. This just felt wrong.
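Here is a rough sketch of why positional result parsing is so brittle. ToyPipeline and top_ids are hypothetical stand-ins invented for this illustration (they are not redis-rb code); the toy only mimics the behavior described above, where calls inside the block return nil and the results come back as one array.

```ruby
# Toy stand-in for a pipelining client (not redis-rb): inside `pipelined`
# every command is queued and returns nil; the results come back together
# as one array, in the order the commands were issued.
class ToyPipeline
  def initialize
    @queue = nil
  end

  def call(&command)
    return command.call unless @queue # outside a pipeline: run immediately

    @queue << command
    nil
  end

  def pipelined
    @queue = []
    yield
    @queue.map(&:call)
  ensure
    @queue = nil
  end
end

# Hypothetical set helper. `extra_call` mimics the edge case in `without`
# that needed two Redis calls in place of one.
def top_ids(client, extra_call: false)
  client.call { :cleanup_ok } if extra_call
  client.call { :without_ok }
  client.call { [42, 7, 19] } # the value the caller actually wants
end

client = ToyPipeline.new

results = client.pipelined { top_ids(client) }
ids = results[1] # caller hard-codes "the second reply is the ID list"

results_after_change = client.pipelined { top_ids(client, extra_call: true) }
wrong = results_after_change[1] # same index now points at :without_ok

puts ids.inspect   # prints "[42, 7, 19]"
puts wrong.inspect # prints ":without_ok"
```

Nothing in the caller changed between the two runs, yet the hard-coded index silently picks up the wrong reply.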

Having already implemented future objects in the Facebooker library, this seemed like a perfect place to add them to redis-rb. In this case, each call to Redis inside a pipeline block would return a future object that would get its value set when the block completes. For example:

  result = nil
  redis.pipelined do
    result = one_through_four_set.highest_value()
    # At this point, result is an object, but its value hasn't been set
  end
  result # this is now the value 4

With this in place, none of our code needed to change when it was called inside a pipelined block. Inside the block, ids is set to an array of future objects. Once the block is complete and Redis has executed the pipeline, the results are parsed and the values of the future objects are set.

The implementation itself is basically a proxy object. Inside a pipeline, the client returns an instance of a class that acts mostly as a proxy, sending all messages on to the value that is set at the end of the pipeline. For safety purposes, the future object raises an exception if you try to access it before the value is set. The implementation turns out to be quite simple:

class PipelineResult
  instance_methods.each { |m| undef_method m unless m =~ /(^__|^nil\?$|^send$|proxy_|^respond_to\?$|^new|object_id$)/ }
  def initialize
    @result     = nil
    @result_set = false
  end

  def result=(result_object)
    @result_set = true
    @result = result_object
  end

  def respond_to?(name)
    super || @result.respond_to?(name)
  end

  def ===(other)
    other === @result
  end

  # UnexecutedRequest is an exception class assumed to be defined elsewhere.
  def method_missing(name, *args, &block)
    if !@result_set
      raise UnexecutedRequest.new("You may not access the result until the pipeline has been executed")
    else
      @result.send(name, *args, &block)
    end
  end
end

Let’s quickly walk through that. The first bit goes through and undefines most of the default methods on the object. In the end, it leaves only nil?, send, respond_to?, anything that starts with __ or new, anything that ends with object_id, and any proxy_ methods.

In our initialize method, we default the result to nil and set a flag that tells us that we haven’t set a value yet. (Because Redis can return a nil value, we can’t use the presence of @result as an indicator that a value is set.)

Next, we have a result= method to allow setting the resulting value. The next two methods are just some sugar to make respond_to? and === comparisons work once a value is set. The final method does the actual proxying if a value is set and raises an exception if not.
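To see the proxy in action without a Redis server, here is a minimal sketch. It repeats a trimmed copy of the class above so that it runs on its own. ToyPipeline and the definition of UnexecutedRequest are assumptions made for this illustration, not the code from the actual patch.

```ruby
# Assumed definition; the snippet above does not show where this comes from.
class UnexecutedRequest < StandardError; end

# Trimmed copy of the proxy class (respond_to? and === sugar omitted).
class PipelineResult
  instance_methods.each do |m|
    undef_method m unless m =~ /(^__|^nil\?$|^send$|proxy_|^respond_to\?$|^new|object_id$)/
  end

  def initialize
    @result     = nil
    @result_set = false
  end

  def result=(result_object)
    @result_set = true
    @result = result_object
  end

  def method_missing(name, *args, &block)
    raise UnexecutedRequest, "pipeline has not been executed" unless @result_set

    @result.send(name, *args, &block)
  end
end

# Toy pipeline (hypothetical, not redis-rb): each command returns a
# PipelineResult right away; the values are filled in after the block.
class ToyPipeline
  def pipelined
    @pending = []
    yield
    @pending.each { |future, command| future.result = command.call }
    nil
  ensure
    @pending = nil
  end

  def call(&command)
    future = PipelineResult.new
    @pending << [future, command]
    future
  end
end

client = ToyPipeline.new
result = nil
early_error = nil

client.pipelined do
  result = client.call { [1, 2, 3, 4].max }
  begin
    result + 1 # touching the value too early raises
  rescue UnexecutedRequest => e
    early_error = e
  end
end

puts result + 1 # prints "5": the future now forwards + to the Integer 4
```

The calling code never asks the future for its value; after the block finishes, it simply uses result as if it were the Integer.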

All in all, it’s a pretty small class, but one that significantly simplified our code.

After running this for a month or so, I decided to submit it as a pull request to redis-rb.

So What Happened?

As of version 3.0, redis-rb supports future objects. After some discussion, the author decided to use a slightly different implementation. The implemented version requires that you call value on the returned future object. This is a huge improvement over the old behavior, but unfortunately it means you can’t transparently wrap code in a pipelined block and have it still function.
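The trade-off can be illustrated with a toy explicit future. This is a hypothetical sketch and not redis-rb's source: ExplicitFuture, fulfill, NotReady and next_rank are names made up for the example, and only the idea of calling value on the returned object comes from the description above.

```ruby
# Hypothetical explicit future: callers must ask for the value.
class ExplicitFuture
  class NotReady < StandardError; end

  def initialize
    @set = false
  end

  # Called by the (imaginary) pipeline once the reply has arrived.
  def fulfill(value)
    @value = value
    @set = true
  end

  def value
    raise NotReady, "pipeline has not been executed" unless @set

    @value
  end
end

# Code written for inline use expects a plain Integer.
def next_rank(rank)
  rank + 1
end

future = ExplicitFuture.new
future.fulfill(4)

puts next_rank(future.value) # prints "5": the caller unwraps explicitly

begin
  next_rank(future) # unchanged inline code breaks on the wrapper
rescue NoMethodError => e
  puts "needs .value: #{e.class}" # prints "needs .value: NoMethodError"
end
```

The explicit version is easier to reason about, since nothing pretends to be something it isn't, but every caller has to know it might be handed a future.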