Preloading summary data using ActiveRecord

It seems like every application we build has at least one screen that needs to access a lot of data. For example, if we were building a web application for a school we would likely have a page for teachers that shows all of their students including their current grade, number of assignments turned in on time and attendance. Sadly, this typically results in a slow page load. Let’s look at how we can make this type of page run quickly without adding much complexity.

Let’s set up a couple of example models to demonstrate this case. We’ll have a student model. Our student will have a list of absences, a list of assignments and a list of grades. Our model could look something like:

class Student < ActiveRecord::Base
  has_many :assignments
  has_many :grades, :through => :assignments
  has_many :absences
end

In our output, we’ll want to show a tabular view of information for each student. A first cut would probably look something like:

  <% for student in @teacher.students %>
    <tr>
      <td><%= student.name %> </td>
      <td><%= student.absences.count %> </td>
      <td><%= student.assignments.select {|a| a.on_time?}.size / student.assignments.size.to_f rescue nil %> </td>
      <td><%= student.grades.sum(:score)/student.grades.count %> </td>
   </tr>
  <% end %>

That code works, but there are a couple of obvious problems with it. First, we have a lot of logic in our views. Second, we’ll run 4 queries for each user: 1 to count absences, 1 to get the number of on_time assignements, 1 to get the total score and one to count the number of grades. This is okay for viewing 10 students, but performance will become a serious problem when looking at a couple of hundred students.

So how do we clean this up? Let’s start by cleaning up the logic in the views. Hopefully that will make a solution more apparent. The first step is to create methods on student for each of those items. Instead of calling student.absences.count in the view, we can add a student.times_absent method. We can do the same for the percent of on time assignments and for the average score. Our code then becomes:

  <% for student in @teacher.students %>
    <tr>
      <td><%= student.name %> </td>
      <td><%= student.times_absent %> </td>
      <td><%= student.percent_assignments_on_time %> </td>
      <td><%= student.average_score %> </td>
   </tr>
  <% end %>

Our implementation of those methods will look familiar:

class Student < ActiveRecord::Base
  has_many :assignments
  has_many :grades, :through => :assignments
  has_many :absences
  
  def times_absent
    absences.count
  end
  
  def percent_assignments_on_time
    student.assignments.select {|a| a.on_time?}.size / student.assignments.size.to_f rescue nil
  end

  def average_score
    student.grades.sum(:score)/student.grades.count
  end 

end

That cleaned up our code, but it didn’t do anything for our performance. We could try to use include to speed up our query. If we had 200 students each with 200 assignments, this would involve us running a find that returns 200*200 rows. Another solution to our performance problem is to use a raw sql query. We can write a sql query that will give us the percent_assignments_on_time and the average score for our whole list of users at once. I normally avoid doing this because then I can’t lose the abstraction that active record provides. What if we could keep the same interface but use a sql query to preload the data? That would be the best of both worlds. We can do this using memoization. To implement this, we’ll start by adding memoization to our methods:

  def average_score
    @average_score ||= student.grades.sum(:score)/student.grades.count
  end 

Next, we’ll create a method that will retrieve and set the average score for each user that we care about. It will look something like:

  attr_accessor :average_score, :percent_assignments_on_time
  def self.load_assignment_analytics!(students)
    sql =<<-SQL
      select u.id, 
             avg(score) as cached_average_score,
             sum(on_time)/count(*) as cached_on_time_percent
      from students u
       join assignments a on u.id = a.user_id
       join grades g on g.assignment_id = a.id
      where u.id in (?)
      group by u.id
    SQL
    assignment_info = Student.find_by_sql([sql,false,users.map(&:id)])
    user_map = assignment_info.index_by(&:id)
    students.each do |student|
      rec = user_map[student.id] 
      if rec.nil?
        student.percent_assignments_on_time = "None"
        student.average_score = "None"
      else
        student.percent_assignments_on_time = rec. cached_on_time_percent.to_f
        student.average_score = rec. cached_average_score.to_f
      end
    end
    
  end

That methods runs our query for all users that we pass in. For each user, it finds the result from our query. If no result exists (because they don’t have any assignments or grades) it will set the values to “None”. (If we set them to nil or false our query would be re-run in the interface because of the ||= line. If there are results for a user, we set the value on the object.

Now, when our view iterates over the list of users, it will use the previously stored data. The interface remains the same but our performance improves just by adding a single call to Student.load_assignment_analytics!(teacher.students) in our controller. If we forget to call this method, the view will still work, it will just run more queries. This is definitely less DRY than our original code, as we implement the logic for our counts in two ways. We could redefine our average score method to always call load_assignment_info!, but that might be overkill. There are also a couple of gotchas. Because we use Student.find_by_sql to fetch our data, we need to make sure that we name our calculated constants something that isn’t a method name on student. If you don’t, the method will be called.

All in all, I think this is a relatively elegant and simple solution to a complex problem.