Speeding up delayed job

Delayed Job is a fantastic plugin for moving work out of the request flow. Unfortunately, it has a few performance problems when running with large numbers of jobs on MySQL. With minor changes, we can remove a large amount of the overhead.

I noticed the performance problem on one of our applications when MySQL was taking large amounts of CPU. The application had been in production for about a year and had been performing well. I had just recently moved image resizing out of the main workflow and into delayed job. When I went to look at the slow query log, I saw this being run frequently:

SELECT * FROM `delayed_jobs` WHERE ((run_at <= '2009-11-04 06:58:49' AND (locked_at IS NULL OR locked_at < '2009-11-04 02:58:49') OR (locked_by = 'host:example.com pid:17731')) AND failed_at IS NULL)  ORDER BY priority DESC, run_at ASC LIMIT 5;

When reading the output of the slow query log, I look mainly at two things. The first is the query time. In this case, the query was being run in less than a second. The second thing I look at is the ratio of the number of rows examined to the number of rows sent back to the app. In this case, the database had to look at 67,604 rows just to give the app 5 rows back. When you see full table scans like this, it is usually a sign of a missing index. I looked at the delayed jobs table and saw that there was only an index on ID. That means the DB was reading every row in the table for each query.

Once I realized that I was seeing a full table scan, I knew I needed to add an index. Normally, when you add indexes to a table, you’re doing it to speed up the row selection performed by the where clause. That isn’t a good idea here. Delayed job removes successfully completed records by default, and we have very few updates that fail. This means that the vast majority of rows in the table will match the where clause. When this happens, you typically want to add an index to speed up the sort instead of the where clause.

This query sorts on two fields, priority and run_at. We can easily add an index there, but notice the ordering on the sort. It sorts by priority in descending order. Because of the way database indexes work, they are typically only used for ascending order sorts. If you were to create an index on delayed_jobs(priority,run_at) the database won’t be able to use the index. (You can try it out. Add the index and explain the query.

Some databases, like Oracle, allow you to create an index with a specified ordering. For example, you can create an index on delayed_jobs(priority desc, run_at asc) which will then allow the query to use the index. While that syntax is supported by MySQL, the ordering is ignored. That means MySQL still won’t be able to use the index.

So how did I solve the problem? Simple. I changed the semantics of priority. By default, Delayed Job expects a higher value of priority to be more important. I decided to assign lower numbers to mean more important (similar to Unix nice values.) Once I changed the way I assigned priorities, I simply changed the query to order by priority ASC, run_at ASC and added an index on delayed_jobs(priority,run_at). Now, when the query is run, the log looks like:

SELECT * FROM `delayed_jobs` WHERE ((run_at <= '2009-11-04 06:58:49' AND (locked_at IS NULL OR locked_at < '2009-11-04 02:58:49') OR (locked_by = 'host:example.com pid:17731')) AND failed_at IS NULL)  ORDER BY priority ASC, run_at ASC LIMIT 5;

That quickly reduced the load on the system and improved application performance.