Speeding up SOLR indexing

We’re finishing up a project that uses SOLR and acts_as_solr extensively. Our dataset isn’t large, but it isn’t trivial either: the search index holds a little over 100,000 documents, each indexed on 12 fields. In general, our search performance is outstanding; results come back almost instantly. Indexing performance is another story. I’m not a big fan of premature optimization, but when a full reindex takes 4 days, something needs to change.

There’s some good advice on the web for pulling indexing out of the request flow. Unfortunately, that wasn’t going to help us: our updates would still take several days to process, and a delay like that makes testing changes to search incredibly painful. After much googling, I found a number of people who recommended letting SOLR manage its own commits. It took me a few tries to get this working, and in the end I was making it much harder than it had to be. Disabling acts_as_solr’s autocommit really only takes two steps.

First, update your call to acts_as_solr to disable autocommit:

class MyIndexedModel < ActiveRecord::Base
  acts_as_solr :fields=>[:name,:body], :auto_commit=>false
end

With just that change, our indexing performance went from 2 seconds per record to 50 records per second: a 100x speedup. Unfortunately, our changes weren’t showing up in search results. SOLR doesn’t make newly indexed records searchable until a commit is done.

Second, tell SOLR to manage the commits itself, which it makes easy. Edit vendor/plugins/acts_as_solr/solr/solr/conf/solrconfig.xml. It contains a commented-out autoCommit configuration; remove the XML comments around it. We use the maxTime parameter (in milliseconds) to have SOLR commit pending changes within a minute, and maxDocs triggers a commit earlier if 10,000 documents pile up first. Our configuration looks like:

...
  <updateHandler class="solr.DirectUpdateHandler2">

    <!-- A prefix of "solr." for class names is an alias that
         causes solr to search appropriate packages, including
         org.apache.solr.(search|update|request|core|analysis)
     -->

    <autoCommit>
      <maxDocs>10000</maxDocs>
      <maxTime>60000</maxTime>
    </autoCommit>
...
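
To make the interaction of the two thresholds concrete, here is a toy Ruby model of the rule. It is a sketch, not SOLR code: the AutoCommitSketch class and its methods are hypothetical, and it assumes that maxTime is measured from the oldest uncommitted document and that whichever threshold is reached first triggers the commit.

```ruby
# Toy model of the autoCommit thresholds. NOT Solr code: the class and
# method names are made up for illustration.
class AutoCommitSketch
  attr_reader :commits

  def initialize(max_docs:, max_time_ms:)
    @max_docs = max_docs
    @max_time_ms = max_time_ms
    @pending = 0               # documents added since the last commit
    @oldest_pending_ms = nil   # time of the oldest uncommitted add
    @commits = []              # times (ms) at which a commit fired
  end

  # Record one added document at time now_ms (milliseconds).
  def add(now_ms)
    flush_if_due(now_ms)
    @oldest_pending_ms ||= now_ms
    @pending += 1
    commit(now_ms) if @pending >= @max_docs
  end

  # Let the clock advance without adding a document.
  def tick(now_ms)
    flush_if_due(now_ms)
  end

  private

  # Assumption: the timer fires max_time_ms after the oldest pending add.
  def flush_if_due(now_ms)
    return if @oldest_pending_ms.nil?
    return if now_ms - @oldest_pending_ms < @max_time_ms
    commit(@oldest_pending_ms + @max_time_ms)
  end

  def commit(at_ms)
    @commits << at_ms
    @pending = 0
    @oldest_pending_ms = nil
  end
end

# Same thresholds as the configuration above.
sketch = AutoCommitSketch.new(max_docs: 10_000, max_time_ms: 60_000)
# Feed 6,000 documents at 50 per second (one every 20 ms): two minutes of indexing.
6_000.times { |i| sketch.add(i * 20) }
sketch.tick(120_000)
puts sketch.commits.inspect # prints [60000, 120000]
```

At 50 records per second only about 3,000 documents arrive per minute, so in this model the time threshold always fires before maxDocs is reached. That matches the behavior described here: updates show up within a minute.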

That’s all there is to it! We get 100x faster indexing and our updates still show up within a minute. Suddenly I don’t dread making changes to our indexed models!
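
For a rough sense of scale, the quoted rates can be turned into full-reindex times. This is back-of-the-envelope arithmetic only: the constant and method names are made up, and the inputs are the rounded figures from this post (100,000 documents, 2 seconds per record before, 50 records per second after).

```ruby
# Back-of-the-envelope reindex times from the rounded figures in the post.
DOCS = 100_000
SLOW_SECONDS_PER_DOC = 2.0   # before: 2 seconds per record
FAST_DOCS_PER_SECOND = 50.0  # after: 50 records per second

# Seconds needed to index `docs` documents at `docs_per_second`.
def reindex_seconds(docs, docs_per_second)
  docs / docs_per_second
end

slow = reindex_seconds(DOCS, 1.0 / SLOW_SECONDS_PER_DOC) # 200000.0 seconds
fast = reindex_seconds(DOCS, FAST_DOCS_PER_SECOND)       # 2000.0 seconds
speedup = slow / fast                                    # 100.0

puts "before: #{(slow / 86_400).round(1)} days"
puts "after:  #{(fast / 60).round} minutes"
puts "speedup: #{speedup.round}x"
```

With these rounded inputs the old rate works out to a bit over two days rather than the full four, so treat the per-record figures as approximations. The 100x ratio holds either way, and a full reindex drops to roughly half an hour.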