Speeding up location based searches

With the popularity of Google Maps, more and more web applications are becoming location aware. Rails already has some great tools for working with geodata, such as GeoKit. We’ve been happily using these tools for years. While GeoKit has some basic optimizations, we recently found a case where lookups were taking close to 8 seconds. We needed a way to speed this up. Read on to learn how we did it.

First, let me explain our domain. We are working on a site that shows the 25 nearest amenities to a location. The application is focused on San Francisco and the surrounding areas. We store the amenities in a table with columns for latitude and longitude. All told, we search about 800,000 amenity records.

To find the nearest 25 amenities, our database has to do a lot of work. For each listing in the database, we need to calculate the distance from our origin. Performing that calculation would be unbearably slow, so geokit speeds up the query by including a square bounds. First, the list of locations is limited by a square 5 miles on a side (we use a big window to make sure we get amenities in more rural areas.) Once we’ve limited our dataset, the results are sorted by distance. Unfortunately, due to the amenity density in San Francisco, we still have to go through 10,000 records on average.

This calculation and sort is just slow. In version 5.1, MySQL includes some geospatial extensions. These extensions include datatypes for locations and spatial indexes to make spatial searches much faster. Unfortunately, ActiveRecord doesn’t support these extensions. That means we’ll have to do some things manually.

First, let’s look at the three primitives we’ll use. The first is a Point Points represent a location in a 2D space. We’ll use Points to represent the location of an amenity by it’s latitude and longitude coordinates. In MySQL, you can create a point using the SQL: GeomFromText('Point(38.567 -122.576)')

The second primitive is a Polygon Polygons include a list of coordinates that make up the vertices. Polygons must be closed. For example, we could create the polygon GeomFromText('Polygon((0 0,0 1,1 1,1 0,0 0))') We’ll use a Polygon to represent the boundary inside which we want to search.

The final primitive is a Line We use a Line to calculate the distance between two points. According to the OpenGIS specification, we should be able to use the distance function for this. Unfortunately, MySQL doesn’t implement Distance.

We’ll also need to create a spatial index. A spatial index is an r-tree index that is optimized for fast distance calculations. Currently, only MyISAM tables support spatial indexes. They also require that the column be defined as NOT NULL. We’ll have to keep that in mind as we go along.

Let’s start by adding a Point column to our amenities table. Because ActiveRecord doesn’t support the GIS extensions, we’ll need to use the execute method inside our migration. This looks like:

    execute 'alter table amenities add lat_lng_point point not null'

Because we want to use Spatial indexes, we’ll also need to convert our amenities table to MyISAM.

    execute 'alter table amenities engine myisam'

With that in place, we’ll need to convert our existing data to include the lat_lng_point column. From what I can tell, Points can only be created using the GeomFromText method. That means we’ll need to generate the text representation of a Point during our conversion. This can be done with the following SQL”

update amenities set lat_lng_point = GeomFromText(concat('Point(',lat,' ',lng,')'))

At this point, we now have a table with the location data being stored as a Point. Since all of our records have a Point, we now meet the criteria for adding our spatial index:

    execute 'create spatial index amenities_spatial_idx on amenities(lat_lng_point)'

With that in place, we can need a method that does a few thing. First, we need to compute a bounds to limit our distance calculation. We’ll do that using GeoKit::Bounds Once we have that, we need to write a query that uses the MBRWithin function (MBR stands for Minimum Bounding Rectangle) to find the list of points inside our Bounds. Once we have that, we can use a Line to get the distance between our origin and each point and order by that. The end result is a little ugly:

  
  def self.near(mappable)
    bound = GeoKit::Bounds.from_point_and_radius(mappable,5)
    sql = <<SQL
    select *,  GLength(LineStringFromWKB(LineString(AsBinary(lat_lng_point), 
    (AsBinary(GeomFromText('Point(#{mappable.lat} #{mappable.lng})')))))) * 69 as distance
    from amenities where MBRWithin(lat_lng_point,
     GeomFromText('Polygon((#{bound.ne.lat} #{bound.ne.lng}, #{bound.ne.lat} #{bound.sw.lng},#{bound.sw.lat} #{bound.sw.lng},#{bound.sw.lat} #{bound.ne.lng},#{bound.ne.lat} #{bound.ne.lng}))')) 
    order by distance asc limit 25;
SQL
    find_by_sql(sql)
  end

It’s ugly, but it is blazingly fast! Our query went from 8s to 0.01s. That’s definitely worth a little code ugliness.

While we’ve handled the inital data conversion and querying, we can’t add data to our table. We need to provide a Point for each record, and there’s no way to do that using ActiveRecord. Instead, we’ll need to use a MySQL trigger.

    execute <<EOT
      create trigger set_lat_lng_point 
      before insert on amenities for each row 
      set  new.lat_lng_point = GeomFromText(concat('Point(',new.lat,' ',new.lng,')'));
EOT

That code will create a Point from the lat and lng when a record is inserted. If you need to be able to update locations, you could make the trigger fire on update as well. You’ll want to install the Trigger Happy plugin so that your triggers are dumped and included in your test database.

After all that, can you believe we’re still not quite done? Due to a bug in MySQL, inserting a row where lat_lng_point is NULL will give an error. It turns out that MySQL checks the not null columns before triggers are run. Luckily, there is a workaround. If you don’t explicitly set the column value to NULL, the trigger will be run before the value is checked. To get ActiveRecord to not explicitly set the column to NULL, we’ll use a bit of a hack.

  # Remove lat_lng_point from the list of attributes
  # otherwise, AR writes it as NULL which keeps the trigger from firing
  # and setting the value correctly  
  def before_save
    @attributes.delete("lat_lng_point")
  end

That code will remove the column for AR’s list of attributes.

Obviously, this is somewhat complicated. If your searches are running okay using just normal SQL, I wouldn’t recommend changing. If you need high performance and are searching large amounts of data, this little bit of code can make a huge difference.

# Our entire migration
class ConvertAmenitiesToUseSpatials < ActiveRecord::Migration
  def self.up
    execute 'alter table amenities add lat_lng_point point not null'
    execute 'alter table amenities engine myisam'
    execute 'update amenities set lat_lng_point = GeomFromText(concat(\'Point(\',lat,\' \',lng,\')\'))'
    execute 'create spatial index amenities_spatial_idx on amenities(lat_lng_point)'
    execute "create trigger set_lat_lng_point before insert on amenities for each row set  new.lat_lng_point = GeomFromText(concat('Point(',new.lat,' ',new.lng,')'));"
  end

  def self.down
    execute 'alter table amenities drop lat_lng_point'
  end
end