New Geo Features in MongoDB 2.4
Motivation
Geometric processing as a field of study has many applications, and has resulted in lots of research, and powerful tools. Many modern web applications have location based components, and require a data storage engines capable of managing geometric information. Typically this requires the introduction of an additional storage engine into your infrastructure, which can be a time consuming and expensive operation.
MongoDB has a set of geometric storage and search
features
. The
MongoDB 2.4 release
brought several improvements to MongoDB’s existing geo capabilities and the introduction of the
2dsphere
index
.
The primary conceptual difference (though there are also many functional differences) between the
2d
and
2dsphere
indexes, is the type of coordinate system that they consider.
Planar coordinate systems
are useful for certain applications, and can serve as a simplifying approximation of spherical coordinates. As you consider larger geometries, or consider geometries near the meridians and poles however, the requirement to use proper
spherical coordinates
becomes important.
In addition to this major conceptional difference, there are also significant functional differences, which are outlined in some depth in the
Geospatial Indexes and Queries
section of the MongoDB documentation. This post will discuss the new features that have been added in the 2.4 release.
What’s New
Storing non-point geometries
Unlike the
2d
index, which only allowed the storage of points, the
2dsphere
index allows the storage and querying of points, lines, and polygons. To support the storage of different geometries, instead of introducing a proprietary format, MongoDB conforms to the
GeoJSON
standard. GeoJSON is a collaborative community project that produced a specification for encoding entities in JSON. It has garnered significant support, including the
OpenLayers project
,
PostGIS
, and has growing language support for
python
and
ruby
.
Here are a few simple examples of GeoJSON embedded documents:
A BSON Document with a GeoJSON Point embedded in the
geo
field:
{
geo: {
type: “Point”,
coordinates: [100.0, 0.0]
}
}
A BSON Document with a GeoJSON LineString embedded in the
geo
field:
{
geo: {
type: “LineString”,
coordinates: [ [100.0, 0.0], [101.0, 1.0] ]
}
}
A BSON Document with a GeoJSON Polygon embedded in the
geo
field:
{
geo: {
type: “Polygon”,
coordinates: [
[ [100.0, 0.0], [101.0, 0.0],
[101.0, 1.0], [100.0, 1.0],
[100.0, 0.0] ]
]
}
}
Note: A GeoJSON Polygon’s coordinates are an array of arrays of point specifications. Each array of point specifications should have the same starting and ending point to form a closed loop. The first array of point specifications defines the polygon’s exterior geometry, and each subsequent array of point specifications defines a “hole” in the polygon. Polygons should be non self-intersecting, and holes should be fully contained by the polygon.
Inclusion searches on a sphere
The new
$geoWithin
operator, which takes a Polygon geometry as a specifier, returns any geometries of any type that are fully contained within the polygon. It will work well without any index, but must look at every document in the collection to do so.
Intersecting geometries on a sphere
The new
$geoIntersects
operator, which takes any geometry as a specifier, returns any geometries that have a non-empty intersection with the specifier.
$geoIntersects
also works well without an index, and must also look at each document in the collection.
Better support for compound indexes
The
2d
index can only be used in a compound index if 1. it is the first field, 2. there are exactly two fields in the compound index, and 3. if the second field isn’t a
2d
index.
2dsphere
indexes aren’t limited in this way, which allows us to pre-filter based on a non-geo field - which is often more efficient.
Consider the following queries: Find me Hot Dog Stands in New York state i.e. use a compound index: (business_type, location). Find me geometries in New York state that are Hot Dog stands i.e. use the compound index: (location, business_type)
The first query will be much more efficient than the second, because business_type is a simple text field, and greatly reduces the set of geometries to search.
Additionally, we can have multiple
2dsphere
indexes in the same compound index. This allows queries like: “Find routes with a start location within 50 miles from JFK, and an end location within 100 miles of YYC”.
How it Works
Everything starts when you insert a geometry into a
2dsphere
index. We use the open source
s2
C++ library
from google to select a minimal set of cells that fully cover a geometry. This set of grid cells is called a covering, and the size of the cells is dynamic (between 500m and 100km on a side) based upon the size of the polygon being covered.
fig 3 - A very low granularity covering of the entire United Kingdom
fig 4 - A fairly granular covering of the A4 around Trafalgar Square.
Each cell in these coverings is now added to a standard B-tree index, with a key that is easily calculable by the location on surface of the sphere - more granular(smaller) cells will have the same prefix as a larger cell that occupies the same area of the surface of the sphere.
Intersection & Within searches
Finding geometries that may be intersecting or within a search polygon becomes as easy as generating a covering for the search specifier, and for each cell in that covering, query the B-tree for any geometries that interact with these cells. Once the list of possibly interacting geometries has been retrieved from the index, each geometry in checked in turn to see if it should be included in the result set.
Near searches
The near search provided by the
$near
operator is implemented by doing
$within
searches on concentrically growing donuts (circular polygons with with circular holes).
We encourage user feedback and testing on these new Geo features and are excited to see what the community builds.
Map images ⓒ
OpenStreetMap
contributors, licensed under the Creative
Commons Attribution-ShareAlike 2.0
license (CC-BY-SA).
Map data ⓒ
OpenStreetMap
contributors, licensed under the
Open Data Commons Open Database License
(ODbL).
May 21, 2013