Geometric processing as a field of study has many applications, and has resulted in lots of research, and powerful tools. Many modern web applications have location based components, and require a data storage engines capable of managing geometric information. Typically this requires the introduction of an additional storage engine into your infrastructure, which can be a time consuming and expensive operation.
MongoDB has a set of geometric storage and search features. The MongoDB 2.4 release brought several improvements to MongoDB’s existing geo capabilities and the introduction of the
The primary conceptual difference (though there are also many functional differences) between the
2dsphere indexes, is the type of coordinate system that they consider. Planar coordinate systems are useful for certain applications, and can serve as a simplifying approximation of spherical coordinates. As you consider larger geometries, or consider geometries near the meridians and poles however, the requirement to use proper spherical coordinates becomes important.
In addition to this major conceptional difference, there are also significant functional differences, which are outlined in some depth in the Geospatial Indexes and Queries section of the MongoDB documentation. This post will discuss the new features that have been added in the 2.4 release.
Storing non-point geometries
2d index, which only allowed the storage of points, the
2dsphere index allows the storage and querying of points, lines, and polygons. To support the storage of different geometries, instead of introducing a proprietary format, MongoDB conforms to the GeoJSON standard. GeoJSON is a collaborative community project that produced a specification for encoding entities in JSON. It has garnered significant support, including the OpenLayers project, PostGIS, and has growing language support for python and ruby.
Here are a few simple examples of GeoJSON embedded documents:
A BSON Document with a GeoJSON Point embedded in the
coordinates: [100.0, 0.0]
A BSON Document with a GeoJSON LineString embedded in the
coordinates: [ [100.0, 0.0], [101.0, 1.0] ]
A BSON Document with a GeoJSON Polygon embedded in the
[ [100.0, 0.0], [101.0, 0.0],
[101.0, 1.0], [100.0, 1.0],
[100.0, 0.0] ]
Note: A GeoJSON Polygon’s coordinates are an array of arrays of point specifications. Each array of point specifications should have the same starting and ending point to form a closed loop. The first array of point specifications defines the polygon’s exterior geometry, and each subsequent array of point specifications defines a “hole” in the polygon. Polygons should be non self-intersecting, and holes should be fully contained by the polygon.
Inclusion searches on a sphere
$geoWithin operator, which takes a Polygon geometry as a specifier, returns any geometries of any type that are fully contained within the polygon. It will work well without any index, but must look at every document in the collection to do so.
Intersecting geometries on a sphere
$geoIntersects operator, which takes any geometry as a specifier, returns any geometries that have a non-empty intersection with the specifier.
$geoIntersects also works well without an index, and must also look at each document in the collection.
Better support for compound indexes
2d index can only be used in a compound index if 1. it is the first field, 2. there are exactly two fields in the compound index, and 3. if the second field isn’t a
2dsphere indexes aren’t limited in this way, which allows us to pre-filter based on a non-geo field - which is often more efficient.
Consider the following queries: Find me Hot Dog Stands in New York state i.e. use a compound index: (business_type, location). Find me geometries in New York state that are Hot Dog stands i.e. use the compound index: (location, business_type)
The first query will be much more efficient than the second, because business_type is a simple text field, and greatly reduces the set of geometries to search.
Additionally, we can have multiple
2dsphere indexes in the same compound index. This allows queries like: “Find routes with a start location within 50 miles from JFK, and an end location within 100 miles of YYC”.
How it Works
Everything starts when you insert a geometry into a
2dsphere index. We use the open source
s2 C++ library from google to select a minimal set of cells that fully cover a geometry. This set of grid cells is called a covering, and the size of the cells is dynamic (between 500m and 100km on a side) based upon the size of the polygon being covered.
fig 3 - A very low granularity covering of the entire United Kingdom
fig 4 - A fairly granular covering of the A4 around Trafalgar Square.
Each cell in these coverings is now added to a standard B-tree index, with a key that is easily calculable by the location on surface of the sphere - more granular(smaller) cells will have the same prefix as a larger cell that occupies the same area of the surface of the sphere.
Intersection & Within searches
Finding geometries that may be intersecting or within a search polygon becomes as easy as generating a covering for the search specifier, and for each cell in that covering, query the B-tree for any geometries that interact with these cells. Once the list of possibly interacting geometries has been retrieved from the index, each geometry in checked in turn to see if it should be included in the result set.
The near search provided by the
$near operator is implemented by doing
$within searches on concentrically growing donuts (circular polygons with with circular holes).
We encourage user feedback and testing on these new Geo features and are excited to see what the community builds.
Map images ⓒ OpenStreetMap contributors, licensed under the Creative Commons Attribution-ShareAlike 2.0 license (CC-BY-SA).
Map data ⓒ OpenStreetMap contributors, licensed under the Open Data Commons Open Database License (ODbL).