Cross Cluster Search Using Atlas Search and Data Federation
Atlas Data Federation allows you to form federated database instances that span multiple data sources, such as different Atlas clusters, AWS S3 buckets, and other HTTPS sources. Each application or service can keep working with its own cluster, with dedicated resources and data compliance, while queries run on a union of the data sets. This is great for analytics, global view dashboards, and many other use cases in distributed systems.
Atlas Search is also an emerging product that allows applications to build relevance-based search, powered by Lucene, directly on their MongoDB collections. While both products are amazing on their own, they can work together to form a robust multi-cluster text search that solves challenges that were previously hard to solve.
Plotting attributes on a map based on geo coordinates is a common need for many applications. Normally, merging different search sources into one data set ranked by relevance or other score factors, all within a single request, requires complex application code.
When federated queries run against Atlas Search indexes, this task becomes as easy as firing one query.
In my use case, I have two clusters: cluster-airbnb (Airbnb data) and cluster-whatscooking (restaurant data). For most parts of my applications, both data sets have nothing really in common and are therefore kept in different clusters for each application.
However, if I am interested in plotting the locations of restaurants and Airbnbs (and maybe shops, later) around the user, I have to merge the datasets together with a search index built on top of the merged data.
As mentioned above, the two applications run on two separate Atlas clusters due to their independent microservice nature. They can even be placed on different clouds and regions, as in this picture.
The restaurant data is stored in a collection named “restaurants” that follows a common model, with fields such as grades, menu, and location.
The Airbnb application uses a different data model for its Airbnb data, with fields such as bookings, apartment details, and location.
The power of the document model and federated queries is that those data sets can become one if we create a federated database instance and group them under a “virtual collection” called “pointsOfInterest.”
The data sets can now be queried as if we have a collection named “pointsOfInterest” unioning the two.
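As a rough sketch of what that grouping can look like, the storage configuration of a federated database instance maps stores (the clusters) to virtual collections. The snippet below builds such a configuration as a Python dictionary. The cluster names, the “restaurants” collection, and “pointsOfInterest” come from the setup above; the store names, the project ID placeholder, the virtual database name, and the source names `sample_airbnb`, `listingsAndReviews`, and `whatscooking` are assumptions for illustration only.

```python
import json

# Hypothetical placeholder: the Atlas project that owns both clusters.
PROJECT_ID = "<your-project-id>"


def build_storage_config(project_id: str) -> dict:
    """Build a storage configuration that unions two collections into one."""
    return {
        # One store per Atlas cluster that the federated instance can read.
        "stores": [
            {
                "name": "airbnbStore",  # assumed store name
                "provider": "atlas",
                "clusterName": "cluster-airbnb",
                "projectId": project_id,
            },
            {
                "name": "whatscookingStore",  # assumed store name
                "provider": "atlas",
                "clusterName": "cluster-whatscooking",
                "projectId": project_id,
            },
        ],
        "databases": [
            {
                "name": "federated",  # assumed virtual database name
                "collections": [
                    {
                        # The virtual collection is the union of its data sources.
                        "name": "pointsOfInterest",
                        "dataSources": [
                            {
                                "storeName": "airbnbStore",
                                "database": "sample_airbnb",  # assumed
                                "collection": "listingsAndReviews",  # assumed
                            },
                            {
                                "storeName": "whatscookingStore",
                                "database": "whatscooking",  # assumed
                                "collection": "restaurants",
                            },
                        ],
                    }
                ],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_storage_config(PROJECT_ID), indent=2))
```

The printed JSON can be compared against the configuration shown in the Atlas UI for the federated database instance; adding a third source later (shops, for example) would mean one more store and one more entry under `dataSources`.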
Since the collections are located on Atlas, we can easily use Atlas Search to index each one individually. Most likely we have already done so, since the underlying applications need to search restaurants and Airbnb facilities.
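A minimal sketch of such an index definition, written as a Python dictionary, might look like the following. It assumes the geo coordinates live in a top-level field named `location` and that the index is named `default`; the helper name `build_search_index` is hypothetical. The same definition would be created once on each cluster's collection.

```python
def build_search_index(name: str = "default", geo_field: str = "location") -> dict:
    """Return an Atlas Search index definition with a geo-mapped field."""
    return {
        "name": name,
        "definition": {
            "mappings": {
                # Index all other fields dynamically for regular text search.
                "dynamic": True,
                # Map the coordinates field explicitly so geo operators can use it.
                "fields": {geo_field: {"type": "geo"}},
            }
        },
    }


if __name__ == "__main__":
    # The definition is identical for both source collections.
    for cluster in ("cluster-airbnb", "cluster-whatscooking"):
        print(cluster, build_search_index())
```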
However, if we make sure that the index names are identical (for example, “default”) and that the key fields used for special searches such as geo have the same name (e.g., “location”), we can run federated search queries on “pointsOfInterest.” This works because federated queries are propagated to each individual data source that makes up the virtual collection.
With Atlas Search, this is surprisingly powerful, as the results come back with the search scores correctly merged across all of our data sets. This means that if geo search points of interest are close to my location, we will get Airbnbs and restaurants together, correctly ordered by distance. What’s even cooler is that Atlas Data Federation intelligently “pushes down” as much of a query as possible, so the search operation runs locally on each cluster and only the union happens in the federation layer, making the operation as efficient as possible.
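To make this concrete, here is a hedged sketch of a federated geo search pipeline built in Python. It uses the Atlas Search `near` operator against the `location` field of the `default` index. The coordinates, the pivot distance, the projected `name` field, and the helper names are assumptions for illustration. The `near_score` helper mirrors the documented scoring formula for `near`, pivot / (pivot + distance), which is why closer points rank higher no matter which cluster they come from.

```python
from typing import Any


def build_geo_search_pipeline(
    lng: float, lat: float, pivot_meters: float = 1000, limit: int = 10
) -> list[dict[str, Any]]:
    """Build an aggregation pipeline that ranks documents by distance to a point."""
    return [
        {
            "$search": {
                "index": "default",  # must exist under this name on every source
                "near": {
                    "path": "location",  # must be geo-indexed on every source
                    "origin": {"type": "Point", "coordinates": [lng, lat]},
                    "pivot": pivot_meters,
                },
            }
        },
        {"$limit": limit},
        # Expose the search score so the merged ordering can be inspected.
        {"$project": {"name": 1, "location": 1, "score": {"$meta": "searchScore"}}},
    ]


def near_score(distance_meters: float, pivot_meters: float) -> float:
    """Score formula of the near operator: pivot / (pivot + distance)."""
    return pivot_meters / (pivot_meters + distance_meters)


if __name__ == "__main__":
    pipeline = build_geo_search_pipeline(lng=-73.98, lat=40.75)
    print(pipeline)
    print(near_score(1000, 1000))  # a point exactly one pivot away scores 0.5
    # Running it with PyMongo (assumed connection string and database name):
    #   from pymongo import MongoClient
    #   coll = MongoClient(FEDERATED_URI)["federated"]["pointsOfInterest"]
    #   results = list(coll.aggregate(pipeline))
```

Because the pipeline targets the virtual collection, the application does not need to know how many clusters sit behind it.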
We can take the query we just ran in Compass and export it to MongoDB Charts, our native charting offering that can directly connect to a federated database instance, plotting the data on a map: