
Aperol Spritz Summer With MongoDB Geospatial Queries & Vector Search

Anaiya Raisinghani • 13 min read • Published Jul 08, 2024 • Updated Jul 08, 2024
AI • Python • MongoDB
It’s summer in New York City and you know what that means: It’s the season of the spritz! There is nothing (and I fully, truly, 110% mean nothing) better than a crisp Aperol spritz to end a day that was so hot and muggy that the subway was indistinguishable from a sauna.
While I normally love adventuring through the city in search of what will perfectly fulfill my current craving, there are certain months when I refuse to spend any more time than necessary moving around outdoors (hello, heatwave?!). At night during an NYC summer, we are lounging — lounging on rooftops, terraces, and sidewalks — wherever we can fit. And with minimal movement, we want our Aperol spritzes as close as possible. So, let’s use MongoDB geospatial queries, MongoDB Atlas Vector Search, and the Google Places API to find our closest spritz locations in the West Village neighborhood of New York City while using semantic search to help us get the most out of our queries.
In this tutorial, we will use the platforms listed above to find all the locations selling Aperol spritzes in the West Village neighborhood of New York City, narrow them down to the ones that match our semantic query of being outdoors with quick service (we need those spritzes and need them NOW!), and then pick the one closest to our starting location.
Before we begin the tutorial, let’s go over some of the important platforms we will be using on our journey.

What are MongoDB geospatial queries?

MongoDB geospatial queries allow you to search your database based on geographical locations! This means you are able to find different locations such as restaurants, parks, museums, etc. based just on their coordinates. In this tutorial, we will use MongoDB geospatial queries to search the locations of places that serve Aperol spritzes that we sourced from Google’s Places API. To use geospatial queries properly with MongoDB, we will need to ensure our data points are loaded in GeoJSON format. More on that below!

What is MongoDB Atlas Vector Search?

MongoDB Atlas Vector Search is a way of searching through your database semantically, or by meaning. This means that instead of matching specific keywords or exact text phrases, you can retrieve results that are close in meaning to your query, even when they use synonyms or different wording. This will integrate fabulously with our tutorial because we can search through the reviews we retrieve from our Google Places API and see which ones match closest to what we’re looking for. Let’s go!

Prerequisites

To be successful with this tutorial, you will need:
  1. The IDE of your choosing — this tutorial uses a Google Colab notebook. Please feel free to run your commands directly from a notebook.
  2. A MongoDB Atlas account.
  3. A MongoDB Atlas cluster — the free tier will work perfectly.
  4. A Google Cloud Platform account — please create an account and a project. We will go through this together.
  5. A Google Cloud Platform API key.
  6. An OpenAI API key — this is how we will embed our location reviews so we can use MongoDB Atlas Vector Search!
Once your MongoDB Atlas cluster has been provisioned and you have everything else written down in a secure spot, you’re ready to begin.

Set up your Google Cloud project

Our first step is to create a project inside of our Google Cloud account. We need this project to use the Google Places API, which is how we will find all locations that serve Aperol spritzes in the West Village.
This is what your project will look like once it’s been created. Please make sure to set up your billing account information on the left-hand side of the screen. You can set up a free trial for $300 worth of credits, so if you’re trying out this tutorial, please feel free to do that and save some money!
[Image: Google Cloud account setup]
Once your account is set up, let’s enable the Google Places API that we are going to be using. You can do this through the same link to set up your Google Cloud project.
This is the API we want to use: Google Places API
Hit the Enable button and a popup will come up with your API key. Store it somewhere safe since we will be using it in our tutorial, and make sure not to lose it or expose it anywhere.
Every Places API request must include your API key. You can find out more from the documentation.
Once that’s in place, we can get started on our tutorial.

Imports and API key setup

Now, head over to your Google Colab notebook.
We want to install the googlemaps and openai packages in our notebook, since both are required for this tutorial. Then, define and run your imports.
We are going to use the getpass library to keep our API keys secret. Set it up for both your Google API key and your OpenAI API key.
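A minimal sketch of the key setup, assuming the two packages were installed with `pip install googlemaps openai`. Only getpass comes from the tutorial itself; the helper name `read_secret` and the environment variable names are hypothetical, and checking the environment first is an optional convenience for reruns.

```python
import getpass
import os


def read_secret(prompt, env_var=None):
    """Return a secret from an environment variable if set, else prompt for it.

    getpass hides what you type, so the key never shows up in notebook output.
    """
    if env_var and os.environ.get(env_var):
        return os.environ[env_var]
    return getpass.getpass(prompt)


# In the notebook you would then run something like (variable names are illustrative):
# google_api_key = read_secret("Google API key: ", "GOOGLE_API_KEY")
# openai_api_key = read_secret("OpenAI API key: ", "OPENAI_API_KEY")
```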

Vector Search embedding function setup

Now, let's set ourselves up for Vector Search success. First, set your key and then establish our embedding function. For this tutorial, we are using OpenAI's "text-embedding-3-small" embedding model. We are going to be embedding the reviews of our spritz locations so we can make some judgments on where to go!
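One way the embedding function could look, assuming version 1.0 or later of the openai package. The function name `get_embedding` and the choice to pass the client in as a parameter are illustrative, not necessarily how the original notebook was written.

```python
def get_embedding(client, text, model="text-embedding-3-small"):
    """Embed one string and return its vector as a list of floats.

    `client` is expected to be an openai.OpenAI instance (openai>=1.0).
    """
    response = client.embeddings.create(input=text, model=model)
    return response.data[0].embedding


# In the notebook (assumes openai_api_key was read earlier):
# from openai import OpenAI
# client = OpenAI(api_key=openai_api_key)
# vector = get_embedding(client, "Great spritz, lovely patio")
```

Passing the client in keeps the function easy to reuse later, when the same model has to embed the search query.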

Nearby search method in Google Places API

When using Nearby Search in our Google Places API, we will set three parameters: location, radius, and keyword. For our location, we can find our starting coordinates (the very middle of the West Village) by right-clicking on Google Maps and copying the coordinates to our clipboard. This is how I got the coordinates used in this tutorial.
[Image: How to find our coordinates]
Our radius has to be in meters. Since I’m not very savvy with meters, let’s write a small function to convert miles to meters for us.
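The conversion could be as small as the sketch below. It assumes the starting unit is miles, and the function name is illustrative.

```python
METERS_PER_MILE = 1609.344  # exact length of an international mile in meters


def miles_to_meters(miles):
    """Convert a distance in miles to meters."""
    return miles * METERS_PER_MILE
```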
Our keyword will just be what we’re hoping to find from the Google Places API: Aperol spritzes!
We can then make our API call using the places_nearby method.
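A hedged sketch of that call, assuming a `googlemaps.Client` built from your API key. The wrapper name `find_spritz_places` and the exact keyword string are assumptions; `places_nearby` and its `location`, `radius`, and `keyword` parameters come from the googlemaps library.

```python
def find_spritz_places(gmaps, location, radius_meters, keyword="Aperol Spritz"):
    """Run a Nearby Search and return the list of matching places.

    `gmaps` is expected to be a googlemaps.Client; `location` is a
    (latitude, longitude) tuple, the order Google Maps copies to the clipboard.
    """
    response = gmaps.places_nearby(
        location=location,
        radius=radius_meters,
        keyword=keyword,
    )
    return response.get("results", [])


# In the notebook (paste your own coordinates and radius):
# import googlemaps
# gmaps = googlemaps.Client(key=google_api_key)
# places = find_spritz_places(gmaps, (start_latitude, start_longitude), radius_in_meters)
```

Nearby Search returns results one page at a time, so a single call gives at most 20 places.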
Before we can go ahead and print out our locations, let’s think about our end goal. We want to achieve a couple of things before we insert our documents into our MongoDB Atlas cluster. We want to:
  1. Get detailed information about our locations, so we need to make another API call to get our place_id, the location name, our formatted_address, the geometry for our coordinates, some reviews (only up to five), and the location rating. You can find more fields to return (if your heart desires!) from the Nearby Search documentation.
  2. Embed our reviews for each location using our embedding function. We want to make sure that we have a field for these so our vectors are stored in an array inside our cluster. We are choosing to embed here just to make things easier for ourselves in the long run. Let’s also join the five reviews together into one string to make things a bit easier on the embedding.
  3. Think about how our coordinates are set up, while we’re creating a dictionary with all the important information we want to portray. MongoDB geospatial queries require GeoJSON objects. This means we need to make sure we have the proper format, or else we won’t be able to use our geospatial query operators later. We also need to keep in mind that the Google Places API stores the latitude and longitude in a nested object, under geometry and then location, as lat and lng. So, unfortunately, we cannot just access them from the top level. We need to work some magic first to turn them into a GeoJSON Point, which lists longitude before latitude.
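To make the nesting from the last step concrete, here is a small sketch. The sample dictionary is a trimmed, made-up stand-in for a Places API result (it is not copied from Google's documentation), and `to_geojson_point` is a hypothetical helper name.

```python
def to_geojson_point(place):
    """Convert a Places API result's geometry into a GeoJSON Point.

    GeoJSON lists longitude first, then latitude.
    """
    location = place["geometry"]["location"]
    return {"type": "Point", "coordinates": [location["lng"], location["lat"]]}


# Trimmed, made-up example of the nesting (not real API output).
sample_place = {
    "name": "Example Bar",
    "geometry": {"location": {"lat": 40.0, "lng": -74.0}},
}
print(to_geojson_point(sample_place))
```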
With all this in mind, let’s get to it!
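The three steps above can be sketched as a single function. This is an illustration under assumptions rather than the original notebook code: the document field names (`address`, `location`, `embedding`), the use of the place ID as `_id`, and the function name are all choices made here. `gmaps.place` is the Place Details call in the googlemaps library; it validates the requested field names, so adjust the list if your installed version rejects one.

```python
DETAIL_FIELDS = ["place_id", "name", "formatted_address", "geometry", "reviews", "rating"]


def build_place_document(gmaps, place_id, embed):
    """Fetch details for one place and shape them into a MongoDB document.

    `gmaps` is expected to be a googlemaps.Client and `embed` any function
    that turns a string into a vector (for example, an OpenAI embedding call).
    """
    details = gmaps.place(place_id=place_id, fields=DETAIL_FIELDS)["result"]

    # Join the review texts (the API returns at most five) into one string.
    reviews = [review.get("text", "") for review in details.get("reviews", [])]
    combined_reviews = " ".join(text for text in reviews if text)

    location = details["geometry"]["location"]
    return {
        "_id": details["place_id"],  # assumption: reuse the place ID as the document ID
        "name": details.get("name"),
        "address": details.get("formatted_address"),
        # GeoJSON Point: longitude first, then latitude.
        "location": {"type": "Point", "coordinates": [location["lng"], location["lat"]]},
        "rating": details.get("rating"),
        "reviews": reviews,
        # Skip the embedding call when a place has no review text at all.
        "embedding": embed(combined_reviews) if combined_reviews else None,
    }
```

Calling this once per `place_id` returned by Nearby Search gives a list of documents ready to insert.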
Let’s print out our output and see what our spritz locations in the West Village neighborhood are! Let’s also check that each location has a newly created embedding field with our reviews embedded.
[Image: Our proper output]
If I scroll over in my notebook, I can see the embeddings are there, but I will prove it once we insert our data into MongoDB Atlas, since they are a bit hard to capture in a single picture.
Let’s insert them using the pymongo library.

Insert documents into MongoDB Atlas cluster

First, let’s install pymongo.
Now, set up our MongoDB connection. To do this, please make sure you have your connection string.
Please keep in mind that you can name your database and collection anything you like, since it won’t be created until we write in our data. I am naming my database “spritz_summer” and my collection “spritz_locations_WV”. Run the code block below to insert your documents into your cluster:
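A minimal sketch of the insert step, assuming pymongo was installed with `pip install pymongo`. The helper name `insert_places` and the variable names in the commented usage are illustrative; the database and collection names are the ones chosen above.

```python
def insert_places(collection, documents):
    """Insert the place documents and return how many were written."""
    if not documents:
        return 0  # insert_many rejects an empty list
    result = collection.insert_many(documents)
    return len(result.inserted_ids)


# In the notebook (use your own connection string):
# from pymongo import MongoClient
# mongo_client = MongoClient(connection_string)
# collection = mongo_client["spritz_summer"]["spritz_locations_WV"]
# insert_places(collection, place_documents)
```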
Go ahead and double-check that everything was written in correctly in MongoDB Atlas.
[Image: Our documents in MongoDB Atlas]
Make sure that your embedding field exists and that it’s an array of 1,536 numbers (the output size of the text-embedding-3-small model), and please make sure your coordinates are configured the way mine are in the image.

Which comes first, vector search or geospatial queries?

Great question! Both $vectorSearch and $geoNear need to be the first stage of an aggregation pipeline, so we cannot put them in the same one. Instead of making one pipeline, we can use a little loophole and create two. But how will we decide which one to run first?!
When I’m using Google Maps to figure out where to go, I normally first search for what I’m craving, and then I see how far away it is from where I currently am. So let’s keep that mindset and start off with MongoDB Atlas Vector Search. But, I understand that intuitively, some of you might prefer to search via all nearby locations and then semantically search (geospatial queries first and then vector search), so let’s highlight that method as well below.
We have a couple of steps here. Our first step is to create a Vector Search Index. Please do this inside of MongoDB Atlas by following the Vector Search documentation.
Please keep in mind that your index is not created by your script. It lives in your cluster. You’ll know it’s ready to go when its status turns green and shows as active.
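For reference, a vector index definition for this collection might look like the sketch below, written as a Python dictionary and printed as the JSON you would paste into the Atlas JSON editor. The field name `embedding` and the cosine similarity are assumptions; 1536 is the output size of text-embedding-3-small.

```python
import json

vector_index_definition = {
    "fields": [
        {
            "type": "vector",
            "path": "embedding",     # assumed name of the field holding the vectors
            "numDimensions": 1536,   # output size of text-embedding-3-small
            "similarity": "cosine",  # assumed; "euclidean" and "dotProduct" also exist
        }
    ]
}

print(json.dumps(vector_index_definition, indent=2))
```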
Once it’s activated, let’s get to vector searching!
So. Let’s say I just finished dinner with my besties at our favorite restaurant in the West Village, Balaboosta. The food was great, it’s a summer night, we’re in the mood for post-dinner spritzes outside, and we would prefer to be seated quickly. Let’s see if we can find a spot!
Our first step in building our pipeline is to embed our query. We cannot compare text to vectors; we have to compare vectors to vectors. We can do this with only a couple of lines since we are using the same embedding model that we embedded our reviews with:
Now, let’s build out our aggregation pipeline. Since we are going to be using a $geoNear in our pipeline next, we want to keep the IDs found from this aggregation pipeline so we don’t search through everything — we only search through our sample size. For now, make sure your $vectorSearch stage is at the very top!
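A sketch of what such a pipeline could look like. The index name `vector_index`, the `embedding` path, and the projected fields are assumptions to replace with your own; the `$vectorSearch` options and the `vectorSearchScore` metadata are part of Atlas Vector Search.

```python
def build_vector_search_pipeline(query_vector, index_name="vector_index",
                                 path="embedding", limit=5, num_candidates=50):
    """Build an aggregation pipeline whose first stage is $vectorSearch."""
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": query_vector,
                "numCandidates": num_candidates,  # should be at least as large as limit
                "limit": limit,
            }
        },
        {
            # Keep the ID for the later $geoNear step, plus a similarity score.
            "$project": {
                "_id": 1,
                "name": 1,
                "reviews": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


# In the notebook (assumes an embedding function and a pymongo collection):
# query_vector = get_embedding(client, "outdoor seating quick service")
# results = list(collection.aggregate(build_vector_search_pipeline(query_vector)))
```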
Let’s print out our results and see what happens from our query of “outdoor seating quick service”:
[Image: Output from printing our $vectorSearch aggregation pipeline]
We have five fantastic options! If we go and read through the reviews, we can see they align with what we’re looking for. Here is one example:
[Image: One example review aligning with what we’re looking for]
Let’s go ahead and save the IDs from our pipeline above in a simple line so we can specify that we only want to use our $geoNear operator on these five:
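Saving the IDs can be a single list comprehension. The sketch assumes the vector search results were collected into a list of dictionaries; the helper name is hypothetical.

```python
def collect_ids(results):
    """Return the _id of every document in the vector search results."""
    return [doc["_id"] for doc in results]
```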
Now that they’re saved, we can build out our $geoNear pipeline and see which one of these options is closest to us from our starting point, Balaboosta, so we can walk on over.

Geospatial queries in MongoDB

To figure out the coordinates of Balaboosta, I right-clicked on Google Maps and saved the coordinates, and then made sure I had the longitude and latitude in the proper order.
First, create a 2dsphere index on the location field of our collection:
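With pymongo, creating the index could look like this sketch. It assumes the GeoJSON point is stored in a field named `location`; the helper name is illustrative.

```python
def ensure_location_index(collection, field="location"):
    """Create a 2dsphere index on the GeoJSON field and return its name."""
    return collection.create_index([(field, "2dsphere")])


# In the notebook:
# ensure_location_index(collection)
```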
Here is the pipeline, with our query specifying that we only want to use the IDs of the locations we found above:
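A sketch of such a pipeline, assuming the saved IDs are in a list and the starting point is given as longitude and latitude. The field name `distance_meters` and the function name are choices made here; with a GeoJSON point and a 2dsphere index, `$geoNear` reports distances in meters and returns documents nearest first.

```python
def build_geo_near_pipeline(start_lng, start_lat, place_ids):
    """Build a pipeline that ranks the given places by distance from a start point."""
    return [
        {
            "$geoNear": {
                # GeoJSON order: longitude first, then latitude.
                "near": {"type": "Point", "coordinates": [start_lng, start_lat]},
                "distanceField": "distance_meters",
                # Only consider the places found by the vector search.
                "query": {"_id": {"$in": place_ids}},
                "spherical": True,
            }
        },
        {"$project": {"name": 1, "distance_meters": 1}},
    ]


# In the notebook (paste the coordinates of your own starting point):
# pipeline = build_geo_near_pipeline(start_longitude, start_latitude, place_ids)
# for doc in collection.aggregate(pipeline):
#     print(doc["name"], round(doc["distance_meters"], 2))
```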
Let’s print it out and see what we get!
[Image: Output after searching via distance from our five sample size]
It seems like the restaurant we are heading over to is Pastis since it’s only 182.83 meters (0.1 miles) away. Time for an Aperol spritz outdoors!
For those who would prefer to switch things around and run geospatial queries first and then incorporate vector search, here is the pipeline:
First, create our $geoNear pipeline and make sure you save the place IDs and their distances so that we can carry them through our vector search pipeline.
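One possible shape for this geospatial-first step, as a sketch under assumptions: the search is limited by a maximum distance in meters instead of a list of IDs, and the distances are kept in a dictionary keyed by `_id`. The names `build_nearby_pipeline`, `distances_by_id`, and `distance_meters` are hypothetical.

```python
def build_nearby_pipeline(start_lng, start_lat, max_distance_meters):
    """Build a $geoNear pipeline that finds every place within a radius."""
    return [
        {
            "$geoNear": {
                "near": {"type": "Point", "coordinates": [start_lng, start_lat]},
                "distanceField": "distance_meters",
                "maxDistance": max_distance_meters,
                "spherical": True,
            }
        },
        {"$project": {"name": 1, "distance_meters": 1}},
    ]


def distances_by_id(nearby_docs):
    """Map each place ID to its distance so it can be reattached later."""
    return {doc["_id"]: doc["distance_meters"] for doc in nearby_docs}
```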
We also need to rebuild our MongoDB Atlas Vector Search index so that it includes “_id” as a filter path, which lets the vector search be restricted to the place IDs we just saved:
Once that’s active and ready, we can build out our vector search pipeline:
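A sketch of the filtered vector search, assuming the index now lists `_id` as a filter field. The index name, the `embedding` path, and both function names are assumptions; the `filter` option with `$in` is part of `$vectorSearch`.

```python
def build_filtered_vector_pipeline(query_vector, place_ids, index_name="vector_index",
                                   path="embedding", limit=5, num_candidates=50):
    """Build a $vectorSearch pipeline restricted to the given place IDs."""
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": query_vector,
                "numCandidates": num_candidates,
                "limit": limit,
                # Requires "_id" to be indexed as a filter field.
                "filter": {"_id": {"$in": place_ids}},
            }
        },
        {"$project": {"_id": 1, "name": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]


def attach_distances(results, distances):
    """Add the saved distance to each vector search result (None if missing)."""
    return [{**doc, "distance_meters": distances.get(doc["_id"])} for doc in results]
```

The results come back ordered by similarity score, so the distance is carried along as an extra field instead of driving the order.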
Run it, and you should see some pretty similar results as before! Leave a comment below letting me know which locations showed up for you as your output. These are mine:
[Image: Output from running geospatial first and then vector search]
As you can see, they’re the same results but in a slightly different order, as they are no longer ordered by distance.

Conclusion

In this tutorial, we covered how to use MongoDB Atlas Vector Search and the Google Places API to find spritz locations in the West Village neighborhood of New York City with semantic search, and then used MongoDB geospatial queries to find which locations were closest to us from a specific starting point.
For more information on MongoDB geospatial queries please visit the documentation located above, and if you have any questions or want to share your work, please join us in the MongoDB Developer Community.