The MongoDB Atlas Sample Datasets
Did you know that MongoDB Atlas provides a complete set of example data to help you learn faster? The feature enables you to load eight datasets into your database to explore. You can use it with the MongoDB Atlas M0 free tier to try out MongoDB Atlas and MongoDB's features. The sample data helps you try out features such as indexing, querying (including geospatial queries), and aggregations, as well as MongoDB tooling such as MongoDB Charts and MongoDB Compass.
In the rest of this post, we'll explore why it was created, how to first load the sample data, and then we'll outline what the datasets contain. We'll also cover how you can download these datasets to use them on your own local machine.
Before diving into how we load the sample data, it's worth highlighting why we built the feature in the first place. We built this feature because often people would create a new empty Atlas cluster and they'd then have to wait until they wrote their application or imported data into it before they were able to learn and explore the platform. Atlas's Sample Data was the solution. It removes this roadblock and quickly allows you to get a feel for how MongoDB works with different types of data.
- In your left navigation pane in Atlas, click Clusters, then choose which cluster you want to load the data into.
- For that cluster, click the Ellipsis (...) button.
- Then, click the button "Load Sample Dataset."
- Click the correspondingly named button, "Load Sample Dataset."
This process will take a few minutes to complete, so in the meantime let's look at exactly what kind of data we're going to load. Once the process has completed, you should see a confirmation banner on your Atlas cluster.
The Atlas Sample Datasets consist of eight databases and their associated collections. Each individual dataset is documented to illustrate the schema, the collections, the indexes, and a sample document from each collection.
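Once the load has finished, one way to confirm what arrived is to list the databases from mongosh. The following is a minimal sketch, not an official procedure: it assumes the sample databases all share a `sample_` name prefix, and the helper name `sampleDatabaseNames` is hypothetical.

```javascript
// Hypothetical helper: keep only database names that look like sample datasets.
// Assumption: every sample database name starts with "sample_".
function sampleDatabaseNames(databases) {
  return databases
    .map((d) => d.name)
    .filter((name) => name.startsWith("sample_"))
    .sort();
}

// In mongosh, `db` is a global. The guard lets this file also load under
// plain Node.js, where no database connection exists.
if (typeof db !== "undefined") {
  const result = db.adminCommand({ listDatabases: 1 });
  printjson(sampleDatabaseNames(result.databases));
}
```

If the prefix assumption holds, the printed list should contain one entry per sample dataset.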
This dataset consists of a single collection of AirBnB reviews and listings. There are indexes on the name and location fields, as well as on the _id of the documents.
This dataset consists of three collections of randomly generated financial services data. There are no additional indexes beyond the _id index on each collection. The collections represent accounts, transactions, and customers.
The advantages of using this pattern are a reduction in index size compared to storing each transaction in its own document, potentially simpler queries, and the ability to use pre-aggregated data in our documents.
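To make that trade-off concrete, here is a minimal sketch of grouping per-transaction records into one document per account, with a pre-aggregated count and total. The field names (`account_id`, `transaction_count`, `total_amount`, `transactions`), the sample values, and the function `bucketByAccount` are all illustrative assumptions, not the dataset's documented schema.

```javascript
// Hypothetical flat records, one per transaction (illustrative values only).
const flatTransactions = [
  { account_id: 1, amount: 100, date: "2023-01-02" },
  { account_id: 1, amount: 250, date: "2023-01-05" },
  { account_id: 2, amount: 75, date: "2023-01-03" },
];

// Group transactions into one document per account and pre-aggregate the
// count and total, so reads do not have to recompute them.
function bucketByAccount(transactions) {
  const buckets = new Map();
  for (const { account_id, ...txn } of transactions) {
    if (!buckets.has(account_id)) {
      buckets.set(account_id, {
        account_id,
        transaction_count: 0,
        total_amount: 0,
        transactions: [],
      });
    }
    const bucket = buckets.get(account_id);
    bucket.transaction_count += 1;
    bucket.total_amount += txn.amount;
    bucket.transactions.push(txn);
  }
  return [...buckets.values()];
}

const buckets = bucketByAccount(flatTransactions);
console.log(JSON.stringify(buckets, null, 2));
```

In this sketch, three flat records become two documents, so an index on `account_id` would hold two entries instead of three, and the count and total can be read without scanning the embedded array.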
This dataset consists of a single collection with information on shipwrecks. It has an additional index on the coordinates field (GeoJSON). This index is a geospatial 2dsphere index. This dataset was created to help explore the possibilities of geospatial queries within MongoDB.
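As one hedged illustration of the kind of query such an index supports, the sketch below builds a `$geoWithin` filter with `$centerSphere` for wrecks within a given distance of a point. `$centerSphere` expects its radius in radians, so the helper divides kilometers by the Earth's approximate equatorial radius. The database and collection names (`sample_geospatial`, `shipwrecks`), the example point, and the helper name `withinKmFilter` are assumptions that are not stated above.

```javascript
const EARTH_RADIUS_KM = 6378.1; // approximate equatorial radius of the Earth

// Build a filter matching documents whose `field` lies within `radiusKm`
// of the given point. $centerSphere takes [[lon, lat], radiusInRadians].
function withinKmFilter(field, longitude, latitude, radiusKm) {
  return {
    [field]: {
      $geoWithin: {
        $centerSphere: [[longitude, latitude], radiusKm / EARTH_RADIUS_KM],
      },
    },
  };
}

// Arbitrary example point and radius, chosen only for illustration.
const wreckFilter = withinKmFilter("coordinates", -79.9, 9.3, 50);

// Runs only inside mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  const wrecks = db.getSiblingDB("sample_geospatial").getCollection("shipwrecks");
  printjson(wrecks.find(wreckFilter).limit(5).toArray());
}
```

Unlike proximity operators, `$geoWithin` does not sort results by distance, which makes it a reasonable choice when only membership in the area matters.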
This dataset consists of five collections with information on movies, movie theatres, movie metadata, and user movie reviews and their ratings for specific movies. The data is a subset of the IMDB dataset. There are three additional indexes beyond _id: on the user_id field of the sessions collection, on the location.geo field of the theatres collection, and one on the users collection.
To use the collections for geographical searching, we need to add an index, specifically a 2dsphere index. With this index in place, we can search for all restaurants within a one-kilometer radius of a given location, with the results sorted from closest to furthest away. The code below creates the index, then adds a helper variable to represent 1km, which our query then uses in its criteria to return the list of restaurants within 1km of that location.
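A minimal sketch of those steps in mongosh-style JavaScript follows. It rests on several assumptions that are not confirmed above: a database named `sample_restaurants`, a collection named `restaurants`, a location field `address.coord` holding `[longitude, latitude]`, a 2dsphere index, the `$nearSphere` operator as the search criteria, and an arbitrary example point. The helper name `nearFilter` is hypothetical.

```javascript
// Assumed location field holding [longitude, latitude] pairs.
const LOCATION_FIELD = "address.coord";

// Helper variable for 1km: $maxDistance is in meters for a GeoJSON point.
const ONE_KILOMETER = 1000;

// Index specification: a 2dsphere index on the location field.
const indexSpec = { [LOCATION_FIELD]: "2dsphere" };

// Build a filter for documents within `maxMeters` of the point.
// $nearSphere returns matches ordered from nearest to farthest.
function nearFilter(longitude, latitude, maxMeters) {
  return {
    [LOCATION_FIELD]: {
      $nearSphere: {
        $geometry: { type: "Point", coordinates: [longitude, latitude] },
        $maxDistance: maxMeters,
      },
    },
  };
}

// Arbitrary example location, used only for illustration.
const nearbyFilter = nearFilter(-73.93414657, 40.82302903, ONE_KILOMETER);

// Runs only inside mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  const restaurants = db
    .getSiblingDB("sample_restaurants")
    .getCollection("restaurants");
  restaurants.createIndex(indexSpec);
  printjson(restaurants.find(nearbyFilter).limit(5).toArray());
}
```

Keeping the distance in a named constant makes the unit explicit, which matters because distance units differ between GeoJSON points and legacy coordinate pairs.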
The routes collection holds data on airline routes between airports. It references airline information in the airline subdocument, which has details about the specific plane on the route. This is another example of improving performance at the cost of minor data duplication for fields that are likely to be frequently accessed.
This dataset consists of a single collection with no additional indexes. It represents detailed weather reports from locations across the world. It holds geospatial data on the locations in the form of legacy coordinate pairs.
It is also possible to download and explore these datasets on your own local machine. You can download the complete sample dataset via the wget command:
Note: You can also use the curl command:
If you don't provide any connection details to mongorestore, it will attempt to connect to MongoDB on your local machine, on port 27017 (which is MongoDB's default). This is the same as providing that host and port explicitly.
These datasets offer a wide selection of data that you can use to both explore MongoDB's features and prototype your next project without having to worry about where you'll find the data.