Deploy a Data Lake
Estimated completion time: 15 minutes
This part of the tutorial will guide you through deploying an Atlas Data Lake.
Prerequisites
To complete this part of the tutorial, you must:
- Create a MongoDB Atlas account, if you do not have one already.
- Be a Project Owner in the project that you want to deploy a Data Lake to.
Procedure
Log in to MongoDB Atlas.
From the Data Stores pane on the left, drag and drop the following sample dataset paths onto the Data Lake pane on the right.
/airbnb/listingsAndReviews/{bedrooms string}/{review_scores.review_scores_rating int}/
This path references the airbnb dataset, which contains vacation home listing details and customer reviews. To learn more about this dataset, see Sample AirBnB Listings Dataset. For this path, Data Lake uses partitions optimized for queries on the bedrooms and review_scores.review_scores_rating fields.
/analytics/accounts/{limit int}/
This path references the analytics dataset, which contains data for a typical financial services application. To learn more about this dataset, see Sample Analytics Dataset. For this path, Data Lake uses partitions optimized for queries on the limit field.
/analytics/customers/{birthdate isodate}/
This path references the analytics dataset, which contains collections for a typical financial services application. To learn more about this dataset, see Sample Analytics Dataset. For this path, Data Lake uses partitions optimized for queries on the birthdate field.
/analytics/transactions/{account_id int}/
This path references the analytics dataset, which contains data for a typical financial services application. To learn more about this dataset, see Sample Analytics Dataset. For this path, Data Lake uses partitions optimized for queries on the account_id field.
/mflix/movies/{type string}/{year int}/
This path references the mflix dataset, which contains data on movies and movie theaters. To learn more about this dataset, see Sample Mflix Dataset. For this path, Data Lake uses partitions optimized for queries on the type and year fields.
/mflix/sessions.json
This path references the mflix dataset, which contains data on movies and movie theaters. To learn more about this dataset, see Sample Mflix Dataset. This path does not contain any partition attributes, so for queries against data in this collection, Data Lake searches all the files in the collection.
/mflix/theaters/{theaterId string}/{location.address.zipcode string}/
This path references the mflix dataset, which contains data on movies and movie theaters. To learn more about this dataset, see Sample Mflix Dataset. For this path, Data Lake uses partitions optimized for queries on the theaterId and location.address.zipcode fields.
/mflix/users.json
This path references the mflix dataset, which contains data on movies and movie theaters. To learn more about this dataset, see Sample Mflix Dataset. This path does not contain any partition attributes, so for queries against data in this collection, Data Lake searches all the files in the collection.
/nyc-yellow-cab-trips/{trip_start_isodate isodate}/{passenger_count int}/{fare_type string}/
This path references the nyc-yellow-cab-trips dataset, which contains data on taxi trips, including the trip date, fare, and number of passengers. For this path, Data Lake uses partitions optimized for queries on the trip_start_isodate, passenger_count, and fare_type fields.
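The path templates above share one syntax: literal directory names plus {field type} partition attributes, each naming a document field and its type. The Python sketch below is a simplified model of why those attributes matter; it is not Atlas Data Lake's actual implementation. It assumes that each attribute occupies exactly one path segment, converts only string and int values, and prunes only on equality filters. The function names (partition_attributes, partition_values, files_to_scan) and the example file paths are hypothetical. Given a template, a list of files, and a query, it keeps only the files whose partition values can match the query; a template with no attributes, such as /mflix/sessions.json, keeps every file.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical model: a partition attribute looks like "{year int}" or
# "{review_scores.review_scores_rating int}" (field name, whitespace, type).
ATTRIBUTE = re.compile(r"\{([^\s{}]+)\s+([^\s{}]+)\}")

# Assumption: only these two types are converted; any other type
# (for example isodate) is kept as the raw path-segment text.
CASTS = {"string": str, "int": int}


def partition_attributes(template: str) -> List[Tuple[str, str]]:
    """Return the (field, type) pairs declared in a path template, in order."""
    return ATTRIBUTE.findall(template)


def template_regex(template: str) -> "re.Pattern[str]":
    """Build a regex with one capture group per partition attribute."""
    pieces, pos = [], 0
    for match in ATTRIBUTE.finditer(template):
        pieces.append(re.escape(template[pos:match.start()]))  # literal text
        pieces.append(r"([^/]+)")  # assumption: one attribute = one path segment
        pos = match.end()
    pieces.append(re.escape(template[pos:]))
    return re.compile("^" + "".join(pieces))  # file names may follow the prefix


def partition_values(template: str, path: str) -> Optional[Dict[str, object]]:
    """Extract {field: value} from a file path, or None if the path does not fit."""
    match = template_regex(template).match(path)
    if match is None:
        return None
    values: Dict[str, object] = {}
    for (field, type_name), raw in zip(partition_attributes(template), match.groups()):
        try:
            values[field] = CASTS.get(type_name, str)(raw)
        except ValueError:  # e.g. "abc" in a segment declared as int
            return None
    return values


def files_to_scan(template: str, files: List[str], query: Dict[str, object]) -> List[str]:
    """Keep the files whose partition values are compatible with an equality query.

    Query fields that are not partition attributes cannot rule out any file, so a
    template without attributes (such as /mflix/sessions.json) keeps every file.
    """
    selected = []
    for path in files:
        values = partition_values(template, path)
        if values is None:
            continue
        if all(values[f] == v for f, v in query.items() if f in values):
            selected.append(path)
    return selected


if __name__ == "__main__":
    movies = "/mflix/movies/{type string}/{year int}/"
    print(partition_attributes(movies))  # [('type', 'string'), ('year', 'int')]

    # Hypothetical file layout under the movies template.
    files = [
        "/mflix/movies/movie/1999/part-0.json",
        "/mflix/movies/movie/2005/part-0.json",
        "/mflix/movies/series/1999/part-0.json",
    ]
    print(files_to_scan(movies, files, {"type": "movie", "year": 1999}))
    # ['/mflix/movies/movie/1999/part-0.json']

    # No partition attributes: nothing can be skipped.
    print(files_to_scan("/mflix/sessions.json", ["/mflix/sessions.json"], {"user_id": "u1"}))
    # ['/mflix/sessions.json']
```

In this model, a query on the movies template for type "movie" and year 1999 reads one of the three example files, while a query against /mflix/sessions.json cannot skip anything, which mirrors the behavior the tutorial describes for paths without partition attributes.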
Optional: Change the name of the Data Lake from DataLake0 to GettingStarted by clicking the associated icon.
You do not need to modify the database or collection names, because the sample queries that you run against the sample datasets later in this tutorial use the default names.
Next Steps
Now that your Data Lake is deployed, proceed to Connect to Your Data Lake.
