Data Lake Configuration
Overview
The Atlas Data Lake configuration is in JSON format. It contains mappings between your data stores and Data Lake. Data Lake supports S3 buckets, Atlas clusters, and publicly accessible URLs as data stores. You must define mappings in your Data Lake to your S3 bucket, Atlas cluster, and HTTP data stores to run queries against your data.
Information in your storage configuration is visible internally at MongoDB and stored as operational data to monitor and improve the performance of Atlas Data Lake. For this reason, we recommend that you do not include personally identifiable information (PII) in your configurations.
Example Configuration for Individual Data Stores
Click on the tab below to learn more about the Data Lake configuration for that data store provider.
Example Configuration for Running Federated Queries
You can define mappings between your S3, Atlas cluster, and HTTP data stores and Data Lake in the storage configuration to run federated queries against your data.
For the preceding sample S3, Atlas cluster, and HTTP data stores, the Data Lake configuration for federated queries resembles the following:
{
  "stores" : [
    {
      "name" : "datacenter-alpha",
      "provider" : "s3",
      "region" : "us-east-1",
      "bucket" : "datacenter-alpha",
      "additionalStorageClasses" : [ "STANDARD_IA" ],
      "prefix" : "/metrics",
      "delimiter" : "/"
    },
    {
      "name" : "atlasClusterStore",
      "provider" : "atlas",
      "clusterName" : "myDataCenter",
      "projectId" : "5e2211c17a3e5a48f5497de3"
    },
    {
      "name" : "httpStore",
      "provider" : "http",
      "allowInsecure" : false,
      "urls" : [
        "https://www.datacenter-hardware.com/data.json",
        "https://www.datacenter-software.com/data.json"
      ],
      "defaultFormat" : ".json"
    }
  ],
  "databases" : [
    {
      "name" : "datacenter-metrics",
      "collections" : [
        {
          "name" : "inventory",
          "dataSources" : [
            {
              "storeName" : "datacenter-alpha",
              "path" : "/hardware/{date date}"
            },
            {
              "storeName" : "atlasClusterStore",
              "database" : "metrics",
              "collection" : "hardware"
            },
            {
              "storeName" : "httpStore",
              "allowInsecure" : false,
              "urls" : [ "https://www.datacenter-metrics.com/data.json" ],
              "defaultFormat" : ".json"
            }
          ]
        }
      ]
    }
  ]
}
If the database in the storage configuration contains collections from S3, Atlas, and HTTP data stores, the query results might contain data from all the data stores.
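To see why one query can return documents from several stores, it can help to resolve which providers back a given collection. The following is a minimal Python sketch, not part of Data Lake itself: the helper name providers_for_collection is hypothetical, and the config dictionary is a trimmed copy of the sample configuration that keeps only the fields the helper reads.

```python
def providers_for_collection(config, db_name, coll_name):
    """Return the sorted provider names that feed one collection."""
    # Index stores by name so each dataSources entry can be resolved.
    provider_by_store = {s["name"]: s["provider"] for s in config["stores"]}
    providers = set()
    for db in config["databases"]:
        if db["name"] != db_name:
            continue
        for coll in db["collections"]:
            if coll["name"] != coll_name:
                continue
            for source in coll["dataSources"]:
                providers.add(provider_by_store[source["storeName"]])
    return sorted(providers)


# Trimmed copy of the sample configuration (illustrative subset of fields).
config = {
    "stores": [
        {"name": "datacenter-alpha", "provider": "s3"},
        {"name": "atlasClusterStore", "provider": "atlas"},
        {"name": "httpStore", "provider": "http"},
    ],
    "databases": [
        {
            "name": "datacenter-metrics",
            "collections": [
                {
                    "name": "inventory",
                    "dataSources": [
                        {"storeName": "datacenter-alpha"},
                        {"storeName": "atlasClusterStore"},
                        {"storeName": "httpStore"},
                    ],
                }
            ],
        }
    ],
}

print(providers_for_collection(config, "datacenter-metrics", "inventory"))
# ['atlas', 'http', 's3']
```

Because the inventory collection lists all three stores as data sources, a query against it can draw on S3, Atlas, and HTTP data at once.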
Configuration Format
The Data Lake configuration has the following format:
stores
- The stores object defines each data store associated with the Data Lake. The data store captures files in an S3 bucket, documents in an Atlas cluster, or files stored at publicly accessible URLs. Data Lake can only access data stores defined in the stores object.
databases
- The databases object defines the mapping between each data store defined in stores and MongoDB collections in the databases.
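As a rough illustration of this two-part layout, the sketch below parses a configuration and confirms that both top-level arrays are present. It is a sanity check under stated assumptions, not an official validator: the function name check_top_level is hypothetical, and the sketch assumes that both stores and databases must appear as arrays.

```python
import json


def check_top_level(raw_json):
    """Parse a storage configuration and report missing top-level arrays."""
    config = json.loads(raw_json)
    problems = []
    # Assumption: both keys are required and both hold arrays.
    for key in ("stores", "databases"):
        if not isinstance(config.get(key), list):
            problems.append(f"'{key}' must be an array")
    return problems


skeleton = '{"stores": [], "databases": []}'
print(check_top_level(skeleton))          # []
print(check_top_level('{"stores": []}'))  # ["'databases' must be an array"]
```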
stores

stores
Array of objects where each object represents a data store to associate with the Data Lake. The data store captures files in an S3 bucket, documents in an Atlas cluster, or files stored at publicly accessible URLs. A Data Lake can only access data stores defined in the stores object.
stores.[n].name
Name of the data store. The databases.[n].collections.[n].dataSources.[n].storeName field references this value as part of the mapping configuration.
Note: To use Atlas as a data store, Data Lake requires a serverless instance or an M10 or higher cluster.
stores.[n].provider
Defines where the data is stored. Value can be one of the following:
- s3 for an AWS S3 bucket.
- atlas for a collection in an Atlas cluster.
- http for data in files hosted at publicly accessible URLs.
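The three provider values listed above can be checked before a configuration is submitted. This is a minimal Python sketch with a hypothetical helper name (invalid_providers); it only compares each store's provider against the values named on this page and does not check any provider-specific fields.

```python
# Provider values documented for stores.[n].provider.
ALLOWED_PROVIDERS = {"s3", "atlas", "http"}


def invalid_providers(stores):
    """Return (store name, provider) pairs whose provider is not recognized."""
    return [
        (store.get("name"), store.get("provider"))
        for store in stores
        if store.get("provider") not in ALLOWED_PROVIDERS
    ]


stores = [
    {"name": "datacenter-alpha", "provider": "s3"},
    {"name": "atlasClusterStore", "provider": "atlas"},
    {"name": "typoStore", "provider": "S3"},  # wrong case, so it is flagged
]
print(invalid_providers(stores))  # [('typoStore', 'S3')]
```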
databases

databases
Array of objects where each object represents a database, its collections, and, optionally, any views on the collections. Each database can have multiple collections and views objects.
databases.[n].collections
Array of objects where each object represents a collection and data sources that map to a stores data store. For dynamically generated databases, you can define only one wildcard (*) collection object in the storage configuration.
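The one-wildcard limit can be checked mechanically. The sketch below is a simplified illustration: the helper name databases_with_extra_wildcards is hypothetical, and for brevity it applies the limit to every database object rather than only to dynamically generated ones.

```python
def databases_with_extra_wildcards(databases):
    """Return names of databases that define more than one "*" collection."""
    flagged = []
    for db in databases:
        wildcard_count = sum(
            1 for coll in db.get("collections", []) if coll.get("name") == "*"
        )
        if wildcard_count > 1:
            flagged.append(db["name"])
    return flagged


databases = [
    {"name": "*", "collections": [{"name": "*"}, {"name": "*"}]},
    {"name": "datacenter-metrics", "collections": [{"name": "inventory"}]},
]
print(databases_with_extra_wildcards(databases))  # ['*']
```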
databases.[n].collections.[n].name
Name of the collection to which Data Lake maps the data contained in each databases.[n].collections.[n].dataSources.[n].storeName. Each object in the array represents the mapping between the collection and an object in the stores array.
databases.[n].collections.[n].dataSources
Array of objects where each object represents a stores data store to map with the collection.
databases.[n].collections.[n].dataSources.[n].storeName
Name of a data store to map to the <collection>. Must match the name of an object in the stores array.
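Because every storeName must match the name of an object in the stores array, a dangling reference is an easy mistake to catch ahead of time. The following Python sketch shows one way to do that; the helper name unknown_store_references and the sample values are illustrative only.

```python
def unknown_store_references(config):
    """Return (database, collection, storeName) for unmatched storeName values."""
    known = {store["name"] for store in config["stores"]}
    missing = []
    for db in config["databases"]:
        for coll in db.get("collections", []):
            for source in coll.get("dataSources", []):
                if source["storeName"] not in known:
                    missing.append((db["name"], coll["name"], source["storeName"]))
    return missing


config = {
    "stores": [{"name": "datacenter-alpha", "provider": "s3"}],
    "databases": [
        {
            "name": "datacenter-metrics",
            "collections": [
                {
                    "name": "inventory",
                    "dataSources": [
                        {"storeName": "datacenter-alpha"},
                        {"storeName": "httpStore"},  # not defined in stores
                    ],
                }
            ],
        }
    ],
}
print(unknown_store_references(config))
# [('datacenter-metrics', 'inventory', 'httpStore')]
```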
databases.[n].views
Array of objects where each object represents an aggregation pipeline on a collection. To learn more about views, see Views.
databases.[n].views.[n].pipeline
Aggregation pipeline stage(s) to apply to the source collection.
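As a rough sketch of how a view object might be assembled, the Python snippet below builds one from a list of aggregation stages. Several details are assumptions that this page does not confirm: the name field, the view name recentHardware, and the choice to store the pipeline as a JSON-encoded string. Check the Views documentation before relying on this shape.

```python
import json


def make_view(name, source, stages):
    """Build a view object for the databases.[n].views array."""
    return {
        "name": name,      # assumption: views carry a name field
        "source": source,  # collection that the pipeline reads from
        # Assumption: the pipeline is stored as a JSON-encoded string.
        "pipeline": json.dumps(stages),
    }


stages = [{"$match": {"type": "hardware"}}, {"$limit": 10}]
view = make_view("recentHardware", "inventory", stages)
print(json.dumps(view, indent=2))
```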