Configuring Data Lake
Overview
You can configure Atlas Data Lake using the Data Lake Configuration. The configuration defines mappings between your data stores and Data Lake. To learn more about the configuration including the configuration fields and format, see Data Lake Configuration.
You can retrieve and update the Data Lake configuration by connecting a mongo shell to the Data Lake. You can also update your Data Lake configuration from the Atlas UI. See Set or Update Data Lake Configuration for more information.
Any MongoDB user in the Atlas project with the atlasAdmin role can retrieve and update the Data Lake configuration.
Retrieve Data Lake Configuration
Once connected to the Data Lake, you can use the following database commands to retrieve the Data Lake configuration:
use admin
db.runCommand( { "storageGetConfig" : 1 } )
The command returns the current Data Lake configuration. For complete documentation on the configuration fields and format, see Configuration Format.
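As a rough illustration, the retrieved configuration can be summarized by listing the names of its stores and databases. The sketch below is not part of the official tooling: summarizeConfig is a hypothetical helper, and it assumes the response nests the configuration under a storage field with stores and databases arrays. If your response is shaped differently, adjust the lookup.

```javascript
// Hypothetical helper: list the store and database names in a
// storageGetConfig response.
// Assumption: the configuration is nested under a "storage" field. If that
// field is absent, the response itself is treated as the configuration.
function summarizeConfig(response) {
  const config = response.storage || response;
  const names = (items) => (items || []).map((item) => item.name);
  return {
    stores: names(config.stores),
    databases: names(config.databases),
  };
}

// In a mongo shell connected to the Data Lake:
//   const response = db.getSiblingDB("admin").runCommand({ storageGetConfig: 1 });
//   printjson(summarizeConfig(response));
```

A summary like this is a quick way to confirm which stores and databases are currently mapped before you change anything.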
Set or Update Data Lake Configuration
Once connected to the Data Lake, you can use the following database commands to set or update the Data Lake configuration:
use admin
db.runCommand( { "storageSetConfig" : <config> } )
Replace <config> with the Data Lake configuration. For complete documentation on the configuration fields and format, see Configuration Format. You can validate your configuration before setting or updating the Data Lake configuration by running the storageValidateConfig command.
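The validate-before-set workflow described above might be scripted as follows. This is a minimal sketch, not a prescribed procedure: validateThenSet is a hypothetical helper, adminDb stands for the admin database handle, and the check relies only on the errs field that storageValidateConfig is documented to return for an invalid configuration.

```javascript
// Hypothetical helper: validate a configuration and apply it only if the
// validation reports no errors.
// adminDb is any object with a runCommand(command) method, for example
// db.getSiblingDB("admin") in the mongo shell.
function validateThenSet(adminDb, config) {
  const check = adminDb.runCommand({ storageValidateConfig: config });
  if (check.ok !== 1) {
    // The command itself failed, so nothing is applied.
    return { applied: false, errs: [check.errmsg || "validation command failed"] };
  }
  const errs = Array.isArray(check.errs) ? check.errs : [];
  if (errs.length > 0) {
    // The configuration is invalid: report the errors and stop.
    return { applied: false, errs: errs };
  }
  const result = adminDb.runCommand({ storageSetConfig: config });
  return { applied: result.ok === 1, errs: [] };
}

// In a mongo shell connected to the Data Lake:
//   printjson(validateThenSet(db.getSiblingDB("admin"), myConfig));
```

Passing the database handle in as a parameter keeps the helper easy to exercise against a stub before you run it against a real Data Lake.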
To set or update the storage configuration through the Atlas UI:
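- Log in to MongoDB Atlas.
- Select the Data Lake option on the left-hand navigation.
- Click Configuration for the Data Lake and choose the configuration method.
- Make necessary changes to the Data Lake storage configuration.
- Click Save.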
You can also set and manage the storage configuration using the Administration Commands.
Validate Data Lake Configuration
You can run the following command to validate your Data Lake configuration.
use admin
db.runCommand( { "storageValidateConfig" : <config> } )
Replace <config> with the Data Lake configuration. For complete documentation on the configuration fields and format, see Configuration Format.
The command returns the following if your Data Lake configuration is valid:
{ "ok" : 1 }
The command returns the list of errors in the errs field if your Data Lake storage configuration is invalid:
{ "ok" : 1, "errs" : [ "<error>", "<error>", ... ] }
Generate Data Lake Configuration
You can run the storageGenerateConfig command to generate a Data Lake configuration. The command returns an automatically generated configuration, which you can then modify and upload. In the automatically generated configuration, Data Lake generates a database for each store:
- The databases.[n].name will be the same as the stores.[n].name that it maps to.
- Each database will contain up to 3 collections and a wildcard (*) collection.
As a result, the databases array in the generated configuration might be different from the databases array in your existing configuration.
You must have the storageSetConfig privilege to run the storageGenerateConfig command. The atlasAdmin role has the storageSetConfig privilege by default.
To generate a Data Lake configuration, connect to the Data Lake and run the following database commands:
use admin
db.runCommand( { "storageGenerateConfig" : 1 } )
For complete documentation on the configuration fields and format, see Configuration Format.
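Because the generated databases array can differ from your existing one, it may help to compare database names before uploading a generated configuration. The sketch below assumes you have already extracted both configuration documents (each with a databases array) from the command responses. compareDatabaseNames is a hypothetical helper, and how the configuration is nested inside each response is not covered here.

```javascript
// Hypothetical helper: report database names that appear in only one of
// two configuration documents.
// Each argument is a configuration document with a "databases" array whose
// entries have a "name" field.
function compareDatabaseNames(existing, generated) {
  const names = (config) => (config.databases || []).map((d) => d.name);
  const current = names(existing);
  const proposed = names(generated);
  return {
    onlyInExisting: current.filter((n) => !proposed.includes(n)),
    onlyInGenerated: proposed.filter((n) => !current.includes(n)),
  };
}

// Example with two small configuration documents:
//   compareDatabaseNames(
//     { databases: [{ name: "sales" }, { name: "logs" }] },
//     { databases: [{ name: "logs" }, { name: "archive" }] }
//   );
```

Names listed under onlyInExisting are the ones to look at first if you intend to upload the generated configuration, since this comparison finds no counterpart for them in the generated document.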
Generate Wildcard Collections
You can dynamically generate collection names that map to data in your S3 bucket or Atlas cluster. To dynamically generate collection names, specify the wildcard, *, as the value for the collection name setting in your Data Lake storage configuration. You can't dynamically generate collection names in your Data Lake storage configuration that map to data in your HTTP or HTTPS data store.
You can use the storageSetConfig command to configure the settings for generating wildcard (*) collections.
To learn more about the configuration settings, see Data Lake Configuration.
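To make the wildcard setting concrete, the following sketch adds a collection named * to one database in a configuration document. It is illustrative only: addWildcardCollection is a hypothetical helper, and the collections, dataSources, storeName, and provider field names (including the "http" provider value) are assumptions about the configuration format that you should check against Data Lake Configuration.

```javascript
// Hypothetical helper: return a copy of a configuration document with a
// wildcard ("*") collection added to the named database.
// Assumed field names: stores[n].provider, databases[n].collections,
// collections[n].dataSources, dataSources[n].storeName.
function addWildcardCollection(config, databaseName, dataSource) {
  const store = (config.stores || []).find((s) => s.name === dataSource.storeName);
  if (!store) {
    throw new Error("Unknown store: " + dataSource.storeName);
  }
  if (store.provider === "http") {
    // Wildcard collections cannot map to HTTP or HTTPS data stores.
    throw new Error("Wildcard collections are not supported for HTTP stores");
  }
  const wildcard = { name: "*", dataSources: [dataSource] };
  // Databases other than databaseName are returned unchanged. If no database
  // has that name, the configuration is returned without a wildcard.
  const databases = (config.databases || []).map((database) =>
    database.name === databaseName
      ? Object.assign({}, database, {
          collections: (database.collections || []).concat([wildcard]),
        })
      : database
  );
  return Object.assign({}, config, { databases: databases });
}

// The result can then be validated and passed to storageSetConfig:
//   const updated = addWildcardCollection(config, "sales", { storeName: "s3store", path: "<path>" });
```

The helper copies the configuration instead of editing it in place, so the original document stays available if the updated one fails validation.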