Atlas Data Federation supports Azure Blob Storage containers as federated database instance stores. You must define mappings in your federated database instance to your Azure Blob Storage containers to run queries against your data.
Note
While we refer to blobs as files and delimiter-separated prefixes as directories in this page, these blob storage services are not actually file systems and don't have the same behaviors in all cases as files on a hard drive.
Configuration File Format
To define a federated database instance store for an Azure Blob Storage container, you can specify the configuration parameters in JSON format. The configuration contains the Azure Blob Storage data store and maps it to virtual collections that you can query.
The JSON configuration for data in Azure Blob Storage containers uses the following fields:
{
  "stores" : [
    {
      "name" : "<string>",
      "provider": "<string>",
      "region" : "<string>",
      "serviceURL" : "<string>",
      "containerName" : "<string>",
      "delimiter" : "<string>",
      "prefix": "<string>",
      "public": <boolean>
    }
  ],
  "databases" : [
    {
      "name" : "<string>",
      "collections" : [
        {
          "name" : "<string>",
          "dataSources" : [
            {
              "storeName" : "<string>",
              "path" : "<string>",
              "defaultFormat" : "<string>",
              "provenanceFieldName": "<string>",
              "omitAttributes": <boolean>
            }
          ]
        }
      ],
      "maxWildcardCollections" : <integer>,
      "views" : [
        {
          "name" : "<string>",
          "source" : "<string>",
          "pipeline" : "<string>"
        }
      ]
    }
  ]
}
The JSON configuration for Azure Blob Storage contains two top-level objects: stores and databases.
stores
The stores object defines each data store associated with the federated database instance. The federated database instance store captures files in an Azure Blob Storage container.
Data Federation can only access data stores defined in the stores object.
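Because Data Federation can only access stores defined in the stores object, a quick local check can catch data sources that point at an undefined store. The following is a minimal Python sketch, not part of Atlas. The function name find_undefined_stores and the sample configuration values are hypothetical, and the sketch assumes each data source refers to a store through its storeName field, as in the configuration format on this page.

```python
def find_undefined_stores(config):
    """Return storeName values that match no entry in config["stores"]."""
    defined = {store["name"] for store in config.get("stores", [])}
    missing = []
    for database in config.get("databases", []):
        for collection in database.get("collections", []):
            for source in collection.get("dataSources", []):
                store_name = source.get("storeName")
                if store_name not in defined:
                    missing.append(store_name)
    return missing


# Hypothetical configuration used only for illustration.
sample_config = {
    "stores": [{"name": "datacenter", "provider": "azure"}],
    "databases": [
        {
            "name": "metrics",
            "collections": [
                {
                    "name": "hardware",
                    "dataSources": [
                        {"storeName": "datacenter", "path": "/hardware/"},
                        {"storeName": "archive", "path": "/old/"},
                    ],
                }
            ],
        }
    ],
}

print(find_undefined_stores(sample_config))  # ['archive']
```

An empty result means every data source in the sample refers to a store that the stores array defines.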
The stores object contains the following fields:
"stores" : [
  {
    "name" : "<string>",
    "provider" : "<string>",
    "region" : "<string>",
    "serviceURL" : "<string>",
    "containerName" : "<string>",
    "delimiter": "<string>",
    "prefix" : "<string>",
    "public": <boolean>
  }
]
The following table describes the fields in the stores object:
Field | Type | Necessity | Description
---|---|---|---
stores | array | required | Array of objects where each object represents a data store to associate with the federated database instance. The federated database instance store captures files in an Azure Blob Storage container. Atlas Data Federation can only access data stores defined in the stores object.
stores.[n].name | string | required | Name of the federated database instance store. The databases.[n].collections.[n].dataSources.[n].storeName field references this value.
stores.[n].provider | string | required | Defines where the data is stored. Value must be azure for an Azure Blob Storage container.
stores.[n].region | string | required | Name of the Azure region in which the data is stored.
stores.[n].serviceURL | string | required | URL of the Azure Blob Storage account that contains your blob containers. The URL has the format https://<storage-account-name>.blob.core.windows.net/, where <storage-account-name> is the name of your Azure Blob Storage account.
stores.[n].containerName | string | required | Name of the Azure Blob Storage container that contains the files.
stores.[n].prefix | string | optional | Prefix Atlas Data Federation applies when searching for files in the Azure Blob Storage container. The federated database instance store prepends the value of prefix to the path of each data source. If omitted, Atlas Data Federation searches all files from the root of the Azure Blob Storage container.
stores.[n].delimiter | string | optional | The delimiter that separates path segments in the federated database instance store. If omitted, defaults to "/".
stores.[n].public | boolean | optional | Specifies whether the Azure Blob Storage container is public. If omitted, defaults to false.
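As a rough illustration of the prefix field, the sketch below joins a store prefix and a data source path. It assumes plain string concatenation, which is a simplification for illustration only; the function name full_search_path is hypothetical and the sample values are invented.

```python
def full_search_path(prefix, path):
    """Join a store prefix and a data source path by simple concatenation.

    Assumption: the prefix is prepended to the path unchanged. A missing
    prefix means the search starts at the root of the container.
    """
    return (prefix or "") + path


print(full_search_path("/metrics", "/hardware/"))  # /metrics/hardware/
print(full_search_path(None, "/hardware/"))        # /hardware/
```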
databases
The databases object defines the mapping between each federated database instance store defined in stores and MongoDB collections in the databases.
The databases object contains the following fields:
"databases" : [
  {
    "name" : "<string>",
    "collections" : [
      {
        "name" : "<string>",
        "dataSources" : [
          {
            "storeName" : "<string>",
            "defaultFormat" : "<string>",
            "path" : "<string>",
            "provenanceFieldName": "<string>",
            "omitAttributes": <boolean>
          }
        ]
      }
    ],
    "maxWildcardCollections" : <integer>,
    "views" : [
      {
        "name" : "<string>",
        "source" : "<string>",
        "pipeline" : "<string>"
      }
    ]
  }
]
The following table describes the fields in the databases object:
Field | Type | Necessity | Description
---|---|---|---
databases | array | required | Array of objects where each object represents a database, its collections, and, optionally, any views on the collections. Each database can have multiple collections and views objects.
databases.[n].name | string | required | Name of the database to which Atlas Data Federation maps the data contained in the data store.
databases.[n].collections | array | required | Array of objects where each object represents a collection and data sources that map to a stores federated database instance store.
databases.[n].collections.[n].name | string | required | Name of the collection to which Atlas Data Federation maps the data contained in each databases.[n].collections.[n].dataSources.[n].storeName. You can generate collection names dynamically from file paths by specifying * for the collection name and the collectionName() function in the path field.
databases.[n].collections.[n].dataSources | array | required | Array of objects where each object represents a stores federated database instance store to map with the collection.
databases.[n].collections.[n].dataSources.[n].storeName | string | required | Name of a federated database instance store to map to the collection. Must match the name of an object in the stores array.
databases.[n].collections.[n].dataSources.[n].path | string | required | Controls how Atlas Data Federation searches for and parses files in the storeName before mapping them to a collection. See Define Path for S3 Data for more information.
databases.[n].collections.[n].dataSources.[n].defaultFormat | string | optional | Default format that Data Federation assumes if it encounters a file without an extension while searching the storeName. For the valid values of the defaultFormat field, see Supported Data Formats.
databases.[n].collections.[n].dataSources.[n].provenanceFieldName | string | required | Name for the field that includes the provenance of the documents in the results. If you specify this setting in the storage configuration, Atlas Data Federation returns provenance fields for each document in the result.
databases.[n].collections.[n].dataSources.[n].omitAttributes | boolean | optional | Flag that specifies whether to omit the attributes (key and value pairs) that Atlas Data Federation adds to the collection. Set to true to omit the attributes or false to add them. If omitted, defaults to false.
databases.[n].maxWildcardCollections | integer | optional | Maximum number of wildcard * collections in the database. Each wildcard collection can have only one data source. Value can be between 1 and 1000, inclusive. If omitted, defaults to 100.
databases.[n].views | array | required | Array of objects where each object represents an aggregation pipeline on a collection. To learn more about views, see Views.
databases.[n].views.[n].name | string | required | Label that identifies the view.
databases.[n].views.[n].source | string | required | Name of the source collection for the view. If you want to create a view with a $sql stage, you must omit this field because the SQL statement specifies the source collection.
databases.[n].views.[n].pipeline | array | optional | Aggregation pipeline stage(s) to apply to the source collection.
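The maxWildcardCollections limits in the table (an integer from 1 to 1000, defaulting to 100) can be checked locally before you save a configuration. This is a minimal Python sketch under that reading of the table; the function name resolve_max_wildcard_collections is hypothetical and is not an Atlas API.

```python
DEFAULT_MAX_WILDCARD_COLLECTIONS = 100  # default stated in the table


def resolve_max_wildcard_collections(database):
    """Return the effective maxWildcardCollections for one database object."""
    value = database.get("maxWildcardCollections", DEFAULT_MAX_WILDCARD_COLLECTIONS)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("maxWildcardCollections must be an integer")
    if not 1 <= value <= 1000:
        raise ValueError("maxWildcardCollections must be between 1 and 1000")
    return value


print(resolve_max_wildcard_collections({"name": "metrics"}))  # 100
print(resolve_max_wildcard_collections({"maxWildcardCollections": 250}))  # 250
```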
Example Configuration for Azure Blob Storage Data Store
Example
Consider an Azure Blob Storage container datacenter-alpha containing data collected from a datacenter:
|--metrics
   |--hardware
The /metrics/hardware path stores JSON files with metrics derived from the datacenter hardware, where each filename is the UNIX timestamp in milliseconds of the 24-hour period covered by that file:
/hardware/1564671291998.json
The following configuration:
- Defines a federated database instance store on the datacenter-alpha Azure Blob Storage container in the eastus2 Azure region. The federated database instance store is specifically restricted to include only data files in the metrics directory path.
- Maps files from the hardware directory to a MongoDB database datacenter-alpha-metrics and collection hardware. The configuration mapping includes parsing logic for capturing the timestamp implied in the filename.
{
  "stores" : [
    {
      "name" : "datacenter",
      "provider" : "azure",
      "region" : "eastus2",
      "containerName" : "datacenter-alpha",
      "serviceURL" : "https://mystorageaccount.blob.core.windows.net/",
      "prefix" : "/metrics"
    }
  ],
  "databases" : [
    {
      "name" : "datacenter-alpha-metrics",
      "collections" : [
        {
          "name" : "hardware",
          "dataSources" : [
            {
              "storeName" : "datacenter",
              "path" : "/hardware/{date date}"
            }
          ]
        }
      ]
    }
  ]
}
Atlas Data Federation parses the Azure Blob Storage container datacenter-alpha and processes all files under /metrics/hardware/. The collections object uses the path parsing syntax to map the filename to the date field, which is an ISO-8601 date, in each document. If a matching date field does not exist in a document, Atlas Data Federation adds it.
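To see which date the example filename corresponds to, the following Python sketch converts a millisecond UNIX timestamp filename into an ISO-8601 string. It only illustrates the timestamp arithmetic and does not claim to reproduce how Atlas Data Federation parses paths; the function name filename_to_iso_date is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from pathlib import PurePosixPath

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def filename_to_iso_date(path):
    """Interpret the file stem as UNIX milliseconds and return ISO-8601 (UTC)."""
    millis = int(PurePosixPath(path).stem)  # "1564671291998.json" -> 1564671291998
    # Integer timedelta arithmetic avoids floating-point rounding of the milliseconds.
    return (EPOCH + timedelta(milliseconds=millis)).isoformat()


print(filename_to_iso_date("/hardware/1564671291998.json"))
# 2019-08-01T14:54:51.998000+00:00
```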
Users connected to the federated database instance can use the MongoDB Query Language and supported aggregations to analyze data in the Azure Blob Storage container through the datacenter-alpha-metrics.hardware collection.