Map Relational Schemas to MongoDB¶
On this page
Business intelligence tools connect to a data source and, given a fixed tabular schema, allow the user to visually explore their data. As MongoDB uses a flexible schema, these tools currently cannot use MongoDB as a native data source.
To use MongoDB with a relational business intelligence tool such as Tableau, you must define a relational mapping for your MongoDB data.
The schema generation tool mongodrdl
outputs such a schema in the
Document Relational Definition Language (DRDL) format, but you
should verify that the schema it proposes matches your data requirements.
Examine both the table structure and field types to ensure that you can
generate your intended reports.
You may have to manually edit the schema definition files to remove
unneeded fields; add fields that mongodrdl
did not discover
within the subset of documents that it sampled; and modify erroneous
field types.
You may edit DRDL files to:
- Add aggregation pipeline stages,
- Add fields,
- Remove fields,
- Remove tables,
- Rename tables, so long as the
collection
field remains the same, - Rename fields using the
sqlname
option
Important
Restart mongosqld
to apply any changes you make to your
DRDL
files.
Document Relational Definition Language¶
The Document Relational Definition Language (DRDL
) defines a
relational view of a MongoDB schema.
mongodrdl
samples documents from your MongoDB collections, and
derives a DRDL
file from those documents. mongosqld
then
uses the schema defined in DRDL
files to allow MySQL clients to
query your MongoDB data.
Field Types¶
The MongoDB Connector for BI correctly maps fields that always contain the same data type into the relational model. Schema generation deals specially with the following cases:
Numeric | The MongoDB Connector for BI uses the most precise numeric type that matches the sampled documents. If a field in a collection always has the same data type, the MongoDB Connector for BI uses that type. If a field in a collection can contain either floating point
values or integers, the MongoDB Connector for BI uses type |
---|---|
Dates | The MongoDB Connector for BI treats any field of type data_date as the
SQL timestamp type. |
Timestamps | The MongoDB Connector for BI ignores any field of type data_timestamp . |
ObjectID | The MongoDB Connector for BI treats any field of type data_oid as
the SQL type varchar . |
UUID | The MongoDB Connector for BI treats any field of type UUID as the SQL type
varchar . |
Geospatial | If a collection contains a geospatial index, the connector maps the indexed field into an array of numeric longitude-latitude coordinates. See Geospatial Data for an example.
|
Heterogeneous Fields | If a field contains inconsistent types, the MongoDB Connector for BI chooses the most frequently sampled type. If a field can contain either a type or an array of that type, the generated schema always specifies that the field contains an array. |
Embedded Documents¶
The MongoDB Connector for BI maps embedded documents to simple fields
that have a .
separator character, making them appear similar to the way you
would reference them using dot notation in a
MongoDB query.
While Tableau properly quotes identifiers, within ad-hoc
SQL expressions you must double-quote every identifier that contains .
characters or mixed-case characters.
Example¶
Consider the following document:
Running mongodrdl
on a collection containing this document
results in the following fields in the generated schema:
_id |
numeric |
familyName |
varchar |
hometown |
varchar |
address.street |
varchar |
address.city |
varchar |
address.state |
varchar |
address.zip |
varchar |
members_since |
timestamp |
Arrays¶
The MongoDB Connector for BI exposes arrays to business intelligence tools using two collections: one without the array, and the other having one document per array element.
Example¶
If you run mongodrdl
on a collection families
with the
following document:
This results in the following three tables:
families
_id
numeric
familyName
varchar
hometown
varchar
families_familyMembers
_id
numeric
familyMembers.age
numeric
familyMembers.firstname
varchar
familyMembers_idx
numeric
families_familyMembers_attributes
_id
numeric
familyMembers.attributes.name
varchar
familyMembers.attributes.value
varchar
familyMembers.attributes_idx
numeric
familyMembers_idx
numeric
You can join these tables together to view the data in a denormalized format. For example, you can list the people named in the above schema together with their family information using the following query:
Pre-Joining¶
If you provide the --preJoined
option to mongodrdl
,
then the MongoDB Connector for BI adds the fields in the containing document to each
array element’s document, thus “pre-joining” the table.
In the previous example, the tables would contain the following additional columns:
families_familyMembers
familyName
varchar
hometown
varchar
families_familyMembers_attributes
familyMembers.age
numeric
familyMembers.firstname
varchar
familyMembers_idx
numeric
familyName
varchar
hometown
varchar
Custom Filters¶
You can add a column of type mongo.Filter
to a collection in your
DRDL
file. This column type allows you to perform a custom
$match query.
For example, given the following schema describing a cloud of points with up to three components:
You can select only three-dimensional points using the following query:
Aggregation Pipelines¶
Aggregation Pipelines using Views¶
MongoDB 3.4 introduces Read-Only Views that you can use to filter incompatible data.
For example, you can create a view in the test
database that contains
only documents containing a number in the grade
field of a grades
collection:
You can then use mongodrdl
to generate a schema from this view
as you would a collection:
Aggregation Pipelines in DRDL¶
The MongoDB Connector for BI can use aggregation pipelines as part of the schema to transform documents from the collection into the proper form for the relational tables.
For example, consider a simple document in the collection
simpleFamilies
:
mongodrdl
will generate a schema with the tables
simpleFamilies
and simpleFamilies_familyMembers
.
The table simpleFamilies_familyMembers
enumerates each family
members, and has the following pipeline:
This pipeline uses $unwind
to create a new record for each
member of familyMembers
. The schema tracks the array index in
the field familyMembers_idx
.
Geospatial Data¶
New in version 1.1.0.
If a collection contains a geospatial index, the connector maps the indexed field into an array of numeric longitude-latitude coordinates.
Example¶
Given the following collection:
mongodrdl
generates the following schema:
mongodrdl
will currently not recognize geospatial fields
in a read-only view.