BI Connector: Mapping Document Structures to Relational Schema - Part 1
Building visual dashboards from a MongoDB flexible data store using standard SQL BI tools can be a challenge. Is complex ETL and manual intervention the only way to build such dashboards? Absolutely not! The MongoDB Connector for Business Intelligence (BI) allows users to create queries with SQL and visualize, graph, and report on their MongoDB data using existing relational business intelligence tools.
At its core, the MongoDB BI Connector performs a transformation of data from MongoDB's flexible documents to SQL's rigid schema of tables, rows and columns. That transformation is what allows users of the BI Connector to leverage the various SQL-focused BI tools. Knowing how BI Connector performs this and how you can take control of it is key to extracting the maximum value from it.
In this article, we'll look at how the BI connector automatically creates its schema mapping and how to control it to create the meaningful tables that will help you extract better insights from your MongoDB data.
The BI Connector has long supported the sharing of schema files (DRDL files - Document Relational Definition Language defines a relational view of a MongoDB schema). It has, though, previously relied on manually sharing the files between various users and manually managing the different versions needed. The latest enhancement to the BI Connector removes that manual management issue by storing the DRDL files where they can best be shared with different personas: in the MongoDB database itself.
In order to query MongoDB using SQL via the BI Connector, it is first necessary to map collection document structures to a relational schema which can be examined and queried using SQL client applications.
Schema Mapped from Sampled Documents
By default, the BI Connector handles this mapping automatically in what we call “standalone mode.” In standalone mode, the BI Connector randomly samples collection documents and maps a relational schema based on the underlying document structure. This mapping logic accounts for rich document structures and creates a relational schema that includes root tables for top-level document data, and sub-tables that represent embedded objects and arrays.
In many cases, this default sampling logic is sufficient to provide SQL client applications direct access to MongoDB. However, when working with complex document structures, it may be desirable to define customized mappings to meet the specific needs of connecting client applications.
Customizing Schema
Consider the following document:
{
 "_id": ObjectId("5bfabde76f280102ddf27969"),
 "band": "Slayer",
 "formation": {
 "year": ISODate("1982-01-01T00:00:00Z"),
 "city": "Los Angeles"
 },
 "popular_albums": [
 "Show No Mercy!",
 "Seasons in the Abyss",
 "Haunting the Chapel",
 "Divine Intervention"
 ],
 "members": [
 {
 "name": "Tom Araya",
 "dob": ISODate("1961-06-06T07:00:00Z"),
 "primary_instrument": "Bass/Vocals"
 },
 {
 "name": "Kerry King",
 "dob": ISODate("1964-06-03T07:00:00Z"),
 "primary_instrument": "Guitar"
 },
 {
 "name": "Jeff Hanneman",
 "dob": ISODate("1964-01-31T08:00:00Z"),
 "primary_instrument": "Guitar"
 },
 {
 "name": "Dave Lombardo",
 "dob": ISODate("1965-02-16T08:00:00Z"),
 "primary_instrument": "Drums"
 }
 ]
}

Standalone Sampling Schema Logic
In standalone mode where document-to-relational mapping is done via sampling, the BI Connector maps the following table and sub-tables:
mysql> show tables;
+-----------------------------------------+
| Tables_in_sql |
+-----------------------------------------+
| bands |
| bands_members |
| bands_popular_albums |

Table details:
bands:
mysql> describe bands;
+----------------+----------------+------+------+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------+----------------+------+------+---------+-------+
| _id | varchar(24) | YES | PRI | NULL | |
| band | varchar(65535) | YES | | NULL | |
| formation.city | varchar(65535) | YES | | NULL | |
| formation.year | datetime | YES | | NULL | |
+----------------+----------------+------+------+---------+-------+
4 rows in set (0.08 sec)

NOTE: Given that the “formation” field contains a single object in a 1:1 relationship with top-level document fields, the embedded object’s fields are mapped to the root table.
bands_members:
mysql> describe bands_members;
+----------------------------+----------------+------+------+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------------------+----------------+------+------+---------+-------+
| _id | varchar(24) | YES | PRI | NULL | |
| members.dob | datetime | YES | | NULL | |
| members.name | varchar(65535) | YES | | NULL | |
| members.primary_instrument | varchar(65535) | YES | | NULL | |
| members_idx | bigint(20) | YES | PRI | NULL | |
+----------------------------+----------------+------+------+---------+-------+
5 rows in set (0.05 sec)

bands_popular_albums:
mysql> describe bands_popular_albums;
+--------------------+----------------+------+------+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+----------------+------+------+---------+-------+
| _id | varchar(24) | YES | PRI | NULL | |
| popular_albums | varchar(65535) | YES | | NULL | |
| popular_albums_idx | bigint(20) | YES | PRI | NULL | |
+--------------------+----------------+------+------+---------+-------+
3 rows in set (0.06 sec)

We can join these tables using the _id field. For example:
mysql> SELECT * FROM bands
 -> JOIN bands_popular_albums ON bands._id = bands_popular_albums._id;

Customize Document-to-Relational Mappings
But what if our SQL clients do not require access to all fields within the underlying document? Or similarly, what if we want to provide clients with a single table view, rather than a root table with its two sub-tables?
We can customize our document-to-relational schema mapping by using the BI Connector in “custom mode.” When using custom mode, the BI Connector does no sampling of underlying documents to create mappings. Rather, the BI Connector relies on explicit mappings provided by the BI Connector administrator.
Document to Relational Definition Language (DRDL)
These custom mappings are managed through the Document to Relational Definition Language (DRDL).
As an example, let’s say that we want to present a simpler, single-table relational schema for our client applications to work with. Rather than presenting a root table and its sub-tables, we want to present a single table that contains a subset of fields from the underlying document structure.
Because our document structure contains an array of popular albums for each band, to present this data in a single table there will be a unique row for each element in the array.
This is our desired end state:
- Single table - bands
- band name
- band formation city
- album
Generate a DRDL File Mapping
To achieve this mapping we need to begin by creating a DRDL mapping file for the bands collection.
mongodrdl -d sql -c bands > bands.drdl

The bands.drdl file contains the mapping of a root table and two sub-tables.
schema:
- db: sql
 tables:
 - table: bands
 collection: bands
 pipeline: []
 columns:
 - Name: _id
 MongoType: bson.ObjectId
 SqlName: _id
 SqlType: objectid
 - Name: band
 MongoType: string
 SqlName: band
 SqlType: varchar
 - Name: formation.city
 MongoType: string
 SqlName: formation.city
 SqlType: varchar
 - Name: formation.year
 MongoType: date
 SqlName: formation.year
 SqlType: timestamp
 - table: bands_members
 collection: bands
 pipeline:
 - $unwind:
 includeArrayIndex: members_idx
 path: $members
 preserveNullAndEmptyArrays: false
 columns:
 - Name: _id
 MongoType: bson.ObjectId
 SqlName: _id
 SqlType: objectid
 - Name: members.dob
 MongoType: date
 SqlName: members.dob
 SqlType: timestamp
 - Name: members.name
 MongoType: string
 SqlName: members.name
 SqlType: varchar
 - Name: members.primary_instrument
 MongoType: string
 SqlName: members.primary_instrument
 SqlType: varchar
 - Name: members_idx
 MongoType: int
 SqlName: members_idx
 SqlType: int
 - table: bands_popular_albums
 collection: bands
 pipeline:
 - $unwind:
 includeArrayIndex: popular_albums_idx
 path: $popular_albums
 preserveNullAndEmptyArrays: false
 columns:
 - Name: _id
 MongoType: bson.ObjectId
 SqlName: _id
 SqlType: objectid
 - Name: popular_albums
 MongoType: string
 SqlName: popular_albums
 SqlType: varchar
 - Name: popular_albums_idx
 MongoType: int
 SqlName: popular_albums_idx
 SqlType: int

In order to simplify this mapping we need to first create a flat document shape using a simple aggregation pipeline and then map the resulting flat documents to a single-table schema.
Document Structure and Redacting Fields using Aggregation
In this case, we are only interested in 4 document fields. Namely, band, formation city, formation year, and popular albums. We also want to present these fields as columns in a single table. So, to create a completely flat document structure, we can use the following aggregation pipeline:
[{$unwind: {
 path: "$popular_albums" }},
 {$project: {
 band: "$band",
 formed: "$formation.year",
 city: "$formation.city",
 popular_album: "$popular_albums"
}}]

To apply this pipeline to a DRDL mapping, we simply add it to the pipeline section of the DRDL file:
schema:
- db: sql
 tables:
 - table: bands
 collection: bands
 pipeline: [{$unwind: {path: "$popular_albums"}},
 {$project: {
 band: "$band",
 formed: "$formation.year",
 city: "$formation.city",
 popular_album: "$popular_albums"
 }}]

Modifying Schema
Now that our documents are in the desired shape, we map the resulting fields to SQL columns in a single table. Also, we have adjusted column names for clarity/simplicity by editing the value of the SqlName.
schema:
- db: sql
 tables:
 - table: bands
 collection: bands
 pipeline: [{$unwind: {
 path: "$popular_albums"
}}, {$project: {
 band: "$band",
 formed: "$formation.year",
 city: "$formation.city",
 popular_album: "$popular_albums"
}}]
 columns:
 - Name: _id
 MongoType: bson.ObjectId
 SqlName: _id
 SqlType: objectid
 - Name: band
 MongoType: string
 SqlName: band
 SqlType: varchar
 - Name: city
 MongoType: string
 SqlName: city
 SqlType: timestamp
 - Name: popular_album
 MongoType: string
 SqlName: album
 SqlType: varchar
 - Name: formed
 MongoType: date
 SqlName: formed
 SqlType: timestamp

Using the DRDL file
Now, to use this DRDL file for real, until recently you would have had to name it on the command line like this
mongosqld --schema bands.drdl

Of course, you'd have to manage the versions of the drdl file by hand, ensure that you started mongosqld
with the correct schema basically house sit your definitions. There has to be an easier way and that's what we'll look at in part 2, BI Connector: Using Customizations as we introduce a new power feature in BI Connector.
Conclusion
MongoDB’s BI Connector is the best way to generate dashboards using SQL-based BI tools for MongoDB without having to perform any ETL operations. If you need more help, please check our docs. You can follow these steps to install BI Connector on-premises. If you have your MongoDB instance running on Atlas, you can connect via the BI Connector for Atlas.