BI Connector: Mapping Document Structures to Relational Schema - Part 1

Building visual dashboards from a MongoDB flexible data store using standard SQL BI tools can be a challenge. Is complex ETL and manual intervention the only way to build such dashboards? Absolutely not! The MongoDB Connector for Business Intelligence (BI) allows users to create queries with SQL and visualize, graph, and report on their MongoDB data using existing relational business intelligence tools.

At its core, the MongoDB BI Connector performs a transformation of data from MongoDB's flexible documents to SQL's rigid schema of tables, rows and columns. That transformation is what allows users of the BI Connector to leverage the various SQL-focused BI tools. Knowing how BI Connector performs this and how you can take control of it is key to extracting the maximum value from it.

In this article, we'll look at how the BI connector automatically creates its schema mapping and how to control it to create the meaningful tables that will help you extract better insights from your MongoDB data.

The BI Connector has long supported the sharing of schema files (DRDL files - Document Relational Definition Language defines a relational view of a MongoDB schema). It has, though, previously relied on manually sharing the files between various users and manually managing the different versions needed. The latest enhancement to the BI Connector removes that manual management issue by storing the DRDL files where they can best be shared with different personas: in the MongoDB database itself.

In order to query MongoDB using SQL via the BI Connector, it is first necessary to map collection document structures to a relational schema which can be examined and queried using SQL client applications.

Schema Mapped from Sampled Documents

By default, the BI Connector handles this mapping automatically in what we call “standalone mode.” In standalone mode, the BI Connector randomly samples collection documents and maps a relational schema based on the underlying document structure. This mapping logic accounts for rich document structures and creates a relational schema that includes root tables for top-level document data, and sub-tables that represent embedded objects and arrays.

In many cases, this default sampling logic is sufficient to provide SQL client applications direct access to MongoDB. However, when working with complex document structures, it may be desirable to define customized mappings to meet the specific needs of connecting client applications.

Customizing Schema

Consider the following document:

{
  "_id": ObjectId("5bfabde76f280102ddf27969"),
  "band": "Slayer",
  "formation": {
    "year": ISODate("1982-01-01T00:00:00Z"),
    "city": "Los Angeles"
  },
  "popular_albums": [
    "Show No Mercy!",
    "Seasons in the Abyss",
    "Haunting the Chapel",
    "Divine Intervention"
  ],
  "members": [
    {
      "name": "Tom Araya",
      "dob": ISODate("1961-06-06T07:00:00Z"),
      "primary_instrument": "Bass/Vocals"
    },
    {
      "name": "Kerry King",
      "dob": ISODate("1964-06-03T07:00:00Z"),
      "primary_instrument": "Guitar"
    },
    {
      "name": "Jeff Hanneman",
      "dob": ISODate("1964-01-31T08:00:00Z"),
      "primary_instrument": "Guitar"
    },
    {
      "name": "Dave Lombardo",
      "dob": ISODate("1965-02-16T08:00:00Z"),
      "primary_instrument": "Drums"
    }
  ]
}

Standalone Sampling Schema Logic

In standalone mode where document-to-relational mapping is done via sampling, the BI Connector maps the following table and sub-tables:

mysql> show tables;
+-----------------------------------------+
| Tables_in_sql                           |
+-----------------------------------------+
| bands                                   |
| bands_members                           |
| bands_popular_albums                    |

Table details:

bands:

mysql> describe bands;
+----------------+----------------+------+------+---------+-------+
| Field          | Type           | Null | Key  | Default | Extra |
+----------------+----------------+------+------+---------+-------+
| _id            | varchar(24)    | YES  | PRI  | NULL    |       |
| band           | varchar(65535) | YES  |      | NULL    |       |
| formation.city | varchar(65535) | YES  |      | NULL    |       |
| formation.year | datetime       | YES  |      | NULL    |       |
+----------------+----------------+------+------+---------+-------+
4 rows in set (0.08 sec)

NOTE: Given that the “formation” field contains a single object in a 1:1 relationship with top-level document fields, the embedded object’s fields are mapped to the root table.

bands_members:

mysql> describe bands_members;
+----------------------------+----------------+------+------+---------+-------+
| Field                      | Type           | Null | Key  | Default | Extra |
+----------------------------+----------------+------+------+---------+-------+
| _id                        | varchar(24)    | YES  | PRI  | NULL    |       |
| members.dob                | datetime       | YES  |      | NULL    |       |
| members.name               | varchar(65535) | YES  |      | NULL    |       |
| members.primary_instrument | varchar(65535) | YES  |      | NULL    |       |
| members_idx                | bigint(20)     | YES  | PRI  | NULL    |       |
+----------------------------+----------------+------+------+---------+-------+
5 rows in set (0.05 sec)

bands_popular_albums:

mysql> describe bands_popular_albums;
+--------------------+----------------+------+------+---------+-------+
| Field              | Type           | Null | Key  | Default | Extra |
+--------------------+----------------+------+------+---------+-------+
| _id                | varchar(24)    | YES  | PRI  | NULL    |       |
| popular_albums     | varchar(65535) | YES  |      | NULL    |       |
| popular_albums_idx | bigint(20)     | YES  | PRI  | NULL    |       |
+--------------------+----------------+------+------+---------+-------+
3 rows in set (0.06 sec)

We can join these tables using the _id field. For example:

mysql> SELECT * FROM bands
    -> JOIN bands_popular_albums ON bands._id = bands_popular_albums._id;

Customize Document-to-Relational Mappings

But what if our SQL clients do not require access to all fields within the underlying document? Or similarly, what if we want to provide clients with a single table view, rather than a root table with its two sub-tables?

We can customize our document-to-relational schema mapping by using the BI Connector in “custom mode.” When using custom mode, the BI Connector does no sampling of underlying documents to create mappings. Rather, the BI Connector relies on explicit mappings provided by the BI Connector administrator.

Document to Relational Definition Language (DRDL)

These custom mappings are managed through the Document to Relational Definition Language (DRDL).

As an example, let’s say that we want to present a simpler, single-table relational schema for our client applications to work with. Rather than presenting a root table and its sub-tables, we want to present a single table that contains a subset of fields from the underlying document structure.

Because our document structure contains an array of popular albums for each band, to present this data in a single table there will be a unique row for each element in the array.

This is our desired end state:

  • Single table - bands
    • band name
    • band formation city
    • album

Generate a DRDL File Mapping

To achieve this mapping we need to begin by creating a DRDL mapping file for the bands collection.

mongodrdl -d sql -c bands > bands.drdl

The bands.drdl file contains the mapping of a root table and two sub-tables.

schema:
- db: sql
  tables:
  - table: bands
    collection: bands
    pipeline: []
    columns:
    - Name: _id
      MongoType: bson.ObjectId
      SqlName: _id
      SqlType: objectid
    - Name: band
      MongoType: string
      SqlName: band
      SqlType: varchar
    - Name: formation.city
      MongoType: string
      SqlName: formation.city
      SqlType: varchar
    - Name: formation.year
      MongoType: date
      SqlName: formation.year
      SqlType: timestamp
  - table: bands_members
    collection: bands
    pipeline:
    - $unwind:
        includeArrayIndex: members_idx
        path: $members
        preserveNullAndEmptyArrays: false
    columns:
    - Name: _id
      MongoType: bson.ObjectId
      SqlName: _id
      SqlType: objectid
    - Name: members.dob
      MongoType: date
      SqlName: members.dob
      SqlType: timestamp
    - Name: members.name
      MongoType: string
      SqlName: members.name
      SqlType: varchar
    - Name: members.primary_instrument
      MongoType: string
      SqlName: members.primary_instrument
      SqlType: varchar
    - Name: members_idx
      MongoType: int
      SqlName: members_idx
      SqlType: int
  - table: bands_popular_albums
    collection: bands
    pipeline:
    - $unwind:
        includeArrayIndex: popular_albums_idx
        path: $popular_albums
        preserveNullAndEmptyArrays: false
    columns:
    - Name: _id
      MongoType: bson.ObjectId
      SqlName: _id
      SqlType: objectid
    - Name: popular_albums
      MongoType: string
      SqlName: popular_albums
      SqlType: varchar
    - Name: popular_albums_idx
      MongoType: int
      SqlName: popular_albums_idx
      SqlType: int

In order to simplify this mapping we need to first create a flat document shape using a simple aggregation pipeline and then map the resulting flat documents to a single-table schema.

Document Structure and Redacting Fields using Aggregation

In this case, we are only interested in 4 document fields. Namely, band, formation city, formation year, and popular albums. We also want to present these fields as columns in a single table. So, to create a completely flat document structure, we can use the following aggregation pipeline:

[{$unwind: {
  path: "$popular_albums" }},
   {$project: {
     band: "$band",
     formed: "$formation.year",
     city: "$formation.city",
     popular_album: "$popular_albums"
}}]

To apply this pipeline to a DRDL mapping, we simply add it to the pipeline section of the DRDL file:

schema:
- db: sql
  tables:
  - table: bands
    collection: bands
    pipeline: [{$unwind: {path: "$popular_albums"}},
               {$project: {
                  band: "$band",
                  formed: "$formation.year",
                  city: "$formation.city",
                  popular_album: "$popular_albums"
                }}]

Modifying Schema

Now that our documents are in the desired shape, we map the resulting fields to SQL columns in a single table. Also, we have adjusted column names for clarity/simplicity by editing the value of the SqlName.

schema:
- db: sql
  tables:
  - table: bands
    collection: bands
    pipeline: [{$unwind: {
  path: "$popular_albums"
}}, {$project: {
  band: "$band",
  formed: "$formation.year",
  city: "$formation.city",
  popular_album: "$popular_albums"
}}]
    columns:
    - Name: _id
      MongoType: bson.ObjectId
      SqlName: _id
      SqlType: objectid
    - Name: band
      MongoType: string
      SqlName: band
      SqlType: varchar
    - Name: city
      MongoType: string
      SqlName: city
      SqlType: timestamp
    - Name: popular_album
      MongoType: string
      SqlName: album
      SqlType: varchar
    - Name: formed
      MongoType: date
      SqlName: formed
      SqlType: timestamp

Using the DRDL file

Now, to use this DRDL file for real, until recently you would have had to name it on the command line like this

mongosqld --schema bands.drdl

Of course, you'd have to manage the versions of the drdl file by hand, ensure that you started mongosqld with the correct schema basically house sit your definitions. There has to be an easier way and that's what we'll look at in part 2, BI Connector: Using Customizations as we introduce a new power feature in BI Connector.

Conclusion

MongoDB’s BI Connector is the best way to generate dashboards using SQL-based BI tools for MongoDB without having to perform any ETL operations. If you need more help, please check our docs. You can follow these steps to install BI Connector on-premises. If you have your MongoDB instance running on Atlas, you can connect via the BI Connector for Atlas.