Document Relational Definition Language
Document Relational Definition Language (DRDL
) defines a
relational view of a MongoDB schema.
mongodrdl
samples documents from your MongoDB
collections and derives a DRDL
file from those documents.
mongosqld
then
uses the schema defined in DRDL
files to allow MySQL clients to
query your MongoDB data.
File Format
The DRDL
file lists your databases, tables, and columns in
YAML format.
schema: - db: <database name> tables: - table: <SQL table name> collection: <MongoDB collection name> pipeline: - <optional pipeline elements> columns: - Name: <MongoDB field name> MongoType: <MongoDB field type> SqlName: <mapped SQL column name> SqlType: <mapped SQL column type>
Example
Given documents of the following shape in the collection abc
in the
database test
:
{ "_id": ObjectId(), "close": 7.45, "detail": { "a": 2, "b": 3 } }
Run mongodrdl
to generate a schema based on this collection:
mongodrdl -d test -c abc -o schema.drdl
The generated schema file (schema.drdl
) looks similar to the following:
schema: - db: test tables: - table: abc collection: abc pipeline: [] columns: - Name: _id MongoType: bson.ObjectId SqlName: _id SqlType: varchar - Name: close MongoType: float64 SqlName: close SqlType: numeric - Name: detail.a MongoType: float64 SqlName: detail.a SqlType: numeric - Name: detail.b MongoType: float64 SqlName: detail.b SqlType: numeric
Field Types
BI Connector maps fields that always contain the same data type into the relational model. Schema generation deals specially with the following cases:
Numeric | BI Connector uses the most precise numeric type that matches the sampled documents. If a field in a collection always has the same data type, BI Connector uses that type. If a field in a collection can contain either floating point
values or integers, BI Connector uses type |
Dates | |
Timestamps | BI Connector ignores any field of type data_timestamp . |
ObjectID | BI Connector treats any field of type data_oid as
the SQL type varchar . |
UUID | BI Connector treats any field of type UUID as the SQL type
varchar . |
Geospatial | If a collection contains a geospatial index, BI Connector maps the indexed field into an array of numeric longitude-latitude coordinates. See Geospatial Data for an example. BI Connector does not recognize geospatial fields when reading from a view. |
Heterogeneous Fields | If a field contains inconsistent types, BI Connector chooses the most frequently sampled type. If a field can contain either a type or an array of that type, the generated schema always specifies that the field contains an array. To learn more, see How do I skip data incompatible with my DRDL type definition?. |
Embedded Documents
BI Connector maps embedded documents to simple fields
that have a .
separator character, making them appear similar
to the way you would reference them using dot notation in a MongoDB query.
While Tableau properly quotes
identifiers, within ad-hoc SQL expressions you must double-quote
every identifier that contains .
characters or mixed-case
characters.
Example
Consider the following document:
{ "_id": 1, "familyName": "Partridge", "hometown" : "Hollywood Hills", "address" : { "street": "123 Main Street", "city" : "Hollywood", "state" : "CA", "zip" : "90210" }, "members_since" : ISODate("2002-04-12T00:00:00Z") }
Running mongodrdl
on a collection containing this document
results in the following fields in the generated schema:
_id | numeric |
familyName | varchar |
hometown | varchar |
address.street | varchar |
address.city | varchar |
address.state | varchar |
address.zip | varchar |
members_since | timestamp |
Arrays
BI Connector exposes arrays to business intelligence tools using two collections: one without the array, and the other having one document per array element.
Example
If you run mongodrdl
on a collection named families
which contains the following document:
{ "_id": 1, "familyName": "Partridge", "hometown" : "Hollywood Hills", "familyMembers" : [ { "firstname" : "Shirley", "age" : 42, "attributes" : [ { "name" : "instrument", "value" : "singer" }, { "name" : "role", "value" : "mom" } ] }, { "firstname" : "Keith", "age" : 18, "attributes" : [ { "name" : "instrument", "value" : "guitar" }, { "name" : "role", "value" : "son" } ] }, { "firstname" : "Laurie", "age" : 16, "attributes" : [ { "name" : "instrument", "value" : "keyboard" }, { "name" : "role", "value" : "sister" } ] }] }
This results in the following three tables:
families
_id
numeric
familyName
varchar
hometown
varchar
families_familyMembers
_id
numeric
familyMembers.age
numeric
familyMembers.firstname
varchar
familyMembers_idx
numeric
families_familyMembers_attributes
_id
numeric
familyMembers.attributes.name
varchar
familyMembers.attributes.value
varchar
familyMembers.attributes_idx
numeric
familyMembers_idx
numeric
You can join these tables together to view the data in a denormalized format. For example, you can list the people named in the above schema together with their family information using the following query:
SELECT f.*, m.`familyMembers.firstname` FROM families_familyMembers m JOIN families f ON m._id = f._id;
Pre-Joining
If you provide the --preJoined
option to mongodrdl
,
BI Connector adds the fields in the containing document to each
array element's document, thus "pre-joining" the table.
In the previous example, the tables would contain the following additional columns:
families_familyMembers
familyName
varchar
hometown
varchar
families_familyMembers_attributes
familyMembers.age
numeric
familyMembers.firstname
varchar
familyMembers_idx
numeric
familyName
varchar
hometown
varchar
Custom Filters
You can add a column of type mongo.Filter
to a collection in your
DRDL
file. This column type allows you to perform a custom
$match query.
For example, given the following schema describing a cloud of points with up to three components:
schema: - db: test tables: - table: points collection: points pipeline: [] columns: - Name: _id MongoType: bson.ObjectId SqlName: _id SqlType: varchar - Name: x MongoType: float64 SqlName: x SqlType: numeric - Name: "y" MongoType: float64 SqlName: "y" SqlType: numeric - Name: z MongoType: float64 SqlName: z SqlType: numeric - Name: filter MongoType: mongo.Filter SqlName: filter SqlType: varchar
You can select only three-dimensional points using the following query:
SELECT x, y, z FROM points WHERE filter='{"z": {"$exists": true}}';
Aggregation Pipelines
Aggregation Pipelines using Views
MongoDB 3.4 introduces Read-Only Views that you can use to filter incompatible data.
For example, you can create a view in the test
database that contains
only documents containing a number in the grade
field of a grades
collection:
db.runCommand( { create: "numericGrades", viewOn: "grades", pipeline: [ { "$match": { "grade": { "$type": "number" } } } ] } )
You can then use mongodrdl
to generate a schema from this view
as you would a collection:
mongodrdl -d test -c numericGrades
Aggregation Pipelines in DRDL
BI Connector can use aggregation pipelines as part of the schema to transform documents from the collection into the proper form for the relational tables.
For example, consider a simple document in a collection named
simpleFamilies
:
{ "_id": 1, "familyName": "Partridge", "familyMembers" : [ "Shirley", "Keith", "Laurie"] }
mongodrdl
generates a schema with the tables
simpleFamilies
and simpleFamilies_familyMembers
.
The table simpleFamilies_familyMembers
enumerates each family
member and has the following pipeline:
pipeline: - $unwind: includeArrayIndex: familyMembers_idx path: $familyMembers
This pipeline uses $unwind
to create a new record for each
member of familyMembers
. The schema tracks the array index in
the field familyMembers_idx
.
Geospatial Data
If a collection contains a 2d
or 2dsphere
geospatial index,
BI Connector maps the indexed field into an array of numeric
longitude-latitude coordinates.
Example
Given the following collection:
db.points.createIndex( { pos : "2dsphere" } ) db.points.insertOne({ pos : { type: "Point", coordinates: [ -73.97, 40.77 ] }, name: "Central Park", category : "Parks" })
BI Connector generates the following schema:
schema: - db: test tables: - table: points collection: points pipeline: [] columns: - Name: _id MongoType: bson.ObjectId SqlName: _id SqlType: varchar - Name: category MongoType: string SqlName: category SqlType: varchar - Name: name MongoType: string SqlName: name SqlType: varchar - Name: pos.coordinates MongoType: geo.2darray SqlName: pos.coordinates SqlType: numeric[]
BI Connector does not recognize geospatial fields when reading from a view.