Map Relational Schemas to MongoDB¶
On this page
Overview¶
Business intelligence tools connect to a data source and, given a fixed tabular schema, allow users to visually explore their data. MongoDB uses a flexible schema, so some business intelligence tools cannot use MongoDB as a native data source.
To use MongoDB with a relational business intelligence tool such as Tableau, you must define a relational mapping for your MongoDB data. MongoDB Connector for BI provides several methods for creating and managing a relational schema.
BI Connector’s proxy server, mongosqld
, has startup options
which determine how it handles schema management. These are covered in
detail in the mongosqld
usage documentation.
Cached Sampling¶
New in version 2.3:
When mongosqld
starts up, it can sample documents on
a MongoDB instance to generate a relational representation of the
schema, which it caches in memory. This is the default mode, which
mongosqld
uses unless you specify a schema file with
the --schema
option or a schema
database with the
--sampleSource
option.
Note
If you have authentication
enabled,
ensure that your MongoDB user has the required
permissions for cached sampling.
You can configure mongosqld
to automatically resample on a
fixed schedule with the --sampleRefreshIntervalSecs
option. If you do not specify a resampling interval,
mongosqld
uses its initial schema for as long as the
process runs.
If you need to manually edit your schema to ensure correct representation of your data, you can use a schema file instead.
See Sampling Mode Reference Chart for more information on sampling modes.
User Permissions for Cached Sampling¶
If your MongoDB instance uses authentication and you wish to use cached sampling, your
BI Connector instance must also use authentication. The admin user that
connects to MongoDB via the mongosqld
program must
have the following privileges:
listDatabases
on the clusterlistCollections
on each database in the clusterfind
on each database you wish to sample
Alternatively, create a user with the built-in readAnyDatabase role to grant all of the required privileges:
Note
Be aware of all privileges included with the readAnyDatabase role before granting it to a user.
To sample all namespaces, start mongosqld
without the
--sampleNamespaces
option.
See also
Persist a Schema in MongoDB¶
The --sampleSource
option directs mongosqld
to
use a particular database for schema storage. mongosqld
can either create a new database to use for schema storage or use
an existing database.
If you specify an existing database which has been used previously
for schema storage, the --sampleMode
option determines
whether mongosqld
writes new schema data to the specified
database or only reads from it. If you specify a database which
doesn’t currently exist, you must set --sampleMode
to
write
.
The --sampleRefreshIntervalSecs
option determines how frequently mongosqld
resamples data
to update the schema.
To force a one-time update of the schema, use the FLUSH SAMPLE command from your SQL client.
See Sampling Mode Reference Chart for more information on sampling modes.
User Permissions for Persisted Schemas¶
When using a persisted schema, the required user permissions are
similar to those for cached sampling, with
the addition of a required write
permission on the specified
schema database. To create a user with the necessary permissions to
run mongosqld
with a persisted schema, execute the
following command in the mongo
shell, with the
placeholder values <username>
, <password>
, and <dbname>
filled in with your own values:
Resample Schema Data with FLUSH SAMPLE
¶
When the mongosqld
process starts it creates a
schema, either from a
schema file or by sampling data from a
MongoDB instance. If the data in your MongoDB instance changes shape
significantly with new fields or collections, you may wish to
regenerate the schema BI Connector uses. You can regenerate
the schema either by restarting mongosqld
or by issuing
the FLUSH SAMPLE
command from within the
MySQL shell.
The --sampleRefreshIntervalSecs
option provides another way
for mongosqld
to regenerate the schema by resampling data
at a user-specified interval.
Note
You cannot use the FLUSH SAMPLE
command if mongosqld
starts with a schema file
via the --schema
option.
Managing Schema with MongoDB Views¶
Another way to control the data visible to BI Connector without using a .drdl file is to read from a view. Views allow you to take a source collection and choose which fields to display, change the names of fields, sort data, and perform any kind of transformation that the aggregation pipeline can accomplish.
You can use the mongo
shell method createView
to define a view on a collection and then use that view as a
specified namespace when starting mongosqld
.
Examples¶
Consider a collection named salaries
in the
employees
database. By creating a view on the salaries
collection that excludes all personally identifiable information,
BI Connector can safely facilitate data analysis on the view without
compromising the private data in the source collection.
The following command starts mongosqld
with data from a view named salaries_redacted
in the
employees
database:
See also
Another use case for managing your schema with a MongoDB view is if your working data set includes a sparsely populated but important field which may be excluded in the BI Connector sampling process.
Consider a collection named sparseCollection
which contains one
field which occurs in every document and another field which occurs
only occasionally. A portion of the collection might look like this:
To ensure that sparseField
is included when BI Connector creates
its schema, create a view in which sparseField
is populated with
zeroes in every document in which it does not occur.
To use the view as a data source when starting
mongosqld
, specify it with the
--sampleNamespaces
option.
Note
BI Connector does not recognize geospatial fields when reading from a view.
Load a Schema from a DRDL File¶
The --schema
option allows you
to specify a schema file for mongosqld
to use.
The schema generation tool mongodrdl
outputs a schema file
in Document Relational Definition Language (DRDL)
format, but you should verify that the schema it proposes matches your
data requirements. Examine both the table structure and field types
to ensure that you can generate your intended reports.
It is possible to manually edit the schema definition files to perform the following actions:
- Add aggregation pipeline stages
- Add fields that
mongodrdl
did not discover within the subset of documents that it sampled - Remove fields
- Remove tables
- Rename tables, so long as the
collection
field remains the same - Rename fields using the
sqlname
option
Important
To apply any changes you make to your DRDL files, you must restart
mongosqld
.
Geospatial Data¶
If a collection contains a 2d
or 2dsphere
geospatial index,
BI Connector maps the indexed field into an array of numeric
longitude-latitude coordinates.