GIANT Stories at MongoDB

MongoDB 3.6: Here to SRV you with easier replica set connections

If you have logged into MongoDB Atlas recently – and you should, the entry-level tier is free! – you may have noticed a strange new syntax on 3.6 connection strings.
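
The new syntax looks something like the following (a hypothetical example for the Atlas cluster used later in this post, with credentials elided):

mongodb+srv://user:password@freeclusterjd-ffp4c.mongodb.net/test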

MongoDB Seed Lists

What is this mongodb+srv syntax?

Well, in MongoDB 3.6 we introduced the concept of a seed list that is specified using DNS records, specifically SRV and TXT records. You will recall from using replica sets with MongoDB that the client must specify at least one replica set member (and may specify several of them) when connecting. This allows a client to connect to a replica set even if one of the nodes that the client specifies is unavailable.

You can see an example of this URL on a 3.4 cluster connection string:
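
It looks roughly like this (a hedged reconstruction based on the hosts and options shown later in this post, with credentials elided):

mongodb://user:password@freeclusterjd-shard-00-00-ffp4c.mongodb.net:27017,freeclusterjd-shard-00-01-ffp4c.mongodb.net:27017,freeclusterjd-shard-00-02-ffp4c.mongodb.net:27017/test?ssl=true&replicaSet=FreeClusterJD-shard-0&authSource=admin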

Note that without the SRV record configuration we must list several nodes (in the case of Atlas we always include all the cluster members, though this is not required). We also have to specify the ssl and replicaSet options.

With a 3.4 or earlier driver, we have to specify all of these options in the connection string using the MongoDB URI syntax.

The use of SRV records eliminates the requirement for every client to pass in a complete set of state information for the cluster. Instead, a single SRV record identifies all the nodes associated with the cluster (and their port numbers) and an associated TXT record defines the options for the URI.

Reading SRV and TXT Records

We can see how this works in practice on a MongoDB Atlas cluster with a simple Python script.

import srvlookup  # pip install srvlookup
import sys
import dns.resolver  # pip install dnspython

host = None

if len(sys.argv) > 1:
    host = sys.argv[1]

if host:
    services = srvlookup.lookup("mongodb", domain=host)
    for i in services:
        print("%s:%i" % (i.hostname, i.port))
    for txtrecord in dns.resolver.query(host, 'TXT'):
        print("%s: %s" % (host, txtrecord))
else:
    print("No host specified")

We can run this script, passing the hostname from the 3.6 connection string as a parameter.

$ python mongodb_srv_records.py freeclusterjd-ffp4c.mongodb.net
freeclusterjd-shard-00-00-ffp4c.mongodb.net:27017
freeclusterjd-shard-00-01-ffp4c.mongodb.net:27017
freeclusterjd-shard-00-02-ffp4c.mongodb.net:27017
freeclusterjd-ffp4c.mongodb.net: "authSource=admin&replicaSet=FreeClusterJD-shard-0"
$

You can also do this lookup with nslookup:

JD10Gen-old:~ jdrumgoole$ nslookup
> set type=SRV
> _mongodb._tcp.rs.joedrumgoole.com
Server:        10.65.141.1
Address:    10.65.141.1#53

Non-authoritative answer:
_mongodb._tcp.rs.joedrumgoole.com    service = 0 0 27022 rs1.joedrumgoole.com.
_mongodb._tcp.rs.joedrumgoole.com    service = 0 0 27022 rs2.joedrumgoole.com.
_mongodb._tcp.rs.joedrumgoole.com    service = 0 0 27022 rs3.joedrumgoole.com.

Authoritative answers can be found from:
> set type=TXT
> rs.joedrumgoole.com
Server:        10.65.141.1
Address:    10.65.141.1#53

Non-authoritative answer:
rs.joedrumgoole.com    text = "authSource=admin&replicaSet=srvdemo"

You can see how this could be used to construct a 3.4 style connection string by comparing it with the 3.4 connection string above.

As you can see, the complexity of the cluster and its configuration parameters are stored in the DNS server and hidden from the end user. If a node's IP address or name changes or we want to change the replica set name, this can all now be done completely transparently from the client’s perspective. We can also add and remove nodes from a cluster without impacting clients.

So now, whenever you see mongodb+srv, you know to expect an SRV and a TXT record to deliver the client connection string.

Creating SRV and TXT records

Of course, SRV and TXT records are not just for Atlas. You can also create your own SRV and TXT records for your self-hosted MongoDB clusters. All you need for this is edit access to your DNS server so you can add SRV and TXT records. In the examples that follow we are using the AWS Route 53 DNS service.

I have set up a three-node demo replica set on AWS. The nodes are:

rs1.joedrumgoole.com
rs2.joedrumgoole.com
rs3.joedrumgoole.com

Each has a mongod process running on port 27022. I have set up a security group that allows access to my local laptop and the nodes themselves so they can see each other.

I also set up the DNS names for the above nodes in AWS Route 53.

We can start the mongod processes by running the following command on each node.

$ sudo /usr/local/m/versions/3.6.3/bin/mongod --auth --port 27022 --replSet srvdemo --bind_ip 0.0.0.0 --keyFile mdb_keyfile

Now we need to set up the SRV and TXT records for this cluster.

The SRV record points to the server or servers that will comprise the members of the replica set. The TXT record defines the options for the replica set, specifically the database that will be used for authorization and the name of the replica set. It is important to note that the mongodb+srv format URI implicitly adds “ssl=true”. In our case, SSL is not used for the demo, so we have to append “&ssl=false” to the client connection string. Note that the SRV record is specifically designed to look up the mongodb service referenced at the start of the URL.

The settings in AWS Route 53 are:

Which leads to the following entry in the zone file for Route 53.
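
Based on the SRV lookup output shown earlier, the zone file entries look roughly like this (the TTL value is illustrative):

_mongodb._tcp.rs.joedrumgoole.com. 300 IN SRV 0 0 27022 rs1.joedrumgoole.com.
_mongodb._tcp.rs.joedrumgoole.com. 300 IN SRV 0 0 27022 rs2.joedrumgoole.com.
_mongodb._tcp.rs.joedrumgoole.com. 300 IN SRV 0 0 27022 rs3.joedrumgoole.com.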

Now we can add the TXT record. By convention, we use the same name as the SRV record (rs.joedrumgoole.com) so that MongoDB knows where to find the TXT record.

We can do this on AWS Route 53 as follows:

This will create the following TXT record.
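
Based on the TXT lookup shown earlier, the zone file entry is roughly (again, the TTL is illustrative):

rs.joedrumgoole.com. 300 IN TXT "authSource=admin&replicaSet=srvdemo"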

Now we can access this service as:

mongodb+srv://rs.joedrumgoole.com/test

The driver resolves these records into a complete connection string, which is then used to contact the service.
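
Conceptually, the resolved connection string is roughly the equivalent of the following 3.4-style URI (with ssl=false appended as described above):

mongodb://rs1.joedrumgoole.com:27022,rs2.joedrumgoole.com:27022,rs3.joedrumgoole.com:27022/test?authSource=admin&replicaSet=srvdemo&ssl=false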

The whole process is outlined below:

Once your records are set up, you can easily change port numbers without impacting clients and also add and remove cluster members.

SRV records are another way in which MongoDB is making life easier for database developers everywhere.

You should also check out the full documentation on SRV and TXT records in MongoDB 3.6.

---

You can sign up for a free MongoDB Atlas tier which is suitable for single user use.

Find out how to use your favorite programming language with MongoDB via our MongoDB drivers.

Please visit MongoDB University for free online training in all aspects of MongoDB.

Follow Joe Drumgoole on Twitter for more news about MongoDB.


Meet the team that builds MongoDB in-person at MongoDB World.

An Introduction to Change Streams

There is tremendous pressure for applications to immediately react to changes as they occur. As a new feature in MongoDB 3.6, change streams enable applications to stream real-time data changes by leveraging MongoDB’s underlying replication capabilities. Think powering trading applications that need to be updated in real time as stock prices change. Or creating an IoT data pipeline that generates alarms whenever a connected vehicle moves outside of a geo-fenced area. Or updating dashboards, analytics systems, and search engines as operational data changes. The list, and the possibilities, go on, as change streams give MongoDB users easy access to real-time data changes without the complexity or risk of tailing the oplog (operation log). Any application can readily subscribe to changes and immediately react by making decisions that help the business to respond to events in real time.

Change streams can notify your application of all writes to documents (including deletes) and provide access to all available information as changes occur, without polling that can introduce delays, incur higher overhead (due to the database being regularly checked even if nothing has changed), and lead to missed opportunities.

Characteristics of change streams

  1. Targeted changes
    Changes can be filtered to provide relevant and targeted changes to listening applications. As an example, filters can be on operation type or fields within the document.
  2. Resumability
    Resumability was top of mind when building change streams to ensure that applications can see every change in a collection. Each change stream response includes a resume token. In cases where the connection between the application and the database is temporarily lost, the application can send the last resume token it received and change streams will pick up right where the application left off. In cases of transient network errors or elections, the driver will automatically make an attempt to reestablish a connection using its cached copy of the most recent resume token. However, to resume after application failure, the application needs to persist the resume token, as drivers do not maintain state over application restarts.
  3. Total ordering
    MongoDB 3.6 has a global logical clock that enables the server to order all changes across a sharded cluster. Applications will always receive changes in the order they were applied to the database.
  4. Durability
    Change streams only include majority-committed changes. This means that every change seen by listening applications is durable in failure scenarios such as a new primary being elected.
  5. Security
    Change streams are secure – users are only able to create change streams on collections to which they have been granted read access.
  6. Ease of use
    Change streams are familiar – the API syntax takes advantage of the established MongoDB drivers and query language, and are independent of the underlying oplog format.
  7. Idempotence
    All changes are transformed into a format that’s safe to apply multiple times. Listening applications can use a resume token from any prior change stream event, not just the most recent one, because reapplying operations is safe and will reach the same consistent state.

An example

Let’s imagine that we run a small grocery store. We want to build an application that notifies us every time we run out of stock for an item. We want to listen for changes on our stock collection and reorder once the quantity of an item gets too low.

{
    _id: "123UAWERXHZK4GYH",
    product: "pineapple",
    quantity: 3
}

Setting up the cluster

As a distributed database, replication is a core feature of MongoDB, mirroring changes from the primary replica set member to secondary members, enabling applications to maintain availability in the event of failures or scheduled maintenance. Replication relies on the oplog (operation log). The oplog is a capped collection that records all of the most recent writes; it is used by secondary members to apply changes to their own local copy of the database. In MongoDB 3.6, change streams enable listening applications to easily leverage the same internal, efficient replication infrastructure for real-time processing.

To use change streams, we must first create a replica set. Download MongoDB 3.6 and after installing it, run the following commands to set up a simple, single-node replica set (for testing purposes).

mkdir -pv data/db 
mongod --dbpath ./data/db --replSet "rs"

Then in a separate shell tab, run: mongo

Once the shell connects, run: rs.initiate(). After a few seconds the prompt should change to rs:PRIMARY>, indicating that the single-node replica set is ready.

If you have any issues, check out our documentation on creating a replica set.

Seeing it in action

Now that our replica set is ready, let’s create a few products in a demo database using the following Mongo shell script:


conn = new Mongo("mongodb://localhost:27017/demo?replicaSet=rs");
db = conn.getDB("demo");
collection = db.stock;

var docToInsert = {
  name: "pineapple",
  quantity: 10
};

function sleepFor(sleepDuration) {
  var now = new Date().getTime();
  while (new Date().getTime() < now + sleepDuration) {
    /* do nothing */
  }
}

function create() {
  sleepFor(1000);
  print("inserting doc...");
  docToInsert.quantity = 10 + Math.floor(Math.random() * 10);
  res = collection.insert(docToInsert);
  print(res)
}

while (true) {
  create();
}

Copy the code above into a createProducts.js text file and run it in a Terminal window with the following command: mongo createProducts.js.

Creating a change stream application

Now that we have documents being constantly added to our MongoDB database, we can create a change stream that monitors and handles changes occurring in our stock collection:

conn = new Mongo("mongodb://localhost:27017/demo?replicaSet=rs");
db = conn.getDB("demo");
collection = db.stock;

const changeStreamCursor = collection.watch();

pollStream(changeStreamCursor);

//this function polls a change stream and prints out each change as it comes in
function pollStream(cursor) {
  while (!cursor.isExhausted()) {
    if (cursor.hasNext()) {
      change = cursor.next();
      print(JSON.stringify(change));
    }
  }
  pollStream(cursor);
}

By using the parameterless watch() method, this change stream will signal every write to the stock collection. In the simple example above, we’re logging the change stream's data to the console. In a real-life scenario, your listening application would do something more useful (such as replicating the data into a downstream system, sending an email notification, reordering stock...). Try inserting a document through the mongo shell, as shown below, and watch the change event appear in the window running the change stream script.
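
For example, inserting a document like this one (the product name and quantity are arbitrary) from another mongo shell session should immediately produce a corresponding change event:

use demo
db.stock.insert({ name: "mango", quantity: 5 })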

Creating a targeted change stream

Remember that our original goal wasn’t to get notified of every single update in the stock collection, just when the inventory of each item in the stock collection falls below a certain threshold. To achieve this, we can create a more targeted change stream for updates that set the quantity of an item to a value no higher than 10. By default, update notifications in change streams only include the modified and deleted fields (i.e. the document “deltas”), but we can use the optional parameter fullDocument: "updateLookup" to include the complete document within the change stream, not just the deltas.

const changeStream = collection.watch(
  [{
    $match: {
      $and: [
        { "updateDescription.updatedFields.quantity": { $lte: 10 } },
        { operationType: "update" }
      ]
    }
  }],
  {
    fullDocument: "updateLookup"
  }
);

Note that the fullDocument property above reflects the state of the document at the time the lookup was performed, not the state of the document at the exact time the update was applied. This means that other changes may also be reflected in the fullDocument field. Since this use case only deals with updates, it was preferable to build match filters using updateDescription.updatedFields, instead of fullDocument.

The full Mongo shell script of our filtered change stream is available below:

conn = new Mongo("mongodb://localhost:27017/demo?replicaSet=rs");
db = conn.getDB("demo");
collection = db.stock;

let updateOps = {
  $match: {
    $and: [
      { "updateDescription.updatedFields.quantity": { $lte: 10 } },
      { operationType: "update" }
    ]
  }
};

const changeStreamCursor = collection.watch([updateOps]);

pollStream(changeStreamCursor);

//this function polls a change stream and prints out each change as it comes in
function pollStream(cursor) {
  while (!cursor.isExhausted()) {
    if (cursor.hasNext()) {
      change = cursor.next();
      print(JSON.stringify(change));
    }
  }
  pollStream(cursor);
}

In order to test our change stream above, let’s run the following script to set the quantity of all our current products to values less than 10:

conn = new Mongo("mongodb://localhost:27017/demo?replicaSet=rs");
db = conn.getDB("demo");
collection = db.stock;
let updatedQuantity = 1;

function sleepFor(sleepDuration) {
  var now = new Date().getTime();
  while (new Date().getTime() < now + sleepDuration) {
    /* do nothing */
  }
}

function update() {
  sleepFor(1000);
  res = collection.update({quantity:{$gt:10}}, {$inc: {quantity: -Math.floor(Math.random() * 10)}}, {multi: true});
  print(res)
  updatedQuantity = res.nMatched + res.nModified;
}

while (updatedQuantity > 0) {
  update();
}

You should now see the change stream window display the update shortly after the script above updates our products in the stock collection.

Resuming a change stream

In most cases, drivers have retry logic to handle loss of connections to the MongoDB cluster (such as timeouts, transient network errors, or elections). In cases where our application fails and wants to resume, we can use the optional parameter resumeAfter : <resumeToken>, as shown below:

conn = new Mongo("mongodb://localhost:27017/demo?replicaSet=rs");
db = conn.getDB("demo");
collection = db.stock;

const changeStreamCursor = collection.watch();
resumeStream(changeStreamCursor, true);

// sleepFor helper, reused from the earlier scripts
function sleepFor(sleepDuration) {
  var now = new Date().getTime();
  while (new Date().getTime() < now + sleepDuration) {
    /* do nothing */
  }
}

function resumeStream(changeStreamCursor, forceResume = false) {
  let resumeToken;
  while (!changeStreamCursor.isExhausted()) {
    if (changeStreamCursor.hasNext()) {
      change = changeStreamCursor.next();
      print(JSON.stringify(change));
      resumeToken = change._id;
      if (forceResume === true) {
        print("\r\nSimulating app failure for 10 seconds...");
        sleepFor(10000);
        changeStreamCursor.close();
        const newChangeStreamCursor = collection.watch([], {
          resumeAfter: resumeToken
        });
        print("\r\nResuming change stream with token " + JSON.stringify(resumeToken) + "\r\n");
        resumeStream(newChangeStreamCursor);
      }
    }
  }
  resumeStream(changeStreamCursor, forceResume);
}

With this resumability feature, MongoDB change streams provide at-least-once semantics. It is therefore up to the listening application to make sure that it has not already processed the change stream events. This is especially important in cases where the application’s actions are not idempotent (for instance, if each event triggers a wire transfer).
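
A minimal sketch of this idea in the mongo shell, assuming a hypothetical resumeTokens collection used as the persistence store, might look like this:

const demoDB = db.getSiblingDB("demo");
const tokenStore = demoDB.resumeTokens;

// Reuse the last persisted token if one exists, otherwise start a fresh stream
const saved = tokenStore.findOne({ _id: "stock-watcher" });
const cursor = demoDB.stock.watch([], saved ? { resumeAfter: saved.token } : {});

while (!cursor.isExhausted()) {
  if (cursor.hasNext()) {
    const change = cursor.next();
    // Persist the token before processing so the stream can be resumed after a crash
    tokenStore.updateOne({ _id: "stock-watcher" }, { $set: { token: change._id } }, { upsert: true });
    print(JSON.stringify(change));
  }
}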

All of the shell script examples above are available in the following GitHub repository. You can also find similar Node.js code samples here, where a more realistic technique is used to persist the last change stream token before it is processed.

Next steps

I hope that this introduction gets you excited about the power of change streams in MongoDB 3.6.

If you want to know more:

If you have any questions, feel free to file a ticket at https://jira.mongodb.org or connect with us through one of the social channels we use to interact with the developer community.

About the authors – Aly Cabral and Raphael Londner

Aly Cabral is a Product Manager at MongoDB. With a focus on Distributed Systems (i.e. Replication and Sharding), when she hears the word election she doesn’t think about politics. You can follow her or ask any questions on Twitter at @aly_cabral

Raphael Londner is a Principal Developer Advocate at MongoDB. Previously he was a developer advocate at Okta as well as a startup entrepreneur in the identity management space. You can follow him on Twitter at @rlondner

What’s New in MongoDB 3.6. Part 4 – Avoid Lock-In, Run Anywhere

Mat Keep
December 27, 2017
MongoDB 3.6

Welcome to part 4 of our MongoDB 3.6 blog series.

  • In part 1 we took a look at the new capabilities designed specifically to help developers build apps faster, including change streams, retryable writes, developer tools, and fully expressive array manipulation
  • In part 2, we dived into the world of DevOps and distributed systems management, exploring Ops Manager, schema governance, and compression
  • In part 3 we covered what’s new for developers, data scientists, and business analysts with the new SQL-based Connector for BI, richer in-database analytics and aggregations, and the new recommended driver for R
  • In our final part 4, we’ll look at all of the new goodness in our MongoDB Atlas fully managed database service available on AWS, Azure, and GCP, including Cross-region replication for globally distributed clusters, auto-scaling, and more.

If you want to get the detail now on everything the new release offers, download the Guide to What's New in MongoDB 3.6.

Run Anywhere

Many organizations are turning to the cloud to accelerate the speed of application development, deployment, and data discovery. Replatforming to the cloud gives them the ability to enable self-service IT, to elastically scale resources on demand, and to align costs to actual consumption. But they are also concerned about exposing the business to deeper levels of lock-in – this time from the APIs and services of the cloud providers themselves.

Increasingly, users are demanding the freedom to run anywhere: private clouds in their own data center, in the public cloud, or in a hybrid model that combines the two. This flexibility is not available when they build on a cloud-proprietary database from a single vendor. In contrast, the platform independence provided by MongoDB gives them the ability to respond to business or regulatory changes without incurring the complexity, risk, and time that comes from expensive database migrations whenever they need or want to transition to a new platform.

MongoDB Atlas

As a fully managed database service, MongoDB Atlas is the best way to run MongoDB in the public cloud. 2017 has already seen major evolutions in the Atlas service, with key highlights including:

  • Expansion beyond Amazon Web Services (AWS) to offer Atlas on Google Cloud Platform (GCP) and Microsoft Azure.
  • Achieving SOC2 Type 1 compliance.
  • The launch of managed database clusters on a shared architecture, including the free M0 instances, and the M2s and M5s, which allow customers to jumpstart their projects for a low and predictable price.
  • A live migration facility to move data from an existing MongoDB replica set into an Atlas cluster with minimal application impact.
  • The addition of the Data Explorer and Real Time Performance Panel, now coming to Ops Manager, as discussed above.

MongoDB 3.6 is available as a fully managed service on Atlas, along with important new features to support global applications, and with automated scalability and performance optimizations.

Turnkey Global Distribution of Clusters with Cross-Region Replication

MongoDB Atlas clusters can now span multiple regions offered by a cloud provider. This enables developers to build apps that maintain continuous availability in the event of geographic outages, and improve customer experience by locating data closer to users.

When creating a cluster or modifying its configuration, two options are now available:

  • Teams can now deploy a single MongoDB database across multiple regions supported by a cloud provider for improved availability guarantees. Reads and writes will default to a “preferred region” assuming that there are no active failure or failover conditions. The nearest read preference, discussed below, can be used to route queries to local replicas in a globally distributed cluster. Replica set members in additional regions will participate in the automated election and failover process if the primary member is affected by a local outage, and can become a primary in the unlikely event that the preferred region is offline.

  • Read-only replica set members can be deployed in multiple regions, allowing teams to optimize their deployments to achieve reduced read latency for a global audience. Read preference – providing a mechanism to control how MongoDB routes read operations across members of a replica set – can be configured using the drivers. For example, the nearest read preference routes queries to replicas with the lowest network latency from the client, thus providing session locality by minimizing the effects of geographic latency. As the name suggests, read-only replica set members will not participate in the automated election and failover process, and can never become a primary.

Teams can activate both of the options outlined above in a single database to provide continuous availability and an optimal experience for their users.

Figure 1: Globally distributed MongoDB Atlas cluster, providing resilience to regional outages and lower latency experiences for global apps

Auto-Scaling Storage and Performance Optimization

MongoDB Atlas now supports automatic scaling for the storage associated with a cluster, making it easier for you to manage capacity. Enabled by default, auto-scaling for storage detects when your disks hit 90% utilization and provisions additional storage such that your cluster reaches a disk utilization of 70% on AWS & GCP, or a maximum of 70% utilization on Azure. This automated process occurs without impact to your database or application availability.

In addition to auto-storage scaling, the new Performance Advisor discussed earlier for Ops Manager is also available in MongoDB Atlas, providing you with always-on, data-driven insights into query behavior and index recommendations.

A Cloud Database Platform for Development & Testing

New enhancements to MongoDB Atlas make it the optimal cloud database for spinning up and running test and development environments efficiently.

  • You can now pause your MongoDB Atlas cluster, perfect for use cases where only intermittent access to your data is required, such as development during business hours or temporary testing. While your database instances are stopped, you are charged for provisioned storage and backup storage, but not for instance hours. You can restart your MongoDB Atlas cluster at any time on demand; your cluster configuration will be the same as when you stopped it and public DNS hostnames are retained so no modifications to your connection string are required. MongoDB Atlas clusters can be stopped for up to 7 days. If you do not start your cluster after 7 days, Atlas will automatically start your cluster. Pausing and restarting your MongoDB clusters can be triggered in the MongoDB Atlas UI or via the REST API.
  • Cross-project restores, introduced with Ops Manager 3.6, are also available in MongoDB Atlas, allowing users to restore to different MongoDB Atlas projects than the backup snapshot source.

Next Steps

That wraps up the final part of our what’s new blog series. I hope I’ve helped demonstrate how MongoDB 3.6 helps you move at the speed of your data. It enables new digital initiatives and modernized applications to be delivered to market faster, running reliably and securely at scale, and unlocking insights and intelligence ahead of your competitors.

  • Change streams, retryable writes, causal consistency, greater query and update expressivity, and Compass Community help developers move faster.
  • Ops Manager, schema validation, enhanced security, end to end compression, and user session management help operations teams scale faster.
  • The MongoDB aggregation pipeline, Connector for BI, and the recommended R driver help analysts and data scientists unlock insights faster.

And you have the freedom to run MongoDB anywhere – on-premises, public cloud, and as a service with MongoDB Atlas available on AWS, Azure, and GCP.

If you want to get the detail now on everything the new release offers, download the Guide to What's New in MongoDB 3.6.

Alternatively, if you’ve had enough of reading about it and want to get started now, then:

What’s New in MongoDB 3.6. Part 3 – Speed to Insight

Mat Keep
December 20, 2017
MongoDB 3.6

Welcome to part 3 of our MongoDB 3.6 blog series.

  • In part 1 we took a look at the new capabilities designed specifically to help developers build apps faster, including change streams, retryable writes, developer tools, and fully expressive array manipulation.
  • In part 2, we dived into the world of DevOps and distributed systems management, exploring Ops Manager, schema governance, and compression.
  • In today’s part 3 we’ll cover what’s new for developers, data scientists, and business analysts with the new SQL-based Connector for BI, richer in-database analytics and aggregations, and the new recommended driver for R.
  • In our final part 4, we’ll look at all of the new goodness in our MongoDB Atlas fully managed database service available on AWS, Azure, and GCP, including Cross-region replication for globally distributed clusters, auto-scaling, and more.

If you want to get the detail now on everything the new release offers, download the Guide to What's New in MongoDB 3.6.

Speed to Insight

How quickly an organization can unlock and act on insights from data generated by new applications has become a material source of competitive advantage. Collecting data in operational systems and then relying on batch ETL (Extract, Transform, Load) processes to update an expensive data warehouse or complex and ungoverned data lake is no longer sufficient. Speed to insight is critical, so analytics performed against live data to drive operational intelligence is fast becoming a necessity, and it cannot depend on armies of highly skilled and scarce data engineers and scientists.

MongoDB 3.6 delivers a number of new features and capabilities that allow organizations to enable real-time analytics and action.

MongoDB Connector for BI: Faster and Simpler

MongoDB 3.6 brings a number of performance and ease-of-use enhancements to the BI Connector, enabling faster time to insight using SQL-based BI and Analytics platforms.

Faster: The connector takes advantage of enhancements to the aggregation pipeline – discussed later in this post – to deliver higher performance, with more operations pushed natively to the database. Prior to MongoDB 3.6, only left outer equijoins could be pushed down to the database – all other JOIN types had to be executed within the BI connector layer, which first required all matching data to be extracted from the database. With MongoDB 3.6, support is being extended to non-equijoins and the equivalent of SQL subqueries. These enhancements will reduce the amount of data that needs to be moved and computed in the BI layer, providing faster time to insight.

In addition, performance metrics are now observable via the Show Status function, enabling deeper performance insights and optimizations.

Simpler: To support easier configuration, the mongosqld process now samples and maps the MongoDB schema, caching the results internally and eliminating the need to install the separate mongodrdl component. Additionally, users can simplify lifecycle management by configuring, deploying, and monitoring the BI connector directly from Ops Manager.

To simplify the enforcement of access controls, BI Connector users can now be authenticated directly against MongoDB using new client-side plugins, eliminating the need to manage TLS certificates. Review the documentation for the C and JDBC authentication plugins to learn more. Authentication via Kerberos is also now supported.

Richer Aggregation Pipeline

Developers and data scientists rely on the MongoDB aggregation pipeline for its power and flexibility in enabling sophisticated data processing and manipulation demanded by real-time analytics and data transformations. Enhancements in the aggregation pipeline unlock new use cases.

A more powerful $lookup operator extends MongoDB’s JOIN capability to support the equivalent of SQL subqueries and non-equijoins. As a result, developers and analysts can write more expressive queries combining data from multiple collections, all executed natively in the database for higher performance, and with less application-side code.
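
For illustration, here is a hedged sketch of the more expressive $lookup form introduced in 3.6 (the orders and warehouses collections and their fields are hypothetical):

db.orders.aggregate([
  { $lookup: {
      from: "warehouses",
      let: { order_item: "$item", order_qty: "$ordered" },
      pipeline: [
        { $match:
          { $expr:
            { $and: [
              { $eq: [ "$stock_item", "$$order_item" ] },
              { $gte: [ "$instock", "$$order_qty" ] }
            ] }
          }
        }
      ],
      as: "stockdata"
  } }
])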

In addition to $lookup, the aggregation pipeline offers additional enhancements:

  • Support for timezone-aware aggregations. Before timezone awareness, reporting that spanned regions and date boundaries was not possible within the aggregation pipeline. Now business analysts can group data for multi-region analysis that takes account of variances in working hours and working days across different geographic regions.
  • New expressions allow richer data transformations within the aggregation pipeline, including the ability to convert objects to arrays of key-value pairs, and arrays of key-value pairs back to objects. The $mergeObjects expression is useful for supplying default values for missing fields, while the $$REMOVE variable allows the conditional exclusion of fields from projections based on evaluation criteria (see the sketch after this list). You can learn more about the enhancements from the documentation.
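
As a rough sketch (the collection and field names are hypothetical), several of these expressions can be combined in a single $project stage:

db.devices.aggregate([
  { $project: {
      // Apply default values for any fields missing from the document
      merged: { $mergeObjects: [ { status: "unknown", region: "EMEA" }, "$$ROOT" ] },
      // Convert an embedded object into an array of { k: ..., v: ... } pairs
      attributesAsArray: { $objectToArray: "$attributes" },
      // Conditionally exclude a field from the projection
      internalCode: { $cond: [ { $eq: [ "$visibility", "public" ] }, "$$REMOVE", "$internalCode" ] }
  } }
])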

More Expressive Query Language

MongoDB 3.6 exposes the ability to use aggregation expressions within the query language to enable richer queries with less client-side code. This enhancement allows the referencing of other fields in the same document when executing comparison queries, as well as powerful expressions such as multiple JOIN conditions and uncorrelated subqueries. The new $expr operator allows the equivalent of SELECT * FROM T1 WHERE a>b in SQL syntax, as shown below. Learn more from the $expr documentation.
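
For example, a hedged sketch against a hypothetical T1 collection:

// Equivalent of SELECT * FROM T1 WHERE a > b
db.T1.find({ $expr: { $gt: [ "$a", "$b" ] } })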

R Driver for MongoDB

A recommended R driver for MongoDB is now available, enabling developers, data scientists, and statisticians to get the same first-class experience with MongoDB as that offered by the other MongoDB drivers – providing idiomatic, native language access to the database. The driver supports advanced MongoDB functionality, including:

  • Read and write concerns to control data consistency and durability.
  • Enterprise authentication mechanisms, such as LDAP and Kerberos, to enforce security controls against the database.
  • Support for advanced BSON data types such as Decimal 128 to support high precision scientific and financial analysis.

Next Steps

That wraps up the third part of our what’s new blog series. Remember, if you want to get the detail now on everything the new release offers, download the Guide to What’s New in MongoDB 3.6.

Alternatively, if you’ve had enough of reading about it and want to get started now, then:

Enabling IP Security for MongoDB 3.6 on Ubuntu

Jay Gordon
December 19, 2017
MongoDB 3.6

MongoDB 3.6 provides developers and DevOps professionals with a secure by default configuration that protects data from external threats by denying unauthorized access on a public network. MongoDB servers will now only listen for connections on the local host unless explicitly configured to listen on another address.

This tutorial will briefly show you how to enable IP addresses beyond localhost to your MongoDB node to ensure your networked servers are able to connect to your database. You will see how easily MongoDB is configured to start up and listen on specific network interfaces.

This tutorial assumes you have:

  • Installed MongoDB 3.6 (this does not handle upgrading from previous versions)
  • Multiple network interfaces on your server (we'll use an AWS EC2 instance)
  • Basic understanding of IP Networks and how to configure a private network for your data (we’ll use an AWS VPC)
  • Understanding that "localhost" refers to IP 127.0.0.1

Getting Started

I have launched an AWS EC2 instance with Ubuntu 16.04 LTS and installed MongoDB as described on the MongoDB downloads page.

I want to enable the private IP range that is part of my VPC to allow us to access our MongoDB database. By doing this, we'll ensure that only our private network and "localhost" are valid network paths to connect to the database. This will help ensure we never have outsiders poking into our database!

I first launch an Ubuntu 16.04 EC2 instance in my public subnet within my VPC. By doing this, I will allow my network interface to allow network connections to the outside world without requiring a NAT Gateway.

Next, I follow the instructions in the MongoDB documentation on how to install MongoDB on Ubuntu. I can verify which network interfaces the process is listening on in Linux by running the following command:

ubuntu@ip-172-16-0-211:~$ sudo netstat -plant | egrep mongod
tcp        0      0 127.0.0.1:27017         0.0.0.0:*               LISTEN      2549/mongod

This output means that users are only permitted to access our MongoDB instance on port 27017 via IP 127.0.0.1. If you would like to make this available to other systems on your network, you'll want to bind the local IP associated with the private network. To determine network interface configuration easily, we can just run an ifconfig from the command line:

ubuntu@ip-172-16-0-211:~$ ifconfig
eth0      Link encap:Ethernet  HWaddr 0e:5e:76:83:49:3e
          inet addr:172.16.0.211  Bcast:172.16.0.255  Mask:255.255.255.0
          inet6 addr: fe80::c5e:76ff:fe83:493e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9001  Metric:1
          RX packets:65521 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7358 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:94354063 (94.3 MB)  TX bytes:611646 (611.6 KB)

We have the IP we want to add to the list of network addresses that MongoDB will listen on. I will open the /etc/mongod.conf file and edit it to reflect the additional network IP:

The relevant section of the file now reads:

# network interfaces
net:
  port: 27017
  bindIp: 127.0.0.1,172.16.0.211

After modifying bindIp under "net" from just 127.0.0.1 to include the private IP address 172.16.0.211, we should be able to restart and see mongod listening on both addresses in netstat:

ubuntu@ip-172-16-0-211:~$ sudo service mongod stop
ubuntu@ip-172-16-0-211:~$ sudo service mongod start
ubuntu@ip-172-16-0-211:~$ sudo netstat -plnt | egrep mongod
tcp        0      0 172.16.0.211:27017      0.0.0.0:*               LISTEN      2892/mongod
tcp        0      0 127.0.0.1:27017         0.0.0.0:*               LISTEN      2892/mongod

Now our database will be able to accept requests from both the specified private IP address and localhost:

Shell access via localhost

ubuntu@ip-172-16-0-211:~$ mongo localhost
MongoDB shell version v3.6.0-rc2
connecting to: mongodb://127.0.0.1:27017/localhost

Shell access via private IP

ubuntu@ip-172-16-0-211:~$ mongo 172.16.0.211
MongoDB shell version v3.6.0-rc2
connecting to: mongodb://172.16.0.211:27017/test

Next Steps

The default localhost configuration has tremendous benefits to security as you now must explicitly allow network connections, blocking attackers from untrusted networks. Keeping your MongoDB database safe from remote intrusion is extremely important. Make sure you follow our Security Checklist to configure your MongoDB database cluster with the appropriate security best practices.

Now that you understand how to configure additional IP addresses on your MongoDB 3.6 server, you're able to begin configuring replication. Don't forget backups, monitoring and all the other important parts of your MongoDB clusters' health. If you'd rather spend less time on these tasks and deploy MongoDB clusters with a click or an API call, check out MongoDB Atlas, our fully managed database as a service.

New MongoDB 3.6 Security Features

MongoDB has always made it quick and easy to iterate -- from prototype to production to maturity, keeping pace with modern agile release cycles. Our 3.6 release is a milestone in security, adding two new features that make security management easier and less costly, even for the most fast-paced development environments.

Localhost Default

Flexibility has led to widespread adoption of MongoDB by users who appreciate the ease of installation and use. These same users also expect a level of balance between performance and safety that only can be achieved with secure-by-default configurations.

This is why we are happy to now provide our users with a localhost binding set by default. Upon installation, MongoDB (3.6 and later) can only be accessed from the local machine on which it has been installed (using the Mongo shell, a MongoDB driver, or tools and utilities such as Ops Manager or Compass). To accept connections from other machines, networking has to be explicitly enabled and configured when MongoDB is started.

What happens when you connect your instance to the internet? “If you explicitly turn on [networking], but don’t turn on authentication, we can’t help you at that point. But you have to consciously do that, and we’d hope that people think about it a little” CTO Eliot Horowitz explained to The Next Web.

We see this change as fundamentally raising the bar on safety, eliminating whole classes of threats, while still preserving our popular deployment speed and ease.

IP Whitelisting for Authentication

After enabling whitelisting, a client authenticating against a user account in MongoDB must meet all listed restrictions in any document attached to that user. Clients authenticating against a user account which is a member of a restricted role must meet all listed restrictions in any document attached to that role.

For example, here is how a restriction document attached to a database user or role can be configured to only allow clients connecting from 192.168.17.6 (and localhost) to authenticate, and only against servers listening on the 10.10.10.0/24 network (or localhost). With the following syntax, an IP must match these restrictions during authentication in order to log in.

authenticationRestrictions: [{
  clientSource: ["192.168.17.6", "127.0.0.1"],
  serverAddress: ["10.10.10.0/24", "127.0.0.1"]
}]
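
As a hedged sketch, these restrictions can be attached when creating a user (the user name, password, role, and database here are hypothetical):

db.getSiblingDB("admin").createUser({
  user: "reportingUser",
  pwd: "changeThisPassword",
  roles: [ { role: "read", db: "reporting" } ],
  authenticationRestrictions: [{
    clientSource: ["192.168.17.6", "127.0.0.1"],
    serverAddress: ["10.10.10.0/24", "127.0.0.1"]
  }]
})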

We’re always striving to make safe development easier. That’s why we have taken an approach to facilitate safe choices within a flexible product, in order to serve the many development community decisions for deploying MongoDB.

The security changes in 3.6 remove bottlenecks and obviate workarounds, expanding MongoDB use to an even wider variety of applications, and providing security teams the configurations they demand for mission critical situations. MongoDB 3.6 applies the "safe by default" principle so you can confidently move at the speed of your data. To learn more about everything new in MongoDB 3.6, download the What's New guide.


About the Author - Davi Ottenheimer
Davi leads Product Security at MongoDB.

What’s New in MongoDB 3.6. Part 2 – Speed to Scale

Mat Keep
December 14, 2017
MongoDB 3.6

Welcome to part 2 of our MongoDB 3.6 blog series.

  • In part 1 we took a look at the new capabilities designed specifically to help developers build apps faster, including change streams, retryable writes, developer tools, and fully expressive array manipulation
  • In part 2, we’ll dive into the world of DevOps and distributed systems management, exploring Ops Manager, schema governance, and compression
  • Part 3 will cover what’s new for developers, data scientists, and business analysts with the new SQL-based Connector for BI, richer in-database analytics and aggregations, and the new recommended driver for R
  • In our final part 4, we’ll look at all of the new goodness in our MongoDB Atlas fully managed database service available on AWS, Azure, and GCP, including Cross-region replication for globally distributed clusters, auto-scaling, and more.

If you want to get the detail now on everything the new release offers, download the Guide to what’s New in MongoDB 3.6.

Speed to Scale

Unlike the traditional scale-up systems of the past, distributed systems enable applications to scale further and faster while maintaining continuous availability in the face of outages and maintenance. However, they can impose more complexity on the ops team, potentially slowing down the pace of delivering, scaling, and securing apps in production.

MongoDB 3.6 takes another important step in making it easier for operations teams to deploy and run massively scalable, always-on global applications that benefit from the power of a distributed systems architecture.

Ops Manager

MongoDB Ops Manager is the best way to run MongoDB on your own infrastructure, making operations staff 10x-20x more productive. Advanced management and administration delivered with Ops Manager 3.6 allow operations teams to manage, optimize, and backup distributed MongoDB clusters faster and at higher scale than ever before. Deeper operational visibility allows proactive database management, while streamlined backups reduce the costs and time of data protection.

Simplified Monitoring and Management

It is now easier than ever for administrators to synthesize schema design against real-time database telemetry and receive prescriptive recommendations to optimize database performance and utilization – all from a single pane of glass.

Figure 1: Ops Manager performance telemetry and prescriptive recommendations speed time to scale

  • The Data Explorer allows operations teams to examine the database’s schema by running queries to review document structure, viewing collection metadata, and inspecting index usage statistics, directly within the Ops Manager UI.
  • The Real Time Performance Panel provides insight from live server telemetry, enabling issues to be immediately identified and diagnosed. The panel displays all operations in flight, network I/O, memory consumption, the hottest collections, and slowest queries. Administrators also have the power to kill long running operations from the UI.
  • The new Performance Advisor, available for both Ops Manager and MongoDB Atlas, continuously highlights slow-running queries and provides intelligent index recommendations to improve performance. Using Ops Manager automation, the administrator can then roll out the recommended indexes automatically, without incurring any application downtime.

Ops Manager Organizations: To simplify management of global MongoDB estates, Ops Manager now provides a new Organizations and Projects hierarchy. Previously, Projects (formerly called “groups”) were managed as individual entities. Now multiple Projects can be placed under a single organization, allowing operations teams to centrally view and administer all Projects under the organization hierarchy. Projects can be assigned tags, such as a “production” tag, against which global alerting policies can be configured.

Faster, Cheaper and Queryable Backups

Ops Manager continuously maintains backups of your data, so if an application issue, infrastructure failure, or user error compromises your data, the most recent backup is only moments behind, minimizing exposure to data loss. Ops Manager offers point-in-time backups of replica sets, and cluster-wide snapshots of sharded clusters, guaranteeing consistency and no data loss. You can restore to precisely the moment you need, quickly and safely. Ops Manager backups are enhanced with a range of new features:

  • Queryable Backups, first introduced in MongoDB Atlas, allow partial restores of selected data, and the ability to query a backup file in-place, without having to restore it. Now users can query the historical state of the database to track data and schema modifications – a common demand of regulatory reporting. Directly querying backups also enables administrators to identify the best point in time to restore a system by comparing data from multiple snapshots, thereby improving both RTO and RPO. No other non-relational database offers the ability to query backups in place.
  • The Ops Manager 3.6 backup agent has been updated to use a faster and more robust initial sync process. Now, transient network errors will not cause the initial sync to restart from the beginning of the backup process, but rather resume from the point the error occurred. In addition, refactoring of the agent will speed data transfer from MongoDB to the backup repository, with the performance gain dependent on document size and complexity.
  • Point-in-Time snapshots will now be created at the destination node for the restore operation, rather than at the backup server, reducing network hops, cutting backup storage overhead by an amount equal to the size of your logical production data (1x), and further improving speed to recovery. The restore process now transfers backup snapshots directly to the destination node, and then applies the oplog locally, rather than applying it at the daemon server first and then pushing the complete restore image across the network. Note that this enhancement does not apply to restores via SCP.
  • Extending support for the AWS S3 object store, backups can now be routed to on-premises object stores such as EMC ECS or IBM Cleversafe. MongoDB’s backup integration provides administrators with greater choice in selecting the backup storage architecture that best meets specific organizational requirements for data protection. It enables them to take advantage of cheap, durable, and quickly growing object storage used within the enterprise. By limiting backups to filesystems or S3 only, most other databases fail to match the storage flexibility offered by MongoDB.
  • With cross-project restores, users can now perform restores into a different Ops Manager Project than the backup snapshot source. This allows DevOps teams to easily execute tasks such as creating multiple staging or test environments that match recent production data, while configured with different user access privileges or running in different regions.

Review the Ops Manager documentation to learn more.

Schema Validation

MongoDB 3.6 introduces Schema Validation via syntax derived from the proposed IETF JSON Schema standard. This new schema governance feature extends the capabilities of document validation, originally introduced in MongoDB 3.2.

While MongoDB’s flexible schema is a powerful feature for many users, there are situations where strict guarantees on data structure and content are required. MongoDB’s existing document validation controls can be used to require that any documents inserted or updated follow a set of validation rules, expressed using MongoDB query syntax. While this allows for the definition of required content for each document, it had no mechanism to restrict users from adding documents containing fields beyond those specified in the validation rules. In addition, there was no way for administrators to specify and enforce control over the complete structure of documents, including data nested inside arrays.

Using schema validation, DevOps and DBA teams can now define a prescribed document structure for each collection, which can reject any documents that do not conform to it. With schema validation, MongoDB enforces controls over JSON data that are unmatched by any other database:

  • Complete schema governance. Administrators can define when additional fields are allowed to be added to a document, and specify a schema on array elements including nested arrays.
  • Tunable controls. Administrators have the flexibility to tune schema validation according to use case – for example, if a document fails to comply with the defined structure, it can either be rejected, or still written to the collection while logging a warning message. Structure can be imposed on just a subset of fields – for example requiring a valid customer name and address, while other fields can be freeform, such as social media handle and cellphone number. And of course, validation can be turned off entirely, allowing complete schema flexibility, which is especially useful during the development phase of the application.
  • Queryable. The schema definition can be used by any query to inspect document structure and content. For example, DBAs can identify all documents that do not conform to a prescribed schema.

With schema validation, developers and operations teams have complete control over balancing the agility and flexibility that comes from a dynamic schema, with strict data governance controls enforced across entire collections. As a result, they spend less time defining data quality controls in their applications, and instead delegate these tasks to the database. Specific benefits of schema validation include:

  1. Simplified application logic. Guarantees on the presence, content, and data types of fields eliminate the need to implement extensive error handling in the application. In addition, the need to enforce a schema through application code, or via a middleware layer such as an Object Document Mapper, is removed.
  2. Enforces control. Database clients can no longer compromise the integrity of a collection by inserting or updating data with incorrect field names or data types, or adding new attributes that have not been previously approved.
  3. Supports compliance. In some regulated industries and applications, it is required that Data Protection Officers demonstrate that data is stored in a specific format, and that no additional attributes have been added. For example, the EU’s General Data Protection Regulation (GDPR) requires an impact assessment against all Personally Identifiable Information (PII), prior to any processing taking place.

Extending Security Controls

MongoDB offers among the most extensive and mature security capabilities of any modern database, providing robust access controls, end-to-end data encryption, and complete database auditing. MongoDB 3.6 continues to build out security protection with two new enhancements that specifically reduce the risk of unsecured MongoDB instances being unintentionally deployed into production.

From the MongoDB 2.6 release onwards, the binaries from the official MongoDB RPM and DEB packages bind to localhost by default. With MongoDB 3.6, this default behavior is extended to all MongoDB packages across all platforms. As a result, all networked connections to the database will be denied unless explicitly configured by an administrator. Review the documentation to learn more about the changes introduced by localhost binding. Combined with new IP whitelisting, administrators can configure MongoDB to only accept external connections from approved IP addresses or CIDR ranges that have been explicitly added to the whitelist.

End-to-End Compression

Adding to intra-cluster network compression released in MongoDB 3.4, the new 3.6 release adds wire protocol compression to network traffic between the client and the database.

Figure 2: Creating highly efficient distributed systems with end to end compression

Wire protocol compression can be configured with the snappy or zlib algorithms, allowing up to 80% savings in network bandwidth. This reduction brings major performance gains to busy network environments and reduces connectivity costs, especially in public cloud environments, or when connecting remote assets such as IoT devices and gateways.
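
As an illustrative sketch (exact option names should be checked against the documentation for your MongoDB and driver versions), compression is enabled on the server through the mongod configuration file, and drivers that support wire compression then negotiate one of the listed compressors when they connect:

# mongod.conf
net:
  compression:
    compressors: snappy,zlib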

With compression configurable across the stack – for client traffic, intra-cluster communications, indexes, and disk storage – MongoDB offers greater network, memory, and storage efficiency than almost any other database.

Enhanced Operational Management in Multi-Tenant Environments

Many MongoDB customers have built out their database clusters to serve multiple applications and tenants. MongoDB 3.6 introduces two new features that simplify management and enhance scalability:

Operational session management enables operations teams to more easily inspect, monitor, and control each user session running in the database. They can view, group, and search user sessions across every node in the cluster, and respond to performance issues in real time. For example, if a user or developer error is causing runaway queries, administrators now have the fine-grained operational oversight to view and terminate that session by removing all associated session state across a sharded cluster in a single operation. This is especially useful for multi-tenant MongoDB clusters running diverse workloads, providing a much simpler interface for identifying active operations in the database cluster, recovering from cluster overloads, and monitoring active users on a system. Review the sessions commands documentation to learn more.

Improved scalability with the WiredTiger storage engine to better support common MongoDB use cases that create hundreds of thousands of collections per database, for example:

  • Multi-tenant SaaS-based services that create a collection for each user.
  • IoT applications that write all sensor data ingested over an hour or a day into a unique collection.

As the collection count increased, MongoDB performance could, in extreme cases, degrade as the WiredTiger session cache managing a cursor’s access to collections and indexes became oversubscribed. MongoDB 3.6 introduces a refactoring of the session cache from a list to a hash table, with improved cache eviction policies and checkpointing algorithms, along with higher concurrency by replacing mutexes with Read/Write locks. As a result of this refactoring, a single MongoDB instance running with the WiredTiger storage engine can support over 1 million collections. Michael Cahill, director of Storage Engineering, presented a session on the development work at the MongoDB World ‘17 customer conference. Review the session slides to learn more.

Next Steps

That wraps up the second part of our what’s new blog series. Remember, if you want to get the detail now on everything the new release offers, download the Guide to what’s New in MongoDB 3.6.

Alternatively, if you’ve had enough of reading about it and want to get started now, download MongoDB 3.6 and try it for yourself.

JSON Schema Validation and Expressive Query Syntax in MongoDB 3.6

One of MongoDB’s key strengths has always been developer empowerment: by relying on a flexible schema architecture, MongoDB makes it easier and faster for applications to move through the development stages from proof-of-concept to production and iterate over update cycles as requirements evolve.

However, as applications mature and scale, they tend to reach a stable stage where frequent schema changes are no longer critical or must be rolled out in a more controlled fashion, to prevent undesirable data from being inserted into the database. These controls are especially important when multiple applications write into the same database, or when analytics processes rely on predefined data structures to be accurate and useful.

MongoDB 3.2 was the first release to introduce Document Validation, one of the features that developers and DBAs who are accustomed to relational databases kept demanding. As MongoDB’s CTO, Eliot Horowitz, highlighted in Document Validation and What Dynamic Schemas means:

Along with the rest of the 3.2 "schema when you need it" features, document validation gives MongoDB a new, powerful way to keep data clean. These are definitely not the final set of tools we will provide, but is rather an important step in how MongoDB handles schema.

Announcing JSON Schema Validation support

Building upon MongoDB 3.2’s Document Validation functionality, MongoDB 3.6 introduces a more powerful way of enforcing schemas in the database, with its support of JSON Schema Validation, a specification which is part of IETF’s emerging JSON Schema standard.

JSON Schema Validation extends Document Validation in many different ways, including the ability to enforce schemas inside arrays and prevent unapproved attributes from being added. These are the new features we will focus on in this blog post, as well as the ability to build business validation rules.

Starting with MongoDB 3.6, JSON Schema is the recommended way of enforcing Schema Validation. The next section highlights the features and benefits of using JSON Schema Validation.

Switching from Document Validation to JSON Schema Validation

We will start by creating an orders collection (based on an example we published in the Document Validation tutorial blog post):

db.createCollection("orders", {
  validator: {
    item: { $type: "string" },
    price: { $type: "decimal" }
  }
});

With this document validation configuration, we not only make sure that both the item and price attributes are present in any order document, but also that item is a string and price a decimal (which is the recommended type for all currency and percentage values). Therefore, the following document cannot be inserted (because its price value is the string "rogue" rather than a decimal):

db.orders.insert({
    "_id": 6666, 
    "item": "jkl", 
    "price": "rogue",
    "quantity": 1 });

However, the following document could be inserted (notice the misspelled "pryce" attribute):

db.orders.insert({
    "_id": 6667, 
    "item": "jkl", 
    "price": NumberDecimal("15.5"),
    "pryce": "rogue" });

Prior to MongoDB 3.6, you could not prevent the addition of misspelled or unauthorized attributes. Let’s see how JSON Schema Validation can prevent this behavior. To do so, we will use a new operator, $jsonSchema:

db.runCommand({
  collMod: "orders",
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["item", "price"],
      properties: {
        item: {
          bsonType: "string"
        },
        price: {
          bsonType: "decimal"
        }
      }
    }
  }
});

The JSON Schema above is the exact equivalent of the document validation rule we previously set on the orders collection. Let’s check that our schema has indeed been updated to use the new $jsonSchema operator by using the db.getCollectionInfos() method in the Mongo shell:

db.getCollectionInfos({name:"orders"})

This command prints out a wealth of information about the orders collection. For the sake of readability, below is the section that includes the JSON Schema:

...
"options" : {
    "validator" : {
        "$jsonSchema" : {
            "bsonType" : "object",
            "required" : [
                "item",
                "price"
            ],
            "properties" : {
                "item" : {
                    "bsonType" : "string"
                },
                "price" : {
                    "bsonType" : "decimal"
                }
            }
        }
    },
    "validationLevel" : "strict",
    "validationAction" : "error"
}
...

Now, let’s enrich our JSON schema a bit to make better use of its powerful features:

db.runCommand({
  collMod: "orders",
  validator: {
    $jsonSchema: {
      bsonType: "object",
      additionalProperties: false,
      required: ["item", "price"],
      properties: {
        _id: {},
        item: {
          bsonType: "string",
          description: "'item' must be a string and is required"
        },
        price: {
          bsonType: "decimal",
          description: "'price' must be a decimal and is required"
        },
        quantity: {
          bsonType: ["int", "long"],
          minimum: 1,
          maximum: 100,
          exclusiveMaximum: true,
          description:
            "'quantity' must be a short or long integer between 1 and 99"
        }
      }
    }
  }
});

Let’s go through the additions we made to our schema:

  • First, note the use of the additionalProperties:false attribute: it prevents us from adding any attribute other than those mentioned in the properties section. For example, it will no longer be possible to insert data containing a misspelled pryce attribute. As a result, the use of additionalProperties:false at the root level of the document also makes the declaration of the _id property mandatory: whether our insert code explicitly sets it or not, it is a field MongoDB requires and would automatically create, if not present. Thus, we must include it explicitly in the properties section of our schema.
  • Second, we have chosen to declare the quantity attribute as either a short or long integer between 1 and 99 (using the minimum, maximum and exclusiveMaximum attributes). Of course, because our schema only allows integers lower than 100, we could simply have set the bsonType property to int. But adding long as a valid type makes application code more flexible, especially if there might be plans to lift the maximum restriction.
  • Finally, note that the description attribute (present in the item, price, and quantity attribute declarations) is entirely optional and has no effect on the schema aside from documenting the schema for the reader.

With the schema above, the following documents can be inserted into our orders collection:

db.orders.insert({
    "item": "jkl",
    "price": NumberDecimal(15.50),
    "quantity": NumberInt(99)
});

db.orders.insert({
    "item": "jklm",
    "price": NumberDecimal(15.50),
    "quantity": NumberLong(99)
});

However, the following documents are no longer considered valid:

db.orders.insert({
    "item": "jkl",
    "price": NumberDecimal(15.50),
    "quantity": NumberInt(100)       // rejected: quantity must be strictly lower than 100
});

db.orders.insert({
    "item": "jkl",
    "price": NumberDecimal(15.50),
    "quantity": "98"                 // rejected: quantity must be an int or long, not a string
});

db.orders.insert({
    "item": "jkl",
    "pryce": NumberDecimal(15.50),   // rejected: 'pryce' is not an approved attribute and 'price' is missing
    "quantity": NumberInt(99)
});

You probably noticed that the orders above are somewhat unrealistic: each contains only a single item. More realistically, an order consists of multiple items, and a possible JSON structure might be as follows:

{
    _id: 10000,
    total: NumberDecimal(141),
    VAT: 0.20,
    totalWithVAT: NumberDecimal(169),
    lineitems: [
        {
            sku: "MDBTS001",
            name: "MongoDB Stitch T-shirt",
            quantity: NumberInt(10),
            unit_price:NumberDecimal(9)
        },
        {
            sku: "MDBTS002",
            quantity: NumberInt(5),
            unit_price: NumberDecimal(10)
        }
    ]
}

With MongoDB 3.6, we can now control the structure of the lineitems array, for instance with the following JSON Schema:

db.runCommand({
    collMod: "orders",
    validator: {
      $jsonSchema: {
        bsonType: "object",       
        required: ["lineitems"],
        properties: {
        lineitems: {
              <strong>bsonType: ["array"],</strong>
              minItems: 1,
              maxItems:10,
              items: {
                  required: ["unit_price", "sku", "quantity"],
                  bsonType: "object",
                  additionalProperties: false,
                  properties: {
                      sku: {
                        bsonType: "string",
                        description: "'sku' must be a string and is required"
                      },
                      name: {
                        bsonType: "string",
                        description: "'name' must be a string"
                      },
                      unit_price: {
                        bsonType: "decimal",
                        description: "'unit_price' must be a decimal and is required"
                      },
                      quantity: {
                        bsonType: ["int", "long"],
                        minimum: 0,
                        maximum: 100,
                        exclusiveMaximum: true,
                        description:
                          "'quantity' must be a short or long integer in [0, 100)"
                      },
                  }                    
              }
          }
        }
      }
    }
  });

With the schema above, we enforce that any order inserted or updated in the orders collection contain a lineitems array of 1 to 10 documents that all have sku, unit_price and quantity attributes (with quantity required to be an integer).

The schema would prevent inserting the following, badly formed document:

db.orders.insert({
        total: NumberDecimal(141),
        VAT: NumberDecimal(0.20),
        totalWithVAT: NumberDecimal(169),
        lineitems: [
            {
                sku: "MDBTS001",
                name: "MongoDB Stitch T-shirt",
                quantity: NumberInt(10),
                price: NumberDecimal(9) //this should be 'unit_price'
            },
            {
                name: "MDBTS002", //missing a 'sku' property
                quantity: NumberInt(5),
                unit_price: NumberDecimal(10)
            }
        ]
})

But it would allow inserting the following, schema-compliant document:

db.orders.insert({
        total: NumberDecimal(141),
        VAT: NumberDecimal(0.20),
        totalWithVAT: NumberDecimal(169),
        lineitems: [
            {
                sku: "MDBTS001",
                name: "MongoDB Stitch T-shirt",
                quantity: NumberInt(10),
                unit_price: NumberDecimal(9)
            },
            {
                sku: "MDBTS002",
                quantity: NumberInt(5),
                unit_price: NumberDecimal(10)
            }
        ]
})

However, if you pay close attention to the order above, you may notice that it contains a few errors:

  1. The totalWithVAT attribute value is incorrect (it should be equal to 141*1.20=169.2)

  2. The total attribute value is incorrect (it should be equal to the sum of each line item sub-total, i.e. 10*9 + 5*10 = 140)

Is there any way to enforce that total and totalWithVAT values be correct using database validation rules, without relying solely on application logic?

Introducing MongoDB expressive query syntax

Adding more complex business validation rules is now possible thanks to the expressive query syntax, a new feature of MongoDB 3.6.

One of the objectives of the expressive query syntax is to bring the power of MongoDB’s aggregation expressions to MongoDB’s query language. An interesting use case is the ability to compose dynamic validation rules that compute and compare multiple attribute values at runtime. Using the new $expr operator, it is possible to validate the value of the totalWithVAT attribute with the following validation expression:

$expr: {
   $eq: [
     "$totalWithVAT",
     {$multiply: [
       "$total", 
       {$sum: [1, "$VAT"]}
     ]}
   ]
}

The above expression checks that the totalWithVAT attribute value is equal to total * (1+VAT). In its compact form, here is how we could use it as a validation rule, alongside our JSON Schema validation:

db.runCommand({
    collMod: "orders",
    validator: {
      $expr: { $eq: [
            "$totalWithVAT",
            { $multiply: ["$total", { $sum: [1, "$VAT"] }] }
          ] },
      $jsonSchema: {
        bsonType: "object",       
        required: ["lineitems"],
        properties: {
          lineitems: {
              bsonType: ["array"],
              minItems: 1,
              maxItems:10,
              items: {
                  required: ["unit_price", "sku", "quantity"],
                  bsonType: "object",
                  additionalProperties: false,
                  properties: {
                      sku: {
                        bsonType: "string",
                        description: "'sku' must be a string and is required"
                      },
                      name: {
                        bsonType: "string",
                        description: "'name' must be a string"
                      },
                      unit_price: {
                        bsonType: "decimal",
                        description: "'unit_price' must be a decimal and is required"
                      },
                      quantity: {
                        bsonType: ["int", "long"],
                        minimum: 0,
                        maximum: 100,
                        exclusiveMaximum: true,
                        description:
                          "'quantity' must be a short or long integer in [0, 100)"
                      },
                  }                    
              }
          }
        }
      }
    }
  });

With the validator above, the following insert operation is no longer possible:

db.orders.insert({
        total: NumberDecimal(141),
        VAT: NumberDecimal(0.20),
        totalWithVAT: NumberDecimal(169),
        lineitems: [
            {
                sku: "MDBTS001",
                name: "MongoDB Stitch T-shirt",
                quantity: NumberInt(10),
                unit_price: NumberDecimal(9)
            },
            {
                sku: "MDBTS002",
                quantity: NumberInt(5),
                unit_price: NumberDecimal(10)
            }
        ]
})

Instead, the totalWithVAT value must be adjusted according to our new VAT validation rule:

db.orders.insert({
    total: NumberDecimal(141),
    VAT: NumberDecimal(0.20),
    totalWithVAT: NumberDecimal(169.2),
    lineitems: [
            {
                sku: "MDBTS001",
                name: "MongoDB Stitch T-shirt",
                quantity: NumberInt(10),
                unit_price: NumberDecimal(9)
            },
            {
                sku: "MDBTS002",
                quantity: NumberInt(5),
                unit_price: NumberDecimal(10)
            }
        ]
})

If we also want to make sure that the total value is the sum of each order line item value (i.e. quantity*unit_price), the following expression should be used:

$expr: { 
    $eq: [
       "$total", 
       {$sum: {
          $map: {
             "input": "$lineitems",
             "as": "item",
             "in": { 
                "$multiply": [
                   "$$item.quantity", 
                   "$$item.unit_price"
                ]
             } 
          }
       }}
    ]
  }

The above expression uses the $map operator to compute each line item’s sub-total, then sums all these sub-totals, and finally compares the result to the total value. To make sure that both the total and the totalWithVAT validation rules are checked, we must combine them using the $and operator. Finally, our collection validator can be updated with the following command:

db.runCommand({
    collMod: "orders",
    validator: {
      $expr:{ $and:[
          {$eq:[ 
            "$totalWithVAT",
                   {$multiply:["$total", {$sum:[1,"$VAT"]}]}
          ]}, 
          {$eq: [
                   "$total", 
                {$sum: {$map: {
                    "input": "$lineitems",
                    "as": "item",
                    "in":{"$multiply":["$$item.quantity","$$item.unit_price"]}
                   }}}
             ]}
        ]},
      $jsonSchema: {
        bsonType: "object",       
        required: ["lineitems", "total", "VAT", "totalWithVAT"],
        properties: {
          total: { bsonType: "decimal" },
          VAT: { bsonType: "decimal" },
          totalWithVAT: { bsonType: "decimal" },
          lineitems: {
              bsonType: ["array"],
              minItems: 1,
              maxItems:10,
              items: {
                  required: ["unit_price", "sku", "quantity"],
                  bsonType: "object",
                  additionalProperties: false,
                  properties: {
                      sku: {bsonType: "string"},
                      name: {bsonType: "string"},
                      unit_price: {bsonType: "decimal"},
                      quantity: {
                        bsonType: ["int", "long"],
                        minimum: 0,
                        maximum: 100,
                        exclusiveMaximum: true
                      },
                  }                    
              }
          }
        }
      }
    }
  });

Accordingly, we must update the total and totalWithVAT properties to comply with our updated schema and business validation rules (without changing the lineitems array):

db.orders.insert({
      total: NumberDecimal(140),
      VAT: NumberDecimal(0.20),
      totalWithVAT: NumberDecimal(168),
      lineitems: [
          {
              sku: "MDBTS001",
              name: "MongoDB Stitch T-shirt",
              quantity: NumberInt(10),
              unit_price: NumberDecimal(9)
          },
          {
              sku: "MDBTS002",
              quantity: NumberInt(5),
              unit_price: NumberDecimal(10)
          }
      ]
  })

Next steps

With the introduction of JSON Schema Validation in MongoDB 3.6, database administrators are now better equipped to address data governance requirements coming from compliance officers or regulators, while still benefiting from MongoDB’s flexible schema architecture.

Additionally, developers will find the new expressive query syntax useful to keep their application code base simpler by moving business logic from the application layer to the database layer.

If you want to learn more about everything new in MongoDB 3.6, download our What’s New guide.

If you want to dig deeper on the technical side, visit the Schema Validation and Expressive Query Syntax pages in our official documentation.

If you want to get more practical, hands-on experience, take a look at this JSON Schema Validation hands-on lab. You can try it right away on the MongoDB Atlas database service, which has supported MongoDB 3.6 since its general availability date.

Last but not least, sign up for our free MongoDB 3.6 training from MongoDB University.

About the Author - Raphael Londner

Raphael Londner is a Principal Developer Advocate at MongoDB, focused on cloud technologies such as Amazon Web Services, Microsoft Azure and Google Cloud Engine. Previously he was a developer advocate at Okta as well as a startup entrepreneur in the identity management space. You can follow him on Twitter at @rlondner.

What’s New in MongoDB 3.6. Part 1 – Speed to Develop

Mat Keep
December 07, 2017
Company, MongoDB 3.6

MongoDB 3.6 is now Generally Available (GA), and ready for production deployment. In this short blog series, I’ll be taking you on a whirlwind tour of what’s new in this latest release:

  • Today, we’ll take a look at the new capabilities designed specifically to help developers build apps faster, covering change streams, retryable writes, developer tools, and fully expressive array manipulation
  • In part 2, we’ll dive into the world of DevOps and distributed systems management, exploring Ops Manager, schema governance, and compression
  • Part 3 will cover what’s new for developers, data scientists, and business analysts with the new SQL-based Connector for BI, richer in-database analytics and aggregations, and the new recommended driver for R
  • In our final part 4, we’ll look at all of the new goodness in our MongoDB Atlas fully managed database service available on AWS, Azure, and GCP, including cross-region replication for globally distributed clusters, auto-scaling, and more.

If you want to get the detail now on everything the new release offers, download the Guide to what’s New in MongoDB 3.6.

Developer-First

MongoDB has always been a developer-first technology. Its document data model maps naturally to objects in application code, making it simple for developers to learn and use. A document’s schema can be dynamically created and modified without downtime, making it fast to build and evolve applications. Native, idiomatic drivers are provided for 10+ languages – and the community has built dozens more – enabling ad-hoc queries, real-time aggregation and rich indexing to provide powerful programmatic ways to access and analyze data of any structure.

MongoDB 3.6 builds upon these core capabilities to allow developers to create rich apps and customer experiences, all with less code.

Change Streams

Change streams enable developers to build reactive, real-time, web, mobile, and IoT apps that can view, filter, and act on data changes as they occur in the database. Change streams enable seamless data movement across distributed database and application estates, making it simple to stream data changes and trigger actions wherever they are needed, using a fully reactive programming style.

Implemented as an API on top of MongoDB’s operation log (oplog), consumers can open change streams against collections and filter on relevant events using the $match, $project, and $redact aggregation pipeline stages. The application can register for notifications whenever a document or collection is modified, enabling downstream applications and consumers to act on new data in real time, without constantly querying the entire collection to identify changes. Applications can consume change streams directly, via a message queue, or through a backend service such as MongoDB Stitch (coming soon).
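For instance, a minimal 3.6 shell sketch (the collection name and filter are illustrative) might open a change stream on an orders collection, react to newly inserted documents, and keep the resume token so the stream can be re-opened after a failure:

// Open a change stream that only reports insert events on the orders collection
var cs = db.orders.watch([ { $match: { operationType: "insert" } } ]);

while (!cs.isExhausted()) {
   if (cs.hasNext()) {
      var event = cs.next();
      printjson(event.fullDocument);   // the newly inserted order document
      var resumeToken = event._id;     // persist this to resume after a failure, e.g.:
      // db.orders.watch([], { resumeAfter: resumeToken })
   }
}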

Use cases enabled by MongoDB change streams include:

  • Powering trading applications that need to be updated in real time as stock prices rise and fall.
  • Synchronizing updates across serverless and microservices architectures by triggering an API call when a document is inserted or modified. For example, new customer orders written to the database may automatically trigger functions to generate invoices and delivery schedules.
  • Updating dashboards, analytics systems, and search engines as operational data changes.
  • Creating powerful IoT data pipelines that can react whenever the state of physical objects changes. For example, generating alarms whenever a connected vehicle moves outside of a geo-fenced area.
  • Pushing new credit card transactions into machine learning training models to re-score fraud classifications.
  • Refreshing scoreboards in multiplayer games.

Figure 1: MongoDB change streams enable consumers to react to data changes in real time

Some MongoDB users requiring real-time notifications have built their own change data capture processes that “tail” the oplog. By migrating to change streams, these users can reduce development and operational overhead, improve usability, and increase data reliability. When compared to both oplog tailing and change notifications implemented by alternative databases, MongoDB change streams offer a number of advantages:

  • Change streams are flexible – users can register to receive just the individual deltas from changes to a document, or receive a copy of the full document.
  • Change streams are consistent – by utilizing a global logical clock, change streams ensure a total ordering of event notifications across shards. As a result, MongoDB guarantees the order of changes will be preserved, and can be safely processed by the consuming application in the order received from the stream.
  • Change streams are secure – users are able to create change streams only on collections to which they have been granted read access.
  • Change streams are reliable – notifications are only sent on majority committed write operations, and are durable when nodes or the network fails.
  • Change streams are resumable – when nodes recover after a failure, change streams can be automatically resumed, assuming that the last event received by the application has not rolled off the oplog.
  • Change streams are familiar – the API syntax takes advantage of the established MongoDB drivers and query language, and is independent of the underlying oplog format.
  • Change streams are highly concurrent – up to 1,000 change streams can be opened against each MongoDB instance with minimal performance degradation.

Review the MongoDB change streams documentation to learn more.

Retryable Writes

The addition of retryable writes to MongoDB moves the complexity of handling temporary system failures from the application to the database. Now, rather than the developer having to implement custom, client-side code, the MongoDB driver can automatically retry writes in the event of transient network failures or a primary replica election, while the MongoDB server enforces exactly-once processing semantics.

By assigning a unique transaction identifier to each write operation, the driver re-sends that ID to enable the server to evaluate success of the previous write attempt, or retry the write operation as needed. This implementation of retryable writes offers a number of benefits over approaches taken by other databases:

  • Retryable writes are not limited to idempotent operations only. They can also be applied to operations such as incrementing or decrementing a counter, or processing orders against stock inventory.
  • Retryable writes are safe for operations that failed to acknowledge success back to the application due to timeout exceptions, for example due to a transient network failure.
  • Retryable writes do not require developers to add any extra code to their applications, such as retry logic or savepoints.

Applications that cannot afford any loss of write availability, such as e-commerce applications, trading exchanges, and IoT sensor data ingestion, immediately benefit from retryable writes. When coupled with self-healing node recovery – typically 2 seconds or less – MongoDB’s retryable writes enable developers to deliver always-on, global availability of write operations, without the risks of data loss and stale reads imposed by eventually consistent, multi-master systems.
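As a simple sketch (hostnames, the replica set name, and the collection and fields are placeholders), retryable writes are switched on through the connection string, after which ordinary writes need no extra application code:

// Connection string with retryable writes enabled (used by a 3.6-compatible driver or shell)
// mongodb://rs1.example.net:27017,rs2.example.net:27017/test?replicaSet=rs0&retryWrites=true

// If a transient network error or primary election interrupts the write below,
// the driver retries it once, re-sending the same transaction identifier so the
// server can enforce exactly-once semantics.
db.inventory.updateOne({ sku: "MDBTS001" }, { $inc: { stock: -1 } })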

Tunable Consistency

With tunable consistency, MongoDB affords developers precise control over routing queries across a distributed cluster, balancing data consistency guarantees with performance requirements. MongoDB 3.4 added linearizable reads, which were central to MongoDB passing Jepsen – some of the most stringent data safety and correctness tests in the database industry.

Now the MongoDB 3.6 release introduces support for causal consistency – guaranteeing that every read operation within a client session will always see the previous write operation, regardless of which replica is serving the request. By enforcing strict, causal ordering of operations within a session, causal consistency ensures every read is always logically consistent, enabling monotonic reads from a distributed system – guarantees that cannot be met by most multi-node databases.

Causal consistency allows developers to maintain the benefits of strict data consistency enforced by legacy single node relational databases, while modernizing their infrastructure to take advantage of the scalability and availability benefits of modern distributed data platforms.
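A minimal shell sketch (database, collection, and document values are illustrative, and majority read/write concerns are assumed for the full guarantee) shows how a causally consistent session is started and used:

// Start a causally consistent client session
var session = db.getMongo().startSession({ causalConsistency: true });
var orders = session.getDatabase("test").orders;

orders.insertOne({ _id: 20000, item: "mno", price: NumberDecimal("12.50") });

// A read in the same session is guaranteed to observe the insert above,
// even if it is routed to a secondary.
orders.find({ _id: 20000 }).readPref("secondary");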

Developer Tooling: MongoDB Compass

As the GUI for MongoDB, Compass has become an indispensable tool for developers and DBAs, enabling graphical schema discovery and query optimization. Compass now offers several new features:

  • Auto-complete: Enables developers to simplify query development with Compass providing suggestions for field names and MongoDB operators, in addition to matching braces and quotes as they code.
  • Query History: Allows developers to re-run their most recently executed queries, and save common queries to run on-demand.
  • Table View: Now developers can view documents as conventional tables, as well as JSON documents.

MongoDB Compass is not just a single tool – it’s a framework built to allow for the addition of modular components. Compass now exposes this as the Compass Plugin Framework, making Compass extensible by any user with the same methods used by MongoDB’s software engineers. Using the plugin API, users can build plugins to add new features to Compass. Examples include a GridFS viewer, a sample data generator, a hardware stats viewer, a log collector/analyzer, and more.

You can learn more about these new features in the MongoDB Compass documentation.

MongoDB Compass Community

With the MongoDB 3.6 release, the Compass family has expanded to now include the new, no-cost Compass Community edition.

Compass Community provides developers an intuitive visual interface to use alongside the MongoDB shell. It includes the core features of Compass, enabling users to review the hierarchy and size of databases and collections, inspect documents, and insert / update / delete documents. Developers can use the GUI to build queries, examine how they’re executed, and add or drop indexes to improve performance. Compass Community also supports the latest Compass functionality available with MongoDB 3.6, making developers even more productive.

Figure 2: MongoDB Compass Community, the new no-cost GUI for MongoDB developers

MongoDB Compass Community is available from the MongoDB download center.

Fully Expressive Array Updates

Arrays are a powerful construct in MongoDB’s document data model, allowing developers to represent complex objects in a single document that can be efficiently retrieved in one call to the database. Before MongoDB 3.6, however, it was only possible to atomically update the first matching array element in a single update command.

With fully expressive array updates, developers can now perform complex array manipulations against matching elements of an array – including elements embedded in nested arrays – all in a single atomic update operation. MongoDB 3.6 adds a new arrayFilters option, allowing the update to specify which elements to modify in the array field. This enhancement allows even more flexibility in data modeling. It also delivers higher performance than alternative databases supporting JSON data as entire documents do not need to be rewritten when only selective array elements are updated.
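For example (reusing the shape of the orders document from the schema validation post above; the filter and values are illustrative), a single atomic update can modify just the matching line items:

// Increase the quantity of the "MDBTS001" line item only, leaving other array elements untouched
db.orders.updateOne(
   { _id: 10000 },
   { $inc: { "lineitems.$[elem].quantity": 5 } },
   { arrayFilters: [ { "elem.sku": "MDBTS001" } ] }
)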

Learn more from the array update documentation.

Next Steps

That wraps up the first part of our what’s new blog series. Remember, if you want to get the detail now on everything the new release offers, download the Guide to what’s New in MongoDB 3.6.

Alternatively, if you’ve had enough of reading about it and want to get started now, download MongoDB 3.6 and try it for yourself.

Announcing the General Availability of MongoDB 3.6

We announced MongoDB 3.6 in November. Following great community feedback on the 3.6 release candidates, we’re happy to say that 3.6 is now generally available and ready for production deployments. You can download the community version and MongoDB Enterprise Server today.

MongoDB 3.6 is also available on MongoDB Atlas, so you can try out 3.6 or upgrade your existing Atlas clusters to 3.6.

MongoDB 3.6 makes it easier than ever to work with data in the most natural, efficient, and frictionless way possible. In short, MongoDB helps you go faster when building and scaling apps. Key 3.6 features include:

Change streams enable you to build reactive web, mobile and IoT applications that can view, filter, and act on data changes as they occur in the database. Whenever data is changed in MongoDB, downstream systems are automatically notified of the updates in real time. Change streams provide an easy and efficient way to build reactive, event driven apps.

Retryable writes move the complexity of handling transient system failures from the application to the database. Instead of you having to implement masses of custom, client-side code, MongoDB automatically retries write operations using exactly-once semantics.

With Schema validation, using syntax derived from the proposed IETF JSON Schema standard, we’ve extended the document validation capabilities originally introduced in MongoDB 3.2. Now, DevOps and DBA teams can define a prescribed document structure for each collection, down to the level of individual fields within nested arrays. And you’re able to tune this as you need: lock the schema down, open it up, apply it to a subset of fields – whatever you need for each app or stage of your project.

Binding to localhost by default: with MongoDB 3.6 all MongoDB packages across all platforms refuse all external connections to the database unless explicitly configured otherwise by the administrator. Combined with new IP whitelisting support, administrators can configure MongoDB to only accept external connections on approved IP addresses. These enhancements greatly reduce the risk of unsecured MongoDB instances unintentionally being deployed into production.

Aggregation enhancements support more expressive queries, giving you faster access to data-driven insights. MongoDB’s document data model allows you to model entities in the same way you represent them in code - as complete objects - so you don't need to worry about JOINs. But for analytics it’s useful to join data across multiple collections. We introduced left outer equijoins in MongoDB 3.2, but now we are expanding this with a more powerful $lookup operator to support the equivalent of SQL subqueries and non-equijoins. MongoDB's Connector for BI, which enables MongoDB to be used as a data source in SQL-based analytics and data visualization tools, takes advantage of these enhancements to deliver higher performance, with more analytic operations pushed natively to the database.
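As a hedged sketch of the extended $lookup syntax (the orders and inventory collections and their fields are illustrative), a sub-pipeline with let variables can express the equivalent of a subquery or a non-equijoin:

db.orders.aggregate([
   { $lookup: {
        from: "inventory",
        let: { order_skus: "$lineitems.sku" },
        pipeline: [
           // join condition expressed with $expr rather than a simple equality on a local/foreign field
           { $match: { $expr: { $in: ["$sku", "$$order_skus"] } } },
           { $project: { _id: 0, sku: 1, stock: 1 } }
        ],
        as: "stock_levels"
   } }
])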

MongoDB Atlas is the best way to run MongoDB in the public cloud. MongoDB 3.6 is available as a fully managed service on Atlas, including important new features to support global applications, and with automated scalability and performance optimizations.

Cross-region replication allows Atlas clusters to span multiple cloud provider regions, maintaining continuous availability in the event of geographic outages, and providing optimal customer experience by distributing data closer to users. Atlas now also supports automatic scaling for storage associated with a cluster, making it easier for you to manage capacity. The new performance advisor continuously highlights slow-running queries and provides intelligent index recommendations to improve performance.

Community Contribution

We’d like to acknowledge these users who contributed to this release: adrien petel, aftab muhammed khan, Andrew, atish, Ben Calev, Bodenhaltung, Christian, Curtis Hovey, daniel moqvist, Dawid Esterhuizen, Denis Shkirya, Deyoung Hong, Dmitri Shubin, Edik Mkoyan, Eugene Ivanov, Evan Broder, Gian Maria Ricci, hotdog929, Igor Canadi, Igor Solodovnikov, Igor Wiedler, Ivy Wang, James Reitz, Jelle van der Waa, Jim Van Fleet, Joe, KarenHuang, kurevo18, Marek Skalický, Markus Penttil, Matt Berry, may, Meni Livne, Michael Coleman, Michael Liu, Mike Zraly, Renaud Allard, Richard Hutta, Rob Clancy, ryankall, Sergey, Steven Green, Thales Ceolin, Tom Leibu, Ultrabug, Wayne Wang, and Yuriy Solodkyy.

Thank you! Please keep the feedback coming.

Learn More

With over a hundred new and updated features, MongoDB 3.6 is our biggest release yet. Download the Guide to What’s New in MongoDB 3.6 to learn more.

Or, to get started now:

Download MongoDB 3.6 Enterprise
Download MongoDB 3.6 Community