MongoDB vs SQL: Day 14 - Queries

Buzz Moschetti


Welcome back to our blog series highlighting the differences between developing an app with MongoDB vs. with RDBMS/SQL. Last week, we began to cover Day 14 and added a list of startup apps organized by region. We also added data that was provided to us by an external entitlements service. This week we’re going to continue our discussion of Day 14 by diving into queries.

Before we begin, let's refresh our memories with the framework we’ve been using to stage our discussion.

  • We are using Java
  • Assume we have a data access layer in between our application and MongoDB
  • In terms of the date counts as we go through the examples, treat them as relative progress indicators and not the actual time needed to complete the task.
  • We won’t get into exception or error-handling. We won’t muddy the code with boilerplate or persistor logic that does not change from day to day. We won’t get into the database connection or other setup resources. The primary focus will be the core data-handling code.

MongoDB vs SQL: Day 14 - Queries

At this point, we’ve covered how to address broad queries that enable us to get everything or one thing from a collection. But we know the hallmark of SQL is its rich querying capability. Lucky for us, MongoDB has rich querying capabilities as well.

The big difference between SQL and MongoDB’s query language is that the latter is not a single string “sentence.” It doesn’t require white spaces in between words, or commas, or parentheses, or quoted characters. Instead, MongoDB uses a “cascade” of operator/operand structures, typically in name:value pairs. In other words, the operand in one structure can be another operator/operand structure.

This makes things very exciting for developers because the same techniques we use in our code to manipulate rich shapes of data going into and coming out of MongoDB - whether it’s Java, Python, JavaScript, etc. - can be used to construct, manipulate, and “parse” our query expressions. In fact, no parser is necessary. It is trivially easy to walk the cascade with standard programming techniques and find fields, values, etc. Because the query expression is a structured cascade and not a single string, this also means that it is easy to incrementally add subexpressions to the query without first having to break it apart into component pieces.

MongoDB vs SQL: Query Examples

Now let’s look at some compare and contrast, side-by-side code examples for queries.

First, we see one of the popular command line interpreters for SQL which is going to fetch us some contacts and phones. This will yield a rectangle, as we saw in earlier posts.

Next, in the MongoDB command line interpreter (“CLI”), we set up the equivalent query. Notice that we can use “dotpath” syntax to address subfields within fields in the rich shape. It is also worth noting that the “equals” operator is so common that as a shortcut, MongoDB interprets name:value as “name = value”, without having to explicitly supply the $eq operator.

Third, we see the equivalent query in Java / JDBC. Note that although the “sentence” is similar, we start to bump into irritants like escaping quotes.

Lastly, we see the equivalent in MongoDB via the Java driver.

We can see that the overall semantics of queries in MongoDB and SQL are the same and follow the common pattern of query setup, issuance of query, and iteration over a cursor.

Let’s look at some more complicated queries now.

In this query, we’re looking for contacts who either have at least one work phone OR have been hired after a specific date. Again, we can see that the equivalent in MongoDB is pretty straightforward. Note the use of dollar signs in the operator names (`$or`, `$gt`) as syntactic sugar. Also note that in both examples it’s important to use an actual date in our comparison, not a string.

The equivalent query in Java / JDBC will look largely the same as before, with a few more escaped quotes.

However, in practice it isn’t as complicated as it appears -- and it actually offers more flexibility than SQL:
  1. First of all, it’s really the same two or three lines just repeated over and over again with different field:value pairs. This makes it easy to cut-and-paste these expressions as you build up your query.
  2. Second, it is simple to dynamically construct filters and queries without worrying about where we are in the predicate path. We don’t have to worry about white space, commas, or parentheses. We don’t have to worry about splicing in a `SORT` statement or dynamically adjusting the names of returned fields sandwiched between `SELECT` and `WHERE`. Parameter substitution is very straightforward and easily coded, especially when dynamic logical `AND` and `OR` statements come into the picture.

If we extrapolate beyond this small code example, it’s evident how easy it is to add more expressions into the $or statement, or to call out to another function that independently crafts a small filtering fragment that we can add to our overall query. As dynamic queries become more complex in SQL, however, the syntactic sugar that makes SQL “human readable” in the CLI begins to work against you in the programmatic construction of a query.

We’ve used the very basic Java query APIs to illustrate the operator/operand nature of the language. We also deliberately chose standard Java HashMap objects to further reduce coupling until the last moment - when we constructed the BasicDBObject to pass to the find() method. For greater convenience, a Builder pattern set of APIs exist as well, but in the end it is still building a cascade of operator/operand structures.

More MongoDB Query Capabilities

MongoDB offers capabilities you come to expect in a full-featured query language including:

  1. Arbitrary sorting on one or more fields, ascending or descending.
  2. Projection, i.e. retrieving only specified fields from each document in the cursor, not the entire document.
  3. Cursor `skip()` and `limit()` to easily implement pagination if desired.
  4. `explain()`, which returns a wealth of information including full details on query path analysis, document and index counts, and estimated vs. actual processing time.

Perhaps the most important feature is the Aggregation Framework (“agg”), MongoDB’s answer to SQL’s GROUP BY clause. Exploring the power of agg is an entire blog in its own right; please see Asya Kamsky’s posts to get an idea of the agg’s power and programmability. For now, here’s an example to get you thinking about it. Suppose we want to count all the different kinds of cell phones owned by our contacts hired before June 1, 2013. Let’s make it a bit more interesting - let’s capture the names of the people who have these phones and only emit those types where more than 1 person has it. Expressed in the MongoDB CLI, we’d try this:

x2 =[
   { $match: { "hiredate": {"$lt": new ISODate("20130601") }}},
   { $unwind: "$phones"},
   { $group: { "_id": "$phones.type",
               "n": {$sum: 1},
               "who": {$push: “$name”},
   { $match: { “n”: {“$gt”: 1}} },
   { $sort: { "n": -1, "_id": 1} }
x2.forEach(function(r) { printjson(r); });

This might yield:

{ "_id" : "mobile", "n" : 3, who: [ “buzz”, “sam”, “dan” ] }
{ "_id" : "work", "n" : 2, who: [ “sam”, “kay” ] }

The important concepts to grasp with agg are:

  1. Data is “flowed” through a pipeline of operations (e.g. `$match` and `$unwind`). Output from each stage is passed to the next. In the example above, we use the `$match` operator twice: once to filter the input set, then a second to filter the group set.
  2. The `$unwind` operator turns arrays of things into “virtual” documents, one for each element of the array, to simplify further processing.
  3. The `$group` operator is extremely powerful and can even create brand new fields based on numeric and string operations of other fields. In particular, as you group data and aggregate on a scalar (e.g. the count of types), you can use the `$push` operator to capture other information related to that aggregation. The output cursor contains very clear, usable rich shapes.
  4. The agg pipeline in the Java driver (in fact, most of the drivers) is simply a List of operators, very similar to what we saw earlier with `find()`. Thus, the same power and flexibility in terms of parameter substitution and subclause conditional inclusion applies to pipeline construction.

Of course, all this functionality is performed efficiently at the engine, not in the client, so millions (or billions) of documents do not have to be dragged across the network.

Next week, we’ll dive into RAD (Rapid Agile Development) and switch over to some Python examples.

For more information on migration, read our migration best practices white paper.
Read the Migration Guide

<< Day 14 (Part 1)

Day 30 (RAD) >>

About the Author - Buzz Moschetti

Buzz is a solutions architect at MongoDB. He was formerly the Chief Architecture Officer of Bear Stearns before joining the Investment Bank division of JPMorganChase as Global Head of Architecture. His areas of expertise include enterprise data design, systems integration, and multi-language tiered software leverage with C/C++, Java, Perl, Python, and Ruby. He holds a bachelor of science degree from the Massachusetts Institute of Technology.