When we last left off in our MongoDB vs SQL blog series, we had covered Day 1 and Day 2 of building the same application with MongoDB and with SQL, comparing the code side by side. Before we jump into the next few days, let’s go over the ground rules again:
- We’ll be using Java
- Assume we have a data access layer between our application and the database
- As for the day counts in the examples, treat them as progress indicators, not as the actual time needed to complete each task.
- We won’t get into exception or error-handling. We won’t muddy the code with boilerplate or persistor logic that does not change from day to day. We won’t get into the database connection or other setup resources. The primary focus will be the core data-handling code.
Now let’s jump into the differences between SQL and MongoDB for Day 3 through Day 5.
SQL vs MongoDB: Day 3
We have already covered saving and fetching data using a Java Map as the data carrier in the Data Access Layer, and adding a few simple fields. For Day 3, we’re going to add some phone numbers to the structure.
The Task: Add A List of Phone Numbers
This is where we were:
Map<String, Object> m = new HashMap<>();
m.put("name", "buzz");
m.put("id", "K1");
m.put("title", "Mr.");
m.put("hireDate", new GregorianCalendar(2011, 11, 1).getTime()); // 0-based month: 11 = December
Each phone number has an associated type, “home” or “work.” I also know that I may want to associate other data with the phone number in the near future, like a “do not call” flag. A list of substructures is a great way to organize this data and gives me plenty of room to grow. It is very easy to add this to my map:
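List<Map<String, Object>> list = new ArrayList<>();
Map<String, Object> n1 = new HashMap<>();
n1.put("type", "work");
n1.put("number", "1-800-555-1212");
n1.put("doNotCall", false); // throw one in now just to test...
list.add(n1);
Map<String, Object> n2 = new HashMap<>();
n2.put("type", "home");
n2.put("number", "1-866-444-3131");
list.add(n2);
m.put("phones", list);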
The persistence code, however, is a different story.
SQL Day 3 - Option 1: Assume Only One Work and One Home Phone Number
This approach is just plain bad, but it’s worth noting because we’ve seen it so many times, often far later than Day 3, when there is strong motivation to avoid creating a new table. With this approach, we’re assuming that each person has at most one home and one work phone number. Let’s take the high road on Day 3 and model this properly in relational form.
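A minimal sketch of what Option 1 implies for the save side, assuming a hypothetical contact table with exactly one home_phone column and one work_phone column (names chosen for illustration only). The phone list has to be squeezed into those two slots, so a second work number, any other phone type, and extra fields such as doNotCall are silently dropped.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical Option 1 layout: the contact row has exactly one home_phone
// column and one work_phone column, so the phone list must be flattened.
class OnePhonePerTypeSketch {

    static Map<String, String> flattenPhones(List<Map<String, Object>> phones) {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("home_phone", null);
        columns.put("work_phone", null);
        for (Map<String, Object> p : phones) {
            String column = p.get("type") + "_phone";
            // Only the first number of a known type fits; later numbers of the
            // same type, unknown types, and extra fields such as doNotCall are lost.
            if (columns.containsKey(column) && columns.get(column) == null) {
                columns.put(column, (String) p.get("number"));
            }
        }
        return columns;
    }

    // Small helper to build one phone substructure like the ones in the map above.
    static Map<String, Object> phone(String type, String number) {
        Map<String, Object> p = new LinkedHashMap<>();
        p.put("type", type);
        p.put("number", number);
        return p;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> phones = new ArrayList<>();
        phones.add(phone("work", "1-800-555-1212"));
        phones.add(phone("work", "1-888-555-0000")); // hypothetical second work number
        phones.add(phone("home", "1-866-444-3131"));
        // Both columns get filled, but the second work number has nowhere to go.
        System.out.println(flattenPhones(phones));
    }
}
```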
SQL Day 3 - Option 2: Proper Approach with Multiple Phone Numbers
Here we’re doing it the right way. We’ve created a separate phones table, keyed back to its contact, and we now use joins to read a contact together with its phones.
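On the save side, the nested list now has to become rows of a separate table. The following is a minimal sketch, not the author’s code: it assumes a hypothetical phones table whose contact_id column is a foreign key to the contact, with hypothetical columns phone_type, phone_number and do_not_call. It only builds the parameter rows, so it runs without a database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PhoneSaveSketch {
    // Hypothetical Option 2 schema (table and column names are assumptions):
    //   CREATE TABLE contact (id VARCHAR(32) PRIMARY KEY, name VARCHAR(64),
    //                         title VARCHAR(16), hire_date DATE);
    //   CREATE TABLE phones  (contact_id VARCHAR(32) REFERENCES contact(id),
    //                         phone_type VARCHAR(16), phone_number VARCHAR(32),
    //                         do_not_call BOOLEAN);
    static final String INSERT_PHONE =
        "INSERT INTO phones (contact_id, phone_type, phone_number, do_not_call) VALUES (?, ?, ?, ?)";

    // Splits the nested "phones" list into one parameter row per phone; each row
    // repeats the parent contact's id so it can serve as the foreign key.
    @SuppressWarnings("unchecked")
    static List<Object[]> phoneRows(Map<String, Object> contact) {
        List<Object[]> rows = new ArrayList<>();
        List<Map<String, Object>> phones = (List<Map<String, Object>>) contact.get("phones");
        if (phones == null) {
            return rows; // a contact with no phones produces no child rows
        }
        for (Map<String, Object> p : phones) {
            rows.add(new Object[] {
                contact.get("id"), p.get("type"), p.get("number"),
                p.getOrDefault("doNotCall", false)});
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, Object> m = new HashMap<>();
        m.put("id", "K1");
        m.put("name", "buzz");
        List<Map<String, Object>> list = new ArrayList<>();
        Map<String, Object> n1 = new HashMap<>();
        n1.put("type", "work");
        n1.put("number", "1-800-555-1212");
        n1.put("doNotCall", false);
        list.add(n1);
        Map<String, Object> n2 = new HashMap<>();
        n2.put("type", "home");
        n2.put("number", "1-866-444-3131");
        list.add(n2);
        m.put("phones", list);
        // One INSERT per phone, in addition to the INSERT for the contact row itself.
        for (Object[] row : phoneRows(m)) {
            System.out.println(INSERT_PHONE + " <- " + Arrays.toString(row));
        }
    }
}
```

In a JDBC-based Data Access Layer, each Object[] would typically be bound to INSERT_PHONE with PreparedStatement.setObject and queued with addBatch, in the same transaction as the contact row. None of it works until the phones table exists in the target database.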
Even this incremental addition of a simple list of data is by no means trivial. We once again encounter the “alter table” problem, because the SQL will fail unless it points at a database that has been converted to the new schema. The coding techniques used to save and fetch a contact are starting to diverge; the save side doesn’t “look” like the fetch side. In particular, fetching data is no longer as simple as building it into the map and passing it back. With joins, one or more (typically many more) of the columns are repeated on every row, once per phone number. Clearly, we don’t want to return such a redundant rectangular structure from the Data Access Layer and burden the application. We must “unwind” the SQL result set and carefully reconstruct the desired output: one name associated with a list of phone numbers and types.
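The unwinding step can be sketched like this. It is a hypothetical illustration, not the original post’s code: to stay runnable without a database, each join row is modeled as a Map instead of a java.sql.ResultSet, and the column names id, name, phone_type and phone_number are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class JoinUnwindSketch {
    // Each input map models one row of a hypothetical inner join, e.g.
    //   SELECT c.id, c.name, p.phone_type, p.phone_number
    //   FROM contact c JOIN phones p ON p.contact_id = c.id
    // The contact columns (id, name) repeat on every row, once per phone.
    static List<Map<String, Object>> unwind(List<Map<String, Object>> rows) {
        Map<Object, Map<String, Object>> byId = new LinkedHashMap<>();
        for (Map<String, Object> row : rows) {
            // The first row seen for a contact creates its map and an empty phone list.
            Map<String, Object> contact = byId.computeIfAbsent(row.get("id"), id -> {
                Map<String, Object> c = new HashMap<>();
                c.put("id", id);
                c.put("name", row.get("name"));
                c.put("phones", new ArrayList<Map<String, Object>>());
                return c;
            });
            // Every row contributes exactly one phone to that contact's list.
            Map<String, Object> phone = new HashMap<>();
            phone.put("type", row.get("phone_type"));
            phone.put("number", row.get("phone_number"));
            phonesOf(contact).add(phone);
        }
        return new ArrayList<>(byId.values());
    }

    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> phonesOf(Map<String, Object> contact) {
        return (List<Map<String, Object>>) contact.get("phones");
    }

    // Builds one simulated join row.
    static Map<String, Object> row(String id, String name, String type, String number) {
        Map<String, Object> r = new HashMap<>();
        r.put("id", id);
        r.put("name", name);
        r.put("phone_type", type);
        r.put("phone_number", number);
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row("K1", "buzz", "work", "1-800-555-1212"));
        rows.add(row("K1", "buzz", "home", "1-866-444-3131"));
        // Two redundant rows collapse into one contact with a two-element phone list.
        System.out.println(unwind(rows));
    }
}
```

Grouping by id in a LinkedHashMap means the rows need not arrive sorted and the original contact order is kept, at the cost of holding every contact in memory; a streaming variant would instead add ORDER BY c.id to the query and emit each contact when the id changes.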
This sort of unwinding work takes time and money. Many rely on ORMs like Hibernate to take care of this, but sooner rather than later, the ORM logic required to unwind a complex SQL query leads to unacceptable performance and/or resource issues, and you end up having to hand-code the unwind logic yourself anyway.
SQL Day 5: Zombies
With SQL, you’ll have to deal with zombies: (z)ero (o)r (m)ore (b)etween (e)ntities. We can’t forget that some people in our contact list do not have phones. Our earlier query is a simple inner join: it repeats each contact’s columns once per matching phone, and it will not return individuals without at least one phone.
To address this, we have to go back and change the query to an outer join. Much more importantly, it also means changing the unwind logic, because we don’t want to add blank phone numbers to our list. This takes even more time and money.
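A minimal sketch of unwind logic adjusted for the outer join, under stated assumptions: rows are modeled as Maps rather than a java.sql.ResultSet so it runs without a database, and the column names id, name, phone_type and phone_number are hypothetical. The essential change is the null check: without it, a contact with no phones would come back with a single blank phone entry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class OuterJoinUnwindSketch {
    // Each input map models one row of a hypothetical LEFT OUTER JOIN, e.g.
    //   SELECT c.id, c.name, p.phone_type, p.phone_number
    //   FROM contact c LEFT OUTER JOIN phones p ON p.contact_id = c.id
    // A contact with zero phones still yields one row, with NULL phone columns.
    static List<Map<String, Object>> unwind(List<Map<String, Object>> rows) {
        Map<Object, Map<String, Object>> byId = new LinkedHashMap<>();
        for (Map<String, Object> row : rows) {
            Map<String, Object> contact = byId.computeIfAbsent(row.get("id"), id -> {
                Map<String, Object> c = new HashMap<>();
                c.put("id", id);
                c.put("name", row.get("name"));
                c.put("phones", new ArrayList<Map<String, Object>>());
                return c;
            });
            // NULL phone columns mean "no phone": keep the contact (already
            // created above) but do not add a blank entry to its phone list.
            if (row.get("phone_number") == null) {
                continue;
            }
            Map<String, Object> phone = new HashMap<>();
            phone.put("type", row.get("phone_type"));
            phone.put("number", row.get("phone_number"));
            phonesOf(contact).add(phone);
        }
        return new ArrayList<>(byId.values());
    }

    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> phonesOf(Map<String, Object> contact) {
        return (List<Map<String, Object>>) contact.get("phones");
    }

    // Builds one simulated join row; HashMap accepts the NULL phone columns.
    static Map<String, Object> row(String id, String name, String type, String number) {
        Map<String, Object> r = new HashMap<>();
        r.put("id", id);
        r.put("name", name);
        r.put("phone_type", type);
        r.put("phone_number", number);
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row("K1", "buzz", "work", "1-800-555-1212"));
        rows.add(row("K1", "buzz", "home", "1-866-444-3131"));
        rows.add(row("K2", "jessie", null, null)); // hypothetical contact with no phones
        System.out.println(unwind(rows));
    }
}
```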
As an aside, even though the SQL-based logic is burdening us, at least we’ve confined the impact to the Data Access Layer. Imagine the impact if we had no Data Access Layer and the applications themselves were constructing SQL and unwind logic. Just adding a list of phone numbers would have been a major undertaking.
MongoDB Day 3
Now let’s look at the same task, this time with MongoDB.
With MongoDB, there is no change. The list of phone numbers, which is actually a list of structures with numbers and types, flows into MongoDB and is natively stored as a list of structures. Just like on day 2, it is our choice to go back and backfill phone information for those entries already in the database. Gone are the burdens of having to set up another table, another set of foreign keys, managing those keys, and adding yet another join into what will ultimately become a very complex SQL expression. We also don’t have to immediately commit to a one-or-more vs. zero-or-more design. The time and effort saved with richly shaped MongoDB documents is significant.
Next week, we’ll dive even deeper as we add externally sourced material to our contact structure and expose the compromises development teams make in SQL / RDBMS in later-stage development.
For more information on migration, read our migration best practices white paper.
*About the Author - Buzz Moschetti*
Buzz is a solutions architect at MongoDB. He was formerly the Chief Architecture Officer of Bear Stearns before joining the Investment Bank division of JPMorgan Chase as Global Head of Architecture. His areas of expertise include enterprise data design, systems integration, and multi-language tiered software leverage with C/C++, Java, Perl, Python, and Ruby. He holds a bachelor of science degree from the Massachusetts Institute of Technology.