Norberto Leite


Learn New Skills in a MongoDB World Pre-Conference Workshop

MongoDB World is just around the corner, and you might be wondering: why should I go? By not attending, you’d be missing out on great educational opportunities! First, there are the conference sessions. You get to connect with MongoDB engineers, who give you an insider’s look at how the database is built and what’s coming on the roadmap. You can get your detailed technical questions answered in a free consulting slot during Ask the Experts. You’ll have the opportunity to meet other MongoDB users who showcase the projects they’ve built, talk about their pains, and discuss where MongoDB fits and doesn’t fit their requirements. Still, for me, the pre-conference workshops we offer are the icing on the cake. These full-day training sessions, delivered by MongoDB engineers, provide the opportunity to experiment with MongoDB in a different context. By attending, you’ll get real, hands-on experience.

What You’ll Learn

Once again, we’re offering two of our most popular workshops, on Ops Manager and data modeling. This year, we’re adding two new topics: how to keep your database secure and how to migrate an application from a relational database to MongoDB. No matter what your skill level is, we have a workshop to fit your needs.

MongoDB Security

In the MongoDB Security workshop, we will give you a set of instances and an app which you will need to secure, from top to bottom. You’ll start at the connection layer, using SSL and X.509 certificates, then move to the underlying encryption of the storage layer by configuring the cluster to use MongoDB’s encrypted storage engine. We’ll also cover auditing, authentication, and role-based access control.

Cleansing Time - 99% SQL Free Applications

Because migrations from relational databases can be cumbersome, in this workshop we’ll go through the different techniques and procedures that make them as painless as possible. You might be wondering where to start, or how to break that annoying super-large join into a nice MongoDB query. We’ll address these and other common obstacles in migration. This workshop is the result of several years of experience helping our customers perform these migrations.

Getting Started with MongoDB Ops Manager

The Getting Started with MongoDB Ops Manager workshop, for system administrators and DBAs, is a great crash course in MongoDB administration. Using a pre-configured Ops Manager installation and a set of AWS instances, we will set up and manage MongoDB clusters through both the Ops Manager UI and API commands. This will be a great way to explore the full set of features that Ops Manager provides.

Data Modeling

Data modeling is a complex exercise, and you want to ensure you analyze all possible options to define your database documents and operations. In this workshop, we will cover the majority of schema design patterns and the trade-offs between different approaches to a common use case.

We want these workshops to be an opportunity for you to learn new skills in a context that allows you to transfer them into your day-to-day work. We limit the class size for each workshop to ensure that you’ll receive individual attention. Sign up soon; prices increase after April 28. Register now

March 28, 2017

How to Perform Random Queries on MongoDB

As part of my day-to-day tasks I occasionally run random queries on datasets. I might need these queries to get a nice example of a random set, or to put together a list of MUG members for a quick raffle or swag winner. To pull off these queries using MongoDB (obviously I’m going to use MongoDB for the task!), we can apply a few different approaches.

The traditional approach

Not long ago, to raffle a few conference tickets at one of the MongoDB User Groups, I invited all members to come up with an implementation of how to get a random document from MongoDB. I was hoping for highly efficient approaches that would involve changing the MongoDB kernel and implementing a new feature in the database, but that’s not what our MUG members came up with. All proposed algorithms were instead client based, meaning that the randomization phase of the algorithm was performed on the client side. The algorithms would be based on the client’s random libraries and would consist of the following approach:

Get data from MongoDB
Run it through a random library
Spill out a winner

A simple example of this type of approach can be the following:

def load_data(collection, n=100):
    for i, d in load_data_file(n):
        collection.insert(d)

def get_data(collection, query={}):
    for d in collection.find(query):
        yield d

Load the entire contents of the collection in memory:

elements = []
for e in get_data(collection):
    elements.append(e)

# randint is inclusive on both ends, so cap it at len(elements) - 1
idx = random.randint(0, len(elements) - 1)
print "AND THE WINNER IS ..... " + elements[idx]['name']

We could adjust the above code to avoid loading the entire contents of the collection into memory at once by using reservoir sampling. Reservoir sampling is generally equivalent to the process described above, except that we sample the winners immediately from the incoming MongoDB result set, as a stream, so that we don’t need to keep all documents in memory at once. The algorithm works by replacing the current winner with decreasing probability, such that all elements (without even knowing how many there are beforehand) have an equal probability of being selected. This is actually how mongodb-js/collection-sample works when communicating with older versions of the server that don’t support the new $sample operator.

Adding a bit of salt ...

Although the previous approach can be slightly improved by adding filters and indexes, it remains client bound for choosing a random document. If we have 1M records (MongoDB User Groups are quite popular!), then iterating over that many documents becomes a burden on the application, and we are not really using any MongoDB magic to help us complete our task.

Adding an incremental value

A typical approach would be to mark our documents with a value that we would then use to randomly pick elements. We can start by setting an incremental value on our documents and then query based on the range of values that we marked the documents with.

def load_data(collection, n=100):
    # for each element we will insert the `i` value
    for i in xrange(n):
        name = ''.join(random.sample(string.letters, 20))
        collection.insert({'name': name, 'i': i})

After tattooing our documents with this new element, we are now able to use some of the magic of the MongoDB query language to start collecting our well-deserved prize winner.
mc = MongoClient()
db = mc.simplerandom
collection = db.names

number_of_documents = 100
load_data(collection, number_of_documents)

# randint is inclusive, so subtract 1 to stay within the inserted range
query = {'i': random.randint(0, number_of_documents - 1)}
winner = collection.find_one(query)
print "AND THE WINNER IS ..... " + winner['name']

While this approach seems fine and dandy, there are a few problems here:

we need to know the number of documents inserted
we need to make sure that all data is available

Double trip to the mother ship

To address the first concern, we need to know the number of documents inserted, which we can get with the MongoDB count operator:

number_of_documents = collection.count()

This operator will immediately tell us the number of elements that a given collection contains, but it does not solve all of our problems. Another thread might be working concurrently, deleting or updating our documents. We need to know the highest value of i, since that might differ from the count of documents in the collection, and we need to account for any eventual skip in the value of i (deleted document, incorrect client increment). To accomplish this in a truly correct way we would need to use distinct to make sure we are not missing any values, and then query for a document that contains such a value.

def load_data(collection, n=100):
    # let's skip some elements
    skiplist = [10, 12, 231, 2, 4]
    for i, d in load_data_file(n):
        d['i'] = i
        if i in skiplist:
            continue
        collection.insert(d)

load_data(collection, 100)

distinct = collection.distinct('i')
# random.sample returns a list, so take its single element
ivalue = random.sample(distinct, 1)[0]
winner = collection.find_one({'i': ivalue})
print "AND THE WINNER IS ..... " + winner['name']

Although we are starting to use MongoDB magic to give us some element of randomness, this is still not a truly good solution. It:

requires multiple trips to the database
becomes computationally bound on the client (again)
is very prone to errors while the data is being used

Making more of the same tattoo

To avoid a large computational burden on the client due to the high number of distinct i values, we could use a limited set of i values to mark our documents and randomize the occurrence of each mark:

def load_data(collection, n=100):
    # fixed number of marks
    max_i = 10
    for j, d in load_data_file(n):
        d['i'] = random.randint(0, max_i)
        collection.insert(d)

This way we are limiting the variation of the i value and limiting the computational task on both the client and MongoDB:

number_of_documents = 100
load_data(collection, number_of_documents)

query = {'i': random.randint(0, 10)}
docs = [x for x in collection.find(query)]

winner = random.sample(docs, 1)[0]
print "AND THE WINNER IS ..... " + winner['name']

... but then again, we have a better solution! The above implementations, no matter how magical they might be, more or less simplistic, with multiple or single trips to the database, tend to:

be inefficient
create artificial workarounds on the data
not be native, pure MongoDB implementations
leave the random selection bound to the client

But fear not!
Our 3.2 release brings a solution to this oft-wished-for task: $sample. $sample is a new aggregation framework operator that implements a native random sample operation over a collection’s data set:

no more playing with extra fields and extra indexes
no more double trips to the database to get your documents
a native, optimized sampling operator implemented in the database

number_of_documents = 100
load_data(collection, number_of_documents)

winner = [d for d in collection.aggregate([{'$sample': {'size': 1}}])][0]
print "AND THE WINNER IS ..... " + winner['name']

Just beautiful! Learn more about the latest features included in the release of MongoDB 3.2: What's new in MongoDB 3.2

About the Author - Norberto Leite

Norberto is a Software Engineer at MongoDB working on the content and materials that make up the MongoDB University curriculum. Prior to this role, Norberto served as Developer Advocate and Solutions Architect helping professionals develop their skills and understanding of MongoDB. Norberto has several years of software development experience in large scalable systems.

February 16, 2016

Santa Claus and His Distributed Team

So here it comes again, that happy season when Santa visits and brings the presents that we’ve earned for being so "well behaved" throughout the year. Well… it happens that there are a lot of requests (6+ billion of them!), requiring the Elves to build a very scalable system to support so many gift deliveries. It’s not only about handling the total number of requests, but also the concentration of these requests around the holiday season. And of course, do not forget about the different types of websites that we need to build according to the regional variations of the requests (wool pajamas are likely to be requested next to the North Pole, but not so much in the tropics!). To make sure that all requests are well served, we need different applications serving this immense variability.

Architecture design

In order to deliver all presents in time for Christmas, Santa has asked the Elf Chief Architect (ECA) to build a distributed, globally scalable platform so that regional Elves could build their apps to meet the needs of their local users. Apparently the Elf Chief Architect has been attending several MongoDB Days conferences and will certainly be attending MongoDB World, but one of the things that intrigued him was a talk about distributed container platforms supported by a MongoDB sharded cluster. The Elf Chief Architect has a great deal of experience scaling databases using MongoDB, but since the use of containers has been gaining a lot of traction, he decided to give it a go. The first thing the ECA did was deploy a container fleet on different data centers across the world.

Schema design

Using tag-aware sharding (a minimal sketch of which appears at the end of this post), the Elf team in Brazil can now build their app for South America with complete independence from the Japanese team.

// brazil team
// node.js
// focus on 4 languages: portuguese, spanish, french, dutch
//   (yes, Suriname and the Dutch Antilles are located in SA!)
// schema design for present requests
{
  _id: "surf_board_12334244",
  color: "azul e amarelo",
  size: "M",
  present_for: "Ze Pequeno",
  address: {
    street: "Rua Noe 10",
    city: "Cidade de Deus",
    zip: "22773-410 - RJ",
    geo: {
      "type": "Point",
      "coordinates": [ -43.36263209581375, -22.949136390313374 ]
    }
    ...
  }
}

// japan team
// ruby
// focus on 2 languages: japanese, english
// schema design for present requests
{
  _id: { inc: 5535345, name: "Maneki Neko" },
  shape: "Round",
  receiver: "Shinobu Tsukasa",
  street: "〒東京都中央区 日本橋3-1-15, 1F",
  city: "Tokio",
  zip: "103-0027 - TK",
  position: {
    "type": "Point",
    "coordinates": [ 136.8573760986328, 35.14882797801035 ]
  }
  ...
}

As one can see from the above misaligned schema designs, the two teams have considerably different views on how to model some important base information. We can start with the simple but quite important format of _id. The Brazil team decided that a simple composed string would be enough to uniquely identify the intended gift, while in Japan they adopted a different strategy, setting _id as a sub-document with 2 fields, an incremental value (inc) and the name of the object (name). While both are valid MongoDB schemas, and both can coexist in the same collection (sharded or not), this situation can cause some "interesting" side effects:

more complex queries to find all intended elements
index inefficiencies due to the multiple data types that we are indexing
sorting issues
ordering of keys in sub-documents will matter for exact match criteria

> db.test.find()
{ "_id" : { "a" : 1234, "b" : 5678 } }
> db.test.find( { _id : { b : 5678, a : 1234 } } )
> db.test.find( { _id : { a : 1234, b : 5678 } } )

The last two queries illustrate the point: an exact match on a sub-document is sensitive to key order, so the query that lists b before a does not match the stored document, while the one that lists a before b does.
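To make the first of those side effects concrete, here is a minimal sketch of what a query that has to account for both _id shapes can look like. It assumes the two example documents above live in the same presents collection; the values are taken from the schemas shown earlier, and the query itself is illustrative rather than part of the original post:

```javascript
// Find given presents across both schema conventions.
// Brazilian documents embed the present name in a composed string _id,
// while Japanese documents keep it in an _id sub-document, so the query
// has to cover both shapes explicitly with $or.
db.presents.find({
  "$or": [
    { "_id": /^surf_board/ },        // Brazilian shape: composed string _id
    { "_id.name": "Maneki Neko" }    // Japanese shape: { inc: ..., name: ... }
  ]
})
```
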
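And here is the tag-aware sharding sketch referred to above. Everything in it is assumed for illustration: the database name, the shard names, and the choice of a region field as part of the shard key are not defined in the post; only the sh.* helpers themselves are the standard shell commands for tag-aware sharding:

```javascript
// Shard a hypothetical presents collection and pin each region's documents
// to the shard running in that region's data center.
sh.enableSharding("northpole")
sh.shardCollection("northpole.presents", { "region": 1, "_id": 1 })

// Tag each shard with the data center that hosts it (shard names are made up).
sh.addShardTag("shardBR", "SOUTH_AMERICA")
sh.addShardTag("shardJP", "ASIA")

// Map shard key ranges to those tags so the balancer keeps the data local.
sh.addTagRange("northpole.presents",
               { "region": "SOUTH_AMERICA", "_id": MinKey },
               { "region": "SOUTH_AMERICA", "_id": MaxKey },
               "SOUTH_AMERICA")
sh.addTagRange("northpole.presents",
               { "region": "ASIA", "_id": MinKey },
               { "region": "ASIA", "_id": MaxKey },
               "ASIA")
```

With a setup along these lines, the Brazilian team's reads and writes stay on the South American shard while the Japanese team's stay on theirs, which is what gives each team its independence.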

December 23, 2015

Spring + MongoDB: Two Leaves of the Same Tree

Norberto Leite spoke at Spring.io last week. In this blog post, he discusses his talk: Spring + MongoDB: Two Leaves of the Same Tree.

Spring Framework and MongoDB

At MongoDB, we believe that integration is key. It is important to integrate with a wide variety of languages and ecosystems, and it is important to expose all the features and functionality that make MongoDB great. We want you to be able to build all kinds of applications on all sorts of different environments using the tools that are best suited for your purpose. As part of this effort of enabling and boosting developer productivity, we support a large variety of programming languages through our official and community drivers, and we enable existing frameworks and application stacks to integrate correctly with MongoDB. Spring Framework, and in particular Spring Data, is a very good example of how one can consolidate the development experience, using familiar and well-understood tools to build applications.

Spring Projects and MongoDB

Spring is one of the most prominent frameworks used across Java enterprise projects. Many applications across a variety of businesses, environments, and stacks rely on Spring projects for many of their integrations and for the general implementation of functionality. Some Spring projects are widely used, like Spring Batch, which offers a generic approach to batch processing, and Spring Boot, which automates a large set of application processes so developers can focus on the business logic and differentiated algorithms of their apps and services. Spring Data offers applications a very powerful ODM that supports not only application-level abstraction of the persistence layer but also an integrated approach to handling data manipulation and the common impedance mismatches that application logic provokes. This presentation discusses a set of features that make this integration "gel" well:

the Spring Data abstraction layer
the way that Spring Data covers object mapping
optimizations that Spring Data enables to make the most out of MongoDB

Batch processing and indexing will also be covered, with particular emphasis on method overriding and query optimization.

Use your tools wisely

There are significant benefits to using ODMs, especially for large, complex projects:

they assure integration with existing components
abstraction layers allow architects to delay decisions and avoid premature optimizations
they capture common patterns that recur across different data stores

But also bear in mind that many of the existing ORMs/ODMs do not have a "document-oriented database first" policy; they have been evolving to adjust to today’s database industry revolution. Many of the implementations are based on an architecture that is oriented to relational technology, and they make significant tradeoffs to accommodate several different systems. Spring Data is one of the most popular and best-designed ORM technologies out there. MongoDB is committed to making sure the integration between these technologies is great.

View Norberto's presentation here.

May 8, 2015

Building your first application with MongoDB: Creating a REST API using the MEAN Stack - Part 2

Updated March 2017: Since this post, other MEAN & MERN stack posts have been written: The Modern Application Stack by Andrew Morgan.

In the first part of this blog series, we covered the basic mechanics of our application and undertook some data modeling. In this second part, we will create tests that validate the behavior of our application and then describe how to set up and run the application.

Write the tests first

Let’s begin by defining some small configuration libraries.

file name: test/config/test_config.js

module.exports = { url : 'http://localhost:8000/api/v1.0' }

Our server will be running on port 8000 on localhost. This will be fine for initial testing purposes. Later, if we change the location or port number for a production system, it would be very easy to just edit this file. To prepare for our test cases, we need to ensure that we have a good test environment. The following code achieves this for us. First, we connect to the database.

file name: test/setup_tests.js

function connectDB(callback) { mongoClient.connect(dbConfig.testDBURL, function(err, db) { assert.equal(null, err); reader_test_db = db; console.log("Connected correctly to server"); callback(0); }); }

Next, we drop the user collection. This ensures that our database is in a known starting state.

function dropUserCollection(callback) { console.log("dropUserCollection"); user = reader_test_db.collection('user'); if (undefined != user) { user.drop(function(err, reply) { console.log('user collection dropped'); callback(0); }); } else { callback(0); } },

Next, we will drop the user feed entry collection.

function dropUserFeedEntryCollection(callback) { console.log("dropUserFeedEntryCollection"); user_feed_entry = reader_test_db.collection('user_feed_entry'); if (undefined != user_feed_entry) { user_feed_entry.drop(function(err, reply) { console.log('user_feed_entry collection dropped'); callback(0); }); } else { callback(0); } }

Next, we will connect to Stormpath and delete all the users in our test application.

function getApplication(callback) { console.log("getApplication"); client.getApplications({ name: SP_APP_NAME }, function(err, applications) { console.log(applications); if (err) { log("Error in getApplications"); throw err; } app = applications.items[0]; callback(0); }); },

function deleteTestAccounts(callback) { app.getAccounts({ email: TU_EMAIL_REGEX }, function(err, accounts) { if (err) throw err; accounts.items.forEach(function deleteAccount(account) { account.delete(function deleteError(err) { if (err) throw err; }); }); callback(0); }); }

Next, we close the database.

function closeDB(callback) { reader_test_db.close(); }

Finally, we call async.series to ensure that all the functions run in the correct order.

async.series([connectDB, dropUserCollection, dropUserFeedEntryCollection, dropUserFeedEntryCollection, getApplication, deleteTestAccounts, closeDB]);

Frisby was briefly mentioned earlier. We will use it to define our test cases, as follows.

file name: test/create_accounts_error_spec.js

TU1_FN = "Test"; TU1_LN = "User1"; TU1_EMAIL = "testuser1@example.com"; TU1_PW = "testUser123"; TU_EMAIL_REGEX = 'testuser*'; SP_APP_NAME = 'Reader Test'; var frisby = require('frisby'); var tc = require('./config/test_config');

We will start with the enroll route in the following code. In this case we are deliberately missing the first name field, so we expect a status reply of 400 with a JSON error saying the first name is undefined.
Let’s “toss that frisby”:

frisby.create('POST missing firstName') .post(tc.url + '/user/enroll', { 'lastName' : TU1_LN, 'email' : TU1_EMAIL, 'password' : TU1_PW }) .expectStatus(400) .expectHeader('Content-Type', 'application/json; charset=utf-8') .expectJSON({'error' : 'Undefined First Name'}) .toss()

In the following example, we are testing a password that does not have any lower-case letters. This would actually result in an error being returned by Stormpath, and we would expect a status reply of 400.

frisby.create('POST password missing lowercase') .post(tc.url + '/user/enroll', { 'firstName' : TU1_FN, 'lastName' : TU1_LN, 'email' : TU1_EMAIL, 'password' : 'TESTUSER123' }) .expectStatus(400) .expectHeader('Content-Type', 'application/json; charset=utf-8') .expectJSONTypes({'error' : String}) .toss()

In the following example, we are testing an invalid email address. So, we can see that there is no @ sign and no domain name in the email address we are passing, and we would expect a status reply of 400.

frisby.create('POST invalid email address') .post(tc.url + '/user/enroll', { 'firstName' : TU1_FN, 'lastName' : TU1_LN, 'email' : "invalid.email", 'password' : 'testUser' }) .expectStatus(400) .expectHeader('Content-Type', 'application/json; charset=utf-8') .expectJSONTypes({'error' : String}) .toss()

Now, let’s look at some examples of test cases that should work. Let’s start by defining 3 users.

file name: test/create_accounts_spec.js

TEST_USERS = [{'fn' : 'Test', 'ln' : 'User1', 'email' : 'testuser1@example.com', 'pwd' : 'testUser123'}, {'fn' : 'Test', 'ln' : 'User2', 'email' : 'testuser2@example.com', 'pwd' : 'testUser123'}, {'fn' : 'Test', 'ln' : 'User3', 'email' : 'testuser3@example.com', 'pwd' : 'testUser123'}] SP_APP_NAME = 'Reader Test'; var frisby = require('frisby'); var tc = require('./config/test_config');

In the following example, we are sending the array of the 3 users we defined above and are expecting a success status of 201. The JSON document returned would show the user object created, so we can verify that what was created matched our test data.

TEST_USERS.forEach(function createUser(user, index, array) { frisby.create('POST enroll user ' + user.email) .post(tc.url + '/user/enroll', { 'firstName' : user.fn, 'lastName' : user.ln, 'email' : user.email, 'password' : user.pwd }) .expectStatus(201) .expectHeader('Content-Type', 'application/json; charset=utf-8') .expectJSON({ 'firstName' : user.fn, 'lastName' : user.ln, 'email' : user.email }) .toss() });

Next, we will test for a duplicate user. In the following example, we will try to create a user where the email address already exists.

frisby.create('POST enroll duplicate user ') .post(tc.url + '/user/enroll', { 'firstName' : TEST_USERS[0].fn, 'lastName' : TEST_USERS[0].ln, 'email' : TEST_USERS[0].email, 'password' : TEST_USERS[0].pwd }) .expectStatus(400) .expectHeader('Content-Type', 'application/json; charset=utf-8') .expectJSON({'error' : 'Account with that email already exists. Please choose another email.'}) .toss()

One important issue is that we don’t know what API key will be returned by Stormpath a priori. So, we need to create a file dynamically that looks like the following. We can then use this file to define test cases that require us to authenticate a user.
file name: /tmp/readerTestCreds.js

TEST_USERS = [{ "_id":"54ad6c3ae764de42070b27b1", "email":"testuser1@example.com", "firstName":"Test", "lastName":"User1", "sp_api_key_id":" ", "sp_api_key_secret":" " }, { "_id":"54ad6c3be764de42070b27b2", "email":"testuser2@example.com", "firstName":"Test", "lastName":"User2", "sp_api_key_id":" ", "sp_api_key_secret":" " }]; module.exports = TEST_USERS;

In order to create the temporary file above, we need to connect to MongoDB and retrieve user information. This is achieved by the following code.

file name: tests/writeCreds.js

TU_EMAIL_REGEX = new RegExp('^testuser*'); SP_APP_NAME = 'Reader Test'; TEST_CREDS_TMP_FILE = '/tmp/readerTestCreds.js'; var async = require('async'); var dbConfig = require('./config/db.js'); var mongodb = require('mongodb'); assert = require('assert'); var mongoClient = mongodb.MongoClient var reader_test_db = null; var users_array = null; function connectDB(callback) { mongoClient.connect(dbConfig.testDBURL, function(err, db) { assert.equal(null, err); reader_test_db = db; callback(null); }); } function lookupUserKeys(callback) { console.log("lookupUserKeys"); user_coll = reader_test_db.collection('user'); user_coll.find({email : TU_EMAIL_REGEX}).toArray(function(err, users) { users_array = users; callback(null); }); } function writeCreds(callback) { var fs = require('fs'); fs.writeFileSync(TEST_CREDS_TMP_FILE, 'TEST_USERS = '); fs.appendFileSync(TEST_CREDS_TMP_FILE, JSON.stringify(users_array)); fs.appendFileSync(TEST_CREDS_TMP_FILE, '; module.exports = TEST_USERS;'); callback(0); } function closeDB(callback) { reader_test_db.close(); } async.series([connectDB, lookupUserKeys, writeCreds, closeDB]);

In the following code, we can see that the first line uses the temporary file that we created with the user information. We have also defined several feeds, such as Dilbert and the Eater Blog.

file name: tests/feed_spec.js

TEST_USERS = require('/tmp/readerTestCreds.js'); var frisby = require('frisby'); var tc = require('./config/test_config'); var async = require('async'); var dbConfig = require('./config/db.js'); var dilbertFeedURL = 'http://feeds.feedburner.com/DilbertDailyStrip'; var nycEaterFeedURL = 'http://feeds.feedburner.com/eater/nyc';

Previously, we defined some users but none of them had subscribed to any feeds. In the following code we test feed subscription. Note that authentication is required now and this is achieved using .auth with the Stormpath API keys. Our first test is to check for an empty feed list.

function addEmptyFeedListTest(callback) { var user = TEST_USERS[0]; frisby.create('GET empty feed list for user ' + user.email) .get(tc.url + '/feeds') .auth(user.sp_api_key_id, user.sp_api_key_secret) .expectStatus(200) .expectHeader('Content-Type', 'application/json; charset=utf-8') .expectJSON({feeds : []}) .toss() callback(null); }

In our next test case, we will subscribe our first test user to the Dilbert feed.

function subOneFeed(callback) { var user = TEST_USERS[0]; frisby.create('PUT Add feed sub for user ' + user.email) .put(tc.url + '/feeds/subscribe', {'feedURL' : dilbertFeedURL}) .auth(user.sp_api_key_id, user.sp_api_key_secret) .expectStatus(201) .expectHeader('Content-Type', 'application/json; charset=utf-8') .expectJSONLength('user.subs', 1) .toss() callback(null); }

In our next test case, we will try to subscribe our first test user to a feed that they are already subscribed to.
function subDuplicateFeed(callback) { var user = TEST_USERS[0]; frisby.create('PUT Add duplicate feed sub for user ' + user.email) .put(tc.url + '/feeds/subscribe', {'feedURL' : dilbertFeedURL}) .auth(user.sp_api_key_id, user.sp_api_key_secret) .expectStatus(201) .expectHeader('Content-Type', 'application/json; charset=utf-8') .expectJSONLength('user.subs', 1) .toss() callback(null); }

Next, we will subscribe our test user to a new feed. The result returned should confirm that the user is now subscribed to 2 feeds.

function subSecondFeed(callback) { var user = TEST_USERS[0]; frisby.create('PUT Add second feed sub for user ' + user.email) .put(tc.url + '/feeds/subscribe', {'feedURL' : nycEaterFeedURL}) .auth(user.sp_api_key_id, user.sp_api_key_secret) .expectStatus(201) .expectHeader('Content-Type', 'application/json; charset=utf-8') .expectJSONLength('user.subs', 2) .toss() callback(null); }

Next, we will use our second test user to subscribe to a feed.

function subOneFeedSecondUser(callback) { var user = TEST_USERS[1]; frisby.create('PUT Add one feed sub for second user ' + user.email) .put(tc.url + '/feeds/subscribe', {'feedURL' : nycEaterFeedURL}) .auth(user.sp_api_key_id, user.sp_api_key_secret) .expectStatus(201) .expectHeader('Content-Type', 'application/json; charset=utf-8') .expectJSONLength('user.subs', 1) .toss() callback(null); }

async.series([addEmptyFeedListTest, subOneFeed, subDuplicateFeed, subSecondFeed, subOneFeedSecondUser]);

The REST API

Before we begin writing our REST API code, we need to define some utility libraries. First, we need to define how our application will connect to the database. Putting this information into a file gives us the flexibility to add different database URLs for development or production systems.

file name: config/db.js

module.exports = { url : 'mongodb://localhost/reader_test' }

If we wanted to turn on database authentication we could put that information in a file, as shown below. This file should not be checked into source code control for obvious reasons.

file name: config/security.js

module.exports = { stormpath_secret_key : 'YOUR STORMPATH APPLICATION KEY' }

We can keep the Stormpath API and secret keys in a properties file, as follows, and we need to carefully manage this file as well.

file name: config/stormpath_apikey.properties

apiKey.id = YOUR STORMPATH API KEY ID apiKey.secret = YOUR STORMPATH API KEY SECRET

Express.js overview

In Express.js, we create an “application” (app). This application listens on a particular port for HTTP requests to come in. When requests come in, they pass through a middleware chain. Each link in the middleware chain is given a req (request) object and a res (response) object to store the results. Each link can choose to do work, or pass it to the next link. We add new middleware via app.use(). The main middleware is called our “router”, which looks at the URL and routes each different URL/verb combination to a specific handler function.

Creating our application

Now we can finally see our application code, which is quite small since we can embed handlers for various routes into separate files.
file name: server.js

var express = require('express'); var bodyParser = require('body-parser'); var mongoose = require('mongoose'); var stormpath = require('express-stormpath'); var routes = require("./app/routes"); var db = require('./config/db'); var security = require('./config/security'); var app = express(); var morgan = require('morgan'); app.use(morgan); app.use(stormpath.init(app, { apiKeyFile: './config/stormpath_apikey.properties', application: 'YOUR SP APPLICATION URL', secretKey: security.stormpath_secret_key })); var port = 8000; mongoose.connect(db.url); app.use(bodyParser.urlencoded({ extended: true })); routes.addAPIRouter(app, mongoose, stormpath);

We define our own middleware at the end of the chain to handle bad URLs.

app.use(function(req, res, next){ res.status(404); res.json({ error: 'Invalid URL' }); });

Now our server application is listening on port 8000.

app.listen(port);

Let’s print a message on the console to the user.

console.log('Magic happens on port ' + port); exports = module.exports = app;

Defining our Mongoose data models

We use Mongoose to map objects on the Node.js side to documents inside MongoDB. Recall that earlier we defined 4 collections:

Feed collection.
Feed entry collection.
User collection.
User feed-entry-mapping collection.

So we will now define schemas for these 4 collections. Let’s begin with the user schema. Notice that we can also format the data, such as converting strings to lowercase, and remove leading or trailing whitespace using trim.

file name: app/routes.js

var userSchema = new mongoose.Schema({ active: Boolean, email: { type: String, trim: true, lowercase: true }, firstName: { type: String, trim: true }, lastName: { type: String, trim: true }, sp_api_key_id: { type: String, trim: true }, sp_api_key_secret: { type: String, trim: true }, subs: { type: [mongoose.Schema.Types.ObjectId], default: [] }, created: { type: Date, default: Date.now }, lastLogin: { type: Date, default: Date.now }, }, { collection: 'user' } );

In the following code, we can also tell Mongoose what indexes need to exist. Mongoose will ensure that these indexes are created if they do not already exist in our MongoDB database. The unique constraint ensures that duplicates are not allowed. The "email : 1" maintains email addresses in ascending order. If we used "email : -1" it would be in descending order.

userSchema.index({email : 1}, {unique:true}); userSchema.index({sp_api_key_id : 1}, {unique:true});

We repeat the process for the other 3 collections.
var UserModel = mongoose.model( 'User', userSchema );

var feedSchema = new mongoose.Schema({ feedURL: { type: String, trim:true }, link: { type: String, trim:true }, description: { type: String, trim:true }, state: { type: String, trim:true, lowercase:true, default: 'new' }, createdDate: { type: Date, default: Date.now }, modifiedDate: { type: Date, default: Date.now }, }, { collection: 'feed' } ); feedSchema.index({feedURL : 1}, {unique:true}); feedSchema.index({link : 1}, {unique:true, sparse:true}); var FeedModel = mongoose.model( 'Feed', feedSchema );

var feedEntrySchema = new mongoose.Schema({ description: { type: String, trim:true }, title: { type: String, trim:true }, summary: { type: String, trim:true }, entryID: { type: String, trim:true }, publishedDate: { type: Date }, link: { type: String, trim:true }, feedID: { type: mongoose.Schema.Types.ObjectId }, state: { type: String, trim:true, lowercase:true, default: 'new' }, created: { type: Date, default: Date.now }, }, { collection: 'feedEntry' } ); feedEntrySchema.index({entryID : 1}); feedEntrySchema.index({feedID : 1}); var FeedEntryModel = mongoose.model( 'FeedEntry', feedEntrySchema );

var userFeedEntrySchema = new mongoose.Schema({ userID: { type: mongoose.Schema.Types.ObjectId }, feedEntryID: { type: mongoose.Schema.Types.ObjectId }, feedID: { type: mongoose.Schema.Types.ObjectId }, read : { type: Boolean, default: false }, }, { collection: 'userFeedEntry' } );

The following is an example of a compound index on 4 fields. Each index is maintained in ascending order.

userFeedEntrySchema.index({userID : 1, feedID : 1, feedEntryID : 1, read : 1}); var UserFeedEntryModel = mongoose.model('UserFeedEntry', userFeedEntrySchema );

Every route that comes in for GET, POST, PUT and DELETE needs to have the correct content type, which is application/json. Then the next link in the chain is called.

exports.addAPIRouter = function(app, mongoose, stormpath) { app.get('/*', function(req, res, next) { res.contentType('application/json'); next(); }); app.post('/*', function(req, res, next) { res.contentType('application/json'); next(); }); app.put('/*', function(req, res, next) { res.contentType('application/json'); next(); }); app.delete('/*', function(req, res, next) { res.contentType('application/json'); next(); });

Now we need to define handlers for each combination of URL/verb. The link to the complete code is available in the resources section and we just show a few examples below. Note the ease with which we can use Stormpath. Furthermore, notice that we have defined /api/v1.0, so the client would actually call /api/v1.0/user/enroll, for example. In the future, if we changed the API, say to 2.0, we could use /api/v2.0. This would have its own router and code, so clients using the v1.0 API would still continue to work.

var router = express.Router(); router.post('/user/enroll', function(req, res) { logger.debug('Router for /user/enroll'); … } router.get('/feeds', stormpath.apiAuthenticationRequired, function(req, res) { logger.debug('Router for /feeds'); … } router.put('/feeds/subscribe', stormpath.apiAuthenticationRequired, function(req, res) { logger.debug('Router for /feeds'); … } app.use('/api/v1.0', router); }

Starting the server and running tests

Finally, here is a summary of the steps we need to follow to start the server and run the tests.
Ensure that the MongoDB instance is running:

mongod

Install the Node libraries:

npm install

Start the REST API server:

node server.js

Run the test cases:

node setup_tests.js
jasmine-node create_accounts_error_spec.js
jasmine-node create_accounts_spec.js
node write_creds.js
jasmine-node feed_spec.js

MongoDB University provides excellent free training. There is a course specifically aimed at Node.js developers and the link can be found in the resources section below. The resources section also contains links to good MongoDB data modeling resources.

Resources

HTTP status code definitions
Chad Tindel’s Github Repository
M101JS: MongoDB for Node.js Developers
Data Models
Data Modeling Considerations for MongoDB Applications

Want to learn more about MongoDB? Explore our Starter-Kit: MongoDB Starter-Kit

<< Read Part 1

About the Author - Norberto

Norberto Leite is Technical Evangelist at MongoDB. Norberto has been working for the last 5 years on large scalable and distributable application environments, both as advisor and engineer. Prior to MongoDB Norberto served as a Big Data Engineer at Telefonica.

April 16, 2015

Building your first application with MongoDB: Creating a REST API using the MEAN Stack - Part 1

If you’re looking for REST-like access to your Atlas data, check out the new Data API to get instantly generated endpoints for reading and modifying your data over HTTPS.

Introduction

In this 2-part blog series, you will learn how to use MongoDB and Mongoose Object Data Mapping (ODM) with Express.js and Node.js. These technologies use a uniform language - JavaScript - providing performance gains in the software and productivity gains for developers. In this first part, we will describe the basic mechanics of our application and undertake data modeling. In the second part, we will create tests that validate the behavior of our application and then describe how to set up and run the application. No prior experience with these technologies is assumed and developers of all skill levels should benefit from this blog series. So, if you have no previous experience using MongoDB, JavaScript or building a REST API, don’t worry - we will cover these topics with enough detail to get you past the simplistic examples one tends to find online, including authentication, structuring code in multiple files, and writing test cases. Let’s begin by defining the MEAN stack.

What is the MEAN stack?

The MEAN stack can be summarized as follows:

M = MongoDB/Mongoose.js: the popular database, and an elegant ODM for Node.js.
E = Express.js: a lightweight web application framework.
A = Angular.js: a robust framework for creating HTML5 and JavaScript-rich web applications.
N = Node.js: a server-side JavaScript interpreter.

The MEAN stack is a modern replacement for the LAMP (Linux, Apache, MySQL, PHP/Python) stack that became the popular way for building web applications in the late 1990s. In our application, we won’t be using Angular.js, as we are not building an HTML user interface. Instead, we are building a REST API which has no user interface, but could instead serve as the basis for any kind of interface, such as a website, an Android application, or an iOS application. You might say we are building our REST API on the ME(a)N stack, but we have no idea how to pronounce that!

What is a REST API?

REST stands for Representational State Transfer. It is a lighter-weight alternative to SOAP and WSDL XML-based API protocols. REST uses a client-server model, where the server is an HTTP server and the client sends HTTP verbs (GET, POST, PUT, DELETE), along with a URL and variable parameters that are URL-encoded. The URL describes the object to act upon and the server replies with a result code and valid JavaScript Object Notation (JSON). Because the server replies with JSON, the MEAN stack is particularly well suited for our application, as all the components are in JavaScript and MongoDB interacts well with JSON. We will see some JSON examples later, when we start defining our data models. The CRUD acronym is often used to describe database operations. CRUD stands for CREATE, READ, UPDATE, and DELETE. These database operations map very nicely to the HTTP verbs, as follows:

POST: A client wants to insert or create an object.
GET: A client wants to read an object.
PUT: A client wants to update an object.
DELETE: A client wants to delete an object.

These operations will become clear later when we define our API. Some of the common HTTP result codes that are often used inside REST APIs are as follows:

200 - “OK”.
201 - “Created” (Used with POST).
400 - “Bad Request” (Perhaps missing required parameters).
401 - “Unauthorized” (Missing authentication parameters).
403 - “Forbidden” (You were authenticated but lacking required privileges).
404 - “Not Found”.

A complete description can be found in the RFC document, listed in the resources section at the end of this blog. We will use these result codes in our application and you will see some examples shortly.

Why Are We Starting with a REST API?

Developing a REST API enables us to create a foundation upon which we can build all other applications. As previously mentioned, these applications may be web-based or designed for specific platforms, such as Android or iOS. Today, there are also many companies that are building applications that do not use an HTTP or web interface, such as Uber, WhatsApp, Postmates, and Wash.io. A REST API also makes it easy to implement other interfaces or applications over time, turning the initial project from a single application into a powerful platform.

Creating our REST API

The application that we will be building will be an RSS aggregator, similar to Google Reader. Our application will have two main components:

The REST API
Feed Grabber (similar to Google Reader)

In this blog series we will focus on building the REST API, and we will not cover the intricacies of RSS feeds. However, code for Feed Grabber is available in a github repository, listed in the resources section of this blog. Let’s now describe the process we will follow in building our API. We will begin by defining the data model for the following requirements:

Store user information in user accounts
Track RSS feeds that need to be monitored
Pull feed entries into the database
Track user feed subscriptions
Track which feed entry a user has already read

Users will need to be able to do the following:

Create an account
Subscribe/unsubscribe to feeds
Read feed entries
Mark feeds/entries as read or unread

Modeling Our Data

An in-depth discussion on data modeling in MongoDB is beyond the scope of this article, so see the references section for good resources on this topic. We will need 4 collections to manage this information:

Feed collection
Feed entry collection
User collection
User-feed-entry mapping collection

Let’s take a closer look at each of these collections.

Feed Collection

Let’s now look at some code. To model a feed collection, we can use the following JSON document:

{ "_id": ObjectId("523b1153a2aa6a3233a913f8"), "requiresAuthentication": false, "modifiedDate": ISODate("2014-08-29T17:40:22Z"), "permanentlyRemoved": false, "feedURL": "http://feeds.feedburner.com/eater/nyc", "title": "Eater NY", "bozoBitSet": false, "enabled": true, "etag": "4bL78iLSZud2iXd/vd10mYC32BE", "link": "http://ny.eater.com/", "permanentRedirectURL": null, "description": "The New York City Restaurant, Bar, and Nightlife Blog" }

If you are familiar with relational database technology, then you will know about databases, tables, rows and columns. In MongoDB, there is a mapping for most of these relational concepts. At the highest level, a MongoDB deployment supports one or more databases. A database contains one or more collections, which are similar to tables in a relational database. Collections hold documents. Each document in a collection is, at the highest level, similar to a row in a relational table. However, documents do not follow a fixed schema with pre-defined columns of simple values.
Instead, each document consists of one or more key-value pairs where the value can be simple (e.g., a date) or more sophisticated (e.g., an array of address objects). Our JSON document above is an example of one RSS feed for the Eater Blog, which tracks information about restaurants in New York City. We can see that there are a number of different fields, but the key ones that our client application may be interested in include the URL of the feed and the feed description. The description is important so that, if we create a mobile application, it could show a nice summary of the feed. The remaining fields in our JSON document are for internal use. A very important field is _id. In MongoDB, every document must have a field called _id. If you create a document without this field, at the point where you save the document, MongoDB will create it for you. In MongoDB, this field is a primary key and MongoDB will guarantee that, within a collection, this value is unique.

Feed Entry Collection

After feeds, we want to track feed entries. Here is an example of a document in the feed entry collection:

{ "_id": ObjectId("523b1153a2aa6a3233a91412"), "description": "Buzzfeed asked a bunch of people...", "title": "Cronut Mania: Buzzfeed asked a bunch of people...", "summary": "Buzzfeed asked a bunch of people that were...", "content": [{ "base": "http://ny.eater.com/", "type": "text/html", "value": "LOTS OF HTML HERE ", "language": "en" }], "entryID": "tag:ny.eater.com,2013://4.560508", "publishedDate": ISODate("2013-09-17T20:45:20Z"), "link": "http://ny.eater.com/archives/2013/09/cronut_mania_41.php", "feedID": ObjectId("523b1153a2aa6a3233a913f8") }

Again, we can see that there is an _id field. There are also some other fields, such as description, title and summary. For the content field, note that we are using an array, and the array is also storing a document. MongoDB allows us to store sub-documents in this way, and this can be very useful in situations where we want to hold all information together. The entryID field uses the tag format to avoid duplicate feed entries. Notice also the feedID field of type ObjectId - the value is the _id of the Eater Blog document, described earlier. This provides a referential model, similar to a foreign key in a relational database. So, if we were interested in seeing the feed document associated with this ObjectId, we could take the value 523b1153a2aa6a3233a913f8 and query the feed collection on _id, and it would return the Eater Blog document (a short shell sketch of this lookup appears a little further below).

User Collection

Here is the document we could use to keep track of users:

{ "_id" : ObjectId("54ad6c3ae764de42070b27b1"), "active" : true, "email" : "testuser1@example.com", "firstName" : "Test", "lastName" : "User1", "sp_api_key_id" : "6YQB0A8VXM0X8RVDPPLRHBI7J", "sp_api_key_secret" : "veBw/YFx56Dl0bbiVEpvbjF", "lastLogin" : ISODate("2015-01-07T17:26:18.996Z"), "created" : ISODate("2015-01-07T17:26:18.995Z"), "subs" : [ ObjectId("523b1153a2aa6a3233a913f8"), ObjectId("54b563c3a50a190b50f4d63b") ], }

A user has an email address, a first name and a last name. There is also an sp_api_key_id and an sp_api_key_secret - we will use these later with Stormpath, a user management API. The last field, called subs, is a subscription array. The subs field tells us which feeds this user is subscribed to.

User-Feed-Entry Mapping Collection

The last collection allows us to map users to feeds and to track which feeds have been read.
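As an aside, here is the referential lookup just described, sketched in the mongo shell. This is illustrative rather than part of the original application code; the collection name feed matches the Mongoose schema definition shown in Part 2 of this series, and the ObjectId is the one from the Eater Blog feed document above:

```javascript
// Follow the feedID reference from the feed entry back to its feed document.
// This behaves like a foreign key lookup, except that the application
// (not the database) is the one doing the join.
db.feed.findOne({ "_id": ObjectId("523b1153a2aa6a3233a913f8") })
// returns the "Eater NY" feed document shown earlier
```

Here is that mapping document: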
{ "_id" : ObjectId("523b2fcc054b1b8c579bdb82"), "read" : true, "user_id" : ObjectId("54ad6c3ae764de42070b27b1"), "feed_entry_id" : ObjectId("523b1153a2aa6a3233a91412"), "feed_id" : ObjectId("523b1153a2aa6a3233a913f8") } We use a Boolean (true/false) to mark the feed as read or unread. Functional Requirements for the REST API As previously mentioned, users need to be able to do the following: Create an account. Subscribe/unsubscribe to feeds. Read feed entries. Mark feeds/entries as read or unread. Additionally, a user should be able to reset their password. The following table shows how these operations can be mapped to HTTP routes and verbs. Route Verb Description Variables /user/enroll POST Register a new user firstName lastName email password /user/resetPassword PUT Password Reset email /feeds GET Get feed subscriptions for each user with description and unread count /feeds/subscribe PUT Subscribe to a new feed feedURL /feeds/entries GET Get all entries for feeds the user is subscribed to /feeds/&ltfeedid&gt/entries GET Get all entries for a specific feed /feeds/&ltfeedid&gt PUT Mark all entries for a specific feed as read or unread read = &lttrue | false&gt /feeds/&ltfeedid&gt/entries/&ltentryid&gt PUT Mark a specific entry as either read or unread read = &lttrue | false&gt /feeds/&ltfeedid&gt DELETE Unsubscribe from this particular feed In a production environment, the use of secure HTTP (HTTPS) would be the standard approach when sending sensitive details, such as passwords. Real World Authentication with Stormpath In robust real-world applications it is important to provide user authentication. We need a secure approach to manage users, passwords, and password resets. There are a number of ways we could authenticate users for our application. One possibility is to use Node.js with the Passport Plugin, which could be useful if we wanted to authenticate with social media accounts, such as Facebook or Twitter. However, another possibility is to use Stormpath. Stormpath provides User Management as a Service and supports authentication and authorization through API keys. Basically, Stormpath maintains a database of user details and passwords and a client application REST API would call the Stormpath REST API to perform user authentication. The following diagram shows the flow of requests and responses using Stormpath. In detail, Stormpath will provide a secret key for each &#x201C;Application&#x201D; that is defined with their service. For example, we could define an application as &#x201C;Reader Production&#x201D; or &#x201C;Reader Test&#x201D;. This could be very useful when we are still developing and testing our application, as we may be frequently adding and deleting test users. Stormpath will also provide an API Key Properties file. Stormpath also allows us to define password strength requirements for each application, such as: Must have >= 8 characters. Must include lowercase and uppercase. Must include a number. Must include a non-alphabetic character Stormpath keeps track of all of our users and assigns them API keys, which we can use for our REST API authentication. This greatly simplifies the task of building our application, as we don&#x2019;t have to focus on writing code for authenticating users. Node.js Node.js is a runtime environment for server-side and network applications. Node.js uses JavaScript and it is available for many different platforms, such as Linux, Microsoft Windows and Apple OS X. 
Node.js applications are built using many library modules and there is a very rich ecosystem of libraries available, some of which we will use to build our application. To start using Node.js, we need to define a package.json file describing our application and all of its library dependencies. The Node.js Package Manager installs copies of the libraries in a subdirectory, called node_modules/, in the application directory. This has benefits, as it isolates the library versions for each application and so avoids code compatibility problems if the libraries were to be installed in a standard system location, such as /usr/lib, for example. The command npm install will create the node_modules/ directory, with all of the required libraries. Here is the JavaScript from our package.json file:

{ "name": "reader-api", "main": "server.js", "dependencies": { "express" : "~4.10.0", "stormpath" : "~0.7.5", "express-stormpath" : "~0.5.9", "mongodb" : "~1.4.26", "mongoose" : "~3.8.0", "body-parser" : "~1.10.0", "method-override" : "~2.3.0", "morgan" : "~1.5.0", "winston" : "~0.8.3", "express-winston" : "~0.2.9", "validator" : "~3.27.0", "path" : "~0.4.9", "errorhandler" : "~1.3.0", "frisby" : "~0.8.3", "jasmine-node" : "~1.14.5", "async" : "~0.9.0" } }

Our application is called reader-api. The main file is called server.js. Then we have a list of the dependent libraries and their versions. Some of these libraries are designed for parsing the HTTP queries. The test harness we will use is called frisby. The jasmine-node library is used to run frisby scripts. One library that is particularly important is async. If you have never used node.js, it is important to understand that node.js is designed to be asynchronous. So, any function which does blocking input/output (I/O), such as reading from a socket or querying a database, will take a callback function as the last parameter, and then continue with the control flow, only returning to that callback function once the blocking operation has completed. Let’s look at the following simple example to demonstrate this.

function foo() { someAsyncFunction(params, function(err, results) { console.log("one"); }); console.log("two"); }

In the above example, we may think that the output would be:

one
two

but in fact it might be:

two
one

because the line that prints "one" might happen later, asynchronously, in the callback. We say "might" because if conditions are just right, "one" might print before "two". This element of uncertainty in asynchronous programming is called non-deterministic execution. For many programming tasks, this is actually desirable and permits high performance, but clearly there are times when we want to execute functions in a particular order. The following example shows how we could use the async library to achieve the desired result of printing the numbers in the correct order:

actionArray = [ function one(cb) { someAsyncFunction(params, function(err, results) { if (err) { cb(new Error("There was an error")); } console.log("one"); cb(null); }); }, function two(cb) { console.log("two"); cb(null); } ] async.series(actionArray);

In the above code, we are guaranteed that function two will only be called after function one has completed.

Wrapping Up Part 1

Now that we have seen the basic mechanics of node.js and async function setup, we are ready to move on. Rather than move into creating the application, we will instead start by creating tests that validate the behavior of the application.
This approach is called test-driven development and has two very good features:

It helps the developer really understand how data and functions are consumed, and often exposes subtle needs like the ability to return 2 or more things in an array instead of just one thing.
By writing tests before building the application, the paradigm becomes “broken / unimplemented until proven tested OK” instead of “assumed to be working until a test fails.” The former is a “safer” way to keep the code healthy.

Learn more about MongoDB by exploring our Starter-Kit: MongoDB Starter-Kit

Read Part 2 >>

About the Author - Norberto

Norberto Leite is Technical Evangelist at MongoDB. Norberto has been working for the last 5 years on large scalable and distributable application environments, both as advisor and engineer. Prior to MongoDB Norberto served as a Big Data Engineer at Telefonica.

April 14, 2015

How Santa Uses MongoDB Part 2: Delivering Presents and Data Close To You!

Which Presents Should Be Packed Together?

In the past, one magical bag fit all, but Santa learned that efficiently grouping presents into several different bags can really reduce his carbon footprint and save on reindeer overhead. And since Christmas is celebrated at different times across the globe, Santa needs to follow the sun when delivering presents. The number of bags and their contents is one consideration, but so is when and where. Santa knows that time zone is an important field to take into account:

db.presents.update(
  { "location.address.country": {"$in": ["PT", "UK", "IR" ] } },
  {"$set": {"location.time_zone": "WET"}}
)

Because MongoDB has a very flexible data structure, Master Elf DBA can easily make changes to the application without migrating the data. Santa and his team can easily adapt to the evolving needs of the holiday season without ever asking the world to reschedule Christmas:

{ "_id": 123300410230, "name": "Norberto Leite", "location":{ "geo": { "type": "Point", "coordinates": [ -8.611720204353333, 41.14341253682037] }, "address" : { "country": "PT", "city": "Porto", "street": "Rua Escura 1", "zip": [4000, "Porto"], "time_zone": "WET" }, }, }

With this information set on our schema, Santa can now allocate presents into bags for efficient delivery in each time zone. Santa needs to make sure that the Elves organize the bags optimally. Elves are very cool developers, true hackers, and know that MongoDB offers an aggregation framework that allows them to perform real-time analytics and demanding aggregation queries. The Elves collect information on how to group presents by time zone using a document:

{ "_id": 123300410230, "name": "Norberto Leite", "location":{ "geo": { "type": "Point", "coordinates": [ -8.611720204353333, 41.14341253682037] }, "address" : { "country": "PT", "city": "Porto", "street": "Rua Escura 1", "zip": [4000, "Porto"], }, "time_zone": "WET" }, "present": { "type": "game console", "brand": "NotPlaystation", "name": "PS", "model": "4" }, }

The Elves can then run queries that collect and group the presents based on time zone and present name:

// collect all presents for Santa's Bag for the EST time zone
db.presents.aggregate( [
  { "$match": { "location.time_zone": "EST" } },
  // group and count the number of presents per present
  {"$group" : { "_id": "$present.name", "numberOf": {"$sum":1} } }
] )

This flexibility and power makes the Elves feel very good about themselves!

Santa Reads Each Letter One By One

First of all, paper letters are not in fashion anymore. Santa uses email. For the traditionalists that like to send paper letters, we’re pretty sure that Santa is using a MongoDB client with technology to process ordinary paper letters, digitize them, and deliver them via email. (In addition to email you can also reach Santa by Twitter or even by Facebook, although if you want to keep your present requests private and away from social criticism, be careful about that!) Santa will decide if you deserve your present requests by checking if you’ve been a good boy or girl on your Facebook and Twitter profiles! OK, that’s not true. Santa asks your parents for feedback -- Parental Control 101! Be good to your mom and dad and you’ll get your presents.
Santa uses your social profile to get in touch with your parents and asks them to validate that you have been a good boy or girl this year:

db.presents.update(
  { "_id": 123300410230 },
  { "$set": {
      "present.approved": {
        "whom": "dad",
        "when": ISODate("2014-12-18T11:27:05.228Z")
      }
  }}
)

Santa will only deliver the presents that have approval. Therefore we need to check which presents have the approval field:

db.presents.find({ "present.approved": { "$exists": 1 } })

Santa and His Elves All Live in Lappland

We all know that the real Santa Claus lives in Finnish Lappland. But like MongoDB, Santa decided to have personnel all over the globe. There are lots of advantages, including:

- Engagement with local culture
- Removal of language barriers
- More efficient distribution of presents
- Less fuel consumption (reindeer also produce greenhouse gas emissions!)

Animal and wildlife agencies were also pressuring Santa to reduce the stress caused by the constant jet lag endured by the reindeer. From an operational perspective these seem like best practices. But what about the data? How would the beach bum Elves, who work from sunny Barcelona, be able to efficiently access the data that tells them what needs to be bagged for Europe? How would parents in Australia be able to approve their children's requests if that data had to travel all the way back to Santa's Lappland?

Well, for that MongoDB also comes to the rescue. MongoDB offers different ways to partition data and distribute load across data centers through a technique called sharding. To accomplish this efficiently we need to select a good shard key. Santa and Master Elf (a certified MongoDB DBA) set out to analyse the types of queries (functionality requested) and understand how to best distribute data (load distribution) across all aspects of the application. This was their line of thought:

- Presents will be put in bags according to time zone (distribution of load)
- Elves that work on the bagging process only need access to data for their time zone (read isolation)
- Parents need to approve their children's requests, and we want to minimize the latency for that operation (local writes)
- New present requests and parent operations must be very fast and processed concurrently (write distribution)
- There's a risk of saturation in some time zones, so we might need more nodes per time zone (capacity distribution)
- Santa's present intake dashboard should perform the same way as if all the data were in Lappland (local reads)

Given this scenario, Santa asked Master Elf to shard the data on { location.time_zone, _id }. This gives a combination of locality and high cardinality, and the shard key reflects the majority of the system's queries, allowing a good, effective distribution of data and load. Master Elf created the shard cluster through the following procedure:

1. Launched config servers (distributed across the time zone data centers)
2. Launched multiple mongos processes in each of Santa's data centers (more than one, for mongos failover!)
3. Added the existing replica set to the shard cluster
4. Redirected Santa's presents management application to the new mongos processes

(A rough sketch of what steps 1 through 3 might have looked like from the command line appears at the end of this post.)

Then, finally, Master Elf sharded the presents collection:

// enable sharding on the 2014 presents database
sh.enableSharding("2014")

// let's not forget to create the appropriate index
db.presents.ensureIndex({ "location.time_zone": 1, "_id": 1 })

// set the shard key on the presents collection
sh.shardCollection("2014.presents", { "location.time_zone": 1, "_id": 1 })

Obviously Master Elf did all of this work from the shell. But he did not need to! Master Elf could have simply connected to his MongoDB Management Service (MMS) account and deployed a sharded cluster with just a couple of clicks. But that, my friends, is material for next year's story!

Santa Claus Must Be Loaded!

Well, Santa Claus is obviously a very generous person, but that does not make him a crazy spender! Like any good CTO or CEO, Santa wants to make sure he keeps to the yearly budget while still providing the Elves with all the right tools and infrastructure they need to do their jobs. By having everyone certified as either MongoDB developers or DBAs, Santa avoids any issues during the holidays. Santa also cuts costs by reducing infrastructure spend during the off season. Santa has an elastic infrastructure that allows him to deploy more nodes efficiently to accommodate the seasonal nature of the application. He holds two replicas of every time zone's data in Lappland, with a local primary in each time zone data center, shutting the local primaries down during the off season. This allows Santa to deploy processing power only when needed.

Santa Uses MongoDB!

Of course he does! Why do you think Elves dress in green? Considering all the cool features that MongoDB offers:

- Geo Distribution of Data
- Write Distribution
- Read Isolation
- Geospatial indexes
- Aggregation Framework
- Dynamic Schema

Why wouldn't he!?

If you're interested in learning more about how Santa put together his Christmas plan, download the MongoDB Architecture Guide here: /lp/whitepaper/architecture-guide

About Norberto Leite

Norberto Leite is Technical Evangelist at MongoDB. Norberto has been working for the last 5 years on large, scalable, and distributable application environments, both as advisor and engineer. Prior to MongoDB, Norberto served as a Big Data Engineer at Telefonica.
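For completeness, here is a rough sketch of what steps 1 through 3 of Master Elf's procedure might have looked like at the time, done by hand rather than through MMS. None of this comes from the original post: the host names, ports, paths, and the replica set name "santaWET" are invented for illustration, and the flags reflect the mirrored-config-server style of deployment used back then.

# 1. Launch three config servers, spread across the time zone data centers
mongod --configsvr --dbpath /data/configdb --port 27019 --logpath /var/log/mongodb/configsvr.log --fork

# 2. Launch a mongos in each data center (more than one per data center for failover),
#    all pointing at the same three config servers
mongos --configdb cfg0.northpole.example:27019,cfg1.northpole.example:27019,cfg2.northpole.example:27019 \
       --port 27017 --logpath /var/log/mongodb/mongos.log --fork

# 3. From a mongo shell connected to one of the mongos processes,
#    add the existing replica set (here called "santaWET") as the first shard
mongo mongos0.northpole.example:27017
> sh.addShard("santaWET/wet0.northpole.example:27017")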

December 23, 2014

How Santa Uses MongoDB Part 1: Using Geospatial Indexes to Deliver Presents Around the World

We love watching everyone open their presents on Christmas morning to see what jolly old Santa brought us! Have you ever wondered how Santa delivers 6 billion presents in one night? How does he collect all those "Dear Santa" letters, read them one by one, and make sure that there are no mistakes, no missed presents, and no inconsistent state, so some poor person doesn't get two daVinci 1.0 3D printers while their neighbor down the hall gets none? How does Santa know how many bicycles to get? How many dolls? How many fire trucks? Delivering on Christmas is an incredible task! We have recently discovered Santa uses MongoDB!!! Let's take a closer look.

Santa Loves Geospatial Indexes!

With over 6 billion gifts to deliver across 24 time zones in one night, it's an incredible achievement, right? So how does Santa know how to reach everyone in time? And how does he ensure everyone gets their fair share of presents from their wishlist? Santa keeps a database of all the "Dear Santa" letters that reach his inbox (yes, Santa knows where you live!) and he has lots of bags grouping the presents per city. In the past, Santa used one magical bag to hold all the presents. Then he received an inspection from the Finnish Work Environment Safety Agency, which recommended spreading the load across several bags, both for health (Santa is not getting any younger) and efficiency reasons (what happens if you lose that single bag?!?). Besides, Santa has been around the block a few times, so he knows that monolithic approaches are a 90's thing.

Santa maintains a record for each one of us, where he keeps track of where he needs to deliver the presents. He uses MongoDB (obviously!) with geospatial information to understand where to deliver your freshly wrapped gifts. Each record looks like this:

{
  "_id": 123300410230,
  "name": "Norberto Leite",
  "location": {
    "geo": {
      "type": "Point",
      "coordinates": [-8.611720204353333, 41.14341253682037]
    },
    "address": {
      "country": "PT",
      "city": "Porto",
      "street": "Rua Escura 1",
      "zip": [4000, "Porto"]
    }
  }
}

This type of data structure allows Santa to do lots of different things with his gift distribution algorithm. For example, find all presents to be delivered in Portugal:

db.presents.find({ "location.address.country": "PT" })

But what Santa really wants to know is, which presents need to be delivered once he gets to a certain city? To solve this traveling Santa problem, he uses MongoDB's geospatial capabilities. Santa asked his Master Elf DBA to create a 2dsphere index so he can easily and efficiently make his deliveries:

db.presents.ensureIndex({ "location.geo": "2dsphere" })

Obviously, like any good, certified MongoDB DBA, and to avoid operational overhead in the database, Master Elf DBA decided to run this index creation on the secondaries, without disrupting Santa's work. No gift delivery status update will need to wait for the creation of this index to be accepted! (A sketch of what that rolling index build looks like appears at the end of this post.) Santa can then deliver presents to each house based on its distance from the center of the city.
To get a rough approximation based on a city's centroid, Santa can use the $near operator to get the list of presents to be delivered:

db.presents.find({
  "location.geo": {
    "$near": {
      "$geometry": {
        "type": "Point",
        "coordinates": [-8.611720204353333, 41.14341253682037]
      }
    }
  }
})

But Santa is a bit of a perfectionist, and to get a more accurate guide to his delivery requirements, Santa uses the polygon map of the city, in this case Porto:

db.presents.find({
  "location.geo": {
    "$geoWithin": {
      "$geometry": {
        "type": "Polygon",
        "coordinates": [ [
          [ -8.688468933105469, 41.17400251011821 ],
          [ -8.642463684082031, 41.18459702669797 ],
          [ -8.601951599121094, 41.18459702669797 ],
          [ -8.582038879394531, 41.169609159184255 ],
          [ -8.578948974609375, 41.148413563966386 ],
          [ -8.594741821289062, 41.14091592012965 ],
          [ -8.604354858398438, 41.14453557935463 ],
          [ -8.614654541015625, 41.1406573653974 ],
          [ -8.635597229003906, 41.14815503879421 ],
          [ -8.667869567871094, 41.148413563966386 ],
          [ -8.677825927734373, 41.15022321163024 ],
          [ -8.688468933105469, 41.17400251011821 ]
        ] ]
      }
    }
  }
})

In the next post we will look at some of the techniques Santa has used to improve the efficiency of his delivery algorithm by moving from a single magical bag to multiple bags. After all, even Holiday Spirit has a carbon offset, and it isn't cheap.

If you're interested in learning more about how Santa put together his Christmas plan, download the MongoDB Architecture Guide here: /lp/whitepaper/architecture-guide

About Norberto Leite

Norberto Leite is Technical Evangelist at MongoDB. Norberto has been working for the last 5 years on large, scalable, and distributable application environments, both as advisor and engineer. Prior to MongoDB, Norberto served as a Big Data Engineer at Telefonica.
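The post describes, but does not show, that secondary-first index build. At the time it would have been a rolling procedure along roughly these lines. This is a sketch only: the data path, ports, database name, and the replica set name "santaPresents" are all invented, and the exact options would depend on how the nodes were originally started.

# Repeat for each secondary, one node at a time:

# 1. Restart the secondary as a standalone mongod on a different port
#    (omit --replSet so it temporarily leaves the replica set)
mongod --dbpath /data/presents --port 37017

# 2. Build the geospatial index on the standalone node
mongo --port 37017
> use santa
> db.presents.ensureIndex({ "location.geo": "2dsphere" })

# 3. Restart the node with its usual options so it rejoins the set and catches up
mongod --dbpath /data/presents --port 27017 --replSet santaPresents

# 4. Once every secondary has the index, step down the primary (rs.stepDown())
#    and repeat the same steps on the old primary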

December 23, 2014