Boosting JavaScript: From MongoDB's shell to Node.js

Dj Walker-Morgan

#Technical#Developer#Node#JavaScript

Moving a script from MongoDB’s JavaScript-powered shell to Node.js offers a chance to get to use an enormous range of tools and libraries. Find out how to do this with only a few extra lines of code and how to then optimize the resulting script.

MongoDB’s mongo shell is a powerful little utility, mainly because it is actually a JavaScript interpreter. That means you can write scripts which do smart things. Here’s one I was using today called shellfaker.js:

count=1000

var start=new Date().getTime()

collection=db.getCollection("newcoll")
for(i=0;i<count;i++) {
   collection.insertOne( { name: generateName() });
}

var end=new Date().getTime()

print(end-start)

There are three functions I’m not showing you here, mainly because one of them is just a huge list of names used to generate random names. Follow the shellfaker.js link to see the complete example on GitHub. The script also prints how long it ran, in milliseconds. I can run shellfaker.js against a MongoDB Atlas cluster by giving the mongo command a URL to connect to, the username and password, and finally the name of the script, like so:

mongo "mongodb+srv://clustername.mongodb.net/test" --username djwm --password itbesecret shellfaker.js

And around 33 seconds later, we’re back with a thousand new records in our “test.newcoll” collection. Which is fine, but we should be able to make it better. In the process, we can also free it from being tied to the mongo shell. Imagine this is your first step in prototyping a command to be used in your systems. Where would you go next? For me, that’s Node.js, and I’m going to make it as simple as possible to move this and other code by the same route.

Start with a Driver and Arguments

Moving to Node.js means we’re forgoing the entire database connection mechanism that the mongo shell provides. Let’s copy our shellfaker.js over to a nodefaker.js file and run npm init to initialise our project. I’m assuming you’ve got Node.js installed, of course. Now we can start adding libraries. We’ll want the mongodb native driver and something else…

Notice that when we ran the mongo shell version, we passed a connection string, a username, and a password. We’ll want to handle those arguments ourselves (and, in the future, probably more). Here we’ll use yargs. It’s very configurable, but also smart when you let it run without configuration.

npm install mongodb yargs --save

That will install our two needed libraries. At the top of our code we can now include them with:

const MongoClient=require('mongodb').MongoClient;
var argv=require('yargs').argv;

Get Ready to Run

Here’s where it gets interesting. We are trying to emulate the behavior of the shell and to do that we ideally want to behave in an asynchronous manner. We should be able to wait for any command to complete before moving on to the next one. The simplest way to do that with modern Node.js is to wrap everything in an async function where we can use await to wait for commands to finish processing. Here’s what we’re going to add:

run().catch(error => console.error(error));
async function run() {

This is where our code is going to start running and… oh yes, when running under the mongo shell, all the connection work was done for us. We’re going to need to create a connection. Here’s some more code to add:

   url=argv._[0]
   auth="//"+argv.username+":"+argv.password+"@"
   url=url.replace(/\/\//,auth)
   client= await MongoClient.connect(url, { useNewUrlParser: true })
   const db= await client.db();

Don’t panic. The first line here gets the URL from the now-parsed arguments. The URL is the first argument and, as there are no flags around it, it sits in _, the array of unflagged arguments. The next line puts together a string containing the username and password of the user. Those values had flags on the command line, so after parsing, yargs gives them names matching the flags. The next step is to insert the completed auth string into the URL using a regular expression. Then we are ready to connect using that URL…

The client = await MongoClient.connect(url, ...) looks like a typical function call, but that await takes the returned promise and waits for it to resolve. That means there will be a connection by the time we get to the next line, const db = await client.db(), where we get the database handle. The db variable is now initialized just as it would be in the shell, and with that we’re ready to put in our script as it appeared above…

Insert the Script

We paste the original script in and wrap it up with a closing “}”, remembering that this is all inside our async function.

   count=1000

   var start=new Date().getTime()

   collection=db.collection("newcoll")
   for(i=0;i<count;i++) {
       collection.insertOne( { name: generateName() })
   }

   var end=new Date().getTime();
   print(end-start);
}

After closing the function, we’ll paste in the other three functions we mentioned.

Node.js doesn’t have the mongo shell’s print function though, so there’s one other bit of code to add. We can pop this in pretty much anywhere; we’ll put it before the run() call.

function print(v) { console.log(v); }

We’re now ready to run. You'll find this version of the code on GitHub as nodefaker.basic.js:

node nodefaker.js "mongodb+srv://clustername.mongodb.net/test" --username djwm --password itbesecret 

It’s almost exactly the same invocation; we’re just using node now, with the script name following directly after it. And we find two things: it claims to have run in a couple of hundred milliseconds, and it doesn’t exit. If you look at the database, though, you’ll probably find all one thousand new entries are there. What’s going on here?

Await for every one

As far as the program is concerned, it really did run in that short a time. That’s because every insertOne call returned a promise and went off to execute in the background, and kicking off a thousand of those doesn’t take long. But the insert operations themselves take a little time, so the Node engine stays running, ready to service any work that might appear. If we want the code to behave just like the original script did, we need to pop an await in front of insertOne to make each call wait for completion.

   for(i=0;i<count;i++) {
       await collection.insertOne( { name: generateName() })
   }

And because we know that the script will have finished working with this await in place, we can add a process.exit(0) to make the program exit cleanly at the end. Time to run this version (nodefaker.await.js)…

And it runs in about 30 seconds. Over repeated runs, it is pretty much comparable with the mongo shell version. So what have we achieved? Well, the script works with Node, and the core of it is pretty much unchanged apart from that added await. If you wanted to carry on working with it as a tool, the basic framework is there. There is one other thing, though: you can improve it.

An Array of Promises

Remember we mentioned how each one of those insertOne calls returns a Promise. Waiting for each to complete with await means waiting for every insert to make the entire network round trip. What if you could just make a note of the promises and wait for them all to be done? Let’s do that right now. We’ll want somewhere to keep the promises:

   promisewait=[]

Then we carry on as before except each time we call insertOne, we push the resulting promise onto that array.

   collection=db.collection("newcoll")
   for(i=0;i<count;i++) {
       promisewait.push(collection.insertOne( { name: generateName() }));
   }

Checking up on an array of promises by hand would make for somewhat tedious code. But the Promise.all method is built for exactly this: it waits until every promise in the array has resolved, or rejects as soon as one of them fails (error handling there is a discussion for another day). For now, upon successful completion of all the promises, we’ll print our final time and exit. If there’s an error, we’ll print that and exit too. All that is accomplished with this:

   Promise.all(promisewait).then(res=>{
       var end=new Date().getTime();
       print(end-start);
       process.exit(0);
   }).catch(err => {
       console.log(err);
       process.exit(1);
   })

Running this version of the code completes, here, in under ten seconds. For an example like this, designed to run lots of commands and make plenty of round trips to the server, we’re now making the most of the local system, the network, and the MongoDB server up there in the cloud. You can find the code on GitHub in nodefaker.promise.js.

Wrapping up

You should have a good idea now how to start scripting in the MongoDB mongo shell and, when ready, quickly port your script creations to Node.js. The fact that the shell is based on JavaScript means that there's a lot of utility hiding just under the surface of the command-line interface. Being able to move your scripts over to Node with minimal changes lets you turn them into the basis for fully fledged applications.

A Footnote on Performance

Of course, if you are just bulk inserting data into the database, the better choice for performance is to use the bulk insertion command. That reduces the entire process to sending a single request to the server with all 1000 of our tiny records in it. Run that and we get a time of 0.2 seconds to get all the records up. You can find the code for that in nodefaker.bulk.js.