GIANT Stories at MongoDB

MongoDB 4.0 Release Candidate 0 Has Landed

MongoDB enables you to meet the demands of modern apps with a technology foundation that delivers:

  1. The document data model – presenting the best way to work with data.
  2. A distributed systems design – allowing you to intelligently put data where you want it.
  3. A unified experience that gives you the freedom to run anywhere – future-proofing your work and eliminating vendor lock-in.

Building on the foundations above, MongoDB 4.0 is a significant milestone in the evolution of MongoDB, and we’ve just shipped the first Release Candidate (RC), ready for you to test.

Why is it so significant? Let’s take a quick tour of the key new features. And remember, you can learn about all of this and much more at MongoDB World'18 (June 26-27).

Multi-Document ACID Transactions

Previewed back in February, multi-document ACID transactions are part of the 4.0 RC. With snapshot isolation and all-or-nothing execution, transactions extend MongoDB ACID data integrity guarantees to multiple statements and multiple documents across one or many collections. They feel just like the transactions you are familiar with from relational databases, are easy to add to any application that needs them, and don't change the way non-transactional operations are performed. With multi-document transactions it’s easier than ever for all developers to address a complete range of use cases with MongoDB, while for many of them, simply knowing that they are available will provide critical peace of mind that they can meet any requirement in the future. In MongoDB 4.0 transactions work within a replica set, and MongoDB 4.2 will support transactions across a sharded cluster*.

To give you a flavor of what multi-document transactions look like, here is a Python code snippet of the transactions API.

with client.start_session() as s:
    s.start_transaction()
    try:
        collection.insert_one(doc1, session=s)
        collection.insert_one(doc2, session=s)
    except Exception:
        # Abort on any error so no partial writes are committed
        s.abort_transaction()
        raise
    s.commit_transaction()

And now, the transactions API for Java.

try (ClientSession clientSession = client.startSession()) {
    clientSession.startTransaction();
    try {
        collection.insertOne(clientSession, docOne);
        collection.insertOne(clientSession, docTwo);
        clientSession.commitTransaction();
    } catch (Exception e) {
        clientSession.abortTransaction();
    }
}

Our path to transactions represents a multi-year engineering effort, beginning over 3 years ago with the integration of the WiredTiger storage engine. We’ve laid the groundwork in practically every part of the platform – from the storage layer itself to the replication consensus protocol, to the sharding architecture. We’ve built out fine-grained consistency and durability guarantees, introduced a global logical clock, refactored cluster metadata management, and more. And we’ve exposed all of these enhancements through APIs that are fully consumable by our drivers. We are feature complete in bringing multi-document transactions to replica sets, and 90% done on implementing the remaining features needed to deliver transactions across a sharded cluster.

Take a look at our multi-document ACID transactions web page where you can hear directly from the MongoDB engineers who have built transactions, review code snippets, and access key resources to get started.

Aggregation Pipeline Type Conversions

One of the major advantages of MongoDB over rigid tabular databases is its flexible data model. Data can be written to the database without first having to predefine its structure. This helps you to build apps faster and respond easily to rapidly evolving application changes. It is also essential in supporting initiatives such as single customer view or operational data lakes to support real-time analytics where data is ingested from multiple sources. Of course, with MongoDB’s schema validation, this flexibility is fully tunable, enabling you to enforce strict controls on data structure, type, and content when you need more control.
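To give a flavor of what schema validation looks like from a driver, here is a minimal sketch using PyMongo; the database, collection, and field names are hypothetical and the connection URI is a placeholder.

from pymongo import MongoClient

client = MongoClient()   # placeholder URI; point this at your own deployment
db = client["mydb"]      # hypothetical database name

# Create a collection that rejects documents missing a string "email" field
db.create_collection("customers", validator={
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["email"],
        "properties": {
            "email": {"bsonType": "string", "description": "must be a string and is required"}
        }
    }
})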

So while MongoDB makes it easy to ingest data without complex cleansing of individual fields, it means working with this data can be more difficult when a consuming application expects uniform data types for specific fields across all documents. Handling different data types pushes more complexity to the application, and available ETL tools have provided only limited support for transformations. With MongoDB 4.0, you can maintain all of the advantages of a flexible data model, while prepping data within the database itself for downstream processes.

The new $convert operator enables the aggregation pipeline to transform mixed data types into standardized formats natively within the database. Ingested data can be cast into a standardized, cleansed format and exposed to multiple consuming applications – such as the MongoDB BI and Spark connectors for high-performance visualizations, advanced analytics and machine learning algorithms, or directly to a UI. Casting data into cleansed types makes it easier for your apps to process, sort, and compare data. For example, financial data inserted as a long can be converted into a decimal, enabling lossless and high precision processing. Similarly, dates inserted as strings can be transformed into the native date type.

When $convert is combined with over 100 different operators available as part of the MongoDB aggregation pipeline, you can reshape, transform, and cleanse your documents without having to incur the complexity, fragility, and latency of running data through external ETL processes.
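As an illustration of the examples above, here is a hedged sketch of such a pipeline from PyMongo; the collection and field names (orders, price, orderDate) are hypothetical.

from pymongo import MongoClient

client = MongoClient()               # placeholder URI; adjust for your deployment
orders = client["mydb"]["orders"]    # hypothetical collection

# Cast a string/long "price" to decimal and a string "orderDate" to a native date.
# onError keeps the pipeline from failing on documents that can't be converted.
pipeline = [
    {"$addFields": {
        "price": {"$convert": {"input": "$price", "to": "decimal", "onError": None}},
        "orderDate": {"$convert": {"input": "$orderDate", "to": "date", "onError": None}},
    }}
]

for doc in orders.aggregate(pipeline):
    print(doc)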

Non-Blocking Secondary Reads

To ensure that reads can never return data that is not in the same causal order as on the primary replica, MongoDB blocks readers while oplog entries are applied in batches to the secondary. This can cause secondary reads to have variable latency, which becomes more pronounced when the cluster is serving write-intensive workloads. Why does MongoDB need to block secondary reads? When you apply a sequence of writes to a document, MongoDB is designed so that each of the nodes must show the writes in the same causal order. So if you change field "A" in a document and then change field "B", it is not possible to see that document with changed field "B" and not changed field "A". Eventually consistent systems can exhibit this behavior, but MongoDB does not, and never has.

By taking advantage of storage engine timestamps and snapshots implemented for multi-document ACID transactions, secondary reads in MongoDB 4.0 become non-blocking. With non-blocking secondary reads, you now get predictable, low read latencies and increased throughput from the replica set, while maintaining a consistent view of data. Workloads that see the greatest benefits are those where data is batch loaded to the database, and those where distributed clients are accessing low latency local replicas that are geographically remote from the primary replica.
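Routing reads to secondaries is controlled by the driver's read preference. Here is a minimal PyMongo sketch; the connection string, database, and collection names are placeholders.

from pymongo import MongoClient, ReadPreference

# Hypothetical replica set URI; adjust hosts and replicaSet to your deployment
client = MongoClient("mongodb://host1:27017,host2:27017,host3:27017/?replicaSet=rs0")

orders = client["mydb"].get_collection(
    "orders", read_preference=ReadPreference.SECONDARY_PREFERRED)

# A causally consistent session guarantees reads reflect this session's prior writes,
# even when the reads are served by a secondary.
with client.start_session(causal_consistency=True) as s:
    doc = orders.find_one({"status": "active"}, session=s)
    print(doc)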

40% Faster Data Migrations

Very few of today’s workloads are static. For example, the launch of a new product or game, or seasonal reporting cycles can drive sudden spikes in load that can bring a database to its knees unless additional capacity can be quickly provisioned. If and when demand subsides, you should be able to scale your cluster back in, rightsizing for capacity and cost.

To respond to these fluctuations in demand, MongoDB enables you to elastically add and remove nodes from a sharded cluster in real time, automatically rebalancing the data across nodes in response. The sharded cluster balancer, responsible for evenly distributing data across the cluster, has been significantly improved in MongoDB 4.0. By concurrently fetching and applying documents, shards can complete chunk migrations up to 40% faster, allowing you to more quickly bring new nodes into service at just the moment they are needed, and scale back down when load returns to normal levels.
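For reference, adding or draining a shard is driven by admin commands; below is a hedged sketch using PyMongo against a hypothetical cluster (hostnames and shard names are placeholders).

from pymongo import MongoClient

# Connect to a mongos router (hypothetical host)
client = MongoClient("mongodb://mongos.example.net:27017")

# Bring a new shard (a replica set named rs1) into the cluster;
# the balancer then starts migrating chunks onto it automatically.
client.admin.command("addShard", "rs1/shard1a.example.net:27017")

# Check what the balancer is doing
print(client.admin.command("balancerStatus"))

# Later, drain and remove the shard when the capacity is no longer needed
# (run repeatedly until the command reports state "completed")
print(client.admin.command("removeShard", "rs1"))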

Extensions to Change Streams

Change streams, released with MongoDB 3.6, enable developers to build reactive, real-time, web, mobile, and IoT apps that can view, filter, and act on data changes as they occur in the database. Change streams enable seamless data movement across distributed database and application estates, making it simple to stream data changes and trigger actions wherever they are needed, using a fully reactive programming style.

With MongoDB 4.0, Change Streams can now be configured to track changes across an entire database or whole cluster. Additionally, change streams will now return a cluster time associated with an event, which can be used by the application to provide an associated wall clock time for the event.
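As a sketch of the new scopes, assuming a driver release that supports them (for example PyMongo 3.7 against MongoDB 4.0); the database name is hypothetical.

from pymongo import MongoClient

client = MongoClient()  # placeholder URI for your replica set or sharded cluster

# Watch every collection in one database...
with client["mydb"].watch() as stream:
    for change in stream:
        # Each event now carries a clusterTime the application can map to wall clock time
        print(change["operationType"], change.get("clusterTime"))

# ...or watch the whole cluster:
# with client.watch() as stream:
#     for change in stream:
#         print(change)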

Getting Started with MongoDB 4.0

Hopefully this gives you a taste of what’s coming in 4.0. There’s a stack of other stuff we haven’t covered today, but you can learn about it all in the resources below.

To get started with the RC now:

  1. Head over to the MongoDB download center to pick up the latest development build.
  2. Review the 4.0 release notes.
  3. Sign up for the forthcoming MongoDB University training on 4.0.

And you can meet our engineering team and other MongoDB users at MongoDB World'18 (June 26-27).

---

* Safe Harbor Statement

This blog post contains “forward-looking statements” within the meaning of Section 27A of the Securities Act of 1933, as amended, and Section 21E of the Securities Exchange Act of 1934, as amended. Such forward-looking statements are subject to a number of risks, uncertainties, assumptions and other factors that could cause actual results and the timing of certain events to differ materially from future results expressed or implied by the forward-looking statements. Factors that could cause or contribute to such differences include, but are not limited to, those identified in our filings with the Securities and Exchange Commission. You should not rely upon forward-looking statements as predictions of future events. Furthermore, such forward-looking statements speak only as of the date of this presentation.

In particular, the development, release, and timing of any features or functionality described for MongoDB products remains at MongoDB’s sole discretion. This information is merely intended to outline our general product direction and it should not be relied on in making a purchasing decision nor is this a commitment, promise or legal obligation to deliver any material, code, or functionality. Except as required by law, we undertake no obligation to update any forward-looking statements to reflect events or circumstances after the date of such statements.

MongoDB Celebrates Reaching 1,000 Employees

MongoDB
May 18, 2018

We’re excited to announce that MongoDB has officially reached 1,000 employees across the globe.

For some of us, it’s a time to reflect on how far we’ve come.

For others, this milestone helps to generate excitement for what is to come in the future.

And for the rest of us, it was another moment to be proud of, and a new reason to celebrate.

We’re proud of how far we’ve come, and we’re looking forward to the next 1,000.

How to Integrate MongoDB Atlas and Segment using MongoDB Stitch

Jesse Krasnostein
May 17, 2018

It can be quite difficult tying together multiple systems, APIs, and third-party services. Recently, we faced this exact problem in-house, when we wanted to get data from Segment into MongoDB so we could take advantage of MongoDB’s native analytics capabilities and rich query language. Using some clever tools we were able to make this happen in under an hour – the first time around.

While this post is detailed, the actual implementation should only take around 20 minutes. I’ll start off by introducing our cast of characters (what tools we used to do this) and then we will walk through how we went about it.

The Characters

To collect data from a variety of sources including mobile, web, cloud apps, and servers, developers have been turning to Segment since 2011. Segment consolidates all the events generated by multiple data sources into a single clickstream. You can then route the data to more than 200 integrations at the click of a button. Companies like DigitalOcean, New Relic, InVision, and Instacart all rely on Segment for different parts of their growth strategies.

To store the data generated by Segment, we turn to MongoDB Atlas – MongoDB’s database as a service. Atlas offers the best of MongoDB:

  • A straightforward query language that makes it easy to work with your data
  • Native replication and sharding to ensure data can live where it needs to
  • A flexible data model that allows you to easily ingest data from a variety of sources without needing to know precisely how the data will be structured (its shape)

All this is wrapped up in a fully managed service, engineered and run by the same team that builds the database, which means that as a developer you actually can have your cake and eat it too.

The final character is MongoDB Stitch, MongoDB’s serverless platform. Stitch streamlines application development and deployment with simple, secure access to data and services – getting your apps to market faster while reducing operational costs. Stitch allows us to implement server-side logic that connects third-party tools like Segment with MongoDB, while ensuring everything from security to performance is optimized.

Order of Operations

We are going to go through the following steps. If you have completed any of these already, feel free to just cherry pick the relevant items you need assistance with:

  1. Setting up a Segment workspace
  2. Adding Segment’s JavaScript library to your frontend application – I’ve also built a ridiculously simple HTML page that you can use for testing
  3. Sending an event to Segment when a user clicks a button
  4. Signing up for MongoDB Atlas
  5. Creating a cluster, so your data has somewhere to live
  6. Creating a MongoDB Stitch app that accepts data from Segment and saves it to your MongoDB Atlas cluster

While this blog focusses on integrating Segment with MongoDB, the process we outline below will work with other APIs and web services. Join the community slack and ask questions if you are trying to follow along with a different service.

Each time Segment sees new data, a webhook fires an HTTP POST request to Stitch. A Stitch function then handles the authentication of the request and, without performing any data manipulation, saves the body of the request directly to the database – ready for further analysis.

Setting up a Workspace in Segment

Head over to Segment.com and sign up for an account. Once complete, Segment will automatically create a Workspace for you. Workspaces allow you to collaborate with team members, control permissions, and share data sources across your whole team. Click through to the Workspace that you've just created.

To start collecting data in your Workspace, we need to add a source. In this case, I’m going to collect data from a website, so I’ll select that option, and on the next screen, Segment will have added a JavaScript source to my workspace. Any data that comes from our website will be attributed to this source. There is a blue toggle link I can click within the source that will give me the code I need to add to my website so it can send data to Segment. Take note of this as we will need it shortly.

Adding Segment to your Website

I mentioned a simple sample page I had created in case you want to test this implementation outside of other code you had been working on. You can grab it from this GitHub repo.

In my sample page, you’ll see I’ve copied and pasted the Segment code and dropped it in between my page’s <head> tags. You’ll need to do the equivalent with whatever code or language you are working in.

If you open that page in a browser, it should automatically start sending data to Segment. The easiest way to see this is by opening Segment in another window and clicking through to the debugger.

Clicking on the debugger button in the Segment UI takes you to a live stream of events sent by your application.

Customizing the events you send to Segment

The Segment library enables you to get as granular as you like with the data you send from your application.

As your application grows, you’ll likely want to expand the scope of what you track. Best practice requires you to put some thought into how you name events and what data you send. Otherwise different developers will name events differently and will send them at different times – read this post for more on the topic.

To get us started, I’m going to assume that we want to track every time someone clicks a favorite button on a web page. We are going to use some simple JavaScript to call Segment’s analytics tracking code and send an event called a “track” to the Segment API. That way, each time someone clicks our favorite button, we'll know about it.

You’ll see at the bottom of my web page that there is a jQuery function attached to the .btn class. Let’s add the following after the alert() function.

analytics.track("Favorited", {
        itemId: this.id,
        itemName: itemName
      });

Now, refresh the page in your browser and click on one of the favorite buttons. You should see an alert box come up. If you head over to your debugger window in Segment, you’ll observe the track event streaming in as well. Pretty cool, right?

You probably noticed that the analytics code above is storing the data you want to send in a JSON document. You can add fields with more specific information anytime you like. Traditionally, this data would get sent to some sort of tabular data store, like MySQL or PostgreSQL, but then each time new information was added you would have to perform a migration to add a new column to your table. On top of that, you would likely have to update the object-relational mapping code that's responsible for saving the event in your database. MongoDB is a flexible data store, which means there are no migrations or translations needed – we store the data in the exact form you send it in.

Getting Started with MongoDB Atlas and Stitch

As mentioned, we’ll be using two different services from MongoDB. The first, MongoDB Atlas, is a database as a service. It’s where all the data generated by Segment will live, long-term. The second, MongoDB Stitch, is going to play the part of our backend. We are going to use Stitch to set up an endpoint where Segment can send data; once received, Stitch validates that the request was sent from Segment and then coordinates all the logic to save this data into MongoDB Atlas for later analysis and other activities.

First Time Using MongoDB Atlas?

Click here to set up an account in MongoDB Atlas.

Once you’ve created an account, we are going to use Atlas’s Cluster Builder to set up our first cluster (every MongoDB Atlas deployment is made up of multiple nodes that help with high availability, which is why we call it a cluster). For this demonstration, we can get away with an M0 instance – it's free forever and great for sandboxing. It's not on dedicated infrastructure, so for any production workloads, it's worth investigating other instance sizes.

When the Cluster Builder appears on screen, the default cloud provider is AWS, and the selected region is North Virginia. Leave these as is. Scroll down and click on the Cluster Tier section, and this will expand to show our different sizing options. Select M0 at the top of the list.

You can also customize your cluster’s name, by clicking on the Cluster Name section.

Once complete, click Create Cluster. It takes anywhere from 7-10 minutes to set up your cluster so maybe go grab a drink, stretch your legs and come back… When you’re ready, read on.

Creating a Stitch Application

While the Cluster is building, on the left-hand menu, click Stitch Apps. You will be taken to the Stitch applications page, from where you can click Create New Application.

Give your application a name, in this case, I call it “SegmentIntegration” and link it to the correct cluster. Click Create.

Once the application is ready, you’ll be taken to the Stitch welcome page. In this case, we can leave anonymous authentication off.

We do need to enable access to a MongoDB collection to store our data from Segment. For the database name I use “segment”, and for the collection, I use “events”. Click Add Collection.

Next, we will need to add a service. In this case, we will be manually configuring an HTTP service that can communicate over the web with Segment’s service. Scroll down and click Add Service.

You’ll jump to a new page and should see a big sign saying, “This application has no services”… not for long. Click Add a Service… again.

From the options now visible, select HTTP and then give the service a name. I’ll use “SegmentHTTP”. Click Add Service.

Next, we need to add an Incoming Webhook. A Webhook is an HTTP endpoint that will continuously listen for incoming calls from Segment, and when called, it will trigger a function in Stitch to run.

Click Add Incoming Webhook

  • Leave the default name as is, then change the following fields:
  • Turn on Respond with Result, as this will return the result of our insert operation
  • Change Request Validation to “Require Secret as Query Param”
  • Add a secret code to the last field on the page. Important Note: We will refer to this as our “public secret” as it is NOT protected from the outside world; it’s more of a simple validation that Stitch can use before running the function we will create. Shortly, we will also define a “private secret” that will not be visible outside of Stitch and Segment.

Finally, click “Save”.

Define Request Handling Logic with Functions in Stitch

We define custom behavior in Stitch using functions: simple JavaScript (ES6) code that can be used to implement logic and work with all the different services integrated with Stitch.

Thankfully, we don’t need to do too much work here. Stitch already has the basics set up for us. We need to define logic that does the following things:

  1. Grabs the request signature from HTTP headers
  2. Uses the signature to validate the request’s authenticity (i.e., that it came from Segment)
  3. Writes the request to our segment.events collection in MongoDB Atlas

Getting an HTTP Header and Generating an HMAC Signature

Add the following to line 8, after the closing curly brace }.

const signature = payload.headers['X-Signature'];

And then use Stitch’s built-in Crypto library to generate a digest that we will compare with the signature.

const digest = utils.crypto.hmac(payload.body.text(), context.values.get("segment_shared_secret"), "sha1", "hex");

A lot is happening here so I’ll step through each part and explain. Segment signs requests with a signature that is a combination of the HTTP body and a shared secret. We can attempt to generate an identical signature using the utils.crypto.hmac function if we know the body of the request, the shared secret, the hash function Segment uses to create its signatures, and the output format. If we can replicate what is contained within the X-Signature header from Segment, we will consider this to be an authenticated request.

Note: This will be using a private secret, not the public secret we defined in the Settings page when we created the webhook. This secret should never be publicly visible. Stitch allows us to define values that we can use for storing variables like API keys and secrets. We will do this shortly.
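For readers who want to see the same scheme outside of Stitch, here is a short Python sketch of the check; it is only an illustration of the HMAC-SHA1 comparison, not code that runs inside Stitch, and the function name is hypothetical.

import hashlib
import hmac

def is_valid_segment_request(raw_body: bytes, x_signature: str, shared_secret: str) -> bool:
    # Segment signs the raw request body with HMAC-SHA1 using the shared secret
    digest = hmac.new(shared_secret.encode(), raw_body, hashlib.sha1).hexdigest()
    # Constant-time comparison against the X-Signature header value
    return hmac.compare_digest(digest, x_signature)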

Validating that the Request is Authentic and Writing to MongoDB Atlas

To validate the request, we simply need to compare the digest and the signature. If they’re equivalent, then we will write to the database. Add the following code directly after we generate the digest.

if (digest == signature) {
    // Request is valid
} else {
    // Request is invalid
    console.log("Request is invalid");
}

Finally, we will augment the if statement with the appropriate behavior needed to save our data. On the first line of the if statement, we will get our “mongodb-atlas” service. Add the following code:

let mongodb = context.services.get("mongodb-atlas");

Next, we will get our database collection so that we can write data to it.

let events = mongodb.db("segment").collection("events");

And finally, we write the data.

events.insertOne(body);

Click the Save button on the top left-hand side of the code editor. At the end of this, our entire function should look something like this:

exports = function(payload) {

  var queryArg = payload.query.arg || '';
  var body = {};
  
  if (payload.body) {
    body = JSON.parse(payload.body.text());
  }
  
  // Get x-signature header and create digest for comparison
  const signature = payload.headers['X-Signature'];
  const digest = utils.crypto.hmac(payload.body.text(), 
    context.values.get("segment_shared_secret"), "sha1", "hex");
  
  //Only write the data if the digest matches Segment's x-signature!
  if (digest == signature) {
    
    let mongodb = context.services.get("mongodb-atlas");
    
    // Set the collection up to write data
    let events = mongodb.db("segment").collection("events");
    
    // Write the data
    events.insertOne(body);
    
  } else  {
    console.log("Digest didn't match");
  }
  
  return queryArg + ' ' + body.msg;
};

Defining Rules for a MongoDB Atlas Collection

Next, we will need to update our rules that allow Stitch to write to our database collection. To do this, in the left-hand menu, click on “mongodb-atlas”.

Select the collection we created earlier, called “segment.events”. This will display the Field Rules for our Top-Level Document. We can use these rules to define what conditions must exist for our Stitch function to be able to Read or Write to the collection.

We will leave the read rules as is for now, as we will not be reading directly from our Stitch application. We will, however, change the write rule to "evaluate" so our function can write to the database.

Change the contents of the “Write” box:

  • Specify an empty JSON document {} as the write rule at the document level.
  • Set Allow All Other Fields to Enabled, if it is not already set.

Click Save at the top of the editor.

Adding a Secret Value in MongoDB Stitch

As is common practice, API keys and passwords should be stored as variables rather than committed to a code repo, which keeps their visibility to a minimum. Stitch allows us to create private variables (values) that may be accessed only by incoming webhooks, rules, and named functions.

We do this by clicking Values on the Stitch menu, clicking Create New Value, and giving our value a name – in this case segment_shared_secret (we will refer to this as our private secret). We enter the contents in the large text box. Make sure to click Save once you’re done.

Getting Our Webhook URL

To copy the webhook URL across to Segment from Stitch, navigate using the Control menu: Services > SegmentHTTP > webhook0 > Settings (at the top of the page). Now copy the “Webhook URL”.

In our case, the webhook URL looks something like this:

https://webhooks.mongodb-stitch.com/api/client/v2.0/app/segmentintegration/service/SegmentHTTP/incoming_webhook/webhook0

Adding the Webhook URL to Segment

Head over to Segment and log in to your workspace. In destinations, we are going to click Add Destination.

Search for Webhook in the destinations catalog and click Webhooks. Once through to the next page, click Configure Webhooks. Then select any sources from which you want to send data. Once selected, click Confirm Source.

Next, we will find ourselves on the destination settings page. We will need to configure our connection settings. Click the box that says Webhooks (max 5).

Copy your webhook URL from Stitch, and make sure you append your public secret to the end of it using the following syntax:

Initial URL:

https://webhooks.mongodb-stitch.com/api/client/v2.0/app/segmentintegration/service/SegmentHTTP/incoming_webhook/webhook0

Add the following to the end: ?secret=<YOUR_PUBLIC_SECRET_HERE>

Final URL:

https://webhooks.mongodb-stitch.com/api/client/v2.0/app/segmentintegration/service/SegmentHTTP/incoming_webhook/webhook0?secret=PUBLIC_SECRET

Click Save

We also need to tell Segment what our private secret is so it can create a signature that we can verify within Stitch. Do this by clicking on the Shared Secret field and entering the same value you used for the segment_shared_secret. Click Save.

Finally, all we need to do is activate the webhook by clicking the switch at the top of the Destination Settings page.

Generate Events, and See Your Data in MongoDB

Now, all we need to do is use our test HTML page to generate a few events that get sent to Segment – we can use Segment’s debugger to ensure they are coming in. Once we see them flowing, they will also be going across to MongoDB Stitch, which will be writing the events to MongoDB Atlas.
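If you would rather exercise the endpoint without the browser, here is a hedged Python sketch that fires a signed request directly at the webhook (it requires the requests package; the URL, secrets, and payload shape are placeholders you must replace with your own values).

import hashlib
import hmac
import json

import requests

WEBHOOK_URL = "https://webhooks.mongodb-stitch.com/<your-app-path>/incoming_webhook/webhook0?secret=PUBLIC_SECRET"  # your URL plus public secret
SHARED_SECRET = "your-private-secret"  # must match the segment_shared_secret value in Stitch

body = json.dumps({"event": "Favorited", "properties": {"itemId": "item-1"}, "msg": "hello"})
signature = hmac.new(SHARED_SECRET.encode(), body.encode(), hashlib.sha1).hexdigest()

resp = requests.post(
    WEBHOOK_URL,
    data=body,
    headers={"Content-Type": "application/json", "X-Signature": signature},
)
print(resp.status_code, resp.text)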

We’ll take a quick look using Compass to ensure our data is visible. Once we connect to our cluster, we should see a database called “segment”. Click on segment and then you’ll see our collection called “events”. If you click into this you’ll see a sample of the data generated by our frontend!
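Compass is the quickest way to browse, but you can equally confirm the writes from a script; in this sketch the connection string is a placeholder for your own Atlas URI.

from pymongo import MongoClient

client = MongoClient("mongodb+srv://<user>:<password>@<your-cluster>.mongodb.net")  # placeholder Atlas URI
events = client["segment"]["events"]

print("events stored:", events.count_documents({}))
for doc in events.find().sort("_id", -1).limit(5):
    print(doc)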

The End

Thanks for reading through – hopefully you found this helpful. If you’re building new things with MongoDB Stitch we’d love to hear about it. Join the community slack and ask questions in the #stitch channel!

Learn more about transactions at MongoDB World

In February, we announced that MongoDB 4.0 will support multi-document transactions. Curious to know what this will look like? Aly Cabral, Product Manager at MongoDB, is excited to share an early version of the syntax.

Each feature we build is with users like you in mind. When you attend our events, you’re able to connect with the people who work on the database you use every day – like Aly. In sessions, Birds of a Feather meetings, and one-on-one in Ask the Experts, you get to ask questions, share ideas, and be heard.

Additionally, you learn tips and tricks from power users and companies that will allow you to optimize your deployments. To get your hands on new tools to accelerate your development goals, join us at MongoDB World, June 26-27 in NYC.

Early Bird pricing ends Friday, May 11. Tickets are going fast! Sign up now to get your discounted conference pass. Don't forget, groups automatically get 25% off!


Event Details:
Date: June 26-27, 2018
Location: New York Hilton Midtown, 1335 6th Ave, New York, NY 10019
Learn More & Sign Up: mongodbworld.com

DarwinBox Evolves HR SaaS Platform and Prepares for 10x Growth with MongoDB Atlas

DarwinBox found a receptive market for its HR SaaS platform for medium to large businesses, but rapid success strained their infrastructure and challenged their resources. We talked to Chaitanya Peddi, Co-founder and Head of Product, to find out how they addressed those challenges with MongoDB Atlas.

Evolution favors those that find ways to thrive in changing environments. DarwinBox has done just that, providing a full spectrum of HR services online and going from a standing start to a top-four sector brand in the Indian market in just two years. From 40 enterprise clients in its first year to more than 80 in its second, it now supports over 200,000 employees, and is hungrily eyeing expansion in new territories.

“We’re expecting 10x growth in the next two years,” says Peddi. “That means aggressive scaling for our platform and MongoDB Atlas will play a big role.”

Starting from a blank sheet of paper

The company’s key business insight is that employees have grown accustomed to the user experience of online services they access in their personal lives. However, the same ease of use is simply not found at work, especially in HR solutions that address holiday booking, managing benefits, and appraisals. DarwinBox’s approach is to deliver a unified platform of user-friendly HR services to replace a jumble of disparate offerings, and to do so in a way that supports its own aggressive growth plans. The company aims to support nearly every employee interaction with corporate HR, such as recruitment, employee engagement, expense management, separation, and more.

“We started in 2015 from a blank sheet of paper,” Peddi says. “It became very clear very quickly that for most of our use cases, only a non-relational database would work. Not only did we want to provide an exceptionally broad set of integrated services, but we also had clients with a large number of customization requirements. This meant we needed a very flexible data model. We looked at a lot of options. We wanted an open source technology to avoid lock-in and our developers pushed for MongoDB, which fit all our requirements and was a pleasure to work with. Our databases are now 90 percent MongoDB. We expect that to be at 100 percent soon.”

Reducing costs and future-proofing database management

When DarwinBox launched, it ran its databases in-house, which wasn’t ideal. “We have a team of 40+ developers, QA and testers, and three running infrastructure, and suddenly we’re growing much faster than we expected. It’s a good problem to have, but we couldn’t afford to offer anything less than excellent service.” Peddi emphasized that of all the things they wanted to do to succeed, becoming database management experts wasn’t high on the list.

This wasn’t the only reason that MongoDB Atlas looked like the next logical step for the company when it became available, says Peddi, “We were rapidly developing our services and our customer base, but our strategies for backing up the databases, for scaling, for high availability, and for monitoring performance weren’t keeping up. In the end, we decided that we’d migrate to Atlas for a few major reasons.”

The first reason was the most obvious. “The costs of managing the databases, infrastructure, and backups were increasing. In addition, it became increasingly difficult to self-manage everything as requirements became more sophisticated and change requests became more frequent. Scaling up and down to match demand and launching new clusters consumed precious man hours. Monitoring performance and issue resolution was taking up more time than we wanted. We had built custom scripts, but they weren’t really up to the task.”

With MongoDB Atlas on AWS, Peddi says, all these issues are greatly reduced. “We’re able to do everything we need with our fully managed database very quickly – scale according to business need at the press of a button, for example. There are other benefits. With MongoDB technical engineers a phone call away, we’re able to fix issues far quicker than we could in the past. MongoDB Compass, the GUI for the database, is proving helpful in letting our teams visually explore our data and tune things accordingly.”

Migrating to Atlas has also helped Darwinbox dramatically reduce costs.

We’ve optimized our database infrastructure and how we manage backups. Not only did we bring down costs by 40%, but by leveraging the queryable snapshot feature, we’re able to restore the data we actually need 80% faster.

Chaitanya Peddi, Co-founder and Head of Product, DarwinBox

The increased availability and data resilience from the switch to MongoDB Atlas on AWS eases the responsibility of managing the details of 200,000 employees’ working lives. “Data is the most sensitive part of our business, the number one thing that we care about,” says Peddi. “We can’t lose even 0.00001 percent of our data. We used to take snapshots of the database, but that was costly and difficult to manage. Now, it’s more a live copy process. We can guarantee data retention for over a year, and it only takes a few moments to find what you need with MongoDB Atlas.”

For DarwinBox to achieve its target of 10x growth in two years, it has to – and plans to – go international.

“We had that in mind from the outset. We’ve designed our architecture to cope with a much larger scale, both in total employee numbers and client numbers, and to handle different regulatory regimes.” According to Peddi, that means moving to microservices, developing data analytics, maybe even looking at other cloud providers to host the DarwinBox HR Platform. He added: “If we were to do this on AWS and self-manage the database with our current resources, we would have to invest a significant amount of effort into orchestrating and maintaining a globally distributed database. MongoDB Atlas with its cross-region capabilities makes this all much easier.”

Darwinbox is confident that MongoDB Atlas will help the organization achieve its product plans.

“MongoDB Atlas will be able to support the business needs that we've planned out for the next two years,” says Peddi. “We’re happy to see how rapidly the Atlas product roadmap is evolving.”

Get started with MongoDB Atlas and deploy a free database in minutes.

MongoDB Presents an Evening With Eliot Horowitz and Stitch

On April 19th, 2018, the MongoDB User Group (MUG) met at the MongoDB HQ in New York City for an evening of conversation, trivia, and a live coding session from MongoDB co-founder and CTO Eliot Horowitz.

MongoDB Hosts the First Annual Women in Computer Science Summit in NYC

On April 20th MongoDB NYC hosted fifteen incredible college students from schools across the country for our first ever Women in Computer Science Summit.

The full day event, which was organized and hosted by the MongoDB Campus Recruiting Team, included a packed agenda with technical learning sessions, application building, mock interviews, and a panel discussion with MongoDB engineers. The summit gave young women from different colleges and universities a chance to connect, learn from one another, and support each other down the line.

Smitha Nagar, a UT Austin sophomore and Computer Science major, found value in being able to meet her peers. "Everything is a lot more fun when you’re surrounded by badass women. Everyone was intelligent, friendly, and wanted to learn and wanted to support each other, which is what made it so amazing. It was a great way to make new friends with similar interests. It was very refreshing.”

The panel discussion with three MongoDB engineers helped to demonstrate how the attendees can grow their careers at companies like MongoDB, as well as help to better the future of the tech industry for women overall.

For Washington State University sophomore Jessica Zhou, “It’s inspiring to not only be able to look up to women engineers thriving and doing a lot of cutting edge work, but also to meet and share these experiences with other women in computer science from schools all over the country. It was easy to relate with other people there, and it was cool to be in a room of female CS students during the technical talk and workshop on databases. For me, it’s something very rare in the classroom. I’ve been trying to figure out if I want to go to grad school for research or right into industry, and what I learned from the panel is that you can still read and discuss papers, have that spirit of inquiry and innovation you find in academia while in an industry setting. I see Computer Science as an interesting academic subject but also a means of building cool things and delivering tangible change.”

Brown University sophomore Cece Xiao “really enjoyed the event. The overall structure was very well organized, and there was not a moment where I felt disengaged. The pace at this summit allowed for me to get to know MongoDB more intimately, and being onsite allowed for a more hands-on, real-life experience. It gave me a personal view of MongoDB as a company, and to better understand the culture and what it’s really about. I never really understood the magnitude of one line of code, but with so many customers using MongoDB, I find it fascinating the lengths it can go.”

The day also included mock interviews for attendees to highlight their skills in an environment that was conducive to learning and growth. Each attendee was paired up with an engineer with previous interview experience and was given honest, transparent advice on how to strengthen their skills when it comes to communicating and conveying information. For Smitha, this was a highlight: “it really blew me away. The mock interview was so helpful and a really good learning experience. It wasn’t stressful and I was able to receive really good feedback. I was given specific advice that I had not heard before that I can apply to not only future interviews, but also to future presentations or interactions with a team.”

For us, the event was a great way to meet young engineers, inspire them to continue working towards their goals, encourage them to stick to their passions, and provide them with information necessary for success. The ability to give advice from personal experience, provide support, and connect the next generation of female technologists is what will allow technology to not only move forward, but also expand its potential.

If you’d like to learn more about the opportunities at MongoDB, click here.

Bienvenue à MongoDB Atlas: MongoDB as a Service Now Available in France

Leo Zheng
May 07, 2018
Cloud


MongoDB Atlas, the fully automated cloud database, is now available in France on Amazon Web Services and Microsoft Azure. Located in the Paris area, these newly supported cloud regions will allow organizations using MongoDB Atlas to better serve their customers in and around France. For deployments in AWS EU (Paris), the following instance sizes are supported. MongoDB Atlas deployments in this cloud region will automatically be distributed across three AWS availability zones (AZ), ensuring that the failure of a single AZ will not impact the database’s automated election and failover process. Currently, customers deploying to AWS EU (Paris) can also replicate their data to regions of their choosing (to provide even greater fault tolerance or fast, responsive read access) if they’re using the M80 (low CPU), M200 (low CPU), or M400 (low CPU) instance sizes.

For MongoDB Atlas deployments in Azure France Central, the following instance sizes are supported. Deployments in this cloud region will automatically be distributed across 2 Azure fault domains. Assuming that a customer is deploying a 3-node replica set, 2 of those nodes will be located in 1 fault domain and the last node will live in its own fault domain. While this configuration does have a higher chance of loss of availability in the event that a fault domain goes down, cross-region replication can be configured to withstand fault domain and regional outages and is compatible with any Atlas instance size available in Azure France Central.

MongoDB is certified under the EU-US Privacy Shield, and the MongoDB Cloud Terms of Service now includes GDPR-required data processing terms to help MongoDB Atlas customers prepare for May 25, 2018 when the GDPR becomes enforceable.

MongoDB Atlas in France is open for business now and you can start using it today! Get started here.

BookMyShow Continues to Lead Online Entertainment Ticketing in India and Scales to 25 Million Users with MongoDB

India's twin passions for cinema and tech make it a natural fit for automated ticketing. But if ever a market needs scalable solutions, this 1.4 billion-strong nation is it.

That’s a lesson Viraj Patel, VP Technology for BigTree Entertainment, learned the hard way. "We started out in ticketing distribution in 1999 using telephones," he says, "before mobile platforms and internet access were on the scene. It just didn't work. The investors pulled the plug in 2002.”

Undeterred, the company successfully pivoted to selling software to cinema chains. By 2006, Viraj and team were ready to aim for the big prize again. They just needed the right tools. With the internet and mobile data fitting into place, a trial project in online ticket aggregation looked promising enough for investors to fund the launch of BookMyShow in 2007.

“We launched with a 100 percent Microsoft stack,” says Viraj, “but soon realized that scaling with Microsoft was not an easy job.” It wasn’t the Windows platform or the developer tools that were the problem, he recalls: “It was the SQL Server database. That was the first bottleneck as we got more and more traffic, and it soaked up more and more resources and money. It wasn’t the right solution. It couldn’t scale with us.”

Spoiler: By 2018, BookMyShow sells more than 10 million tickets a month for all manner of movies and events, and serves three billion pages a month across the web and its 50 million plus installed apps. Scaling happened.

The plot changed for the better in 2010 with the discovery of MongoDB. “We were looking around for alternatives, and it was the new kid on the block.” (In fact, MongoDB 1.0 had launched just the year before, and MongoDB India was yet to come.) “We tested it internally as a straight swap for our monolithic SQL database. Every web and mobile application we built needed a database that had performance and scalability, and MongoDB blew us away on both.”

MongoDB really won its spurs when the company added Facebook Connect to its registration process. “The registration database was the first thing we built, and it was running on SQL Server. Which was OK, until Facebook Connect came along and we added that as a registration option. Then the database really struggled. We switched to MongoDB and it was night and day. Tremendous gains. Not only did we get the ability to represent customers directly as JSON documents in the database, which made our data model much simpler, but we got all our performance back.

“We want the flexibility of upgrading the schema for future use cases, and that’s so much easier in MongoDB. The data structures we create are clear and easy to read, and it’s so much simpler to understand and extend,” Viraj adds, about their discovery of the advantages of document-model storage.

MongoDB’s second big job was also thoroughly web scale, as it took on the task of giving each of those millions of users their own bespoke, personalized view of the service. This time, the engineering team knew where to start. “About five years ago, we built our personalization engine on MongoDB,” says Viraj, “and it continues to scale with us. It stores a lot of customer information and when a customer visits, it pulls it out, personalizes it in real time and delivers it. That really improves the customer experience. We see an 18 percent increase in conversion, personalized versus non-personalized.”

Today, MongoDB is the default database for developing ideas and services in BigTree, and Viraj cheerfully admits he has long ago stopped counting how many nodes are in use. “Last time I looked, it was between 100-160,” he says.

Future plans include containerization of the databases to smooth out upgrades and ease of deployment with BigTree’s agile DevOps production pipeline and, when the time comes, sharding the customer database. That’s planned for, but not currently necessary. He explains: “We just haven’t reached the point where writes to MongoDB are the limiting factor anywhere in the service. We get a long way with MongoDB replica sets, and are safe in the knowledge that there are no limitations to scaling further when we need to.”

Viraj cares deeply about latency – “We’re a performance-sensitive company” – and much of the service is instrumented by monitoring and management platforms such as New Relic. While initial performance gains were superlative, he says, things have only continued to improve as new features and technologies have been added. “We had been using SQL tabular databases for customer booking history,” says Viraj. “We moved this to MongoDB and have seen a superb performance boost. What used to take up to 5000 ms on traditional SQL databases went down to 10-20 ms on MongoDB using the MMAP storage engine. When we moved to MongoDB’s default WiredTiger storage engine, it improved five to ten times further, to 2ms. We’re still getting this performance, even though the database now has close to 200 million documents.”

There have been other benefits from following MongoDB’s roadmap. “WiredTiger has made things much more cost-effective,” he says. “Security is better as we now encrypt data instead of storing it in plain JSON. Our customer database is five times more compact and our personalization database uses nearly eight times less storage.”

In the future, he says, they expect aggregation queries and query caching mechanisms will improve performance still more. As for reliability, “MongoDB auto-heals so well in the event of any failures in our platform we don’t even need to worry about it. That’s highly appreciated, and much better than any of the other databases we have used.”

There can be few better stories of early adoption and innovation with MongoDB than the success BigTree Entertainment has enjoyed with BookMyShow. Viraj and his engineers insist on picking the right tools for each part of the job of running India’s favourite online ticketing service, and their long experience of casting this particular actor in so many roles makes MongoDB a performer they’ve come to rely on.


Read more about what others are building with MongoDB.

Future Facilities Triples the Speed of Development with MongoDB

Future Facilities is an OEM partner of MongoDB that helps engineers and IT professionals use virtual prototyping to better plan IT deployments within data centers. By leveraging Computational Fluid Dynamics (CFD) simulation, users can test what-if scenarios unique to their facilities. Their web-based platform was originally built on MySQL, but the team quickly realized that the database couldn’t scale to meet their needs.

Instead, Future Facilities chose to migrate to MongoDB Enterprise Advanced. We sat down with Akhil Docca, Corporate Marketing & Product Strategy Manager of Future Facilities, to learn how migrating to MongoDB helped to triple the speed of development.

----

Can you tell us a little bit about yourself and Future Facilities?

I lead the marketing and product strategy here at Future Facilities. We provide software and services specifically focused on physical infrastructure design and management to customers in the data center market. Our solutions span the entire data center ecosystem, from design to operations. By utilizing a digital clone that we call the Virtual Facility (VF), our users can see the impact of any change like adding new capacity, upgrading equipment, etc., before it is implemented.

In 2004 we released 6SigmaRoom, the data center industry’s leading CFD software for data centers. 6SigmaRoom is how our users create a VF, where they can input live data from their facility, and include necessary objects such as cooling and power units, servers and racks. Having this digital twin allows engineers to troubleshoot, predict and analyze the impact of any deployment plan, and find the optimal method for implementation. With 6SigmaRoom, engineers can speed up capacity planning and improve the overall efficiency and resilience of their data center.

6SigmaRoom is essential for accurate data center capacity planning, however, it’s a heavy-duty desktop application developed for engineers. We wanted to create a product that Facilities and IT teams could use to improve both their processes and overall data center performance. In 2016 we launched a new product, 6SigmaAccess, to do just that.

6SigmaAccess is a multi-user, browser-based software platform that allows IT professionals to interact with their data center model and propose changes through a central management system. The browser-based architecture allows us to load up a lighter version of the 3D model specifically tailored to the IT capacity planning process.

Here’s how it works. IT planners propose changes such as adding new IT or racks, decommissioning equipment or cabinets, or simply editing attributes. These changes are then submitted and queued up via MongoDB. When the data center engineer opens up 6SigmaRoom, the proposed changes are automatically merged, allowing the engineer to simply run the simulation to see how the changes would affect the facility. If the analysis reveals that the proposed installations don’t impact performance, they can then be approved, merged back into the database, and scheduled for deployment.

MongoDB is the integration layer between 6SigmaAccess and 6SigmaRoom that makes this process possible.

What were you using before MongoDB?

We initially started building on MySQL, but quickly ran into challenges. Whenever we wanted to make an update to the database schema, there would be a huge demand on time and resources from our developers, DBAs, and ops teams. It quickly became apparent that we wouldn’t be able to scale to meet the needs of our customers. While redesigning the platform, we knew that we had to get away from the rigid architecture of a SQL tabular database.

Our goal was to find a data platform that was easy to work with, that developers would like, and that could scale as our business grew. After briefly considering Cassandra and CouchDB, we selected MongoDB for its strong community ecosystem, which made adopting the technology seamless. MongoDB allows us to focus on delivering new features instead of having to worry about managing the database. We are able to code, test and deliver incremental changes to 6SigmaAccess without having to change 6SigmaRoom. This will shorten our development cycles by 66%, from 9 to 3 months.

Can you describe your MongoDB deployment?

The key components of 6SigmaAccess are node.js, angular.js, JSON, and RESTful APIs. 6SigmaRoom is built on C++. We are currently deploying a 3-node cluster to our enterprise customers.

Our technology is built in a way that we aren’t always writing massive amounts of data to the database. 6SigmaAccess changes tend to be a few MBs at a time. 6SigmaRoom data files tend to be in the 100s of GB range, but we only write the data into the database based on a user action. The typical (minimum) server configuration that we’ve sized for our applications are: 4-16 Cores, 64 GB of RAM & 1 TB of disk space.

We are Windows Active Directory compliant and have additional access controls built into our software that enforce roles and permissions when connecting to the database.

What advice would you give someone who is considering using MongoDB for their next project?

Start early and incorporate MongoDB in your project from the beginning. Redundancy and scalability are important at the heart of any application and planning how to achieve those goals from the onset will make development much smoother down the road. Additionally, choose a vendor with a strong support team. We were extremely impressed with MongoDB’s sales and technical team prowess throughout the conversion process, and look forward to working with them in the future.