DarwinBox Evolves HR SaaS Platform and Prepares for 10x Growth with MongoDB Atlas
Evolution favors those that find ways to thrive in changing environments. DarwinBox has done just that, providing a full spectrum of HR services online and going from a standing start to a top-four sector brand in the Indian market in just two years. From 40 enterprise clients in its first year to more than 80 in its second, it now supports over 200,000 employees, and is hungrily eyeing expansion in new territories.
“We’re expecting 10x growth in the next two years,” says Chaitanya Peddi, Co-founder and Head of Product at DarwinBox. “That means aggressive scaling for our platform, and MongoDB Atlas will play a big role.”
Starting from a blank sheet of paper
The company’s key business insight is that employees have grown accustomed to the user experience of online services they access in their personal lives. However, the same ease of use is simply not found at work, especially in HR solutions that address holiday booking, managing benefits, and appraisals. DarwinBox’s approach is to deliver a unified platform of user-friendly HR services to replace a jumble of disparate offerings, and to do so in a way that supports its own aggressive growth plans. The company aims to support nearly every employee interaction with corporate HR, such as recruitment, employee engagement, expense management, separation, and more.
“We started in 2015 from a blank sheet of paper,” Peddi says. “It became very clear very quickly that for most of our use cases, only a non-relational database would work. Not only did we want to provide an exceptionally broad set of integrated services, but we also had clients with a large number of customization requirements. This meant we needed a very flexible data model. We looked at a lot of options. We wanted an open source technology to avoid lock-in and our developers pushed for MongoDB, which fit all our requirements and was a pleasure to work with. Our databases are now 90 percent MongoDB. We expect that to be at 100 percent soon.”
Reducing costs and future-proofing database management
When DarwinBox launched, it ran its databases in-house, which wasn’t ideal. “We have a team of 40+ developers, QA and testers, and three running infrastructure, and suddenly we’re growing much faster than we expected. It’s a good problem to have, but we couldn’t afford to offer anything less than excellent service.” Peddi emphasized that, of all the things the company wanted to do to succeed, becoming database management experts wasn’t high on the list.
This wasn’t the only reason that MongoDB Atlas looked like the next logical step for the company when it became available, says Peddi. “We were rapidly developing our services and our customer base, but our strategies for backing up the databases, for scaling, for high availability, and for monitoring performance weren’t keeping up. In the end, we decided that we’d migrate to Atlas for a few major reasons.”
The first reason was the most obvious. “The costs of managing the databases, infrastructure, and backups were increasing. In addition, it became increasingly difficult to self-manage everything as requirements became more sophisticated and change requests became more frequent. Scaling up and down to match demand and launching new clusters consumed precious man hours. Monitoring performance and resolving issues were taking up more time than we wanted. We had built custom scripts, but they weren’t really up to the task.”
With MongoDB Atlas on AWS, Peddi says, all these issues are greatly reduced. “We’re able to do everything we need with our fully managed database very quickly – scale according to business need at the press of a button, for example. There are other benefits. With MongoDB technical engineers a phone call away, we’re able to fix issues far quicker than we could in the past. MongoDB Compass, the GUI for the database, is proving helpful in letting our teams visually explore our data and tune things accordingly.”
Migrating to Atlas has also helped DarwinBox dramatically reduce costs.
“We’ve optimized our database infrastructure and how we manage backups. Not only did we bring down costs by 40%, but by leveraging the queryable snapshot feature, we’re able to restore the data we actually need 80% faster.”
Chaitanya Peddi, Co-founder and Head of Product, DarwinBox
The increased availability and data resilience from the switch to MongoDB Atlas on AWS eases the responsibility of managing the details of 200,000 employees’ working lives. “Data is the most sensitive part of our business, the number one thing that we care about,” says Peddi. “We can’t lose even 0.00001 percent of our data. We used to take snapshots of the database, but that was costly and difficult to manage. Now, it’s more a live copy process. We can guarantee data retention for over a year, and it only takes a few moments to find what you need with MongoDB Atlas.”
For DarwinBox to achieve its target of 10x growth in two years, it has to – and plans to – go international.
“We had that in mind from the outset. We’ve designed our architecture to cope with a much larger scale, both in total employee numbers and client numbers, and to handle different regulatory regimes.” According to Peddi, that means moving to microservices, developing data analytics, maybe even looking at other cloud providers to host the DarwinBox HR Platform. He added: “If we were to do this on AWS and self-manage the database with our current resources, we would have to invest a significant amount of effort into orchestrating and maintaining a globally distributed database. MongoDB Atlas with its cross-region capabilities makes this all much easier.”
DarwinBox is confident that MongoDB Atlas will help the organization achieve its product plans.
“MongoDB Atlas will be able to support the business needs that we've planned out for the next two years,” says Peddi. “We’re happy to see how rapidly the Atlas product roadmap is evolving.”
Bienvenue à MongoDB Atlas: MongoDB as a Service Now Available in France
MongoDB Atlas, the fully automated cloud database, is now available in France on Amazon Web Services and Microsoft Azure. Located in the Paris area, these newly supported cloud regions will allow organizations using MongoDB Atlas to better serve their customers in and around France. For deployments in AWS EU (Paris), the following instance sizes are supported. MongoDB Atlas deployments in this cloud region will automatically be distributed across three AWS availability zones (AZ), ensuring that the failure of a single AZ will not impact the database’s automated election and failover process. Currently, customers deploying to AWS EU (Paris) can also replicate their data to regions of their choosing (to provide even greater fault tolerance or fast, responsive read access) if they’re using the M80 (low CPU), M200 (low CPU), or M400 (low CPU) instance sizes.
For MongoDB Atlas deployments in Azure France Central, the following instance sizes are supported. Deployments in this cloud region will automatically be distributed across 2 Azure fault domains. Assuming that a customer is deploying a 3-node replica set, 2 of those nodes will be located in 1 fault domain and the last node will live in its own fault domain. This configuration is more likely to lose availability if a fault domain goes down, because losing the domain that holds 2 nodes leaves the replica set without a majority. However, cross-region replication can be configured to withstand fault domain and regional outages, and it is compatible with any Atlas instance size available in Azure France Central.
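To see why the two layouts behave differently, here is a minimal Python sketch of the majority rule that replica set elections rely on: a primary can only be elected while a strict majority of voting members is reachable. The function name and the zone and fault domain labels are hypothetical, chosen for illustration; this is not Atlas code.

```python
from collections import Counter


def survives_domain_failure(placement):
    """Map each failure domain to whether a primary can still be elected
    after that whole domain is lost.

    placement: one domain label per voting replica set member.
    """
    total = len(placement)
    majority = total // 2 + 1  # strict majority of all voting members
    per_domain = Counter(placement)
    return {
        domain: (total - count) >= majority
        for domain, count in per_domain.items()
    }


# Three nodes spread across three availability zones (hypothetical labels).
three_zones = survives_domain_failure(["az-a", "az-b", "az-c"])

# Three nodes spread across two fault domains: two nodes share one domain.
two_domains = survives_domain_failure(["fd-1", "fd-1", "fd-2"])

print(three_zones)  # {'az-a': True, 'az-b': True, 'az-c': True}
print(two_domains)  # {'fd-1': False, 'fd-2': True}
```

With three zones, any single failure leaves two of three members, which is still a majority. With two fault domains, losing the domain that holds two members leaves only one, which is why cross-region replication is the suggested safeguard.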
MongoDB is certified under the EU-US Privacy Shield, and the MongoDB Cloud Terms of Service now includes GDPR-required data processing terms to help MongoDB Atlas customers prepare for May 25, 2018 when the GDPR becomes enforceable.
MongoDB Atlas, the fully automated cloud database, is now available in France on Amazon Web Services and Microsoft Azure. Located in the Paris region, these newly supported data centers will allow your organization to use MongoDB Atlas to best meet the needs of your customers in and around France. For Atlas deployments on AWS EU (Paris), the following instance sizes are supported. Atlas deployments in this cloud region will automatically be distributed across three availability zones, ensuring that a failure in one of those zones does not affect the automated election system or the process of failing over to a new node. Currently, Atlas customers deploying to AWS EU (Paris) can also replicate their data to other regions of their choice (for even greater fault tolerance or for faster, more responsive read access) if they use the M80 (low CPU), M200 (low CPU), or M400 (low CPU) instance sizes.
For deployments in “Azure France Central”, the following instance sizes are supported. Atlas deployments in this cloud region will automatically be distributed across two Azure data centers. Assuming that a customer deploys a three-node replica set, two of those nodes will be located in one data center and the last will sit in its own data center. Although this configuration has a higher chance of losing availability if an entire data center fails, replication across multiple regions can be configured to withstand a data center-wide failure or regional outages. This cross-region replication is compatible with any instance size available in Azure France Central.
MongoDB is certified under the EU-US Privacy Shield, and the MongoDB Cloud terms of service now include the data processing terms required by the GDPR to help MongoDB Atlas customers prepare for May 25, 2018.
Cloud Data Strategies: Preventing Data Black Holes in the Cloud
Black holes are regions in spacetime with such strong gravitational pull that nothing can escape. Not entirely destructive as you might have been led to believe, their gravitational effects help drive the formation and evolution of galaxies. In fact, our own Milky Way galaxy orbits a supermassive black hole with 4.1 million times the mass of the Sun. Some theorize that none of us would be here were it not for a black hole.
On the flip side, black holes can also be found hurtling through the cosmos — often at millions of miles per hour — tearing apart everything in their path. It’s said that anything that makes it into their event horizons, the “point of no return”, will never be seen or heard from again, making black holes some of the most interesting and terrifying objects in space.
Why are we going on about black holes, gravitational effects, and points of no return? Because something analogous is happening right now in computing.
First coined in 2010 by Dave McCrory, the concept of “data gravity” treats data as if it were a planet or celestial object with mass. As data accumulates in an environment, applications and services that rely on that data will naturally be pulled into the same environment. The larger the “mass” of data there is, the stronger the “gravitational pull” and the faster this happens. Applications and services each have their own gravity but data gravity is by far the strongest, especially as:
- The further away data is, the more drastic the impact on application performance and user experience. Keeping applications and services physically nearby reduces latency, maximizes throughput, and makes it easier for teams to build performant applications.
- Moving data around has a cost. In most cases, it makes sense to centralize data to reduce that cost, which is why data tends to amass in one location or environment. Yes, distributed systems do allow organizations to partition data in different ways for specific purposes — for example, fencing sets of data by geographic borders to comply with regulations — but within those partitions, minimal data movement is still desirable.
- And finally, efforts to digitize business and organizational activities, processes, and models (dubbed by many as “digital transformation” initiatives) succeed or fail based on how effectively data is utilized. If software is the engine by which digital transformation happens, then data is its fuel.
As in the real world, the larger the mass of an object, the harder it is to move. Data gravity therefore also means that once your mass of data gets large enough, it becomes harder (and in some cases, nearly impossible) to move. What makes this relevant now more than ever is the shift to cloud computing. As companies move to the cloud, they need to make a decision that will have massive implications down the line: where and how are they going to store their data? And how do they keep data gravity in the cloud from turning into a data black hole?
There are several options for organizations moving from building their own IT to consuming it as a service in the cloud.
Proprietary Tabular (Relational) Databases
The companies behind proprietary tabular databases often penalize their customers for running these technologies on any cloud platform other than their own. This should not surprise any of us. These are the same vendors that for decades have been relying on selling heavy proprietary software with multi-year contracts and annual maintenance fees. Vendor lock-in is nothing new to them.
Organizations choosing to use proprietary tabular databases in the cloud also carry over all the baggage of those technologies and realize few cloud benefits. These databases scale vertically and often cannot take advantage of cloud-native architectures for scale-out and elasticity without massive compromises. If horizontal scale-out of data across multiple instances is available, it isn’t native to the database and requires complex configurations, app-side changes, and additional software.
Lifting and shifting these databases to the cloud does not change the fact that they’re not designed to take advantage of cloud architectures.
Open Source Tabular Databases
Things are a little better with open source tabular databases insofar as there is no vendor enforcing punitive pricing to keep you on their cloud. However, similar to proprietary tabular databases, most of these technologies are designed to scale vertically; scaling out to fully realize cloud elasticity is often managed with fragile configurations or additional software.
Many companies running these databases in the cloud rely on a managed service to reduce their operational overhead. However, feature parity across cloud platforms is nonexistent, making migrations complicated and expensive. For example, databases running on Amazon Aurora leverage Aurora-specific features not found on other clouds.
Proprietary Cloud Databases
With proprietary cloud databases, it’s very easy to get into a situation where data goes in and nothing ever comes out. These database services run only in their parent cloud and often provide very limited database functionality, requiring customers to integrate additional cloud services for anything beyond very simple use cases.
For example, many of the proprietary cloud NoSQL services offer little more than key-value functionality; users often need to pipe data into a cloud data warehouse for more complex queries and analytics. They also tend to be operationally immature, requiring additional integrations and services to address data protection and provide adequate performance visibility. And it doesn’t stop there. New features are often introduced in the form of new services, and before users know it, instead of relying on a single cloud database, they’re dependent on an ever-growing network of cloud services. This makes it all the more difficult to ever get data out.
The major cloud providers know that if they’re able to get your data in one of their proprietary database services, they’ve got you right where they want you. And while some may argue that organizations should actually embrace this new, ultimate form of vendor lock-in to get the most out of the cloud, that doesn’t leave customers with many options if their requirements, or if data regulations, change. What if the cloud provider you’re not using releases a game-changing service you need to edge out your competition? What if they open up a data center in a new geographic region you’ve prioritized and yours doesn’t have it on their roadmap? What if your main customer dictates that you should sever ties with your cloud provider? It’s happened before.
These are all scenarios where you could benefit from using a database that runs the same, everywhere.
The database that runs the same ... everywhere
As you move into the cloud, how you prevent data gravity from turning against you and limiting your flexibility is simple — use a database that runs the same in any environment.
One option to consider is MongoDB. As a database, it combines the flexibility of the document data model with sophisticated querying and indexing required by a wide range of use cases, from simple key-value to real-time aggregations powering analytics.
MongoDB is a distributed database designed for the cloud at its core. Redundancy for resilience, horizontal scaling, and geographic distribution are native to the database and easy to use.
And finally, MongoDB delivers a consistent experience regardless of where it is deployed:
- For organizations not quite ready to migrate to the cloud, they can deploy MongoDB on premises behind their own firewalls and manage their databases using advanced operational tooling.
- For those that are ready to migrate to the cloud, MongoDB Atlas delivers the database as a fully managed service across more than 50 regions on AWS, Azure, and Google Cloud Platform. Built-in automation of proven practices helps reduce the number of time-consuming database administration tasks that teams are responsible for, and prevents organizations from migrating their operational overhead into the cloud as well. Of course, if you want to self-manage MongoDB in the cloud, you can do so.
- And finally, for teams that are well-versed in cloud services, MongoDB Atlas delivers a consistent experience across AWS, Azure, and Google, allowing the development of multi-cloud strategies on a single, unified data platform.
Data gravity will no doubt have a tremendous impact on how your IT resources coalesce and evolve in the cloud. But that doesn’t mean you have to get trapped. Choose a database that delivers a consistent experience across different environments and avoid going past the point of no return.
To learn more about MongoDB, check out our architecture guide.
You can also get started with a free 512 MB database managed by MongoDB Atlas here.
New to MongoDB Atlas — Cloud Provider Snapshots on Azure, Expanded API for Snapshots and Restore Jobs
One of the core components of MongoDB Atlas, the cloud database service for MongoDB, is the fully managed disaster recovery functionality. With continuous backups, you can take consistent, cluster-wide snapshots of sharded deployments and trigger point-in-time restores to satisfy demanding recovery point objectives (RPOs) from the business. Continuous backups also allow you to query backup snapshots to restore granular data in a fraction of the time it would take to restore an entire snapshot.
Today we’re making it even easier to manage your backups with an expanded Atlas API. Programmatically get metadata about your snapshots, delete them, or change their expiration. Trigger restore jobs and retrieve them. The MongoDB Atlas API allows you to incorporate the rich functionality of Atlas fully managed backups into workflows optimized for how you manage your IT resources.
Visit our documentation for more information.
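As a rough illustration of what such a workflow might look like, the Python sketch below builds the URL for listing a cluster's snapshots and flags completed snapshots that are about to expire. The base URL, the path layout, the response field names (`results`, `id`, `complete`, `expires`), and the sample payload are all assumptions made for this example, so check the API documentation for the actual resource definitions. No request is sent.

```python
import json
from datetime import datetime, timedelta

# Assumed base URL and path layout; verify against the Atlas API docs.
ATLAS_API = "https://cloud.mongodb.com/api/atlas/v1.0"


def snapshots_url(group_id, cluster_name):
    """URL assumed to list the snapshots of one cluster in one project."""
    return f"{ATLAS_API}/groups/{group_id}/clusters/{cluster_name}/snapshots"


def expiring_soon(payload, now, within_days):
    """Return IDs of completed snapshots that expire within the window."""
    cutoff = now + timedelta(days=within_days)
    ids = []
    for snap in payload["results"]:
        expires = datetime.strptime(snap["expires"], "%Y-%m-%dT%H:%M:%SZ")
        if snap["complete"] and now <= expires <= cutoff:
            ids.append(snap["id"])
    return ids


# Hypothetical response body, invented for illustration only.
sample = json.loads("""
{"results": [
  {"id": "snap-1", "complete": true,  "expires": "2018-03-03T00:00:00Z"},
  {"id": "snap-2", "complete": true,  "expires": "2018-04-01T00:00:00Z"},
  {"id": "snap-3", "complete": false, "expires": "2018-03-02T00:00:00Z"}
]}
""")

now = datetime(2018, 3, 1)
print(snapshots_url("my-project-id", "prod-cluster"))
print(expiring_soon(sample, now, within_days=7))  # ['snap-1']
```

A script like this could feed a follow-up call that extends the expiration of the flagged snapshots, which is the kind of task the expanded API is meant to automate.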
Cloud Provider Snapshots for Azure
We are also introducing a new type of managed backup service for MongoDB Atlas, using the native snapshot capabilities of your cloud provider. With cloud provider snapshots, your backups will be stored in the same cloud region as your managed databases, granting you better governance over where all of your data lives.
Compared to continuous backups, cloud provider snapshots allow for fast restores of snapshot images. Pricing, which varies slightly from region to region, is also lower.
Cloud provider snapshots are available today for replica sets on Microsoft Azure. Support for Amazon Web Services and Google Cloud Platform will be rolled out later this year.
If you’re considering switching backup methods (from continuous backup to cloud provider snapshots), consider creating a separate project in MongoDB Atlas. For each Atlas project, the first cluster you enable backups for will dictate the backup method for all subsequent clusters in the project. To change the backup method within the same project, disable backups for all clusters in the project, then re-enable backups using your preferred backup method. MongoDB Atlas automatically deletes any stored snapshots when you disable backups for a cluster.
Not yet a MongoDB Atlas user? Create an account and get a free 512 MB database.
Push Your MongoDB Atlas Alerts to Datadog
MongoDB Atlas, the fully managed cloud database, provides customers with pre-built and customizable alerts that can easily be configured for different channels, including Slack, HipChat, PagerDuty, Flowdock, and more.
Due to popular demand, we’ve recently added Datadog as an optional endpoint for Atlas alerts. An increasing number of companies are using Datadog to monitor their entire application estate; this new integration will allow them to quickly get a sense of any database alerts from a dashboard they regularly view.
Setup is simple. Select a MongoDB Atlas Project, and click on “Settings” in the left-hand menu. Scroll down to “Datadog Settings” and paste in your Datadog API key.
Next, click on “Alerts” in the left-hand menu. You will see a screen that shows all alerting activity. Click on the green “Add” button in the upper right corner of your screen to create a new alert. You can now customize a new alert and specify “Datadog” as the endpoint.
To send an existing alert to Datadog, simply click on “Alert Settings” in the top navigation of your main Alerts screen. This will show you all of your existing alerts, and allow you to edit them using the same UI you use to create new alerts.
And that’s it. You should now start seeing MongoDB Atlas alerts in Datadog.
Not yet a MongoDB Atlas user? Create an account and get a free 512 MB database.
New to MongoDB Atlas: Availability across all Google Cloud Platform regions
A wide variety of companies around the world, from innovators in the social media space to industry leaders in energy, are running MongoDB on Google Cloud Platform (GCP). Increasingly, these organizations are consuming MongoDB as a fully managed service with MongoDB Atlas, which boosts the productivity of teams that touch the database by reducing the operational overhead of setup, ongoing management, and performance optimization.
When MongoDB Atlas became available on GCP last June, users were able to run it in 4 regions: us-east1 (South Carolina), us-central1 (Iowa), asia-east1 (Taiwan), europe-west1 (Belgium). This week we’re excited to launch the service across all Google Cloud Platform regions, allowing you to easily deploy and run MongoDB near you.
Most GCP regions are made up of 3 isolated locations called zones where resources can be provisioned. MongoDB Atlas automatically distributes a 3-node replica set across the zones in a region, ensuring that the automated election and failover process can complete successfully if the zone containing the primary node becomes unavailable.
For Atlas deployments in GCP’s Singapore region, which contains 2 zones instead of 3, it’s recommended that users enable Atlas’s cross-region replication to obtain a similar level of redundancy.
Atlas is available across all GCP regions now. We’re excited to see what you build with MongoDB and Google services!
Not an Atlas user yet? Get started here.
New to MongoDB Atlas: Pause/Resume Clusters, M200 Instance Size on AWS
MongoDB Atlas, the managed MongoDB service, now allows you to pause and restart your database clusters. This makes it easy and affordable for you to integrate MongoDB into DevOps workflows where always-on access to the underlying data is not required — e.g. development or testing.
When combined with Atlas’s fully managed backup service, this new functionality allows you to seamlessly create multiple environments for development and testing while keeping infrastructure and operational costs to a minimum.
For example, you could restore a subset of your production data (using queryable snapshots) to a smaller database to try out new features introduced in MongoDB 3.6. You can even restore to different Atlas Projects, regions, or clouds to give different members of your organization local access. And now with the pause cluster feature, your development and testing teams can easily stop any databases when they’re not being used.
Pausing and resuming a cluster requires just a few clicks in the Atlas UI or a single call with the Atlas API. While the cluster is paused, you are charged for provisioned storage and any associated backups, but not for compute instance hours associated with your Atlas cluster. Clusters can be paused for up to 7 days. If you do not resume a paused cluster within the 7-day window, Atlas will automatically resume the cluster.
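For a sense of what that single API call could look like, here is a hedged Python sketch that builds (but does not send) a request setting a cluster's `paused` flag, and computes when the 7-day limit would be reached. The base URL, the path layout, and the project and cluster names are assumptions for illustration; confirm the exact resource in the Atlas API documentation. Sending the request would also require authenticating with your Atlas API credentials.

```python
import json
import urllib.request
from datetime import datetime, timedelta

# Assumed base URL and path layout; verify against the Atlas API docs.
ATLAS_API = "https://cloud.mongodb.com/api/atlas/v1.0"
MAX_PAUSE = timedelta(days=7)  # pause limit described in this post


def build_pause_request(group_id, cluster_name, paused):
    """Build (without sending) a PATCH request that sets the paused flag."""
    url = f"{ATLAS_API}/groups/{group_id}/clusters/{cluster_name}"
    body = json.dumps({"paused": paused}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json"},
    )


def auto_resume_time(paused_at):
    """Latest moment a cluster paused at `paused_at` can remain paused."""
    return paused_at + MAX_PAUSE


# Hypothetical project ID and cluster name.
request = build_pause_request("my-project-id", "dev-cluster", paused=True)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))  # {"paused": true}
print(auto_resume_time(datetime(2018, 2, 1, 9, 0)))  # 2018-02-08 09:00:00
```

Calling the same helper with `paused=False` would produce the matching resume request, so a scheduled job could pause development clusters in the evening and resume them in the morning.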
The pause/resume feature is now available for all dedicated instance sizes (M10 and above) in every supported region on AWS, Microsoft Azure, and Google Cloud Platform.
Larger Max Instance Size on AWS (M200)
MongoDB Atlas now supports a larger instance size on Amazon Web Services. The new M200 clusters are designed for the most demanding production workloads and peak hours of activity. Each instance features 64 vCPUs, 256 GB of RAM, and 1500 GB of storage included, with 25 Gigabit network connectivity.
M200 instances are available in all 14 AWS regions supported by MongoDB Atlas.
Not an Atlas user yet? Get started with a 512 MB database for free.
Q4 Inc. Relies on MongoDB Atlas to Boost Productivity, Outpace the Competition, and Lower Costs
Investor relations (IR) teams integrate information from finance, communications, compliance, and marketing to drive the conversation between a company, their shareholders and investors, and the larger financial community. Knowing the positive effect that a sophisticated web presence would have on investor sentiment, in 2006, Q4 Inc. (Q4) set out to provide multi-functional website solutions for IR teams. Q4 has since expanded their offerings to include capital markets intelligence, and Q4 Desktop – the industry’s first fully-integrated IR platform, which combines communications tools, surveillance, and analytics into a fully featured IR workflow and Customer Relationship Management (CRM) application.
Now with over 1,000 global clients, including many of the Fortune 500, Toronto-based Q4 is the fastest-growing provider of cloud-based IR solutions in the industry. We sat down with Alex Corotchi, VP of Technology, to learn more about their company and how they use MongoDB Atlas across their product portfolio.
Tell us about Q4 and how it’s unique in your industry.
Our goal is to provide best-in-class products for every aspect of IR so that our customers can engage with the right investors, at the right time, with the right message. We started with corporate websites, then moved into investor sites and mobile solutions. As we realized the need for a great, digital-first experience in IR, we added webcasting and teleconferencing to form a complete line of communications solutions. In 2015, we expanded into stock surveillance and capital markets intelligence. Today, we provide a full suite of IR products, many of which are integrated into Q4 Desktop. We are unique in that we typically adopt new technologies earlier than our competition, are always pushing the boundaries, and are helping to make our customers leaders in IR.
How were you introduced to MongoDB? What problem were you trying to solve at the time?
We were introduced to MongoDB a number of years ago when we were building a small application that integrated streams from multiple social media sources. Our relational database at the time, SQL Server, made it difficult to effectively work with different data formats and data types. Instead we turned to MongoDB, which didn’t force us to define schemas upfront.
At around the same time, we were rapidly scaling the company and needed to onboard many new developers. By using MongoDB in our technology stack, and taking advantage of its ease of use and the quality of the online documentation, we were able to significantly decrease the amount of time it took to ramp new hires and make them productive. This was another main driver behind our adoption of the technology.
Today with MongoDB, I can ramp up a new developer in less than a week. I can’t say that about any other database. This is important for the business because it’s our developers that drive our products. Every day is important and we save a significant amount of money by using our time more effectively.
What applications within your product portfolio use MongoDB?
Today, three out of our four main products are supported by MongoDB in some way. Those products are: Websites, which includes corporate websites, investor websites, online reports, and newsrooms; Intelligence, which helps our clients convert capital markets information into actionable intelligence; and Q4 Desktop, our integrated IR CRM and workflow platform.
MongoDB is used for a wide variety of datasets, including news, press releases, user data, stock and market data, and social media. We run an equally wide range of queries against our data - everything from simple find operations to complex aggregations.
What other databases do you use? How do you decide when to use MongoDB versus another technology?
MongoDB is one of a few databases we use in our company. For relational data, we use either SQL Server or PostgreSQL. We also use DynamoDB for a very specific set of use cases. DynamoDB is a good service but as a database, it’s not nearly as powerful as MongoDB. There are no aggregations, the query language isn’t as elegant, and we don’t use it to store anything complex.
The majority of our products are composed of specialized microservices and for us, MongoDB is a great fit for working within this paradigm. In general, MongoDB is our go-to database for data that doesn’t map neatly into rows and columns, anytime we can benefit from not having a predefined schema, or when we need to combine multiple data structures. We’ve also found that MongoDB queries are typically more transparent than the long, complex SQL queries most of us have grown accustomed to. This can save us up to an hour a day during debugging.
How do you currently run MongoDB?
When it comes to running MongoDB, we rely on the fully managed database service, MongoDB Atlas. In our experience, it is the best automated service for running MongoDB in the cloud. Atlas provides more functionality, more customization options, and better tools than the third-party MongoDB-as-a-service providers we’ve used in the past.
What alternatives did you evaluate?
When we were first looking at MongoDB services, MongoDB Atlas had not yet been released. We started our development on Heroku, where some of the third-party MongoDB service providers are available as Heroku Elements (add-ons). While Heroku was great at keeping our overhead low at the start, costs grew significantly when our products began taking off.
The pricing model of the first third-party MongoDB service provider we tried (Compose) quickly became untenable. We also found that the latest versions of the database were not supported.
We migrated to another MongoDB service (mLab) but again, we encountered issues with the pricing model. Their pricing was strictly tier-based with little flexibility to tweak our deployment configuration.
Lastly, we deployed a few MongoDB clusters using a service associated with Rackspace Cloud (ObjectRocket). Once again, we found the service to be behind in delivering the latest database features and updates.
Most recently, we migrated to MongoDB Atlas because of the cost savings, and because managing a growing microservices architecture leveraging multiple MongoDB service providers with Heroku was becoming increasingly difficult. By moving to Atlas, we’re able to save money and consolidate the management of all of our MongoDB clusters.
Tell us about your migration to MongoDB Atlas.
During the migration, the Atlas team helped us ensure that everything went seamlessly. Anytime you migrate data from one place to another, it’s a big risk to the business. However, we found the Atlas live migration service to be amazing. We originally tested it with pre-production and staging environments. When all went well, we completed a rapid production migration process and didn’t even need a maintenance window. I was pleasantly surprised with how smooth our move to MongoDB Atlas was.
Which public cloud do you use?
We’re about 70-80% on AWS and the rest is on Rackspace. We don’t just use AWS as a hosting provider, but also for Alexa, Lambda, and their streaming offering.
For example, we were able to quickly use MongoDB Stitch, MongoDB’s backend as a service, to integrate Alexa Voice Service, AWS Lambda, and our data in MongoDB Atlas to deliver a voice-driven demo at one of the largest investor relations conferences.
How has your experience been with MongoDB Atlas?
We have over a dozen MongoDB clusters currently running in MongoDB Atlas across different projects, all on AWS. We are planning to migrate even more over in the next couple of months.
I like the fully managed backup service that Atlas provides, as it has more functionality than anything I’ve used with other providers. The ability to restore to any point in time, restore to different projects in MongoDB Atlas, and query snapshots in place allows us to meet our disaster recovery objectives and easily spin up new environments for testing.
Additionally, the ongoing support from the MongoDB Atlas team has been very helpful. Even the simple chat in the application UI has been very responsive. Having quick and easy access to an expert support line is like having a life preserver. We don’t want to use it much, but when it’s needed, we need to know that it will work.
Alex – I’d like to thank you for taking the time to share your insights with the MongoDB community.
Available on AWS, Azure, and GCP, MongoDB Atlas is the best way to deploy, operate, and scale MongoDB in the cloud. Get started in minutes with a 512MB database for free.
Leaf in the Wild: World’s Most Installed Learning Record Store Migrates to MongoDB Atlas to Scale Data 5x, while Reducing Costs
Learning Locker moves away from ObjectRocket to scale its learning data warehouse, used by the likes of Xerox, Raytheon and U.K. Universities.
From Amazon’s recommendations to the Facebook News Feed, personalization has become ingrained in consumer experience, so it should come as no surprise that resourceful educators are now trying to improve learning outcomes with that same concept. After all, no two students are identical in much the same way that no two consumers are exactly alike. Developing a truly personalized educational experience is no easy feat, but emerging standards like the xAPI are helping to make this lofty goal a reality.
xAPI is an emerging specification that enables communication between disparate learning systems in a way that standardizes learning data. That data could include things like a student’s attendance in classes, or participation in online tools, but can also stretch to performance measures in the real world, such as how students apply their learning. This data-led approach to Learning Analytics is helping educators improve learning practices, tailor teaching, and intervene early if it looks like a student is moving in the wrong direction.
But the implications of this go far beyond the classroom, and increasingly companies are using these same techniques to support their employees’ development and to measure the impact of training on performance outcomes. Whilst educators are predicting the chances of a particular student dropping out, businesses can use these same tools to forecast organizational risk, based on compliance training and performance data, for example.
We recently spoke with James Mullaney, Lead Developer at HT2 Labs, a company at the forefront of the learning-data movement. HT2 Labs’ flagship product, Learning Locker, is an open source data warehouse used by the likes of Xerox, Raytheon, and a wide range of universities to prove the impact of training and to make more informed decisions on future learning design. To continue to scale the project, better manage its operations, and reduce costs, Learning Locker migrated from ObjectRocket to MongoDB Atlas, MongoDB’s database as a service.
Tell us about HT2 Labs and Learning Locker.
HT2 Labs is the creator of Learning Locker, which is a data warehouse for learning activity data (commonly referred to as a Learning Record Store or LRS). We have a suite of other learning products that are all integrated; Learning Locker acts as the hub that binds everything together. Our LRS uses the xAPI, which is a specification developed in part by the U.S. Department of Defense to help track military training initiatives. It allows multiple learning technology providers to send data into a single data store in a common format.
We started playing around with xAPI around four years ago as we were curious about the technology and had our own Social Learning Management System (LMS), Curatr. Today, Learning Locker receives learning events via an API, analyzes the data stored, and is instrumental in creating reports for our end customers.
Who is using Learning Locker?
The software is open source so our users range from hobbyists to enterprise companies, like Xerox, who use our LRS to track internal employee training.
Another example is Jisc, the R&D organization that advances technologies in UK Higher & Further Education. Jisc are running one of the largest national-level initiatives to implement Learning Analytics across universities in the UK, and our LRS is used to ingest data and act as a single source of data for predictive models. This increased level of insight into individual behavior allows Jisc to do some interesting things, such as predict and preempt student dropouts.
How has Learning Locker evolved?
We’re currently on version two of Learning Locker. We’ve open sourced the product and we’ve also launched it as a hosted Software as a service (SaaS) product. Today we have clients using our LRS in on-premise installations and in the cloud. Each on-prem installation comes packaged with MongoDB. The SaaS version of Learning Locker typically runs in AWS supported by MongoDB Atlas, the managed MongoDB as a Service.
Tell us about your decision to go with MongoDB for the underlying database.
MongoDB was a very natural choice for us as the xAPI specification calls for student activities to be sent as JSON. These documents are immutable. For example, you might send a document that says, “James completed course XYZ.” You can’t edit that document to say that he didn’t complete it. You would have to send another document to indicate a change. This means that scale is very important as there is a constant stream of student activity that needs to be ingested and stored. We’ve been very happy with how MongoDB, with its horizontal scale-out architecture, is handling increased data volume; to be frank, MongoDB can handle more than our application can throw at it.
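The append-only behavior described above can be sketched in a few lines of Python. This is a simplified illustration, not Learning Locker's actual code: the `AppendOnlyStore` class and `make_statement` helper are hypothetical, the store is an in-memory list standing in for a database collection, and the statements are deliberately minimal rather than fully valid xAPI (a real actor needs an identifier such as an mbox). The two verb IRIs are the ones commonly used with xAPI for "completed" and for voiding an earlier statement.

```python
import uuid
from datetime import datetime, timezone

# Verb IRIs: "voided" is reserved by the xAPI specification for retracting
# an earlier statement; "completed" is a commonly used ADL verb.
VOIDED_VERB = "http://adlnet.gov/expapi/verbs/voided"
COMPLETED_VERB = "http://adlnet.gov/expapi/verbs/completed"


def make_statement(actor_name, verb_id, obj):
    """Build a minimal xAPI-style statement as a JSON-ready dict (hypothetical helper)."""
    return {
        "id": str(uuid.uuid4()),
        "actor": {"name": actor_name},
        "verb": {"id": verb_id},
        "object": obj,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


class AppendOnlyStore:
    """Hypothetical in-memory stand-in for a statements collection."""

    def __init__(self):
        self._statements = []

    def insert(self, statement):
        # Statements are only ever appended, never updated or deleted.
        self._statements.append(statement)
        return statement["id"]

    def active_statements(self):
        # A statement stays active unless another statement voids it.
        voided_ids = {
            s["object"]["id"]
            for s in self._statements
            if s["verb"]["id"] == VOIDED_VERB
        }
        return [
            s for s in self._statements
            if s["verb"]["id"] != VOIDED_VERB and s["id"] not in voided_ids
        ]

    def __len__(self):
        return len(self._statements)


store = AppendOnlyStore()
course = {"objectType": "Activity", "id": "http://example.com/courses/xyz"}
first_id = store.insert(make_statement("James", COMPLETED_VERB, course))

# To "undo" the completion, a second statement is sent that refers to the first.
store.insert(make_statement(
    "Admin", VOIDED_VERB, {"objectType": "StatementRef", "id": first_id}
))
```

Both documents remain in the store, which is why ingest volume only ever grows: corrections add data rather than replacing it.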
In fact, our use of MongoDB is actually award-winning: Last year we picked up the MongoDB Innovation Award for best open source project.
Beyond using the database for ingesting and storing data in Learning Locker, how else are you using MongoDB?
As mentioned earlier, our LRS runs analytics on the data stored and those analytics are then used in reporting for our end users. For running those queries, we use MongoDB’s aggregation framework and the associated aggregation APIs. This allows our end users to get quick reports on information they’re interested in, such as course completion rates, score distribution, etc.
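As a rough illustration of the kind of report mentioned here, a course completion rate could be expressed as an aggregation pipeline like the one below. The collection layout is an assumption made for the example (one document per learner per course, with hypothetical `course_id` and boolean `completed` fields); Learning Locker's real schema and pipelines are not described in this interview.

```python
def completion_rate_pipeline(course_ids):
    """Return aggregation stages that compute a completion rate per course.

    Assumes hypothetical documents shaped like:
        {"course_id": "xyz", "learner_id": "james", "completed": True}
    """
    return [
        # Keep only the courses the report asks about.
        {"$match": {"course_id": {"$in": list(course_ids)}}},
        # Count learners and completions for each course.
        {"$group": {
            "_id": "$course_id",
            "learners": {"$sum": 1},
            "completed": {"$sum": {"$cond": ["$completed", 1, 0]}},
        }},
        # Turn the two counts into a ratio between 0 and 1.
        {"$project": {
            "_id": 0,
            "course_id": "$_id",
            "completion_rate": {"$divide": ["$completed", "$learners"]},
        }},
        # Highest completion rate first.
        {"$sort": {"completion_rate": -1}},
    ]


pipeline = completion_rate_pipeline(["xyz", "abc"])
```

With a driver such as PyMongo, a list like this would be passed to the collection's `aggregate` method; the function only builds the stages, so it can be inspected without a running database.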
Our indexes are also rather large compared to the data. We index on a lot of different fields using MongoDB’s secondary indexes. This is absolutely necessary for real-time analytics, especially when the end user wants to ask many different questions. We work closely with our clients to figure out the indexes that make the most sense based on the queries they want to run against the data.
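One way to reason about which indexes match which client queries is sketched below. The field names and index definitions are hypothetical, and the helper applies a simplified version of the compound-index prefix rule: an equality query can be served by an index whose leading fields are exactly the fields being queried. Real index selection also has to consider sorts and range conditions, which this sketch ignores.

```python
ASCENDING, DESCENDING = 1, -1  # the same values PyMongo uses for index direction

# Hypothetical index definitions for a statements collection.
INDEX_SPECS = [
    [("actor_id", ASCENDING), ("timestamp", DESCENDING)],
    [("course_id", ASCENDING), ("verb_id", ASCENDING), ("timestamp", DESCENDING)],
]


def find_supporting_index(equality_fields, index_specs):
    """Return the first index whose leading fields are exactly the queried fields."""
    wanted = set(equality_fields)
    if not wanted:
        return None
    for spec in index_specs:
        if len(spec) < len(wanted):
            continue
        # Compare the queried fields with the index prefix of the same length.
        prefix = {field for field, _direction in spec[:len(wanted)]}
        if prefix == wanted:
            return spec
    return None


# A query filtering on course and verb can use the second index...
by_course_and_verb = find_supporting_index(["verb_id", "course_id"], INDEX_SPECS)
# ...but a query on verb alone has no index with verb_id as its leading field.
by_verb_only = find_supporting_index(["verb_id"], INDEX_SPECS)
```

Each entry in `INDEX_SPECS` has the list-of-pairs shape that PyMongo's `create_index` accepts. A check like this makes the trade-off visible: every new question a client wants to ask may need another index, which is how indexes end up large relative to the data.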
Tell us about your decision to run MongoDB in the cloud. Did you start with MongoDB Atlas or were you using a third party vendor?
Our decision to use a MongoDB as a service provider was pretty simple — we wanted someone else to manage the database for us. Initially we were using ObjectRocket and that made sense for us at the time because we were hosting our application servers on Rackspace.
Interesting. Can you describe your early experiences with MongoDB Atlas and the migration process?
We witnessed the launch of MongoDB Atlas last year at MongoDB World 2016 and spun up our first cluster with Atlas in October. It became pretty clear early on that it would work for what we needed. First we migrated our Jisc deployment and our hosted SaaS product to MongoDB Atlas and we also moved our application servers to AWS for lower latency. The migration was completed in December with no issues.
Why did you migrate to MongoDB Atlas from ObjectRocket?
Cost was a major driving force for our migration from ObjectRocket. We’ve been growing and are now storing five times as much data in MongoDB Atlas at about the same cost.
ObjectRocket was also pretty opaque about what was happening in the background and that’s not the case with MongoDB Atlas, which gives you greater visibility and control. I can see, for example, exactly how much RAM I’m using at any point in time.
And finally, nobody is going to tell you that security isn’t important, especially in an industry where we’re responsible for handling potentially-sensitive student data. We were very happy with the native security features in MongoDB Atlas and the fact that we aren’t charged a percentage uplift for encryption, which was not the case with ObjectRocket.
Do you have any plans to integrate MongoDB with any other technologies to build more functionality for Learning Locker?
We’re looking into Hadoop, Spark, and Tableau for a few of our clients. MongoDB’s native connectors for Hadoop, Spark, and BI platforms come in handy for those projects.
Any advice for people looking into MongoDB and MongoDB Atlas?
Plan for scale. Think about what you’re doing right now and ask yourself, “Will this work when I have 100x more data? Can we afford this at 100x the scale?”
The MongoDB Atlas UI makes most things extremely easy, but remember that some things you can only do through the mongo shell. You should ensure your employees learn or retain the skills necessary to be dangerous in the CLI.
And this isn’t specific to just MongoDB, but think about the technology you’re partnering with and the surrounding community. For us, it’s incredibly important that MongoDB is a leader in the NoSQL space as it’s made it that much easier to talk about Learning Locker to prospective users and clients. We view it as a symbiotic relationship; if MongoDB is successful then so are we.
James, thanks for taking the time to share your experiences with the MongoDB community and we look forward to seeing you at MongoDB World 2017.
For deploying and running MongoDB, MongoDB Atlas is the best mix of speed, scalability, security, and ease-of-use.
Thermo Fisher moves into the cloud with MongoDB Atlas & AWS
Biotechnology giant uses MongoDB Atlas and an assortment of AWS technologies and services to reduce experiment times from days to minutes.
Thermo Fisher (NYSE: TMO) is moving its applications to the public cloud as part of a larger Thermo Fisher Cloud initiative with the help of offerings such as MongoDB Atlas and Amazon Web Services. Last week, our CTO & Cofounder Eliot Horowitz presented at AWS re:Invent with Thermo Fisher Senior Software Architect Joseph Fluckiger on some of the transformative benefits they’re seeing internally and across customers. This recap will cover Joseph’s portion of the presentation.
Joseph started by telling the audience that Thermo Fisher is maybe the largest company they’d never heard of. Thermo Fisher employs over 51,000 people across 50 countries, with over $17 billion in revenues in 2015. Formed a decade ago through the merger of Thermo Electron & Fisher Scientific, it is one of the leading companies in the world in the genetic testing and precision laboratory equipment markets.
The Thermo Fisher Cloud is a new offering built on Amazon Web Services consisting of 35 applications supported by over 150 Thermo Fisher developers. It allows customers to streamline their experimentation, processing, and collaboration workflows, fundamentally changing how researchers and scientists work. It serves 10,000 unique customers and stores over 1.3 million experiments, making it one of the largest cloud platforms for the scientific community. For internal teams, Thermo Fisher Cloud has also streamlined development workflows, allowing developers to share more code and create a consistent user experience by taking advantage of a microservices architecture built on AWS.
One of the precision laboratory instruments the company produces is a mass spectrometer, which works by taking a sample, bombarding it with electrons, and separating the ions by accelerating the sample and subjecting it to an electric or magnetic field. Atoms within the sample are then sorted by mass and charge and matched to known values to help customers figure out the exact composition of the sample in question. Joseph’s team develops the software powering these machines.
Thermo Fisher mass spectrometers are used to:
- Detect pesticides & pollutants — anything that’s bad for you
- Identify organic molecules on extraplanetary missions
- Process samples from athletes to look for performance-enhancing substances
- Drive product authenticity tests
- And more
During the presentation, Joseph showed off one application in the Thermo Fisher Cloud called MS Instrument Connect, which allows customers to see the status of their spectrometry instruments with live experiment results from any mobile device or browser. No longer does a scientist have to sit at the instrument to monitor an ongoing experiment. MS Instrument Connect also allows Thermo Fisher customers to easily query across instruments and get utilization statistics. Supporting MS Instrument Connect and marshalling data back and forth is a MongoDB cluster deployed in MongoDB Atlas, our hosted database as a service.
Joseph shared that MongoDB is being used across multiple projects in Thermo Fisher and the Thermo Fisher Cloud, including Instrument Connect, which was originally deployed on DynamoDB. Other notable applications include the Thermo Fisher Online Store (which was migrated from Oracle), Ion Reporter (which was migrated from PostgreSQL), and BioPharma Finder (which is being migrated from SQLite).
To support scientific experiments, Thermo Fisher needed a database that could easily handle a wide variety of fast-changing data and allow its customers to slice and dice their data in many different ways. Experiment data is also very large; each experiment produces millions of “rows” of data. When explaining why MongoDB was chosen for such a wide variety of use cases across the organization, Joseph called the database a “swiss army knife” and cited the following characteristics:
- High performance
- High flexibility
- Ability to improve developer productivity
- Ability to be deployed in any environment, cloud or on premises
What really got the audience’s attention was a segment where Joseph compared incumbent databases that Thermo Fisher had been using with MongoDB.
MongoDB compared to MySQL (Aurora)
“If I were to reduce my slides down to one, this would be that slide,” Joseph stated. “This is absolutely phenomenal. What we did was we inserted data into MongoDB & Aurora and with only 1 line of code, we were able to beat the performance of MySQL.”
In addition to delivering 6x higher performance with 40x less code, MongoDB also helped reduce the schema complexity of the app.
MongoDB compared to SQLite
For the mass spectrometry application used in performance-enhancing drug testing, Thermo Fisher rewrote the data layer from SQLite to MongoDB and reduced their code by a factor of about 3.5.
MongoDB compared to DynamoDB
Joseph then compared MongoDB to DynamoDB, stating that while both databases are great and easy to deploy, MongoDB offers a more powerful query language for richer queries to be run and allows for much simpler schema evolution. He also reminded the audience that MongoDB can be run in any environment while DynamoDB can only be run on AWS.
Finally, Joseph showed an architecture diagram showing how MongoDB is being used with several AWS technologies and services (including AWS Lambda, Docker, & Apache Spark) to parallelize algorithms and significantly reduce experiment processing times.
He concluded his presentation by explaining why Thermo Fisher is pushing applications to MongoDB Atlas, citing its ease of use, the seamless migration process, and how there has been no downtime, even when reconfiguring the cluster. The company began testing MongoDB Atlas around its release date in early July and began launching production applications on the service in September. With the time the Thermo Fisher team is saving by using MongoDB Atlas (time that would otherwise have been spent on writing and optimizing their data layer), they’re able to invest more in improving their algorithms, their customer experience, and their processing infrastructure.
“Anytime I can use a service like MongoDB Atlas, I’m going to take that so that we at Thermo Fisher can focus on what we’re good at, which is being the leader in serving science.”
To view Joseph & Eliot’s AWS re:Invent presentation in its entirety, click here.