Peter Zawistowicz


Adopting a Serverless Approach at Bazaarvoice with MongoDB Atlas and AWS Lambda

I recently had the pleasure of welcoming Ani Hammond, Senior Staff Software Engineer at Bazaarvoice, to the MongoDB World stage. To a completely packed room, Ani chronicled her team’s journey as they replatformed Bazaarvoice’s Curations service from a runaway monolith to a completely serverless architecture backed by MongoDB Atlas.

Even if you’ve never heard of Bazaarvoice, you’ve almost certainly interacted with their services. To use Ani’s own description, “If you're shopping online and you’re reading a review, it's probably powered by us.” Bazaarvoice strives to connect brands and retailers with consumers by gathering, curating, and displaying user-generated content (anything from pictures on Instagram to an online product review) during a potential customer’s buying journey. To give you a sense of the scale of this task, Bazaarvoice clocked over a billion total page views between Thanksgiving Day and Cyber Monday in 2017, peaking at around 6,000 page views per second!

One of the technologies behind this herculean task is the Curations platform. To understand how this platform works, let’s look at an example: an Instagram user posts a cute photo of their child wearing a particular brand’s rain boots. That brand uses Curations to watch for content that mentions its products, so the social collection service picks up the post and shows it to the client team in the Curations application. The post can then be enriched in various manual and automatic ways. For example, a member of the client team can append metadata describing the product in the image, or automatic rules can filter out potentially offensive material. The Curations platform then automates the process of securing the original poster’s permission for the client to use their content.
Now, this user-generated content can be displayed in real time on the brand’s homepage or product pages to potential customers considering similar products. In a nutshell, this is what Curations does for hundreds of clients and hundreds of thousands of individual pieces of content.

The technology behind Curations was previously a monolithic Python/Django-based stack on Amazon EC2 instances on top of a MySQL datastore deployed via RDS. This platform was effective in allowing Bazaarvoice to scale to hundreds of new clients. However, this architecture had an Achilles’ heel: each additional client onboarded to Bazaarvoice’s platform represented an additional Python/Django/MySQL cluster to manage. Not only was this configuration expensive (approximately $60,000/month), but the operational overhead generated by each additional cluster made debugging, patching, releases, and general data management an ever-growing challenge. As Ani put it, “Most of our solutions were basically to throw more hardware/money at the problem and have a designated DevOps person to manage these clusters.”

One of the primary factors in selecting MongoDB for the new Curations platform was its support for a variety of access patterns. For example, the part of the platform responsible for sourcing new social content had to support high write volume, whereas the mechanism for displaying the content to consumers is read-intensive with strict availability requirements. Diving into the specifics of why the Bazaarvoice team opted to move from a MySQL-based stack to one built on MongoDB is a blog post for another day.
(Though, if you’d like to see what motivated other teams to do so, I recommend How DevOps, Microservices, and MongoDB are Making HSBC “Simpler, Better, and Faster” and Breuninger delivers omnichannel shopping experience for thousands of daily online users.) The focus of this particular post is the paradigm shift the Curations team made from a linearly scaling monolith to a completely serverless approach, underpinned by MongoDB Atlas.

The new Curations platform is broken into three distinct services for content collection, enrichment, and display. The collection service is powered by a series of Node.js AWS Lambda functions triggered by an Amazon Kinesis stream, whereas the enrichment and display services are built on autoscaling AWS Elastic Beanstalk instances. All three services are backed by MongoDB Atlas. Not only did this approach address the cluster-per-customer challenges of the old system, but monthly costs also dropped by nearly 90%, to approximately $6,500/month. The results are, again, best captured in Ani’s own words: “Massive cost savings, huge performance gains, strong consistency, and a handful of services rather than hundreds of clusters.”

MongoDB Atlas was a natural fit for this new serverless paradigm because it lets the team focus fully on developing their product rather than on managing infrastructure. In fact, the team had originally opted to manage the MongoDB instances on AWS themselves. After a couple of iterations of manual deployment and management, a desire for even more operational efficiency and better insight into database performance prompted their move to Atlas. According to Ani, the cost of migrating to and using a fully managed service was “way cheaper than having dedicated DevOps engineers.” Atlas’ support for direct VPC peering also made the transition to a hosted solution straightforward for the team.
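The talk doesn't include code, but the shape of a Kinesis-triggered Lambda function in Node.js, like those behind the collection service, is standard enough to sketch. Lambda delivers a batch of stream records in `event.Records`, and each record's payload arrives base64-encoded in `record.kinesis.data`. This is our own minimal illustration, not Bazaarvoice's code: it assumes each payload is a JSON-encoded social post, and `makeHandler` and the injected `savePost` callback (standing in for, say, an insert into an Atlas collection) are hypothetical names.

```javascript
"use strict";

// Decode the records in a Kinesis-triggered Lambda event. Kinesis hands each
// record's payload to Lambda base64-encoded under record.kinesis.data.
function decodeKinesisRecords(event) {
  return event.Records.map((record) => {
    const text = Buffer.from(record.kinesis.data, "base64").toString("utf8");
    // Assumption: producers write one JSON document per record.
    return JSON.parse(text);
  });
}

// Hypothetical factory: savePost stands in for whatever persists a post
// (for example, an insert into a MongoDB Atlas collection).
function makeHandler(savePost) {
  return async function handler(event) {
    const posts = decodeKinesisRecords(event);
    for (const post of posts) {
      await savePost(post);
    }
    return { processed: posts.length };
  };
}

// Example: build a fake two-record event the way Kinesis would deliver it.
const toRecord = (doc) => ({
  kinesis: { data: Buffer.from(JSON.stringify(doc)).toString("base64") },
});
const sampleEvent = {
  Records: [
    toRecord({ source: "instagram", text: "New rain boots!" }),
    toRecord({ source: "instagram", text: "Puddle test passed" }),
  ],
};

const saved = [];
const handler = makeHandler(async (post) => {
  saved.push(post);
});
handler(sampleEvent).then((result) => console.log(result, saved.length));
```

Injecting the persistence function keeps the decoding logic testable without a live stream or database.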
Speaking of DevOps, one of the first operational benefits Ani and her team experienced was the ability to easily optimize their index usage in MongoDB. Previously, their approach to indexing was to “build stuff that makes sense at the time and is easy to iterate on.” After getting up and running on Atlas, they were able to use the built-in Performance Advisor to make informed decisions about which indexes to add and which unused ones to remove. As Ani puts it: “An index killed is as valuable as an index added. This ensures all your indexes fit into memory and a bad index doesn't push out the good ones.”

Ani’s team also used the Atlas Performance Advisor to diagnose and correct inefficient queries. According to her, the built-in tools helped keep the team honest: “[People] say, ‘My database isn't scaling. It's not able to perform complex queries in real time...it doesn't work.’ Fix your code. The hardware is great, the tools are great, but they can only carry you so far. I think sometimes we tend to get sloppy with how we write our code because of how cheap and how easy hardware is, but we have to write code responsibly too.”

In another incident, a different Atlas feature, the Real Time Performance Panel, was key to identifying an issue with high load times in the display service. Some clients’ displays were taking more than 6 seconds to load. (For context, content delivery network provider Akamai found that a two-second delay in web page load time can cause bounce rates to double!) High-level metrics in Datadog reported query response times of 5+ seconds, while Atlas reported less than 100 ms for the same query. The team used both data points to triangulate and soon realized the discrepancy was the time it took for Lambda to connect to MongoDB for each new operation.
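The gap between Datadog's and Atlas' numbers is easier to see with a toy model. This sketch is our own illustration, not Bazaarvoice's code: `openConnection` is a hypothetical stand-in for a driver's connect call (which, against a real cluster, spends network round trips before the first query can run) and here simply counts how often it is invoked. Opening a connection per operation pays that cost every time; caching the connection promise at module scope pays it once per process. Module-scope caching is a common way to reuse connections across warm Lambda invocations; Bazaarvoice's actual fix was the move to a long-running dockerized service described in the next paragraph.

```javascript
"use strict";

// Hypothetical stand-in for a driver's connect call. A real connect does
// network work before the first query can run; here we only count calls.
let connectCount = 0;
async function openConnection() {
  connectCount += 1;
  return {
    // Pretend query that just echoes the filter it was given.
    async find(filter) {
      return [{ matched: filter }];
    },
  };
}

// Pattern 1: a fresh connection for every operation (pays connect cost each time).
async function handleConnectEveryTime(event) {
  const conn = await openConnection();
  return conn.find(event.filter);
}

// Pattern 2: cache the connection *promise* at module scope. Later calls in the
// same process reuse it, and concurrent first calls don't race to open extras.
let connectionPromise = null;
function getConnection() {
  if (connectionPromise === null) {
    connectionPromise = openConnection();
  }
  return connectionPromise;
}

async function handleWithReuse(event) {
  const conn = await getConnection();
  return conn.find(event.filter);
}

async function demo() {
  for (let i = 0; i < 3; i++) {
    await handleConnectEveryTime({ filter: { i } });
  }
  const perOperation = connectCount;

  connectCount = 0;
  await Promise.all([0, 1, 2].map((i) => handleWithReuse({ filter: { i } })));
  const reused = connectCount;

  return { perOperation, reused };
}

const demoResult = demo();
demoResult.then((counts) => console.log(counts)); // { perOperation: 3, reused: 1 }
```

Caching the promise rather than the resolved connection matters when several requests arrive before the first connect finishes: they all await the same pending connect instead of each starting their own.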
Switching from standard Lambda functions to a dockerized service ensured each operation could use an already-open connection rather than initiating a “cold start.”

“I know a lot of the cool things that Atlas does can be done by hand, but unless this is your full-time job, you're just not going to do it and you’re not going to do it as well.”

Before wrapping up her presentation, Ani shared an improvement over the old system that the team wasn’t expecting. Using Atlas, they were able to give the customer support and services teams read-only views into the database. This gave those teams deeper insight into the data and allowed them to run ad hoc queries directly. The result was a more proactive approach to issue management, leading to an 80% reduction in inbound support tickets.

By re-architecting their Curations platform, Bazaarvoice is well positioned to bring on hundreds of new clients without a proportional increase in operations work for the team. But once again, Ani summarized it best: “As the old commercial goes… ‘Old platform: $60,000. New platform: $6,000. Getting to focus all of my time on development: priceless.’”

Thank you very much to Ani Hammond and the rest of the Curations team at Bazaarvoice for putting together the presentation that inspired this post. Be sure to check out Ani’s full presentation, along with dozens of other high-quality talks from MongoDB World, on our YouTube channel. If you haven’t tried MongoDB Atlas for yourself, you can get started with a free sandbox cluster.

August 9, 2018

MongoDB World: Just another conference or something different?

When the team at MongoDB considers which events to invest in, we do so with the understanding that everyone is operating with a limited budget for travel and professional development. We’re also very mindful of the many other tech conferences and trade shows in the space. With this in mind, we focus our events on the opportunities that provide the biggest, most visible return on your time and financial investment.

April 19, 2018

16 Cities in 5 Months: The MongoDB team is coming to an AWS Summit near you

As our community of users continues to grow and become more diverse, we want to ensure all of our customers are fully equipped to be successful on MongoDB Atlas. To that end, we have partnered with AWS, committing to 16 of their regional Summits. These 16 events span 13 different countries and are expected to draw thousands of members of the AWS and MongoDB communities.

April 3, 2018

How Voya.ai uses MongoDB Atlas to Bring a Seamless Customer Experience to the Business Travel Market

As consumer travel apps like Expedia and Kayak continuously innovate to provide more seamless booking experiences for their customers, their B2B counterparts can seem very outdated in comparison. Hamburg-based startup Voya.ai is looking to change that. We recently sat down with Voya’s CTO, Pepijn Schoen, to learn more about how they are using MongoDB alongside natural language processing and machine learning to bring B2B travel booking into 2018 with their chat-based app.

MongoDB: Tell me about Voya.

Pepijn Schoen: Voya is a purely digital business travel app that brings the convenience and customer experience of B2C travel booking tools to the B2B market. We use a chat-based, conversational interface to interpret our users’ travel needs and extract the search parameters using natural language processing. We started the company in 2015 and, after winning the Best Travel Technology Award in 2016, have grown to a 50-person team of travel experts serving 150 companies with their business travel needs.

What about the B2B travel booking market are you trying to disrupt?

Most of the tools businesses use today were created 10 to 20 years ago. Since then, companies like Expedia, Kayak, and Booking.com have transformed our expectations of what a travel booking experience should be like. In addition, today’s business travel booking process includes many different layers of vendors. For example, a company may work with different vendors for flights, car rental, and hotels, on top of vendors for expense management and for providing the search and booking front end. All of this creates unnecessary friction for the end user. Voya offers a direct, simple way for business travelers to create itineraries that comply with company policies and expense processes.

Tell me about how Voya is using AI.

We use artificial intelligence in two primary ways. First, we use natural language processing to interpret chat-based user inputs.
For most requests, the entire booking experience can be handled this way, but we also have a mechanism to connect the user to a live agent for more complex requests. Second, we have built a proprietary flight and hotel matching engine that considers many different parameters when recommending travel options to a user. For example, companies may have price or airline restrictions, users may prefer a certain rewards program, and nearly all business travelers prefer shorter, more direct routes over long layovers. Our matching engine considers these factors to suggest the best flights and hotels.

What tools and technologies are you using to make this possible?

Our NLP is powered by a Java-based application using Google Dialogflow and the Layer API for messaging. The rest of our stack includes AngularJS (including Angular 4 and 5), Python, .NET, MySQL, and MongoDB in AWS via MongoDB Atlas. We also use Kubernetes, which makes our deployment very portable. For example, we can leverage Google technologies while keeping our primary datastores in AWS.

How are you using MongoDB?

We use MongoDB to store data about almost every one of the approximately 1.5 million hotels in the world. Support for GeoJSON was one of the key reasons we decided to build on MongoDB, and we feel it is the best option to power our geolocation searches. By storing hotel locations and metadata in MongoDB, we let our users easily find matching properties by generating geospatial queries behind the scenes, without custom application code. There was a learning curve with this technology. For example, we had to troubleshoot a query that depended on a 2d index rather than the more appropriate 2dsphere index, which takes into account the fact that the Earth is not flat! Currently, we query with a bounding box, but cities are never perfect squares and are therefore better approximated with polygons.
We could definitely improve the accuracy of the data we get back from this type of query by using a more complex model.

Why did you decide to use MongoDB Atlas?

Originally, Voya was built on a single EC2 instance in AWS, and we were running several other tools in a similar way. Rather than spread ourselves too thin building scalable, always-on, backed-up clusters ourselves, we explicitly looked for a managed service, and MongoDB Atlas was a great fit. The other advantage of building on MongoDB Atlas is that it allows us to expand globally without significant time investment from our team. Our application is currently available in English and German, with most of our users in Central Europe, so we minimize latency by running our MongoDB cluster in AWS’s Frankfurt region. As our user base expands, the ability to take advantage of multi-region replication to maintain this level of service will be incredibly valuable.

What’s next for Voya?

As a full-service travel solution, we are constantly looking to fulfill our customers’ travel needs. To us, the fragmentation in business travel, with separate travel management companies and online booking tools, didn't make sense. That's why we've unified them in one solution. To this, we're adding expense management. Travel expenses are a huge pain for many, wasting hours of travelers’ time tracking receipts manually and filling in Excel forms for their accounting department. Technologically, this will bring another challenge for us, as we're trying to encode applicable local legislation (which can change annually) in MongoDB. For example, returning from a business trip to Copenhagen, Denmark, and continuing onward to Bucharest the same day requires a precise understanding of the applicable allowances. Additionally, we're continuously investing in artificial intelligence to decrease the turnaround time for travel requests.
Our travel experts are there to help reroute you if you miss your New York to London flight, but we're working towards a state where all flight and hotel requests are completely automated.
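Pepijn's bounding-box-versus-polygon point can be made concrete. The sketch below is our own illustration, not Voya's code: it builds the filter documents a Node.js app might send to MongoDB, using the `$geoWithin` and `$geometry` operators with GeoJSON polygons, which a `2dsphere` index supports (the legacy `$box` shape, by contrast, uses flat geometry and pairs with the `2d` index). The field name `location` and all coordinates are hypothetical placeholders, not real city boundaries.

```javascript
"use strict";

// GeoJSON positions are [longitude, latitude], and every linear ring must be
// closed: its last position repeats its first.
function boundingBoxToPolygon(minLng, minLat, maxLng, maxLat) {
  return {
    type: "Polygon",
    coordinates: [[
      [minLng, minLat],
      [maxLng, minLat],
      [maxLng, maxLat],
      [minLng, maxLat],
      [minLng, minLat],
    ]],
  };
}

// Any ring of [lng, lat] points (e.g. a hand-drawn city outline), closed if needed.
function ringToPolygon(points) {
  const first = points[0];
  const last = points[points.length - 1];
  const isClosed = first[0] === last[0] && first[1] === last[1];
  return { type: "Polygon", coordinates: [isClosed ? points : [...points, first]] };
}

// Filter for find(): documents whose GeoJSON point lies inside the polygon.
function withinFilter(field, polygon) {
  return { [field]: { $geoWithin: { $geometry: polygon } } };
}

// Index specification for createIndex(); "location" is a hypothetical field name.
const hotelIndexSpec = { location: "2dsphere" };

// Placeholder coordinates only.
const boxFilter = withinFilter("location", boundingBoxToPolygon(13.3, 52.45, 13.5, 52.55));
const outlineFilter = withinFilter(
  "location",
  ringToPolygon([[13.32, 52.47], [13.48, 52.46], [13.52, 52.53], [13.4, 52.56], [13.3, 52.52]]),
);

console.log(JSON.stringify(boxFilter));
console.log(JSON.stringify(outlineFilter));
```

With the Node.js driver, objects like these would be passed to `collection.createIndex(hotelIndexSpec)` and `collection.find(boxFilter)`. Swapping the box for an outline changes only the polygon; the query shape stays the same.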

February 23, 2018

Predictions for AWS re:Invent 2017 (tl;dr: AI & IoT)

This post is the second installment of our Road to AWS re:Invent 2017 blog series. In the weeks leading up to AWS re:Invent in Las Vegas this November, we'll be posting about a number of topics related to running MongoDB in the public cloud. See all posts here.

In just under two months, more than 46,000 technologists will descend on Las Vegas for this year’s AWS re:Invent. Ranging from seasoned members of the AWS community to the cloud-curious, re:Invent attendees should expect the conference’s sixth iteration to deliver the same parade of ecosystem partners, an extensive agenda focused on moving to (and being successful in) the AWS cloud, and the inevitable announcement of a fresh batch of AWS services. In attempting to predict what this year’s re:Invent keynote will unveil, we’ll look at how the industry has changed since last November, as well as Amazon’s track record for debuting new products at past re:Invents. Since last year’s conference, the two most significant shifts in the space have been driven by the two largest trends of the moment: AI and IoT.

It is safe to assume that we will see an expansion of AWS’s artificial intelligence and machine learning offerings next month. Last year’s conference brought us Lex, Polly, and Rekognition as Amazon made its entrée into advanced text, voice, and image processing. Widespread adoption of this flavor of artificial intelligence is still modest, so these releases may have been overshadowed by seemingly more relevant tools like Athena, which allows users to run SQL-based queries on data stored in S3. Nonetheless, the development of its AI portfolio is of strategic importance for AWS. Despite being the most popular public cloud, Amazon has faced increasing pressure from Azure and Google Cloud Platform. The latter has been able to differentiate itself among the early-adopter community primarily through its more mature AI offerings.
To remain dominant over Google in the space, Amazon must prove able to keep up with the same pace of innovation in this sector. The areas that appear most ripe for innovation from AWS this year are voice, image, and video analysis. Already, we have seen success among e-commerce players using text- and image-based search to shorten their conversion cycles. In fact, Gartner reports that voice-based search is the fastest-growing mobile search type. The opportunity to exploit users’ devices for image- and voice-based search is evident in Amazon’s own offerings (Alexa, the Amazon iOS/Android app). Furthermore, the explosion of intelligent chat-based interfaces (Messenger, Drift, etc.) has increased demand for a broader set of capabilities in natural language processing services like Lex. As a result, we should be prepared to see further enhancements to Lex, Polly, and Rekognition. Video remains the one area of machine learning-based processing AWS has yet to touch. As its image analysis engines improve, the next logical step would be low-latency processing of video inputs. With the untold volume of video content being generated every day by ever-improving cameras, it stands to reason that organizations will want to turn that into insight and profit.

These first two predictions hint at another group of potential releases we could see from AWS next month. The development of extensible models for the analysis of text, voice, image, and video is predicated on the accessibility of high-quality, low-cost microphones and cameras. While smartphones have supported these inputs for more than a decade now, the availability of WiFi and reliable cellular networks has increased the speed and frequency with which their outputs can be shared or uploaded for further analysis. That brings us to our next theme: the Internet of Things. Many analysts and skeptics have suggested IoT adoption is weak and its promises are over-hyped.
Their skepticism is primarily centered on two ongoing challenges with IoT: 1) the lack of one or two emergent platforms on which IoT technologies can standardize, and 2) the relatively limited ability for data from decentralized sensors to be analyzed at “the edge” rather than in a central cloud. As with operating systems, media encodings, or network protocols, mass adoption of the technologies they support is typically preceded by one to three main players emerging as the default options. AWS entered the competition to build the winning IoT platform at re:Invent 2015 with its announcement of AWS IoT. All other major technology companies have made similar bids for dominance of this market. In addition, there are hundreds of venture-funded startups aiming to serve as a universal platform untethered from an existing “marketecture.” Nevertheless, the fact remains that no winner in this race has yet been crowned. This remains a large opportunity, and Amazon is well poised with its existing portfolio of software and ecosystem of networking and hardware partners. AWS appeared to renew its commitment to capturing the IoT market at last year’s re:Invent with the debut of AWS Greengrass and Lambda@Edge. Greengrass allows Lambda functions to run on local, offline devices rather than in Amazon’s cloud. Lambda@Edge is one of AWS’s first forays into “edge computing,” allowing users to run low-latency, device-specific Node.js functions in their “edge locations.” Both releases mark a shift from centralized cloud computing to distributed edge computing: perhaps less comfortable for AWS, but necessary for sometimes-offline or time-sensitive IoT projects. However, Greengrass was just the first step toward enabling AWS users to better serve disparate, intermittently connected devices. Notably, Greengrass still requires ML-powered data processing and analysis to take place in the cloud rather than locally (at the edge).
Improvements in hardware technology may also prompt AWS to improve their on-device offerings and make services like S3 and DynamoDB available outside of their infrastructure to better store and process sensor data on the devices themselves. Similarly, we may also see devices become a more significant player in more seasoned services like Kinesis, enabling the local ingestion of data. No matter what gets announced on the keynote stage this year, you can rest assured it will lead the conversation for the months that follow.

November 7, 2017

The AWS Refresher: Lambda, Kinesis, Step Functions & more

With the next AWS Summit just around the corner (literally and figuratively), we thought we’d recap some of our favorite AWS-related tutorials.

1. Using Kinesis for high-volume data streaming: This post was written in the run-up to AWS re:Invent last year, but it’s still very useful for understanding how Amazon Kinesis allows for efficient data ingestion. In this post, we walk you through initiating a sensor data stream with Kinesis and storing that data in a MongoDB Atlas cluster.
2. Building Alexa skills in the Amazon Developer Console: Part of our series on the modern app stack (MEAN & MERN), this blog shows how to create various user experiences beyond the traditional web app. Since this is an AWS-themed recap, we’ve included this post for the section where we demonstrate how to build an Alexa skill for your app through the Amazon Developer Console.
3. Using Lambda functions to build a Facebook Chatbot: The advantages of “serverless” functions become apparent in this post as we walk you through how to build a Facebook Messenger bot that automatically responds to user-provided city names with weather data fetched from the Yahoo API. Rather than worrying about setting up an app server, you run all of the logic required to process the user request and return the correct response on Lambda.
4. Creating Service API Workflows with AWS Step Functions: AWS introduced Step Functions in late 2016 as a way to trigger multiple functions or service interactions in a particular order using intuitive visual “workflows.” In this post, we test drive this functionality by using Step Functions to orchestrate Twilio and SES calls from a simple restaurant aggregator app. MongoDB Stitch (currently in beta) provides an alternative approach to orchestrating functions. In addition to providing a REST-like API to MongoDB, Stitch's multi-stage service pipelines allow each stage to act on the data before passing its results on to the next.
5. Optimizing Lambda Performance: For the advanced serverless developer, this tutorial dives into the nitty-gritty of optimizing Lambda functions for performance. While Lambda functions aren’t known for being the snappiest way to complete a task, there are tweaks you can make to reduce latency.

We hope you enjoyed this brief foray into our blog archives, and we hope to see you next week at AWS Summit NYC! The MongoDB team will be there all day to answer questions, give out shirts, and talk shop with the AWS community. If you’re not registered yet, you can get your ticket to this free event here.
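For a feel of what item 4's workflows look like under the hood: Step Functions workflows are defined in the Amazon States Language, a JSON format in which each state names the state that follows it. The object below is a minimal, hypothetical two-step definition; the state names, the order of the two steps, and the Lambda ARNs are placeholders, not taken from the tutorial. The small helper walks the `Next` pointers to list the order in which the states would run.

```javascript
"use strict";

// Hypothetical two-step workflow in Amazon States Language (ASL). State names,
// step order, and the Lambda ARNs (account 123456789012) are placeholders.
const definition = {
  Comment: "Hypothetical: send an SMS via Twilio, then an email via SES",
  StartAt: "SendSmsViaTwilio",
  States: {
    SendSmsViaTwilio: {
      Type: "Task",
      Resource: "arn:aws:lambda:us-east-1:123456789012:function:sendSms",
      Next: "SendEmailViaSes",
    },
    SendEmailViaSes: {
      Type: "Task",
      Resource: "arn:aws:lambda:us-east-1:123456789012:function:sendEmail",
      End: true,
    },
  },
};

// Follow Next pointers from StartAt to list the order a linear workflow runs in.
// Branching states (Choice, Parallel, Map) are out of scope for this sketch.
function executionOrder(def) {
  const order = [];
  const seen = new Set();
  let current = def.StartAt;
  while (current !== undefined) {
    if (seen.has(current)) throw new Error(`cycle detected at ${current}`);
    const state = def.States[current];
    if (state === undefined) throw new Error(`unknown state ${current}`);
    seen.add(current);
    order.push(current);
    current = state.End === true ? undefined : state.Next;
  }
  return order;
}

console.log(executionOrder(definition)); // [ 'SendSmsViaTwilio', 'SendEmailViaSes' ]
// JSON text of the kind the Step Functions console accepts as a definition.
console.log(JSON.stringify(definition, null, 2));
```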

August 10, 2017

Bond & MongoDB: Delivering Thoughtfulness at Scale Using MongoDB Atlas & AWS

On the third floor of a pre-war building in Manhattan’s Chelsea neighborhood, you might not expect to stumble upon a fleet of hundreds of handwriting robots. However, in the offices of Bond, that’s exactly what you’ll find. Bond began in 2013 as a gifting company, adorning each of their gifts with a handwritten note. It soon became clear that the note (and not the gift) would be the kickstart to Bond’s success. Bond’s notes are generated with proprietary machine learning algorithms that mimic the way we write letters. The team examines the way different letters of the alphabet relate to each other and recreates that effect using Node.js and their purpose-built robotic fleet. It’s one of the few companies where you’ll find calligraphers sitting alongside software engineers.

Selecting MongoDB over MySQL

While novelty may be part of the reason Bond’s notes catch the attention of millions of senders and recipients across the world, the company’s mission is more elegant: to equip anyone with the technology to be more thoughtful to the important people in their lives. This mission resonated with thousands of new Bond customers, who quickly pushed the limits of Bond’s existing technical infrastructure. Originally built on MySQL running in Amazon Relational Database Service (RDS), the platform through which customers create and order notes was seeing upwards of 1,000 read operations per second. This read workload came at the expense of write consistency. The business was scaling exponentially, but their database wasn’t keeping pace. Before long, the engineering team was spending more cycles troubleshooting issues with the datastore than building out the core product offering. Bond’s CTO began evaluating other options, with a particular focus on NoSQL databases for their horizontal scalability.
However, the team quickly realized that most NoSQL databases weren’t ready for prime time: they either lacked the required querying capabilities or were too infrastructure-intensive for their rapidly growing requirements. MongoDB was ultimately selected for its robust ecosystem, expressive query language, and scalability.

Migrating to MongoDB

Initially, Bond chose to keep routing write operations to MySQL and passing them on to a hosted MongoDB instance, where the data could be read at a much higher frequency. The team has since migrated completely to MongoDB as their database of record. Ensuring a more stable IOPS load enabled the platform to scale, which in turn allowed Bond to process more orders. In the 6 months after migrating to MongoDB, Bond fulfilled twice as many orders as in the previous 2 years on MySQL. Throughout the process, the team also transitioned from working in PHP to building predominantly in Node.js, with Python for machine learning. Having used a managed service on AWS for MySQL, Bond's team was eager to hand over day-to-day management of the database, so they turned to Compose.io, a third-party MongoDB service provider. While offloading their MongoDB management to a Compose-hosted deployment on AWS let the team refocus on the consumer-facing portions of their app, it became apparent that the lack of encryption and of features from the most recent MongoDB releases was becoming a security and operational hurdle.

Finding MongoDB Atlas

Prompted by their need for end-to-end encryption and the upcoming support for the Decimal128 data type in MongoDB 3.4, Bond began migrating their data from Compose to MongoDB Atlas shortly after its debut in the summer of 2016. MongoDB Atlas exposed all of the latest functionality of the underlying database, allowing Bond’s technology not only to keep pace with their rapidly growing business, but also to accelerate to the point where innovation is now driving their business growth.
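Why would a data type help drive a migration? Binary floating point, which both JavaScript numbers and BSON doubles use, cannot represent most decimal fractions exactly, so money arithmetic can drift; Decimal128 stores base-10 values exactly (in the mongo shell, `NumberDecimal("19.99")` creates one). The sketch below is our own illustration rather than anything from Bond: it shows the drift in plain JavaScript and one common workaround, storing integer cents, for code paths where a decimal type isn't available. The helper names are hypothetical.

```javascript
"use strict";

// Binary floating point cannot represent 0.1 or 0.2 exactly, so the sum drifts.
const floatTotal = 0.1 + 0.2;
console.log(floatTotal); // 0.30000000000000004
console.log(floatTotal === 0.3); // false

// Workaround when no decimal type is available: keep amounts as integer cents.
function addCents(...amounts) {
  return amounts.reduce((sum, cents) => {
    if (!Number.isSafeInteger(cents)) {
      throw new TypeError(`expected integer cents, got ${cents}`);
    }
    return sum + cents;
  }, 0);
}

// Format non-negative integer cents as a decimal string using only integer math.
function formatCents(cents) {
  if (!Number.isSafeInteger(cents) || cents < 0) {
    throw new RangeError("expected non-negative integer cents");
  }
  const fraction = cents % 100;
  const whole = (cents - fraction) / 100;
  return `${whole}.${String(fraction).padStart(2, "0")}`;
}

const noteTotal = addCents(10, 20); // 0.10 + 0.20 expressed in cents
console.log(formatCents(noteTotal)); // 0.30
```

Integer cents work, but every reader and writer of the data has to agree on the convention; a native decimal type moves that guarantee into the database itself.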
The team has since built a machine data analytics platform to understand and optimize the performance of their robotic fleet, allowing them to fulfill more orders with the same proprietary infrastructure. Using the Connector for Apache Spark, Bond also applies machine learning to usage data extracted from MongoDB to anticipate the needs of their many types of customers. To see Bond in action, watch our video with Chief Product Officer Sam Broe.

June 1, 2017

Introducing the MongoDB Connector for BI 2.0

Earlier this week, we had the pleasure of co-presenting a webinar with our partner, Tableau. Buzz Moschetti (Enterprise Architect at MongoDB) and Vaidy Krishnan (Product Marketing at Tableau) rolled out the updated MongoDB Connector for BI. In addition to explaining how the connector works, Buzz created on-the-fly visualizations of a sample data set in Tableau. When you pair Tableau’s ease of use, MongoDB’s flexibility, and the connector’s agility, your “time to analytics” gets a whole lot shorter. Here are the highlights from the session.

What is the Connector for BI?

To answer that question, let's look at the ways MongoDB natively manipulates data. Our highly expressive MongoDB Query Language (MQL) and the many operators in our Aggregation Framework are powerful tools to process and transform data within MongoDB. We have made many improvements to MQL over the years, and with each release we introduce new operators and different ways to manipulate the contents of your collections. While MQL has slowly incorporated much of the functionality of SQL, the Aggregation Framework will always use the pipeline/stage approach rather than the more grammatical style of SQL.
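If the pipeline/stage idea is new to you, a plain-JavaScript analogue may help. This sketch is our own illustration rather than anything shown in the webinar: it mirrors a `$map` stage that reshapes each `[x, y]` pair into an object, followed by a `$reduce` stage that folds over the points and accumulates the distance between consecutive vertices. It uses the same two polygons as the shell example in this post, so the results can be compared directly.

```javascript
"use strict";

// Plain-JavaScript analogue of a $map stage followed by a $reduce stage.
function perimeter(poly) {
  if (poly.length === 0) return 0;
  // "$map": [[x, y], ...] -> [{ x, y, len: 0 }, ...]
  const conv = poly.map(([x, y]) => ({ x, y, len: 0 }));
  // "$reduce": start from the first point; each step moves to the next point
  // and adds the Euclidean distance travelled. Because the ring repeats its
  // first point at the end, the fold covers every edge of the polygon.
  const result = conv.reduce(
    (value, point) => ({
      x: point.x,
      y: point.y,
      len: value.len + Math.sqrt((value.x - point.x) ** 2 + (value.y - point.y) ** 2),
    }),
    conv[0],
  );
  return result.len;
}

const polygons = [
  { _id: 1, poly: [[0, 0], [2, 12], [4, 0], [2, 5], [0, 0]] },
  { _id: 2, poly: [[2, 2], [5, 8], [6, 0], [3, 1], [2, 2]] },
];

for (const doc of polygons) {
  console.log({ _id: doc._id, len: perimeter(doc.poly) });
}
```

The difference in MQL is that each stage runs inside the database, so only the final `len` values travel back to the client.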
```javascript
db.foo.insertOne({_id: 1, "poly": [ [0,0], [2,12], [4,0], [2,5], [0,0] ] });
db.foo.insertOne({_id: 2, "poly": [ [2,2], [5,8], [6,0], [3,1], [2,2] ] });
db.foo.aggregate([
  // Stage 1: turn each [x, y] pair into {x, y, len: 0}
  {$project: {"conv": {$map: { input: "$poly", as: "z", in: {
      x: {$arrayElemAt: ["$$z", 0]},
      y: {$arrayElemAt: ["$$z", 1]},
      len: {$literal: 0}
  }}}}},
  // Stage 2: remember the first point as the starting value
  {$addFields: {first: {$arrayElemAt: ["$conv", 0]}}},
  // Stage 3: walk the points, adding the distance between consecutive vertices
  {$project: {"qqq": {$reduce: { input: "$conv", initialValue: "$first", in: {
      x: "$$this.x",
      y: "$$this.y",
      len: {$add: ["$$value.len",   // len = oldLen + newLen
        {$sqrt: {$add: [
          {$pow: [{$subtract: ["$$value.x", "$$this.x"]}, 2]},
          {$pow: [{$subtract: ["$$value.y", "$$this.y"]}, 2]}
        ]}}
      ]}
  }}}}},
  // Stage 4: keep only the accumulated length
  {$project: {"len": "$qqq.len"}}
]);
// Output:
// { "_id" : 1, "len" : 35.10137973546188 }
// { "_id" : 2, "len" : 19.346952903339393 }
```

An example of an MQL aggregation pipeline that calculates the perimeter of simple polygons. Note that the polygons themselves are well modeled as an array of points, each point itself being a two-item array.

The native functions of MongoDB are an excellent match for the document data model, and processing nested arrays within documents is uniquely suited to the pipeline methodology. However, the fact remains that MongoDB does not speak SQL. We were motivated to create the Connector for BI because of the robust ecosystem of SQL-based tools that empower everyone within an organization to get to data-driven insights faster.

Enter the Connector for BI 2.0. The connector is a separate process that takes a MongoDB database and maps the document schema into a relational structure that SQL-based tools can query through a MySQL-compatible interface. One of the most powerful characteristics of the connector is that it is not bulk ETL processing. The Connector for BI provides a read-on-demand bridge between your MongoDB collections and your SQL-based tools.

How does the Connector for BI work?

As the Connector for BI is a tool built for the enterprise, we designed it with security and access control in mind.
The Connector for BI accesses data stored in your MongoDB database using the same authentication and entitlements you created to secure your data. Fundamentally, that means you cannot process data through the connector that would otherwise be inaccessible from MongoDB directly. Not only does this keep your data secure, it also reduces the need for a separate set of credentials for your InfoSec team to manage.

Along with the connector, MongoDB provides a utility called 'mongodrdl', which examines a source MongoDB database and quickly constructs a default set of mappings between the structures it finds in MongoDB and the tables and columns appropriate to project in a relational schema. This utility is governed by the same security and access protocols as the connector itself.

![The MongoDB Connector: A "SQL Bridge"](https://webassets.mongodb.com/_com_assets/cms/MongoDB_BI_Connector-9zd27pz8h8.png "The MongoDB BI Connector: A 'SQL Bridge'")

Using Tableau with MongoDB

At MongoDB, we’re committed to helping developers focus on building next-generation apps and not on database operations. Likewise, Tableau's mission is to help people understand the insights behind their data regardless of skill set or functional role. Part of this mission encompasses the notion that data will be coming from a wide variety of sources, which requires Tableau to work seamlessly with a broad range of data platforms. To accomplish this ever-growing task, the team at Tableau has engineered a range of data connectors that expose information to Tableau’s end users, regardless of where the source data sits. This is essential for Tableau to deliver on its promise of “code-free analytics.” Tableau is also heavily invested in ensuring that queries run in its platform return results at optimal speed, regardless of the underlying data platform.
As Vaidy put it, “Speed to insight is a function not only of query performance but of the entire process of analytics being more agile.” That’s why MongoDB and Tableau are excited not only to optimize the speed at which data stored in MongoDB can be processed, but also to make the entire user experience more intuitive and seamless. Being able to capture data without ETL, and without painstakingly reformatting documents into a relational schema, significantly reduces cost and complexity.

How are teams using MongoDB and Tableau today?

Big Data today is not limited to exploratory data science use cases. It's even being used for operational reporting on day-to-day workloads, the kind traditionally handled by data warehouses. Modern organizations are responding to these hybrid needs by pursuing use case-specific architecture design. This design strategy involves tiering data based on a host of factors, including volume, frequency of access, speed of data, and level of aggregation. Broadly, these tiers are:

- “Cold”: data in its rawest form, useful for exploration on large volumes
- “Warm”: aggregated data for ad hoc diagnostic analyses
- “Hot”: fast data for repeatable use cases (KPI dashboards, etc.)

In most cases, organizations will use different stores for each tier. That said, if a deployment is well-tuned and well-indexed, MongoDB can serve as a datastore for “cold” data (e.g., a data lake), “warm” data (e.g., a semi-structured data warehouse), or “hot” data (e.g., computational data stored in-memory).

![MongoDB serves as a datastore](https://webassets.mongodb.com/_com_assets/cms/MongoDB_BI_Connector_datastore-xa69g5qkm2.png "MongoDB serves as a datastore for 'cold' data")

This means that there is a large spectrum of use cases for how MongoDB and Tableau can be deployed in parallel.

See the connector in action

To demonstrate how the connector works, we will be using a MongoDB dataset with information about 25,000 different New York City restaurants.
Here’s what the documents look like:

```
> db.restaurants.findOne();
{
  "_id" : ObjectId("5877d52bbf3a4cfc41ef8a03"),
  "address" : {
    "building" : "1007",
    "coord" : [-73.856077, 40.848447],
    "street" : "Morris Park Ave",
    "zipcode" : "10462"
  },
  "borough" : "Bronx",
  "cuisine" : "Bakery",
  "grades" : [
    {"date" : ISODate("2014-03-03T00:00:00Z"), "grade" : "A", "score" : 2, "inspectorID" : "Z149"},
    {"date" : ISODate("2013-09-11T00:00:00Z"), "grade" : "A", "score" : 6, "inspectorID" : "Z126"},
    {"date" : ISODate("2013-01-24T00:00:00Z"), "grade" : "A", "score" : 10, "inspectorID" : "Z39"},
    {"date" : ISODate("2011-11-23T00:00:00Z"), "grade" : "A", "score" : 9, "inspectorID" : "Z204"},
    {"date" : ISODate("2011-03-10T00:00:00Z"), "grade" : "B", "score" : 14, "inspectorID" : "Z189"}
  ],
  "name" : "Morris Park Bake Shop",
  "restaurant_id" : "30075445",
  "avgprc" : NumberDecimal("12.2500000000000")
}
```

As you can see, this collection contains the data points you’d expect (address, cuisine, etc.), but it also contains time-based sanitation grades as a nested array. In a relational schema, you might expect to see this data stored in a separate table, whereas in MongoDB it can be retained within the restaurant document.

To transform this database into a form that a SQL-based tool can parse, we use the mongodrdl utility to create the mapping file. Inspecting the output file will reveal that the nested arrays have been mapped to relational tables. Indeed, connecting to the connector from the MySQL shell reveals the new schema.

Notice how the geospatial data in the source document ("address.coord") was transformed from an array into 2 doubles corresponding to longitude and latitude. In MongoDB:

```
"coord" : [-73.856077, 40.848447],
```

Output from the connector:

| _id | address.coord_longitude | address.coord_latitude |
|---|---|---|
| 5877d52bbf3a4cfc41ef8a03 | -73.856077 | 40.848447 |

What’s more, because the connector reads from MongoDB on demand, changes you make to the original MongoDB collection are reflected in the connector’s output in real time.
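To make the idea of mapping a nested array into its own relational table more concrete, here is a minimal plain-JavaScript sketch of one way a single restaurant document could be split into a parent row and child rows. It is an illustration under stated assumptions, not what mongodrdl or the connector actually does internally: the `flattenRestaurant` function, the `restaurants_grades` table name, and the `grades_idx` position column are hypothetical; only the `address.coord_longitude` / `address.coord_latitude` naming comes from the connector output shown above.

```javascript
// Illustrative sketch only: NOT the Connector for BI's or mongodrdl's actual
// algorithm. Table and column names other than address.coord_longitude /
// address.coord_latitude are hypothetical.

// Project one restaurant document into a parent row plus one child row per grade.
function flattenRestaurant(doc) {
  // The coord array is stored as [longitude, latitude].
  const [longitude, latitude] = doc.address.coord;

  const restaurantRow = {
    _id: doc._id,
    name: doc.name,
    borough: doc.borough,
    cuisine: doc.cuisine,
    "address.coord_longitude": longitude,
    "address.coord_latitude": latitude,
  };

  // Each array element becomes its own row, linked back to the parent by _id
  // and identified by its position in the original array.
  const gradeRows = doc.grades.map((g, idx) => ({
    _id: doc._id,
    grades_idx: idx,
    "grades.date": g.date,
    "grades.grade": g.grade,
    "grades.score": g.score,
    "grades.inspectorID": g.inspectorID,
  }));

  return { restaurants: [restaurantRow], restaurants_grades: gradeRows };
}

// A trimmed copy of the sample document (ObjectId shown as a plain string).
const sample = {
  _id: "5877d52bbf3a4cfc41ef8a03",
  address: { building: "1007", coord: [-73.856077, 40.848447], street: "Morris Park Ave", zipcode: "10462" },
  borough: "Bronx",
  cuisine: "Bakery",
  grades: [
    { date: new Date("2014-03-03T00:00:00Z"), grade: "A", score: 2, inspectorID: "Z149" },
    { date: new Date("2013-09-11T00:00:00Z"), grade: "A", score: 6, inspectorID: "Z126" },
    { date: new Date("2011-03-10T00:00:00Z"), grade: "B", score: 14, inspectorID: "Z189" },
  ],
  name: "Morris Park Bake Shop",
};

const tables = flattenRestaurant(sample);
console.table(tables.restaurants);
console.table(tables.restaurants_grades);
```

Keeping the parent `_id` on every child row is what lets a SQL tool join the two tables back together; the document itself needs no such join, because the grades already live inside it.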
Now that our data is in a form that a SQL-based tool can understand, let’s move into Tableau. When connecting to the server through Tableau, we select “MySQL”, as that is how Tableau reads our mapped data set. You will then see that all the data has been pulled into Tableau with the correct types. For example, if we drill down on our longitude and latitude columns, Tableau knows to toggle into geo mode. This allows us to create interesting visualizations with our MongoDB data. Say we want to zoom into New York City and filter by Asian and Chinese cuisine... ...you’ll notice a big cluster on the southeast side of Manhattan. We've found Chinatown! Be sure to watch the full demo to see Buzz explore all of the various ways the connector can be used to pre-aggregate data, hide particular fields, and even do field-level dynamic redaction of data.

Best practices for using MongoDB with Tableau

When preparing a dataset stored in MongoDB for analysis in Tableau, be sure you are following MongoDB best practices. Do you have indexes on frequently queried fields? Have you pre-joined tables as nested arrays (like the sanitation grades example above)? As we saw with the translation of geospatial arrays into longitude and latitude doubles, there is great value in letting your data types flow into Tableau. Avoid blurring rich types like datetimes and decimals by down-converting them to strings, and avoid casting: some of these operations will be processed in the connector itself, not in MongoDB. For example, complex date arithmetic is not yet pushed down into MongoDB and can greatly increase latency.

Frequently Asked Questions

Should I use the Connector for BI or Tableau Data Extract?

Remember that Tableau will not be able to run queries faster than MongoDB allows. If your data is under-optimized, you may want to consider using Tableau Data Extract instead.
Extracts can also be a helpful tool to augment query speed; however, they work better for smaller datasets (fewer than 100,000,000 records, 100 columns, etc.). Extracts can also reduce load on your MongoDB cluster if it is being accessed by many users.

Is the Connector for BI available for MongoDB Community?

At this time, the Connector for BI is available as part of MongoDB Enterprise Advanced.

What kind of overhead do the connector and Tableau add to MongoDB response times?

Unless you're running into edge cases where processing is happening in the connector rather than in the database, you will not notice additional latency.

With the previous version of the BI Connector, we ran into issues with joins between collections.

The recent release of the Connector for BI (v2.0) introduces significant performance enhancements for these use cases over v1.0.

Be sure to watch the full demo here, and download an evaluation version of the Connector for BI 2.0 for yourself!

Try the MongoDB Connector for BI

January 20, 2017

Questions we got at AWS re:Invent

Last week, the MongoDB team had the pleasure of attending the world’s foremost cloud computing conference. At AWS re:Invent in Las Vegas, we heard our CTO explain how Thermo Fisher reduced mass spectrometry experiment times from days to minutes, we announced the general availability of MongoDB 3.4, and we formally introduced MongoDB Atlas to the AWS community. Here are some of the most popular questions we were asked last week:

“I’m new to cloud technology. Tell me…”

Where does my application server live and where does my database live?

Depending on your architecture, your app server may run on a different machine or in a different data center from your database. For most applications, latency can be significantly reduced by spinning up your app server and database in the regions closest to your user traffic. MongoDB Atlas, for example, can currently run in any of 4 supported Amazon regions, with more being added soon!

What’s the easiest way to migrate to the cloud?

As Amazon dramatically revealed in their keynote this year, cloud migration is a very hot topic. Short of uploading all of your data to an Amazon 18-wheeler, there are many options for migration. Depending on your availability requirements, this can be as simple as creating a copy of your data in the cloud and re-routing traffic to the cloud-hosted database. For more mission-critical deployments, migration can be achieved with no downtime. We also provide tailored consulting services to help you create a migration path.

“I’m new to MongoDB. Can you explain…”

What are secondary indexes and how do I use them?

Indexes are powerful tools that let you significantly improve query performance by avoiding full collection scans. Secondary indexes can be added to highly queried fields (or sets of fields) to extend these performance gains to additional queries, aggregations, or sorts.
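As a rough analogy for why an index avoids a full collection scan, the following sketch compares a linear scan with a lookup through a field-value map in plain JavaScript. It is only an analogy: real MongoDB indexes are B-tree structures maintained by the storage engine and are created in the mongo shell with `db.collection.createIndex()` (for example, `db.restaurants.createIndex({ cuisine: 1 })`). The documents and the helpers `findByScan`, `buildIndex`, and `findByIndex` are hypothetical.

```javascript
// Toy in-memory analogy only: real MongoDB indexes are B-trees managed by the
// storage engine, not JavaScript Maps. Documents below are hypothetical.
const docs = [
  { _id: 1, cuisine: "Bakery", borough: "Bronx" },
  { _id: 2, cuisine: "Chinese", borough: "Manhattan" },
  { _id: 3, cuisine: "Bakery", borough: "Brooklyn" },
  { _id: 4, cuisine: "Italian", borough: "Queens" },
];

// Without an index: examine every document (a "collection scan").
function findByScan(collection, field, value) {
  let examined = 0;
  const results = [];
  for (const doc of collection) {
    examined++;
    if (doc[field] === value) results.push(doc);
  }
  return { results, examined };
}

// A secondary "index" on one field: field value -> positions of matching documents.
function buildIndex(collection, field) {
  const index = new Map();
  collection.forEach((doc, pos) => {
    const key = doc[field];
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(pos);
  });
  return index;
}

// With the index: jump straight to the matching documents.
function findByIndex(collection, index, value) {
  const positions = index.get(value) || [];
  return { results: positions.map((pos) => collection[pos]), examined: positions.length };
}

const cuisineIndex = buildIndex(docs, "cuisine");
const scan = findByScan(docs, "cuisine", "Bakery");         // examines all 4 documents
const indexed = findByIndex(docs, cuisineIndex, "Bakery");  // examines only the 2 matches
console.log(scan.examined, indexed.examined);
```

The trade-off is visible even in this toy version: every insert or update would also have to keep the `Map` in sync, which is the write-side cost of each additional index.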
Adding too many indexes can unnecessarily harm read/write performance, so we recommend using MongoDB Compass to identify which indexes are being used and where additional indexes would be most beneficial.

What are the benefits of MongoDB’s native aggregation functionality?

MongoDB’s aggregation framework gives you the capability to run advanced analytics processing right inside the database, rather than in your application. Since its introduction in MongoDB 2.2, the aggregation framework has been significantly expanded to include many of the rich aggregation features you’d expect from a relational database.

Does MongoDB offer support?

Yes. Creating a MongoDB Atlas account gives you access to assistance with the UI and support for any connectivity issues. You also have our global site reliability engineering team focused on monitoring and ensuring availability of the service. For support of the underlying database software, we recommend a MongoDB Enterprise Advanced or MongoDB Professional subscription. We also offer a subscription designed specifically for MongoDB Atlas customers. These subscriptions include access to our support engineers 24 hours a day, 365 days a year. Our support team not only helps you troubleshoot specific questions, but can also guide you through upgrades and suggest optimizations. Support plans are also available for applications in development. Learn more about MongoDB support.

“MongoDB Atlas looks great! Tell me more about…”

How secure is MongoDB Atlas? Is my data encrypted?

MongoDB Atlas is secure by default. Each MongoDB Atlas group is provisioned into its own AWS Virtual Private Cloud, isolating your data and underlying systems from other MongoDB Atlas users. You can even directly peer your MongoDB Atlas VPC with your application servers deployed to AWS using private IP addresses.
We also configure network encryption and access control by default, and IP whitelists (enforced with AWS Security Groups) let you limit access to a specific range of IP addresses. MongoDB engineers automatically apply security patches as soon as they are released. Finally, you can elect to encrypt your storage volumes at no additional cost using AWS's encrypted EBS volumes.

How does MongoDB Atlas handle backups?

Backups are optional and are billed by the size of the data being backed up. Backups in MongoDB Atlas are continuous and maintained seconds behind the production cluster, meaning you can conduct point-in-time restores. MongoDB Atlas backups are stored in MongoDB’s private data centers for added redundancy.

“I’m thinking about using DynamoDB…” How does MongoDB Atlas compare?

There are a few features in MongoDB Atlas that our users find valuable:

- Rich query language: The MongoDB Query Language gives you the querying capabilities you’d expect from a SQL-based database with the flexibility and performance of NoSQL.
- Native aggregation: MongoDB’s aggregation framework gives you the capability to run advanced analytics processing right inside the database, rather than in your application.
- Secondary indexing support: Support for secondary indexes allows you to extend the performance gains of traditional indexing to additional fields or field sets.
- Robust tooling: MongoDB was released in 2009 and is supported by a vast ecosystem of tooling that lets you support, monitor, and visualize your deployment with confidence. MongoDB Atlas itself is built using the expertise gained from years of building enterprise-grade products for the most complex, mission-critical deployments.
- Choice of infrastructure vendor (coming soon!): With upcoming support for Google Cloud Platform and Microsoft Azure, you’ll be able to choose where your deployment lives based on your app requirements.
“I’m currently using a 3rd party hosted MongoDB service...” How does MongoDB Atlas compare?

MongoDB Atlas is built and supported by the team who builds the database. This means you get access to the latest security patches, major releases, and features when they’re released, not a year later! MongoDB Atlas is also the most cost-effective way to run MongoDB 3.2 or 3.4 with our WiredTiger storage engine. With hourly billing and independently tunable memory, storage size, and disk I/O, you only pay for what you need. All MongoDB Atlas clusters are hosted in isolated, secure VPCs, which means you aren’t sharing bandwidth with other databases within the same virtual machine.

-- Download MongoDB Atlas Best Practice Guide

December 9, 2016

Listen to Eliot Horowitz on the Future of the Database

"The main motivation for people trying out MongoDB and adopting MongoDB really came around from developers wanting to be more productive."

Six years after MongoDB was open sourced, we’re still thinking about how to empower software engineers by making app development more efficient and productive. Our CTO and co-founder, Eliot Horowitz, recently sat down with Jeff Meyerson, host of Software Engineering Daily, to talk about how the evolution of MongoDB and its ecosystem has been propelled by the goal of developer productivity.

MongoDB is best known for its JSON-based documents, and Eliot explains that this data model provides a "fundamentally easier data structure for developers to work with. It more naturally suits the way programming languages work and the way people think. No one thinks about breaking things up into rows and columns but they do think of things as structures."

```
{
  "_id" : 1,
  "name" : { "first" : "John", "last" : "Backus" },
  "contribs" : [ "Fortran", "ALGOL", "Backus-Naur Form", "FP" ],
  "awards" : [
    { "award" : "W.W. McDowell Award", "year" : 1967, "by" : "IEEE Computer Society" },
    { "award" : "Draper Prize", "year" : 1993, "by" : "National Academy of Engineering" }
  ]
}
```

*An example JSON document*

By basing all interactions with data on the document model, the creators of MongoDB made it easier for developers to store and work with data, and therefore easier to get more value out of it. Reducing friction for developers doesn't just reduce developer headaches; it also has a direct impact on the bottom line. As hardware and infrastructure costs have fallen since the 1980s, the value of the individual developer has soared. Ensuring individual engineers are productive is critical to today’s businesses.
This is the story that Eliot and MongoDB have been telling for years, but it's particularly interesting to hear Eliot discuss how MongoDB has evolved alongside two other major trends in software engineering: cloud computing and service-oriented architectures (and, by extension, microservices). Not coincidentally, both of these paradigms are also rooted in unburdening the individual developer. Cloud computing reduces things like lengthy infrastructure provisioning times, whereas microservices decouple application logic to allow for faster iteration and feature development. As Eliot points out, this shift also fundamentally changes the way apps are built, as developers are able to use third-party services in place of coding necessary functionality from scratch.

Listen in to Eliot's conversation with Jeff: in addition to the evolution of MongoDB, Eliot talks about the future of the database and how we use our own products internally in a hybrid cloud configuration. If you’re interested in Jeff’s other conversations about the software landscape, Software Engineering Daily comprises hours of fascinating technical content, and many of our own engineers are already avid listeners!

I hope you'll listen in, as this episode kicks off MongoDB’s first podcast partnership. We’re looking forward to engaging with you through this medium. As always, please give us suggestions for new ways to contribute to the ever-growing MongoDB community!

Listen to Eliot on the Software Engineering Daily podcast

Can't listen? You can view the transcript here.

December 1, 2016

5 Blogs to Read Before You Head to AWS re:Invent Next Month

This post is part of our Road to re:Invent series. In the weeks leading up to AWS re:Invent in Las Vegas this November, we'll be posting about a number of topics related to running MongoDB in the public cloud.

![Road to AWS re:Invent](https://webassets.mongodb.com/_com_assets/cms/AWS_ReInvent-683wqzsi2z.jpg)

Before you head to AWS re:Invent next month, we’ve pulled together our most popular blog posts about running MongoDB alongside different AWS solutions.

1. Virtualizing MongoDB on Amazon EC2 and GCE

As part of a migration to a cloud hosting environment, David Mytton, Founder and CTO of Server Density, investigated the best ways to deploy MongoDB on two popular platforms, Amazon EC2 and Google Compute Engine. In this two-part series, we review David’s general pros and cons of virtualization, along with the challenges and methods of virtualizing MongoDB on EC2 and GCE.

2. Maximizing MongoDB Performance on AWS

You have many choices to make when running MongoDB on AWS, from instance type and security to how you configure MongoDB processes, and more. In addition, you now have options for tooling and management. In this post we take a look at several recommendations that can help you get the best performance out of AWS.

3. Develop & Deploy a Node.js App to AWS Elastic Beanstalk & MongoDB Atlas

AWS Elastic Beanstalk is a service offered by Amazon to make it simple for developers to deploy and manage their cloud-based applications. In this post, Andrew Morgan walks you through how to build and deploy a Node.js app to AWS Elastic Beanstalk using MongoDB Atlas.

4. Oxford Nanopore Technologies Powers Real-Time Genetic Analysis Using Docker, MongoDB, and AWS

In this post, we take a look at how containerization, the public cloud, and MongoDB are helping a UK-based biotechnology firm track the spread of Ebola.

5. Selecting AWS Storage for MongoDB Deployments: Ephemeral vs. EBS

Last but not least, take a look at what we were writing about this time last year, as Bryan Reinero explores how to select the right AWS solution for your deployment.

Want more? We’ll be blogging about MongoDB and the cloud leading up to re:Invent again this year in our Road to re:Invent series. You can see the posts we’ve already published here.

Going to re:Invent? The MongoDB team will be in Las Vegas at re:Invent 11/29 to 12/2. If you’re attending re:Invent, be sure to visit us at booth 2620!

MongoDB Atlas, the cloud database service for MongoDB, is the easiest way to deploy and run MongoDB, allowing you to get started in minutes. Click here to learn more.

Get the guide for MongoDB on AWS

October 24, 2016

Crossing the Chasm: Looking Back on a Seminal Year of Cloud Technology

This post is part of our Road to re:Invent series. In the weeks leading up to AWS re:Invent in Las Vegas this November, we'll be posting about a number of topics related to running MongoDB in the public cloud.

![Road to AWS re:Invent](https://webassets.mongodb.com/_com_assets/cms/AWS_ReInvent-683wqzsi2z.jpg)

On the main stage of Amazon’s AWS re:Invent conference in Las Vegas last year, Capital One’s CIO, Rob Alexander, made headlines in tech publications when he explained that, under his leadership, the bank would be reducing its number of data centers from 8 in 2015 to just 3 in 2018. Capital One began using cloud-hosted infrastructure organically, with developers turning to the public cloud for a quick and easy way to provision development environments. The increase in productivity prompted IT leadership to adopt a cloud-first strategy not just for development and test environments, but for some of the bank’s most vital production workloads.

Capital One’s story, which generated headlines just a year ago, has now become one of many examples of large enterprises shifting mission-critical deployments to the cloud. In a recent report released by McKinsey & Company, the authors declared “the cloud debate is over—businesses are now moving a material portion of IT workloads to cloud environments.” The report goes on to validate what many industry-watchers (including MongoDB, in our own Cloud Brief this May) have noted: cloud adoption in the enterprise is gaining momentum and is driven primarily by benefits in time to market. According to McKinsey’s survey, almost half (48 percent) of large enterprises have migrated an on-premises workload to the public cloud.
Based on the conventional model of innovation adoption, this marks the divide between the “early majority” and the “late majority” of cloud adopters. This not only means that the cloud computing “chasm” has been crossed, but also that we have entered the period where near-term adoption of cloud-centric strategies will play a strong role in an organization’s ability to execute and, as a result, in its longevity in the market.

![](https://webassets.mongodb.com/_com_assets/cms/AWS_ReInvent_Adoption_Lifecycle-awjdat7emu.png)

Image source: [Technology Adoption Lifecycle](https://upload.wikimedia.org/wikipedia/commons/d/d3/Technology-Adoption-Lifecycle.png)

An additional indication that the “chasm” has been bridged comes as more heavily regulated industries set aside oft-cited security concerns and pair public cloud usage with other broad-scale digitization initiatives. As Amazon, Google, and Microsoft (the three “hyperscale” public cloud vendors, as McKinsey defines them) continue to invest significantly in securing their services, the most memorable soundbite from Alexander’s keynote continues to ring true: that Capital One can “operate more securely in the public cloud than we can in our own data centers."

As concern over security in the public cloud continues to wane, other barriers to cloud adoption are becoming more apparent. Respondents to McKinsey’s survey and to our own Cloud Adoption Survey earlier this year reported concerns about vendor lock-in and limited access to talent with the skills needed for cloud deployment. With just 4 vendors holding over half of the public cloud market, CIOs are careful to select technologies that have cross-platform compatibility as Amazon, Microsoft, IBM, and Google continue to release application and data services exclusive to their own clouds. This reluctance to outsource certain tasks to the hyperscale vendors is mitigated by a limited talent pool.
Developers, DBAs, and architects with experience building and managing internationally distributed, highly available, cloud-based deployments are in high demand. In addition, it is becoming more complex for international businesses to comply with the changing landscape of local data protection laws as legislators try to keep pace with cloud technology. As a result, McKinsey predicts enterprises will increasingly turn to managed cloud offerings to offset these costs.

It is unclear whether the keynote at Amazon’s re:Invent conference next month will once again foreshadow the changing enterprise technology landscape for the coming year. However, we can be certain that the world’s leading companies will be well-represented as the public cloud continues to entrench itself even deeper into enterprise technology.

MongoDB Atlas, the cloud database service for MongoDB, is the easiest way to deploy and run MongoDB, allowing you to get started in minutes. Click here to learn more.

The MongoDB team will be at AWS re:Invent this November in Las Vegas, and our CTO Eliot Horowitz will be speaking Thursday (12/1) afternoon. If you’re attending re:Invent, be sure to attend the session and visit us at booth #2620!

Learn more about AWS re:Invent

October 18, 2016