GIANT Stories at MongoDB

MongoDB Atlas: Connector for Apache Spark now Officially Certified for Azure Databricks

We are happy to announce that the MongoDB Connector for Apache Spark is now officially certified for Microsoft Azure Databricks. MongoDB Atlas users can integrate Spark and MongoDB in the cloud for advanced analytics and machine learning workloads by using the MongoDB Connector for Apache Spark, which is fully supported and maintained by MongoDB.

The MongoDB Connector for Apache Spark exposes all of Spark’s libraries through its Scala, Java, Python, and R APIs. MongoDB data is materialized as DataFrames and Datasets for analysis with machine learning, graph, streaming, and SQL APIs. The MongoDB Connector for Apache Spark can take advantage of MongoDB’s aggregation pipeline and rich secondary indexes to extract, filter, and process only the range of data it needs – for example, analyzing all customers located in a specific geography. This is very different from simple NoSQL data stores that do not offer secondary indexes or in-database aggregations and require the extraction of all data based on a simple primary key, even if only a subset of that data is needed for the Spark process. Extracting everything results in more processing overhead, more hardware, and longer time-to-insight for data scientists and engineers.

Additionally, MongoDB’s workload isolation makes it easy for users to efficiently process data drawn from multiple sources into a single database with zero impact on other business-critical database operations. Running Spark on MongoDB reduces operational overhead as well by greatly simplifying your architecture and increasing the speed at which analytics can be executed.

MongoDB Atlas, our on-demand, fully-managed cloud database service for MongoDB, makes it even easier to run sophisticated analytics processing by eliminating the operational overhead of managing database clusters directly. By combining Azure Databricks and MongoDB, Atlas users can benefit from a fully managed analytics platform, freeing engineering resources to focus on their core business domain and deliver actionable insights quickly.


MongoDB Connector for Apache Spark now Officially Certified by Cloudera

Bryan Reinero

We are delighted to announce that the MongoDB Connector for Apache Spark is officially certified by Cloudera. MongoDB users may already integrate Spark and MongoDB using the MongoDB Connector for Apache Spark, a fully supported package maintained by MongoDB. This connector allows you to perform advanced analytics and machine learning against the data sets that reside in MongoDB. Users of Cloudera may use this same connector to run Spark jobs from their managed clusters against both MongoDB Atlas and self-managed MongoDB instances.

Apache Spark and MongoDB are a potent analytics combination. MongoDB’s flexible schema, secondary indexing, aggregation pipelines, and workload isolation make it easy for users to efficiently process data drawn from multiple sources into a single database with zero impact to other business-critical database operations. Running Spark on MongoDB reduces operational overhead as well. Running Spark jobs on MongoDB eliminates the need to ETL duplicate data to a separate cluster of HDFS servers, greatly simplifying your architecture and increasing the speed at which analytics can be executed.

MongoDB Atlas, our on-demand, fully-managed cloud database service for MongoDB, makes it even easier to run sophisticated analytics processing by eliminating the operational overhead of managing database clusters directly. By combining Cloudera and MongoDB, Atlas users can benefit from a fully managed analytics platform, freeing engineering resources to focus on their core business domain and deliver actionable insights quickly.


Welcome 2016 MongoDB Masters

Bryan Reinero

Company

We are extremely excited to announce the 2016 MongoDB Masters Class. The MongoDB Masters are leaders within their community, experts in MongoDB, and love to share their knowledge with others. This year’s class includes returning Masters, as well as new members who have distinguished themselves in the past year.

The MongoDB Masters Program began in 2011 and has become one of the most important parts of our community. These Masters are some of the first users of MongoDB and they have done it all – from maintaining their own open source MongoDB projects, to organizing MongoDB User Groups, to writing books about databases.

Masters have provided valuable product feedback and driven thought leadership in our field. We look forward to deepening this relationship over the coming year. This year’s class of Masters will be encouraged to participate in beta testing programs, share their experiences with MongoDB, and establish their own voice as leaders in the database community.

Growing the Masters is a priority for the MongoDB Team. In addition to having our existing Masters recommend peers as new Master candidates, we’re also finding and mentoring Masters via the MongoDB Advocacy Hub, an online platform designed to further community engagement. Advocates who display a high level of leadership and experience within the Advocacy Hub can then be invited to join the Masters program.

We are extremely proud of the MongoDB Masters Program and look forward to working with the 2016 class with verve. Preparations are underway for the MongoDB Masters Summit, which will be held on June 27th as part of MongoDB World 2016, where Dr. Eric Brewer, Dr. Hannah Fry, and Mythbuster Adam Savage will be keynoting. We encourage all members of our community to register for MongoDB World 2016, meet the Masters in person, and join our Advocacy Hub to start their own path to becoming a MongoDB Master.

Read more about the 2016 Masters and their bios.



About the Author - Bryan Reinero

Bryan is US Developer Advocate at MongoDB, fostering understanding and engagement in the community. Previously, Bryan was a Senior Consulting Engineer at MongoDB, helping users optimize MongoDB for scale and performance, and a contributor to the Java Driver for MongoDB.

Earlier, Bryan was Software Engineering Manager at Valueclick, building and managing large scale marketing applications for advertising, retargeting, real-time bidding and campaign optimization. Earlier still, Bryan specialized in software for embedded systems at Ricoh Corporation and developed data analysis and signal processing software at the Experimental Physics Branch of Ames Research Center.

MongoDB Radio: Our New Podcast Project

Welcome to the inaugural post of MongoDB Radio, our new podcast project. We’re very excited to bring you great content about MongoDB, the people who build it, and the people who use it. Throughout this series we will feature interviews with MongoDB engineers, experts in the field of distributed computing and databases, stories from our community and trends in technology, and much more. The world of distributed systems and next generation applications is a fascinating place, and we can’t wait to share it with you.

In episode one we spent time with Luke Lovett, a software engineer on the drivers integration team at MongoDB. Among many things, Luke is responsible for maintaining one of our most popular projects – the Hadoop connector for MongoDB. The connector allows you to plug MongoDB into the Hadoop ecosystem of tools and perform sophisticated processing against the data within MongoDB.

We spoke with Luke during our developer conference in San Jose, where he was delivering a talk on some of the new features available on the connector. We discussed the connector in depth, what it’s like to work on an open source project with the community, and how he got started at MongoDB.



About the Author - Bryan Reinero


Nominations for the 2016 MongoDB Masters Program are Open

We’re excited to announce that nominations for membership in next year's MongoDB Masters program are now open. New members will be announced in June and will be invited to participate in the MongoDB Masters Summit held before MongoDB World.

What is the MongoDB Masters program?

The Masters program is a special effort to gather a group of experts outside of MongoDB for the sake of sharing knowledge, educating other users, and serving as leaders in our community. The folks we're looking to attract have both real world experience with MongoDB and exceptional expertise within their domain.

How will Masters be chosen?

MongoDB Masters are our best community advocates and MongoDB experts. This year we will be choosing MongoDB Masters based on their work in the Advocacy Hub.

What is the MongoDB Advocacy Hub?

Advocacy Hub is a portal designed to engage our most enthusiastic users in a personal fashion. The Hub is a place where they can learn more about MongoDB and also build their technical skills and understanding. Members of the Hub, who we call Giants, complete challenges and earn points for participating in the community. We’ve already seen a great amount of interest in the short time since its launch. This year, we are using the Hub to identify and reward our best advocates, and put new candidates on the path towards becoming a Master.

Anyone in the MongoDB Community can join the Advocacy Hub through our open sign up. Those who excel on the Hub will become contenders for the MongoDB Masters Program.

Nominations

You can nominate a MongoDB Master by joining the Advocacy Hub and completing the “Nominate a Master” challenge. You can nominate yourself if you’d like to be a Master. It will only take you 3 minutes. Once you’re done, have a peek around and see the other technical challenges that you can complete to learn more about MongoDB.

We look forward to seeing your nominations!

Community participation gives us the feedback we need to make our product better, lets users share their experiences, and expands knowledge and expertise around MongoDB. We’re looking forward to seeing your nominations and announcing the next Masters!



About the Author - Bryan Reinero


The Traveling Santa

Bryan Reinero

Company

Challenge

It’s the holiday season and at MongoDB, we got to thinking about how efficiently Santa could deliver presents around the world. Visiting every Christmas-celebrating house in one night seems like a challenge, but none too big for an engineer.

We presented the challenge below in our MongoDB Advocacy Hub, based on the classic Traveling Salesperson problem. Given a set of geo coordinates, one for each of the 10 most populous urban areas on the planet, how would you find the shortest path the salesperson (or in this case, Santa) could use to reach each city? (To participate in our next challenge, register for the Advocacy Hub!)

We provided the following JSON dataset containing the 10 most populous cities, plus the North Pole:

{"_id":"Beijing","population":1.952e+07,"location":{"type":"Point","coordinates":[116.383333,39.916667]}}
{"_id":"Delhi","population":2.4953e+07,"location":{"type":"Point","coordinates":[77.23,28.61]}}
{"_id":"Guangzhou","population":2.0597e+07,"location":{"type":"Point","coordinates":[113.266667,23.133333]}}
{"_id":"Mexico City","population":2.0843e+07,"location":{"type":"Point","coordinates":[-99.133333,19.433333]}}
{"_id":"Mumbai","population":2.0741e+07,"location":{"type":"Point","coordinates":[72.825833,18.975]}}
{"_id":"New York","population":1.8591e+07,"location":{"type":"Point","coordinates":[-74.0059,40.7127]}}
{"_id":"North Pole","population":1.0,"location":{"type":"Point","coordinates":[0.0,90.0]}}
{"_id":"Osaka","population":2.0123e+07,"location":{"type":"Point","coordinates":[135.502222, 34.693889]}}
{"_id":"Shanghai","population":2.2991e+07,"location":{"type":"Point","coordinates":[121.5,31.2]}}
{"_id":"São Paulo","population":2.0831e+07,"location":{"type":"Point","coordinates":[-46.633333,-23.55]}}
{"_id":"Tokyo","population":3.7833e+07,"location":{"type":"Point","coordinates":[139.683333,35.683333]}}

Solution

Congratulations to Dror Asaf for solving the traveling Santa challenge. Dror determined the correct path for Santa to take to reach the 10 most populous cities in the least distance required.

The order of cities visited is:

  1. North Pole
  2. New York
  3. Mexico City
  4. São Paulo
  5. Mumbai
  6. Delhi
  7. Guangzhou
  8. Shanghai
  9. Beijing
  10. Osaka
  11. Tokyo
  12. North Pole

Dror’s methodology was pretty straightforward. Starting at the North Pole, Dror used MongoDB’s geospatial indexing and queries to select the next city to visit at each stage of the journey. Dror’s strategy finds the closest city to Santa’s current position and sets that city to visit next in Santa’s world tour.
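The greedy strategy described above can be sketched in plain JavaScript. This is not Dror's code: his version used MongoDB's geospatial queries, while this sketch swaps in an in-memory great-circle distance (the spherical law of cosines) so it runs without a database. The names `greatCircle`, `nearestNeighborTour`, and `greedyTour`, and the 6,371 km mean Earth radius, are assumptions made for illustration; the coordinates come from the dataset above.

```javascript
// Cities from the challenge dataset, as [longitude, latitude] (GeoJSON order).
const cities = {
  "Beijing": [116.383333, 39.916667],
  "Delhi": [77.23, 28.61],
  "Guangzhou": [113.266667, 23.133333],
  "Mexico City": [-99.133333, 19.433333],
  "Mumbai": [72.825833, 18.975],
  "New York": [-74.0059, 40.7127],
  "North Pole": [0.0, 90.0],
  "Osaka": [135.502222, 34.693889],
  "Shanghai": [121.5, 31.2],
  "São Paulo": [-46.633333, -23.55],
  "Tokyo": [139.683333, 35.683333]
};

const EARTH_RADIUS_M = 6371000; // assumed mean Earth radius, in meters

// Great-circle distance in meters via the spherical law of cosines.
function greatCircle(a, b) {
  const rad = Math.PI / 180;
  const lat1 = a[1] * rad;
  const lat2 = b[1] * rad;
  const dLon = (b[0] - a[0]) * rad;
  const cosAngle = Math.sin(lat1) * Math.sin(lat2) +
                   Math.cos(lat1) * Math.cos(lat2) * Math.cos(dLon);
  // Clamp to guard against rounding just outside [-1, 1].
  return EARTH_RADIUS_M * Math.acos(Math.max(-1, Math.min(1, cosAngle)));
}

// Greedy tour: always hop to the closest unvisited city, then return home.
function nearestNeighborTour(start) {
  const tour = [start];
  const remaining = Object.keys(cities).filter(function (name) {
    return name !== start;
  });
  while (remaining.length > 0) {
    const here = cities[tour[tour.length - 1]];
    let best = 0;
    for (let i = 1; i < remaining.length; i++) {
      if (greatCircle(here, cities[remaining[i]]) <
          greatCircle(here, cities[remaining[best]])) {
        best = i;
      }
    }
    tour.push(remaining.splice(best, 1)[0]);
  }
  tour.push(start);
  return tour;
}

const greedyTour = nearestNeighborTour("North Pole");
```

With the corrected Osaka latitude, `greedyTour` should follow the same order of cities listed above.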

You may notice that while this works for the set of cities in the challenge, a next-nearest-destination strategy isn’t a general solution to the traveling Santa problem. I’ve described a general solution below. But before we get there I need to admit a mistake.

In the original data set I mistakenly assigned a negative latitude for the city of Osaka, Japan. Here’s the erroneous document:

{"_id":"Osaka","population":2.0123e+07,"location":{"type":"Point","coordinates":[135.502222,-34.693889]}}

That would put Osaka at a location adjacent to Port Lincoln, Australia. Last time I checked, Osaka doesn’t go there. The correct latitude is 34.693889. Now, let’s take a closer look at the Traveling Santa problem to find a general solution.

How far from here to there?

The starting dataset I provided only included the geo-coordinates of the cities. You, the participant, need to calculate the distances between the cities before you can actually solve the problem at hand. I included this additional twist in the problem to focus attention on a common snag when calculating distance between two points on a globe.

For example, let’s say I wish to fly from Honolulu to Tokyo. Tokyo is west of Honolulu, so it would make sense to fly west to reach it. However, not all maps get this right. Check out what happens when I ask this web-based mapping service for a path from Honolulu to Tokyo.

The wrong way to go from Honolulu to Tokyo

This mapping service pointed me in the wrong direction, going the long way around the planet. It did this because it models the Earth as a 2-dimensional surface with edges, just like a flat piece of paper. It’s surprisingly common for databases that claim to support geospatial indexes to only expose 2D planes. In this model, the Earth is simply a flat plane over which we’ve imposed a coordinate system centered at latitude 0, longitude 0. Here’s a visualization of this 2D way of thinking, where the origin of the coordinate system is marked with the red dot and the edges of the Earth are marked with red lines. The edge of this model is the anti-meridian.

The Anti-Meridian

The Earth is of course not a flat plane but a 3-dimensional globe, so we need a way to calculate the distance between two points on its curved surface. The haversine formula does exactly this, and with it we can derive the shortest distance between Honolulu and Tokyo.

The right way to Tokyo
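As a rough illustration of the haversine formula, here is a minimal JavaScript sketch. The function name `haversine`, the `[longitude, latitude]` argument order (borrowed from the GeoJSON points in the dataset), and the 6,371 km mean Earth radius are assumptions for this example, not taken from the linked solution.

```javascript
const EARTH_RADIUS_M = 6371000; // assumed mean Earth radius, in meters

// Great-circle distance in meters between two [longitude, latitude] points.
function haversine(from, to) {
  const rad = Math.PI / 180;
  const dLat = (to[1] - from[1]) * rad;
  const dLon = (to[0] - from[0]) * rad;
  const a = Math.pow(Math.sin(dLat / 2), 2) +
            Math.cos(from[1] * rad) * Math.cos(to[1] * rad) *
            Math.pow(Math.sin(dLon / 2), 2);
  // Math.min guards against rounding slightly above 1 for antipodal points.
  return 2 * EARTH_RADIUS_M * Math.asin(Math.min(1, Math.sqrt(a)));
}

// Coordinates taken from the challenge dataset.
const delhi = [77.23, 28.61];
const beijing = [116.383333, 39.916667];
const tokyo = [139.683333, 35.683333];
const mexicoCity = [-99.133333, 19.433333];

const delhiToBeijing = haversine(delhi, beijing);
const tokyoToMexicoCity = haversine(tokyo, mexicoCity);
```

Because the squared sine of half the longitude difference is the same whichever way around the globe the difference is measured, the Tokyo to Mexico City distance comes out as the short hop across the Pacific rather than the long way around, with no special handling of the anti-meridian. Under the assumed radius, `delhiToBeijing` comes to roughly 3,776 km.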

At this point I’ve chosen to calculate the distance between each pair of cities and write it to MongoDB for use later. I have chosen to represent the distance between two cities as an “edge” document with the following format (the dist value is in meters):

{
        "_id" : ObjectId("5676f18a66414e88e89a09c3"),
        "from" : "Delhi",
        "dest" : "Beijing",
        "dist" : 3776477.26147437
}

At this point I am ready to find the shortest path between the cities. I am going to use a recursive function to traverse each path and find the shortest among them. First I need an object to represent the path I am currently building / traversing. In JavaScript this object takes the following form:

function Path(){
    this.distance = 0;
    this.visited =  [];
    this.clone = function() {
        var newPath = new Path();
        newPath.distance = this.distance;
        // copy the visited list so each branch gets its own array
        for ( var i = 0; i < this.visited.length; i++ )
            newPath.visited.push ( this.visited[i] );
        return newPath;
    };
}

To find the shortest path, starting at the North Pole, I initialize the Path object by plugging in the North Pole as the first city I have visited and then pass this initialized object to the recursive function ‘visit’.

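// Not shown in the original listing: an assumed comparator that
// orders paths by ascending total distance.
function compare( a, b ) {
    return a.distance - b.distance;
}

function visit ( path ) {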
    // get last city visited
    var city = path.visited[ path.visited.length - 1 ];

    var paths = [];

    db.edges.find( { from: city } ).forEach (
        function( edge ) {
            // avoid cities I've already visited
            if ( path.visited.indexOf( edge.dest ) == -1 ) {

                // branch out a new path
                var newPath = path.clone();
                //print( "Adding the next edge "+edge.dest );
                newPath.visited.push( edge.dest );
                newPath.distance += edge.dist;

                //recurse
                paths.push( visit ( newPath ) );
            }
        }
    );

    if( paths.length == 0 ) {

        // if no new paths the tour is complete. Now go home!
        var lastEdge = db.edges.findOne(
            { from: path.visited[ path.visited.length -1 ],
              dest: path.visited[0]
            }
        );

        path.visited.push( path.visited[0] );
        path.distance += lastEdge.dist;

        print( "Path complete "+path.visited );
        paths.push( path );
    }

    paths.sort( compare );
    return paths[0];
}

The complete code solution is available here. Once it has run, I can assert that Santa’s shortest route to reach all 10 cities is:

{
        "distance" : 45344411.211476855,
        "path" : [
                "North Pole",
                "Tokyo",
                "Osaka",
                "Beijing",
                "Shanghai",
                "Guangzhou",
                "Delhi",
                "Mumbai",
                "São Paulo",
                "Mexico City",
                "New York",
                "North Pole"
        ],
        "route" : {
                "type" : "LineString",
                "coordinates" : [
                        [
                                0,
                                90
                        ],
                        [
                                139.683333,
                                35.683333
                        ],
                        [
                                135.502222,
                                34.693889
                        ],
                        [
                                116.383333,
                                39.916667
                        ],
                        [
                                121.5,
                                31.2
                        ],
                        [
                                113.266667,
                                23.133333
                        ],
                        [
                                77.23,
                                28.61
                        ],
                        [
                                72.825833,
                                18.975
                        ],
                        [
                                -46.633333,
                                -23.55
                        ],
                        [
                                -99.133333,
                                19.433333
                        ],
                        [
                                -74.0059,
                                40.7127
                        ],
                        [
                                0,
                                90
                        ]
                ]
        }
}

Very astute readers will see that this is the same path as Dror’s solution, just traversed in reverse order. The “route” field in the document is a GeoJSON-formatted LineString I’ve added to the result document, which was used to render the paths you see in the maps above.

Now, you may notice this is an exhaustive, brute-force approach to a problem that is known to get rapidly harder as the number of points grows: with 10 cities there are already 10! = 3,628,800 orderings to check. This is why I limited the number of cities Santa would be visiting to 10. However, I can certainly think of a couple of optimizations which could be added to make my code even faster. I’d encourage you to take a look at the solution and make some suggestions in the comments below. Better yet, register in our Advocacy Hub and take part in the challenge with a gist of your own solution!
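One optimization of the kind hinted at above might be to abandon a partial path as soon as it is already no shorter than the best complete tour found so far. The sketch below is only an illustration of that pruning idea, not the linked solution: it works on an in-memory distance table rather than the edges collection, and the stop names `A` to `D`, their distances, and the function name `shortestTour` are all hypothetical. The pruning is only safe when distances are non-negative.

```javascript
// Hypothetical symmetric distance table for four stops (arbitrary units).
const dist = {
  A: { B: 10, C: 15, D: 20 },
  B: { A: 10, C: 35, D: 25 },
  C: { A: 15, B: 35, D: 30 },
  D: { A: 20, B: 25, C: 30 }
};

// Exhaustive search with pruning: returns the shortest round trip from start.
function shortestTour(dist, start) {
  const cityCount = Object.keys(dist).length;
  let best = { distance: Infinity, visited: null };

  function visit(visited, distance) {
    // Prune: this partial path cannot beat the best complete tour.
    if (distance >= best.distance) return;

    const here = visited[visited.length - 1];
    if (visited.length === cityCount) {
      // Every stop visited; close the loop by going home.
      const total = distance + dist[here][start];
      if (total < best.distance) {
        best = { distance: total, visited: visited.concat(start) };
      }
      return;
    }
    Object.keys(dist[here]).forEach(function (next) {
      // Avoid stops already on this path.
      if (visited.indexOf(next) === -1) {
        visit(visited.concat(next), distance + dist[here][next]);
      }
    });
  }

  visit([start], 0);
  return best;
}

const bestTour = shortestTour(dist, "A");
```

For this small table the search settles on a round trip of length 80. The result is still exact, since pruning only skips paths that could not have won, but on larger inputs it can avoid walking many of the orderings a plain brute-force search would visit.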


Want to learn more about geospatial capabilities in MongoDB? Check out my presentation on geo-spatial queries or read our recent blog post on the geospatial performance improvements now live in MongoDB 3.2.




The Ops Guide to a Peaceful Thanksgiving

Bryan Reinero and Dana Groce

Technical

Imagine a Thanksgiving holiday spent with your closest family and friends - enjoying turkey with gravy, mashed potatoes, green bean casserole, and old fashioned eggnog. For those of us on the operations team, a peaceful few days at home just isn’t a possibility. Ops is always on call, fixing the database at all hours, and over all holidays.

Selecting AWS Storage for MongoDB Deployments: Ephemeral vs. EBS

Bryan Reinero

Technical

The excitement around AWS re:Invent put us in a blogging mood. So, what better to talk about than how to select high performance storage for running MongoDB on EC2?

Your Ultimate Guide to Rolling Upgrades

No matter what database you use, there’s a variety of maintenance tasks that are periodically performed to keep your system healthy. And no matter what database you use, maintenance work on a production system can be risky. For this reason maintenance work is typically performed during periods of scheduled downtime – the database is taken offline, and normal business operations are suspended. Usually these hours are more convenient for users, but less so for the operations teams (e.g., early morning hours on the weekend).