We hope you enjoyed part 1 of our Back to Basics series where we introduced you to NoSQL and MongoDB.
In part 2 we will actually get to code and build a blogging application. We will start in the MongoDB Shell to show you how to interact directly with MongoDB on the command line, then move to an IDE to show you how to build a complete web application with the MongoDB Python driver.
Once your application is built, we will show you how to add indexes to improve the performance of your queries. As in most databases, indexes can dramatically improve query performance. MongoDB comes with a query analyzer and an explain tool that can give detailed insight into query performance.
At the end of this webinar you will know how to:
- Run MongoDB
- Create a basic database and set of collections using the MongoDB Shell
- Create a basic database and collection using one of our language drivers
- Add an index to a collection to improve query performance
- Review the efficiency of your queries using the explain framework
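The steps above can be sketched with the PyMongo driver. This is a minimal illustration, not the webinar's actual code: the database name `blog`, the collection name `posts`, and the field names are assumptions, and the driver calls that need a running `mongod` are shown commented out.

```python
# Hypothetical blog schema for the webinar's example app (names assumed).
# With a running mongod, the commented pymongo lines execute for real:
#
#   from pymongo import MongoClient, ASCENDING
#   posts = MongoClient("mongodb://localhost:27017")["blog"]["posts"]
#   posts.insert_one(post)
#   posts.create_index([("author", ASCENDING)])
#   plan = posts.find({"author": "jdoe"}).explain()

# A document as the driver would store it (a plain Python dict -> BSON).
post = {
    "title": "Back to Basics, Part 2",
    "author": "jdoe",
    "tags": ["mongodb", "python"],
}

# The query and index specification used above, as plain data structures.
query = {"author": "jdoe"}
index_spec = [("author", 1)]  # 1 == pymongo.ASCENDING

# Once the index exists, explain() reports an IXSCAN (index scan) in the
# winning plan instead of a COLLSCAN (full collection scan).
print(query, index_spec)
```

Running `explain()` before and after `create_index` is the quickest way to see the difference the index makes to the chosen plan.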
Register now for this webinar and all the remaining webinars in the series.
MongoDB Connector for Apache Spark: Announcing Early Access Program & New Spark Course
**Update: August 4th 2016** Since this original post, the connector has been declared generally available for production usage. Click through for a tutorial on using the new MongoDB Connector for Apache Spark.

We live in a world of “big data”. But it isn’t only the data itself that is valuable – it’s the insight it can generate. How quickly an organization can unlock and act on that insight has become a major source of competitive advantage. Collecting data in operational systems and then relying on nightly batch ETL (Extract, Transform, Load) processes to update the Enterprise Data Warehouse (EDW) is no longer sufficient. Speed-to-insight is critical, so analytics against live operational data to drive real-time action is fast becoming a necessity, enabled by a new generation of technologies like MongoDB and Apache Spark.

The new native MongoDB Connector for Apache Spark provides higher performance, greater ease of use, and access to more advanced Spark functionality than any connector available today. The new MongoDB University course for Apache Spark provides a fast-track introduction for developers and data scientists building new generations of operational applications that incorporate sophisticated real-time analytics.

The Rise of Apache Spark

Apache Spark is one of the fastest-growing big data projects in the history of the Apache Software Foundation. With its memory-oriented architecture, flexible processing libraries, and ease of use, Spark has emerged as a leading distributed computing framework for real-time analytics.

As a general-purpose framework, Spark is used for many types of data processing – it comes packaged with support for machine learning, interactive queries (SQL), statistical queries with R, graph processing, ETL, and streaming. Spark allows programmers to develop complex, multi-step data pipelines using a directed acyclic graph (DAG) pattern.
It supports in-memory data sharing across DAGs, so that different jobs can work with the same data. Additionally, Spark supports a variety of popular programming languages including Scala, Java, and Python. Sign up for the new Spark course at MongoDB University.

For loading and storing data, Spark integrates with a number of storage and messaging platforms including Amazon S3, Kafka, HDFS, machine logs, relational databases, NoSQL datastores, MongoDB, and more.

MongoDB and Spark Today

While MongoDB natively offers rich real-time analytics capabilities, there are use cases where integrating the Spark engine can extend the processing of operational data managed by MongoDB. This allows users to operationalize results generated from Spark within real-time business processes supported by MongoDB. Examples of users already using MongoDB and Spark to build modern, data-driven applications include:

- A multinational banking group operating in 31 countries with 51 million clients has implemented a unified real-time monitoring application with Apache Spark and MongoDB. The platform enables the bank to improve customer experience by continuously monitoring client activity across its online channels to check service response times and identify potential issues.
- A global manufacturing company estimates warranty returns by analyzing material samples from production lines. The collected data enables them to build predictive failure models using Spark machine learning and MongoDB.
- A video sharing website is using Spark with MongoDB to place relevant advertisements in front of users as they browse, view, and share videos.
- A global airline has consolidated customer data scattered across more than 100 systems into a single view stored in MongoDB. Spark processes are run against the live operational data in MongoDB to update customer classifications and personalize offers in real time, while the customer is live on the web or speaking with the call center.
- Artificial intelligence personal assistant company x.ai uses MongoDB and Spark for distributed machine learning problems.

There are a number of ways users integrate MongoDB with Spark. For example, the MongoDB Connector for Hadoop provides a plug-in for Spark, and multiple 3rd-party connectors are also available. Today we are announcing early access to a new native Spark connector for MongoDB.

Introducing the MongoDB Connector for Apache Spark

The new MongoDB Connector for Apache Spark provides higher performance, greater ease of use, and access to more advanced Spark functionality than the MongoDB Connector for Hadoop. The following table compares the capabilities of both connectors.

| Capability | MongoDB Connector for Spark | MongoDB Connector for Hadoop with Spark Plug-In |
| --- | --- | --- |
| Written in Scala, Spark’s native language | Yes | No (Java) |
| Support for Scala, Java, Python & R APIs | Yes | Yes |
| Support for the Spark interactive shell | Yes | Yes |
| Support for native Spark RDDs | Yes | No (Java RDDs, which are more verbose and complex to work with) |
| Support for Spark DataFrames and Datasets | Yes | DataFrames only; schema must be manually inferred |
| Automated MongoDB schema inference | Yes | No |
| Support for Spark core | Yes | Yes |
| Support for Spark SQL | Yes | Yes |
| Support for Spark Streaming | Yes | Yes |
| Support for Spark Machine Learning | Yes | Yes |
| Support for Spark GraphX | Yes | No |
| Data locality awareness | Yes (the Spark connector is aware which MongoDB partitions are storing data) | No |
| Support for MongoDB secondary indexes to filter input data | Yes | Yes |
| Support for MongoDB aggregation pipeline to filter input data | Yes | No |
| Compatibility with MongoDB replica sets and sharded clusters | Yes | Yes |
| Support for MongoDB 2.6 and higher | Yes | Yes |
| Support for Spark 1.6 and above | Yes | Yes |
| Supported for production usage | Not currently; available for early access evaluation | Yes |

Written in Spark’s native language, the new connector provides a more natural development experience for Spark users, who can quickly apply their Scala expertise.
The connector provides access to the Spark interactive shell for data exploration and rapid prototyping. It exposes all of Spark’s libraries, enabling MongoDB data to be materialized as DataFrames and Datasets for analysis with SQL (benefiting from automatic schema inference), streaming, machine learning, and graph APIs.

The Spark connector can take advantage of MongoDB’s aggregation pipeline and rich secondary indexes to extract, filter, and process only the range of data it needs – for example, analyzing all customers located in a specific geography. This is very different from simpler NoSQL datastores that offer neither secondary indexes nor in-database aggregations. In those cases, Spark would need to extract all data based on a simple primary key, even if only a subset of that data is required for the Spark process. That means more processing overhead, more hardware, and longer time-to-insight for the analyst.

To maximize performance across large, distributed data sets, the Spark connector is aware of data locality in a MongoDB cluster. RDDs are automatically co-located with the associated MongoDB shard to minimize data movement across the cluster. The nearest read preference can be used to route Spark queries to the closest physical node in a MongoDB replica set, reducing latency.

Review the MongoDB Connector for Spark documentation to learn how to get started with the connector, and view code snippets for different APIs and libraries.

Fast Track to Apache Spark: New MongoDB University Course

To get the most out of any technology, you need more than documentation and code. Over 350,000 students have registered for developer and operations courses from MongoDB University. Now developers and budding data scientists can get a quick-start introduction to Apache Spark and the MongoDB connector with early access to our new online course.
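The aggregation-pipeline filtering described above can be sketched in Python. This is a hedged illustration, not the connector's documented quick-start: the URI, database/collection names (`shop.customers`), and field names are assumptions, and the Spark calls (which need a Spark installation with the connector on the classpath) are shown commented out. The pipeline itself is ordinary, runnable Python data.

```python
# Sketch of pushing a filter down to MongoDB before Spark sees the data.
# With PySpark and the MongoDB Spark connector available, the commented
# lines would load only the matching documents (names here are assumed):
#
#   from pyspark.sql import SparkSession
#   spark = (SparkSession.builder
#            .config("spark.mongodb.input.uri",
#                    "mongodb://localhost/shop.customers")
#            .getOrCreate())
#   df = (spark.read.format("com.mongodb.spark.sql.DefaultSource")
#         .option("pipeline", json.dumps(pipeline))
#         .load())

import json

# Only customers in one geography are extracted; the $match stage runs
# inside MongoDB (and can use a secondary index on "country"), so Spark
# never pulls the rest of the collection across the network.
pipeline = [
    {"$match": {"country": "DE"}},
    {"$project": {"name": 1, "country": 1, "last_order": 1}},
]

print(json.dumps(pipeline))
```

This is the contrast the post draws with primary-key-only datastores: there, the equivalent job would extract the whole collection and filter inside Spark instead.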
Getting Started with Spark and MongoDB provides an introduction to Spark and teaches students how to use the new connector to build data analytics applications. In this course, we provide an overview of the Spark Scala and Java APIs with plenty of sample code and demonstrations. Upon completing this course, students will be able to:

- Outline the roles of major components in the Spark framework
- Connect Spark to MongoDB
- Source data from MongoDB for processing in Spark
- Write data from Spark into MongoDB

The course does not assume prior knowledge of Spark, but does require an intermediate level of expertise with MongoDB. The course is free. Sign up at MongoDB University.

Next Steps

To wrap up, we are very excited about the possibilities Spark and MongoDB present together, and we hope that with the new connector and course, you will be well on your way to building modern, data-driven applications. We would love to hear from you as you explore this new connector and put it through its paces – you can provide feedback and file bugs under the MongoDB Spark Jira project. Here’s a summary of how to get started:

- Read the MongoDB Connector for Spark documentation and download the connector
- If you have any questions, please send them to the MongoDB user mailing list
- Sign up for the new Spark course at MongoDB University
What the C-Suite Should Know About Data Strategy for 2023
Trying to predict the future is obviously fraught with difficulty. Anything can happen. Just look at the past few years, where it seemed like everything and anything did happen. With us now in the second month of 2023, and the rest of the year shaping up to be one of potentially big changes and disruptions, the only clear indicator of what’s to come is what we’ve seen trending in the months, weeks, and days preceding this new year.

So, with that said, here are five things the C-suite is likely to see more of as 2023 progresses – and what it all means for building a resilient, enduring, and innovative data strategy.

1. Software may still be eating the world, but developers are eating all the work

Almost 12 years ago, Marc Andreessen proclaimed, “software is eating the world.” And while that sentiment still holds true today, the biggest beneficiaries of software’s global appetites will continue to be developers. In an interview with The Cube at last year’s AWS re:Invent, MongoDB CEO Dev Ittycheria put it this way: “It’s almost a cliche to say now that software is eating the world. Because every company’s value proposition is driven by software. But what that really means is developers are eating all the work.”

One of the best examples of developers “eating all the work” is DevOps. At the advent of DevOps, software development teams incorporated the previously separate domain of IT operations into their work, turning infrastructure into a programmable interface and creating a continuous feedback loop that improved developer agility. But DevOps was just the start. We’re now seeing developers embed other previously separate domains into their work, such as security, data science, and data analytics (more on that below). The business implications of embedding these previously disparate domains into software development are huge.
It means rapid innovation, faster time-to-market, better fraud detection and prevention, A/B testing – the list goes on and on. With software continuing to eat the world, developers are continuing to eat all the work while also taking massive bites out of silos.

2. Builder teams will require less and less complexity

With software development teams taking on more work, we’re also going to see the need to reduce complexity, particularly when it comes to bolt-on solutions. Search is a good example here. For a lot of teams, database operations and search have traditionally been two separate systems that are then glued together. That doesn’t usually decrease complexity; in fact, the opposite happens – for example, having to manage dependencies across systems. But when teams have access to a single, unified, and fully managed platform that integrates the database, search engine, and sync mechanism, you remove the need for glue and the complexity goes way down.

As Andrew Davidson, SVP of products at MongoDB, said on a recent episode of The Cloudcast: “...Search as a bolt-on [and] entirely different system… has such a profoundly inconsistent experience that if you can bring it in to have near consistency in line with the database, that's a game changer…”

And with development teams taking on more and more work previously associated with separate domains, like analytics (described above), they’re needing to use other systems that have also been traditionally glued together. So the question facing many organizations this year and beyond will be: Why spend time moving data between separate glued-together solutions for things like search, visualization, and analytics, when a single data platform can handle it all?

3. Apps are going to get a lot smarter

If you were to go back 15 years to 2008 – which, wow, can’t believe that was 15 years ago, but anyway… – you’d notice just how radically the technology landscape has changed.
Cloud computing wasn’t quite yet a thing back then, and mobile was really just getting off the ground. Today, an equally sizable shift is happening. In an interview with SiliconAngle this past November, MongoDB CEO Dev Ittycheria said: “I believe the next big platform shift is moving from dumb apps to smart apps that incorporate machine learning, AI, and very sophisticated automation.”

As mentioned previously, development teams are taking on more work associated with previously separate domains. This is also happening with data analytics, which previously lived outside the application development process. But now analytics is “shifting left” directly into app development. The results for businesses are the ability for applications to process and analyze real-time data much faster and at lower cost, and to both understand trends and make more informed predictions based on those trends. The results for customers are greater personalization and richer digital experiences.

Building smarter applications is the future. But how quickly and effectively organizations do that is still dependent on their data platforms. Not all can bring analytics into app development in the same ways. In this respect, the future may be smarter applications; but for different businesses – to paraphrase author William Gibson – that future isn’t evenly distributed. Yet.

4. Encryption, encryption, [$a&*9Qd]

Encryption will not only continue to be critical for how organizations store their data, it will also revolutionize how data is used in the application development process. Ask a lot of software veterans about data encryption and they’re likely to tell you how important it is. They’ll also likely say that encryption, particularly in-use encryption, can have scalability issues and/or complexity problems. But in 2023 and beyond, new advancements will make those issues a thing of the past.
With new technologies like Queryable Encryption, the ability to build smarter applications that use end-to-end encrypted data can move at the speed that development teams and businesses require. The added benefit is that this increases end-user trust. As MongoDB’s chief information security officer Lena Smart said in an interview with SiliconAngle in December 2022: “By giving people things like Queryable Encryption, for example, you’re going to free up a whole bunch of headspace. [Their customers] don’t have to worry about their data being … harvested from memory or harvested while at rest or in motion.”

The name of the game in 2023 will be 8QTwZm*

*encrypted for demonstration purposes.

5. Bottom line: Your data strategy is your business strategy

When we get to the end of December 2023, we’ll probably look back on the intervening months and see a lot of things we didn’t expect. What we do know is that data is going to play an increasingly important role in how businesses operate. Why do we know this? Because this has been the trend every year since organizations first started using data to build better software and richer digital experiences.

Software might be eating the world, and developers might be eating the work, but data is eating business. So in 2023, it’s incumbent on business leaders to set the table accordingly. To get started building your data strategy with MongoDB, get in touch with our experts.