We are excited to announce that Cloud Manager now supports converting a replica set to a sharded cluster via Automation. This oft-requested feature will help you scale your deployment more easily. Here’s how:
- Head to your Deployment page and use the “…” menu for an existing replica set to choose “Convert to Sharded Cluster”.
- Enter the requested details, such as the hosts and ports on which you want your mongos instances and config servers deployed.
- Click “Next”, then drag and drop processes as needed to tweak your deployment.
- Click “Review and Deploy”, then “Confirm and Deploy”.
- Modify your application to connect to the mongos instances rather than directly to the replica set.
Cloud Manager Automation has now converted your replica set to a sharded cluster. This does not mean that all of your collections and databases are sharded, though. Cloud Manager Automation does not issue shardCollection commands to activate sharding for particular databases and collections, so issuing these and choosing a suitable shard key are left to you. Otherwise, you’re good to go, and you can add more shards via the wrench menu if necessary.
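Since sharding individual collections is left to you, here is a minimal Python sketch of the command documents involved. The database name `shop`, the collection `orders`, and the hashed shard key on `customer_id` are hypothetical placeholders, and the right shard key depends entirely on your workload. The `shard` helper assumes the third-party PyMongo driver is installed and that the URI points at a mongos, not at a replica set member.

```python
def enable_sharding_cmd(database: str) -> dict:
    """Build the enableSharding command document for a database."""
    return {"enableSharding": database}


def shard_collection_cmd(namespace: str, shard_key: dict) -> dict:
    """Build the shardCollection command document.

    namespace is "<database>.<collection>"; shard_key maps field names
    to 1 (ranged) or "hashed".
    """
    if "." not in namespace:
        raise ValueError("namespace must look like '<database>.<collection>'")
    if not shard_key:
        raise ValueError("a shard key is required")
    # The command name must be the first key in the document.
    return {"shardCollection": namespace, "key": shard_key}


def shard(mongos_uri: str, database: str, collection: str, shard_key: dict):
    """Send both commands through a mongos (requires PyMongo)."""
    from pymongo import MongoClient  # imported lazily; third-party package

    client = MongoClient(mongos_uri)
    # Older server versions need sharding enabled on the database first;
    # newer ones accept the command but no longer require it.
    client.admin.command(enable_sharding_cmd(database))
    return client.admin.command(
        shard_collection_cmd(f"{database}.{collection}", shard_key)
    )


if __name__ == "__main__":
    # Hypothetical names, printed rather than sent to a server.
    print(enable_sharding_cmd("shop"))
    print(shard_collection_cmd("shop.orders", {"customer_id": "hashed"}))
```

Building the command documents separately from sending them makes it easy to review exactly what will run before pointing `shard` at a live cluster.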
MongoDB Debuts in Gartner’s Magic Quadrant for Data Warehouse & Data Management Solutions for Analytics
Why, you may ask, is MongoDB profiled in a research report dedicated to evaluating key trends and vendors in the data warehousing market? After all, MongoDB is designed to serve operational use cases, including Internet of Things applications, customer data management, catalog and content management, mobile services and more. In fact, Gartner placed MongoDB as a Leader in its most recent Magic Quadrant for Operational Database Management Systems in recognition of its completeness of vision and ability to execute against requirements in the operational database market.

While MongoDB is not a data warehouse, we believe its inclusion within Gartner’s latest DW/DMSA Magic Quadrant (available at no cost to eligible Gartner clients) reflects the growing demand from business users to accelerate speed-to-insight and turn analytics into real-time action. Whether that is to detect fraud during transaction processing, present relevant recommendations to shoppers as they browse an eCommerce store, or alert operators to the impending failure of a critical piece of manufacturing equipment, creating fast, actionable insight is accomplished by embedding real-time analytics into operational processes.

Gartner calls this trend Hybrid Transactional/Analytical Processing (HTAP), and it is this specific capability, highlighted by users surveyed in Gartner’s research, that has driven MongoDB’s inclusion in the Magic Quadrant. Not only is this placement a first for MongoDB, it is also a first for Gartner. No other open source, non-relational database has ever been included in the DW/DMSA Magic Quadrant.

Augmenting the Data Warehouse: Unlocking Real-Time Analytics

Using traditional data warehousing platforms, the flow of data, from its acquisition from source systems through to transformation, consolidation, analysis, and reporting, follows a well-defined sequential process, as illustrated in Figure 1.
Figure 1: Data Flow in Traditional Analytics Processes

Operational data from multiple source systems is integrated into a centralized Enterprise Data Warehouse (EDW) and local data marts using Extract, Transform, Load (ETL) processes. Reports and visualizations of the data are then generated by BI tools. This workflow is predicated on a number of assumptions:

- Predictable frequency. Data is extracted from source systems at regular intervals, typically measured in days, months and quarters.
- Static sources. Data is sourced from controlled, internal systems supporting established and well-defined back-office processes.
- Fixed models. Data structures are known and modeled in advance of analysis. This enables the development of a single schema to accommodate data from all of the source systems, but adds significant time to the upfront design.
- Defined queries. Questions to be asked of the data (i.e., the analytical queries) are pre-defined. If not all of the query requirements are known upfront, or requirements change, then the schema is modified to accommodate changes.
- Slow-changing requirements. Rigorous change control is enforced before the introduction of new data sources or reporting requirements.
- Limited users. The consumers of BI reports and analytics are typically business managers and senior executives.

Technology Foundations for Real-Time Analytics

This workflow remains incredibly valuable, enabling businesses to run deep, historical analysis to monitor performance and inform business strategy. But it presents a significant “impedance mismatch” to the requirements presented by real-time analytics:

- Eliminate latency. The frequency of data acquisition, processing and analysis must increase from days to seconds or less. Source data needs to be analyzed as it is generated by operational applications in order to provide the speed-to-insight demanded by the business. Moving data through an ETL pipeline to the data warehouse will not work for real-time use cases.
- Uncontrolled sources. Organizations need to harness data that is generated outside of their own firewalls, from location data, to web clicks, to sensors, to social media. The analytics team has no control over these data sources.
- Dynamic structures. Much of this data is rapidly changing, with polymorphic, semi-structured or unstructured formats that do not map neatly to the fixed schema of the traditional relational databases powering most data warehouses.
- Changing query patterns. It is impossible to predict the types of questions that will be asked of the data. Search, aggregations, geospatial analytics, and machine learning are just some of the tools now available to analysts as they explore new data sets and discover previously undetected trends.
- “Big” volume. Data arrives faster, and in quantities that overwhelm traditional data management technologies. It means scaling out databases and analytics across commodity hardware, rather than the scale-up approach typical of most data warehouses.
- Wide consumption. Analytics now extends well beyond the management suite. Permeating every part of the organization, analytics now needs to be accessible to staff on the shop floor, and consumed by operational applications to control real-time behavior.

MongoDB augments the data warehouse by addressing the challenges above, enabling users to run analytics in real time directly against their data:

- Rich data structures with complex attributes comprising text, geospatial data, media, arrays, embedded elements, and other complex types can be easily mapped to MongoDB’s JSON-based document data model.
- A dynamic schema means that each document (record) does not need to have the same set of fields. Users can adapt the structure of documents just by adding new fields or deleting existing ones, making it very simple to extend and evolve applications by adding new attributes for analysis and reporting.
- An expressive query language and secondary indexes allow fast and rich access to data, enabling complex analytics and search to be performed in place, without having to move the data to dedicated analytics infrastructure.
- Auto-sharding allows MongoDB to partition and distribute large data sets across clusters of commodity servers in the data center or in the cloud.

The latest MongoDB 3.2 release builds on these capabilities with advanced feature sets to enhance analytics:

- The MongoDB Connector for BI allows analysts, data scientists, and business users to seamlessly explore and visualize multi-structured data stored in MongoDB with industry-standard SQL-based BI and analytics platforms such as Tableau, Business Objects, and more.
- MongoDB Compass presents a simple-to-use, sophisticated GUI that allows any user to visualize and explore data with ad hoc queries in just a few clicks, all with zero knowledge of the MongoDB query language.
- For data governance, document validation allows you to enforce checks on document structure, data types, data ranges, and the presence of mandatory fields.
- Dynamic lookup, new math operators and enhanced search allow richer analytics to be run against live, operational data.

Putting Real-Time Analytics to Work

Some of the world’s largest and most innovative organizations are putting real-time analytics to work, creating operational efficiencies and building competitive advantage:

- Bosch uses MongoDB at the heart of its IoT Suite. Ingesting real-time telemetry data from millions of vehicles enables auto manufacturers to deliver predictive maintenance schedules to their customers, and improve product design.
- The City of Chicago uses MongoDB to pull together millions of data points across its most crucial departments, providing real-time data analysis to city managers so they can better predict and allocate resources, respond quickly to emergencies, regulate traffic flow and uncover trends that would have otherwise been invisible.
- Media company BuzzFeed uses MongoDB to pinpoint when content is viewed, where it’s shared, and how it’s being consumed by its 400 million monthly website visitors. The system enables BuzzFeed’s employees to analyze, track, and display these metrics to writers and editors.
- The website of OTTO, Germany’s largest online retailer, generates some 10,000 events per second. Every click and hover of every mouse is stored in MongoDB, and real-time data analytics is used to provide unique and personalized web experiences to individual visitors.

Hadoop and Spark: Building the Complete Data Analytics Platform

Of course, it’s not just real-time analytics that is driving innovation in the data warehouse world. Apache Hadoop has emerged as a key part of the data management landscape. Some assumed Hadoop would replace the enterprise data warehouse, but that prediction was wrong. In fact, Hadoop is augmenting the data warehouse, in many cases offloading data and specific data transformation workloads from existing data warehouses to less expensive commodity hardware in scale-out environments.

Many organizations are harnessing Hadoop and MongoDB together using the MongoDB Connector for Hadoop, which provides the ability to use MongoDB as an input source and an output destination for MapReduce, Spark, Hive and Pig jobs. With this combination, users can create complete analytics and data management platforms:

- MongoDB powers the online, real-time operational application, serving business processes and end users.
- Hadoop consumes data from MongoDB, blending it with data from other operational systems to fuel sophisticated analytics and machine learning.
- Results are loaded back to MongoDB to serve smarter operational processes.

For example:

- eBay uses MongoDB for user data and metadata management for its product catalog, and Hadoop for user analysis to provide personalized search and recommendations.
- Orbitz uses MongoDB for the management of hotel data and pricing, with Hadoop powering hotel segmentation to support building search facets.
- Pearson manages student identity and access control along with content management of course materials in MongoDB, and uses Hadoop for student analytics to create adaptive learning programs.

The Rise of Spark

No analytics discussion is complete without reference to Apache Spark, which has become one of the fastest growing Apache Software Foundation projects. With its memory-oriented architecture, flexible processing systems, and easy-to-use APIs, Apache Spark has emerged as a leading framework for real-time analytics, supporting streaming, machine learning, SQL processing and more.

Unlike Hadoop, which has to move all data into HDFS, Spark can work directly against data stored in any database, file system, or message queue. The MongoDB Connector for Hadoop provides a Spark plug-in, allowing Spark jobs to use MongoDB as both a source and a sink. A range of community-developed connectors is also available for MongoDB and Spark integration.

Figure 2: Modernized data architecture: MongoDB, Spark, and Hadoop

Many organizations are already combining MongoDB and Spark to build new analytics-rich applications.

A global manufacturing company has built a pilot project to estimate warranty returns by analyzing material samples from production lines. The collected data enables them to build predictive failure models using Spark Machine Learning and MongoDB.

A video sharing website is using Spark with MongoDB to place relevant advertisements in front of users as they browse, view and share videos.

A multinational banking group operating in 31 countries with 51 million clients implemented a unified real-time monitoring application, running Apache Spark and MongoDB.
The bank wanted to ensure a high quality of service across its online channels, and needed to continuously monitor client activity to check service response times and identify potential issues. All log data is collected in Apache Flume before being persisted to MongoDB, where Spark jobs then analyze that data to power real-time visualizations and alerts of system health. MongoDB was selected for its high scalability, a dynamic schema that can ingest and manage quickly changing log data, and a rich array of secondary indexes, which allow Spark jobs to efficiently filter and access only the slices of data that are needed to drive the analytics. This approach results in lower latency and higher analytical throughput.

Putting it all Together

If anyone ever tells you the data warehouse market is slow and boring, dominated by just a few mega-vendors, tell them they are wrong. With the adoption of modern technologies such as MongoDB, Hadoop and Spark, organizations are creating new classes of applications and analytics that offer the promise of unlocking new efficiencies, creating new business models and outpacing competitors. And with MongoDB serving both operational and analytical use cases, you can build those applications faster, with lower cost, complexity and risk.

To learn more about real-time analytics with MongoDB, Spark and Hadoop, read our white paper, Turning Analytics into Real-Time Action.

References:

- Gartner Magic Quadrant for Operational Database Management Systems, Donald Feinberg, Merv Adrian, Nick Heudecker, Adam M. Ronthal, Terilyn Palanca, October 12, 2015.
- Gartner Magic Quadrant for Data Warehouse and Data Management Solutions for Analytics, Roxane Edjlali, Mark A. Beyer, February 25, 2016.

Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation.
Gartner research publications consist of the opinions of Gartner's research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
How DataSwitch And MongoDB Atlas Can Help Modernize Your Legacy Workloads
Data modernization is here to stay, and DataSwitch and MongoDB are leading the way forward. Research strongly indicates that the future of the Database Management System (DBMS) market is in the cloud, and the ideal way to shift from an outdated, legacy DBMS to a modern, cloud-friendly data warehouse is through data modernization.

There are a few key factors driving this shift. Increasingly, companies need to store and manage unstructured data in a cloud-enabled system, as opposed to a legacy DBMS which is only designed for structured data. Moreover, the amount of data generated by a business is increasing at a rate of 55% to 65% every year, and the majority of it is unstructured.

A modernized database that can improve data quality and availability provides tremendous benefits in performance, scalability, and cost optimization. It also provides a foundation for improving business value through informed decision-making. Additionally, cloud-enabled databases support greater agility so you can upgrade current applications and build new ones faster to meet customer demand.

Gartner predicts that by 2022, 75% of all databases will be in the cloud, either by direct deployment or through data migration and modernization. But research shows that over 40% of migration projects fail. This is due to challenges such as:

- Inadequate knowledge of legacy applications and their data design
- Complexity of code and design from different legacy applications
- Lack of automation tools for transforming from legacy data processing to cloud-friendly data and processes

It is essential to take a strategic approach and choose the right partner for your data modernization journey. We’re here to help you do just that.

Why MongoDB?

MongoDB is built for modern application developers and for the cloud era. As a general-purpose, document-based, distributed database, it facilitates high productivity and can handle huge volumes of data.
The document database stores data in JSON-like documents and is built on a scale-out architecture that suits any developer who builds scalable applications through agile methodologies. Ultimately, MongoDB fosters business agility, scalability and innovation.

Key MongoDB advantages include:

- Rich JSON documents
- Powerful query language
- Multi-cloud data distribution
- Security of sensitive data
- Quick storage and retrieval of data
- Capacity for huge volumes of data and traffic
- Design that supports greater developer productivity
- Extreme reliability for mission-critical workloads
- Architecture built for optimal performance and efficiency

Key advantages of MongoDB Atlas, MongoDB’s hosted database as a service, include:

- Multi-cloud data distribution
- Secure for sensitive data
- Designed for developer productivity
- Reliable for mission-critical workloads
- Built for optimal performance
- Managed for operational efficiency

To be clear, JSON documents are the most productive way to work with data as they support nested objects and arrays as values. They also support schemas that are flexible and dynamic. MongoDB’s powerful query language enables sorting and filtering on any field, regardless of how deeply nested it is in a document. Moreover, it provides support for aggregations as well as modern use cases including graph search, geo-based search and text search. Queries are expressed in JSON and are easy to compose. MongoDB supports joins in queries through the $lookup aggregation stage, and supports two types of relationships: referencing and embedding. It has all the power of a relational database and much, much more.

Companies of all sizes can use MongoDB, as it operates on a large and mature platform ecosystem. Developers enjoy a great user experience with the ability to provision MongoDB Atlas clusters and commence coding instantly. A global community of developers and consultants makes it easy to get the help you need, if and when you need it.
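The two relationship styles mentioned above, embedding and referencing, can be sketched with plain Python dictionaries. All collection names, field names and values here are hypothetical. The `emulate_lookup` function is a simplified, in-memory stand-in for what a `$lookup` stage returns (each input document gains an array of matching documents); it is not how the server implements joins.

```python
# Embedding: related data lives inside the parent document.
customer_embedded = {
    "_id": 1,
    "name": "Ada",
    "addresses": [
        {"type": "home", "city": "London"},
        {"type": "work", "city": "Paris"},
    ],
}

# Referencing: orders point at a customer by its _id.
customers = [{"_id": 1, "name": "Ada"}, {"_id": 2, "name": "Grace"}]
orders = [
    {"_id": 101, "customer_id": 1, "total": 40},
    {"_id": 102, "customer_id": 2, "total": 60},
]

# The aggregation stage that would resolve the reference on the server.
lookup_stage = {
    "$lookup": {
        "from": "customers",
        "localField": "customer_id",
        "foreignField": "_id",
        "as": "customer",
    }
}


def emulate_lookup(local_docs, foreign_docs, local_field, foreign_field, as_field):
    """Simplified in-memory version of the $lookup result shape."""
    joined = []
    for doc in local_docs:
        matches = [
            f for f in foreign_docs
            if f.get(foreign_field) == doc.get(local_field)
        ]
        # Copy the document and attach the matches as an array field.
        joined.append({**doc, as_field: matches})
    return joined


if __name__ == "__main__":
    spec = lookup_stage["$lookup"]
    for row in emulate_lookup(
        orders, customers, spec["localField"], spec["foreignField"], spec["as"]
    ):
        print(row["_id"], [c["name"] for c in row["customer"]])
```

Embedding tends to suit data that is read together with its parent, while referencing suits data that is shared between documents or grows without bound.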
In addition, MongoDB supports all major languages and provides enterprise-grade support.

Why DataSwitch as a partner for MongoDB?

Automated schema re-design, data migration & code conversion

DataSwitch is a trusted partner for cost-effective, accelerated solutions for digital data transformation, migration and modernization through a modern database platform. Our no-code and low-code solutions, along with cloud data expertise and unique, automated schema generation, accelerate time to market. We provide end-to-end data, schema and process migration with automated replatforming and refactoring, thereby delivering:

- 50% faster time to market
- 60% reduction in total cost of delivery
- Assured quality with built-in best practices, guidelines and accuracy

Data modernization: How “DataSwitch Migrate” helps you migrate from RDBMS to MongoDB

DataSwitch Migrate (“DS Migrate”) is a no-code and low-code toolkit that leverages advanced automation to provide intuitive, predictive and self-serviceable schema redesign from a traditional RDBMS model to MongoDB’s Document Model with built-in best practices. Based on data volume, performance, and criticality, DS Migrate automatically recommends the appropriate ETTL (Extract, Transfer, Transform & Load) data migration process. DataSwitch delivers data engineering solutions and transformations in half the time of typical existing data modernization solutions. Consider these key areas:

- Schema redesign: construct a new framework for data management. DS Migrate provides automated data migration and transformation based on your redesigned schema, as well as no-touch code conversion from legacy data scripts to MongoDB Atlas APIs. Users can simply drag and drop the schema for redesign, and the platform converts it to a document-based JSON structure by applying MongoDB modeling best practices. The platform then automatically migrates data to the new, redesigned JSON structure. It also converts the legacy database script for MongoDB.
This automated, user-friendly data migration is faster than anything you’ve ever seen. Here’s a look at how the schema designer works.

- Refactoring: change the data structure to match the new schema. DS Migrate handles this through automatic code generation for migrating the data. This goes far beyond a mere lift and shift. DataSwitch takes care of refactoring and replatforming (moving from the legacy platform to MongoDB) automatically. Performing all of these tasks within a single platform is a unique, game-changing capability.
- Security: mask and tokenize data while moving it from on-premises to the cloud. As the data is moving to a potentially public cloud, you must keep it secure. DataSwitch’s tool can configure and apply security measures automatically while migrating the data.
- Data quality: ensure that data is clean, complete, trustworthy and consistent. DataSwitch allows you to configure your own quality rules and automatically apply them during data migration.

In summary: first, the DataSwitch tool automatically extracts the data from an existing database, such as Oracle. It then exports the data and stores it locally before zipping and transferring it to the cloud. Next, DataSwitch transforms the data by altering the data structure to match the redesigned schema, applying data security measures during the transform step. Lastly, DS Migrate loads the data and processes it into MongoDB in its entirety.

Process Conversion

Process conversion, where scripts and process logic are migrated from a legacy DBMS to a modern DBMS, is made easier thanks to a high degree of automation. Minimal coding and manual intervention are required, and the journey is accelerated.
It involves:

- DML: Data Manipulation Language
- CRUD: typical application functionality (Create, Read, Update & Delete)
- Converting to the equivalent MongoDB Atlas API

Degree of automation DataSwitch provides during migration

| Schema Migration Activities | DS Automation Capabilities |
| --- | --- |
| Application Data Usage Analysis | 70% |
| 3NF to NoSQL Schema Recommendation | 60% |
| Schema Re-Design Self Services | 50% |
| Predictive Data Mapping | 60% |

| Process Migration Activities | DS Automation Capabilities |
| --- | --- |
| CRUD based SQL conversion (Oracle, MySQL, SQLServer, Teradata, DB2) to MongoDB API | 70% |

| Data Migration Activities | DS Automation Capabilities |
| --- | --- |
| Migration Script Creation | 90% |
| Historical Data Migration | 90% |
| 2 Catch Load | 90% |

DataSwitch Legacy Modernization as a Service (LMaaS)

Our consulting expertise combined with the DS Migrate tool allows us to harness the power of the cloud for data transformation of RDBMS legacy data systems to MongoDB. Our solution delivers legacy transformation in half the time frame through pay-per-usage. Key strengths include:

- Data Architecture Consulting
- Data Modernization Assessment and Migration Strategy
- Specialized Modernization Services

DS Migrate Architecture Diagram

Contact us to learn more.
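To make the idea of CRUD-based SQL conversion more concrete, here is a small hand-written Python sketch that maps the parts of a simple SELECT statement onto a MongoDB filter and projection. It is an illustration only and does not represent how DS Migrate works internally. The function names, the tuple format for WHERE conditions, and the `orders` example are all assumptions made for this sketch, and only AND-ed comparisons are handled.

```python
# SQL comparison operators and their MongoDB query operator equivalents.
SQL_TO_MONGO_OPS = {
    "=": "$eq",
    "!=": "$ne",
    ">": "$gt",
    ">=": "$gte",
    "<": "$lt",
    "<=": "$lte",
}


def where_to_filter(conditions):
    """Convert AND-ed (column, operator, value) tuples to a MongoDB filter."""
    query_filter = {}
    for column, operator, value in conditions:
        if operator not in SQL_TO_MONGO_OPS:
            raise ValueError(f"unsupported operator: {operator}")
        # Several conditions on one column merge into one sub-document.
        query_filter.setdefault(column, {})[SQL_TO_MONGO_OPS[operator]] = value
    return query_filter


def select_to_find(table, columns, conditions):
    """Describe the find() call equivalent to a simple SELECT."""
    return {
        "collection": table,
        "filter": where_to_filter(conditions),
        "projection": {column: 1 for column in columns},
    }


if __name__ == "__main__":
    # SELECT name, total FROM orders
    # WHERE status = 'shipped' AND total >= 100
    spec = select_to_find(
        "orders",
        ["name", "total"],
        [("status", "=", "shipped"), ("total", ">=", 100)],
    )
    print(spec)
```

With the PyMongo driver, the resulting pieces could be passed as `db[spec["collection"]].find(spec["filter"], spec["projection"])`. Real SQL also involves joins, OR clauses, subqueries and vendor-specific syntax, which is where dedicated tooling earns its keep.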